Diffstat (limited to 'libs/hunspell/docs/ChangeLog')
-rw-r--r-- | libs/hunspell/docs/ChangeLog | 1993 |
1 files changed, 1993 insertions, 0 deletions
diff --git a/libs/hunspell/docs/ChangeLog b/libs/hunspell/docs/ChangeLog new file mode 100644 index 0000000000..1f6e774a63 --- /dev/null +++ b/libs/hunspell/docs/ChangeLog @@ -0,0 +1,1993 @@ +2016-04-29 Caolán McNamara <caolanm at LibO>: + * deprecate old api and add new one + old one remains implemented in terms of new one + and will eventually be removed + * shrink exposed api down to just hunspell.hxx + * next major release is likely to require C++11 + +2016-04-15 Caolán McNamara <caolanm at LibO>: + * generally using std::string and std::vector internally + +2016-04-13 Caolán McNamara <caolanm at LibO>: + * gh#371 drop experimental code + +2015-09-11 Caolán McNamara <caolanm at LibO>: + * rhbz#1261421 crash on mashing hangul korean keyboard + +2014-12-03 Németh László <nemeth at numbertext dot org>: + * tools/hunspell.cxx: security fixes of the Hunspell executable + - secure file name handling, the problem (checking + OpenDocument files with malicious file names) + reported by Eric Sesterhenn + - using tmpnam() only with system("mkdir tempname && ...") + +2014-10-17 Caolán McNamara <caolanm at LibO>: + * sf#245 Feature from Anish Patil -S mode + to show suggestions for completion of + correctly spelled words + * sf#248 Fix manpage about how to include + +2014-10-16 Caolán McNamara <caolanm at LibO>: + * rhbz#915448, sf#57, sf#185 report character offset + and not byte offset in ispell mode + * sf#56 segv in experimental mode + * sf#228 don't translate init string + +2014-09-22 Németh László <nemeth at numbertext dot org>: + * fix crash in morphological analysis of the Hungarian + compound word 'művészegyéniség', reported by Gáspár Sinai + +2014-08-26 Németh László <nemeth at numbertext dot org>: + * unmunch separates flags of prefixes from the word, + bug reported by Daniel Naber + +2014-08-05 Németh László <nemeth at numbertext dot org>: + * moz#318040 Mozzilla accepts abbreviations without dots + * myfopen(): add _wfullpath to expand relative parts of 
absolute paths + +2014-07-16 Caolán McNamara <caolanm at LibO>: + * moz#675553 Switch from PRBool to bool + * moz#690892 replace PR_TRUE/PR_FALSE with true/false + * Silence the warning about empty while body loop in clang + * moz#777292 Make nsresult an enum + * moz#579517 Use stdint types in gecko + * moz#784776 consistently use FLAG_NULL + * moz#927728 Convert PRUnichar to char16_t + * moz#943268 Remove nsCharsetAlias and nsCharsetConverterManager + * Don't include config.h in license.hunspell if MOZILLA_CLIENT is set + +2014-06-26 Caolán McNamara <caolanm at LibO>: + * clang scan-build: Allocator sizeof operand mismatch + * clang scan-build: other low hanging warnings + * clang scan-build: significant warnings + +2014-06-02 Németh László <nemeth at numbertext dot org>: + * escape spaces in paths of ODF files + +2014-05-28 Németh László <nemeth at numbertext dot org>: + * add long path/Unicode path support in WIN32 environment: + - hunspell#233 (reported by mahak gark) and LibreOffice fdo#48017 + * flat ODF support, eg.: + hunspell doc.fodt + cat doc.fodt | hunspell -l -O + * new options: + - -X (XML) input format + - -O (ODF or flat ODF) input format + - --check-apostrophe: check and force Unicode apostrophe usage + (ASCII or Unicode apostrophe has to be in the + WORDCHARS section of the affix file) + * fix ODF support: + - break 1-line XML of ODT documents at </style:style>, too, + not only at </text:p> (limiting tokenization problems, when + fgets stops within an XML tag) + - show ODF file path on the UI instead of the temporary file + * fix XML support: + - ', ", &, < and > in replacements converted to XML entities + - recognize &apos at tokenization, depending from WORDCHARS + - ' in tokens converted to ' before spell checking and + in the output of the pipe interface + * better apostrophe usage: + - WORDCHARS only with one of the Unicode or ASCII apostrophe + results extended word tokenization: both of them will be part of + the words (if they are inside: 
eg. word's, but not words'). + - convert Unicode apostrophes to ASCII ones for 8-bit dictionaries + (eg. English dictionaries), or for UTF-8 dictionaries only + with ASCII apostrophe supports (eg. French dictionaries). + * updated manual: + - hunspell.4 renamed to hunspell.5, see + hunspell#241 reported by Cristopher Yeleighton + - updated translations + - note about long/Unicode paths in WIN32 (hunspell.3) + +2014-04-25 Németh László <nemeth at numbertext dot org>: + * OpenDocument support, eg. + hunspell *.odt + hunspell -l *.odt + * always load default personal dictionary (fix + filtering bad words - reduce this word list - using + it as a personal dictionary workflow) + * fix parsing/URL recognition problem (bad tokens + with aposthrophes) + +2013-07-25 pchang9@cs.wisc.edu + * moz#897255 Wasted work in line_uniq + * moz#897780 Wasted work in SuggestMgr::twowords + +2013-07-25 Caolán McNamara <caolanm at LibO>: + * hunspell#167 layout problems with long lines + - based on the original fix by xorho + adapted to HEAD + * rhbz#925562 upgrade config.guess for aarch64 + +2013-07-24 pchang9@cs.wisc.edu + * moz#896301 Wasted work in SfxEntry::checkword + * moz#896844 Wasted work in AffixMgr::defcpd_check + +2013-06-13 Konstantin Khlebniko + * #49 HashMgr::add_word computes wrong size for struct hentry + +2013-06-13 Ville Skyttä + * #53 Man page syntax fixes + +2013-04-19 John Thomson <john thomson at SIL> + * win_api: add remove() of Hunspell API (hun#3606435) + +2013-04-19 Rouslan Solomokhin <at sf.net> + * fix crash in suggestions for 99-character long words + by extending arrays of SuggestMgr::forgotchar_* + (hun#3595024, also http://crbug.com/130128), + thanks to also Paweł Hajdan to report the patch + +2013-04-01 Caolán McNamara <caolanm at LibO>: + * hunspell: -Werror=undef + +2013-03-13 Caolán McNamara <caolanm at LibO>: + * rhbz#918938 crash in interaction with danish thesaurus + +2012-09-18 Németh László <nemeth at numbertext dot org>: + * 
src/hunspell/affixmgr.*: - fix morphological analysis of + compound words (hun#3544994, reported by Dávid Nemeskey, fdo#55045) + +2012-06-29 Caolán McNamara <caolanm at LibO>: + * fix various coverity warnings + +2012-01-10 Ehsan Akhgari <ehsan at mozilla dot com> + * moz#710940 Firefox Crash [@ AffixMgr::parse_file(char const*, char + const*) ] + +2011-12-16 Jared Wein <jwein at mozilla dot com> + * moz#710967 Incorrect argument passed to strncmp in + AffixMgr::parse_convtable + +2011-12-06 Caolán McNamara <caolanm at LibO>: + * rhbz#759647 fixed tempname of hunSPELL.bak collides with other users + when multiple edits in one dir + +2011-10-13 Caolán McNamara <caolanm at LibO>: + * moz#694002 crash in hunspell affixmgr on exit with bad .aff + * leak in hunspell affixmgr with bad .aff + +2011-09-19 Caolán McNamara <caolanm at LibO>: + * make libparsers.a not installed thanks to Tomáš Chvátal + +2011-06-23 Caolán McNamara <caolanm at LibO>: + * fix some windows compiler warnings + +2011-05-24 Németh László <nemeth at numbertext dot org>: + * src/hunspell/affixmgr.*: allow twofold suffixes in compounds + by extended version of Arno Teigseth's patch, see hun#3288562. + - new option for this feature: COMPOUNDMORESUFFIXES + +2011-02-16 Németh László <nemeth at numbertext dot org>: + * src/*/Makefile.am: fix library versioning, the probem reported by + Rene Engerhald and Simon Brouwer. 
+ + * man/hunspell.4: new version based on the revised version of Ruud Baars + +2011-02-02 Németh László <nemeth at OOo>: + * suggestngr.cxx: fix ngram PHONE suggestion for input words with + diacritics using UTF-8 encoded dictionaries (add byte length to the + 8-bit phonet() argument instead of character length) + + * suggestmgr.cxx: fix missing csconv problem with UTF-8 encoding + dictionares, when the input contains non-BMP characters + - tests/utf8_nonbmp.sug: test file + + * suggestmgr.cxx: mixed and keyboard based character suggestions + don't forbid ngram suggestion search (optimized tests/suggestiontest) + + * affixmgr.cxx: fix hun#2999225: interfering compounding mechanisms, + tested on Dutch word list and reported by Ruud Baars + + * affixmgr.cxx: allomorph fix for hun#2970240 (Hungarian + compound "vadász+gép" was analyzed as vad+ász+gép, and rejected + by the ss->s rep rule (verb "vadássz"), but the analysis + didn't continue for the longer word parts (vadász+gép). + + * csutil.cxx: add lang code "az_AZ", "hu_HU", "tr_TR" for back + compatibility (fixing Azeri and Turkish casing conversion, also + Hungarian compound handling) + + * affixmgr.cxx: fix morphological analysis + +2011-01-26 Németh László <nemeth at OOo>: + * affixmgr.cxx: fix for moz#626195 (memcheck problem with FULLSTRIP). + + * affixmgr.*, suggestmgr.cxx: FORBIDWARN parameter (see manual) + +2011-01-24 Németh László <nemeth at OOo>: + * suffixmgr.cxx: fix bad suggestion of forbidden compound words, eg. + "termijndoel" with the Dutch dictionary. Reported by Ruud Baars. + + * latexparser.cxx: fix double apostrophe TeX quoation mark tokenization + (hun#3119776), reported by Wybodekker at SF.net. 
+ + * tests/suggestiontest/*: multilanguage and single Hunspell version, see README + * tests/suggestiontest/prepare2: for make -f Makefile.orig single + +2011-01-22 Németh László <nemeth at OOo>: + * affixmgr.*, suggestmgr.*: new features + ONLYMAXDIFF: remove all bad ngram suggestions (default mode keeps one) + NONGRAMSUGGEST: similar to NOSUGGEST, but it forbids to use the word + in ngram based (more, than 1-character distance) suggestions. + +2011-01-21 Németh László <nemeth at OOo>: + * suggestmgr.*: limit wild suggestions (hun#2970237 by Ruud Baars) + - limited compound word suggestions + - improved and limited ngram based suggestions + * tests/*.sug: modified test files + - feature MAXCPDSUGS: + MAXCPDSUGS 0 : no compound suggestion, suggested by + Finn Gruwier Larsen in hunfeat#2836033 + MAXCPDSUGS n : max. ~n compound suggestions + - feature MAXDIFF: differency limit for ngram suggestions: 0-10 + eg. MAXDIFF 5: normal (default) limit + MAXDIFF 0: only one ngram suggestion + MAXDIFF 10: ~maxngramsugs ngram suggestions + + * affixmgr.*, hunspell.*: add flag FORCEUCASE (hun#2999228), force + capitalization of compound words, see Hunspell 4 manual), + suggested by Ruud Baars + test/forceucase.*: test files + + * affixmgr.*, hunspell.*: add flag WARN (hun#1808861), optional warning feature + for rare words, suggested by Ruud Baars + tests/warn: test files + * tools/hunspell.cxx: add option -r for optional filtering of rare words + + * affixmgr.cxx: fix hun#3161359 (gcc warnings) reported by Ryan VanderMeulen. + +2011-01-17 Németh László <nemeth at OOo>: + * suggestmgr.cxx: fix hun#3158994 and hun#3159027 (missing csconv table + using awkward 8bit capitalization of UTF-8 encoded dictionary words with PHONE + suggestion, reported by benjarobin and dicollecte at SF.net). + +2011-01-13 Németh László <nemeth at OOo>: + * affixmgr.cxx: ONLYINCOMPOUND fix for hun#2999224 (fogemorphene + was allowed in end position of compoundings). Reported by Ruud Baars. 
+ * tests/onlyincompound2.*: test files + +2011-01-10 Ingo H. de Boer <idb_winshell at SF.net>: + * win_api/{hunspell,libhunspell, testparser}.vcproj: updated project + files for the library and the executables. Compiling problem + also reported by Don Walker. + +2011-01-06 Németh László <nemeth at OOo>: + * affixmgr.cxx: fix freedesktop#32850 (program halt during Hungarian + spell checking of the word "6csillagocska6", reported by András Tímár) + + * tools/hunspell.cxx: add Mac OS X Hunspell dictionary paths, asked by + Vidar Gundersen in hunfeat#3142010 + +2011-01-05 Caolán McNamara <cmc at OOo>: + * moz#620626 NS_UNICHARUTIL_CID doesn't support + case conversion + +2011-01-03 Németh László <nemeth at OOo>: + * NEWS and THANKS: update for release 1.2.13 + +2010-12-20 Németh László <nemeth at OOo>: + * affixmgr.cxx: hun#3140784 + +2010-12-16 Németh László <nemeth at OOo>: + * affixmgr.cxx: + - improved fix of hun#2970242 (supporting + zero affixes, reported by Ruud Baars + - tests/opentaal_cpdpat{,2}: test files + + - switching off default BREAK parameters by BREAK 0, + reported by Ruud Baars + + - hun#2999225: interfering compounding mechanisms, reported by Ruud Baars + +2010-12-11 Németh László <nemeth at OOo>: + * affixmgr.cxx: fix hun#2970242 (CHECKCOMPOUNDPATTERN only with flags), + the bug reported by Ruud Baars + * tests/2970242.*: test files + + * tests/2970240.*: test files for CHECKCOMPOUNDPATTERN fix (check all + boundaries in compound words, fixed by the previous CHECKCOMPOUNDREP + fix), the bug reported by Ruud Baars + + * win_api/Makefile.cygwin: update + +2010-12-09 Caolán McNamara <cmc at OOo>: + * moz#617953 fix leak + +2010-11-08 Caolán McNamara <cmc at OOo>: + * rhbz#650503 crash in arabic dictionary + +2010-11-05 Caolán McNamara <cmc at OOo>: + * rhbz#648740 don't warn on empty flagvector + +2010-11-03 Caolán McNamara <cmc at OOo>: + * logically we shouldn't need a csconv table in utf-8 mode + +2010-10-27 Németh László <nemeth at OOo>: + * 
hun#3000055 (requested by Ruud Baars) add REP boundary specifiation: + REP ^word$ xxxx + REP ^wordstarting xxxx + REP wordending$ xxxx + + * hun#3008434 (requested by Adrián Chaves Fernández) and + hun#3018929 (requested by Ruud Baars): REP with more than 2 words: + REP morethantwo more_than_two + + * suggestmgr.cxx: fix incomplete suggestion list for capitalized words, + eg. missing Machtstrijd->Machtsstrijd in the Dutch dictionary + (reported by Ruud Bars) + + * tests, man: related updates + +2010-10-12 Caolán McNamara <cmc at OOo>: + * moz#603311 HashMgr::load_tables leaks dict when decode_flags fails + * fix mem leak found with new tests + * hun#3084340 allow underscores in html entity names + +2010-10-07 Németh László <nemeth at OOo>: + * affixmgr.cxx: + - hun#2970239 fix bad suggestion of forbidden compound words + - hun#2999224 fix keepcase feature on compound words (only partial + fix for COMPOUNDRULE based compounding) + - fix checkcompoundrep feature in compound words (check all boundaries, + not only the last one) + Problems reported by Ruud Baars. + + * tests/opentaal_forbiddenword[12]*, tests/opentaal_keepcase*: + new test files for the previous fixes + * tests/checkcompoundrep: extended test file. 
+ +2010-09-05 Caolán McNamara <cmc at OOo>: + * moz#583582 fix double buffer gcc fortify issue + +2010-08-13 Caolán McNamara <cmc at OOo>: + * moz#586671 AffixMgr::parse_convtable leaks pattern/pattern2 if it + can't create both + * moz#586686 tidy up get_xml_list and friends + +2010-08-10 Caolán McNamara <cmc at OOo>: + * hun#3022860 fix remove duplicate code + +2010-07-17 Caolán McNamara <cmc at OOo>: + * remove ununsed get_default_enc and avoid potential misrecognition of + three letter language ids + * normalize encoding names before lookup + +2010-07-05 Caolán McNamara <cmc at OOo>: + * hun#2286060 add Hangul syllables to unicode tables + +2010-06-26 Caolán McNamara <cmc at OOo>: + * moz#571728 keep new[]/delete[] wrappers in sync for embedded in moz + case + +2010-06-13 Caolán McNamara <cmc at OOo>: + * moz#571728 keep new[]/delete[] wrappers in sync for embedded in moz + case + +2010-06-02 Caolán McNamara <cmc at OOo>: + * moz#569611 compile cleanly under win64 + +2010-05-22 Caolán McNamara <cmc at OOo>: + * moz#525581 apply mozilla's current preferred get_current_cs impl + +2010-05-17 Németh László <nemeth at OOo>: + * affixmgr.cxx: fix bad limitation of parenthesized flags at + COMPOUNDRULEs. Windows crash reported by Ruud Baars and Simon Brouwer. 
+ +2010-05-05 Caolán McNamara <cmc at OOo>: + * rhbz#589326 malloc of int that should have been of char** + * hun#2997388 fix ironic misspellings + +2010-04-28 Caolán McNamara <cmc at OOo>: + * moz#550942 get_xml_list doesn't handle failure from get_xml_par + +2010-04-27 Caolán McNamara <cmc at OOo>: + * moz#465612 mozilla-specific code leaks + * moz#430900 phone is dereferenced before oom check + * moz#418348 ckey_utf alloc is used unchecked in SuggestMgr::badcharkey_utf + * CID#1487 pointer "rl" dereferenced before NULL check + * CID#1464 Returned without freeing storage "ptr" + * CID#1459 Avoid duplicate strchr + * CID#1443 Avoid any chance of dereferencing *slst + * CID#1442 Unsafe to have a null morph + * CID#1440 Avoid null filenames + * CID#1302 Dereferencing NULL value "apostrophe" + * CID#1441 Avoid deferencing null ppfx + +2010-04-16 Caolán McNamara <cmc at OOo>: + * hun#2344123 fix U)ncap in utf-8 locale + * fix up hunspell text UI and lines wider than terminal + +2010-04-15 Caolán McNamara <cmc at OOo>: + * hun#2613701 fix small leak in FileMgr::FileMgr + * fix small leak in tools/hunspell + * hun#2871300 avoid crash if def and words are NULL + * hun#2904479 fix length of hzip file + * hun#2986756 mingw build fix + * hun#2986756 fix double-free + * hun#2059896 fix crash in interactive mode without nls + * hun#2917914 add some extra words to the latexparser + * make some structs static + * C-api has duped symbol names + * regenerate gettext/intl with recent version + * hun#2796772 build a .dll under MinGW + * rhbz#502387 allow cross-compiling for MinGW target + * hun#2467643 update .vcproj files to include replist.?xx + * unify visiblity/dll_export support across platforms + * hun#2831289 sizeof(short) typo + * hun#2986756 add -u3 gcc style output + +2010-04-14 Caolán McNamara <cmc at OOo>: + * hun#2813804 fix segfault on hu_HU stemming + +2010-04-13 Caolán McNamara <cmc at OOo>: + * hun#2806689 fix ironic misspellings + * hun#2836240 add Italian 
translations + +2010-04-09 Caolán McNamara <cmc at OOo>: + * fix titchy possible leak in command-line spellchecker + +2010-04-07 Caolán McNamara <cmc at OOo>: + * hun#2973827 apply win64 patch + * hun#2005643 fix broken mystrdup + +2010-03-04 Caolán McNamara <cmc at OOo>: + * ooo#107768 fix crash in long strings in spellml mode + * hun#1999737 add some malloc checks + * hun#1999769 drop old buffer on realloc failure + * hun#2005643 tidy string functions + * hun#2005643 micro-opt + * hun#2006077 free strings on failed dict parse + * hun#2110783 ispell-alike verbose mode implementation + +2010-03-03 Németh László <nemeth at OOo>: + * hunspell/(affixmgr, suggestmgr).cxx: add character sequence + support for MAP suggestion, using parenthesized character groups + in the syntax, eg. MAP ß(ss). + * man/hunspell.4, tests/map*: documentation and test files + +2010-02-25 Németh László <nemeth at OOo>: + * hunspell/hunspell.cxx: add recursion limit for BREAK (fix OOo Issue 106267) + + * hunspell/hunspell.cxx: fix crash in morphological analysis of + capitalized words with ending dashes + + * affixmgr.cxx: fix morphological analysis of long numbers combined with dash, + eg. 45-00000045 (reported by a@freeblog.hu). 
+ +2010-02-23 Caolán McNamara <cmc at OOo>: + * hun#2314461 improve ispell-alike mode + * hun#2784983 improve default language detection + * hun#2812045 fix some compiler warnings + * hun#2910695 survive missing HOME dir + * hun#2934195 fix suggestmgr crash + * hun#2921129 remove unused variables + * hun#2826164 make sure make check uses the in-tree libhunspell + * bump toolchain to support --disable-rpath + * hun#2843984 fix coverity warning + * hun#2843986 fix coverity warning + * hun#2077630 add iconv lib + * make gcc strict-aliasing warning free + * make cppcheck warning free + +2008-11-01 Németh László <nemeth at OOo>: + * replist.*, hunspell.cxx, affixmgr.cxx: new input and output + conversion support, see ICONV and OCONV keywords in the Hunspell(4) + manual page and the test examples. The input/output conversion + problem of syllabic languages reported by Daniel Yacob and + Shewangizaw Gulilat. + - tests/{iconv,oconv}.*: test examples + + * tools/wordforms: word generation script for dictionary developers + (Hunspell version of the unmunch program) + + * hunspell/hunspell.cxx: extended BREAK feature: ^ and $ mean in break + patterns the beginning and end of the word. + - tests/BREAK.*: modified examples. + + * hunspell/hunspell.cxx: set default break at hyphen characters. + The associated problem reported by S Page in Hunspell Bug 2174061. + See Mozilla Bug ID 355178 and OOo Issue 64400, too. + - tests/breakdefault.*: test data + The following definition is equivalent of the default word break: + + BREAK 3 + BREAK - + BREAK ^- + BREAK -$ + + * affixmgr.cxx: SIMPLIFIEDTRIPLE is a new affix file keyword to allow + simplified forms of the compound words with triple repeating letters. + It is useful for Swedish and Norwegian languages. + + * affixmgr.cxx: extend CHECKCOMPOUNDPATTERN to support + alternations of compound words for example by sandhi + feature of Indian and other languages. 
The problem reported + by Kiran Chittella associated with Telugu writing system + (see Telugu example in tests/checkcompoundpattern4.test). + The new optional field of CHECKCOMPOUNDPATTERN definition is the + replacement of the compound boundary defined by the previous fields: + CHECKCOMPOUNDPATTERN ff f ff + means ff|f compound boundary has been replaced by "ff", like in + the (prereform) German Schiffahrt (Schiff+fahrt). + - CHECKCOMPOUNDPATTERN supports also optional flag conditions now: + CHECKCOMPOUNDPATTERN ff/A f/B ff + means that the first word of the compound needs flag "A" and + the second word of the compound needs flag "B" to the operation. + + * tools/hunspell.cxx: add empty lines as separators to the output of + the stemming and morphological analysis. + + * affixmgr.cxx: fix condition checking algorithm. Bad suggestion + generation reported by Mehmet Akin in SF.net Bug 2124186 with help of + Eleonora Goldman. + + * affixmgr,cxx: fix COMPOUNDWORDMAX feature. The problem and its + code details reported by Göran Andersson under SF.net Bug ID 2138001. + + * csutil.cxx: fix bad conditional code for Mozilla compilation. + Patch by Serge Gautherie. The problem reported by Ryan VanderMeulen. + + * hunspell/hunspell.cxx: add missing ngram suggestion for HUHINITCAP + (capitalized mixed case) words. + + * w_char.hxx: use GCC conditions for GCC related code. Patch by + Ryan VanderMeulen. + + * affixmgr.cxx: check morphological description in morphgen() + (fix potential program fault by incomplete morphological + description of affix rules) + + * src/win_api: config.h: switch on warning messages on Windows + + * tools/affixcompress: extended help for -h (use LC_ALL=C sort + for input word list) + + * man/hunspell.4: updated manual: + - new and modified features (SIMPLIFIEDTRIPLE, ICONV, OCONV, + BREAK, CHECKCOMPOUNDPATTERN). + - note about costs of zero affixes, suggested by Olivier Ronez. + + * hunspell/hunspell.cxx: remove deprecated word breaking codes. 
+ +2008-08-15 Németh László <nemeth at OOo>: + * affentry.cxx: add FULLSTRIP option. With FULLSTRIP, affix rules can + strip full words, not only one less characters. Suggested by + Davide Prina and other developers in OOo Issue 80145. + * tests/fullstrip.*: Test data based on Davide Prina's example. + * tools/unmunch.cxx: modified for FULLSTRIP. + + * affixmgr.cxx: COMPOUNDRULE now works with long and numerical flag + types by parenthesized flags. Syntax: (flag)*, (flag)(flag)?(flag)*. + * tests/compoundrule[78].*: tests with parenthesized COMPOUNDRULE + definitions. + + * suggestmgr.cxx: modified badchar*(), forgotchar*() and extrachar*() + 1-character distance suggestion algorithms: search a TRY character + in all position instead of all TRY characters in a character position + (it can give more readable suggestion order, also better suggestions + in the first positions, when TRY characters are sorted by frequency.) + For example, suggestions for "moze": + ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6), + maze, more, mote, ooze, mole etc. (Hunspell 1.2.7). + + * suggestmgr.cxx: extended compound word checking for better COMPOUNDRULE + related suggestions, for example English ordinal numbers: 121323th -> + 121323rd (it needs also a th->rd REP definition). + + * phonet.cxx: cast unsigned char parameter of isdigit() and fix + isalpha by myisalpha() (potential problems in Windows environment). + Reported by Thomas Lange in OOo Issue 92736. + + * hunspell/csutil.*,hunspell/{affentry,affixmgr,hunspell,suggestmgr}.cxx: + fix potential buffer overloading under morphological analysis by the + new mystrcat() function. Reported by Molnár Andor (dolhpy at true + dot hu) in SF.net Bug 2026203. + + * affixmgr.cxx: add recursion limit to defcpd(). Fix OOo Issue 76067: + crash-like deceleration by checking hexadecimal numbers with long FFF + sequence (combinatory explosion by the en_US words "f" and "ff"). + Missing fix reported by Mathias Bauer. 
+ + * affixmgr.cxx: fix the difference in the Unicode and non-Unicode + parts of cpdcase_check(). Bug report by Brett Wilson. + + * filemgr.*, affixmgr.cxx, csutil.*, hashmgr.*: warning messages now + contain line numbers (use --with-warnings configure option for + warning messages). + + * hunspell.cxx: analyze(): fix case conversion of stemming and + morphological analysis of UTF-8 encoded input. Reported by Ferenc Godó. + + * tools/hunspell.cxx: fix LaTeX Unicode support in filter mode. + Reported by Jan Seeger in SF.net Bug 2039990. + + * affixmgr.hxx: 0.5 or in 64 bit environment, 1 MB (virtual) memory + saving using only the requested size for sFlag and pFlag arrays. + Bug report by Brett Wilson. + + * affixmgr.cxx,tools/hunspell.cxx: get_version() returns with full + VERSION affix parameter instead of its first word. Fixes for + Hunspell's header. Some problems with Hunspell header reported in + SF.net Bug 2043080. + +2008-07-15 Németh László <nemeth at OOo>: + * affentry.cxx: fixes of the affix rule matching algorithm (affected + only the sk_SK dictionary from all OpenOffice.org dictionaries): + - fix dot pattern + accented letters matching (in non Unicode encoding) + - word-length conditions work again + * tests/condition.*: extended test for the fix. + + * hashmgr.cxx: load multiword expressions: spaces may be parts + of the dictionary words again (but spaces also work as morphological + field separators: word word2 -> "word word2", word po:noun -> "word"). + * man/hunspell.4: updated manual + + * tools/hunspell.cxx: add iconv character conversion support to + stemming and morphological analysis + + * tools/hunspell.cxx: add /usr/share/myspell/dicts search path for + Ubuntu support + +2008-07-09 Németh László <nemeth at OOo>: + * affentry.cxx: fixes of the affix rule matching algorithm: + - right ASCII character handling in bracket expression; + - fault-tolerant nextchar() for bad rules. 
+ Problem with the en_GB dictionary and nextchar() with a detailed + code analysis reported by John Winters in SF.net Bug ID 2012753. + * tests/condition.*: extended test for the fix. + + * hunspell/hunspell.*, parsers/*, tools/hunspell.cxx: fix compiler + warnings (deprecated const-free char consts) + + * win_api/hunspelldll.*: add hunspell_free_list(), the problem + reported by Laurier Mercer. + +2008-06-30 Török László <torok_laszlo at users dot SF dot net>: + * tests/affixmgr.cxx: fix morphological analysis: strcat() on + an uninitialized char array in suffix_check_morph(). + +2008-06-18 Németh László <nemeth at OOo>: + * src/hunspell/affixmgr.cxx: fix GCC compiler warnings + (comparisons with string literal results in unspecified behaviour). + The problem reported by Ladislav Michnovič. + +2008-06-17 Németh László <nemeth at OOo>: + * src/hunspell/{hunspell.cxx,hunspell.h}: add free_list() to the C and + C++ interface to deallocate suggestion lists. The problem + reported by Laurie Mercer and Christophe Paris. + * csutil.cxx: fix freelist() to deallocate non-NULL list, when n = 0. + * tools/{analyze,example,chmorph,hunspell}.cxx: use free_list(). + + * tools/hunspell.cxx: fix only --with-readline compiling problem. + Reported by Volkov Peter in SF.net Bug 1995842. + + * man/hunspell.3,hunspell.hxx: fix analyze and generate examples in + the manual and comments (using char*** parameter instead of char**). + + * tools/example.cxx: fix suggestion example. + +2008-06-17 Németh László <nemeth at OOo>: + * affentry.cxx: fix the new affix rule matching algorithm of + Hunspell 1.2. Arabic dictionary problem reported by Khaled Hosny + in SF.net Bug ID 1975530. Mohamed Kebdani also sent a + prepared test data. + * tests/{1975530,condition*}: tests for the fix + +2008-06-13 Ingo H. de Boer <idb_winshell at SF.net>: + * src/hunspell/{affixmgr.cxx,hunspell.cxx}: add missing type + cast to strstr() calls for VC8 compatibility. 
+ +2008-06-13 Németh László <nemeth at OOo>: + * suggestmgr.cxx: add also part1-part2 suggestion with dash + for bad part1part2 word forms, suggested by Ruud Baars. + For example, now suggestion of "parttime": "part time" + and "part-time". + NOTE: this feature will work only when the TRY definition + contains "-" or the letter "a". + + * hunspell.cxx: new XML API in spell() and suggest() (see hunspell(3)). + + * src/hunspell/*: fixes for OpenOffice.org build environment. + + * man/{hunspell.3,hzip.1,hunzip.1}: add new manual pages for + Hunspell programming API and dictionary compression and + encryption utilities. + + * src/hunspell/*: handle failed mystrdup() calls and other potential + insufficient memory problems. The problem reported by Elio Voci + in OpenOffice.org Issue 90604 and others. + + * src/tools/affixmgr.cxx: restore original behaviour of get_wordchars + without conditional code. Problem reported by Ingo H. de Boer + in SF.net Bug 1763105. + + * win_api/hunspelldll.h: put_word() renamed to add() in the (old) + Windows DLL API bug reported in SF.net Bug 1943236. Also reported + by Bartkó Zoltán. + + * tools/hunspell.cxx: fix chench() for environments without + native language support (ENABLE_NLS 0 in config.h), + PHP system_exec() bug reported by Michel Weimerskirch in + SF.net Bug 1951087. + + * hunspell.cxx, affixmgr.cxx: remove "result" from the + (result && *result) conditions, when "result" is a static variable. + The problem and a possible solution reported by Ladislav Michnovič. + + * affixmgr.cxx: parse_affix(): print line instead of NULL in + the warning message, when affix class header is bad. + The problem reported by Ladislav Michnovič. + +2008-06-01 Christian Lohmaier <cloph at OOo> + * configure.ac: patch to fix --with-readline, --with-ui logic. + Reported in the SF.net Bug 981395. + +2008-05-04: Volkov Peter <volkov_peter at users sourceforge net> + * configure.ac: fix LibTool 2.22 incompatibility by removing + unused LT_* macros. 
Report and patch in SF.net Bug 1957383. + The problem reported and fixed by Ladislav Michnovič, too. + +2008-04-23: Ladislav Michnovič <lmichnovic at suse cz> + * hunspell.pc.in: fix wrongly set directories. + +2008-04-12 Németh László <nemeth at OOo>: + * src/tools/hunspell.cxx: + - Multilingual spell checking and special dictionary support with -d. + Multilingual spell checking suggested by Khaled Hosny (SF.net + Bug 1834280). Example for the new syntax: + + -d en_US,en_geo,en_med,de_DE,de_med + + en_US and de_DE are base dictionaries, and en_geo, en_med, de_med + are special dictionaries (dictionaries without affix file). + Special dictionaries are optional extension of the base dictionaries. + There is no explicit naming convention for special dictionaries, + only the ".dic" extension: dictionaries without affix file will + be an extension of the preceding base dictionary. First dictionary + in -d parameter must have an affix file (it must be a base + dictionary). + + - new options for debugging, morphological analysis and stemming: + -m: morphological analysis or flag debug mode (without affix + rule data it signs the flag of the affix rules) + -s: stemming mode + -D: show also available dictionaries and search path + (suggested by Aaron Digulla in SF.net Bug 1902133) + + - add missing refresh() to print bad words before the slower suggestion + search in UI (better user experience) + + - fix tabulator problems (reported by ugli-kid-joe AT sf DOT net) + + - fix different encoding of dic and input, and suggestions + + - add per mille sign to LANG hu_HU section. + + - rewrite program messages. Concatenating multiple printfs for + easier translation suggested by András Tímár and Gábor Kelemen. + + * src/hunspell/csutil.cxx: set static encds variable. Patch by + Rene Engerhald. SF.net Bug 1896207 and 1939988. 
+ + * src/hunspell/w_char.hxx,csutil.hxx: reorganizing + w_char typedef and HENTRY_DATA, HENTRY_FIND consts + + * src/hunspell/hunzip.cxx: fopen(): using rb options instead of r (fix + for Windows) + + * src/tools/affixmgr.cxx: restore original behaviour of get_wordchars + in an #ifdef WINSHELL section. Problem reported by Ingo H. de Boer + in SF.net Bug 1763105. + + * src/tools/chmorph.cxx: remove the experimental modifications + + * src/tools/hzip.c: fopen(): using wb options instead of w (fix + for Windows) + + * src/tools/hunzip.cxx: add missing MOZILLA_CLIENT. Reported + by Ryan VanderMeulen. + + * man/*, man/hu/*: updated manual + + * man/hunspell.4: fix formatting problem (missing header) + + * tools/makealias: now works with the extra data fields. + + * phonet.cxx: use HASHSIZE const + + * tests/rep.aff: fix REP count + + * src/win_api/Makefile.cygwin, README: native Windows compilation + in Cygwin environment without cygwin1.dll dependency (see README + for compiling instructions). + +2008-04-08 Roland Smith <rsmith AT xs4all DOT nl>: + * src/parsers/latexparser.cxx: fix PATTERN_LEN for AMD64 and + other platforms with different struct padding (SF.net Bug 1937995). + +2008-04-03 Kelemen Gábor <kelemeng AT gnome DOT hu>: + * po/POTFILES.in: fix path of the source file + + * po/Makevars: add --from-code=UTF-8 gettext option + + * hunspell.cxx: add comments for shortkey translation + +2008-02-04 Flemming Frandsen <flfr AT stibo DOT com> + * src/hunspell.h: fix Windows DLL support + - this patch also reported by Zoltán Bartkó. + +2008-01-30 Mark McClain <marc_mcclain AT users DOT sf DOT net> + * src/hunspell.cxx: stem(): fix function call side effect + for PPC platform (SF.net Bug 1882105). + +2008-01-30 Németh László <nemeth at OOo>: + * hunspell.cxx, csutil.cxx, hunspelldll.c: fix + SF.net Bug 1851246, patch also by Ingo H. de Boer. + + * hunspell.h: fix SF.net Bug 1856572 (C prototype problem), + patch by Mark de Does. 
+ + * hunspell.pc.in: fix SF.net Bug 1857450 wrong prefix, reported + by Mark de Does. + + * hunspell.pc.in: reset numbering scheme: libhunspell-1.2. + Fix SF.net Bug 1857512 reported by Mark de Does, + also by Rene Engelhard. + + * csutil.cxx: patches for ARM platform, signed_chars.dpatch + by Rene Engelhard and arm_structure_alignment.dpatch by + Steinar H. Gunderson <sesse@debian.org> + + * hunzip.*, hzip.c: new hzip compression format + + * tools/affixcompressor: affix compressor utility (similar to + munch, but it generates affix table automatically), works + with million-words dictionaries of agglutinative languages. + + * README: fix problems reported by Pham Ngoc Khanh. + + * csutil.cxx, suggestmgr: Warning-free in OOo builds. + + * hashmgr.*, csutil.*: fix protected memory problems with + stored pointers on several non-x86 platforms by + store_pointer(), get_stored_pointer(). + + * src/tools/hunspell.cxx: fix iconv support on Solaris platform. + + * tests/IJ.good: add missing test file + + * csutil.cxx: fix const char* related errors. Compiling bug + with Visual C++ reported by Ryan VanderMeulen and Ingo H. de Boer. + +2008-01-03 Caolan McNamara <cmc at OO.o>: + * csutil.cxx: SF.net Bug 1863239, notrailingcomma patch and + optimization of get_current_cs(). + +2007-11-01 Németh László <nemeth at OOo>: + * hunspell/*: new feature: morphological generation, + also fix experimental morphological analysis and stemming. 
+ - new API functions and improved API: + - analyze(word): (instead of morph()) morphological analysis + - stem(word): stemming + - stem(list): stemming based on the result of an analysis + - generate(word, word2): morphological generation + - generate(word, list): morphological generation + - add(word): add word to the run-time dictionary (renamed put_word()) + - add_with_affix(word, word2): (renamed put_word_pattern()): + add word to the run-time dictionary with affix flags of the + second parameter: all affixed forms of the user words will be + recognised by the spell checker. Especially useful for + agglutinative languages. + - remove(word): remove word from the run-time dictionary (not + implemented) + - see manual and hunspell/hunspell.hxx header and tests/morph.* + * tests/morph.*: test data, example for morphological analysis, + stemming and generation + + * tools/analyze, tools/chmorph: extended and new demo applications: + - analyze (originally hunmorph): analyses and stems input words, + generates word forms from input word pairs. + - chmorph: morphological transformation filter + + * configure.ac, hunspell/makefile.am: set library version number. + Bug reported by Rene Engelhard. + + * affentry.cxx, affixmgr.cxx: new pattern matching algorithm in + condition checking of affix rules instead of the Dömölki-algorithm: + - Unlimited condition length (instead of max. 8 characters). + - Less memory consumption, especially useful for affix rich languages: + 5.4 MB memory savings with hu_HU dictionary. + - Speed change depends on dictionaries and CPU caches: English spell + checking is 4% faster on Linux words with en_US dictionary, Hungarian + spell checking is 25% slower on most frequent words of Hungarian + Webcorpus. + + * tests/sug.*, sugutf.*: updated test data (use "a" and "lot" + dictionary items instead of "a lot".) + + * src/hunspell/hunspell.cxx: free(csconv) instead of delete csconv. + Report and patch by Sylvain Paschein in Mozilla Issue 398268. 
+ + * suggestmgr.cxx, tools/hunspell.cxx: bad spelling of "misspelled". + Ubuntu Bug #134792, patch by Malcolm Parsons. + + * tests/base_utf.*: use Unicode apostrophe instead of 8-bit one. + + * hunspell.cxx, hashmgr.cxx: add(): use HashMgr::add() + +2007-10-25 Pavel Janík <pjanik at OOo>: + * hunspell/csutil.cxx: Fix type cast warnings on 64bit Linux in + printing of character positions in u8_u16(). OOo issue 82984. + +2007-09-05 Németh László <nemeth at OOo>: + * win_api/Hunspell.vproj, parsers/testparser.cxx,textparser.hxx: + warning fixes and removing unnecessary Windows project file. + Reported by Ingo H. de Boer. + + * hashmgr.*, {affixmgr,suggestmgr}.cxx: optimized data structure + for variable-count fields (only "ph" transliteration field in + this version, see next item). Also less memory consumption: + -13% (0.75 MB) with en_US dictionary, -6% (1 MB) with hu_HU. + + * suggestmgr.cxx: dictionary based phonetic suggestion for special + or foreign pronunciation (see also rule-based PHONE in manual). + Usage: tab separated field in dictionary lines, started with "ph:". + The field contains a phonetic transliteration of the word: + +Marseille ph:maarsayl + * tests/phone.*: test data for dictionary and rule based phonetic + suggestion. + + * hunspell.cxx: fix potential bad memory access in allcap word + capitalization in suggest() (bug of previous version). + + * hunspell.cxx, atypes.hxx: set correct limit for UTF-8 encoded + input words (256 bytes). + + * suggestmgr.cxx: improved REP suggestions with spaces: it works + without dictionary modification. + OOo issue 80147, reported by Davide Prina. + * tests/rep.*: new test data: higher priority for "alot" -> "a lot", + and Italian suggestion "un'alunno" -> "un alunno". + + * affixmgr.cxx: fix Unicode ngram suggestions in expand_rootword(). + (Suggestions with bad affixes.) + Bug reported by Vitaly Piryatinksy <piv dot v dot vitaly at gmail>. + * tests/ngram_utf_fix.*: test based on Vitaly Piryatinksy's data. 
+ + * suggestmgr.cxx: fix twowords() for last UTF-8 multibyte character. + (conditional jump or move depended on uninitialised value). + +2007-08-29 Ingo H. de Boer <idb_winshell at SF.net>: + * win_api/{hunspell,libhunspell, testparser}.vcproj: new project + files for the library and the executables. + + * Hunspell.rc, Hunspell.sln, config.h: updated versions. + Version number problem also reported by András Tímár. + +2007-08-27 Németh László <nemeth at OOo>: + * suggestmgr.hxx: put fixed version. Bug report by Ingo H. de Boer. + + * suggestmgr.cxx: remove variable-length local character array + reported by Ingo H. de Boer. + +2007-08-27 Németh László <nemeth at OOo>: + * suggestmgr.hxx: change bad time_t to clock_t in header, too. + Bug reports or patches by Ingo H. de Boer under SF.net + Bug ID 1781951, János Mohácsi and Gábor Zahemszky, András Tímár, + OMax3 at SF.net under SF.net Bug ID 1781592. + + * phonet.*: change variable-length local character array to + portable fixed size character array. Problem reported by + Ingo H. de Boer under SF.net Bug ID 1781951 and + Ryan VanderMeulen. + + * suggestmgr.cxx: remove debug message (also by + Ingo H. de Boer). + +2007-08-26 Ingo H. de Boer <idb_winshell at SF.net>: + * win_api/Hunspell.vcproj: updated version (with phonet.*) + +2007-08-23 Németh László <nemeth at OOo>: + * phonet.{c,h}xx, suggestmgr.cxx: PHONE parameter: + pronunciation based suggestion using Björn Jacke's original Aspell + phonetic transcription algorithm (http://aspell.net), relicensed + under GPL/LGPL/MPL tri-license with the permission of the author. + Usage: see manual. + + * affixmgr,suggestmgr.cxx: add KEY parameter for keyboard and + input method error related suggestions. + Example: KEY qwertyuiop|asdfghjkl|zxcvbnm + + * man/hunspell.4: description of PHONE and KEY suggestion parameters. 
+ + * suggestmgr.cxx: enhancements for better suggestions: + - Set ngram suggestions for badchar-type errors + and only two word and compound word suggestions, too. + - Separate not compound and compound word + suggestions for MAP suggestion, too. + - Double swap suggestions for short words. + For example: ahev -> have, hwihc -> which. + - Better time limits using clock() instead of time() + (tenths of a second resolution instead of second ones). + - leftcommonsubstring() weight function. + + * htype.hxx, hashmgr.cxx: blen (byte length) and clen (character + length) fields instead of wlen + + * affixmgr.cxx: fix get_syllable() for bad Unicode inputs. + + * tests/suggestiontest/*: test environment for suggestions + +2007-08-07 Martijn Wargers: + * csutil.cxx: fix Mingw build error associated with ToUpper() call. + Report and patch in Mozilla Issue 391447. + +2007-08-07 Robert Longson: + * atypes.cxx: use empty inline function HUNSPELL_WARNING instead of + variadic macros to switch off Hunspell warnings. + Reported by Gavin Sharp in Mozilla Issue 391147. + +2007-08-05 Ginn Chen: + * hashmgr.cxx: Hunspell failed to compile on OpenSolaris (use stdio + instead of cstdio). Report and patch in Mozilla Issue 391040. + +2007-07-25 Németh László <nemeth at OOo>: + * parsers/*.cxx: Hunspell executable recognises and accepts URLs, + e-mail addresses, directory paths, reported by Jeppe Bundsgaard. + * src/tools/hunspell.cxx: --check-url: new option of Hunspell program. + Use --check-url, if you want to check URLs, e-mail addresses and paths. + + * parsers/textparser.cxx: strip colon at end of words for Finnish + and Swedish (colon may be in words in Finnish and Swedish). + Problem reported by Lars Aronsson. + * tests/colons_in_words.*: test data + + * tests/digits_in_words.*: example for using digits in words + (eg. 1-jährig, 112-jährig etc. in German), reported by Lars Aronsson. 
+ + * hashmgr.cxx: Hunspell accepts allcaps forms of mixed case + words of personal dictionaries (+allcaps custom dictionary words with + allcaps affixes). + Sf.net Bug ID 1755272, reported by Ellis Miller. + + * hashmgr.cxx: fix small memory leaks with alias compressed + dictionaries (free flag vectors of affixed personal dictionary words + and flag vectors of hidden capitalized forms of mixed case and + allcaps words). + + * affixmgr.cxx: fix COMPOUNDRULE checking with affixed compounds. + Sf.net Bug ID 1706659, reported by Björn Jacke. Also fixing for + OOo Issue 76067 (crash-like deceleration for hexadecimal numbers + with long FFFFFF sequence using en_US dictionary). + + * tools/hunspell.cxx: add missing return to save_privdic(). + + * man/hunspell.4: add information about affixation of personal words: + "Personal dictionaries are simple word lists, but with optional + word patterns for affixation, separated by a slash: + + foo + Foo/Simpson + + In this example, "foo" and "Foo" are personal words, plus Foo + will be recognised with affixes of Simpson (Foo's etc.)." + +2007-07-18 Németh László <nemeth at OOo>: + * src/win_api/: add missing resource files, reported by Ingo H. de Boer. + +2007-07-16 Németh László <nemeth at OOo>: + * hunspell.cxx: fix dot removing from UTF-8 encoded words in cleanword2() + (Capitalised words with dots, as "Something." were not recognised + using Unicode encoded dictionaries.) + * tests/{base.*,base_utf.*}: extended and new test files for + dot removing and Unicode support. + + * tools/hunspell.cxx: fix Cygwin, OS X compatibility using platform + specifics iconv() header by ICONV_CONST macro of Autoconf. + Sf.net Bug ID 1746030, reported by Mike Tian-Jian Jiang. + Sf.net Bug ID 1753939, reported by Jean-Christophe Helary. + + * tools/hunspell.cxx: fix missing global path setting with -d option. + + * tests/test.sh: fix broken Valgrind checking (missing warnings + with VALGRIND=memcheck make check). 
+ + * csutil.cxx: fix condition in u8_u16() to avoid invalid read + of not null-terminated character arrays (detected by Valgrind + in Hunspell executable: associated with 8-bit character table + conversion in tools/hunspell.cxx). + + * csutil.cxx: free_utf_tbl(): use utf_tbl_count-- instead of utf_tbl--. + Memory leak in Hunspell executable detected by Valgrind. + + * hashmgr.cxx: add missing free_utf_tbl(), memory leak in Hunspell + executable detected by Valgrind. + + * hashmgr.cxx: load_tables(): fix memory error in spec. capitalization. + Use sizeof(unsigned short) instead of bad sizeof(unsigned short*). + Invalid memory read detected by Valgrind. + + * hashmgr.cxx: add_word(): fix memory error in spec. capitalization. + Update also affix array length of capitalized homonyms. Invalid + memory read detected by Valgrind. + + * hunspell.cxx: suggest(): fix invalid memory write and leak. + Bad realloc() and missing free() detected by Valgrind associated + with suggestions for "something.The" type spelling errors. + + * {dictmgr,csutil,hashmgr,suggestmgr}.cxx: check memory allocation. + Sf.net Bug ID 1747507, based on the patch by Jose da Silva. + +2007-07-13 Ingo H. de Boer <idb_winshell at SF.net>: + * atypes.cxx: fix Visual C compatibility: Using + "HUNSPELL_WARNING(a,b,...} {}" macro instead of empty "X(a,b...)". + + * hunspell.cxx: changes for Windows API. + * win_api/Hunspell.*: new resource files + * win_api/hunspelldll.*: set optional Hunspell and Borland spec. codes + Sf.net Bug ID 1753802, patch by Ingo H. de Boer. + See also Sf.net Bug ID 1751406, patch by Mike Tian-Jian Jiang. + +2007-07-09 Caolan McNamara <cmc at OO.o>: + * {hunspell,hashmgr,affentry}.cxx: fix warnings of Coverity program + analyzer. Sf.net Bug ID, 1750219. + +2007-07-06 Németh László <nemeth at OOo>: + * atypes.cxx: warning-free swallowing of conditional warning messages + and their parameters using empty HUNSPELL_WARNING(a,b...) macro. 
+ * {affixmgr,atypes,csutil}.cxx: fix unused variable warnings + using WARNVAR macro for conditionally named variables. + * hashmgr.cxx: fix unused variable warning in add_word() by cond. name + * hunspell.cxx: fix shadowed declaration of captype var. in suggest() + +2006-06-29 Caolan McNamara <cmc at OO.o>: + * hunspell.cxx: patch to fix possible memory leak in analyze() of + experimental morphological analyzer code. Sf.net Bug ID 1745263. + +2007-06-29 Németh László <nemeth at OOo>: +improvements: + * src/hunspell/hunspell.cxx: check bad capitalisation of Dutch letter IJ. + - Sf.net Feature Request ID 1640985, reported by Frank Fesevur. + - Solution: FORBIDDENWORD for capitalised word forms (need + an improved Dutch dictionary with forbidden words: Ijs/*, etc.). + * tests/IJ.*: test data and example. + + * hashmgr.cxx, hunspell.cxx: check capitalization of special word forms + - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG + Sf.net Bug ID 1398550, reported by Dmitri Gabinski. + - allcap words and suffixes: UNICEF's - UNICEF'S + - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA + For Catalan, French and Italian languages. + Reported by Davide Prina in OOo Issue 68568. + * tests/allcaps*: tests for OPENOFFICE.ORG, UNICEF'S capitalization. + * tests/i68568*: tests for SANT'ELIA capitalization. + + * hunspell/hunspell.cxx: suggestion for missing sentence spacing: + something.The -> something. The + + * tools/hunspell.cxx: multiple character encoding support + - -i option: custom input encoding + Sf.net Bug ID 1610866, reported by Thobias Schlemmer. + Sf.net Bug ID 1633413, reported by Dan Kenigsberg. + See also hunspell-1.1.5-encoding.patch of Fedora from Caolan McNamara. + * tests/*.test: add input encodings + + * tools/hunspell.cxx: use locale data for default dictionary names. 
+ Sf.net Bug ID 1731630, report and patch from Bernhard Rosenkraenzer, + See also hunspell-1.1.4-defaultdictfromlang.patch of Fedora Linux + from Caolan McNamara. + + * tools/hunspell.cxx: fix 8-bit tokenization (letters without + casing, like ß or Hebrew characters now are handled well) + + * tools/hunspell.cxx: dictionary search path + - DICPATH environmental variable + - -D option: show directory path of loaded dictionary + - automatic detection of OpenOffice.org directories + +fixes: + * affixmgr.cxx: fault-tolerant patch for REP and other affix + table data problems. Problem with Hunspell and en_GB dictionary + reported by Thomas Lange in OOo Issue 76098 and + Stephan Bergmann in OOo Issue 76100. + Sf.net Bug ID 1698240, reported by Ingo H. de Boer. + + * csutil.cxx: fix mkallcap_utf() for allcaps suggestion in UTF-8. + + * suggestmgr.cxx: fix bad movechar_utf() (missing strlen()). + + * hunspell.cxx: fix bad degree sign detection in Unicode + hu_HU environment. + + * hunspell/hunspell.cxx: free allocated memory of csconv in + ported Mozilla code. + - Mozilla Bugzilla Bug 383564, report and Mozilla MySpell patch + by Andrew Geul. Reported by Ryan VanderMeulen for Hunspell. + + * suggestmgr.cxx: fix minor difference in Unicode suggestion + (ngram suggestion of allcaps words in Unicode). + + * hashmgr.cxx: close file handle after errors. + Sf.net Bug ID 1736286, reported by John Nisly. + + * configure.ac: syntax error (shell variable with spaces). + Sf.net Bug ID 1731625, reported by Bernhard Rosenkraenzer. + + * hunspell.cxx: check_word(): fix bad usage of info pointer. + + * hashmgr.cxx: fix de_DE related bug (accept words with leading dash). + Sf.net Bug ID 1696134, reported by Björn Jacke. + + * suggestmgr.cxx, tests/1695964.*: fix NEEDAFFIX homonym suggestion. + Sf.net Bug ID 1695964, reported by Björn Jacke. + + * tests/1463589*: capitalized ngram suggestion test data for + Sf.net Bug ID 1463589, reported by Frederik Fouvry. 
+ + * csutil.cxx, affixmgr.cxx: fix possible heap error with + multiple instances of utf_tbl. + Sf.net Bug ID 1693875, reported by Ingo H. de Boer. + + * affixmgr.cxx, suggestmgr.cxx, license.hunspell: convert to ASCII. + Locale dependent compiling problems. Sf.net Bug ID 1694379, reported + by Mike Tian-Jian Jiang. OOo Issue 78018 reported by Thomas Lange. + + * tests/test.sh: compatibility issues + - fix Valgrind support (check shared library instead of shell wrapper) + - remove deprecated "tail +2" syntax + - set 8-bit locale for testing (LC_ALL=C) + + * hunspell.hxx: remove license.* and config.h dependencies. + - hunspell-1.1.5-badheader.patch from Caolan McNamara <cmc at OO.o> + +2007-03-21 Németh László <nemeth at OOo>: + * tools/Makefile.am, munch.h, unmunch.h: add missing munch.h and unmunch.h + Reported by Björn Jacke and Khaled Hosny (sf.net Bug ID 1684144) + * hunspell/hunspell.cxx, hunspell.hxx: fix --with-ui compiling error (add get_csconv()) + Reported by Khaled Hosny (sf.net Bug ID 1685010) + +2007-03-19 Németh László <nemeth at OOo>: + * csutil.cxx, hunspell/hunspell.cxx: Unicode non BMP area (>65K character range) support + (except conditional patterns and strip characters of affix rules) + * tests/utf8_nonbmp*: test data + + * src/hunspell/*: add Mozilla patches from David Einstein + - run-time generated 8-bit character tables + - other Mozilla related changes (see Mozilla Bugzilla Bug 319778) + + * csutil.cxx, affixmgr.cxx, hashmgr.cxx: optimized version of IGNORE feature + - IGNORE works with affixes (except strip characters and affix conditions) + * tests/ignore*: test data with latin characters + * tests/ignoreutf*: Unicode test data with Arabic diacritics (Harakat) + + * src/hunspell/suggestmgr.cxx: new edit distance suggestion methods + - capitalization: nasa -> NASA + - long swap: permenant -> permanent + - long mov.: Ghandi -> Gandhi + - double two characters: vacacation -> vacation + * tests/sug.*: test data + + * 
src/hunspell/affixmgr.cxx: space in REP strings (alot -> a lot) + Note: Underline character signs the space in REP strings: REP alot a_lot, and + put the expression with space ("a lot") into the dic file (see tests/sug). + + * hashmgr.cxx, affixmgr.cxx: ignore Unicode byte order mark (BOM sequence) + * tests/utf8_bom*: test data + + * hunspell/*.cxx: OOo Issue 68903 - Make lingucomponent warning-free on wntmsci10 + - fix Hunspell related warning messages on Windows platform (except some assignment + within conditional expressions). Reported and started by Stephan Bergmann. + + * hunspell/affixmgr.cxx: fix OOo Issue 66683 - hunspell dmake debug=x fails + - Reported by Stephan Bergmann. + + * src/hunspell/hunspell.[ch]xx: thread safe API for Hunspell executable + (removing prev*() functions, new spell(word, info, root) function) + + * configure.ac, src/hunspell/*: HUNSPELL_EXPERIMENTAL code + --with-experimental configure option (conditional compiling of morphological analyser + and stemmer tools) + + * configure.ac, src/hunspell/*: conditional Hunspell warning messages + --with-warnings configure option + + * affixmgr.cxx: new, optimized parsing functions + + * affixmgr.cxx: fix homonym handling for German dictionary project, + reported by Björn Jacke (sf.net Bug ID 1592880). + * tests/1592880.*: test data by Björn Jacke + + * src/hunspell/affixmgr.cxx: fix CIRCUMFIX suggestion + Bug reported by Erdal Ronahi. + + * hunspell.cxx: reverse root word output (complex prefixes) + Bug reported by Munzir Taha. 
+ + * tools/hunspell.cxx: fix Emacs compatibility, patch by marot at sf.net + - no % command in PIPE mode (SourceForge BugTracker 1595607) + - fix HUNSPELL_VERSION string + + * suggestmgr.[hc]xx: rename check() functions to checkword() (OOo Issue 68296) + adopt MySpell patch by Bryan Petty (tierra at ooo) for Hunspell source + + * csutil.cxx, munch.c, unmunch.c: adopt relevant parts of the MinGW patch + (OOo Issue 42504) by tonal at ooo + + * affixmgr.cxx: remove double candidate_check() call, reported by Bram Moolenaar + + * tests/test.sh: add LC_ALL="C" environment. Locale dependency of make check + reported by Gentoo project. + + * src/tools/hunspell.cxx: UTF-8 highlighting fix for console UI + (not solved: breaking long UTF-8 lines) + + * src/tools/unmunch.c: fix bad generation if strip is shorter than condition, + reported by Davide Prina + * src/tools/unmunch.h: increase 5000 -> 500000 + + * src/tools/hunspell.cxx: fix memory error in suggestion (uninitialized parameter), + Bug also reported by Björn Jacke in SourceForge Bug 1469957 + + * csutil.cxx, affixmgr.cxx: fix Caolan McNamara's patch for non OOo environment + +2006-11-11 Caolan McNamara <cmc at OO.o>: + * csutil.cxx, affixmgr.cxx: UTF-8 table patch (OOo Issue 71449) + Description: memory optimization (OOo doesn't use the large UTF-8 table). + + * Makefile.am: shared library patch (Sourceforge ID 1610756) + + * hunspell.h, hunspell.cxx: C API patch (Sourceforge ID 1616353) + + * hunspell.pc: pkgconfig patch (Sourceforge ID 1639128) + +2006-10-17 Ryan Jones <at Mozilla Bugzilla>: + * affixmgr.cxx: missing fclose(affixlst) calls + Reported by <gavins at ooo> in OOo Issue 70408 + +2007-07-11 Taha Zerrouki <taha at gawab>: + * affixmgr.cxx, hunspell.cxx, hashmgr.cxx, csutil.cxx: IGNORE feature to remove + optional Arabic and other characters from input and dictionary words. 
+ * src/hunspell/langnum.hxx: add Arabic language number, lang_ar=96 + * tests/ignore.*: test data + +2006-05-28 Miha Vrhovnik <mvrhov at users.sourceforge>: + * src/win_api/*: C API for Windows DLLs + - also Delphi text editor example (see on Hunspell Sourceforge page) + +2006-05-18 Kevin F. Quinn <kevquinn at gentoo>: + * utf_info.cxx: struct -> static struct + Shared library patch also developed by Gentoo developers (Hanno Meyer-Thurow, + Diego Pettenò, Kevin F. Quinn) + +2006-02-02 Németh László <nemethl@gyorsposta.hu>: + * src/hunspell/hunspell.cxx: suggest(): replace "fooBar" -> "foo bar" suggestions + with "fooBar" ->"foo Bar" (missing spaces are typical OCR bugs). + Bug reported by stowrob at OOo in Issue 58202. + * src/hunspell/suggestmgr.cxx: twowords(): permit 1-character words. + (restore MySpell's original behavior). Here: "aNew" -> "a New". + * tests/i58202.*: test data + + * src/parsers/textparser.cxx: fix Unicode tokenization in is_wordchar() + (extra word characters (WORDCHARS) didn't work on big-endian platforms). + + * src/hunspell/{csutil,affixmgr}.cxx: inline isSubset(), isRevSubset(): + little speed optimization for languages with rich morphology. + + * src/tools/hunspell.cxx: fix bad --with-ui and --with-readline compiling + when (N)curses is missing. Reported by Daniel Naber. + +2006-01-19 Tor Lillqvist <tml@novell.com> + * src/hunspell/csutil.cxx: mystrsep(): fix locale-dependent isspace() tokenization + +2006-01-06 András Tímár <timar@fsf.hu> + * src/hunspell/{hashmgr.hxx,hunspell.cxx}: fix Visual C++ compiling errors + +2006-01-05 Németh László <nemethl@gyorsposta.hu>: + * COPYING: set GPL/LGPL/MPL tri-license for Mozilla integration. + Rationale: Mozilla source code contains an old MySpell version + with GPL/LGPL/MPL tri-license. (MPL license is a copyleft license, similar + to the LGPL, but it acts on file level.) 
+ * COPYING.LGPL: GNU Lesser General Public License 2.1 (LGPL) + * COPYING.MPL: Mozilla Public License 1.1 (MPL) + * license.hunspell, src/hunspell/license.hunspell: GPL/LGPL/MPL tri-license + + * src/hunspell/{affixmgr,hashmgr}.*: AF, AM alias definitions in affix file: + compression of flag sets and morphological descriptions (see manual, + and tests/alias* test files). + Rationale: Alias compression is also good for loading time and memory + efficiency, not only smaller resources. + * src/tools/makealias: alias compression utility + (usage: ./makealias file.dic file.aff) + * tests/alias{,2,3}: AF, AM tests + * man/hunspell.4: add AF, AM documentation + * src/hunspell/affentry.cxx, atypes.hxx: add new opts bits (aeALIASM, aeALIASF) + + * tools/hunspell, src/parser/*, src/hunspell/*: Hunspell program + tokenizes Unicode texts (only with UTF-8 encoded dictionaries). + Missing Unicode tokenization reported by Björn Jacke, Egmont Koblinger, + Jess Body and others. + Note: Curses interactive interface hasn't worked perfectly yet. + * tests/*.tests: remove -1 parameters of Hunspell + * tests/*.{good,wrong}: remove tabulators + + * src/hunspell/{hunspell,affixmgr}.cxx: BREAK option: break words at + specified break points and checking word parts separately (see manual). + Note: COMPOUNDRULE is better (or will be better) for handling dashes and + other compound joining characters or character strings. Use BREAK, if you + want to check words with dashes or other joining characters and there is no time + or possibility to describe precise compound rules with COMPOUNDRULE. + * tests/break.*: BREAK example. + + * src/hunspell/{affixmgr,hunspell}.cxx: add CHECKSHARPS declaration instead + of LANG de_DE definitions to handle German sharp s in both spelling and + suggestion. + * src/hunspell/hunspell.cxx: With CHECKSHARPS, uppercase words are valid + with both lower sharp s (it is optional for names in German legal texts) + and SS (MÜßIG, MÜSSIG). 
Missing lower sharp s form reported by Björn Jacke. + * src/hunspell/hunspell.cxx: KEEPCASE flag on a sharp s word has a special + meaning with CHECKSHARPS declaration: KEEPCASE permits capitalisation and SS upper + casing of a sharp s word (Müßig and MÜSSIG), but forbids the upper cased form + with lower sharp s character(s): *MÜßIG. + * tests/germancompounding*: add CHECKSHARPS, remove LANG + * tests/checksharps*: add CHECKSHARPS and KEEPCASE, remove LANG + + * src/hunspell/hunspell.cxx: improved suggestions: + - suggestions for pressed Caps Lock problems: macARONI -> macaroni + - suggestions for long shift problems: MAcaroni -> Macaroni, macaroni + - suggestions for KEEPCASE words: KG -> kg + * src/hunspell/csutil.cxx: fix mystrrep() function: + - suggestions for lower sharp s in uppercased words: MÜßIG -> MÜSSIG + * tests/checksharps{,utf}.sug: add tests for mystrrep() fix + + * src/hunspell/hashmgr.cxx: Now dictionary words can contain slashes + with the "\/" syntax. Problem reported by Frederik Fouvry. + + * src/hunspell/hunspell.cxx: fix bad duplicate filter in suggest(). + (Suggesting some capitalised compound words caused program crash + with Hungarian dictionary, OOo Issue 59055). + + * src/hunspell/affixmgr.cxx: fix bad defcpd_check() call in compound_check(). + (Overlapping new COMPOUNDRULE and old compounding methods caused program + crash at suggestion.) + + * src/hunspell/affixmgr.{cxx,hxx}: check affix flag duplication at affix classes. + Suggested by Daniel Naber. + + * src/hunspell/affentry.cxx: remove unused variable declarations (OOo i58338). + Compiler warnings reported by András Tímár and Martin Hollmichel. 
+ + * src/hunspell/hunspell.cxx: morph(): not analyse bad mixed uppercased forms + (fix Arabic morphological analysis with Buckwalter's Arabic transliteration) + + * src/hunspell/affentry.{cxx,hxx}, atypes.hxx: little memory optimization + in affentry: + - using unsigned char fields instead of short (stripl, appndl, numconds) + - rename xpflg field to opts + - removing utf8 field, use aeUTF8 bit of opts field + + * configure.ac: set tests/maputf.test to XFAILED on ARM platform. + Fail reported by Rene Engelhard. + + * configure.ac: link Ncursesw library, if exists. + + * BUGS: add BUGS file + + * tests/complexprefixes2.*: test for morphological analysis with COMPLEXPREFIXES + + * src/hunspell/affixmgr.cxx: use "COMPOUNDRULE" instead of + "COMPOUND". The new name suggested by Bram Moolenaar. + * tests/compoundrule*: modified and renamed compound.* test files + + * man/hunspell.4: AF, AM, BREAK, CHECKSHARPS, COMPOUNDRULE, KEEPCASE. + - also new addition to the documentation: + Header of the dictionary file defines approximate dictionary size: + ``A dictionary file (*.dic) contains a list of words, one per line. + The first line of the dictionaries (except personal dictionaries) + contains the _approximate_ word count (for optimal hash memory size).'' + Asked by Frederik Fouvry. + + One-character replacements in REP definitions: ``It's very useful to + define replacements for the most typical one-character mistakes, too: + with REP you can add higher priority to a subset of the TRY suggestions + (suggestion list begins with the REP suggestions).'' + +2005-11-11 Németh László <nemethl@gyorsposta.hu>: + * src/hunspell/affixmgr.*: fix Unicode MAP errors (sorted only n-1 + characters instead of n ones in UTF-16 MAP character lists). + Bug reported by Rene Engelhard. + + * src/hunspell/affixmgr.*: fix infinite COMPOUND matching (default char + type is unsigned on PowerPC, s390 and ARM platforms and it will never + be negative). Bug reported by Rene Engelhard. 
+ + * src/hunspell/{affixmgr,suggestmgr}.cxx: fix bad ONLYINCOMPOUND + word suggestions. + * tests/onlyincompound.sug: empty test file to check this fix. + Bug reported by Björn Jacke. + + * src/hunspell/affixmgr.cxx: fix backtracking in COMPOUND pattern matching. + * tests/compound6.*: test files to check this fix. + + * csutil.cxx: set bigger range types in flag_qsort() and flag_bsearch(). + + * affixmgr.hxx: set better type for cont_classes[] Boolean data (short -> char) + + * configure.ac, tests/automake.am: set platform specific XFAIL test + (flagutf8.test on ARM platform) + +2005-11-09 Németh László <nemethl@gyorsposta.hu>: +improvements: + * src/hunspell/affixmgr.*: new and improved affix file parameters: + + - COMPOUND definitions: compound patterns with regexp-like matching. + See manual and test files: tests/compound*.* + Suggested by Bram Moolenaar. + Also useful for simple word-level lexical scanning, for example + analysing numbers or words with numbers (OOo Issue #53643): + http://qa.openoffice.org/issues/show_bug.cgi?id=53643 + Examples: tests/compound{4,5}.*. + + - NOSUGGEST flag: words signed with NOSUGGEST flag are not suggested. + Proposed flag for vulgar and obscene words (OOo Issue #55498). + Example: tests/nosuggest.*. + Problem reported by bobharvey at OOo: + http://qa.openoffice.org/issues/show_bug.cgi?id=55498 + + - KEEPCASE flag: Forbid capitalized and uppercased forms of words + signed with KEEPCASE flags. Useful for special orthographies + (measurements and currency often keep their case in uppercased + texts) and other writing systems (eg. keeping lower case of IPA + characters). + + - CHECKCOMPOUNDCASE: Forbid upper case characters at word bound in compounds. + Examples: tests/checkcompoundcase* and tests/germancompounding.* + + - FLAG UTF-8: New flag type: Unicode character encoded with UTF-8. + Example: tests/flagutf8.*. 
+ Rationale: Unicode character type can be more readable + (in a Unicode text editor) than `long' or `num' flag type. + +bug fixes: + * src/hunspell/hunspell.cxx: accept numbers and numbers with separators (i53643) + Bug reported by skelet at OOo: + http://qa.openoffice.org/issues/show_bug.cgi?id=53643 + + * src/hunspell/csutil.cxx: fix casing data in ISO 8859-13 character table. + + * src/hunspell/csutil.cxx: add ISO-8859-15 character encoding (i54980) + Rationale: ISO-8859-15 is the default encoding of the French OpenOffice.org + dictionary. ISO-8859-15 is a modified version of ISO-8859-1 + (latin-1) character encoding with French œ ligatures and euro + symbol. Problem reported by cbrunet at OOo in OOo Issue 54980: + http://qa.openoffice.org/issues/show_bug.cgi?id=54980 + + * src/hunspell/affixmgr.cxx: fix zero-byte malloc after a bad affix header. + Patch by Harri Pitkänen. + + * src/hunspell/suggestmgr.cxx: fix bad NEEDAFFIX word suggestion + in ngram suggestions. Reported by Daniel Naber and Friedel Wolff. + + * src/hunspell/hashmgr.cxx: fix bad white space checking in affix files. + src/hunspell/{csutil,affixmgr}.cxx: add other white space separators. + Problems with tabulators reported by Frederik Fouvry. + + * src/hunspell/*: replace system-dependent <license.*> #include + parameters with quoted ones. Problem reported by Dafydd Jones. + + * src/hunspell/hunspell.cxx: fix missing morphological analysis of dot(s) + Reported by Trón Viktor. + +changes: + * src/hunspell/affixmgr.cxx: rename PSEUDOROOT to NEEDAFFIX. + Suggested by Bram Moolenaar. + + * src/hunspell/suggestmgr.hxx: Increase default maximum of + ngram suggestions (3->5). Suggested by Kevin Hendricks. + + * src/hunspell/htypes.hxx: Increase MAXDELEN for long affix flags. + + * src/hunspell/suggestmgr.cxx: modify (perhaps fix) Unicode map suggestion. + tests/maputf test fail on ARM platform reported by Rene Engelhard. 
+ + * src/hunspell/{affentry.cxx,atypes.hxx}: remove [PREFIX] and + MISSING_DESCRIPTION messages from morphological analysis. + Problems reported by Trón Viktor. + + * tests/germancompounding.{aff,good}: Add "Computer-Arbeit" test word. + Suggested by Daniel Naber. + + * doc/man/hunspell.4: Proof-reading patch by Goldman Eleonóra. + + * doc/man/hunspell.4: Fix bad affix example (replace `move' with `work'). + Bug reported by Frederik Fouvry. + + * tests/*: new test files: + affixes.*: simple affix compression example from Hunspell 4 manual page + checkcompoundcase.*, checkcompoundcase2.*, checkcompoundcaseutf.* + compound.*, compound2.*, compound3.*, compound4.*, compound5.* + compoundflag.* (former compound.*) + flagutf8.*: test for FLAG UTF-8 + germancompounding.*: simplification with CHECKCOMPOUNDCASE. + germancompoundingold.* (former germancompounding.*) + i53643.*: check numbers with separators + i54980.*: ISO-8859-15 test + keepcase.*: test for KEEPCASE + needaffix*.* (former pseudoroot*.* tests) + nosuggest.*: test for NOSUGGEST + +2005-09-19 Németh László <nemethl@gyorsposta.hu>: + * src/hunspell/suggestmgr.cxx: improved ngram suggestion: + - detect non-neighboring swapped characters (pernament -> permanent) + Rationale: ngram method has a significant error with non-neighboring + swapped characters, especially when the swap is in the middle of the word. + - suggest uppercase forms (unesco -> UNESCO, siggraph's -> SIGGRAPH's) + - suggest only ngram swap character and uppercase form, if they exist. + Rationale: swap character and casing equivalence give much better + suggestions than any other (weighted) ngram suggestions. + - add uppercase suggestion (PERMENANT -> PERMANENT) + + * src/hunspell/*: complete comparison with MySpell 3.2 (in OOo beta 2): + - affixmgr.cxx: add missing numrep initialization + - hashmgr.cxx: add_word(): don't allocate temporary records + - hunspell.cxx: in suggest(): + - check capitalized words first (better sug.
order for proper names), + - check pSMgr->suggest() return value + - make the pSMgr->suggest() call non-optional in HUHCAP + - csutil.cxx: fix bad KOI8-U -> koi8r_tbl reference in enc_entry encds + - csutil.cxx: fix casing data in ISO 8859-2, Windows 1251 and KOI8-U + encoding tables. Bug reported by Dmitri Gabinski. + + * src/hunspell/affixmgr.*: improved compound word and other features + - generalize hu_HU specific compound word features with new affix file + parameters, suggested by Bram Moolenaar: + - CHECKCOMPOUNDDUP: forbid word duplication in compounds (e.g. foo|foo) + - CHECKCOMPOUNDTRIPLE: forbid triple letters in compounds (e.g. foo|obar) + - CHECKCOMPOUNDPATTERN: forbid patterns at word boundaries in compounds + - CHECKCOMPOUNDREP: using REP replacement table, forbid presumably bad + compounds (useful for languages with unlimited number of compounds) + - ONLYINCOMPOUND flag works also with words (see tests/onlyincompound.*) + Suggested by Daniel Naber, Björn Jacke, Trón Viktor & Bram Moolenaar. + - PSEUDOROOT works also with prefixes and prefix + suffix combinations + (see tests/pseudoroot5.*). Suggested by Trón Viktor. + - man/hunspell.4: updated man page + + * src/hunspell/affixmgr.*: fix incomplete prefix handling with twofold + suffixes (delete unnecessary contclasses[] conditions in + prefix_check_twosfx() and prefix_check_twosfx_morph()). + Bug reported by Trón Viktor. + + * src/hunspell/affixmgr.*: complete also *_morph() functions with + conditions of new Hunspell features (circumfix, pseudoroot etc.). + + * src/hunspell/suggestmgr.cxx: + - fix missing suggestions for words with crossed prefix and suffix + - fix redundant non-compound word checking + - fix lost suggestions problem. Bug reported by Dmitri Gabinski. + + * src/hunspell/dictmgr.*: + - add new dictionary manager for Hunspell UNO module + Problems with eo_ANY Esperanto locale reported by Dmitri Gabinski.
+ + * src/hunspell/*: use precise constant sizes for 8-bit and 16-bit character + arrays with MAXWORDUTF8LEN and MAXSWUTF8L macros. + + * src/hunspell/affixmgr.cxx: fix bad MAXNGRAMSUGS parameter handling + + * src/hunspell/affixmgr.cxx, src/tools/{un}munch.*: fix GCC 4.0 warnings + on fgets(), reported by Dvornik László + + * po/hu.po: improved translation by Dvornik László + + * tests/test.sh: improved test environment + - add suggestion testing (see tests/*.sug) + - add memory debugging environment, based on the excellent Valgrind debugger. + Usage on Linux and experimental platforms of Valgrind: + VALGRIND=memcheck make check + - rename test_hunmorph to test.sh + + * tests/*: new tests: + - base.*: base example based on MySpell's checkme.lst. + - map{,utf}.*, rep{,utf}: MAP and REP suggestion examples + - tests on new CHECKCOMPOUND, ONLYINCOMPOUND and PSEUDOROOT features + - i54633.*: capitalized suggestion test for Issue 54633 from OOo's Issuezilla + - i35725.*: improved ngram suggestion test for Issue 35725 + +2005-08-26 Németh László <nemethl@gyorsposta.hu>: +improvements: + + * src/hunspell/suggestmgr.cxx: + Unicode support in related character map suggestion + + * src/hunspell/suggestmgr.cxx: Unicode support in ngram suggestion + + * src/hunspell/{suggestmgr,affixmgr,hunspell}.cxx: improve ngram suggestion. + Fix http://qa.openoffice.org/issues/show_bug.cgi?id=35725. See release + notes for examples. This problem was reported by beccablain at OOo. + - ngram suggestions are now case-insensitive (see `Permenant' bug in Issuezilla) + - weight ngram suggestions (with the longest common subsequence algorithm, + also considering lengths of bad word and suggestion, identical first + letters and almost completely identical character positions) + - set strict affix congruency in expand_rootword(). Now ngram suggestions + are good for languages with rich morphology and also better for English.
+ Rationale: affixed forms of the first ngram suggestion + very often suppress the second and subsequent root word suggestions. But + faults in affixes are less common, and can be fixed without suggestions. + We must prefer the more informative second and subsequent root word + suggestions instead of the suggestions for bad affixes. + - a better suggestion may not be a substring of a worse suggestion + Rationale: Suggesting affixed forms of a root word is + unnecessary when the root word has a better weighted ngram value. + (Checking substrings is a good approximation for this refinement.) + - fewer ngram suggestions (default maximum is 3 instead of 10) + Rationale: Users need a big extra effort to check a lot of bad ngram + suggestions, nine times out of ten unnecessarily. It is very + distracting, because ngram suggestions could be very different. + Usually MySpell and Hunspell give one or two suggestions with + the old suggestion algorithms (maximum is 15), but the ngram algorithm + often gives the maximum number of suggestions. With strict affix congruency + and other refinements, the good suggestion is usually among the + first three elements. + - new affix parameter: MAXNGRAMSUG + + * src/hunspell/*: support agglutinative languages with rich prefix + morphology or with right-to-left writing system (for example, Turkic + and Austronesian languages with (modified) Arabic scripts). + - new affix parameter: COMPLEXPREFIXES + Set twofold prefix stripping (but single suffix stripping) + * src/hunspell/affixmgr.cxx: + - speed up prefix loading with tree sorting algorithm. + * tests/complexprefixes.*, tests/complexprefixesutf.*: + Coptic example posted by Moheb Mekhaiel + + * src/hunspell/hashmgr.cxx: check size attribute in dic file + suggested by Daniel Naber + Rationale: With a missing size attribute Hunspell allocates a too small and + slower hash memory, and can lose the first dictionary word.
+ + * src/hunspell/affixmgr.cxx: check stripping characters and condition + compatibility in affix rules (bugs detected in cs_CZ, es_ES, es_NEW, + es_MX, lt_LT, nn_NO, pt_PT, ro_RO and sk_SK dictionaries). See release + notes of Hunspell 1.0.9 in NEWS. + + * src/hunspell/affixmgr.cxx: check unnecessary fields in affix rules + (bugs detected in ro_RO and sv_SE dictionaries). See release notes. + + * src/hunspell/affixmgr.cxx: remove redundant condition checking + in affix rules with stripping characters (redundancy in OpenOffice.org + dictionaries reported by Eleonóra Goldman) + Rationale: this is a small optimization, but it was excellent for + detecting bad ngram affixation with bad or weak affix conditions. + + * tests/germancompounding.aff: improve compound definition + - use dash prefix instead of language-specific tokenizer + Rationale: Using a uniform approach is the right way to check and analyze + compound words. Language-specific word breaking is deprecated; it needs + sophisticated grammar checking for word-like word pairs + (for example in Hungarian there is a substandard, but accepted + syntax with dash for word pairs: cats, dogs -> kutyák-macskák (like + cats/dogs in English)). + + * test Hunspell with 54 OpenOffice.org dictionaries: see release notes + +bug fixes: + + * src/hunspell/suggestmgr.*: add time limit to exponential + algorithm of the related character map suggestion + Rationale: a long word in agglutinative languages or a special pattern + (for example a horizontal rule) made of map characters can `crash' the + spell checker. + + * src/hunspell/affentry.cxx: add() functions: fix bad word generation + checking stripping characters (see similar bug in unmunch) + + * src/hunspell/affixmgr.cxx: parse_file(): fix unconditional getNext() + call for ~AffixMgr() when affix file is corrupt. + + * src/hunspell/affixmgr.*: AffixMgr(), parse_cpdsyllable(): fix missing + string duplications for ~AffixMgr() when affix file is corrupt.
+ + * src/hunspell/affixmgr.*: parse_affix(): fix fprintf() call when affix + file is corrupt. Bug reported by Daniel Naber. + + * suggestmgr.cxx: replace single usage of 'strdup' with 'mystrdup' + patch by Chris Halls (debian.org) + + * src/hunspell/makefile.mk: add makefile.mk for compiling in OpenOffice.org + See README in Hunspell UNO module. + Problems with separated compiling reported by Rene Engelhard + + * src/hunspell/hunspell.cxx: fix pseudoroot support + - search for a non-pseudoroot homonym in check() + * tests/pseudoroot4.*: test this fix + + * src/tools/unmunch.c: fix bad word generation when conditions + are shorter or incompatible with stripping characters in affix rules + + * src/tools/unmunch.c: fix mychomp() for de_AT.dic and other dic files + without a final newline character. + +other changes: + * src/hunspell/suggestmgr.*: erase ACCENT suggestion + Rationale: ACCENT suggestion was the same as Kevin Hendricks's map + suggestion algorithm, but with a worse interface in affix file. + + * src/hunspell/suggestmgr.*: combine cycle number limit + in badchar() and forgotchar() with a time limit. + + * src/hunspell/affixmgr.*: remove NOMAPSUGS affix parameter + + * src/hunspell/{suggestmgr,hunspell}.*: strip periods from + suggestions (restore MySpell's original behaviour) + Rationale: OpenOffice.org has an automatic period handling mechanism + and suggestions look better without periods. + - new affix file parameter: SUGSWITHDOTS + Add period(s) to suggestions, if input word terminates in period(s). + (Not needed for OpenOffice.org dictionaries.) + + * tests/germancompounding.aff: improve bad German affix in affix example + (computeren->computern). Suggested by Daniel Naber.
+ + * src/tools/example.cxx: add MySpell's example + + * src/tools/munch.cxx: add MySpell's munch + + * man{,/hu}/hunspell.4: refresh manual pages + +2005-08-01 Németh László <nemethl@gyorsposta.hu>: + * add missing MySpell files and features: + - add MySpell license.readme, README and CONTRIBUTORS ({license,README,AUTHORS}.myspell) + - add MySpell unmunch program (src/tools/unmunch.c) + - add licenses to source (src/hunspell/license.{myspell,hunspell}) + - port MAP suggestion (with imperfect UTF-8 support) + - add NOSPLITSUGS affix parameter + - add NOMAPSUGS affix parameter + + * src/man/man.4: MAP, COMPOUNDPERMITFLAG, NOSPLITSUGS, NOMAPSUGS + + * src/hunspell/aff{entry,ixmgr}.cxx: + - improve compound word support + - new affix parameter: COMPOUNDPERMITFLAG (see manual) + * src/tests/compoundaffix{,2}.*: examples for COMPOUNDPERMITFLAG + * src/tests/germancompounding.*: new solution for German compounding + Problems with German compounding reported by Daniel Naber + + * src/hunspell/hunspell.cxx: fix German uppercase word spelling + with the spellsharps() recursive algorithm. + Default recursion depth is 5 (MAXSHARPS). + * src/tests/germansharps*: extended German sharp s tests + + * src/tools/hunspell.cxx: fix fatal memory bug in non-interactive + subshells without HOME environment variable + Bug detected with PHP by András Izsók.
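Illustration (not taken from the Hunspell sources): the entry above does not say how the missing HOME variable caused the memory bug. A common cause, assumed here, is using the result of getenv("HOME") without checking it for a null pointer. The sketch below shows one defensive pattern under that assumption. The helper name user_file_path, the file name personal.dic and the fallback directory are all hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, for illustration only: builds the path of a per-user
// file. `home` is expected to be the raw result of std::getenv("HOME"), which
// is a null pointer when the variable is not set (for example in some
// non-interactive subshells). A null or empty value falls back to
// `fallback_dir` instead of being dereferenced.
std::string user_file_path(const char* home,
                           const std::string& name,
                           const std::string& fallback_dir) {
    const std::string dir = (home != nullptr && *home != '\0')
                                ? std::string(home)
                                : fallback_dir;
    return dir + "/" + name;
}

// Typical call site: pass the getenv() result straight through.
inline std::string default_personal_dictionary() {
    return user_file_path(std::getenv("HOME"), "personal.dic", ".");
}
```

Keeping the null check in a single helper means every caller gets the same fallback behaviour, whether or not HOME is set.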
+ +2005-07-22 Németh László <nemethl@gyorsposta.hu>: + * src/hunspell/csutil.hxx: utf16_u8() + - fix 3-byte UTF-8 character conversion + +2005-07-21 Németh László <nemethl@gyorsposta.hu>: + * src/hunspell/csutil.hxx: hunspell_version() for OOo UNO module + +2005-07-19 Németh László <nemethl@gyorsposta.hu>: + * renaming: + - src/morphbase -> src/hunspell + - src/hunspell, src/hunmorph -> src/tools + - src/huntokens -> src/parsers + + * src/tools/hunstem.cxx: add stemmer example + +2005-07-18 Németh László <nemethl@gyorsposta.hu>: + * configure.ac: --with-ui, --with-readline configure options + * src/hunspell/hunspell.cxx: fix conditional compiling + + * src/hunspell/hunspell.cxx: set HunSPELL.bak temporary file + in the same directory as the checked file. + + * src/morphbase/morphbase.cxx: + + - handling German sharp s (ß) + + - fix (temporarily) analyize() + + * tests: a lot of new tests + + * po/, intl/, m4/: add gettext from GNU hello + + * po/hu.po: add Hungarian translation + + * doc/, man/: rename doc to man + +2005-07-04 Németh László <nemethl@gyorsposta.hu>: + * src/morphbase/hashmgr.cxx: set FLAG attribute instead of FLAG_NUM and FLAG_LONG + + * doc/hunspell.4: manual in English + +2005-06-30 Németh László <nemethl@gyorsposta.hu>: + * src/morphbase/csutil.cxx: add character tables from csutil.cxx of OOo 1.1.4 + + * src/morphbase/affentry.cxx: fix Unicode condition checking + + * tests/{,utf}compound.*: tests compounding + +2005-06-27 Németh László <nemethl@gyorsposta.hu>: + * src/morphbase/*: fix Unicode compound handling + +2005-06-23 Halácsy Péter: + * src/hunmorph/hunmorph.cxx: delete spelling error message and suggest_auto() call + +2005-06-21 Németh László <nemethl@gyorsposta.hu>: + * src/morphbase: Unicode support + * tests/utf8.*: SET UTF-8 test + + * src/morphbase: checking and fixing with Valgrind + Memory handling error reported by Ferenc Szidarovszky + +2005-05-26 Németh László <nemethl@gyorsposta.hu>: + * suggestmgr.cxx: fix stemming + * 
AUTHORS, COPYING, ChangeLog: set CC-LGPL free software license + +2004-05-25 Varga Dániel <daniel@all.hu> + * src/stemtool: new subproject + +2005-05-25 Halácsy Péter <peter@halacsy.com> + * AUTHORS, COPYING: set CC Attribution license + +2004-05-23 Varga Dániel <daniel@all.hu> + * src: - modifications for compiling with Visual C++ + + * src/hunmorph/csutil.cxx: correcting header of flag_qsort(), + * src/hunmorph/*: correct csutil include + +2005-05-19 Németh László <nemethl@gyorsposta.hu> + * csutil.cxx: fix loop condition in lineuniq() + bug reported by Viktor Nagy (nagyv nyelvtud hu). + + * morphbase.cxx: handle PSEUDOROOT with zero affixes + bug reported by Viktor Nagy (nagyv nyelvtud hu). + * tests/zeroaffix.*: add zeroaffix tests + +2005-04-09 Németh László <nemethl@gyorsposta.hu> + * config.h.in: reset with autoheader + + * src/hunspell/hunspell.cxx: set version + +2005-04-06 Németh László <nemethl@gyorsposta.hu> + * tests: tests + + * src/morphbase: + New optional parameters in affix file: + - PSEUDOROOT: for forbidding a root whose suffixed forms are not forbidden. + - COMPOUNDWORDMAX: max. words in compounds (default is no limit) + - COMPOUNDROOT: signs compounds in dictionary for handling special compound rules + - remove COMPOUNDWORD, ONLYROOT + +2005-03-21 Németh László <nemethl@gyorsposta.hu> + * src/morphbase/*: + - 2-byte flags, FLAG_NUM, FLAG_LONG + - CIRCUMFIX: signed suffixes and prefixes can only occur together + - ONLYINCOMPOUND for fogemorphemes (Swedish, Danish) or Fuge-elements (German) + - COMPOUNDBEGIN: allow signed roots, and roots with signed suffix at the beginning of compounds + - COMPOUNDMIDDLE: like before, but in the middle of compounds + - COMPOUNDEND: like before, but at the end of compounds + - remove COMPOUNDFIRST, COMPOUNDLAST |