author | Kirill Volinsky <mataes2007@gmail.com> | 2017-12-03 15:00:42 +0300 |
---|---|---|
committer | Kirill Volinsky <mataes2007@gmail.com> | 2017-12-03 15:01:25 +0300 |
commit | 97e2d186da4024c7ac62f7549f3243bd15204118 (patch) | |
tree | a0fdb451333c952b3eb773094380d88d3464ac30 /libs | |
parent | d1f75ef5d26e7071fd1f6071e6c9a306fd19c33d (diff) |
Hunspell: lib updated to 1.6.2
Diffstat (limited to 'libs')
54 files changed, 10593 insertions, 4743 deletions
diff --git a/libs/hunspell/docs/ABOUT-NLS b/libs/hunspell/docs/ABOUT-NLS
new file mode 100644
index 0000000000..15514263f9
--- /dev/null
+++ b/libs/hunspell/docs/ABOUT-NLS
@@ -0,0 +1,1379 @@
+1 Notes on the Free Translation Project
+***************************************
+
+Free software is going international! The Free Translation Project is a
+way to get maintainers of free software, translators, and users all
+together, so that free software will gradually become able to speak many
+languages. A few packages already provide translations for their
+messages.
+
+   If you found this 'ABOUT-NLS' file inside a distribution, you may
+assume that the distributed package does use GNU 'gettext' internally,
+itself available at your nearest GNU archive site. But you do _not_
+need to install GNU 'gettext' prior to configuring, installing or using
+this package with messages translated.
+
+   Installers will find here some useful hints. These notes also
+explain how users should proceed for getting the programs to use the
+available translations. They tell how people wanting to contribute and
+work on translations can contact the appropriate team.
+
+1.1 INSTALL Matters
+===================
+
+Some packages are "localizable" when properly installed; the programs
+they contain can be made to speak your own native language. Most such
+packages use GNU 'gettext'. Other packages have their own ways to
+internationalization, predating GNU 'gettext'.
+
+   By default, this package will be installed to allow translation of
+messages. It will automatically detect whether the system already
+provides the GNU 'gettext' functions. Installers may use special
+options at configuration time for changing the default behaviour. The
+command:
+
+     ./configure --disable-nls
+
+will _totally_ disable translation of messages.
+
+   When you already have GNU 'gettext' installed on your system and run
+configure without an option for your new package, 'configure' will
+probably detect the previously built and installed 'libintl' library and
+will decide to use it. If not, you may have to use the
+'--with-libintl-prefix' option to tell 'configure' where to look for it.
+
+   Internationalized packages usually have many 'po/LL.po' files, where
+LL gives an ISO 639 two-letter code identifying the language. Unless
+translations have been forbidden at 'configure' time by using the
+'--disable-nls' switch, all available translations are installed
+together with the package. However, the environment variable 'LINGUAS'
+may be set, prior to configuration, to limit the installed set.
+'LINGUAS' should then contain a space-separated list of two-letter
+codes, stating which languages are allowed.
+
+1.2 Using This Package
+======================
+
+As a user, if your language has been installed for this package, you
+only have to set the 'LANG' environment variable to the appropriate
+'LL_CC' combination. If you happen to have the 'LC_ALL' or some other
+'LC_xxx' environment variables set, you should unset them before setting
+'LANG', otherwise the setting of 'LANG' will not have the desired
+effect. Here 'LL' is an ISO 639 two-letter language code, and 'CC' is
+an ISO 3166 two-letter country code. For example, let's suppose that
+you speak German and live in Germany. At the shell prompt, merely
+execute 'setenv LANG de_DE' (in 'csh'), 'export LANG; LANG=de_DE' (in
+'sh') or 'export LANG=de_DE' (in 'bash'). This can be done from your
+'.login' or '.profile' file, once and for all.
+
+   You might think that the country code specification is redundant.
+But in fact, some languages have dialects in different countries. For
+example, 'de_AT' is used for Austria, and 'pt_BR' for Brazil. The
+country code serves to distinguish the dialects.
+
+   The locale naming convention of 'LL_CC', with 'LL' denoting the
+language and 'CC' denoting the country, is the one used on systems based
+on GNU libc. On other systems, some variations of this scheme are used,
+such as 'LL' or 'LL_CC.ENCODING'. You can get the list of locales
+supported by your system for your language by running the command
+'locale -a | grep '^LL''.
+
+   Not all programs have translations for all languages. By default, an
+English message is shown in place of a nonexistent translation. If you
+understand other languages, you can set up a priority list of languages.
+This is done through a different environment variable, called
+'LANGUAGE'. GNU 'gettext' gives preference to 'LANGUAGE' over 'LANG'
+for the purpose of message handling, but you still need to have 'LANG'
+set to the primary language; this is required by other parts of the
+system libraries. For example, some Swedish users who would rather read
+translations in German than English for when Swedish is not available,
+set 'LANGUAGE' to 'sv:de' while leaving 'LANG' to 'sv_SE'.
+
+   Special advice for Norwegian users: The language code for Norwegian
+bokmål changed from 'no' to 'nb' recently (in 2003). During the
+transition period, while some message catalogs for this language are
+installed under 'nb' and some older ones under 'no', it's recommended
+for Norwegian users to set 'LANGUAGE' to 'nb:no' so that both newer and
+older translations are used.
+
+   In the 'LANGUAGE' environment variable, but not in the 'LANG'
+environment variable, 'LL_CC' combinations can be abbreviated as 'LL' to
+denote the language's main dialect. For example, 'de' is equivalent to
+'de_DE' (German as spoken in Germany), and 'pt' to 'pt_PT' (Portuguese
+as spoken in Portugal) in this context.
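Editor's sketch (not part of the commit): the precedence of 'LANGUAGE' over 'LANG' described in section 1.2 can be illustrated in Python. This is a simplified model under stated assumptions, not GNU 'gettext''s actual implementation. The helper name `candidate_languages` is hypothetical, and the sketch ignores 'LC_ALL', 'LC_MESSAGES', and the special handling of the "C" locale.

```python
def candidate_languages(env):
    """Return the language codes to try, in priority order.

    Illustrative only: `env` is a plain dict standing in for the process
    environment, and only 'LANGUAGE' and 'LANG' are consulted.
    """
    language = env.get("LANGUAGE", "")
    if language:
        # 'LANGUAGE' is a colon-separated priority list, e.g. "sv:de".
        return [code for code in language.split(":") if code]

    lang = env.get("LANG", "")
    if not lang:
        return []

    # Drop an optional ".ENCODING" suffix, e.g. "de_DE.UTF-8" -> "de_DE".
    locale_name = lang.split(".", 1)[0]
    candidates = [locale_name]
    if "_" in locale_name:
        # Also try the bare language code, e.g. "de_DE" -> "de".
        candidates.append(locale_name.split("_", 1)[0])
    return candidates


# The Swedish example from the text: Swedish first, then German.
print(candidate_languages({"LANGUAGE": "sv:de", "LANG": "sv_SE"}))
```

With only 'LANG' set, the sketch falls back to the locale name and then to its bare language code; an English message would be shown when no catalog matches any candidate.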
+
+1.3 Translating Teams
+=====================
+
+For the Free Translation Project to be a success, we need interested
+people who like their own language and write it well, and who are also
+able to synergize with other translators speaking the same language.
+Each translation team has its own mailing list. The up-to-date list of
+teams can be found at the Free Translation Project's homepage,
+'http://translationproject.org/', in the "Teams" area.
+
+   If you'd like to volunteer to _work_ at translating messages, you
+should become a member of the translating team for your own language.
+The subscribing address is _not_ the same as the list itself, it has
+'-request' appended. For example, speakers of Swedish can send a
+message to 'sv-request@li.org', having this message body:
+
+     subscribe
+
+   Keep in mind that team members are expected to participate _actively_
+in translations, or at solving translational difficulties, rather than
+merely lurking around. If your team does not exist yet and you want to
+start one, or if you are unsure about what to do or how to get started,
+please write to 'coordinator@translationproject.org' to reach the
+coordinator for all translator teams.
+
+   The English team is special. It works at improving and uniformizing
+the terminology in use. Proven linguistic skills are praised more than
+programming skills, here.
+
+1.4 Available Packages
+======================
+
+Languages are not equally supported in all packages. The following
+matrix shows the current state of internationalization, as of Jun 2014.
+The matrix shows, in regard of each package, for which languages PO
+files have been submitted to translation coordination, with a
+translation percentage of at least 50%.
+ + Ready PO files af am an ar as ast az be bg bn bn_IN bs ca crh cs + +---------------------------------------------------+ + a2ps | [] [] [] | + aegis | | + anubis | | + aspell | [] [] [] | + bash | [] [] [] | + bfd | | + binutils | [] | + bison | | + bison-runtime | [] | + buzztrax | [] | + ccd2cue | | + ccide | | + cflow | | + clisp | | + coreutils | [] [] | + cpio | | + cppi | | + cpplib | [] | + cryptsetup | [] | + datamash | | + denemo | [] [] | + dfarc | [] | + dialog | [] [] [] | + dico | | + diffutils | [] | + dink | [] | + direvent | | + doodle | [] | + dos2unix | | + dos2unix-man | | + e2fsprogs | [] [] | + enscript | [] | + exif | [] | + fetchmail | [] [] | + findutils | [] | + flex | [] | + freedink | [] [] | + fusionforge | | + gas | | + gawk | [] | + gcal | [] | + gcc | | + gdbm | | + gettext-examples | [] [] [] [] [] | + gettext-runtime | [] [] [] | + gettext-tools | [] [] | + gjay | | + glunarclock | [] [] [] | + gnubiff | [] | + gnubik | [] | + gnucash | () () [] | + gnuchess | | + gnulib | [] | + gnunet | | + gnunet-gtk | | + gold | | + gphoto2 | [] | + gprof | [] | + gramadoir | | + grep | [] [] [] | + grub | [] | + gsasl | | + gss | | + gst-plugins-bad | [] | + gst-plugins-base | [] [] [] | + gst-plugins-good | [] [] [] | + gst-plugins-ugly | [] [] [] | + gstreamer | [] [] [] [] | + gtick | [] | + gtkam | [] [] | + gtkspell | [] [] [] [] [] | + guix | | + guix-packages | | + gutenprint | [] | + hello | [] | + help2man | | + help2man-texi | | + hylafax | | + idutils | | + iso_15924 | [] | + iso_3166 | [] [] [] [] [] [] [] [] [] [] | + iso_3166_2 | | + iso_4217 | [] | + iso_639 | [] [] [] [] [] [] [] [] [] | + iso_639_3 | [] [] | + iso_639_5 | | + jwhois | | + kbd | [] | + klavaro | [] [] [] [] [] | + latrine | | + ld | [] | + leafpad | [] [] [] [] | + libc | [] [] [] | + libexif | () | + libextractor | | + libgnutls | [] | + libgphoto2 | [] | + libgphoto2_port | [] | + libgsasl | | + libiconv | [] [] | + libidn | [] | + liferea | [] [] [] [] | 
+ lilypond | [] [] | + lordsawar | [] | + lprng | | + lynx | [] [] | + m4 | [] | + mailfromd | | + mailutils | | + make | [] | + man-db | [] [] | + man-db-manpages | | + midi-instruments | [] [] [] | + minicom | [] | + mkisofs | [] | + myserver | [] | + nano | [] [] [] | + opcodes | | + parted | [] | + pies | | + popt | [] | + procps-ng | | + procps-ng-man | | + psmisc | [] | + pspp | [] | + pushover | [] | + pwdutils | | + pyspread | | + radius | [] | + recode | [] [] [] | + recutils | | + rpm | | + rush | | + sarg | | + sed | [] [] [] | + sharutils | [] | + shishi | | + skribilo | | + solfege | [] | + solfege-manual | | + spotmachine | | + sudo | [] [] | + sudoers | [] [] | + sysstat | [] | + tar | [] [] [] | + texinfo | [] [] | + texinfo_document | [] | + tigervnc | [] | + tin | | + tin-man | | + tracgoogleappsa... | | + trader | | + util-linux | [] | + ve | | + vice | | + vmm | | + vorbis-tools | [] | + wastesedge | | + wcd | | + wcd-man | | + wdiff | [] [] | + wget | [] | + wyslij-po | | + xboard | | + xdg-user-dirs | [] [] [] [] [] [] [] [] [] [] | + xkeyboard-config | [] [] [] | + +---------------------------------------------------+ + af am an ar as ast az be bg bn bn_IN bs ca crh cs + 4 0 2 5 3 11 0 8 23 3 3 1 54 4 73 + + da de el en en_GB en_ZA eo es et eu fa fi fr + +--------------------------------------------------+ + a2ps | [] [] [] [] [] [] [] [] [] | + aegis | [] [] [] [] | + anubis | [] [] [] [] [] | + aspell | [] [] [] [] [] [] [] | + bash | [] [] [] | + bfd | [] [] [] [] | + binutils | [] [] [] | + bison | [] [] [] [] [] [] [] [] | + bison-runtime | [] [] [] [] [] [] [] [] | + buzztrax | [] [] [] [] | + ccd2cue | [] [] [] | + ccide | [] [] [] [] [] [] | + cflow | [] [] [] [] [] | + clisp | [] [] [] [] [] | + coreutils | [] [] [] [] [] | + cpio | [] [] [] [] [] | + cppi | [] [] [] [] [] | + cpplib | [] [] [] [] [] [] | + cryptsetup | [] [] [] [] [] | + datamash | [] [] [] [] | + denemo | [] | + dfarc | [] [] [] [] [] [] | + dialog | [] [] [] [] [] 
[] [] [] [] | + dico | [] [] [] [] | + diffutils | [] [] [] [] [] [] | + dink | [] [] [] [] [] [] | + direvent | [] [] [] [] | + doodle | [] [] [] [] | + dos2unix | [] [] [] [] [] | + dos2unix-man | [] [] [] | + e2fsprogs | [] [] [] [] [] | + enscript | [] [] [] [] [] [] | + exif | [] [] [] [] [] [] | + fetchmail | [] () [] [] [] [] [] | + findutils | [] [] [] [] [] [] [] [] | + flex | [] [] [] [] [] [] | + freedink | [] [] [] [] [] [] [] [] | + fusionforge | [] [] [] | + gas | [] [] [] | + gawk | [] [] [] [] [] | + gcal | [] [] [] [] | + gcc | [] [] | + gdbm | [] [] [] [] [] | + gettext-examples | [] [] [] [] [] [] [] | + gettext-runtime | [] [] [] [] [] [] | + gettext-tools | [] [] [] [] [] | + gjay | [] [] [] [] | + glunarclock | [] [] [] [] [] | + gnubiff | () [] [] () | + gnubik | [] [] [] [] [] | + gnucash | [] () () () () () () | + gnuchess | [] [] [] [] | + gnulib | [] [] [] [] [] [] [] | + gnunet | [] | + gnunet-gtk | [] | + gold | [] [] [] | + gphoto2 | [] () [] [] | + gprof | [] [] [] [] [] [] | + gramadoir | [] [] [] [] [] | + grep | [] [] [] [] [] [] [] | + grub | [] [] [] [] [] | + gsasl | [] [] [] [] [] | + gss | [] [] [] [] [] | + gst-plugins-bad | [] [] | + gst-plugins-base | [] [] [] [] [] [] | + gst-plugins-good | [] [] [] [] [] [] [] | + gst-plugins-ugly | [] [] [] [] [] [] [] [] | + gstreamer | [] [] [] [] [] [] [] | + gtick | [] () [] [] [] | + gtkam | [] () [] [] [] [] | + gtkspell | [] [] [] [] [] [] [] [] | + guix | [] [] | + guix-packages | | + gutenprint | [] [] [] [] | + hello | [] [] [] [] [] [] [] [] | + help2man | [] [] [] [] [] [] [] | + help2man-texi | [] [] [] | + hylafax | [] [] | + idutils | [] [] [] [] [] | + iso_15924 | [] () [] [] () [] () | + iso_3166 | [] () [] [] [] [] () [] () | + iso_3166_2 | [] () () () | + iso_4217 | [] () [] [] [] () [] () | + iso_639 | [] () [] [] () [] () | + iso_639_3 | () () () | + iso_639_5 | () () () | + jwhois | [] [] [] [] [] | + kbd | [] [] [] [] [] [] | + klavaro | [] [] [] [] [] [] [] | + 
latrine | [] () [] [] | + ld | [] [] [] [] | + leafpad | [] [] [] [] [] [] [] [] | + libc | [] [] [] [] [] | + libexif | [] [] () [] [] | + libextractor | [] | + libgnutls | [] [] [] [] | + libgphoto2 | [] () [] | + libgphoto2_port | [] () [] [] [] [] | + libgsasl | [] [] [] [] [] | + libiconv | [] [] [] [] [] [] [] | + libidn | [] [] [] [] [] | + liferea | [] () [] [] [] [] [] | + lilypond | [] [] [] [] [] [] | + lordsawar | [] [] | + lprng | | + lynx | [] [] [] [] [] [] | + m4 | [] [] [] [] [] [] | + mailfromd | [] | + mailutils | [] [] [] [] | + make | [] [] [] [] [] | + man-db | [] [] [] [] | + man-db-manpages | [] [] | + midi-instruments | [] [] [] [] [] [] [] [] [] | + minicom | [] [] [] [] [] | + mkisofs | [] [] [] | + myserver | [] [] [] [] | + nano | [] [] [] [] [] [] [] | + opcodes | [] [] [] [] [] | + parted | [] [] [] | + pies | [] | + popt | [] [] [] [] [] [] | + procps-ng | [] [] | + procps-ng-man | [] [] | + psmisc | [] [] [] [] [] [] [] | + pspp | [] [] [] | + pushover | () [] [] [] | + pwdutils | [] [] [] | + pyspread | [] [] [] | + radius | [] [] | + recode | [] [] [] [] [] [] [] | + recutils | [] [] [] [] | + rpm | [] [] [] [] [] | + rush | [] [] [] | + sarg | [] [] | + sed | [] [] [] [] [] [] [] [] | + sharutils | [] [] [] [] | + shishi | [] [] [] | + skribilo | [] [] | + solfege | [] [] [] [] [] [] [] [] | + solfege-manual | [] [] [] [] [] | + spotmachine | [] [] [] [] | + sudo | [] [] [] [] [] [] | + sudoers | [] [] [] [] [] [] | + sysstat | [] [] [] [] [] [] | + tar | [] [] [] [] [] [] [] | + texinfo | [] [] [] [] [] | + texinfo_document | [] [] [] [] | + tigervnc | [] [] [] [] [] [] | + tin | [] [] [] [] | + tin-man | [] | + tracgoogleappsa... 
| [] [] [] [] [] | + trader | [] [] [] [] [] [] | + util-linux | [] [] [] [] | + ve | [] [] [] [] [] | + vice | () () () | + vmm | [] [] | + vorbis-tools | [] [] [] [] | + wastesedge | [] () | + wcd | [] [] [] [] | + wcd-man | [] | + wdiff | [] [] [] [] [] [] [] | + wget | [] [] [] [] [] [] | + wyslij-po | [] [] [] [] | + xboard | [] [] [] [] | + xdg-user-dirs | [] [] [] [] [] [] [] [] [] [] | + xkeyboard-config | [] [] [] [] [] [] [] | + +--------------------------------------------------+ + da de el en en_GB en_ZA eo es et eu fa fi fr + 120 130 32 1 6 0 94 95 22 13 4 103 136 + + ga gd gl gu he hi hr hu hy ia id is it ja ka kk + +-------------------------------------------------+ + a2ps | [] [] [] [] | + aegis | [] | + anubis | [] [] [] [] | + aspell | [] [] [] [] [] | + bash | [] [] [] | + bfd | [] [] | + binutils | [] [] [] | + bison | [] | + bison-runtime | [] [] [] [] [] [] [] [] | + buzztrax | | + ccd2cue | [] | + ccide | [] [] | + cflow | [] [] [] | + clisp | | + coreutils | [] [] [] | + cpio | [] [] [] [] [] [] | + cppi | [] [] [] [] [] | + cpplib | [] [] | + cryptsetup | [] | + datamash | | + denemo | [] | + dfarc | [] [] [] | + dialog | [] [] [] [] [] [] [] [] [] [] | + dico | | + diffutils | [] [] [] [] | + dink | [] | + direvent | [] | + doodle | [] [] | + dos2unix | [] [] | + dos2unix-man | | + e2fsprogs | [] | + enscript | [] [] [] | + exif | [] [] [] [] [] [] | + fetchmail | [] [] [] | + findutils | [] [] [] [] [] [] [] | + flex | [] | + freedink | [] [] [] [] | + fusionforge | | + gas | [] | + gawk | [] () [] | + gcal | | + gcc | | + gdbm | | + gettext-examples | [] [] [] [] [] [] [] | + gettext-runtime | [] [] [] [] [] [] [] | + gettext-tools | [] [] [] | + gjay | [] | + glunarclock | [] [] [] [] [] [] | + gnubiff | [] [] () | + gnubik | [] [] [] | + gnucash | () () () () () [] | + gnuchess | | + gnulib | [] [] [] [] [] | + gnunet | | + gnunet-gtk | | + gold | [] [] | + gphoto2 | [] [] [] [] | + gprof | [] [] [] [] | + gramadoir | [] [] [] | + grep 
| [] [] [] [] [] [] [] | + grub | [] [] [] | + gsasl | [] [] [] [] [] | + gss | [] [] [] [] [] | + gst-plugins-bad | [] | + gst-plugins-base | [] [] [] [] | + gst-plugins-good | [] [] [] [] [] [] | + gst-plugins-ugly | [] [] [] [] [] [] | + gstreamer | [] [] [] [] [] | + gtick | [] [] [] [] [] | + gtkam | [] [] [] [] [] | + gtkspell | [] [] [] [] [] [] [] [] [] [] | + guix | | + guix-packages | | + gutenprint | [] [] [] | + hello | [] [] [] [] [] | + help2man | [] [] [] | + help2man-texi | | + hylafax | [] | + idutils | [] [] | + iso_15924 | [] [] [] [] [] [] | + iso_3166 | [] [] [] [] [] [] [] [] [] [] [] [] [] | + iso_3166_2 | [] [] | + iso_4217 | [] [] [] [] [] [] | + iso_639 | [] [] [] [] [] [] [] [] [] | + iso_639_3 | [] [] | + iso_639_5 | | + jwhois | [] [] [] [] | + kbd | [] [] [] | + klavaro | [] [] [] [] [] | + latrine | [] | + ld | [] [] [] [] | + leafpad | [] [] [] [] [] [] [] () | + libc | [] [] [] [] [] | + libexif | [] | + libextractor | | + libgnutls | [] | + libgphoto2 | [] [] | + libgphoto2_port | [] [] | + libgsasl | [] [] [] [] | + libiconv | [] [] [] [] [] [] [] | + libidn | [] [] [] [] | + liferea | [] [] [] [] [] | + lilypond | [] | + lordsawar | | + lprng | [] | + lynx | [] [] [] [] | + m4 | [] [] [] [] [] | + mailfromd | | + mailutils | | + make | [] [] [] [] | + man-db | [] [] | + man-db-manpages | [] [] | + midi-instruments | [] [] [] [] [] [] [] [] [] | + minicom | [] [] [] | + mkisofs | [] [] | + myserver | [] | + nano | [] [] [] [] [] | + opcodes | [] [] [] | + parted | [] [] [] [] | + pies | | + popt | [] [] [] [] [] [] [] [] [] [] | + procps-ng | | + procps-ng-man | | + psmisc | [] [] [] [] | + pspp | [] [] | + pushover | [] | + pwdutils | [] | + pyspread | | + radius | [] | + recode | [] [] [] [] [] [] [] | + recutils | | + rpm | [] | + rush | [] | + sarg | | + sed | [] [] [] [] [] [] [] | + sharutils | | + shishi | | + skribilo | [] | + solfege | [] [] | + solfege-manual | | + spotmachine | | + sudo | [] [] [] [] | + sudoers | [] [] 
[] | + sysstat | [] [] [] | + tar | [] [] [] [] [] [] | + texinfo | [] [] [] | + texinfo_document | [] [] | + tigervnc | | + tin | | + tin-man | | + tracgoogleappsa... | [] [] [] [] | + trader | [] [] | + util-linux | [] | + ve | [] | + vice | () () | + vmm | | + vorbis-tools | [] [] | + wastesedge | () | + wcd | | + wcd-man | | + wdiff | [] [] [] | + wget | [] [] [] | + wyslij-po | [] [] [] | + xboard | | + xdg-user-dirs | [] [] [] [] [] [] [] [] [] [] [] [] [] [] | + xkeyboard-config | [] [] [] [] [] | + +-------------------------------------------------+ + ga gd gl gu he hi hr hu hy ia id is it ja ka kk + 35 2 47 4 8 2 53 69 2 6 80 11 86 58 0 3 + + kn ko ku ky lg lt lv mk ml mn mr ms mt nb ne nl + +--------------------------------------------------+ + a2ps | [] [] | + aegis | [] | + anubis | [] [] [] | + aspell | [] [] | + bash | [] [] | + bfd | | + binutils | | + bison | [] | + bison-runtime | [] [] [] [] [] [] | + buzztrax | | + ccd2cue | | + ccide | [] [] | + cflow | [] | + clisp | [] | + coreutils | [] [] | + cpio | [] | + cppi | | + cpplib | [] | + cryptsetup | [] | + datamash | [] [] | + denemo | | + dfarc | [] [] | + dialog | [] [] [] [] [] [] | + dico | | + diffutils | [] [] [] | + dink | [] | + direvent | [] | + doodle | [] | + dos2unix | [] [] | + dos2unix-man | [] | + e2fsprogs | [] | + enscript | [] | + exif | [] [] | + fetchmail | [] | + findutils | [] [] | + flex | [] | + freedink | [] [] | + fusionforge | | + gas | | + gawk | [] | + gcal | | + gcc | | + gdbm | | + gettext-examples | [] [] [] [] [] [] | + gettext-runtime | [] [] | + gettext-tools | [] | + gjay | | + glunarclock | [] [] | + gnubiff | [] | + gnubik | [] [] | + gnucash | () () () () () () () [] | + gnuchess | [] [] | + gnulib | [] | + gnunet | | + gnunet-gtk | | + gold | | + gphoto2 | [] | + gprof | [] [] | + gramadoir | [] | + grep | [] [] | + grub | [] [] [] | + gsasl | [] | + gss | | + gst-plugins-bad | [] [] | + gst-plugins-base | [] [] [] | + gst-plugins-good | [] [] [] [] | + 
gst-plugins-ugly | [] [] [] [] [] | + gstreamer | [] [] | + gtick | [] | + gtkam | [] [] | + gtkspell | [] [] [] [] [] [] [] | + guix | | + guix-packages | | + gutenprint | [] | + hello | [] [] [] | + help2man | [] | + help2man-texi | | + hylafax | [] | + idutils | [] | + iso_15924 | () [] [] | + iso_3166 | [] [] [] () [] [] [] [] [] [] | + iso_3166_2 | () [] | + iso_4217 | () [] [] [] | + iso_639 | [] [] () [] [] [] [] | + iso_639_3 | [] () [] | + iso_639_5 | () | + jwhois | [] [] | + kbd | [] | + klavaro | [] [] | + latrine | | + ld | | + leafpad | [] [] [] [] [] | + libc | [] [] | + libexif | [] | + libextractor | [] | + libgnutls | [] [] | + libgphoto2 | [] | + libgphoto2_port | [] | + libgsasl | [] | + libiconv | [] [] | + libidn | [] | + liferea | [] [] [] | + lilypond | [] | + lordsawar | | + lprng | | + lynx | [] | + m4 | [] | + mailfromd | | + mailutils | | + make | [] [] | + man-db | [] | + man-db-manpages | [] | + midi-instruments | [] [] [] [] [] [] [] | + minicom | [] | + mkisofs | [] | + myserver | | + nano | [] [] [] | + opcodes | [] | + parted | [] | + pies | | + popt | [] [] [] [] [] | + procps-ng | | + procps-ng-man | | + psmisc | [] | + pspp | [] [] | + pushover | | + pwdutils | [] | + pyspread | | + radius | [] | + recode | [] [] | + recutils | [] | + rpm | [] | + rush | [] | + sarg | | + sed | [] [] | + sharutils | [] | + shishi | | + skribilo | | + solfege | [] [] | + solfege-manual | [] | + spotmachine | [] | + sudo | [] [] | + sudoers | [] [] | + sysstat | [] [] | + tar | [] [] [] | + texinfo | [] | + texinfo_document | [] | + tigervnc | [] | + tin | | + tin-man | | + tracgoogleappsa... 
| [] [] [] | + trader | [] | + util-linux | [] | + ve | [] | + vice | [] | + vmm | [] | + vorbis-tools | [] | + wastesedge | [] | + wcd | [] | + wcd-man | [] | + wdiff | [] | + wget | [] [] | + wyslij-po | [] | + xboard | [] | + xdg-user-dirs | [] [] [] [] [] [] [] [] [] [] [] | + xkeyboard-config | [] [] [] | + +--------------------------------------------------+ + kn ko ku ky lg lt lv mk ml mn mr ms mt nb ne nl + 5 11 4 6 0 13 22 3 3 3 4 11 2 40 1 124 + + nn or os pa pl ps pt pt_BR ro ru rw sk sl sq sr + +--------------------------------------------------+ + a2ps | [] [] [] [] [] [] [] | + aegis | [] [] | + anubis | [] [] [] | + aspell | [] [] [] [] [] [] [] | + bash | [] [] [] [] [] | + bfd | [] | + binutils | [] [] | + bison | [] [] [] | + bison-runtime | [] [] [] [] [] [] [] [] | + buzztrax | | + ccd2cue | [] | + ccide | [] [] [] | + cflow | [] [] | + clisp | [] | + coreutils | [] [] [] [] | + cpio | [] [] [] | + cppi | [] [] [] | + cpplib | [] [] [] | + cryptsetup | [] [] | + datamash | [] [] | + denemo | | + dfarc | [] [] [] | + dialog | [] [] [] [] [] [] [] | + dico | [] | + diffutils | [] [] | + dink | | + direvent | [] [] | + doodle | [] [] | + dos2unix | [] [] [] [] | + dos2unix-man | [] [] | + e2fsprogs | [] | + enscript | [] [] [] [] [] [] | + exif | [] [] [] [] [] [] | + fetchmail | [] [] [] | + findutils | [] [] [] [] [] | + flex | [] [] [] [] [] | + freedink | [] [] [] [] [] | + fusionforge | | + gas | | + gawk | [] | + gcal | | + gcc | | + gdbm | [] [] [] | + gettext-examples | [] [] [] [] [] [] [] [] | + gettext-runtime | [] [] [] [] [] [] [] [] [] | + gettext-tools | [] [] [] [] [] [] [] | + gjay | [] | + glunarclock | [] [] [] [] [] [] | + gnubiff | [] | + gnubik | [] [] [] [] | + gnucash | () () () () [] | + gnuchess | [] [] | + gnulib | [] [] [] [] [] | + gnunet | | + gnunet-gtk | | + gold | | + gphoto2 | [] [] [] [] [] | + gprof | [] [] [] [] | + gramadoir | [] [] | + grep | [] [] [] [] [] [] | + grub | [] [] [] [] [] | + gsasl | [] [] [] | + 
gss | [] [] [] [] | + gst-plugins-bad | [] [] [] [] | + gst-plugins-base | [] [] [] [] [] [] | + gst-plugins-good | [] [] [] [] [] [] [] | + gst-plugins-ugly | [] [] [] [] [] [] [] | + gstreamer | [] [] [] [] [] [] [] | + gtick | [] [] [] [] [] | + gtkam | [] [] [] [] [] [] | + gtkspell | [] [] [] [] [] [] [] [] [] | + guix | | + guix-packages | | + gutenprint | [] [] | + hello | [] [] [] [] [] [] | + help2man | [] [] [] [] | + help2man-texi | [] | + hylafax | | + idutils | [] [] [] | + iso_15924 | [] () [] [] [] [] | + iso_3166 | [] [] [] [] () [] [] [] [] [] [] [] [] | + iso_3166_2 | [] () [] | + iso_4217 | [] [] () [] [] [] [] [] | + iso_639 | [] [] [] () [] [] [] [] [] [] | + iso_639_3 | [] () | + iso_639_5 | () [] | + jwhois | [] [] [] [] | + kbd | [] [] | + klavaro | [] [] [] [] [] | + latrine | [] | + ld | | + leafpad | [] [] [] [] [] [] [] [] [] | + libc | [] [] [] | + libexif | [] () [] | + libextractor | [] | + libgnutls | [] | + libgphoto2 | [] | + libgphoto2_port | [] [] [] [] [] | + libgsasl | [] [] [] [] | + libiconv | [] [] [] [] [] | + libidn | [] [] [] | + liferea | [] [] [] [] () [] [] | + lilypond | | + lordsawar | | + lprng | [] | + lynx | [] [] | + m4 | [] [] [] [] [] | + mailfromd | [] | + mailutils | [] | + make | [] [] [] | + man-db | [] [] [] | + man-db-manpages | [] [] [] | + midi-instruments | [] [] [] [] [] [] [] [] | + minicom | [] [] [] [] | + mkisofs | [] [] [] | + myserver | [] [] | + nano | [] [] [] [] [] [] | + opcodes | | + parted | [] [] [] [] [] [] | + pies | [] | + popt | [] [] [] [] [] [] | + procps-ng | [] | + procps-ng-man | [] | + psmisc | [] [] [] [] | + pspp | [] [] | + pushover | | + pwdutils | [] | + pyspread | [] [] | + radius | [] [] | + recode | [] [] [] [] [] [] [] [] | + recutils | [] | + rpm | [] | + rush | [] [] [] | + sarg | [] [] | + sed | [] [] [] [] [] [] [] [] | + sharutils | [] [] [] | + shishi | [] [] | + skribilo | | + solfege | [] [] [] | + solfege-manual | [] [] | + spotmachine | [] [] | + sudo | [] [] 
[] [] [] [] | + sudoers | [] [] [] [] | + sysstat | [] [] [] [] [] | + tar | [] [] [] [] [] | + texinfo | [] [] [] | + texinfo_document | [] [] | + tigervnc | [] | + tin | [] | + tin-man | | + tracgoogleappsa... | [] [] [] [] | + trader | [] | + util-linux | [] [] | + ve | [] [] [] | + vice | | + vmm | | + vorbis-tools | [] [] [] | + wastesedge | | + wcd | | + wcd-man | | + wdiff | [] [] [] [] [] | + wget | [] [] [] [] | + wyslij-po | [] [] [] [] | + xboard | [] [] [] | + xdg-user-dirs | [] [] [] [] [] [] [] [] [] [] [] [] [] | + xkeyboard-config | [] [] [] [] | + +--------------------------------------------------+ + nn or os pa pl ps pt pt_BR ro ru rw sk sl sq sr + 7 3 1 6 114 1 12 83 32 80 3 38 45 7 94 + + sv sw ta te tg th tr uk ur vi wa wo zh_CN zh_HK + +---------------------------------------------------+ + a2ps | [] [] [] [] [] | + aegis | [] | + anubis | [] [] [] [] | + aspell | [] [] [] [] | + bash | [] [] [] [] | + bfd | [] [] | + binutils | [] [] [] | + bison | [] [] [] [] | + bison-runtime | [] [] [] [] [] [] | + buzztrax | [] [] [] | + ccd2cue | [] [] [] | + ccide | [] [] [] [] | + cflow | [] [] [] [] | + clisp | | + coreutils | [] [] [] [] | + cpio | [] [] [] [] [] | + cppi | [] [] [] [] | + cpplib | [] [] [] [] [] | + cryptsetup | [] [] [] | + datamash | [] [] [] | + denemo | | + dfarc | [] | + dialog | [] [] [] [] [] [] | + dico | [] | + diffutils | [] [] [] [] [] | + dink | | + direvent | [] [] | + doodle | [] [] | + dos2unix | [] [] [] [] | + dos2unix-man | [] [] [] | + e2fsprogs | [] [] [] [] | + enscript | [] [] [] [] | + exif | [] [] [] [] [] | + fetchmail | [] [] [] [] | + findutils | [] [] [] [] [] | + flex | [] [] [] [] | + freedink | [] [] | + fusionforge | | + gas | [] | + gawk | [] [] | + gcal | [] [] | + gcc | [] [] | + gdbm | [] [] | + gettext-examples | [] [] [] [] [] [] | + gettext-runtime | [] [] [] [] [] [] | + gettext-tools | [] [] [] [] [] | + gjay | [] [] | + glunarclock | [] [] [] [] | + gnubiff | [] [] | + gnubik | [] [] [] [] 
| + gnucash | () () () () [] | + gnuchess | [] [] | + gnulib | [] [] [] [] | + gnunet | | + gnunet-gtk | | + gold | [] [] | + gphoto2 | [] [] [] [] | + gprof | [] [] [] [] | + gramadoir | [] [] [] | + grep | [] [] [] [] [] | + grub | [] [] [] [] | + gsasl | [] [] [] [] | + gss | [] [] [] | + gst-plugins-bad | [] [] [] [] | + gst-plugins-base | [] [] [] [] [] | + gst-plugins-good | [] [] [] [] [] | + gst-plugins-ugly | [] [] [] [] [] | + gstreamer | [] [] [] [] [] | + gtick | [] [] [] | + gtkam | [] [] [] [] | + gtkspell | [] [] [] [] [] [] [] [] | + guix | [] | + guix-packages | | + gutenprint | [] [] [] [] | + hello | [] [] [] [] [] [] | + help2man | [] [] [] | + help2man-texi | [] | + hylafax | [] | + idutils | [] [] [] | + iso_15924 | [] () [] [] () [] | + iso_3166 | [] [] () [] [] () [] [] [] | + iso_3166_2 | () [] [] () [] | + iso_4217 | [] () [] [] () [] [] | + iso_639 | [] [] [] () [] [] () [] [] [] | + iso_639_3 | [] () [] [] () | + iso_639_5 | () [] () | + jwhois | [] [] [] [] | + kbd | [] [] [] | + klavaro | [] [] [] [] [] [] | + latrine | [] [] | + ld | [] [] [] [] [] | + leafpad | [] [] [] [] [] [] | + libc | [] [] [] [] [] | + libexif | [] () | + libextractor | [] [] | + libgnutls | [] [] [] [] | + libgphoto2 | [] [] | + libgphoto2_port | [] [] [] [] | + libgsasl | [] [] [] [] | + libiconv | [] [] [] [] [] | + libidn | () [] [] [] | + liferea | [] [] [] [] [] | + lilypond | [] | + lordsawar | | + lprng | [] | + lynx | [] [] [] [] | + m4 | [] [] [] | + mailfromd | [] [] | + mailutils | [] | + make | [] [] [] [] | + man-db | [] [] | + man-db-manpages | [] | + midi-instruments | [] [] [] [] [] [] | + minicom | [] [] | + mkisofs | [] [] [] | + myserver | [] | + nano | [] [] [] [] | + opcodes | [] [] [] | + parted | [] [] [] [] [] | + pies | [] [] | + popt | [] [] [] [] [] [] [] | + procps-ng | [] [] | + procps-ng-man | [] | + psmisc | [] [] [] [] | + pspp | [] [] [] | + pushover | [] | + pwdutils | [] [] | + pyspread | [] | + radius | [] [] | + recode | [] 
[] [] [] | + recutils | [] [] [] | + rpm | [] [] [] [] | + rush | [] [] | + sarg | | + sed | [] [] [] [] [] | + sharutils | [] [] [] | + shishi | [] [] | + skribilo | [] | + solfege | [] [] [] | + solfege-manual | [] | + spotmachine | [] [] [] | + sudo | [] [] [] [] | + sudoers | [] [] [] | + sysstat | [] [] [] [] [] | + tar | [] [] [] [] [] | + texinfo | [] [] [] | + texinfo_document | [] | + tigervnc | [] [] | + tin | [] | + tin-man | | + tracgoogleappsa... | [] [] [] [] [] | + trader | [] | + util-linux | [] [] [] | + ve | [] [] [] [] | + vice | () () | + vmm | | + vorbis-tools | [] [] | + wastesedge | | + wcd | [] [] [] | + wcd-man | [] | + wdiff | [] [] [] [] | + wget | [] [] [] | + wyslij-po | [] [] | + xboard | [] | + xdg-user-dirs | [] [] [] [] [] [] [] [] [] | + xkeyboard-config | [] [] [] [] | + +---------------------------------------------------+ + sv sw ta te tg th tr uk ur vi wa wo zh_CN zh_HK + 91 1 4 3 0 13 50 113 1 126 7 1 95 7 + + zh_TW + +-------+ + a2ps | | 30 + aegis | | 9 + anubis | | 19 + aspell | | 28 + bash | [] | 21 + bfd | | 9 + binutils | | 12 + bison | [] | 18 + bison-runtime | [] | 38 + buzztrax | | 8 + ccd2cue | | 8 + ccide | | 17 + cflow | | 15 + clisp | | 10 + coreutils | | 20 + cpio | | 20 + cppi | | 17 + cpplib | [] | 19 + cryptsetup | | 13 + datamash | | 11 + denemo | | 4 + dfarc | | 16 + dialog | [] | 42 + dico | | 6 + diffutils | | 21 + dink | | 9 + direvent | | 10 + doodle | | 12 + dos2unix | [] | 18 + dos2unix-man | | 9 + e2fsprogs | | 14 + enscript | | 21 + exif | | 26 + fetchmail | | 19 + findutils | | 28 + flex | [] | 19 + freedink | | 23 + fusionforge | | 3 + gas | | 5 + gawk | | 12 + gcal | | 7 + gcc | | 4 + gdbm | | 10 + gettext-examples | [] | 40 + gettext-runtime | [] | 34 + gettext-tools | [] | 24 + gjay | | 8 + glunarclock | [] | 27 + gnubiff | | 9 + gnubik | | 19 + gnucash | () | 7 + gnuchess | | 10 + gnulib | | 23 + gnunet | | 1 + gnunet-gtk | | 1 + gold | | 7 + gphoto2 | [] | 19 + gprof | | 21 + gramadoir | | 14 
+ grep | [] | 31 + grub | | 21 + gsasl | [] | 19 + gss | | 17 + gst-plugins-bad | | 14 + gst-plugins-base | | 27 + gst-plugins-good | | 32 + gst-plugins-ugly | | 34 + gstreamer | [] | 31 + gtick | | 19 + gtkam | | 24 + gtkspell | [] | 48 + guix | | 3 + guix-packages | | 0 + gutenprint | | 15 + hello | [] | 30 + help2man | | 18 + help2man-texi | | 5 + hylafax | | 5 + idutils | | 14 + iso_15924 | [] | 23 + iso_3166 | [] | 58 + iso_3166_2 | | 9 + iso_4217 | [] | 28 + iso_639 | [] | 46 + iso_639_3 | | 10 + iso_639_5 | | 2 + jwhois | [] | 20 + kbd | | 16 + klavaro | | 30 + latrine | | 7 + ld | [] | 15 + leafpad | [] | 40 + libc | [] | 24 + libexif | | 9 + libextractor | | 5 + libgnutls | | 13 + libgphoto2 | | 9 + libgphoto2_port | [] | 19 + libgsasl | | 18 + libiconv | [] | 29 + libidn | | 17 + liferea | | 29 + lilypond | | 11 + lordsawar | | 3 + lprng | | 3 + lynx | | 19 + m4 | [] | 22 + mailfromd | | 4 + mailutils | | 6 + make | | 19 + man-db | | 14 + man-db-manpages | | 9 + midi-instruments | [] | 43 + minicom | [] | 17 + mkisofs | | 13 + myserver | | 9 + nano | [] | 29 + opcodes | | 12 + parted | [] | 21 + pies | | 4 + popt | [] | 36 + procps-ng | | 5 + procps-ng-man | | 4 + psmisc | [] | 22 + pspp | | 13 + pushover | | 6 + pwdutils | | 8 + pyspread | | 6 + radius | | 9 + recode | | 31 + recutils | | 9 + rpm | [] | 13 + rush | | 10 + sarg | | 4 + sed | [] | 34 + sharutils | | 12 + shishi | | 7 + skribilo | | 4 + solfege | | 19 + solfege-manual | | 9 + spotmachine | | 10 + sudo | | 24 + sudoers | | 20 + sysstat | | 22 + tar | [] | 30 + texinfo | | 17 + texinfo_document | | 11 + tigervnc | | 11 + tin | [] | 7 + tin-man | | 1 + tracgoogleappsa... 
| [] | 22 + trader | | 11 + util-linux | | 12 + ve | | 14 + vice | | 1 + vmm | | 3 + vorbis-tools | | 13 + wastesedge | | 2 + wcd | | 8 + wcd-man | | 3 + wdiff | [] | 23 + wget | | 19 + wyslij-po | | 14 + xboard | | 9 + xdg-user-dirs | [] | 68 + xkeyboard-config | [] | 27 + +-------+ + 90 teams zh_TW + 166 domains 42 2748 + + Some counters in the preceding matrix are higher than the number of +visible blocks let us expect. This is because a few extra PO files are +used for implementing regional variants of languages, or language +dialects. + + For a PO file in the matrix above to be effective, the package to +which it applies should also have been internationalized and distributed +as such by its maintainer. There might be an observable lag between the +mere existence a PO file and its wide availability in a distribution. + + If Jun 2014 seems to be old, you may fetch a more recent copy of this +'ABOUT-NLS' file on most GNU archive sites. The most up-to-date matrix +with full percentage details can be found at +'http://translationproject.org/extra/matrix.html'. + +1.5 Using 'gettext' in new packages +=================================== + +If you are writing a freely available program and want to +internationalize it you are welcome to use GNU 'gettext' in your +package. Of course you have to respect the GNU Lesser General Public +License which covers the use of the GNU 'gettext' library. This means +in particular that even non-free programs can use 'libintl' as a shared +library, whereas only free software can use 'libintl' as a static +library or use modified versions of 'libintl'. + + Once the sources are changed appropriately and the setup can handle +the use of 'gettext' the only thing missing are the translations. The +Free Translation Project is also available for packages which are not +developed inside the GNU project. Therefore the information given above +applies also for every other Free Software Project. 
Contact +'coordinator@translationproject.org' to make the '.pot' files available +to the translation teams. diff --git a/libs/hunspell/docs/AUTHORS b/libs/hunspell/docs/AUTHORS new file mode 100644 index 0000000000..f137fa26b8 --- /dev/null +++ b/libs/hunspell/docs/AUTHORS @@ -0,0 +1,5 @@ +Author of Hunspell: +Németh László nemeth (at) numbertext.org + +Hunspell based on OpenOffice.org's Myspell. MySpell's author: +Kevin Hendricks kevin.hendricks (at) sympatico.ca diff --git a/libs/hunspell/docs/AUTHORS.myspell b/libs/hunspell/docs/AUTHORS.myspell new file mode 100644 index 0000000000..36f8589e32 --- /dev/null +++ b/libs/hunspell/docs/AUTHORS.myspell @@ -0,0 +1,67 @@ +Developer Credits: + +Special credit and thanks go to ispell's creator Geoff Kuenning. +Ispell affix compression code was used as the basis for the +affix code used in MySpell. Specifically Geoff's use of a +conds[] array that makes it easy to check if the conditions +required for a particular affix are present was very +ingenious! Kudos to Geoff. Very nicely done. +BTW: ispell is available under a BSD style license +from Geoff Kuennings ispell website: +http://www.cs.ucla.edu/ficus-members/geoff/ispell.html + + +Kevin Hendricks <kevin.hendricks@sympatico.ca> is the original +author and now maintainer of the MySpell codebase. Recent +additions include ngram support, and related character maps +to help improve and create suggestions for very poorly +spelled words. + +Please send any and all contributions or improvements +to him or to dev@lingucomponent.openoffice.org. + + +David Einstein (Deinst@world.std.com) developed an almost +complete rewrite of MySpell for use by the Mozilla project. +David and I are now working on parallel development tracks +to help our respective projects (Mozilla and OpenOffice.org) +and we will maintain full affix file and dictionary file +compatibility and work on merging our versions of MySpell +back into a single tree. 
David has been a significant help +in improving MySpell. + + +Németh László <nemethl@gyorsposta.hu> is the author of +the Hungarian dictionary and he developed and contributed +extensive changes to MySpell including ... + * code to support compound words in MySpell + * fixed numerous problems with encoding case conversion tables. + * designed/developed replacement tables to improve suggestions + * changed affix file parsing to trees to greatly speed loading + * removed the need for malloc/free pairs in suffix_check which + speeds up spell checking in suffix rich languages by 20% + +Davide Prina <davideprina@uahoo.com>, Giuseppe Modugno +<gppe.modugno@libero.it>, Gianluca Turconi <luctur@comeg.it> +all from the it_IT OpenOffice.org team performed an +extremely detailed code review of MySpell and generated +fixes for bugs, leaks, and speedup improvements. + +Simon Brouwer <simon.oo.o@xs4all.nl> for fixes and enhancements +that have greatly improved MySpell auggestions + * n-gram suggestions for an initcap word have an init. cap. + * fix for too many n-gram suggestions from specialized dictionary, + * fix for long suggestions rather than close ones in case of + dictionaries with many compound words (kompuuter) + * optionally disabling split-word suggestions (controlled + by NOSPLITSUGS line in affix file) + + +Special Thanks to all others who have either contributed ideas or +testing for MySpell + + +Thanks, + +Kevin Hendricks +kevin.hendricks@sympatico.ca diff --git a/libs/hunspell/docs/BUGS b/libs/hunspell/docs/BUGS new file mode 100644 index 0000000000..6a5468e0f3 --- /dev/null +++ b/libs/hunspell/docs/BUGS @@ -0,0 +1,5 @@ +* Interactive interface has some visualization problem with long lines + +* Experimental -U, -u options don't support Unicode. + +* Compound handling is not thread safe in Hungarian specific code. 
diff --git a/libs/hunspell/docs/COPYING b/libs/hunspell/docs/COPYING new file mode 100644 index 0000000000..94a9ed024d --- /dev/null +++ b/libs/hunspell/docs/COPYING @@ -0,0 +1,674 @@ + GNU GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/> + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU General Public License is a free, copyleft license for +software and other kinds of works. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +the GNU General Public License is intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. We, the Free Software Foundation, use the +GNU General Public License for most of our software; it applies also to +any other work released this way by its authors. You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + To protect your rights, we need to prevent others from denying you +these rights or asking you to surrender the rights. Therefore, you have +certain responsibilities if you distribute copies of the software, or if +you modify it: responsibilities to respect the freedom of others. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must pass on to the recipients the same +freedoms that you received. 
You must make sure that they, too, receive +or can get the source code. And you must show them these terms so they +know their rights. + + Developers that use the GNU GPL protect your rights with two steps: +(1) assert copyright on the software, and (2) offer you this License +giving you legal permission to copy, distribute and/or modify it. + + For the developers' and authors' protection, the GPL clearly explains +that there is no warranty for this free software. For both users' and +authors' sake, the GPL requires that modified versions be marked as +changed, so that their problems will not be attributed erroneously to +authors of previous versions. + + Some devices are designed to deny users access to install or run +modified versions of the software inside them, although the manufacturer +can do so. This is fundamentally incompatible with the aim of +protecting users' freedom to change the software. The systematic +pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we +have designed this version of the GPL to prohibit the practice for those +products. If such problems arise substantially in other domains, we +stand ready to extend this provision to those domains in future versions +of the GPL, as needed to protect the freedom of users. + + Finally, every program is threatened constantly by software patents. +States should not allow patents to restrict development and use of +software on general-purpose computers, but in those that do, we wish to +avoid the special danger that patents applied to a free program could +make it effectively proprietary. To prevent this, the GPL assures that +patents cannot be used to render the program non-free. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU General Public License. 
+ + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. 
+ + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. + + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. 
Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. 
+ + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. 
This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. 
+ + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. 
+ + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. 
+ + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. 
If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. + + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. 
If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. 
+ + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. 
For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. 
+ + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. 
You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. 
The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. 
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. + + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +state the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. 
+ + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + + If the program does terminal interaction, make it output a short +notice like this when it starts in an interactive mode: + + <program> Copyright (C) <year> <name of author> + This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, your program's commands +might be different; for a GUI interface, you would use an "about box". + + You should also get your employer (if you work as a programmer) or school, +if any, to sign a "copyright disclaimer" for the program, if necessary. +For more information on this, and how to apply and follow the GNU GPL, see +<http://www.gnu.org/licenses/>. + + The GNU General Public License does not permit incorporating your program +into proprietary programs. If your program is a subroutine library, you +may consider it more useful to permit linking proprietary applications with +the library. 
If this is what you want to do, use the GNU Lesser General +Public License instead of this License. But first, please read +<http://www.gnu.org/philosophy/why-not-lgpl.html>. diff --git a/libs/hunspell/docs/COPYING.LESSER b/libs/hunspell/docs/COPYING.LESSER new file mode 100644 index 0000000000..65c5ca88a6 --- /dev/null +++ b/libs/hunspell/docs/COPYING.LESSER @@ -0,0 +1,165 @@ + GNU LESSER GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/> + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + + This version of the GNU Lesser General Public License incorporates +the terms and conditions of version 3 of the GNU General Public +License, supplemented by the additional permissions listed below. + + 0. Additional Definitions. + + As used herein, "this License" refers to version 3 of the GNU Lesser +General Public License, and the "GNU GPL" refers to version 3 of the GNU +General Public License. + + "The Library" refers to a covered work governed by this License, +other than an Application or a Combined Work as defined below. + + An "Application" is any work that makes use of an interface provided +by the Library, but which is not otherwise based on the Library. +Defining a subclass of a class defined by the Library is deemed a mode +of using an interface provided by the Library. + + A "Combined Work" is a work produced by combining or linking an +Application with the Library. The particular version of the Library +with which the Combined Work was made is also called the "Linked +Version". + + The "Minimal Corresponding Source" for a Combined Work means the +Corresponding Source for the Combined Work, excluding any source code +for portions of the Combined Work that, considered in isolation, are +based on the Application, and not on the Linked Version. 
+ + The "Corresponding Application Code" for a Combined Work means the +object code and/or source code for the Application, including any data +and utility programs needed for reproducing the Combined Work from the +Application, but excluding the System Libraries of the Combined Work. + + 1. Exception to Section 3 of the GNU GPL. + + You may convey a covered work under sections 3 and 4 of this License +without being bound by section 3 of the GNU GPL. + + 2. Conveying Modified Versions. + + If you modify a copy of the Library, and, in your modifications, a +facility refers to a function or data to be supplied by an Application +that uses the facility (other than as an argument passed when the +facility is invoked), then you may convey a copy of the modified +version: + + a) under this License, provided that you make a good faith effort to + ensure that, in the event an Application does not supply the + function or data, the facility still operates, and performs + whatever part of its purpose remains meaningful, or + + b) under the GNU GPL, with none of the additional permissions of + this License applicable to that copy. + + 3. Object Code Incorporating Material from Library Header Files. + + The object code form of an Application may incorporate material from +a header file that is part of the Library. You may convey such object +code under terms of your choice, provided that, if the incorporated +material is not limited to numerical parameters, data structure +layouts and accessors, or small macros, inline functions and templates +(ten or fewer lines in length), you do both of the following: + + a) Give prominent notice with each copy of the object code that the + Library is used in it and that the Library and its use are + covered by this License. + + b) Accompany the object code with a copy of the GNU GPL and this license + document. + + 4. Combined Works. 
+ + You may convey a Combined Work under terms of your choice that, +taken together, effectively do not restrict modification of the +portions of the Library contained in the Combined Work and reverse +engineering for debugging such modifications, if you also do each of +the following: + + a) Give prominent notice with each copy of the Combined Work that + the Library is used in it and that the Library and its use are + covered by this License. + + b) Accompany the Combined Work with a copy of the GNU GPL and this license + document. + + c) For a Combined Work that displays copyright notices during + execution, include the copyright notice for the Library among + these notices, as well as a reference directing the user to the + copies of the GNU GPL and this license document. + + d) Do one of the following: + + 0) Convey the Minimal Corresponding Source under the terms of this + License, and the Corresponding Application Code in a form + suitable for, and under terms that permit, the user to + recombine or relink the Application with a modified version of + the Linked Version to produce a modified Combined Work, in the + manner specified by section 6 of the GNU GPL for conveying + Corresponding Source. + + 1) Use a suitable shared library mechanism for linking with the + Library. A suitable mechanism is one that (a) uses at run time + a copy of the Library already present on the user's computer + system, and (b) will operate properly with a modified version + of the Library that is interface-compatible with the Linked + Version. + + e) Provide Installation Information, but only if you would otherwise + be required to provide such information under section 6 of the + GNU GPL, and only to the extent that such information is + necessary to install and execute a modified version of the + Combined Work produced by recombining or relinking the + Application with a modified version of the Linked Version. 
(If + you use option 4d0, the Installation Information must accompany + the Minimal Corresponding Source and Corresponding Application + Code. If you use option 4d1, you must provide the Installation + Information in the manner specified by section 6 of the GNU GPL + for conveying Corresponding Source.) + + 5. Combined Libraries. + + You may place library facilities that are a work based on the +Library side by side in a single library together with other library +facilities that are not Applications and are not covered by this +License, and convey such a combined library under terms of your +choice, if you do both of the following: + + a) Accompany the combined library with a copy of the same work based + on the Library, uncombined with any other library facilities, + conveyed under the terms of this License. + + b) Give prominent notice with the combined library that part of it + is a work based on the Library, and explaining where to find the + accompanying uncombined form of the same work. + + 6. Revised Versions of the GNU Lesser General Public License. + + The Free Software Foundation may publish revised and/or new versions +of the GNU Lesser General Public License from time to time. Such new +versions will be similar in spirit to the present version, but may +differ in detail to address new problems or concerns. + + Each version is given a distinguishing version number. If the +Library as you received it specifies that a certain numbered version +of the GNU Lesser General Public License "or any later version" +applies to it, you have the option of following the terms and +conditions either of that published version or of any later version +published by the Free Software Foundation. If the Library as you +received it does not specify a version number of the GNU Lesser +General Public License, you may choose any version of the GNU Lesser +General Public License ever published by the Free Software Foundation. 
+ + If the Library as you received it specifies that a proxy can decide +whether future versions of the GNU Lesser General Public License shall +apply, that proxy's public statement of acceptance of any version is +permanent authorization for you to choose that version for the +Library. diff --git a/libs/hunspell/docs/COPYING.MPL b/libs/hunspell/docs/COPYING.MPL new file mode 100644 index 0000000000..7714141d15 --- /dev/null +++ b/libs/hunspell/docs/COPYING.MPL @@ -0,0 +1,470 @@ + MOZILLA PUBLIC LICENSE + Version 1.1 + + --------------- + +1. Definitions. + + 1.0.1. "Commercial Use" means distribution or otherwise making the + Covered Code available to a third party. + + 1.1. "Contributor" means each entity that creates or contributes to + the creation of Modifications. + + 1.2. "Contributor Version" means the combination of the Original + Code, prior Modifications used by a Contributor, and the Modifications + made by that particular Contributor. + + 1.3. "Covered Code" means the Original Code or Modifications or the + combination of the Original Code and Modifications, in each case + including portions thereof. + + 1.4. "Electronic Distribution Mechanism" means a mechanism generally + accepted in the software development community for the electronic + transfer of data. + + 1.5. "Executable" means Covered Code in any form other than Source + Code. + + 1.6. "Initial Developer" means the individual or entity identified + as the Initial Developer in the Source Code notice required by Exhibit + A. + + 1.7. "Larger Work" means a work which combines Covered Code or + portions thereof with code not governed by the terms of this License. + + 1.8. "License" means this document. + + 1.8.1. "Licensable" means having the right to grant, to the maximum + extent possible, whether at the time of the initial grant or + subsequently acquired, any and all of the rights conveyed herein. + + 1.9. 
"Modifications" means any addition to or deletion from the + substance or structure of either the Original Code or any previous + Modifications. When Covered Code is released as a series of files, a + Modification is: + A. Any addition to or deletion from the contents of a file + containing Original Code or previous Modifications. + + B. Any new file that contains any part of the Original Code or + previous Modifications. + + 1.10. "Original Code" means Source Code of computer software code + which is described in the Source Code notice required by Exhibit A as + Original Code, and which, at the time of its release under this + License is not already Covered Code governed by this License. + + 1.10.1. "Patent Claims" means any patent claim(s), now owned or + hereafter acquired, including without limitation, method, process, + and apparatus claims, in any patent Licensable by grantor. + + 1.11. "Source Code" means the preferred form of the Covered Code for + making modifications to it, including all modules it contains, plus + any associated interface definition files, scripts used to control + compilation and installation of an Executable, or source code + differential comparisons against either the Original Code or another + well known, available Covered Code of the Contributor's choice. The + Source Code can be in a compressed or archival form, provided the + appropriate decompression or de-archiving software is widely available + for no charge. + + 1.12. "You" (or "Your") means an individual or a legal entity + exercising rights under, and complying with all of the terms of, this + License or a future version of this License issued under Section 6.1. + For legal entities, "You" includes any entity which controls, is + controlled by, or is under common control with You. 
For purposes of + this definition, "control" means (a) the power, direct or indirect, + to cause the direction or management of such entity, whether by + contract or otherwise, or (b) ownership of more than fifty percent + (50%) of the outstanding shares or beneficial ownership of such + entity. + +2. Source Code License. + + 2.1. The Initial Developer Grant. + The Initial Developer hereby grants You a world-wide, royalty-free, + non-exclusive license, subject to third party intellectual property + claims: + (a) under intellectual property rights (other than patent or + trademark) Licensable by Initial Developer to use, reproduce, + modify, display, perform, sublicense and distribute the Original + Code (or portions thereof) with or without Modifications, and/or + as part of a Larger Work; and + + (b) under Patents Claims infringed by the making, using or + selling of Original Code, to make, have made, use, practice, + sell, and offer for sale, and/or otherwise dispose of the + Original Code (or portions thereof). + + (c) the licenses granted in this Section 2.1(a) and (b) are + effective on the date Initial Developer first distributes + Original Code under the terms of this License. + + (d) Notwithstanding Section 2.1(b) above, no patent license is + granted: 1) for code that You delete from the Original Code; 2) + separate from the Original Code; or 3) for infringements caused + by: i) the modification of the Original Code or ii) the + combination of the Original Code with other software or devices. + + 2.2. Contributor Grant. 
+ Subject to third party intellectual property claims, each Contributor + hereby grants You a world-wide, royalty-free, non-exclusive license + + (a) under intellectual property rights (other than patent or + trademark) Licensable by Contributor, to use, reproduce, modify, + display, perform, sublicense and distribute the Modifications + created by such Contributor (or portions thereof) either on an + unmodified basis, with other Modifications, as Covered Code + and/or as part of a Larger Work; and + + (b) under Patent Claims infringed by the making, using, or + selling of Modifications made by that Contributor either alone + and/or in combination with its Contributor Version (or portions + of such combination), to make, use, sell, offer for sale, have + made, and/or otherwise dispose of: 1) Modifications made by that + Contributor (or portions thereof); and 2) the combination of + Modifications made by that Contributor with its Contributor + Version (or portions of such combination). + + (c) the licenses granted in Sections 2.2(a) and 2.2(b) are + effective on the date Contributor first makes Commercial Use of + the Covered Code. + + (d) Notwithstanding Section 2.2(b) above, no patent license is + granted: 1) for any code that Contributor has deleted from the + Contributor Version; 2) separate from the Contributor Version; + 3) for infringements caused by: i) third party modifications of + Contributor Version or ii) the combination of Modifications made + by that Contributor with other software (except as part of the + Contributor Version) or other devices; or 4) under Patent Claims + infringed by Covered Code in the absence of Modifications made by + that Contributor. + +3. Distribution Obligations. + + 3.1. Application of License. + The Modifications which You create or to which You contribute are + governed by the terms of this License, including without limitation + Section 2.2. 
The Source Code version of Covered Code may be + distributed only under the terms of this License or a future version + of this License released under Section 6.1, and You must include a + copy of this License with every copy of the Source Code You + distribute. You may not offer or impose any terms on any Source Code + version that alters or restricts the applicable version of this + License or the recipients' rights hereunder. However, You may include + an additional document offering the additional rights described in + Section 3.5. + + 3.2. Availability of Source Code. + Any Modification which You create or to which You contribute must be + made available in Source Code form under the terms of this License + either on the same media as an Executable version or via an accepted + Electronic Distribution Mechanism to anyone to whom you made an + Executable version available; and if made available via Electronic + Distribution Mechanism, must remain available for at least twelve (12) + months after the date it initially became available, or at least six + (6) months after a subsequent version of that particular Modification + has been made available to such recipients. You are responsible for + ensuring that the Source Code version remains available even if the + Electronic Distribution Mechanism is maintained by a third party. + + 3.3. Description of Modifications. + You must cause all Covered Code to which You contribute to contain a + file documenting the changes You made to create that Covered Code and + the date of any change. You must include a prominent statement that + the Modification is derived, directly or indirectly, from Original + Code provided by the Initial Developer and including the name of the + Initial Developer in (a) the Source Code, and (b) in any notice in an + Executable version or related documentation in which You describe the + origin or ownership of the Covered Code. + + 3.4. Intellectual Property Matters + (a) Third Party Claims. 
+ If Contributor has knowledge that a license under a third party's + intellectual property rights is required to exercise the rights + granted by such Contributor under Sections 2.1 or 2.2, + Contributor must include a text file with the Source Code + distribution titled "LEGAL" which describes the claim and the + party making the claim in sufficient detail that a recipient will + know whom to contact. If Contributor obtains such knowledge after + the Modification is made available as described in Section 3.2, + Contributor shall promptly modify the LEGAL file in all copies + Contributor makes available thereafter and shall take other steps + (such as notifying appropriate mailing lists or newsgroups) + reasonably calculated to inform those who received the Covered + Code that new knowledge has been obtained. + + (b) Contributor APIs. + If Contributor's Modifications include an application programming + interface and Contributor has knowledge of patent licenses which + are reasonably necessary to implement that API, Contributor must + also include this information in the LEGAL file. + + (c) Representations. + Contributor represents that, except as disclosed pursuant to + Section 3.4(a) above, Contributor believes that Contributor's + Modifications are Contributor's original creation(s) and/or + Contributor has sufficient rights to grant the rights conveyed by + this License. + + 3.5. Required Notices. + You must duplicate the notice in Exhibit A in each file of the Source + Code. If it is not possible to put such notice in a particular Source + Code file due to its structure, then You must include such notice in a + location (such as a relevant directory) where a user would be likely + to look for such a notice. If You created one or more Modification(s) + You may add your name as a Contributor to the notice described in + Exhibit A. 
You must also duplicate this License in any documentation + for the Source Code where You describe recipients' rights or ownership + rights relating to Covered Code. You may choose to offer, and to + charge a fee for, warranty, support, indemnity or liability + obligations to one or more recipients of Covered Code. However, You + may do so only on Your own behalf, and not on behalf of the Initial + Developer or any Contributor. You must make it absolutely clear than + any such warranty, support, indemnity or liability obligation is + offered by You alone, and You hereby agree to indemnify the Initial + Developer and every Contributor for any liability incurred by the + Initial Developer or such Contributor as a result of warranty, + support, indemnity or liability terms You offer. + + 3.6. Distribution of Executable Versions. + You may distribute Covered Code in Executable form only if the + requirements of Section 3.1-3.5 have been met for that Covered Code, + and if You include a notice stating that the Source Code version of + the Covered Code is available under the terms of this License, + including a description of how and where You have fulfilled the + obligations of Section 3.2. The notice must be conspicuously included + in any notice in an Executable version, related documentation or + collateral in which You describe recipients' rights relating to the + Covered Code. You may distribute the Executable version of Covered + Code or ownership rights under a license of Your choice, which may + contain terms different from this License, provided that You are in + compliance with the terms of this License and that the license for the + Executable version does not attempt to limit or alter the recipient's + rights in the Source Code version from the rights set forth in this + License. 
If You distribute the Executable version under a different + license You must make it absolutely clear that any terms which differ + from this License are offered by You alone, not by the Initial + Developer or any Contributor. You hereby agree to indemnify the + Initial Developer and every Contributor for any liability incurred by + the Initial Developer or such Contributor as a result of any such + terms You offer. + + 3.7. Larger Works. + You may create a Larger Work by combining Covered Code with other code + not governed by the terms of this License and distribute the Larger + Work as a single product. In such a case, You must make sure the + requirements of this License are fulfilled for the Covered Code. + +4. Inability to Comply Due to Statute or Regulation. + + If it is impossible for You to comply with any of the terms of this + License with respect to some or all of the Covered Code due to + statute, judicial order, or regulation then You must: (a) comply with + the terms of this License to the maximum extent possible; and (b) + describe the limitations and the code they affect. Such description + must be included in the LEGAL file described in Section 3.4 and must + be included with all distributions of the Source Code. Except to the + extent prohibited by statute or regulation, such description must be + sufficiently detailed for a recipient of ordinary skill to be able to + understand it. + +5. Application of this License. + + This License applies to code to which the Initial Developer has + attached the notice in Exhibit A and to related Covered Code. + +6. Versions of the License. + + 6.1. New Versions. + Netscape Communications Corporation ("Netscape") may publish revised + and/or new versions of the License from time to time. Each version + will be given a distinguishing version number. + + 6.2. Effect of New Versions. 
+ Once Covered Code has been published under a particular version of the + License, You may always continue to use it under the terms of that + version. You may also choose to use such Covered Code under the terms + of any subsequent version of the License published by Netscape. No one + other than Netscape has the right to modify the terms applicable to + Covered Code created under this License. + + 6.3. Derivative Works. + If You create or use a modified version of this License (which you may + only do in order to apply it to code which is not already Covered Code + governed by this License), You must (a) rename Your license so that + the phrases "Mozilla", "MOZILLAPL", "MOZPL", "Netscape", + "MPL", "NPL" or any confusingly similar phrase do not appear in your + license (except to note that your license differs from this License) + and (b) otherwise make it clear that Your version of the license + contains terms which differ from the Mozilla Public License and + Netscape Public License. (Filling in the name of the Initial + Developer, Original Code or Contributor in the notice described in + Exhibit A shall not of themselves be deemed to be modifications of + this License.) + +7. DISCLAIMER OF WARRANTY. + + COVERED CODE IS PROVIDED UNDER THIS LICENSE ON AN "AS IS" BASIS, + WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, + WITHOUT LIMITATION, WARRANTIES THAT THE COVERED CODE IS FREE OF + DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. + THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE COVERED CODE + IS WITH YOU. SHOULD ANY COVERED CODE PROVE DEFECTIVE IN ANY RESPECT, + YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE + COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER + OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF + ANY COVERED CODE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER. + +8. TERMINATION. + + 8.1. 
This License and the rights granted hereunder will terminate + automatically if You fail to comply with terms herein and fail to cure + such breach within 30 days of becoming aware of the breach. All + sublicenses to the Covered Code which are properly granted shall + survive any termination of this License. Provisions which, by their + nature, must remain in effect beyond the termination of this License + shall survive. + + 8.2. If You initiate litigation by asserting a patent infringement + claim (excluding declatory judgment actions) against Initial Developer + or a Contributor (the Initial Developer or Contributor against whom + You file such action is referred to as "Participant") alleging that: + + (a) such Participant's Contributor Version directly or indirectly + infringes any patent, then any and all rights granted by such + Participant to You under Sections 2.1 and/or 2.2 of this License + shall, upon 60 days notice from Participant terminate prospectively, + unless if within 60 days after receipt of notice You either: (i) + agree in writing to pay Participant a mutually agreeable reasonable + royalty for Your past and future use of Modifications made by such + Participant, or (ii) withdraw Your litigation claim with respect to + the Contributor Version against such Participant. If within 60 days + of notice, a reasonable royalty and payment arrangement are not + mutually agreed upon in writing by the parties or the litigation claim + is not withdrawn, the rights granted by Participant to You under + Sections 2.1 and/or 2.2 automatically terminate at the expiration of + the 60 day notice period specified above. 
+ + (b) any software, hardware, or device, other than such Participant's + Contributor Version, directly or indirectly infringes any patent, then + any rights granted to You by such Participant under Sections 2.1(b) + and 2.2(b) are revoked effective as of the date You first made, used, + sold, distributed, or had made, Modifications made by that + Participant. + + 8.3. If You assert a patent infringement claim against Participant + alleging that such Participant's Contributor Version directly or + indirectly infringes any patent where such claim is resolved (such as + by license or settlement) prior to the initiation of patent + infringement litigation, then the reasonable value of the licenses + granted by such Participant under Sections 2.1 or 2.2 shall be taken + into account in determining the amount or value of any payment or + license. + + 8.4. In the event of termination under Sections 8.1 or 8.2 above, + all end user license agreements (excluding distributors and resellers) + which have been validly granted by You or any distributor hereunder + prior to termination shall survive termination. + +9. LIMITATION OF LIABILITY. + + UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT + (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL YOU, THE INITIAL + DEVELOPER, ANY OTHER CONTRIBUTOR, OR ANY DISTRIBUTOR OF COVERED CODE, + OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE LIABLE TO ANY PERSON FOR + ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY + CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, + WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER + COMMERCIAL DAMAGES OR LOSSES, EVEN IF SUCH PARTY SHALL HAVE BEEN + INFORMED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION OF + LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY + RESULTING FROM SUCH PARTY'S NEGLIGENCE TO THE EXTENT APPLICABLE LAW + PROHIBITS SUCH LIMITATION. 
SOME JURISDICTIONS DO NOT ALLOW THE + EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO + THIS EXCLUSION AND LIMITATION MAY NOT APPLY TO YOU. + +10. U.S. GOVERNMENT END USERS. + + The Covered Code is a "commercial item," as that term is defined in + 48 C.F.R. 2.101 (Oct. 1995), consisting of "commercial computer + software" and "commercial computer software documentation," as such + terms are used in 48 C.F.R. 12.212 (Sept. 1995). Consistent with 48 + C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (June 1995), + all U.S. Government End Users acquire Covered Code with only those + rights set forth herein. + +11. MISCELLANEOUS. + + This License represents the complete agreement concerning subject + matter hereof. If any provision of this License is held to be + unenforceable, such provision shall be reformed only to the extent + necessary to make it enforceable. This License shall be governed by + California law provisions (except to the extent applicable law, if + any, provides otherwise), excluding its conflict-of-law provisions. + With respect to disputes in which at least one party is a citizen of, + or an entity chartered or registered to do business in the United + States of America, any litigation relating to this License shall be + subject to the jurisdiction of the Federal Courts of the Northern + District of California, with venue lying in Santa Clara County, + California, with the losing party responsible for costs, including + without limitation, court costs and reasonable attorneys' fees and + expenses. The application of the United Nations Convention on + Contracts for the International Sale of Goods is expressly excluded. + Any law or regulation which provides that the language of a contract + shall be construed against the drafter shall not apply to this + License. + +12. RESPONSIBILITY FOR CLAIMS. 
+ + As between Initial Developer and the Contributors, each party is + responsible for claims and damages arising, directly or indirectly, + out of its utilization of rights under this License and You agree to + work with Initial Developer and Contributors to distribute such + responsibility on an equitable basis. Nothing herein is intended or + shall be deemed to constitute any admission of liability. + +13. MULTIPLE-LICENSED CODE. + + Initial Developer may designate portions of the Covered Code as + "Multiple-Licensed". "Multiple-Licensed" means that the Initial + Developer permits you to utilize portions of the Covered Code under + Your choice of the NPL or the alternative licenses, if any, specified + by the Initial Developer in the file described in Exhibit A. + +EXHIBIT A -Mozilla Public License. + + ``The contents of this file are subject to the Mozilla Public License + Version 1.1 (the "License"); you may not use this file except in + compliance with the License. You may obtain a copy of the License at + http://www.mozilla.org/MPL/ + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the + License for the specific language governing rights and limitations + under the License. + + The Original Code is ______________________________________. + + The Initial Developer of the Original Code is ________________________. + Portions created by ______________________ are Copyright (C) ______ + _______________________. All Rights Reserved. + + Contributor(s): ______________________________________. + + Alternatively, the contents of this file may be used under the terms + of the _____ license (the "[___] License"), in which case the + provisions of [______] License are applicable instead of those + above. 
If you wish to allow use of your version of this file only + under the terms of the [____] License and not to allow others to use + your version of this file under the MPL, indicate your decision by + deleting the provisions above and replace them with the notice and + other provisions required by the [___] License. If you do not delete + the provisions above, a recipient may use your version of this file + under either the MPL or the [___] License." + + [NOTE: The text of this Exhibit A may differ slightly from the text of + the notices in the Source Code files of the Original Code. You should + use the text of this Exhibit A rather than the text found in the + Original Code Source Code for Your Modifications.] + diff --git a/libs/hunspell/docs/ChangeLog b/libs/hunspell/docs/ChangeLog new file mode 100644 index 0000000000..1f6e774a63 --- /dev/null +++ b/libs/hunspell/docs/ChangeLog @@ -0,0 +1,1993 @@ +2016-04-29 Caolán McNamara <caolanm at LibO>: + * deprecate old api and add new one + old one remains implemented in terms of new one + and will eventually be removed + * shrink exposed api down to just hunspell.hxx + * next major release is likely to require C++11 + +2016-04-15 Caolán McNamara <caolanm at LibO>: + * generally using std::string and std::vector internally + +2016-04-13 Caolán McNamara <caolanm at LibO>: + * gh#371 drop experimental code + +2015-09-11 Caolán McNamara <caolanm at LibO>: + * rhbz#1261421 crash on mashing hangul korean keyboard + +2014-12-03 Németh László <nemeth at numbertext dot org>: + * tools/hunspell.cxx: security fixes of the Hunspell executable + - secure file name handling, the problem (checking + OpenDocument files with malicious file names) + reported by Eric Sesterhenn + - using tmpnam() only with system("mkdir tempname && ...") + +2014-10-17 Caolán McNamara <caolanm at LibO>: + * sf#245 Feature from Anish Patil -S mode + to show suggestions for completion of + correctly spelled words + * sf#248 Fix manpage about how to include 
+ +2014-10-16 Caolán McNamara <caolanm at LibO>: + * rhbz#915448, sf#57, sf#185 report character offset + and not byte offset in ispell mode + * sf#56 segv in experimental mode + * sf#228 don't translate init string + +2014-09-22 Németh László <nemeth at numbertext dot org>: + * fix crash in morphological analysis of the Hungarian + compound word 'művészegyéniség', reported by Gáspár Sinai + +2014-08-26 Németh László <nemeth at numbertext dot org>: + * unmunch separates flags of prefixes from the word, + bug reported by Daniel Naber + +2014-08-05 Németh László <nemeth at numbertext dot org>: + * moz#318040 Mozilla accepts abbreviations without dots + * myfopen(): add _wfullpath to expand relative parts of absolute paths + +2014-07-16 Caolán McNamara <caolanm at LibO>: + * moz#675553 Switch from PRBool to bool + * moz#690892 replace PR_TRUE/PR_FALSE with true/false + * Silence the warning about empty while body loop in clang + * moz#777292 Make nsresult an enum + * moz#579517 Use stdint types in gecko + * moz#784776 consistently use FLAG_NULL + * moz#927728 Convert PRUnichar to char16_t + * moz#943268 Remove nsCharsetAlias and nsCharsetConverterManager + * Don't include config.h in license.hunspell if MOZILLA_CLIENT is set + +2014-06-26 Caolán McNamara <caolanm at LibO>: + * clang scan-build: Allocator sizeof operand mismatch + * clang scan-build: other low hanging warnings + * clang scan-build: significant warnings + +2014-06-02 Németh László <nemeth at numbertext dot org>: + * escape spaces in paths of ODF files + +2014-05-28 Németh László <nemeth at numbertext dot org>: + * add long path/Unicode path support in WIN32 environment: + - hunspell#233 (reported by mahak gark) and LibreOffice fdo#48017 + * flat ODF support, eg.: + hunspell doc.fodt + cat doc.fodt | hunspell -l -O + * new options: + - -X (XML) input format + - -O (ODF or flat ODF) input format + - --check-apostrophe: check and force Unicode apostrophe usage + (ASCII or Unicode apostrophe has to be in
the + WORDCHARS section of the affix file) + * fix ODF support: + - break 1-line XML of ODT documents at </style:style>, too, + not only at </text:p> (limiting tokenization problems, when + fgets stops within an XML tag) + - show ODF file path on the UI instead of the temporary file + * fix XML support: + - ', ", &, < and > in replacements converted to XML entities + - recognize &apos; at tokenization, depending on WORDCHARS + - &apos; in tokens converted to ' before spell checking and + in the output of the pipe interface + * better apostrophe usage: + - WORDCHARS only with one of the Unicode or ASCII apostrophe + results extended word tokenization: both of them will be part of + the words (if they are inside: eg. word's, but not words'). + - convert Unicode apostrophes to ASCII ones for 8-bit dictionaries + (eg. English dictionaries), or for UTF-8 dictionaries only + with ASCII apostrophe supports (eg. French dictionaries). + * updated manual: + - hunspell.4 renamed to hunspell.5, see + hunspell#241 reported by Cristopher Yeleighton + - updated translations + - note about long/Unicode paths in WIN32 (hunspell.3) + +2014-04-25 Németh László <nemeth at numbertext dot org>: + * OpenDocument support, eg.
+ hunspell *.odt + hunspell -l *.odt + * always load default personal dictionary (fix + filtering bad words - reduce this word list - using + it as a personal dictionary workflow) + * fix parsing/URL recognition problem (bad tokens + with apostrophes) + +2013-07-25 pchang9@cs.wisc.edu + * moz#897255 Wasted work in line_uniq + * moz#897780 Wasted work in SuggestMgr::twowords + +2013-07-25 Caolán McNamara <caolanm at LibO>: + * hunspell#167 layout problems with long lines + - based on the original fix by xorho + adapted to HEAD + * rhbz#925562 upgrade config.guess for aarch64 + +2013-07-24 pchang9@cs.wisc.edu + * moz#896301 Wasted work in SfxEntry::checkword + * moz#896844 Wasted work in AffixMgr::defcpd_check + +2013-06-13 Konstantin Khlebniko + * #49 HashMgr::add_word computes wrong size for struct hentry + +2013-06-13 Ville Skyttä + * #53 Man page syntax fixes + +2013-04-19 John Thomson <john thomson at SIL> + * win_api: add remove() of Hunspell API (hun#3606435) + +2013-04-19 Rouslan Solomokhin <at sf.net> + * fix crash in suggestions for 99-character long words + by extending arrays of SuggestMgr::forgotchar_* + (hun#3595024, also http://crbug.com/130128), + thanks also to Paweł Hajdan for reporting the patch + +2013-04-01 Caolán McNamara <caolanm at LibO>: + * hunspell: -Werror=undef + +2013-03-13 Caolán McNamara <caolanm at LibO>: + * rhbz#918938 crash in interaction with Danish thesaurus + +2012-09-18 Németh László <nemeth at numbertext dot org>: + * src/hunspell/affixmgr.*: - fix morphological analysis of + compound words (hun#3544994, reported by Dávid Nemeskey, fdo#55045) + +2012-06-29 Caolán McNamara <caolanm at LibO>: + * fix various coverity warnings + +2012-01-10 Ehsan Akhgari <ehsan at mozilla dot com> + * moz#710940 Firefox Crash [@ AffixMgr::parse_file(char const*, char + const*) ] + +2011-12-16 Jared Wein <jwein at mozilla dot com> + * moz#710967 Incorrect argument passed to strncmp in + AffixMgr::parse_convtable + +2011-12-06 Caolán McNamara <caolanm
at LibO>: + * rhbz#759647 fixed tempname of hunSPELL.bak collides with other users + when multiple edits in one dir + +2011-10-13 Caolán McNamara <caolanm at LibO>: + * moz#694002 crash in hunspell affixmgr on exit with bad .aff + * leak in hunspell affixmgr with bad .aff + +2011-09-19 Caolán McNamara <caolanm at LibO>: + * make libparsers.a not installed thanks to Tomáš Chvátal + +2011-06-23 Caolán McNamara <caolanm at LibO>: + * fix some windows compiler warnings + +2011-05-24 Németh László <nemeth at numbertext dot org>: + * src/hunspell/affixmgr.*: allow twofold suffixes in compounds + by extended version of Arno Teigseth's patch, see hun#3288562. + - new option for this feature: COMPOUNDMORESUFFIXES + +2011-02-16 Németh László <nemeth at numbertext dot org>: + * src/*/Makefile.am: fix library versioning, the problem reported by + Rene Engerhald and Simon Brouwer. + + * man/hunspell.4: new version based on the revised version of Ruud Baars + +2011-02-02 Németh László <nemeth at OOo>: + * suggestngr.cxx: fix ngram PHONE suggestion for input words with + diacritics using UTF-8 encoded dictionaries (add byte length to the + 8-bit phonet() argument instead of character length) + + * suggestmgr.cxx: fix missing csconv problem with UTF-8 encoding + dictionaries, when the input contains non-BMP characters + - tests/utf8_nonbmp.sug: test file + + * suggestmgr.cxx: mixed and keyboard based character suggestions + don't forbid ngram suggestion search (optimized tests/suggestiontest) + + * affixmgr.cxx: fix hun#2999225: interfering compounding mechanisms, + tested on Dutch word list and reported by Ruud Baars + + * affixmgr.cxx: allomorph fix for hun#2970240 (Hungarian + compound "vadász+gép" was analyzed as vad+ász+gép, and rejected + by the ss->s rep rule (verb "vadássz"), but the analysis + didn't continue for the longer word parts (vadász+gép).
+ + * csutil.cxx: add lang code "az_AZ", "hu_HU", "tr_TR" for back + compatibility (fixing Azeri and Turkish casing conversion, also + Hungarian compound handling) + + * affixmgr.cxx: fix morphological analysis + +2011-01-26 Németh László <nemeth at OOo>: + * affixmgr.cxx: fix for moz#626195 (memcheck problem with FULLSTRIP). + + * affixmgr.*, suggestmgr.cxx: FORBIDWARN parameter (see manual) + +2011-01-24 Németh László <nemeth at OOo>: + * suffixmgr.cxx: fix bad suggestion of forbidden compound words, eg. + "termijndoel" with the Dutch dictionary. Reported by Ruud Baars. + + * latexparser.cxx: fix double apostrophe TeX quotation mark tokenization + (hun#3119776), reported by Wybodekker at SF.net. + + * tests/suggestiontest/*: multilanguage and single Hunspell version, see README + * tests/suggestiontest/prepare2: for make -f Makefile.orig single + +2011-01-22 Németh László <nemeth at OOo>: + * affixmgr.*, suggestmgr.*: new features + ONLYMAXDIFF: remove all bad ngram suggestions (default mode keeps one) + NONGRAMSUGGEST: similar to NOSUGGEST, but it forbids using the word + in ngram based (more than 1-character distance) suggestions. + +2011-01-21 Németh László <nemeth at OOo>: + * suggestmgr.*: limit wild suggestions (hun#2970237 by Ruud Baars) + - limited compound word suggestions + - improved and limited ngram based suggestions + * tests/*.sug: modified test files + - feature MAXCPDSUGS: + MAXCPDSUGS 0 : no compound suggestion, suggested by + Finn Gruwier Larsen in hunfeat#2836033 + MAXCPDSUGS n : max. ~n compound suggestions + - feature MAXDIFF: differency limit for ngram suggestions: 0-10 + eg.
MAXDIFF 5: normal (default) limit + MAXDIFF 0: only one ngram suggestion + MAXDIFF 10: ~maxngramsugs ngram suggestions + + * affixmgr.*, hunspell.*: add flag FORCEUCASE (hun#2999228), force + capitalization of compound words, see Hunspell 4 manual), + suggested by Ruud Baars + test/forceucase.*: test files + + * affixmgr.*, hunspell.*: add flag WARN (hun#1808861), optional warning feature + for rare words, suggested by Ruud Baars + tests/warn: test files + * tools/hunspell.cxx: add option -r for optional filtering of rare words + + * affixmgr.cxx: fix hun#3161359 (gcc warnings) reported by Ryan VanderMeulen. + +2011-01-17 Németh László <nemeth at OOo>: + * suggestmgr.cxx: fix hun#3158994 and hun#3159027 (missing csconv table + using awkward 8bit capitalization of UTF-8 encoded dictionary words with PHONE + suggestion, reported by benjarobin and dicollecte at SF.net). + +2011-01-13 Németh László <nemeth at OOo>: + * affixmgr.cxx: ONLYINCOMPOUND fix for hun#2999224 (fogemorphene + was allowed in end position of compoundings). Reported by Ruud Baars. + * tests/onlyincompound2.*: test files + +2011-01-10 Ingo H. de Boer <idb_winshell at SF.net>: + * win_api/{hunspell,libhunspell, testparser}.vcproj: updated project + files for the library and the executables. Compiling problem + also reported by Don Walker. 
+ +2011-01-06 Németh László <nemeth at OOo>: + * affixmgr.cxx: fix freedesktop#32850 (program halt during Hungarian + spell checking of the word "6csillagocska6", reported by András Tímár) + + * tools/hunspell.cxx: add Mac OS X Hunspell dictionary paths, asked by + Vidar Gundersen in hunfeat#3142010 + +2011-01-05 Caolán McNamara <cmc at OOo>: + * moz#620626 NS_UNICHARUTIL_CID doesn't support + case conversion + +2011-01-03 Németh László <nemeth at OOo>: + * NEWS and THANKS: update for release 1.2.13 + +2010-12-20 Németh László <nemeth at OOo>: + * affixmgr.cxx: hun#3140784 + +2010-12-16 Németh László <nemeth at OOo>: + * affixmgr.cxx: + - improved fix of hun#2970242 (supporting + zero affixes, reported by Ruud Baars + - tests/opentaal_cpdpat{,2}: test files + + - switching off default BREAK parameters by BREAK 0, + reported by Ruud Baars + + - hun#2999225: interfering compounding mechanisms, reported by Ruud Baars + +2010-12-11 Németh László <nemeth at OOo>: + * affixmgr.cxx: fix hun#2970242 (CHECKCOMPOUNDPATTERN only with flags), + the bug reported by Ruud Baars + * tests/2970242.*: test files + + * tests/2970240.*: test files for CHECKCOMPOUNDPATTERN fix (check all + boundaries in compound words, fixed by the previous CHECKCOMPOUNDREP + fix), the bug reported by Ruud Baars + + * win_api/Makefile.cygwin: update + +2010-12-09 Caolán McNamara <cmc at OOo>: + * moz#617953 fix leak + +2010-11-08 Caolán McNamara <cmc at OOo>: + * rhbz#650503 crash in arabic dictionary + +2010-11-05 Caolán McNamara <cmc at OOo>: + * rhbz#648740 don't warn on empty flagvector + +2010-11-03 Caolán McNamara <cmc at OOo>: + * logically we shouldn't need a csconv table in utf-8 mode + +2010-10-27 Németh László <nemeth at OOo>: + * hun#3000055 (requested by Ruud Baars) add REP boundary specification: + REP ^word$ xxxx + REP ^wordstarting xxxx + REP wordending$ xxxx + + * hun#3008434 (requested by Adrián Chaves Fernández) and + hun#3018929 (requested by Ruud Baars): REP with more than 2 words:
+ REP morethantwo more_than_two + + * suggestmgr.cxx: fix incomplete suggestion list for capitalized words, + eg. missing Machtstrijd->Machtsstrijd in the Dutch dictionary + (reported by Ruud Baars) + + * tests, man: related updates + +2010-10-12 Caolán McNamara <cmc at OOo>: + * moz#603311 HashMgr::load_tables leaks dict when decode_flags fails + * fix mem leak found with new tests + * hun#3084340 allow underscores in html entity names + +2010-10-07 Németh László <nemeth at OOo>: + * affixmgr.cxx: + - hun#2970239 fix bad suggestion of forbidden compound words + - hun#2999224 fix keepcase feature on compound words (only partial + fix for COMPOUNDRULE based compounding) + - fix checkcompoundrep feature in compound words (check all boundaries, + not only the last one) + Problems reported by Ruud Baars. + + * tests/opentaal_forbiddenword[12]*, tests/opentaal_keepcase*: + new test files for the previous fixes + * tests/checkcompoundrep: extended test file. + +2010-09-05 Caolán McNamara <cmc at OOo>: + * moz#583582 fix double buffer gcc fortify issue + +2010-08-13 Caolán McNamara <cmc at OOo>: + * moz#586671 AffixMgr::parse_convtable leaks pattern/pattern2 if it + can't create both + * moz#586686 tidy up get_xml_list and friends + +2010-08-10 Caolán McNamara <cmc at OOo>: + * hun#3022860 fix remove duplicate code + +2010-07-17 Caolán McNamara <cmc at OOo>: + * remove unused get_default_enc and avoid potential misrecognition of + three letter language ids + * normalize encoding names before lookup + +2010-07-05 Caolán McNamara <cmc at OOo>: + * hun#2286060 add Hangul syllables to unicode tables + +2010-06-26 Caolán McNamara <cmc at OOo>: + * moz#571728 keep new[]/delete[] wrappers in sync for embedded in moz + case + +2010-06-13 Caolán McNamara <cmc at OOo>: + * moz#571728 keep new[]/delete[] wrappers in sync for embedded in moz + case + +2010-06-02 Caolán McNamara <cmc at OOo>: + * moz#569611 compile cleanly under win64 + +2010-05-22 Caolán McNamara <cmc at OOo>: + *
moz#525581 apply mozilla's current preferred get_current_cs impl + +2010-05-17 Németh László <nemeth at OOo>: + * affixmgr.cxx: fix bad limitation of parenthesized flags at + COMPOUNDRULEs. Windows crash reported by Ruud Baars and Simon Brouwer. + +2010-05-05 Caolán McNamara <cmc at OOo>: + * rhbz#589326 malloc of int that should have been of char** + * hun#2997388 fix ironic misspellings + +2010-04-28 Caolán McNamara <cmc at OOo>: + * moz#550942 get_xml_list doesn't handle failure from get_xml_par + +2010-04-27 Caolán McNamara <cmc at OOo>: + * moz#465612 mozilla-specific code leaks + * moz#430900 phone is dereferenced before oom check + * moz#418348 ckey_utf alloc is used unchecked in SuggestMgr::badcharkey_utf + * CID#1487 pointer "rl" dereferenced before NULL check + * CID#1464 Returned without freeing storage "ptr" + * CID#1459 Avoid duplicate strchr + * CID#1443 Avoid any chance of dereferencing *slst + * CID#1442 Unsafe to have a null morph + * CID#1440 Avoid null filenames + * CID#1302 Dereferencing NULL value "apostrophe" + * CID#1441 Avoid dereferencing null ppfx + +2010-04-16 Caolán McNamara <cmc at OOo>: + * hun#2344123 fix U)ncap in utf-8 locale + * fix up hunspell text UI and lines wider than terminal + +2010-04-15 Caolán McNamara <cmc at OOo>: + * hun#2613701 fix small leak in FileMgr::FileMgr + * fix small leak in tools/hunspell + * hun#2871300 avoid crash if def and words are NULL + * hun#2904479 fix length of hzip file + * hun#2986756 mingw build fix + * hun#2986756 fix double-free + * hun#2059896 fix crash in interactive mode without nls + * hun#2917914 add some extra words to the latexparser + * make some structs static + * C-api has duped symbol names + * regenerate gettext/intl with recent version + * hun#2796772 build a .dll under MinGW + * rhbz#502387 allow cross-compiling for MinGW target + * hun#2467643 update .vcproj files to include replist.?xx + * unify visibility/dll_export support across platforms + * hun#2831289 sizeof(short) typo + *
hun#2986756 add -u3 gcc style output + +2010-04-14 Caolán McNamara <cmc at OOo>: + * hun#2813804 fix segfault on hu_HU stemming + +2010-04-13 Caolán McNamara <cmc at OOo>: + * hun#2806689 fix ironic misspellings + * hun#2836240 add Italian translations + +2010-04-09 Caolán McNamara <cmc at OOo>: + * fix titchy possible leak in command-line spellchecker + +2010-04-07 Caolán McNamara <cmc at OOo>: + * hun#2973827 apply win64 patch + * hun#2005643 fix broken mystrdup + +2010-03-04 Caolán McNamara <cmc at OOo>: + * ooo#107768 fix crash in long strings in spellml mode + * hun#1999737 add some malloc checks + * hun#1999769 drop old buffer on realloc failure + * hun#2005643 tidy string functions + * hun#2005643 micro-opt + * hun#2006077 free strings on failed dict parse + * hun#2110783 ispell-alike verbose mode implementation + +2010-03-03 Németh László <nemeth at OOo>: + * hunspell/(affixmgr, suggestmgr).cxx: add character sequence + support for MAP suggestion, using parenthesized character groups + in the syntax, eg. MAP ß(ss). + * man/hunspell.4, tests/map*: documentation and test files + +2010-02-25 Németh László <nemeth at OOo>: + * hunspell/hunspell.cxx: add recursion limit for BREAK (fix OOo Issue 106267) + + * hunspell/hunspell.cxx: fix crash in morphological analysis of + capitalized words with ending dashes + + * affixmgr.cxx: fix morphological analysis of long numbers combined with dash, + eg. 45-00000045 (reported by a@freeblog.hu). 
+ +2010-02-23 Caolán McNamara <cmc at OOo>: + * hun#2314461 improve ispell-alike mode + * hun#2784983 improve default language detection + * hun#2812045 fix some compiler warnings + * hun#2910695 survive missing HOME dir + * hun#2934195 fix suggestmgr crash + * hun#2921129 remove unused variables + * hun#2826164 make sure make check uses the in-tree libhunspell + * bump toolchain to support --disable-rpath + * hun#2843984 fix coverity warning + * hun#2843986 fix coverity warning + * hun#2077630 add iconv lib + * make gcc strict-aliasing warning free + * make cppcheck warning free + +2008-11-01 Németh László <nemeth at OOo>: + * replist.*, hunspell.cxx, affixmgr.cxx: new input and output + conversion support, see ICONV and OCONV keywords in the Hunspell(4) + manual page and the test examples. The input/output conversion + problem of syllabic languages reported by Daniel Yacob and + Shewangizaw Gulilat. + - tests/{iconv,oconv}.*: test examples + + * tools/wordforms: word generation script for dictionary developers + (Hunspell version of the unmunch program) + + * hunspell/hunspell.cxx: extended BREAK feature: ^ and $ mean in break + patterns the beginning and end of the word. + - tests/BREAK.*: modified examples. + + * hunspell/hunspell.cxx: set default break at hyphen characters. + The associated problem reported by S Page in Hunspell Bug 2174061. + See Mozilla Bug ID 355178 and OOo Issue 64400, too. + - tests/breakdefault.*: test data + The following definition is equivalent of the default word break: + + BREAK 3 + BREAK - + BREAK ^- + BREAK -$ + + * affixmgr.cxx: SIMPLIFIEDTRIPLE is a new affix file keyword to allow + simplified forms of the compound words with triple repeating letters. + It is useful for Swedish and Norwegian languages. + + * affixmgr.cxx: extend CHECKCOMPOUNDPATTERN to support + alternations of compound words for example by sandhi + feature of Indian and other languages. 
The problem reported + by Kiran Chittella associated with Telugu writing system + (see Telugu example in tests/checkcompoundpattern4.test). + The new optional field of CHECKCOMPOUNDPATTERN definition is the + replacement of the compound boundary defined by the previous fields: + CHECKCOMPOUNDPATTERN ff f ff + means ff|f compound boundary has been replaced by "ff", like in + the (prereform) German Schiffahrt (Schiff+fahrt). + - CHECKCOMPOUNDPATTERN supports also optional flag conditions now: + CHECKCOMPOUNDPATTERN ff/A f/B ff + means that the first word of the compound needs flag "A" and + the second word of the compound needs flag "B" to the operation. + + * tools/hunspell.cxx: add empty lines as separators to the output of + the stemming and morphological analysis. + + * affixmgr.cxx: fix condition checking algorithm. Bad suggestion + generation reported by Mehmet Akin in SF.net Bug 2124186 with help of + Eleonora Goldman. + + * affixmgr,cxx: fix COMPOUNDWORDMAX feature. The problem and its + code details reported by Göran Andersson under SF.net Bug ID 2138001. + + * csutil.cxx: fix bad conditional code for Mozilla compilation. + Patch by Serge Gautherie. The problem reported by Ryan VanderMeulen. + + * hunspell/hunspell.cxx: add missing ngram suggestion for HUHINITCAP + (capitalized mixed case) words. + + * w_char.hxx: use GCC conditions for GCC related code. Patch by + Ryan VanderMeulen. + + * affixmgr.cxx: check morphological description in morphgen() + (fix potential program fault by incomplete morphological + description of affix rules) + + * src/win_api: config.h: switch on warning messages on Windows + + * tools/affixcompress: extended help for -h (use LC_ALL=C sort + for input word list) + + * man/hunspell.4: updated manual: + - new and modified features (SIMPLIFIEDTRIPLE, ICONV, OCONV, + BREAK, CHECKCOMPOUNDPATTERN). + - note about costs of zero affixes, suggested by Olivier Ronez. + + * hunspell/hunspell.cxx: remove deprecated word breaking codes. 
+ +2008-08-15 Németh László <nemeth at OOo>: + * affentry.cxx: add FULLSTRIP option. With FULLSTRIP, affix rules can + strip full words, not only one less characters. Suggested by + Davide Prina and other developers in OOo Issue 80145. + * tests/fullstrip.*: Test data based on Davide Prina's example. + * tools/unmunch.cxx: modified for FULLSTRIP. + + * affixmgr.cxx: COMPOUNDRULE now works with long and numerical flag + types by parenthesized flags. Syntax: (flag)*, (flag)(flag)?(flag)*. + * tests/compoundrule[78].*: tests with parenthesized COMPOUNDRULE + definitions. + + * suggestmgr.cxx: modified badchar*(), forgotchar*() and extrachar*() + 1-character distance suggestion algorithms: search a TRY character + in all position instead of all TRY characters in a character position + (it can give more readable suggestion order, also better suggestions + in the first positions, when TRY characters are sorted by frequency.) + For example, suggestions for "moze": + ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6), + maze, more, mote, ooze, mole etc. (Hunspell 1.2.7). + + * suggestmgr.cxx: extended compound word checking for better COMPOUNDRULE + related suggestions, for example English ordinal numbers: 121323th -> + 121323rd (it needs also a th->rd REP definition). + + * phonet.cxx: cast unsigned char parameter of isdigit() and fix + isalpha by myisalpha() (potential problems in Windows environment). + Reported by Thomas Lange in OOo Issue 92736. + + * hunspell/csutil.*,hunspell/{affentry,affixmgr,hunspell,suggestmgr}.cxx: + fix potential buffer overloading under morphological analysis by the + new mystrcat() function. Reported by Molnár Andor (dolhpy at true + dot hu) in SF.net Bug 2026203. + + * affixmgr.cxx: add recursion limit to defcpd(). Fix OOo Issue 76067: + crash-like deceleration by checking hexadecimal numbers with long FFF + sequence (combinatory explosion by the en_US words "f" and "ff"). + Missing fix reported by Mathias Bauer. 
+ + * affixmgr.cxx: fix the difference in the Unicode and non-Unicode + parts of cpdcase_check(). Bug report by Brett Wilson. + + * filemgr.*, affixmgr.cxx, csutil.*, hashmgr.*: warning messages now + contain line numbers (use --with-warnings configure option for + warning messages). + + * hunspell.cxx: analyze(): fix case conversion of stemming and + morphological analysis of UTF-8 encoded input. Reported by Ferenc Godó. + + * tools/hunspell.cxx: fix LaTeX Unicode support in filter mode. + Reported by Jan Seeger in SF.net Bug 2039990. + + * affixmgr.hxx: 0.5 or in 64 bit environment, 1 MB (virtual) memory + saving using only the requested size for sFlag and pFlag arrays. + Bug report by Brett Wilson. + + * affixmgr.cxx,tools/hunspell.cxx: get_version() returns with full + VERSION affix parameter instead of its first word. Fixes for + Hunspell's header. Some problems with Hunspell header reported in + SF.net Bug 2043080. + +2008-07-15 Németh László <nemeth at OOo>: + * affentry.cxx: fixes of the affix rule matching algorithm (affected + only the sk_SK dictionary from all OpenOffice.org dictionaries): + - fix dot pattern + accented letters matching (in non Unicode encoding) + - word-length conditions work again + * tests/condition.*: extended test for the fix. + + * hashmgr.cxx: load multiword expressions: spaces may be parts + of the dictionary words again (but spaces also work as morphological + field separators: word word2 -> "word word2", word po:noun -> "word"). + * man/hunspell.4: updated manual + + * tools/hunspell.cxx: add iconv character conversion support to + stemming and morphological analysis + + * tools/hunspell.cxx: add /usr/share/myspell/dicts search path for + Ubuntu support + +2008-07-09 Németh László <nemeth at OOo>: + * affentry.cxx: fixes of the affix rule matching algorithm: + - right ASCII character handling in bracket expression; + - fault-tolerant nextchar() for bad rules. 
+ Problem with the en_GB dictionary and nextchar() with a detailed + code analysis reported by John Winters in SF.net Bug ID 2012753. + * tests/condition.*: extended test for the fix. + + * hunspell/hunspell.*, parsers/*, tools/hunspell.cxx: fix compiler + warnings (deprecated const-free char consts) + + * win_api/hunspelldll.*: add hunspell_free_list(), the problem + reported by Laurier Mercer. + +2008-06-30 Török László <torok_laszlo at users dot SF dot net>: + * tests/affixmgr.cxx: fix morphological analysis: strcat() on + an uninitialized char array in suffix_check_morph(). + +2008-06-18 Németh László <nemeth at OOo>: + * src/hunspell/affixmgr.cxx: fix GCC compiler warnings + (comparisons with string literal results in unspecified behaviour). + The problem reported by Ladislav Michnovič. + +2008-06-17 Németh László <nemeth at OOo>: + * src/hunspell/{hunspell.cxx,hunspell.h}: add free_list() to the C and + C++ interface to deallocate suggestion lists. The problem + reported by Laurie Mercer and Christophe Paris. + * csutil.cxx: fix freelist() to deallocate non-NULL list, when n = 0. + * tools/{analyze,example,chmorph,hunspell}.cxx: use free_list(). + + * tools/hunspell.cxx: fix only --with-readline compiling problem. + Reported by Volkov Peter in SF.net Bug 1995842. + + * man/hunspell.3,hunspell.hxx: fix analyze and generate examples in + the manual and comments (using char*** parameter instead of char**). + + * tools/example.cxx: fix suggestion example. + +2008-06-17 Németh László <nemeth at OOo>: + * affentry.cxx: fix the new affix rule matching algorithm of + Hunspell 1.2. Arabic dictionary problem reported by Khaled Hosny + in SF.net Bug ID 1975530. Mohamed Kebdani also sent a + prepared test data. + * tests/{1975530,condition*}: tests for the fix + +2008-06-13 Ingo H. de Boer <idb_winshell at SF.net>: + * src/hunspell/{affixmgr.cxx,hunspell.cxx}: add missing type + cast to strstr() calls for VC8 compatibility. 
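The freelist() fix noted above (deallocate a non-NULL list even when n = 0) comes down to freeing the array itself unconditionally. A minimal sketch, assuming malloc-allocated strings; release_list() is a hypothetical name, not the Hunspell function:

```cpp
#include <cstdlib>

// Frees n strings and then the array holding them.
// The array is released even when n == 0, which is the case the entry describes.
void release_list(char*** list, int n) {
  if (list == nullptr || *list == nullptr) return;
  for (int i = 0; i < n; ++i) std::free((*list)[i]);
  std::free(*list);
  *list = nullptr;  // leave the caller with a safe pointer
}
```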
+ +2008-06-13 Németh László <nemeth at OOo>: + * suggestmgr.cxx: add also part1-part2 suggestion with dash + for bad part1part2 word forms, suggested by Ruud Baars. + For example, now suggestion of "parttime": "part time" + and "part-time". + NOTE: this feature will work only when the TRY definition + contains "-" or the letter "a". + + * hunspell.cxx: new XML API in spell() and suggest() (see hunspell(3)). + + * src/hunspell/*: fixes for OpenOffice.org build environment. + + * man/{hunspell.3,hzip.1,hunzip.1}: add new manual pages for + Hunspell programming API and dictionary compression and + encryption utilities. + + * src/hunspell/*: handle failed mystrdup() calls and other potential + insufficient memory problems. The problem reported by Elio Voci + in OpenOffice.org Issue 90604 and others. + + * src/tools/affixmgr.cxx: restore original behaviour of get_wordchars + without conditional code. Problem reported by Ingo H. de Boer + in SF.net Bug 1763105. + + * win_api/hunspelldll.h: put_word() renamed to add() in the (old) + Windows DLL API bug reported in SF.net Bug 1943236. Also reported + by Bartkó Zoltán. + + * tools/hunspell.cxx: fix chench() for environments without + native language support (ENABLE_NLS 0 in config.h), + PHP system_exec() bug reported by Michel Weimerskirch in + SF.net Bug 1951087. + + * hunspell.cxx, affixmgr.cxx: remove "result" from the + (result && *result) conditions, when "result" is a static variable. + The problem and a possible solution reported by Ladislav Michnovič. + + * affixmgr.cxx: parse_affix(): print line instead of NULL in + the warning message, when affix class header is bad. + The problem reported by Ladislav Michnovič. + +2008-06-01 Christian Lohmaier <cloph at OOo> + * configure.ac: patch to fix --with-readline, --with-ui logic. + Reported in the SF.net Bug 981395. + +2008-05-04: Volkov Peter <volkov_peter at users sourceforge net> + * configure.ac: fix LibTool 2.22 incompatibility by removing + unused LT_* macros. 
Report and patch in SF.net Bug 1957383. + The problem reported and fixed by Ladislav Michnovič, too. + +2008-04-23: Ladislav Michnovič <lmichnovic at suse cz> + * hunspell.pc.in: fix wrongly set directories. + +2008-04-12 Németh László <nemeth at OOo>: + * src/tools/hunspell.cxx: + - Multilingual spell checking and special dictionary support with -d. + Multilingual spell checking suggested by Khaled Hosny (SF.net + Bug 1834280). Example for the new syntax: + + -d en_US,en_geo,en_med,de_DE,de_med + + en_US and de_DE are base dictionaries, and en_geo, en_med, de_med + are special dictionaries (dictionaries without affix file). + Special dictionaries are optional extensions of the base dictionaries. + There is no explicit naming convention for special dictionaries, + only the ".dic" extension: dictionaries without affix file will + be an extension of the preceding base dictionary. First dictionary + in -d parameter must have an affix file (it must be a base + dictionary). + + - new options for debugging, morphological analysis and stemming: + -m: morphological analysis or flag debug mode (without affix + rule data it signs the flag of the affix rules) + -s: stemming mode + -D: show also available dictionaries and search path + (suggested by Aaron Digulla in SF.net Bug 1902133) + + - add missing refresh() to print bad words before the slower suggestion + search in UI (better user experience) + + - fix tabulator problems (reported by ugli-kid-joe AT sf DOT net) + + - fix different encoding of dic and input, and suggestions + + - add per mille sign to LANG hu_HU section. + + - rewrite program messages. Concatenating multiple printfs for + easier translation suggested by András Tímár and Gábor Kelemen. + + * src/hunspell/csutil.cxx: set static encds variable. Patch by + Rene Engelhard. SF.net Bug 1896207 and 1939988.
+ + * src/hunspell/w_char.hxx,csutil.hxx: reorganizing + w_char typedef and HENTRY_DATA, HENTRY_FIND consts + + * src/hunspell/hunzip.cxx: fopen(): using rb options instead of r (fix + for Windows) + + * src/tools/affixmgr.cxx: restore original behaviour of get_wordchars + in an #ifdef WINSHELL section. Problem reported by Ingo H. de Boer + in SF.net Bug 1763105. + + * src/tools/chmorph.cxx: remove the experimental modifications + + * src/tools/hzip.c: fopen(): using wb options instead of w (fix + for Windows) + + * src/tools/hunzip.cxx: add missing MOZILLA_CLIENT. Reported + by Ryan VanderMeulen. + + * man/*, man/hu/*: updated manual + + * man/hunspell.4: fix formatting problem (missing header) + + * tools/makealias: now works with the extra data fields. + + * phonet.cxx: use HASHSIZE const + + * tests/rep.aff: fix REP count + + * src/win_api/Makefile.cygwin, README: native Windows compilation + in Cygwin environment without cygwin1.dll dependency (see README + for compiling instructions). + +2008-04-08 Roland Smith <rsmith AT xs4all DOT nl>: + * src/parsers/latexparser.cxx: fix PATTERN_LEN for AMD64 and + other platforms with different struct padding (SF.net Bug 1937995). + +2008-04-03 Kelemen Gábor <kelemeng AT gnome DOT hu>: + * po/POTFILES.in: fix path of the source file + + * po/Makevars: add --from-code=UTF-8 gettext option + + * hunspell.cxx: add comments for shortkey translation + +2008-02-04 Flemming Frandsen <flfr AT stibo DOT com> + * src/hunspell.h: fix Windows DLL support + - this patch also reported by Zoltán Bartkó. + +2008-01-30 Mark McClain <marc_mcclain AT users DOT sf DOT net> + * src/hunspell.cxx: stem(): fix function call side effect + for PPC platform (SF.net Bug 1882105). + +2008-01-30 Németh László <nemeth at OOo>: + * hunspell.cxx, csutil.cxx, hunspelldll.c: fix + SF.net Bug 1851246, patch also by Ingo H. de Boer. + + * hunspell.h: fix SF.net Bug 1856572 (C prototype problem), + patch by Mark de Does.
+ + * hunspell.pc.in: fix SF.net Bug 1857450 wrong prefix, reported + by Mark de Does. + + * hunspell.pc.in: reset numbering scheme: libhunspell-1.2. + Fix SF.net Bug 1857512 reported by Mark de Does, + also by Rene Engelhard. + + * csutil.cxx: patches for ARM platform, signed_chars.dpatch + by Rene Engelhard and arm_structure_alignment.dpatch by + Steinar H. Gunderson <sesse@debian.org> + + * hunzip.*, hzip.c: new hzip compression format + + * tools/affixcompressor: affix compressor utility (similar to + munch, but it generates affix table automatically), works + with million-words dictionaries of agglutinative languages. + + * README: fix problems reported by Pham Ngoc Khanh. + + * csutil.cxx, suggestmgr: Warning-free in OOo builds. + + * hashmgr.*, csutil.*: fix protected memory problems with + stored pointers on several not x86 platforms by + store_pointer(), get_stored_pointer(). + + * src/tools/hunspell.cxx: fix iconv support on Solaris platform. + + * tests/IJ.good: add missing test file + + * csutil.cxx: fix const char* related errors. Compiling bug + with Visual C++ reported by Ryan VanderMeulen and Ingo H. de Boer. + +2008-01-03 Caolan McNamara <cmc at OO.o>: + * csutil.cxx: SF.net Bug 1863239, notrailingcomma patch and + optimization of get_currect_cs(). + +2007-11-01 Németh László <nemeth at OOo>: + * hunspell/*: new feature: morphological generation, + also fix experimental morphological analysis and stemming. 
+ - new API functions and improved API: + - analyze(word): (instead of morph()) morphological analysis + - stem(word): stemming + - stem(list): stemming based on the result of an analysis + - generate(word, word2): morphological generation + - generate(word, list): morphological generation + - add(word): add word to the run-time dictionary (renamed put_word()) + - add_with_affix(word, word2): (renamed put_word_pattern()): + add word to the run-time dictionary with affix flags of the + second parameter: all affixed forms of the user words will be + recognised by the spell checker. Especially useful for + agglutinative languages. + - remove(word): remove word from the run-time dictionary (not + implemented) + - see manual and hunspell/hunspell.hxx header and tests/morph.* + * tests/morph.*: test data, example for morphological analysis, + stemming and generation + + * tools/analyze, tools/chmorph: extended and new demo applications: + - analyze (originally hunmorph): analyses and stems input words, + generates word forms from input word pairs. + - chmorph: morphological transformation filter + + * configure.ac, hunspell/makefile.am: set library version number. + Bug reported by Rene Engelhard. + + * affentry.cxx, affixmgr.cxx: new pattern matching algorithm in + condition checking of affix rules instead of the Dömölki-algorithm: + - Unlimited condition length (instead of max. 8 characters). + - Less memory consumption, especially useful for affix rich languages: + 5,4 MB memory savings with hu_HU dictionary. + - Speed change depends from dictionaries and CPU caches: English spell + checking is 4% faster on Linux words with en_US dictionary, Hungarian + spell checking is 25% slower on most frequent words of Hungarian + Webcorpus. + + * tests/sug.*, sugutf.*: updated test data (use "a" and "lot" + dictionary items instead of "a lot".) + + * src/hunspell/hunspell.cxx: free(csconv) instead of delete csconv. + Report and patch by Sylvain Paschein in Mozilla Issue 398268. 
+ + * suggestmgr.cxx, tools/hunspell.cxx: bad spelling of "misspelled". + Ubuntu Bug #134792, patch by Malcolm Parsons. + + * tests/base_utf.*: use Unicode apostrophe instead of 8-bit one. + + * hunspell.cxx, hashmgr.cxx: add(): use HashMgr::add() + +2007-10-25 Pavel Janík <pjanik at OOo>: + * hunspell/csutil.cxx: Fix type cast warnings on 64bit Linux in + printing of character positions in u8_u16(). OOo issue 82984. + +2007-09-05 Németh László <nemeth at OOo>: + * win_api/Hunspell.vproj, parsers/testparser.cxx,textparser.hxx: + warning fixes and removing unnecessary Windows project file. + Reported by Ingo H. de Boer. + + * hashmgr.*, {affixmgr,suggestmgr}.cxx: optimized data structure + for variable-count fields (only "ph" transliteration field in + this version, see next item). Also less memory consumption: + -13% (0.75 MB) with en_US dictionary, -6% (1 MB) with hu_HU. + + * suggestmgr.cxx: dictionary based phonetic suggestion for special + or foreign pronunciation (see also rule-based PHONE in manual). + Usage: tab separated field in dictionary lines, started with "ph:". + The field contains a phonetic transliteration of the word: + +Marseille ph:maarsayl + * tests/phone.*: test data for dictionary and rule based phonetic + suggestion. + + * hunspell.cxx: fix potential bad memory access in allcap word + capitalization in suggest() (bug of previous version). + + * hunspell.cxx, atypes.hxx: set correct limit for UTF-8 encoded + input words (256 byte). + + * suggestmgr.cxx: improved REP suggestions with spaces: it works + without dictionary modification. + OOo issue 80147, reported by Davide Prina. + * tests/rep.*: new test data: higher priority for "alot" -> "a lot", + and Italian suggestion "un'alunno" -> "un alunno". + + * affixmgr.cxx: fix Unicode ngram suggestions in expand_rootword(). + (Suggestions with bad affixes.) + Bug reported by Vitaly Piryatinksy <piv dot v dot vitaly at gmail>. + * tests/ngram_utf_fix.*: test based on Vitaly Piryatinksy's data.
+ + * suggestmgr.cxx: fix twowords() for last UTF-8 multibyte character. + (conditional jump or move depended on uninitialised value). + +2007-08-29 Ingo H. de Boer <idb_winshell at SF.net>: + * win_api/{hunspell,libhunspell, testparser}.vcproj: new project + files for the library and the executables. + + * Hunspell.rc, Hunspell.sln, config.h: updated versions. + Version number problem also reported by András Tímár. + +2007-08-27 Németh László <nemeth at OOo>: + * suggestmgr.hxx: put fixed version. Bug report by Ingo H. de Boer. + + * suggestmgr.cxx: remove variable-length local character array + reported by Ingo H. de Boer. + +2007-08-27 Németh László <nemeth at OOo>: + * suggestmgr.hxx: change bad time_t to clock_t in header, too. + Bug reports or patches by Ingo H. de Boer under SF.net + Bug ID 1781951, János Mohácsi and Gábor Zahemszky, András Tímár, + OMax3 at SF.net under SF.net Bug ID 1781592. + + * phonet.*: change variable-length local character array to + portable fixed size character array. Problem reported by + Ingo H. de Boer under SF.net Bug ID 1781951 and + Ryan VanderMeulen. + + * suggestmgr.cxx: remove debug message (also by + Ingo H. de Boer). + +2007-08-26 Ingo H. de Boer <idb_winshell at SF.net>: + * win_api/Hunspell.vcproj: updated version (with phonet.*) + +2007-08-23 Németh László <nemeth at OOo>: + * phonet.{c,h}xx, suggestmgr.cxx: PHONE parameter: + pronunciation based suggestion using Björn Jacke's original Aspell + phonetic transcription algorithm (http://aspell.net), relicensed + under GPL/LGPL/MPL tri-license with the permission of the author. + Usage: see manual. + + * affixmgr,suggestmgr.cxx: add KEY parameter for keyboard and + input method error related suggestions. + Example: KEY qwertyuiop|asdfghjkl|zxcvbnm + + * man/hunspell.4: description about PHONE and KEY suggestion parameters.
+ + * suggestmgr.cxx: enhancements for better suggestions: + - Set ngram suggestions for badchar-type errors + and only two word and compound word suggestions, too. + - Separate not compound and compound word + suggestions for MAP suggestion, too. + - Double swap suggestions for short words. + For example: ahev -> have, hwihc -> which. + - Better time limits using clock() instead of time() + (tenths of a second resolution instead of second ones). + - leftcommonsubstring() weight function. + + * htype.hxx, hashmgr.cxx: blen (byte length) and clen (character + length) fields instead of wlen + + * affixmgr.cxx: fix get_syllable() for bad Unicode inputs. + + * tests/suggestiontest/*: test environment for suggestions + +2007-08-07 Martijn Wargers: + * csutil.cxx: fix Mingw build error associated with ToUpper() call. + Report and patch in Mozilla Issue 391447. + +2007-08-07 Robert Longson: + * atypes.cxx: use empty inline function HUNSPELL_WARNING instead of + variadic macros to switch off Hunspell warnings. + Reported by Gavin Sharp in Mozilla Issue 391147. + +2007-08-05 Ginn Chen: + * hashmgr.cxx: Hunspell failed to compile on OpenSolaris (use stdio + instead of cstdio). Report and patch in Mozilla Issue 391040. + +2007-07-25 Németh László <nemeth at OOo>: + * parsers/*.cxx: Hunspell executable recognises and accepts URLs, + e-mail addresses, directory paths, reported by Jeppe Bundsgaard. + * src/tools/hunspell.cxx: --check-url: new option of Hunspell program. + Use --check-url, if you want to check URLs, e-mail addresses and paths. + + * parsers/textparser.cxx: strip colon at end of words for Finnish + and Swedish (colon may be in words in Finnish and Swedish). + Problem reported by Lars Aronsson. + * tests/colons_in_words.*: test data + + * tests/digits_in_words.*: example for using digits in words + (eg. 1-jährig, 112-jährig etc. in German), reported by Lars Aronsson.
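The "ph:" dictionary field described above (a tab separated field holding a phonetic transliteration) could be read along these lines. This is a sketch under stated assumptions: DicEntry and parse_dic_line() are hypothetical names, and affix flags and other morphological fields are not interpreted.

```cpp
#include <string>

// Hypothetical result type: the word and its optional phonetic form.
struct DicEntry {
  std::string word;
  std::string phonetic;  // empty when the line has no "ph:" field
};

// Splits a dictionary line on tabs; the first field is the word,
// a later field starting with "ph:" carries the transliteration.
DicEntry parse_dic_line(const std::string& line) {
  DicEntry entry;
  std::size_t start = 0;
  bool first = true;
  while (start <= line.size()) {
    std::size_t tab = line.find('\t', start);
    if (tab == std::string::npos) tab = line.size();
    const std::string field = line.substr(start, tab - start);
    if (first) {
      entry.word = field;
      first = false;
    } else if (field.compare(0, 3, "ph:") == 0) {
      entry.phonetic = field.substr(3);
    }
    start = tab + 1;
  }
  return entry;
}
```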
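The double swap suggestion from the suggestmgr.cxx entry above (ahev -> have, hwihc -> which) swaps the first two and the last two letters of a short word. A minimal sketch, assuming 8-bit strings and assuming that "short" means 4 or 5 letters; double_swap() is a hypothetical name:

```cpp
#include <string>
#include <utility>

// Returns the double-swapped form of a 4 or 5 letter word,
// or an empty string for other lengths.
std::string double_swap(const std::string& word) {
  if (word.size() != 4 && word.size() != 5) return std::string();
  std::string candidate = word;
  std::swap(candidate[0], candidate[1]);                    // first pair
  std::swap(candidate[candidate.size() - 2],
            candidate[candidate.size() - 1]);               // last pair
  return candidate;
}
```

As with the other suggestion methods, the candidate would still have to pass a dictionary check before being offered.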
+ + * hashmgr.cxx: Hunspell accepts allcaps forms of mixed case + words of personal dictionaries (+allcaps custom dictionary words with + allcaps affixes). + Sf.net Bug ID 1755272, reported by Ellis Miller. + + * hashmgr.cxx: fix small memory leaks with alias compressed + dictionaries (free flag vectors of affixed personal dictionary words + and flag vectors of hidden capitalized forms of mixed case and + allcaps words). + + * affixmgr.cxx: fix COMPOUNDRULE checking with affixed compounds. + Sf.net Bug ID 1706659, reported by Björn Jacke. Also fixing for + OOo Issue 76067 (crash-like deceleration for hexadecimal numbers + with long FFFFFF sequence using en_US dictionary). + + * tools/hunspell.cxx: add missing return to save_privdic(). + + * man/hunspell.4: add information about affixation of personal words: + "Personal dictionaries are simple word lists, but with optional + word patterns for affixation, separated by a slash: + + foo + Foo/Simpson + + In this example, "foo" and "Foo" are personal words, plus Foo + will be recognised with affixes of Simpson (Foo's etc.)." + +2007-07-18 Németh László <nemeth at OOo>: + * src/win_api/: add missing resource files, reported by Ingo H. de Boer. + +2007-07-16 Németh László <nemeth at OOo>: + * hunspell.cxx: fix dot removing from UTF-8 encoded words in cleanword2() + (Capitalised words with dots, as "Something." were not recognised + using Unicode encoded dictionaries.) + * tests/{base.*,base_utf.*}: extended and new test files for + dot removing and Unicode support. + + * tools/hunspell.cxx: fix Cygwin, OS X compatibility using platform + specifics iconv() header by ICONV_CONST macro of Autoconf. + Sf.net Bug ID 1746030, reported by Mike Tian-Jian Jiang. + Sf.net Bug ID 1753939, reported by Jean-Christophe Helary. + + * tools/hunspell.cxx: fix missing global path setting with -d option. + + * tests/test.sh: fix broken Valgrind checking (missing warnings + with VALGRIND=memcheck make check). 
+ + * csutil.cxx: fix condition in u8_u16() to avoid invalid read + of not null-terminated character arrays (detected by Valgrind + in Hunspell executable: associated with 8-bit character table + conversion in tools/hunspell.cxx). + + * csutil.cxx: free_utf_tbl(): use utf_tbl_count-- instead of utf_tbl--. + Memory leak in Hunspell executable detected by Valgrind. + + * hashmgr.cxx: add missing free_utf_tbl(), memory leak in Hunspell + executable detected by Valgrind. + + * hashmgr.cxx: load_tables(): fix memory error in spec. capitalization. + Use sizeof(unsigned short) instead of bad sizeof(unsigned short*). + Invalid memory read detected by Valgrind. + + * hashmgr.cxx: add_word(): fix memory error in spec. capitalization. + Update also affix array length of capitalized homonyms. Invalid + memory read detected by Valgrind. + + * hunspell.cxx: suggest(): fix invalid memory write and leak. + Bad realloc() and missing free() detected by Valgrind associated + with suggestions for "something.The" type spelling errors. + + * {dictmgr,csutil,hashmgr,suggestmgr}.cxx: check memory allocation. + Sf.net Bug ID 1747507, based on the patch by Jose da Silva. + +2007-07-13 Ingo H. de Boer <idb_winshell at SF.net>: + * atypes.cxx: fix Visual C compatibility: Using + "HUNSPELL_WARNING(a,b,...} {}" macro instead of empty "X(a,b...)". + + * hunspell.cxx: changes for Windows API. + * win_api/Hunspell.*: new resource files + * win_api/hunspelldll.*: set optional Hunspell and Borland spec. codes + Sf.net Bug ID 1753802, patch by Ingo H. de Boer. + See also Sf.net Bug ID 1751406, patch by Mike Tian-Jian Jiang. + +2007-07-09 Caolan McNamara <cmc at OO.o>: + * {hunspell,hashmgr,affentry}.cxx: fix warnings of Coverity program + analyzer. Sf.net Bug ID, 1750219. + +2007-07-06 Németh László <nemeth at OOo>: + * atypes.cxx: warning-free swallowing of conditional warning messages + and their parameters using empty HUNSPELL_WARNING(a,b...) macro. 
+ * {affixmgr,atypes,csutil}.cxx: fix unused variable warnings + using WARNVAR macro for conditionally named variables. + * hashmgr.cxx: fix unused variable warning in add_word() by cond. name + * hunspell.cxx: fix shadowed declaration of captype var. in suggest() + +2006-06-29 Caolan McNamara <cmc at OO.o>: + * hunspell.cxx: patch to fix possible memory leak in analyze() of + experimental morphological analyzer code. Sf.net Bug ID 1745263. + +2007-06-29 Németh László <nemeth at OOo>: +improvements: + * src/hunspell/hunspell.cxx: check bad capitalisation of Dutch letter IJ. + - Sf.net Feature Request ID 1640985, reported by Frank Fesevur. + - Solution: FORBIDDENWORD for capitalised word forms (need + an improved Dutch dictionary with forbidden words: Ijs/*, etc.). + * tests/IJ.*: test data and example. + + * hashmgr.cxx, hunspell.cxx: check capitalization of special word forms + - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG + Sf.net Bug ID 1398550, reported by Dmitri Gabinski. + - allcap words and suffixes: UNICEF's - UNICEF'S + - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA + For Catalan, French and Italian languages. + Reported by Davide Prina in OOo Issue 68568. + * tests/allcaps*: tests for OPENOFFICE.ORG, UNICEF'S capitalization. + * tests/i68568*: tests for SANT'ELIA capitalization. + + * hunspell/hunspell.cxx: suggestion for missing sentence spacing: + something.The -> something. The + + * tools/hunspell.cxx: multiple character encoding support + - -i option: custom input encoding + Sf.net Bug ID 1610866, reported by Thobias Schlemmer. + Sf.net Bug ID 1633413, reported by Dan Kenigsberg. + See also hunspell-1.1.5-encoding.patch of Fedora from Caolan Mc'Namara. + * tests/*.test: add input encodings + + * tools/hunspell.cxx: use locale data for default dictionary names. 
+ Sf.net Bug ID 1731630, report and patch from Bernhard Rosenkraenzer, + See also hunspell-1.1.4-defaultdictfromlang.patch of Fedora Linux + from Caolan McNamara. + + * tools/hunspell.cxx: fix 8-bit tokenization (letters without + casing, like ß or Hebrew characters now are handled well) + + * tools/hunspell.cxx: dictionary search path + - DICPATH environmental variable + - -D option: show directory path of loaded dictionary + - automatic detection of OpenOffice.org directories + +fixes: + * affixmgr.cxx: fault-tolerant patch for REP and other affix + table data problems. Problem with Hunspell and en_GB dictionary + reported by Thomas Lange in OOo Issue 76098 and + Stephan Bergmann in OOo Issue 76100. + Sf.net Bug ID 1698240, reported by Ingo H. de Boer. + + * csutil.cxx: fix mkallcap_utf() for allcaps suggestion in UTF-8. + + * suggestmgr.cxx: fix bad movechar_utf() (missing strlen()). + + * hunspell.cxx: fix bad degree sign detection in Unicode + hu_HU environment. + + * hunspell/hunspell.cxx: free allocated memory of csconv in + ported Mozilla code. + - Mozilla Bugzilla Bug 383564, report and Mozilla MySpell patch + by Andrew Geul. Reported by Ryan VanderMeulen for Hunspell. + + * suggestmgr.cxx: fix minor difference in Unicode suggestion + (ngram suggestion of allcaps words in Unicode). + + * hashmgr.cxx: close file handle after errors. + Sf.net Bug ID 1736286, reported by John Nisly. + + * configure.ac: syntax error (shell variable with spaces). + Sf.net Bug ID 1731625, reported by Bernhard Rosenkraenzer. + + * hunspell.cxx: check_word(): fix bad usage of info pointer. + + * hashmgr.cxx: fix de_DE related bug (accept words with leading dash). + Sf.net Bug ID 1696134, reported by Björn Jacke. + + * suggestmgr.cxx, tests/1695964.*: fix NEEDAFFIX homonym suggestion. + Sf.net Bug ID 1695964, reported by Björn Jacke. + + * tests/1463589*: capitalized ngram suggestion test data for + Sf.net Bug ID 1463589, reported by Frederik Fouvry. 
+ + * csutil.cxx, affixmgr.cxx: fix possible heap error with + multiple instances of utf_tbl. + Sf.net Bug ID 1693875, reported by Ingo H. de Boer. + + * affixmgr.cxx, suggestmgr.cxx, license.hunspell: convert to ASCII. + Locale dependent compiling problems. Sf.net Bug ID 1694379, reported + by Mike Tian-Jian Jiang. OOo Issue 78018 reported by Thomas Lange. + + * tests/test.sh: compatibility issues + - fix Valgrind support (check shared library instead of shell wrapper) + - remove deprecated "tail +2" syntax + - set 8-bit locale for testing (LC_ALL=C) + + * hunspell.hxx: remove license.* and config.h dependencies. + - hunspell-1.1.5-badheader.patch from Caolan McNamara <cmc at OO.o> + +2007-03-21 Németh László <nemeth at OOo>: + * tools/Makefile.am, munch.h, unmunch.h: add missing munch.h and unmunch.h + Reported by Björn Jacke and Khaled Hosny (sf.net Bug ID 1684144) + * hunspell/hunspell.cxx, hunspell.hxx: fix --with-ui compiling error (add get_csconv()) + Reported by Khaled Hosny (sf.net Bug ID 1685010) + +2007-03-19 Németh László <nemeth at OOo>: + * csutil.cxx, hunspell/hunspell.cxx: Unicode non BMP area (>65K character range) support + (except conditional patterns and strip characters of affix rules) + * tests/utf8_nonbmp*: test data + + * src/hunspell/*: add Mozilla patches from David Einstein + - run-time generated 8-bit character tables + - other Mozilla related changes (see Mozilla Bugzilla Bug 319778) + + * csutil.cxx, affixmgr.cxx, hashmgr.cxx: optimized version of IGNORE feature + - IGNORE works with affixes (except strip characters and affix conditions) + * tests/ignore*: test data with latin characters + * tests/ignoreutf*: Unicode test data with Arabic diacritics (Harakat) + + * src/hunspell/suggestmgr.cxx: new edit distance suggestion methods + - capitalization: nasa -> NASA + - long swap: permenant -> permanent + - long mov.: Ghandi -> Gandhi + - double two characters: vacacation -> vacation + * tests/sug.*: test data + + *
src/hunspell/affixmgr.cxx: space in REP strings (alot -> a lot) + Note: Underline character signs the space in REP strings: REP alot a_lot, and + put the expression with space ("a lot") into the dic file (see tests/sug). + + * hashmgr.cxx, affixmgr.cxx: ignore Unicode byte order mark (BOM sequence) + * tests/utf8_bom*: test data + + * hunspell/*.cxx: OOo Issue 68903 - Make lingucomponent warning-free on wntmsci10 + - fix Hunspell related warning messages on Windows platform (except some assignment + within conditional expressions). Reported and started by Stephan Bergmann. + + * hunspell/affixmgr.cxx: fix OOo Issue 66683 - hunspell dmake debug=x fails + - Reported by Stephan Bergmann. + + * src/hunspell/hunspell.[ch]xx: thread safe API for Hunspell executable + (removing prev*() functions, new spell(word, info, root) function) + + * configure.ac, src/hunspell/*: HUNSPELL_EXPERIMENTAL code + --with-experimental configure option (conditional compiling of morphological analyser + and stemmer tools) + + * configure.ac, src/hunspell/*: conditional Hunspell warning messages + --with-warnings configure option + + * affixmgr.cxx: new, optimized parsing functions + + * affixmgr.cxx: fix homonym handling for German dictionary project, + reported by Björn Jacke (sf.net Bug ID 1592880). + * tests/1592880.*: test data by Björn Jacke + + * src/hunspell/affixmgr.cxx: fix CIRCUMFIX suggestion + Bug reported by Erdal Ronahi. + + * hunspell.cxx: reverse root word output (complex prefixes) + Bug reported by Munzir Taha. 
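Ignoring the Unicode byte order mark, as the hashmgr.cxx/affixmgr.cxx entry above describes, amounts to dropping the three bytes EF BB BF at the start of the first line read from a UTF-8 file. A minimal sketch; strip_utf8_bom() is a hypothetical name and not the Hunspell implementation:

```cpp
#include <string>

// Returns the line without a leading UTF-8 byte order mark, if one is present.
std::string strip_utf8_bom(const std::string& line) {
  static const char bom[] = "\xEF\xBB\xBF";
  if (line.compare(0, 3, bom) == 0) return line.substr(3);
  return line;
}
```

Only the first line of a file needs this treatment; later lines are passed through unchanged.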
+ + * tools/hunspell.cxx: fix Emacs compatibility, patch by marot at sf.net + - no % command in PIPE mode (SourceForge BugTracker 1595607) + - fix HUNSPELL_VERSION string + + * suggestmgr.[hc]xx: rename check() functions to checkword() (OOo Issue 68296) + adopt MySpell patch by Bryan Petty (tierra at ooo) for Hunspell source + + * csutil.cxx, munch.c, unmunch.c: adopt relevant parts of the MinGW patch + (OOo Issue 42504) by tonal at ooo + + * affigmgr.cxx: remove double candidate_check() call, reported by Bram Moolenaar + + * tests/test.sh: add LC_ALL="C" environment. Locale dependency of make check + reported by Gentoo project. + + * src/tools/hunspell.cxx: UTF-8 highlighting fix for console UI + (not solved: breaking long UTF-8 lines) + + * src/tools/unmunch.c: fix bad generation if strip is shorter than condition, + reported by Davide Prina + * src/tools/unmunch.h: increase 5000 -> 500000 + + * src/tools/hunspell.cxx: fix memory error in suggestion (uninitialized parameter), + Bug also reported by Björn Jacke in SourceForge Bug 1469957 + + * csutil.cxx, affixmgr.cxx: fix Caolan McNamara's patch for non OOo environment + +2006-11-11 Caolan McNamara <cmc at OO.o>: + * csutil.cxx, affixmgr.cxx: UTF-8 table patch (OOo Issue 71449) + Description: memory optimization (OOo doesn't use the large UTF-8 table). + + * Makefile.am: shared library patch (Sourceforge ID 1610756) + + * hunspell.h, hunspell.cxx: C API patch (Sourceforge ID 1616353) + + * hunspell.pc: pkgconfig patch (Sourceforge ID 1639128) + +2006-10-17 Ryan Jones <at Mozilla Bugzilla>: + * affixmgr.cxx: missing fclose(affixlst) calls + Reported by <gavins at ooo> in OOo Issue 70408 + +2007-07-11 Taha Zerrouki <taha at gawab>: + * affixmgr.cxx, hunspell.cxx, hashmgr.cxx, csutil.cxx: IGNORE feature to remove + optional Arabic and other characters from input and dictionary words. 
+ * src/hunspell/langnum.hxx: add Arabic language number, lang_ar=96 + * tests/ignore.*: test data + +2006-05-28 Miha Vrhovnik <mvrhov at users.sourceforge>: + * src/win_api/*: C API for Windows DLLs + - also Delphi text editor example (see on Hunspell Sourceforge page) + +2006-05-18 Kevin F. Quinn <kevquinn at gentoo>: + * utf_info.cxx: struct -> static struct + Shared library patch also developed by Gentoo developers (Hanno Meyer-Thurow, + Diego Pettenò, Kevin F. Quinn) + +2006-02-02 Németh László <nemethl@gyorsposta.hu>: + * src/hunspell/hunspell.cxx: suggest(): replace "fooBar" -> "foo bar" suggestions + with "fooBar" ->"foo Bar" (missing spaces are typical OCR bugs). + Bug reported by stowrob at OOo in Issue 58202. + * src/hunspell/suggestmgr.cxx: twowords(): permit 1-character words. + (restore MySpell's original behavior). Here: "aNew" -> "a New". + * tests/i58202.*: test data + + * src/parsers/textparser.cxx: fix Unicode tokenization in is_wordchar() + (extra word characters (WORDCHARS) didn't work on big-endian platforms). + + * src/hunspell/{csutil,affixmgr}.cxx: inline isSubset(), isRevSubset(): + little speed optimalization for languages with rich morphology. + + * src/tools/hunspell.cxx: fix bad --with-ui and --with-readline compiling + when (N)curses is missing. Reported by Daniel Naber. + +2006-01-19 Tor Lillqvist <tml@novell.com> + * src/hunspell/csutil.cxx: mystrsep(): fix locale-dependent isspace() tokenization + +2006-01-06 András Tímár <timar@fsf.hu> + * src/hunspell/{hashmgr.hxx,hunspell.cxx}: fix Visual C++ compiling errors + +2006-01-05 Németh László <nemethl@gyorsposta.hu>: + * COPYING: set GPL/LGPL/MPL tri-license for Mozilla integration. + Rationale: Mozilla source code contains an old MySpell version + with GPL/LGPL/MPL tri-license. (MPL license is a copyleft license, similar + to the LGPL, but it acts on file level.) 
+    * COPYING.LGPL: GNU Lesser General Public License 2.1 (LGPL)
+    * COPYING.MPL: Mozilla Public License 1.1 (MPL)
+    * license.hunspell, src/hunspell/license.hunspell: GPL/LGPL/MPL tri-license
+
+    * src/hunspell/{affixmgr,hashmgr}.*: AF, AM alias definitions in affix file:
+      compression of flag sets and morphological descriptions (see manual,
+      and tests/alias* test files).
+      Rationale: Alias compression is also good for loading time and memory
+      efficiency, not only smaller resources.
+    * src/tools/makealias: alias compression utility
+      (usage: ./makealias file.dic file.aff)
+    * tests/alias{,2,3}: AF, AM tests
+    * man/hunspell.4: add AF, AM documentation
+    * src/hunspell/affentry.cxx, atypes.hxx: add new opts bits (aeALIASM, aeALIASF)
+
+    * tools/hunspell, src/parser/*, src/hunspell/*: Hunspell program
+      tokenizes Unicode texts (only with UTF-8 encoded dictionaries).
+      Missing Unicode tokenization reported by Björn Jacke, Egmont Koblinger,
+      Jess Body and others.
+      Note: Curses interactive interface doesn't work perfectly yet.
+    * tests/*.tests: remove -1 parameters of Hunspell
+    * tests/*.{good,wrong}: remove tabulators
+
+    * src/hunspell/{hunspell,affixmgr}.cxx: BREAK option: break words at
+      specified break points and check word parts separately (see manual).
+      Note: COMPOUNDRULE is better (or will be better) for handling dashes and
+      other compound joining characters or character strings. Use BREAK if you
+      want to check words with dashes or other joining characters and there is no time
+      or possibility to describe precise compound rules with COMPOUNDRULE.
+    * tests/break.*: BREAK example.
+
+    * src/hunspell/{affixmgr,hunspell}.cxx: add CHECKSHARPS declaration instead
+      of LANG de_DE definitions to handle German sharp s in both spelling and
+      suggestion.
+    * src/hunspell/hunspell.cxx: With CHECKSHARPS, uppercase words are valid
+      with both lower sharp s (it is optional for names in German legal texts)
+      and SS (MÜßIG, MÜSSIG).
Missing lower sharp s form reported by Björn Jacke.
+    * src/hunspell/hunspell.cxx: KEEPCASE flag on a sharp s word has a special
+      meaning with CHECKSHARPS declaration: KEEPCASE permits capitalisation and SS upper
+      casing of a sharp s word (Müßig and MÜSSIG), but forbids the upper cased form
+      with lower sharp s character(s): *MÜßIG.
+    * tests/germancompounding*: add CHECKSHARPS, remove LANG
+    * tests/checksharps*: add CHECKSHARPS and KEEPCASE, remove LANG
+
+    * src/hunspell/hunspell.cxx: improved suggestions:
+      - suggestions for pressed Caps Lock problems: macARONI -> macaroni
+      - suggestions for long shift problems: MAcaroni -> Macaroni, macaroni
+      - suggestions for KEEPCASE words: KG -> kg
+    * src/hunspell/csutil.cxx: fix mystrrep() function:
+      - suggestions for lower sharp s in uppercased words: MÜßIG -> MÜSSIG
+    * tests/checksharps{,utf}.sug: add tests for mystrrep() fix
+
+    * src/hunspell/hashmgr.cxx: Now dictionary words can contain slashes
+      with the "\/" syntax. Problem reported by Frederik Fouvry.
+
+    * src/hunspell/hunspell.cxx: fix bad duplicate filter in suggest().
+      (Suggesting some capitalised compound words caused program crash
+      with Hungarian dictionary, OOo Issue 59055).
+
+    * src/hunspell/affixmgr.cxx: fix bad defcpd_check() call in compound_check().
+      (Overlapping new COMPOUNDRULE and old compounding methods caused program
+      crash at suggestion.)
+
+    * src/hunspell/affixmgr.{cxx,hxx}: check affix flag duplication at affix classes.
+      Suggested by Daniel Naber.
+
+    * src/hunspell/affentry.cxx: remove unused variable declarations (OOo i58338).
+      Compiler warnings reported by András Tímár and Martin Hollmichel.
+
+    * src/hunspell/hunspell.cxx: morph(): do not analyse bad mixed uppercased forms
+      (fix Arabic morphological analysis with Buckwalter's Arabic transliteration)
+
+    * src/hunspell/affentry.{cxx,hxx}, atypes.hxx: small memory optimization
+      in affentry:
+      - using unsigned char fields instead of short (stripl, appndl, numconds)
+      - rename xpflg field to opts
+      - removing utf8 field, use aeUTF8 bit of opts field
+
+    * configure.ac: set tests/maputf.test to XFAILED on ARM platform.
+      Failure reported by Rene Engelhard.
+
+    * configure.ac: link Ncursesw library, if it exists.
+
+    * BUGS: add BUGS file
+
+    * tests/complexprefixes2.*: test for morphological analysis with COMPLEXPREFIXES
+
+    * src/hunspell/affixmgr.cxx: use "COMPOUNDRULE" instead of
+      "COMPOUND". The new name suggested by Bram Moolenaar.
+    * tests/compoundrule*: modified and renamed compound.* test files
+
+    * man/hunspell.4: AF, AM, BREAK, CHECKSHARPS, COMPOUNDRULE, KEEPCASE.
+      - also new additions to the documentation:
+        Header of the dictionary file defines approximate dictionary size:
+        ``A dictionary file (*.dic) contains a list of words, one per line.
+        The first line of the dictionaries (except personal dictionaries)
+        contains the _approximate_ word count (for optimal hash memory size).''
+        Asked by Frederik Fouvry.
+
+        One-character replacements in REP definitions: ``It's very useful to
+        define replacements for the most typical one-character mistakes, too:
+        with REP you can add higher priority to a subset of the TRY suggestions
+        (suggestion list begins with the REP suggestions).''
+
+2005-11-11 Németh László <nemethl@gyorsposta.hu>:
+    * src/hunspell/affixmgr.*: fix Unicode MAP errors (sorted only n-1
+      characters instead of n ones in UTF-16 MAP character lists).
+      Bug reported by Rene Engelhard.
+
+    * src/hunspell/affixmgr.*: fix infinite COMPOUND matching (default char
+      type is unsigned on PowerPC, s390 and ARM platforms and it will never
+      be negative). Bug reported by Rene Engelhard.
+
+    * src/hunspell/{affixmgr,suggestmgr}.cxx: fix bad ONLYINCOMPOUND
+      word suggestions.
+    * tests/onlyincompound.sug: empty test file to check this fix.
+      Bug reported by Björn Jacke.
+
+    * src/hunspell/affixmgr.cxx: fix backtracking in COMPOUND pattern matching.
+    * tests/compound6.*: test files to check this fix.
+
+    * csutil.cxx: set bigger range types in flag_qsort() and flag_bsearch().
+
+    * affixmgr.hxx: set better type for cont_classes[] Boolean data (short -> char)
+
+    * configure.ac, tests/automake.am: set platform specific XFAIL test
+      (flagutf8.test on ARM platform)
+
+2005-11-09 Németh László <nemethl@gyorsposta.hu>:
+improvements:
+    * src/hunspell/affixmgr.*: new and improved affix file parameters:
+
+      - COMPOUND definitions: compound patterns with regexp-like matching.
+        See manual and test files: tests/compound*.*
+        Suggested by Bram Moolenaar.
+        Also useful for simple word-level lexical scanning, for example
+        analysing numbers or words with numbers (OOo Issue #53643):
+        http://qa.openoffice.org/issues/show_bug.cgi?id=53643
+        Examples: tests/compound{4,5}.*.
+
+      - NOSUGGEST flag: words signed with NOSUGGEST flag are not suggested.
+        Proposed flag for vulgar and obscene words (OOo Issue #55498).
+        Example: tests/nosuggest.*.
+        Problem reported by bobharvey at OOo:
+        http://qa.openoffice.org/issues/show_bug.cgi?id=55498
+
+      - KEEPCASE flag: Forbid capitalized and uppercased forms of words
+        signed with KEEPCASE flags. Useful for special orthographies
+        (measurements and currency often keep their case in uppercased
+        texts) and other writing systems (e.g. keeping lower case of IPA
+        characters).
+
+      - CHECKCOMPOUNDCASE: Forbid upper case characters at word bounds in compounds.
+        Examples: tests/checkcompoundcase* and tests/germancompounding.*
+
+      - FLAG UTF-8: New flag type: Unicode character encoded with UTF-8.
+        Example: tests/flagutf8.*.
+        Rationale: Unicode character type can be more readable
+        (in a Unicode text editor) than `long' or `num' flag type.
+
+bug fixes:
+    * src/hunspell/hunspell.cxx: accept numbers and numbers with separators (i53643)
+      Bug reported by skelet at OOo:
+      http://qa.openoffice.org/issues/show_bug.cgi?id=53643
+
+    * src/hunspell/csutil.cxx: fix casing data in ISO 8859-13 character table.
+
+    * src/hunspell/csutil.cxx: add ISO-8859-15 character encoding (i54980)
+      Rationale: ISO-8859-15 is the default encoding of the French OpenOffice.org
+      dictionary. ISO-8859-15 is a modified version of ISO-8859-1
+      (latin-1) character encoding with French œ ligatures and euro
+      symbol. Problem reported by cbrunet at OOo in OOo Issue 54980:
+      http://qa.openoffice.org/issues/show_bug.cgi?id=54980
+
+    * src/hunspell/affixmgr.cxx: fix zero-byte malloc after a bad affix header.
+      Patch by Harri Pitkänen.
+
+    * src/hunspell/suggestmgr.cxx: fix bad NEEDAFFIX word suggestion
+      in ngram suggestions. Reported by Daniel Naber and Friedel Wolff.
+
+    * src/hunspell/hashmgr.cxx: fix bad white space checking in affix files.
+      src/hunspell/{csutil,affixmgr}.cxx: add other white space separators.
+      Problems with tabulators reported by Frederik Fouvry.
+
+    * src/hunspell/*: replace system-dependent <license.*> #include
+      parameters with quoted ones. Problem reported by Dafydd Jones.
+
+    * src/hunspell/hunspell.cxx: fix missing morphological analysis of dot(s)
+      Reported by Trón Viktor.
+
+changes:
+    * src/hunspell/affixmgr.cxx: rename PSEUDOROOT to NEEDAFFIX.
+      Suggested by Bram Moolenaar.
+
+    * src/hunspell/suggestmgr.hxx: Increase default maximum of
+      ngram suggestions (3->5). Suggested by Kevin Hendricks.
+
+    * src/hunspell/htypes.hxx: Increase MAXDELEN for long affix flags.
+
+    * src/hunspell/suggestmgr.cxx: modify (perhaps fix) Unicode map suggestion.
+      tests/maputf test failure on ARM platform reported by Rene Engelhard.
+
+    * src/hunspell/{affentry.cxx,atypes.hxx}: remove [PREFIX] and
+      MISSING_DESCRIPTION messages from morphological analysis.
+      Problems reported by Trón Viktor.
+
+    * tests/germancompounding.{aff,good}: Add "Computer-Arbeit" test word.
+      Suggested by Daniel Naber.
+
+    * doc/man/hunspell.4: Proof-reading patch by Goldman Eleonóra.
+
+    * doc/man/hunspell.4: Fix bad affix example (replace `move' with `work').
+      Bug reported by Frederik Fouvry.
+
+    * tests/*: new test files:
+      affixes.*: simple affix compression example from Hunspell 4 manual page
+      checkcompoundcase.*, checkcompoundcase2.*, checkcompoundcaseutf.*
+      compound.*, compound2.*, compound3.*, compound4.*, compound5.*
+      compoundflag.* (former compound.*)
+      flagutf8.*: test for FLAG UTF-8
+      germancompounding.*: simplification with CHECKCOMPOUNDCASE.
+      germancompoundingold.* (former germancompounding.*)
+      i53643.*: check numbers with separators
+      i54980.*: ISO8859-15 test
+      keepcase.*: test for KEEPCASE
+      needaffix*.* (former pseudoroot*.* tests)
+      nosuggest.*: test for NOSUGGEST
+
+2005-09-19 Németh László <nemethl@gyorsposta.hu>:
+    * src/hunspell/suggestmgr.cxx: improved ngram suggestion:
+      - detect non-neighboring swapped characters (pernament -> permanent)
+        Rationale: ngram method has a significant error with non-neighboring
+        swapped characters, especially when the swap is in the middle of the word.
+      - suggest uppercase forms (unesco -> UNESCO, siggraph's -> SIGGRAPH's)
+      - suggest only ngram swap character and uppercase form, if they exist.
+        Rationale: swap character and casing equivalence give much better
+        suggestions than any other (weighted) ngram suggestions.
+      - add uppercase suggestion (PERMENANT -> PERMANENT)
+
+    * src/hunspell/*: complete comparison with MySpell 3.2 (in OOo beta 2):
+      - affixmgr.cxx: add missing numrep initialization
+      - hashmgr.cxx: add_word(): don't allocate temporary records
+      - hunspell.cxx: in suggest():
+        - check capitalized words first (better sug.
order for proper names),
+        - check pSMgr->suggest() return value
+        - set pSMgr->suggest() call to not optional in HUHCAP
+      - csutil.cxx: fix bad KOI8-U -> koi8r_tbl reference in enc_entry encds
+      - csutil.cxx: fix casing data in ISO 8859-2, Windows 1251 and KOI8-U
+        encoding tables. Bug reported by Dmitri Gabinski.
+
+    * src/hunspell/affixmgr.*: improved compound word and other features
+      - generalize hu_HU specific compound word features with new affix file
+        parameters, suggested by Bram Moolenaar:
+        - CHECKCOMPOUNDDUP: forbid word duplication in compounds (e.g. foo|foo)
+        - CHECKCOMPOUNDTRIPLE: forbid triple letters in compounds (e.g. foo|obar)
+        - CHECKCOMPOUNDPATTERN: forbid patterns at word bounds in compounds
+        - CHECKCOMPOUNDREP: using REP replacement table, forbid presumably bad
+          compounds (useful for languages with unlimited number of compounds)
+      - ONLYINCOMPOUND flag works also with words (see tests/onlyincompound.*)
+        Suggested by Daniel Naber, Björn Jacke, Trón Viktor & Bram Moolenaar.
+      - PSEUDOROOT works also with prefixes and prefix + suffix combinations
+        (see tests/pseudoroot5.*). Suggested by Trón Viktor.
+      - man/hunspell.4: updated man page
+
+    * src/hunspell/affixmgr.*: fix incomplete prefix handling with twofold
+      suffixes (delete unnecessary contclasses[] conditions in
+      prefix_check_twosfx() and prefix_check_twosfx_morph()).
+      Bug reported by Trón Viktor.
+
+    * src/hunspell/affixmgr.*: complete also *_morph() functions with
+      conditions of new Hunspell features (circumfix, pseudoroot etc.).
+
+    * src/hunspell/suggestmgr.cxx:
+      - fix missing suggestions for words with crossed prefix and suffix
+      - fix redundant non compound word checking
+      - fix losing suggestions problem. Bug reported by Dmitri Gabinski.
+
+    * src/hunspell/dictmgr.*:
+      - add new dictionary manager for Hunspell UNO module
+        Problems with eo_ANY Esperanto locale reported by Dmitri Gabinski.
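
As a rough illustration of the CHECKCOMPOUNDDUP and CHECKCOMPOUNDTRIPLE parameters listed in this entry, the sketch below checks a compound that has already been split into its parts. It is written in Python for brevity and is not taken from affixmgr.cxx; the function names and the list-of-parts representation are assumptions made for the example.

```python
def has_duplicate_parts(parts):
    """CHECKCOMPOUNDDUP-style test: two neighbouring parts are identical (foo|foo)."""
    return any(left == right for left, right in zip(parts, parts[1:]))


def has_triple_letter(parts):
    """CHECKCOMPOUNDTRIPLE-style test: three equal letters span a part boundary (foo|obar)."""
    word = "".join(parts)
    boundary = 0
    for part in parts[:-1]:
        boundary += len(part)  # index of the first letter after this boundary
        # A triple that spans the boundary starts one or two letters before it.
        for start in (boundary - 2, boundary - 1):
            if start >= 0 and start + 3 <= len(word):
                if word[start] == word[start + 1] == word[start + 2]:
                    return True
    return False
```

A checker built this way would reject a compound candidate when either function returns True and the corresponding parameter is set in the affix file.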
+
+    * src/hunspell/*: use precise constant sizes for 8-bit and 16-bit character
+      arrays with MAXWORDUTF8LEN and MAXSWUTF8L macros.
+
+    * src/hunspell/affixmgr.cxx: fix bad MAXNGRAMSUGS parameter handling
+
+    * src/hunspell/affixmgr.cxx, src/tools/{un}munch.*: fix GCC 4.0 warnings
+      on fgets(), reported by Dvornik László
+
+    * po/hu.po: improved translation by Dvornik László
+
+    * tests/test.sh: improved test environment
+      - add suggestion testing (see tests/*.sug)
+      - add memory debugging environment, based on the excellent Valgrind debugger.
+        Usage on Linux and experimental platforms of Valgrind:
+        VALGRIND=memcheck make check
+      - rename test_hunmorph to test.sh
+
+    * tests/*: new tests:
+      - base.*: base example based on MySpell's checkme.lst.
+      - map{,utf}.*, rep{,utf}: MAP and REP suggestion examples
+      - tests on new CHECKCOMPOUND, ONLYINCOMPOUND and PSEUDOROOT features
+      - i54633.*: capitalized suggestion test for Issue 54633 from OOo's Issuezilla
+      - i35725.*: improved ngram suggestion test for Issue 35725
+
+2005-08-26 Németh László <nemethl@gyorsposta.hu>:
+improvements:
+
+    * src/hunspell/suggestmgr.cxx:
+      Unicode support in related character map suggestion
+
+    * src/hunspell/suggestmgr.cxx: Unicode support in ngram suggestion
+
+    * src/hunspell/{suggestmgr,affixmgr,hunspell}.cxx: improve ngram suggestion.
+      Fix http://qa.openoffice.org/issues/show_bug.cgi?id=35725. See release
+      notes for examples. Problem reported by beccablain at OOo.
+      - ngram suggestions are now case insensitive (see `Permenant' bug in Issuezilla)
+      - weight ngram suggestions (with the longest common subsequence algorithm,
+        also considering lengths of bad word and suggestion, identical first
+        letters and almost completely identical character positions)
+      - set strict affix congruency in expand_rootword(). Now ngram suggestions
+        are good for languages with rich morphology and also better for English.
+        Rationale: affixed forms of the first ngram suggestion
+        very often suppress the second and subsequent root word suggestions. But
+        faults in affixes are less common, and can be fixed without suggestions.
+        We must prefer the more informative second and subsequent root word
+        suggestions instead of the suggestions for bad affixes.
+      - a better suggestion may not be a substring of a less good suggestion
+        Rationale: Suggesting affixed forms of a root word is
+        unnecessary when the root word has a better weighted ngram value.
+        (Checking substrings is a good approximation for this refinement.)
+      - fewer ngram suggestions (default maximum is 3 instead of 10)
+        Rationale: Users need a big extra effort to check a lot of bad ngram
+        suggestions, nine times out of ten unnecessarily. It is very
+        distracting, because ngram suggestions can be very different.
+        Usually MySpell and Hunspell give one or two suggestions with
+        the old suggestion algorithms (maximum is 15), but the ngram algorithm
+        often gives the maximum number of suggestions. With strict affix congruency
+        and other refinements, the good suggestion is usually among the
+        first three elements.
+      - new affix parameter: MAXNGRAMSUG
+
+    * src/hunspell/*: support agglutinative languages with rich prefix
+      morphology or with right-to-left writing system (for example, Turkic
+      and Austronesian languages with (modified) Arabic scripts).
+      - new affix parameter: COMPLEXPREFIXES
+        Set twofold prefix stripping (but single suffix stripping)
+    * src/hunspell/affixmgr.cxx:
+      - speed up prefix loading with tree sorting algorithm.
+    * tests/complexprefixes.*, tests/complexprefixesutf.*:
+      Coptic example posted by Moheb Mekhaiel
+
+    * src/hunspell/hashmgr.cxx: check size attribute in dic file
+      suggested by Daniel Naber
+      Rationale: With a missing size attribute Hunspell allocates too small and
+      slower hash memory, and Hunspell can lose the first dictionary word.
+
+    * src/hunspell/affixmgr.cxx: check stripping characters and condition
+      compatibility in affix rules (bugs detected in cs_CZ, es_ES, es_NEW,
+      es_MX, lt_LT, nn_NO, pt_PT, ro_RO and sk_SK dictionaries). See release
+      notes of Hunspell 1.0.9 in NEWS.
+
+    * src/hunspell/affixmgr.cxx: check unnecessary fields in affix rules
+      (bugs detected in ro_RO and sv_SE dictionaries). See release notes.
+
+    * src/hunspell/affixmgr.cxx: remove redundant condition checking
+      in affix rules with stripping characters (redundancy in OpenOffice.org
+      dictionaries reported by Eleonóra Goldman)
+      Rationale: this is a small optimization, but it was excellent for
+      detecting the bad ngram affixation with bad or weak affix conditions.
+
+    * tests/germancompounding.aff: improve compound definition
+      - use dash prefix instead of language specific tokenizer
+        Rationale: Using a uniform approach is the right way to check and analyze
+        compound words. Language-specific word breaking is deprecated; word-like
+        word pairs need sophisticated grammar checking
+        (for example, in Hungarian there is a substandard but accepted
+        syntax with a dash for word pairs: cats, dogs -> kutyák-macskák, like
+        cats/dogs in English).
+
+    * test Hunspell with 54 OpenOffice.org dictionaries: see release notes
+
+bug fixes:
+
+    * src/hunspell/suggestmgr.*: add time limit to exponential
+      algorithm of the related character map suggestion
+      Rationale: a long word in agglutinative languages or a special pattern
+      (for example a horizontal rule) made of map characters can `crash' the
+      spell checker.
+
+    * src/hunspell/affentry.cxx: add() functions: fix bad word generation
+      checking stripping characters (see similar bug in unmunch)
+
+    * src/hunspell/affixmgr.cxx: parse_file(): fix unconditional getNext()
+      call for ~AffixMgr() when affix file is corrupt.
+
+    * src/hunspell/affixmgr.*: AffixMgr(), parse_cpdsyllable(): fix missing
+      string duplications for ~AffixMgr() when affix file is corrupt.
+
+    * src/hunspell/affixmgr.*: parse_affix(): fix fprintf() call when affix
+      file is corrupt. Bug reported by Daniel Naber.
+
+    * suggestmgr.cxx: replace single usage of 'strdup' with 'mystrdup'
+      patch by Chris Halls (debian.org)
+
+    * src/hunspell/makefile.mk: add makefile.mk for compiling in OpenOffice.org
+      See README in Hunspell UNO module.
+      Problems with separated compiling reported by Rene Engelhard
+
+    * src/hunspell/hunspell.cxx: fix pseudoroot support
+      - search for a non-pseudoroot homonym in check()
+    * tests/pseudoroot4.*: test this fix
+
+    * src/tools/unmunch.c: fix bad word generation when conditions
+      are shorter or incompatible with stripping characters in affix rules
+
+    * src/tools/unmunch.c: fix mychomp() for de_AT.dic and other dic files
+      without last new line character.
+
+other changes:
+    * src/hunspell/suggestmgr.*: erase ACCENT suggestion
+      Rationale: ACCENT suggestion was the same as Kevin Hendricks's map
+      suggestion algorithm, but with a less good interface in affix file.
+
+    * src/hunspell/suggestmgr.*: combine cycle number limit
+      in badchar() and forgotchar() with a time limit.
+
+    * src/hunspell/affixmgr.*: remove NOMAPSUGS affix parameter
+
+    * src/hunspell/{suggestmgr,hunspell}.*: strip periods from
+      suggestions (restore MySpell's original behaviour)
+      Rationale: OpenOffice.org has an automatic period handling mechanism
+      and suggestions look better without periods.
+      - new affix file parameter: SUGSWITHDOTS
+        Add period(s) to suggestions, if input word terminates in period(s).
+        (No need for OpenOffice.org dictionaries.)
+
+    * tests/germancompounding.aff: improve bad German affix in affix example
+      (computeren->computern). Suggested by Daniel Naber.
+
+    * src/tools/example.cxx: add Myspell's example
+
+    * src/tools/munch.cxx: add Myspell's munch
+
+    * man{,/hu}/hunspell.4: refresh manual pages
+
+2005-08-01 Németh László <nemethl@gyorsposta.hu>:
+    * add missing MySpell files and features:
+      - add MySpell license.readme, README and CONTRIBUTORS ({license,README,AUTHORS}.myspell)
+      - add MySpell unmunch program (src/tools/unmunch.c)
+      - add licenses to source (src/hunspell/license.{myspell,hunspell})
+      - port MAP suggestion (with imperfect UTF-8 support)
+      - add NOSPLITSUGS affix parameter
+      - add NOMAPSUGS affix parameter
+
+    * src/man/man.4: MAP, COMPOUNDPERMITFLAG, NOSPLITSUGS, NOMAPSUGS
+
+    * src/hunspell/aff{entry,ixmgr}.cxx:
+      - improve compound word support
+      - new affix parameter: COMPOUNDPERMITFLAG (see manual)
+    * src/tests/compoundaffix{,2}.*: examples for COMPOUNDPERMITFLAG
+    * src/tests/germancompounding.*: new solution for German compounding
+      Problems with German compounding reported by Daniel Naber
+
+    * src/hunspell/hunspell.cxx: fix German uppercase word spelling
+      with the spellsharps() recursive algorithm.
+      Default recursive depth is 5 (MAXSHARPS).
+    * src/tests/germansharps*: extended German sharp s tests
+
+    * src/tools/hunspell.cxx: fix fatal memory bug in non-interactive
+      subshells without HOME environment variable
+      Bug detected with PHP by András Izsók.
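
The spellsharps() item in this entry describes a depth-limited recursive search over sharp s spellings. The following Python sketch illustrates that idea only; it is not the code in hunspell.cxx. The `in_dictionary` callback, the parameter names and the lowercased input are assumptions for the example, and only the depth limit of 5 (MAXSHARPS) comes from the changelog.

```python
MAXSHARPS = 5  # default recursion depth named in the changelog


def spell_sharps(word, in_dictionary, pos=0, depth=0, replaced=0):
    """Return True if some variant of `word` with "ss" -> "ß" is in the dictionary.

    `word` is assumed to be the lowercased form of an uppercase input,
    for example "müssig" for "MÜSSIG".
    """
    index = word.find("ss", pos)
    if index != -1 and depth < MAXSHARPS:
        # Branch 1: replace this "ss" with a sharp s and keep searching.
        candidate = word[:index] + "ß" + word[index + 2:]
        if spell_sharps(candidate, in_dictionary, index + 1, depth + 1, replaced + 1):
            return True
        # Branch 2: keep this "ss" and look at later occurrences.
        return spell_sharps(word, in_dictionary, index + 2, depth + 1, replaced)
    # Nothing left to try: accept only a variant that differs from the input.
    return replaced > 0 and in_dictionary(word)
```

The unchanged "ss" spelling is deliberately not accepted here, on the assumption that the ordinary lookup has already tried it before the sharp s search starts.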
+
+2005-07-22 Németh László <nemethl@gyorsposta.hu>:
+    * src/hunspell/csutil.hxx: utf16_u8()
+      - fix 3-byte UTF-8 character conversion
+
+2005-07-21 Németh László <nemethl@gyorsposta.hu>:
+    * src/hunspell/csutil.hxx: hunspell_version() for OOo UNO module
+
+2005-07-19 Németh László <nemethl@gyorsposta.hu>:
+    * renaming:
+      - src/morphbase -> src/hunspell
+      - src/hunspell, src/hunmorph -> src/tools
+      - src/huntokens -> src/parsers
+
+    * src/tools/hunstem.cxx: add stemmer example
+
+2005-07-18 Németh László <nemethl@gyorsposta.hu>:
+    * configure.ac: --with-ui, --with-readline configure options
+    * src/hunspell/hunspell.cxx: fix conditional compiling
+
+    * src/hunspell/hunspell.cxx: put the HunSPELL.bak temporary file
+      in the same directory as the checked file.
+
+    * src/morphbase/morphbase.cxx:
+
+      - handling German sharp s (ß)
+
+      - fix (temporarily) analyze()
+
+    * tests: a lot of new tests
+
+    * po/, intl/, m4/: add gettext from GNU hello
+
+    * po/hu.po: add Hungarian translation
+
+    * doc/, man/: rename doc to man
+
+2005-07-04 Németh László <nemethl@gyorsposta.hu>:
+    * src/morphbase/hashmgr.cxx: set FLAG attribute instead of FLAG_NUM and FLAG_LONG
+
+    * doc/hunspell.4: manual in English
+
+2005-06-30 Németh László <nemethl@gyorsposta.hu>:
+    * src/morphbase/csutil.cxx: add character tables from csutil.cxx of OOo 1.1.4
+
+    * src/morphbase/affentry.cxx: fix Unicode condition checking
+
+    * tests/{,utf}compound.*: tests compounding
+
+2005-06-27 Németh László <nemethl@gyorsposta.hu>:
+    * src/morphbase/*: fix Unicode compound handling
+
+2005-06-23 Halácsy Péter:
+    * src/hunmorph/hunmorph.cxx: delete spelling error message and suggest_auto() call
+
+2005-06-21 Németh László <nemethl@gyorsposta.hu>:
+    * src/morphbase: Unicode support
+    * tests/utf8.*: SET UTF-8 test
+
+    * src/morphbase: checking and fixing with Valgrind
+      Memory handling error reported by Ferenc Szidarovszky
+
+2005-05-26 Németh László <nemethl@gyorsposta.hu>:
+    * suggestmgr.cxx: fix stemming
+    *
AUTHORS, COPYING, ChangeLog: set CC-LGPL free software license
+
+2004-05-25 Varga Dániel <daniel@all.hu>
+    * src/stemtool: new subproject
+
+2005-05-25 Halácsy Péter <peter@halacsy.com>
+    * AUTHORS, COPYING: set CC Attribution license
+
+2004-05-23 Varga Dániel <daniel@all.hu>
+    * src: - modifications for compiling with Visual C++
+
+    * src/hunmorph/csutil.cxx: correcting header of flag_qsort(),
+    * src/hunmorph/*: correct csutil include
+
+2005-05-19 Németh László <nemethl@gyorsposta.hu>
+    * csutil.cxx: fix loop condition in lineuniq()
+      bug reported by Viktor Nagy (nagyv nyelvtud hu).
+
+    * morphbase.cxx: handle PSEUDOROOT with zero affixes
+      bug reported by Viktor Nagy (nagyv nyelvtud hu).
+    * tests/zeroaffix.*: add zeroaffix tests
+
+2005-04-09 Németh László <nemethl@gyorsposta.hu>
+    * config.h.in: reset with autoheader
+
+    * src/hunspell/hunspell.cxx: set version
+
+2005-04-06 Németh László <nemethl@gyorsposta.hu>
+    * tests: tests
+
+    * src/morphbase:
+      New optional parameters in affix file:
+      - PSEUDOROOT: for forbidding root with not forbidden suffixed forms.
+      - COMPOUNDWORDMAX: max.
words in compounds (default is no limit)
+      - COMPOUNDROOT: signs compounds in dictionary for handling special compound rules
+      - remove COMPOUNDWORD, ONLYROOT
+
+2005-03-21 Németh László <nemethl@gyorsposta.hu>
+    * src/morphbase/*:
+      - 2-byte flags, FLAG_NUM, FLAG_LONG
+      - CIRCUMFIX: signed suffixes and prefixes can only occur together
+      - ONLYINCOMPOUND for fogemorphemes (Swedish, Danish) or Fuge-elements (German)
+      - COMPOUNDBEGIN: allow signed roots, and roots with signed suffix, at the beginning of compounds
+      - COMPOUNDMIDDLE: like before, but middle of compounds
+      - COMPOUNDEND: like before, but end of compounds
+      - remove COMPOUNDFIRST, COMPOUNDLAST
diff --git a/libs/hunspell/docs/ChangeLog.O b/libs/hunspell/docs/ChangeLog.O
new file mode 100644
index 0000000000..a2c712d73b
--- /dev/null
+++ b/libs/hunspell/docs/ChangeLog.O
@@ -0,0 +1,524 @@
+Myspell has a lot of parallel development, that is not documented here.
+
+2005-01-11: Németh László <nemethl@gyorsposta.hu>
+    * hunspell.cxx:
+      - add missing newline characters in interactive correction.
+        Bug reported by Gefferth András and Khiraly.
+    * csutil.cxx:
+      - remove semicolons to meet the requirements of the GCC 3.4 compiler
+        Bug reported by Dvornik László.
+      - remove the repeated declaration of variable i, which caused
+        compilation errors in some places.
+        Bug reported by Ldoktor and Bencsáth Boldizsár.
+    * OLVASS.EL:
+      - Langid.cxx must be modified when compiling on Windows. Bug
+        reported by Ldoktor.
+
+2004-12-15 Németh László <nemethl@gyorsposta.hu>
+    * src/morphbase/*:
+      - handling K&R morphological encoding (remove plus signs from output)
+      - LEMMA_PRESENT: put only morphological description to output
+      - LANG parameter, langnum variable in source for writing language-dependent codes
+      - remove HU_KOTOHANGZO
+      - etc.
+    * doc/hunspell.4:
+      - adding some
+
+2004-09-29 Halácsy Péter <peter@halacsy.com>
+
+    * doc/ : copied in the hunspell.1 and hunspell.4 man pages
+    * doc/hunspell.1: removed the part about the -s and -m options
+
+2004-09-28 Halácsy Péter <peter@halacsy.com>
+
+    * src/hunspell/hunspell.cxx (indexing_interface): removed this from
+      HunSpell because it does not belong here. It could be a separate program.
+      (main): also dropped the hunstem mode; it does not belong here either
+      (main): and the hunmorph mode as well
+
+    * src/morphbase/morphbase.cxx (MorphBase): renamed the MySpell
+      class to MorphBase
+      (stems): renamed the suggest_stems method to stem (as in "to stem")
+
+2004-08-25 Németh László <nemethl@gyorsposta.hu>
+    * src/hunbase/suggestmgr.cxx: restore stemming; attaching verbal
+      prefixes to the stem does not work yet, nor does exception
+      handling (this still requires the 0.99.4 dictionary).
+    * src/hunbase/myspell.cxx: restore -s for stemming
+    * src/hunbase/atypes.hxx: define the HUNSTEM macro here for the
+      conditional code in affixmgr.cxx
+
+2004-08-12 Halácsy Péter
+    * src/misc/lexfilter.cxx : new program for filtering dictionaries;
+      it can replace the current hunmorph and hunspell -G -1 functions
+
+    * src/hunbase/myspell.cxx (analyzer) : added a new method that
+      returns the analysis result in a character array
+
+2004-08-03 Halácsy Péter <peter@halacsy.com>
+
+    * src/hunspell/hunspell.cxx (HUNSPELL_VERSION): moved its definition here
+
+2004-07-31 Halácsy Péter <peter@halacsy.com>
+
+    * src/hunbase/suggestmgr.cxx (fixstems): Why is fixstems here,
+      and why is it called that? It could go into a separate class.
+
+2004-07-31 Halácsy Péter <peter@halacsy.com>
+
+    * src/huntoken/htmlparser.cxx: By the way, the include handling
+      is rather confusing. For example, why is textparser.hxx included here?
+
+    * src/huntoken/textparser.hxx (MAXLNLEN): moved the MAXLNLEN macro here
+      from atypes.hxx to remove the dependency
+
+    * src/hunbase/myspell.cxx (suggest): removed the part that returns the HUNSPELL_VERSION string
+      when the input string is VERSION_KEYWORD. I considered it an ugly hack.
+
+2004-07-27 Halácsy Péter <peter@halacsy.com>
+
+    * src/hunbase/myspell.cxx (morph_with_correction):
+
+    * src/hunbase/baseaffix.hxx (class AffEntry): made the morphcode field permanent (last htypes.hxx)
+
+    * src/hunbase/affentry.hxx: removed the hunmorph conditionals (last htypes.hxx)
+
+    * src/hunbase/htypes.hxx (struct hentry): removed the HUNMORPH conditional around char* description. I understand
+      that it is more efficient without a superfluous pointer when there is no morphological info, but I think it is unnecessary
+
+    * src/hunbase/myspell.hxx: removed the HUNSPELL_VERSION and VERSION_KEYWORD macros. hunspell
+      will need them for something
+
+    * src/hunbase/config.hxx (FLAG): config.hxx deleted, replaced by the central config.h; the FLAG
+      definition moved to atypes.hxx
+
+    * src/hunbase/atypes.hxx (FLAG): moved the definition of the FLAG macro, which depends on EXT_CLASS,
+      here in order to eliminate config.hxx
+
+      config.hxx include replaced with the config.h managed by configure
+
+2004-06-29: Németh László <nemethl@gyorsposta.hu>
+    * affixmgr.cxx:
+      - allow words permitted only as the last compound member (compound3)
+        to occur without a suffix (e.g. macskapár)
+      - morphological analysis of suffixed forms of multiply
+        compounded words
+    * myspell.cxx:
+      - morphological analysis of abbreviations, numbers, hyphenated compound
+        words and words containing the -e adverb
+    * suggestmgr.cxx: optimize suggest_morph_for_spelling_error()
+      (it looks up only the single suggestion that is used, no more).
+    * csutil.cxx: remove empty lines from the output
+
+2004-06-10: Németh László <nemethl@gyorsposta.hu>
+    * suggestmgr.cxx: limit the analysis of compound words
+      - stemming is not yet implemented in version 0.9.9
+        (use Hunspell 0.9.7 with version 0.99.4 of Magyar Ispell
+        instead)
+
+2004-05-19: Németh László <nemethl@gyorsposta.hu>
+    * 0.9.9f-alpha
+
+      - fix string handling of morphological descriptions
+      - EXT_CLASS: in config.cxx
+      - uppercase forms are analysed too (bug reported by Trón Viktor)
+      - nicer output
+      - rule119 removed
+      - firstparser.cxx fixed
+
+2004-02-13: Németh László <nemethl@gyorsposta.hu>
+    * 0.9.8a:
+      - USERWORD instead of MAXUSERWORD, no limit
+      - description in the dic file, separated with \t
+      - homonym handling
+      - aff format extension
+      - confixes
+      - _morpho functions
+      - double suffixes
+      - hunmorph
+      - see tests/hunmorph
+
+2004-01-29: Németh László <nemethl@gyorsposta.hu>
+    * 0.9.7-sztaki:
+      - fix memory handling bugs
+
+2003-12-17: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.7:
+    * affixmgr.cxx:
+      - suffix_check() fix (replace tmpword with the isRevSubSet()
+        function)
+      - optimize loading; instead of build_pfxlist():
+        - build_pfxtree()
+        - process_sfx_tree_to_list(), process_sfx_inorder()
+
+    * csutil.cxx:
+      - faster version of isSubSet()
+      - isRevSubSet()
+
+    * langid.cxx, hunp.cxx:
+      - language identification class and program (see man hunp)
+    * man/hunp.1:
+      - description of the language identification program
+
+    * firstparser.cxx:
+      - returns only the part before the tab character, and only from
+        lines that contain a tab (see man Hunspell, -1 option)
+
+    * hunspell.cxx:
+      - -u, -U, -u2 options: display typical errors;
+        correct them automatically or with review. See man hunspell.
+
+      - -w option for checking whole lines
+
+    * hunspell.cxx:
+      - spell(): fix (based on a Valgrind error report)
+
+    * hunspell.cxx: add strlen() checks before sprintf() calls
+
+    * suggestmgr.cxx:
+      - fix a stemming bug introduced with the 0.99.4 Hunspell dictionary
+        (the problem occurred with non-productively inflected nouns
+        inside compound words).
+
+    * OLVASS.EL:
+      - extended
+
+2003-11-03: Németh László <nemethl@gyorsposta.hu>
+    * SuggestMgr::permute_accent():
+      - fixed an illegal memory read.
+    * example.cxx::
+      - double free() after stemming the "" string
+
+      The bugs were discovered by Sarlós Tamás <stamas@csillag.ilab.sztaki.hu>
+      with the remarkable Valgrind debugging
+      tool (http://developer.kde.org/~sewardj/)
+
+2003-10-22: Bencsáth Boldizsár <boldi@datacontact.hu>
+    * affixmgr.[ch]xx, csutil.[ch]xx: Applied the patches of the
+      original MySpell for compatibility with OpenOffice.org 1.1.
+      Character-handling helper functions were moved here
+      to a more accessible place.
+
+    * dictmgr.[ch]xx: Added the etype parameter.
+
+    * makefile.mk: Commented out the English dictionaries.
+
+2003-10-04: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.6.3:
+    * myspell.cxx: fixed a faulty memory allocation in the
+      suggest() function. The bug appeared while generating
+      suggestions for misspelled words ending in a
+      period. The malfunction was reported by Khiraly
+      <khiraly@gmx.net>.
+
+2003-09-15: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.6.2:
+    * latexparser.cxx: TeX parser fixes:
+      - parsing error ({{}}})
+      - handling of verb+ +, \verb etc.
+
+2003-09-01: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.6:
+
+    * affentry.cxx: check2 removed, possible
+      stems are stored
+    * suggestmgr.cxx, myspell.cxx: suggest_pos_stems()
+      for stripping the nominal case endings and
+      markers of unknown words.
+
+    * affixmgr.cxx, suggestmgr.cxx: suggest_stems()
+      function modified and fixed for thread handling
+
+    * myspell.cxx: stemming of numbers (test: 5-nek)
+
+    * myspell.cxx: removed "single character + word" suggestions
+      (for example cpak->cpa k)
+
+    * affixmgr.cxx, myspell.cxx, hunspell.cxx: print the
+      dictionary version number
+
+    * hunspell.cxx: correct display of lines containing
+      the \r character
+
+    * myspell.cxx, hunspell.cxx: abbreviation-final periods are
+      added at the library level
+
+    * hunspell.cxx: pipe_interface(): added the missing memory
+      release when stemming standard input
+
+    * Makefile: install fixed, more condition checks,
+      deinstall section
+
+2003-07-22: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.5
+    * suggestmgr.cxx: fixed the marhalevél->lelevél stemming
+    * myspell.cxx: checking of capitalized abbreviations (Bp., Btk.)
+      - numbers containing periods are accepted as correct if:
+        - the first period is preceded by at least one,
+        - but at most three digits,
+        - the periods are not adjacent to each other,
+        - there are at most two digits after the last period.
+        This accepts times of day (12.00-kor) and section numbering
+        (1.1.2-ben), but rules out incorrect dates written without spaces
+        (2003.7.22.), as well as decimal fractions written with a period
+        instead of a decimal comma (3.456, 4563.34).
+      - Fix for hyphenated suggestions given for forbidden words:
+        Straussal->Strauss-szal, and not ,,Strauss szal''.
+    * hunspell.cxx: the pipe interface commands are active only when
+      the -a option is given. As a result, lines starting with a hyphen,
+      for example, are no longer ignored when searching a file for
+      misspelled words with the -l
+      option.
+    * man/hunspell.1: extended the description of the -a option.
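The acceptance rule for numbers containing periods in the 2003-07-22 entry can be sketched as follows. This is a minimal illustration in Python, not the code in myspell.cxx: the function name `accepts_dotted_number` is hypothetical, the sketch looks only at the numeric part of a token (a suffix such as "-kor" is assumed to be stripped beforehand), and it takes the three stated conditions literally.

```python
def accepts_dotted_number(token: str) -> bool:
    """Apply the three changelog conditions to a token of digits and periods.

    Hypothetical helper for illustration only; not part of Hunspell.
    """
    parts = token.split(".")
    if len(parts) < 2:
        return False  # no period at all: the rule does not apply
    # Only ASCII digits may appear between the periods.
    if any(p and not (p.isascii() and p.isdigit()) for p in parts):
        return False
    # Condition 1: one to three digits before the first period.
    if not 1 <= len(parts[0]) <= 3:
        return False
    # Condition 2: periods are not adjacent (no empty group between two periods).
    if any(p == "" for p in parts[1:-1]):
        return False
    # Condition 3: at most two digits after the last period.
    return len(parts[-1]) <= 2
```

Under these assumptions the examples from the entry behave as described: "12.00" and "1.1.2" pass, while "2003.7.22.", "3.456" and "4563.34" are rejected.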
+
+2003-06-13: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.4
+    * bin/*: makedb, lookdb helper programs for indexing
+    * man/*: hunstem, makedb, lookdb
+    * hunspell.cxx: pipe_interface: removed the debug output
+      - LOG moved into an #ifdef
+
+2003-06-11: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.3
+    * suggestmgr.cxx: capitalized suggestion for proper nouns
+    * hunspell.cxx: pipe_interface: bug fix
+
+2003-06-05: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.2
+    * hunspell.cxx: -s option
+    * suggestmgr.cxx: suggest_stems()
+      Generation of word stems
+    * example.cxx: example of generating word stems
+
+2003-05-13: Németh László <nemethl@gyorsposta.hu>
+    * version 0.9.1
+    * hunspell.cxx:
+      - rl_escape(), etc.: file name completion has been disabled
+        in the readline line; instead, pressing Escape twice
+        aborts text entry. The Replace operation now also
+        uses the readline() call.
+      - a faulty sprintf() line was fixed
+    * Makefile.unix:
+      - settings are separated at the top of the file
+      - Makefile is now a symbolic link
+    * ooomagyarispellteszt.txt: test file
+
+2003-04-28: Németh László <nemethl@gyorsposta.hu>
+    * affixmgr.cxx:
+      - handling of words ending in y: see the Changelog file of
+        Magyar Ispell for a fuller description.
+
+    * *parser.cxx:
+      Recognition and handling of the letter-valued ISO-8859-1 HTML
+      character entities (only those not present in
+      ISO-8859-2).
+
+2003-04-21: Goldman Elenonra <eleonora46@gmx.net>
+    * Building the *.dll library under Windows:
+      - StdAfx.h
+      - libmyspell.def
+      - dlltest.cpp
+
+2003-04-16: Németh László <nemethl@gyorsposta.hu>
+    * Hunspell.cxx, etc.: Mispell renamed to Hunspell.
+      The name was suggested by Kornai András <andras@kornai.com>.
+      Directories: /usr/share/mispell -> /usr/share/myspell
+      (as it was earlier).
+      The /usr/share/hunmorph dictionary directory is the place for
+      Hunmorph (extended Myspell dictionary format) dictionary files
+      that contain special morphological information.
+    * License: LGPL
+    * config.hxx: SZOSZABLYA_POSSIBLE_ROOTS
+      If the macro is enabled, the possible stems are printed
+      as well, together with the class letter of the applied
+      inflection rule and the base word.
+
+2003-04-10: Németh László <nemethl@gyorsposta.hu>:
+    * affixmgr.cxx:
+      - correct handling of linking vowels (with the hu_kotohangzo option),
+        see also the Magyar Ispell Changelog
+
+2003-03-24: Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: pipe_interface(): eliminated the memory leak that occurred
+      while filtering the data file, by adding the omitted free(token)
+    * affixmgr.cxx: prefix_check(): check for the leg-, legesleg- confixes
+      - onlyroot flag for forbidding that affects only the root word. See Magyar Ispell.
+        A new flag can be given in the affix file, introduced by the
+        ONLYROOT command. The flag modifies the behaviour of the
+        forbidding flag. See man 4 mispell
+    * myspell.cxx:
+      - spell(): checking of all-uppercase proper nouns (e.g. BALATON)
+      - onlyroot check alongside forbiddenword -> handling of mangrove
+
+2003-03-17: Goldman Elenonra <eleonora46@gmx.net>
+    * Windows port
+    * makefile.Windows:
+
+2003-03-04: Németh László <nemethl@gyorsposta.hu>
+    * firstparser.[ch]xx: for filtering data files (see the -1 option)
+    * mispell.cxx: -L, -1, -G options
+    * man/mispell.1: -L, -1, -G options
+
+2003-03-03: Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: -l, -p, WORDLIST
+    * man/mispell.1: -l, -p, WORDLIST
+
+2003-02-26: Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: dialog_screen():
+      TILTOTT! (FORBIDDEN!) is displayed for forbidden
+      compounds.
+    * suggestmgr.cxx:
+      - check(): removed the code affecting participles formed with -ó/-ő
+      - check_forbidden(): suggestions for forbidden stems longer than 6 syllables
+        contain the words separated by a space rather than a hyphen;
+        check_forbidden() is needed for this.
+    * man/*: new manual page on the file formats (mispell(4)),
+      mispell(1) extended.
+    * Makefile, mispell.mo: fixes by Bíró Árpád <biro_arpad@yahoo.com>
+
+2003-02-18: Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: interactive_interface()
+      - no longer swallows the characters at the boundaries of MAXLNLEN-sized
+        chunks in lines longer than MAXLNLEN, nor the last character
+        of files that do not end in a newline. (It still reports an error,
+        however, when the MAXLNLEN boundary splits an otherwise correct word.)
+        MAXLNLEN is currently 8192 characters.
+      - the readline library is used for input
+      - when a stem is added, a possible stem is generated and shown in the
+        input line. The word shown this way can be edited.
+      - --help option
+    * Makefile: Fixes in the install section.
+      The bugs were reported by Bíró Árpád <biro_arpad@yahoo.com>.
+
+2003-02-07: Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: put_dots_to_suggestions()
+      - realloc() replaced with malloc() because of a freeze of unknown origin.
+      - as in Ispell, flags can be given by hand in the personal
+        dictionary, after the word and following a slash: for example, after
+        adding the line
+        valamicsúnyaszó/w
+        the word valamicsúnyaszó and its affixed forms are treated as
+        misspelled during checking. (For descriptions of further flags see the
+        aff/aff.fej file in the Magyar Ispell source.)
+    * affixmgr.cxx: compound_check()
+      - the repl_chars() call was moved to the proper place, which
+        doubled the speed of suggestion generation.
+      - Replacing dynamic memory management with stack memory did not bring
+        a significant speedup, but in the near future it can avoid memory
+        leaks such as the one in the forbidden-word handling here
+        in the previous version (fixed).
+    * affentry.cxx, affixmgr.cxx: groundwork for the stem-generating code:
+      the get_possible_root() method returns the result of the last
+      affix stripping.
+
+2003-02-05: Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: put_dots_to_suggestions(): if
+      the recognized word ends in one or more periods, the
+      suggestions are extended with them as well.
+      - checking disabled for words that contain @ or more than one period
+        (but do not end in one) (e-mail addresses, file names; not yet optional).
+      - Correct display of long lines.
+      - Correct display of lines containing tab characters.
+      - When an acronym is added as a stem, its hyphenated form is recorded automatically.
+        E.g. besides BKV//URH, BKV-//URH- is also added to the personal dictionary
+        (so inflected acronyms are recognized automatically, except for the
+        non-trivial forms with the -val/-vel suffix, which must be added separately.)
+      - PuT removed (replaced by the MySpell::put_word(), put_word_suffix(),
+        put_word_pattern() procedures for extending the personal dictionary)
+      - duplicate-word checking removed from the MySpell code (to be moved into the
+        Mispell interface later), so that MySpell remains callable from
+        concurrently running threads.
+
+2002-12-30: Németh László <nemethl@gyorsposta.hu>
+    * *parser.cxx, *parser.hxx: parser classes in place of the old and ugly code
+
+2002-12-10: Németh László <nemethl@gyorsposta.hu>
+    * myspell.cxx: handling of 35-os, 20%-kal
+    * man/mispell.1: manual
+
+2002-12-04: Noll János <johnzero@johnzero.hu>
+    * spdaemon/: server interface, see README.spdaemon
+
+2002-12-04: Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: bug fixes for Emacs compatibility (e.g. multiple -d)
+    * mispell.cxx: the interactive interface + locale can be disabled with the CURSES macro
+      (Windows, Macintosh)
+
+2002-11-30: Németh László <nemethl@gyorsposta.hu>
+    * affixmgr.cxx: get_checkdoublewords()
+
+2002-11-25: Németh László <nemethl@gyorsposta.hu>
+    * affixmgr.cxx: moving rule (hu_mov_rule)
+    * myspell.cxx: moving rule
+    * affixmgr.cxx: kitljnekmacskt (affix also inside compounds, if it is a prefix)
+
+2002-11-08 Németh László <nemethl@gyorsposta.hu>
+    * myspell.cxx: balatonnak->Balatonnak, balatoninak
+
+2002-11-07 Németh László <nemethl@gyorsposta.hu>
+    * myspell: version 0.6
+
+2002-10-31 Németh László <nemethl@gyorsposta.hu>
+    * Simpler name: after Magyar MySpell 0.5 -> MIspell 0.6
+    * mispell.cxx: multilingual interactive interface (ncurses, locale)
+    * Makefile: make install
+
+2002-09-22 Németh László <nemethl@gyorsposta.hu>
+    * affixmgr.cxx: compound_check(): fixed macskaugom->macskaugrom, etc.
+    * affixmgr.cxx: compound_check(): word repetition (e.g.
macskamacska) forbidden
+    * myspell.cxx: word repetition forbidden (e.g. kutya kutya): the second one is wrong
+    * suggestmgr.cxx: besides macskarat->macska rat, also ->macskairat
+
+2002-07-29 Németh László <nemethl@gyorsposta.hu>
+    * mispell for Windows, tested with Emacs ("Emacs-szel", or "Emacs-csal")
+    * does not suggest forbidden words, and does not accept them inside compounds either
+    * rejection of fonev_morfo, fonev_morfo2 pseudo-stems (házakmacska)
+    * handling of hyphenated words
+    * handling of numbers, together with their hyphenated forms; CHECKNUM option
+
+2002-07-17 Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: MySpell Ispell pipe interface
+
+2002-07-04 Németh László <nemethl@gyorsposta.hu>
+    * mispell.cxx: MySpell Ispell pipe interface
+    * affxmgr.cxx: filtering out word-like forms,
+    * new features:
+      COMPOUNDFIRST: the word may appear as the first member of compounds
+      COMPOUNDLAST: the word may appear as the last member of compounds
+      FORBIDDENWORD: flag for forbidden words (ut, uta, etc.)
+
+2002-06-25 Németh László <nemethl@gyorsposta.hu>
+    * myspell.cxx, suggestmgr.cxx: get_compound(): char instead of char*
+    * affxmgr.cxx: check_repl() for filtering out compounds that look
+      correct but are wrong (e.g. tejles, szervíz)
+      Before a compound is accepted, we also check whether the word
+      contains an error recorded in the replacement table;
+      if so, the word counts as misspelled even though it is a valid compound.
+    * affxmgr.cxx, suggestmgr.xx: accent: adding accents.
+      Description: README.accent
+      Further optimization: depending on the number of accented
+      variants of the unaccented letter
+
+2002-06-05 Noll János <johnzero@johnzero.hu>
+    * myspell.cxx, suggestmgr.cxx: memory leak fixed
+      (get_compound() was called without freeing the result).
+      The bug was detected with GNU mtrace.
+
+2002-06-03 Németh László <nemethl@gyorsposta.hu>
+    * License: GPL
+    * See MYSPELL.HU
+    * compound_check: 6-3 rule, etc.
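The check_repl() step from the 2002-06-25 entry (reject a compound that looks valid but matches an entry of the REP replacement table) might be sketched like this. It is an illustration in Python rather than the C++ in the affix manager. All names (`check_repl` as a free function, `accept_compound`, `is_known_word`, `is_valid_compound`) are hypothetical, and the sketch assumes that a match counts only when applying the replacement yields a known word; the entry itself says only that the word "contains an error recorded in the replacement table".

```python
from typing import Callable, Iterable, Tuple


def check_repl(
    word: str,
    rep_table: Iterable[Tuple[str, str]],
    is_known_word: Callable[[str], bool],
) -> bool:
    """Return True if some REP pair turns `word` into a known word.

    Assumption: each (wrong, right) pair is tried at every position
    where `wrong` occurs in the word.
    """
    for wrong, right in rep_table:
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if is_known_word(candidate):
                return True
            start = word.find(wrong, start + 1)
    return False


def accept_compound(
    word: str,
    rep_table: Iterable[Tuple[str, str]],
    is_known_word: Callable[[str], bool],
    is_valid_compound: Callable[[str], bool],
) -> bool:
    """Accept a compound only if it is valid and not a listed misspelling."""
    return is_valid_compound(word) and not check_repl(
        word, list(rep_table), is_known_word
    )
```

With a toy dictionary containing "szerv", "íz" and "szerviz" and the REP pair ("íz", "iz"), the form "szervíz" is a valid two-member compound but is still rejected, because the replacement produces the known word "szerviz".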
+
+MySpell:
+
+2002-xx-xx Kevin Hendricks <kevin.hendricks@sympatico.ca>
+    * REP: duplicates filtered out of the suggestions
+    * COMPOUNDMIN
+
+2002-xx-xx Németh László <nemethl@gyorsposta.hu>
+    * REP replacement table
+    * COMPOUND, compound word formation
+
+2002-xx-xx David Einstein <Deinst@world.std.com>
+    * optimized MySpell algorithm
+
+2001-xx-xx Kevin Hendricks <kevin.hendricks@sympatico.ca>
+    * Working checker with the Ispell affix compression algorithm
diff --git a/libs/hunspell/docs/HACKING b/libs/hunspell/docs/HACKING
new file mode 100644
index 0000000000..e65da70e10
--- /dev/null
+++ b/libs/hunspell/docs/HACKING
@@ -0,0 +1,10 @@
+To bump a release
+
+1. edit...
+ a) ./configure.ac
+ b) ./msvc/Hunspell.rc
+ c) ./msvc/config.h
+and convert release string X.Y.Z/X,Y,Z to the next version
+
+2. autoconf && ./configure && make
+and the various strings will get updated by the build tooling
diff --git a/libs/hunspell/docs/INSTALL b/libs/hunspell/docs/INSTALL
new file mode 100644
index 0000000000..2099840756
--- /dev/null
+++ b/libs/hunspell/docs/INSTALL
@@ -0,0 +1,370 @@
+Installation Instructions
+*************************
+
+Copyright (C) 1994-1996, 1999-2002, 2004-2013 Free Software Foundation,
+Inc.
+
+   Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.  This file is offered as-is,
+without warranty of any kind.
+
+Basic Installation
+==================
+
+   Briefly, the shell command `./configure && make && make install'
+should configure, build, and install this package.  The following
+more-detailed instructions are generic; see the `README' file for
+instructions specific to this package.  Some packages provide this
+`INSTALL' file but do not implement all of the features documented
+below.  The lack of an optional feature in a given package is not
+necessarily a bug.
More recommendations for GNU packages can be found +in *note Makefile Conventions: (standards)Makefile Conventions. + + The `configure' shell script attempts to guess correct values for +various system-dependent variables used during compilation. It uses +those values to create a `Makefile' in each directory of the package. +It may also create one or more `.h' files containing system-dependent +definitions. Finally, it creates a shell script `config.status' that +you can run in the future to recreate the current configuration, and a +file `config.log' containing compiler output (useful mainly for +debugging `configure'). + + It can also use an optional file (typically called `config.cache' +and enabled with `--cache-file=config.cache' or simply `-C') that saves +the results of its tests to speed up reconfiguring. Caching is +disabled by default to prevent problems with accidental use of stale +cache files. + + If you need to do unusual things to compile the package, please try +to figure out how `configure' could check whether to do them, and mail +diffs or instructions to the address given in the `README' so they can +be considered for the next release. If you are using the cache, and at +some point `config.cache' contains results you don't want to keep, you +may remove or edit it. + + The file `configure.ac' (or `configure.in') is used to create +`configure' by a program called `autoconf'. You need `configure.ac' if +you want to change it or regenerate `configure' using a newer version +of `autoconf'. + + The simplest way to compile this package is: + + 1. `cd' to the directory containing the package's source code and type + `./configure' to configure the package for your system. + + Running `configure' might take a while. While running, it prints + some messages telling which features it is checking for. + + 2. Type `make' to compile the package. + + 3. 
Optionally, type `make check' to run any self-tests that come with
+     the package, generally using the just-built uninstalled binaries.
+
+  4. Type `make install' to install the programs and any data files and
+     documentation.  When installing into a prefix owned by root, it is
+     recommended that the package be configured and built as a regular
+     user, and only the `make install' phase executed with root
+     privileges.
+
+  5. Optionally, type `make installcheck' to repeat any self-tests, but
+     this time using the binaries in their final installed location.
+     This target does not install anything.  Running this target as a
+     regular user, particularly if the prior `make install' required
+     root privileges, verifies that the installation completed
+     correctly.
+
+  6. You can remove the program binaries and object files from the
+     source code directory by typing `make clean'.  To also remove the
+     files that `configure' created (so you can compile the package for
+     a different kind of computer), type `make distclean'.  There is
+     also a `make maintainer-clean' target, but that is intended mainly
+     for the package's developers.  If you use it, you may have to get
+     all sorts of other programs in order to regenerate files that came
+     with the distribution.
+
+  7. Often, you can also type `make uninstall' to remove the installed
+     files again.  In practice, not all packages have tested that
+     uninstallation works correctly, even though it is required by the
+     GNU Coding Standards.
+
+  8. Some packages, particularly those that use Automake, provide `make
+     distcheck', which can be used by developers to test that all other
+     targets like `make install' and `make uninstall' work correctly.
+     This target is generally not run by end users.
+
+Compilers and Options
+=====================
+
+   Some systems require unusual options for compilation or linking that
+the `configure' script does not know about.
Run `./configure --help'
+for details on some of the pertinent environment variables.
+
+   You can give `configure' initial values for configuration parameters
+by setting variables in the command line or in the environment.  Here
+is an example:
+
+     ./configure CC=c99 CFLAGS=-g LIBS=-lposix
+
+   *Note Defining Variables::, for more details.
+
+Compiling For Multiple Architectures
+====================================
+
+   You can compile the package for more than one kind of computer at the
+same time, by placing the object files for each architecture in their
+own directory.  To do this, you can use GNU `make'.  `cd' to the
+directory where you want the object files and executables to go and run
+the `configure' script.  `configure' automatically checks for the
+source code in the directory that `configure' is in and in `..'.  This
+is known as a "VPATH" build.
+
+   With a non-GNU `make', it is safer to compile the package for one
+architecture at a time in the source code directory.  After you have
+installed the package for one architecture, use `make distclean' before
+reconfiguring for another architecture.
+
+   On MacOS X 10.5 and later systems, you can create libraries and
+executables that work on multiple system types--known as "fat" or
+"universal" binaries--by specifying multiple `-arch' options to the
+compiler but only a single `-arch' option to the preprocessor.  Like
+this:
+
+     ./configure CC="gcc -arch i386 -arch x86_64 -arch ppc -arch ppc64" \
+                 CXX="g++ -arch i386 -arch x86_64 -arch ppc -arch ppc64" \
+                 CPP="gcc -E" CXXCPP="g++ -E"
+
+   This is not guaranteed to produce working output in all cases; you
+may have to build one architecture at a time and combine the results
+using the `lipo' tool if you have problems.
+
+Installation Names
+==================
+
+   By default, `make install' installs the package's commands under
+`/usr/local/bin', include files under `/usr/local/include', etc.
You +can specify an installation prefix other than `/usr/local' by giving +`configure' the option `--prefix=PREFIX', where PREFIX must be an +absolute file name. + + You can specify separate installation prefixes for +architecture-specific files and architecture-independent files. If you +pass the option `--exec-prefix=PREFIX' to `configure', the package uses +PREFIX as the prefix for installing programs and libraries. +Documentation and other data files still use the regular prefix. + + In addition, if you use an unusual directory layout you can give +options like `--bindir=DIR' to specify different values for particular +kinds of files. Run `configure --help' for a list of the directories +you can set and what kinds of files go in them. In general, the +default for these options is expressed in terms of `${prefix}', so that +specifying just `--prefix' will affect all of the other directory +specifications that were not explicitly provided. + + The most portable way to affect installation locations is to pass the +correct locations to `configure'; however, many packages provide one or +both of the following shortcuts of passing variable assignments to the +`make install' command line to change installation locations without +having to reconfigure or recompile. + + The first method involves providing an override variable for each +affected directory. For example, `make install +prefix=/alternate/directory' will choose an alternate location for all +directory configuration variables that were expressed in terms of +`${prefix}'. Any directories that were specified during `configure', +but not in terms of `${prefix}', must each be overridden at install +time for the entire installation to be relocated. The approach of +makefile variable overrides for each directory variable is required by +the GNU Coding Standards, and ideally causes no recompilation. 
+However, some platforms have known limitations with the semantics of +shared libraries that end up requiring recompilation when using this +method, particularly noticeable in packages that use GNU Libtool. + + The second method involves providing the `DESTDIR' variable. For +example, `make install DESTDIR=/alternate/directory' will prepend +`/alternate/directory' before all installation names. The approach of +`DESTDIR' overrides is not required by the GNU Coding Standards, and +does not work on platforms that have drive letters. On the other hand, +it does better at avoiding recompilation issues, and works well even +when some directory options were not specified in terms of `${prefix}' +at `configure' time. + +Optional Features +================= + + If the package supports it, you can cause programs to be installed +with an extra prefix or suffix on their names by giving `configure' the +option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'. + + Some packages pay attention to `--enable-FEATURE' options to +`configure', where FEATURE indicates an optional part of the package. +They may also pay attention to `--with-PACKAGE' options, where PACKAGE +is something like `gnu-as' or `x' (for the X Window System). The +`README' should mention any `--enable-' and `--with-' options that the +package recognizes. + + For packages that use the X Window System, `configure' can usually +find the X include and library files automatically, but if it doesn't, +you can use the `configure' options `--x-includes=DIR' and +`--x-libraries=DIR' to specify their locations. + + Some packages offer the ability to configure how verbose the +execution of `make' will be. For these packages, running `./configure +--enable-silent-rules' sets the default to minimal output, which can be +overridden with `make V=1'; while running `./configure +--disable-silent-rules' sets the default to verbose, which can be +overridden with `make V=0'. 
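The interaction between the configure-time default and the `make V=...' override described in the last paragraph above can be modelled in a few lines. This is only a sketch of the stated rule in Python, not anything that Automake ships; the function name `build_is_verbose` and its parameters are hypothetical.

```python
from typing import Optional


def build_is_verbose(silent_rules_enabled: bool, make_v: Optional[str] = None) -> bool:
    """Model of the rule: `make V=1' / `make V=0' override the configured default.

    silent_rules_enabled: True for --enable-silent-rules,
                          False for --disable-silent-rules.
    make_v: the value passed as V on the make command line, if any.
    """
    if make_v == "1":
        return True   # explicit request for verbose output
    if make_v == "0":
        return False  # explicit request for minimal output
    # No override given: fall back to the default chosen at configure time.
    return not silent_rules_enabled
```

For instance, a tree configured with `--enable-silent-rules' builds quietly unless `make V=1' is used.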
+ +Particular systems +================== + + On HP-UX, the default C compiler is not ANSI C compatible. If GNU +CC is not installed, it is recommended to use the following options in +order to use an ANSI C compiler: + + ./configure CC="cc -Ae -D_XOPEN_SOURCE=500" + +and if that doesn't work, install pre-built binaries of GCC for HP-UX. + + HP-UX `make' updates targets which have the same time stamps as +their prerequisites, which makes it generally unusable when shipped +generated files such as `configure' are involved. Use GNU `make' +instead. + + On OSF/1 a.k.a. Tru64, some versions of the default C compiler cannot +parse its `<wchar.h>' header file. The option `-nodtk' can be used as +a workaround. If GNU CC is not installed, it is therefore recommended +to try + + ./configure CC="cc" + +and if that doesn't work, try + + ./configure CC="cc -nodtk" + + On Solaris, don't put `/usr/ucb' early in your `PATH'. This +directory contains several dysfunctional programs; working variants of +these programs are available in `/usr/bin'. So, if you need `/usr/ucb' +in your `PATH', put it _after_ `/usr/bin'. + + On Haiku, software installed for all users goes in `/boot/common', +not `/usr/local'. It is recommended to use the following options: + + ./configure --prefix=/boot/common + +Specifying the System Type +========================== + + There may be some features `configure' cannot figure out +automatically, but needs to determine by the type of machine the package +will run on. Usually, assuming the package is built to be run on the +_same_ architectures, `configure' can figure that out, but if it prints +a message saying it cannot guess the machine type, give it the +`--build=TYPE' option. TYPE can either be a short name for the system +type, such as `sun4', or a canonical name which has the form: + + CPU-COMPANY-SYSTEM + +where SYSTEM can have one of these forms: + + OS + KERNEL-OS + + See the file `config.sub' for the possible values of each field. 
If +`config.sub' isn't included in this package, then this package doesn't +need to know the machine type. + + If you are _building_ compiler tools for cross-compiling, you should +use the option `--target=TYPE' to select the type of system they will +produce code for. + + If you want to _use_ a cross compiler, that generates code for a +platform different from the build platform, you should specify the +"host" platform (i.e., that on which the generated programs will +eventually be run) with `--host=TYPE'. + +Sharing Defaults +================ + + If you want to set default values for `configure' scripts to share, +you can create a site shell script called `config.site' that gives +default values for variables like `CC', `cache_file', and `prefix'. +`configure' looks for `PREFIX/share/config.site' if it exists, then +`PREFIX/etc/config.site' if it exists. Or, you can set the +`CONFIG_SITE' environment variable to the location of the site script. +A warning: not all `configure' scripts look for a site script. + +Defining Variables +================== + + Variables not defined in a site shell script can be set in the +environment passed to `configure'. However, some packages may run +configure again during the build, and the customized values of these +variables may be lost. In order to avoid this problem, you should set +them in the `configure' command line, using `VAR=value'. For example: + + ./configure CC=/usr/local2/bin/gcc + +causes the specified `gcc' to be used as the C compiler (unless it is +overridden in the site shell script). + +Unfortunately, this technique does not work for `CONFIG_SHELL' due to +an Autoconf limitation. Until the limitation is lifted, you can use +this workaround: + + CONFIG_SHELL=/bin/bash ./configure CONFIG_SHELL=/bin/bash + +`configure' Invocation +====================== + + `configure' recognizes the following options to control how it +operates. + +`--help' +`-h' + Print a summary of all of the options to `configure', and exit. 
+ +`--help=short' +`--help=recursive' + Print a summary of the options unique to this package's + `configure', and exit. The `short' variant lists options used + only in the top level, while the `recursive' variant lists options + also present in any nested packages. + +`--version' +`-V' + Print the version of Autoconf used to generate the `configure' + script, and exit. + +`--cache-file=FILE' + Enable the cache: use and save the results of the tests in FILE, + traditionally `config.cache'. FILE defaults to `/dev/null' to + disable caching. + +`--config-cache' +`-C' + Alias for `--cache-file=config.cache'. + +`--quiet' +`--silent' +`-q' + Do not print messages saying which checks are being made. To + suppress all normal output, redirect it to `/dev/null' (any error + messages will still be shown). + +`--srcdir=DIR' + Look for the package's source code in directory DIR. Usually + `configure' can determine that directory automatically. + +`--prefix=DIR' + Use DIR as the installation prefix. *note Installation Names:: + for more details, including other options available for fine-tuning + the installation locations. + +`--no-create' +`-n' + Run the configure checks, but stop before creating any output + files. + +`configure' also accepts some other, not widely useful, options. Run +`configure --help' for more details. diff --git a/libs/hunspell/docs/NEWS b/libs/hunspell/docs/NEWS new file mode 100644 index 0000000000..8422a6f030 --- /dev/null +++ b/libs/hunspell/docs/NEWS @@ -0,0 +1,705 @@ +2017-09-03: Hunspell 1.6.2 release: + - Library changes: no. Same as 1.6.1. + - Command line tool: + - Added German translation + - Fixed bug with wrong output encoding, not respecting system locale. + +2017-03-25: Hunspell 1.6.1 release: + - Library changes: + - Performance improvements in suggest() + - Fixes regressions for Hungarian related to compounding. + - Fixes regressions for Korean related to ICONV. 
+  - Command line tool:
+    - Added Tajik translation
+    - Fix regarding searching of OOo dicts installed in user folder
+  - Manpages:
+    - Fix microsoft-cp1251 to cp1251. Dicts should not use the former.
+    - Typos.
+
+2016-12-22: Hunspell 1.6.0 release:
+  - Library changes:
+    - Performance improvement in ngsuggest(), suggestions should be faster.
+    - Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
+    - MAXWORDLEN can be set during build time with -D defines.
+    - Fix crash when word with 102 consecutive X is spelled.
+  - Command line tool:
+    - -D shows all loaded dictionaries instead of only the first.
+    - -D properly lists all available dictionaries on Windows.
+
+2016-11-30: Hunspell 1.5.4 release:
+  - Fixes the command COMPOUNDSYLLABLE used in Hungarian dictionary.
+
+2016-11-28: Hunspell 1.5.3 release:
+  - Removed a #include from hunspell.hxx that was creating trouble
+
+2016-11-27: Hunspell 1.5.2 release:
+  - Reverted full backward compatibility with 1.4 public API, again
+
+2016-11-27: Hunspell 1.5.1 release:
+  - Reverted full backward compatibility with 1.4 public API
+
+2016-11-18: Hunspell 1.5.0 release:
+  - Many stability fixes
+  - Fixed compilation errors on various systems (Windows, FreeBSD)
+  - Small performance improvement compared to 1.4.0
+  - The C++ API is updated to use modern C++ types (string, vector).
+    Backward compatibility is kept for most of the functions except for
+    the following:
+    - get_wordchars();
+    - get_version();
+    - input_conv(string, string);
+    - removed get_csconv();
+
+2016-04-15: Hunspell 1.4.0 release:
+  - various ABI changes due to moving away from char* to std::string
+
+2014-06-02: Hunspell 1.3.3 release:
+  - OpenDocument (ODF and Flat ODF) support (ODF needs unzip program)
+  - various bug fixes
+
+2011-02-02: Hunspell 1.3.2 release:
+  - fix library versioning
+  - improved manual
+
+2011-02-02: Hunspell 1.3.1 release:
+  - bug fixes
+
+2011-01-26: Hunspell 1.2.15/1.3 release:
+  - new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
+  - bug fixes
+
+2011-01-21:
+  - new features: FORCEUCASE and WARN, see manual
+  - new options: -r to filter potential mistakes (rare words
+    signed by flag WARN in the dictionary)
+  - limited and optimized suggestions
+
+2011-01-06: Hunspell 1.2.14 release:
+  - bug fix
+2011-01-03: Hunspell 1.2.13 release:
+  - bug fixes
+  - improved compound handling and
+    other improvements supported by OpenTaal Foundation, Netherlands
+2010-07-15: Hunspell 1.2.12 release
+2010-05-06: Hunspell 1.2.11 release:
+  - Maintenance release bug fixes
+2010-04-30: Hunspell 1.2.10 release:
+  - Maintenance release bug fixes
+2010-03-03: Hunspell 1.2.9 release:
+  - Maintenance release bug fixes and warnings
+  - MAP support for composed characters or character sequences
+2008-11-01: Hunspell 1.2.8 release:
+  - Default BREAK feature and better hyphenated word suggestion to accept
+    and fix (compound) words with hyphen characters by spell checker
+    instead of by word breaking code of OpenOffice.org. With this feature
+    it's possible to accept hyphenated compound words, such as "scot-free",
+    where "scot" is not a correct English word.
+
+  - ICONV & OCONV: input and output conversion tables for optional character
+    handling or using special inner format.
Example: + + # Accepting de facto replacements of the Romanian comma acuted letters + SET UTF-8 + ICONV 4 + ICONV ş ș + ICONV ţ ț + ICONV Ş Ș + ICONV Ţ Ț + + Typical usage of ICONV/OCONV is to manage an inner format for a segmental + writing system, like the Ethiopic script of the Amharic language. + + - Extended CHECKCOMPOUNDPATTERN to handle conpound word alternations, like + sandhi feature of Telugu and other writing systems. + + - SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and + Norwegian compound word forms, like tillåta (till|låta) and + bussjåfør (buss|sjåfør) + + - wordforms: word generator script for dictionary developers (Hunspell + version of unmunch). + + - bug fixes + +2008-08-15: Hunspell 1.2.7 release: + - FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can + strip full words, not only one less characters. + - COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern + matching. For example, en_US dictionary of OpenOffice.org uses COMPOUNDRULE + for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd + etc.). + - optimized suggestions: + - modified 1-character distance suggestion algorithms: search a TRY character + in all position instead of all TRY characters in a character position + (it can give more readable suggestion order, also better suggestions + in the first positions, when TRY characters are sorted by frequency.) + For example, suggestions for "moze": + ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6), + maze, more, mote, ooze, mole etc. (Hunspell 1.2.7). + - extended compound word checking for better COMPOUNDRULE related + suggestions, for example English ordinal numbers: 121323th -> 121323rd + (it needs also a th->rd REP definition). 
+ - bug fixes + +2008-07-15: Hunspell 1.2.6 release: + - bug fix release (fix affix rule condition checking of sk_SK dictionary, + iconv support in stemming and morphological analysis of the Hunspell + utility, see also Changelog) + +2008-07-09: Hunspell 1.2.5 release: + - bug fix release (fix affix rule condition checking of en_GB dictionary, + also morphological analysis by dictionaries with two-level suffixes) + +2008-06-18: Hunspell 1.2.4-2 release: + - fix GCC compiler warnings + +2008-06-17: Hunspell 1.2.4 release: + - add free_list() for C, C++ interfaces to deallocate suggestion lists + + - bug fixes + +2008-06-17: Hunspell 1.2.3 release: + - extended XML interface to use morphological functions by standard + spell checking interface, spell() and suggest(). See hunspell.3 manual page. + + - default dash suggestions for compound words: newword-> new word and new-word + + - new manual pages: hunspell.3, hzip.1, hunzip.1. + + - bug fixes + +2008-04-12: Hunspell 1.2.2 release: + - extended dictionary (dic file) support to use multiple base and + special dictionaries. + + - new and improved options of command line hunspell: + -m: morphological analysis or flag debug mode (without affix + rule data it signs the flag of the affix rules) + -s: stemming mode + -D: list available dictionaries and search path + -d: support extra dictionaries by comma separated list. Example: + + hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt + + - forbidding in personal dictionary (with asterisk, / signs affixation) + + - optional compressed dictionary format "hzip" for aff and dic files + usage: + hzip example.aff example.dic + mv example.aff example.dic /tmp + hunspell -d example + hunzip example.aff.hz >example.aff + hunzip example.dic.hz >example.dic + + - new affix compression tool "affixcompress": compression tool for + large (millions of words) dictionaries. 
+
+ - support encrypted dictionaries for closed OpenOffice.org extensions or
+   other commercial programs
+
+ - improved manual
+
+ - bug fixes
+
+2007-11-01: Hunspell 1.2.1 release:
+ - new memory efficient condition checking algorithm for affix rules
+
+ - new morphological functions:
+   - stem() for stemming
+   - analyze() for morphological analysis
+   - generate() for morphological generation
+
+ - new demos:
+   - analyze: stemming, morphological analysis and generation
+   - chmorph: morphological conversion of texts
+
+2007-09-05: Hunspell 1.1.12 release:
+ - dictionary based phonetic suggestion for words with
+   special or foreign pronunciation or alternative (bad) transliteration
+   (see Changelog, tests/phone.* and manual).
+
+ - improved data structure and memory optimization for dictionaries
+   with variable count fields
+
+ - bug fixes for Unicode encoding dictionaries and ngram suggestions
+
+ - improved REP suggestions with space: it works without dictionary
+   modification
+
+ - updated and new project files for Windows API
+
+2007-08-27: Hunspell 1.1.11 release:
+ - portability fixes
+
+2007-08-23: Hunspell 1.1.10 release:
+ - pronunciation based suggestion using Björn Jacke's original Aspell
+   phonetic transcription algorithm (http://aspell.net), relicensed under
+   GPL/LGPL/MPL tri-license with the permission of the author
+
+ - keyboard-based suggestion by KEY (see manual)
+
+ - better time limits for suggestion search
+
+ - test environment for suggestion based on Wikipedia data
+
+ - bug fixes for non-standard Mozilla platforms etc.
+ +2007-07-25: Hunspell 1.1.9 release: + - better tokenization: + - for URLs, mail addresses and directory paths (default: skip these tokens) + - for colons in words (for Finnish and Swedish) + + - new examples: + - affixation of personal dictionary words + - digits in words + + - bug fixes (see ChangeLog) + +2007-07-16: Hunspell 1.1.8 release: + - better Mac OS X/Cygwin and Windows compatibility + + - fix Hunspell's Valgrind environment and memory handling errors + detected by Valgrind + + - other bug fixes (see ChangeLog) + +2007-07-06: Hunspell 1.1.7 release: + - fix warning messages of OpenOffice.org build + +2007-06-29: Hunspell 1.1.6 release: + - check capitalization of the following word forms + - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG + - allcap words and suffixes: UNICEF's - UNICEF'S + - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA + + - suggestion for missing sentence spacing: something.The -> something. The + + - Hunspell executable: improved locale support + - -i option: custom input encoding + - use locale data for default dictionary names. + - tools/hunspell.cxx: fix 8-bit tokenization (letters without + casing, like ß or Hebrew characters now are handled well) + - dictionary search path (automatic detection of OpenOffice.org directories) + - DICPATH environmental variable + - -D option: show directory path of loaded dictionary + + - patches and bug fixes for Mozilla, OpenOffice.org. 
+ +2007-03-19: Hunspell 1.1.5 release: + - optimizations: 10-100% speed up, smaller code size and memory footprint + (conditional experimental code and warning messages) + + - extended Unicode support: + - non BMP Unicode characters in dictionary words and affixes (except + affix rules and conditions) + - support BOM sequence in aff and dic files + + - IGNORE feature for Arabic diacritics and other optional characters + + - New edit distance suggestion methods: + - capitalisation: nasa -> NASA + - long swap: permenant -> permanent + - long move: Ghandi -> Gandhi, greatful -> grateful + - double two characters: vacacation -> vacation + - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word) + + - patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua, + German and Arabic language, etc. + +2006-02-01: Hunspell 1.1.4 release: + - Improved suggestion for typical OCR bugs (missing spaces between + capitalized words). For example: "aNew" -> "a New". + http://qa.openoffice.org/issues/show_bug.cgi?id=58202 + + - tokenization fixes (fix incomplete tokenization of input texts on big-endian + platforms, and locale-dependent tokenization of dictionary entries) + +2006-01-06: Hunspell 1.1.3.2 release: + - fix Visual C++ compiling errors + +2006-01-05: Hunspell 1.1.3 release: + - GPL/LGPL/MPL tri-license for Mozilla integration + + - Alias compression of flag sets and morphological descriptions. + (For example, 16 MB Arabic dic file can be compressed to 1 MB.) + + - Improved suggestion. + + - Improved, language independent German sharp s casing with CHECKSHARPS + declaration. + + - Unicode tokenization in Hunspell program. + + - Bug fixes (at new and old compound word handling methods), etc. 
+ +2005-11-11: Hunspell 1.1.2 release: + + - Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND + suggestions) + + - Checked with 51 regression tests in Valgrind debugging environment, + and tested with 52 OOo dictionaries on i686-pc-linux platform. + +2005-11-09: Hunspell 1.1.1 release: + + - Compound word patterns for complex compound word handling and + simple word-level lexical scanning. Ideal for checking + Arabic and Roman numbers, ordinal numbers in English, affixed + numbers in agglutinative languages, etc. + http://qa.openoffice.org/issues/show_bug.cgi?id=53643 + + - Support ISO-8859-15 encoding for French (French oe ligatures are + missing from the latin-1 encoding). + http://qa.openoffice.org/issues/show_bug.cgi?id=54980 + + - Implemented a flag to forbid obscene word suggestion: + http://qa.openoffice.org/issues/show_bug.cgi?id=55498 + + - Checked with 50 regression tests in Valgrind debugging environment, + and tested with 52 OOo dictionaries. + + - other improvements and bug fixes (see ChangeLog) + +2005-09-19: Hunspell 1.1.0 release + +* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta) + +* improved ngram suggestion with swap character detection and + case insensitivity + +------ examples for ngram improvement (input word and suggestions) ----- + +1. pernament (instead of permanent) + +MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented, + ornament, ornamentals, ornamental, ornamentally + +Hunspell 1.0.9: ornamental, ornament, tournament + +Hunspell 1.1.0: permanent + +Note: swap character detection + + +2. PERNAMENT (instead of PERMANENT) + +MySpell 3.2: - + +Hunspell 1.0.9: - + +Hunspell 1.1.0: PERMANENT + + +3. Unesco (instead of UNESCO) + +MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's, + Frescoed, Fresco, Escorts, Escorting + +Hunspell 1.0.9: Genesco, Ionesco, Fresco + +Hunspell 1.1.0: UNESCO + + +4. 
siggraph's (instead of SIGGRAPH's) + +MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's, + physiography, digraphs, serigraph, stratigraphy's, stratigraphy + epigraphs + +Hunspell 1.0.9: serigraph's, epigraph's, digraph's + +Hunspell 1.1.0: SIGGRAPH's + +--------------- end of examples -------------------- + +* improved testing environment with suggestion checking and memory debugging + + memory debugging of all tests with a simple command: + + VALGRIND=memcheck make check + +* lots of other improvements and bug fixes (see ChangeLog) + + +2005-08-26: Hunspell 1.0.9 release + +* improved related character map suggestion + +* improved ngram suggestion + +------ examples for ngram improvement (O=old, N = new ngram suggestions) -- + +1. Permenant (instead of Permanent) + +O: Endangerment, Ferment, Fermented, Deferment's, Empowerment, + Ferment's, Ferments, Fermenting, Countermen, Weathermen + +N: Permanent, Supermen, Preferment + +Note: Ngram suggestions was case sensitive. + +2. permenant (instead of permanent) + +O: supermen, newspapermen, empowerment, endangerment, preferments, + preferment, permanent, preferment's, permanently, impermanent + +N: permanent, supermen, preferment + +Note: new suggestions are also weighted with longest common subsequence, +first letter and common character positions + +3. pernemant (instead of permanent) + +O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent, + supernatant, impermanent, semipermanent, impermanently + +N: permanent, supernatant, pimpernel + +Note: new method also prefers root word instead of not +relevant affixes ('s, s and ly) + + +4. pernament (instead of permanent) + +O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented, + ornament, ornamentals, ornamental, ornamentally + +N: ornamental, ornament, tournament + +Note: Both ngram methods misses here. + + +5. 
obvus (instad of obvious): + +O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse, + obviates, obviate, Travus + +N: obvious, obtuse, obverse + +Note: new method also prefers common first letters. + + +6. unambigus (instead of unambiguous) + +O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous, + unambitious, ambiguities, ambiguousness + +N: unambiguous, unambiguity, unambitious + + + +7. consecvence (instead of consequence) + +O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence, + consecutiveness's, convenience's, consistences, consistence + +N: consequence, consecutive, consecrates + + +An example in a language with rich morphology: + +8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]): + +O: Misikdiben, Pisisediben, Misikiiben, Pisisekiben, Misikiben, + Misikidiben, Misikkiben, Misikikiben, Misikimiben, Mississippiiben + +N: Mississippiben, Mississippiiben, Misiiben + +Note: Suggesting not relevant affixes was the biggest fault in ngram + suggestion for languages with a lot of affixes. + +--------------- end of examples -------------------- + +* support twofold prefix cutting + +* lots of other improvements and bug fixes (see ChangeLog) + +* test Hunspell with 54 OpenOffice.org dictionaries: + +source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries + +testing shell script: +------------------------------------------------------- +for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'` +do + dic=`basename $i .zip` + mkdir $dic + echo unzip $dic + unzip -d $dic $i 2>/dev/null + cd $dic + echo unmunch and test $dic + unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' | + hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result + cd .. 
+done +-------------------------------------------------------- + +test result (0 size is o.k.): + +$ for i in *_*/*.result; do wc -c $i; done +0 af_ZA/af_ZA.result +0 bg_BG/bg_BG.result +0 ca_ES/ca_ES.result +0 cy_GB/cy_GB.result +0 cs_CZ/cs_CZ.result +0 da_DK/da_DK.result +0 de_AT/de_AT.result +0 de_CH/de_CH.result +0 de_DE/de_DE.result +0 el_GR/el_GR.result +6 en_AU/en_AU.result +0 en_CA/en_CA.result +0 en_GB/en_GB.result +0 en_NZ/en_NZ.result +0 en_US/en_US.result +0 eo_EO/eo_EO.result +0 es_ES/es_ES.result +0 es_MX/es_MX.result +0 es_NEW/es_NEW.result +0 fo_FO/fo_FO.result +0 fr_FR/fr_FR.result +0 ga_IE/ga_IE.result +0 gd_GB/gd_GB.result +0 gl_ES/gl_ES.result +0 he_IL/he_IL.result +0 hr_HR/hr_HR.result +200694989 hu_HU/hu_HU.result +0 id_ID/id_ID.result +0 it_IT/it_IT.result +0 ku_TR/ku_TR.result +0 lt_LT/lt_LT.result +0 lv_LV/lv_LV.result +0 mg_MG/mg_MG.result +0 mi_NZ/mi_NZ.result +0 ms_MY/ms_MY.result +0 nb_NO/nb_NO.result +0 nl_NL/nl_NL.result +0 nn_NO/nn_NO.result +0 ny_MW/ny_MW.result +0 pl_PL/pl_PL.result +0 pt_BR/pt_BR.result +0 pt_PT/pt_PT.result +0 ro_RO/ro_RO.result +0 ru_RU/ru_RU.result +0 rw_RW/rw_RW.result +0 sk_SK/sk_SK.result +0 sl_SI/sl_SI.result +0 sv_SE/sv_SE.result +0 sw_KE/sw_KE.result +0 tet_ID/tet_ID.result +0 tl_PH/tl_PH.result +0 tn_ZA/tn_ZA.result +0 uk_UA/uk_UA.result +0 zu_ZA/zu_ZA.result + +In en_AU dictionary, there is an abbrevation with two dots (`eqn..'), but +`eqn.' is missing. Presumably it is a dictionary bug. Myspell also +haven't accepted it. + +Hungarian dictionary contains pseudoroots and forbidden words. +Unmunch haven't supported these features yet, and generates bad words, too. + +* check affix rules and OOo dictionaries. Detected bugs in cs_CZ, +es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries). 
+ +Details: +-------------------------------------------------------- +cs_CZ +warning - incompatible stripping characters and condition: +SFX D us ech [^ighk]os +SFX D us y [^i]os +SFX Q os ech [^ghk]es +SFX M o ech [^ghkei]a +SFX J m ej m +SFX J m ejme m +SFX J m ejte m +SFX A ouit up oupit +SFX A ouit upme oupit +SFX A ouit upte oupit +SFX A nout l [aeiouyr][^aeiouyrl][^aeiouy +SFX A nout l [aeiouyr][^aeiouyrl][^aeiouy + +es_ES +warning - incompatible stripping characters and condition: +SFX W umar se [ae]husar +SFX W emir iis eir + +es_NEW +warning - incompatible stripping characters and condition: +SFX I unan nen unar + +es_MX +warning - incompatible stripping characters and condition: +SFX A a ote e +SFX W umar se [ae]husar +SFX W emir iis eir + +lt_LT +warning - incompatible stripping characters and condition: +SFX U ti siuosi tis +SFX U ti siuosi tis +SFX U ti siesi tis +SFX U ti siesi tis +SFX U ti sis tis +SFX U ti sis tis +SFX U ti sims tis +SFX U ti sims tis +SFX U ti sits tis +SFX U ti sits tis + +nn_NO +warning - incompatible stripping characters and condition: +SFX D ar rar [^fmk]er +SFX U re orde ere +SFX U re ort ere + +pt_PT +warning - incompatible stripping characters and condition: +SFX g os oas o +SFX g os oas o + +ro_RO +warning - bad field number: +SFX L 0 le [^cg] i +SFX L 0 i [cg] i +SFX U 0 i [^i] ii +warning - incompatible stripping characters and condition: +SFX P l i l [<- there is an unnecessary tabulator here) +SFX I a ii [gc] a +warning - bad field number: +SFX I a ii [gc] a +SFX I a ei [^cg] a + +sk_SK +warning - incompatible stripping characters and condition: +SFX T a ol kla +SFX T a olc kla +SFX T sa l sla +SFX T sa lc sla +SFX R c liem c +SFX R is tie mias +SFX R iez iem [^i]ez +SFX R iez ie [^i]ez +SFX R iez ie [^i]ez +SFX R iez eme [^i]ez +SFX R iez ete [^i]ez +SFX R iez [^i]ez +SFX R iez c [^i]ez +SFX R iez z [^i]ez +SFX R iez me [^i]ez +SFX R iez te [^i]ez + +sv_SE +warning - bad field number: +SFX C 0 net nets [^e]n 
+-------------------------------------------------------- + +2005-08-01: Hunspell 1.0.8 release + +- improved compound word support +- fix German S handling +- port MySpell files and MAP feature + +2005-07-22: Hunspell 1.0.7 release + +2005-07-21: new home page: http://hunspell.sourceforge.net diff --git a/libs/hunspell/docs/README b/libs/hunspell/docs/README index b97a112fd3..42061c01a1 100644 --- a/libs/hunspell/docs/README +++ b/libs/hunspell/docs/README @@ -1,21 +1 @@ -Hunspell spell checker and morphological analyser library - -Documentation, tests, examples: http://hunspell.github.io/ - -Author of Hunspell: -László Németh (nemethl (at) gyorsposta.hu) - -Hunspell based on OpenOffice.org's Myspell. MySpell's author: -Kevin Hendricks (kevin.hendricks (at) sympatico.ca) - -License: GPL 2.0/LGPL 2.1/MPL 1.1 tri-license - -The contents of this library may be used under the terms of -the GNU General Public License Version 2 or later (the "GPL"), or -the GNU Lesser General Public License Version 2.1 or later (the "LGPL", -see http://gnu.org/copyleft/lesser.html) or the Mozilla Public License -Version 1.1 or later (the "MPL", see http://mozilla.org/MPL/MPL-1.1.html). - -Software distributed under these licenses is distributed on an "AS IS" basis, -WITHOUT WARRANTY OF ANY KIND, either express or implied. See the licences -for the specific language governing rights and limitations under the licenses. +README.md
\ No newline at end of file diff --git a/libs/hunspell/docs/README.md b/libs/hunspell/docs/README.md new file mode 100644 index 0000000000..13bac95c78 --- /dev/null +++ b/libs/hunspell/docs/README.md @@ -0,0 +1,182 @@ +About Hunspell +============== + +NOTICE: Version 2 is in the works. For contributing see +[version 2 specification][v2spec] and the folder `src/hunspell2`. + +[v2spec]: https://github.com/hunspell/hunspell/wiki/Version-2-Specification + +Hunspell is a spell checker and morphological analyzer library and program +designed for languages with rich morphology and complex word compounding or +character encoding. Hunspell interfaces: Ispell-like terminal interface +using Curses library, Ispell pipe interface, C++ class and C functions. + +Hunspell's code base comes from the OpenOffice.org MySpell +(http://lingucomponent.openoffice.org/MySpell-3.zip). See README.MYSPELL, +AUTHORS.MYSPELL and license.myspell files. +Hunspell is designed to eventually replace Myspell in OpenOffice.org. + +Main features of Hunspell spell checker and morphological analyzer: + +- Unicode support (affix rules work only with the first 65535 Unicode + characters) +- Morphological analysis (in custom item and arrangement style) and stemming +- Max. 65535 affix classes and twofold affix stripping (for agglutinative + languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.) +- Support complex compoundings (for example, Hungarian and German) +- Support language specific features (for example, special casing of + Azeri and Turkish dotted i, or German sharp s) +- Handle conditional affixes, circumfixes, fogemorphemes, + forbidden words, pseudoroots and homonyms. +- Free software. Versions 1.x are licenced under LGPL, GPL, MPL tri-license. + Version 2 is licenced only under GNU LGPL. 
+ +Compiling on GNU/Linux and Unixes +================================= + + autoreconf -vfi + ./configure + make + sudo make install + sudo ldconfig + +For dictionary development, use the `--with-warnings` option of configure. + +For interactive user interface of Hunspell executable, use the `--with-ui option`. + +The developer packages you need to compile Hunspell's interface: + + autoconf automake autopoint libtool g++ + +Optional developer packages: + +- ncurses (need for --with-ui), eg. libncursesw5 for UTF-8 +- readline (for fancy input line editing, + configure parameter: --with-readline) +- locale and gettext (but you can also use the + --with-included-gettext configure parameter) + +Compiling on Windows +==================== + +## 1. Compiling with Mingw64 and MSYS2 + +Download Msys2, update everything and install the following packages: + + pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool + +Open Mingw-w64 Win64 prompt and compile the same way as on Linux, see above. + +## 2. Compiling in Cygwin environment + +Download and install Cygwin environment for Windows with the following +extra packages: + +- make +- automake +- autoconf +- libtool +- gcc-g++ development package +- ncurses, readline (for user interface) +- iconv (character conversion) + +Then compile the same way as on Linux. Cygwin builds depend on Cygwin1.dll. + +Debugging +========= + +For debugging we need to create a debug build and then we need to start `gdb`. 
+
+    make clean
+    make CXXFLAGS='-g -O0'
+    libtool --mode=execute gdb src/tools/hunspell
+
+Testing
+=======
+
+Testing Hunspell (see tests in tests/ subdirectory):
+
+    make check
+
+or with Valgrind debugger:
+
+    make check
+    VALGRIND=[Valgrind_tool] make check
+
+For example:
+
+    make check
+    VALGRIND=memcheck make check
+
+Documentation
+=============
+
+Features and dictionary format:
+
+    man 5 hunspell
+    man hunspell
+    hunspell -h
+
+http://hunspell.github.io/
+
+Usage
+=====
+
+The src/tools directory contains eleven executables after compiling:
+
+- affixcompress: dictionary generation from large (millions of words)
+  vocabularies
+- analyze: example of spell checking, stemming and morphological analysis
+- chmorph: example of automatic morphological generation and conversion
+- example: example of spell checking and suggestion
+- hunspell: main program for spell checking and others (see manual)
+- hunzip: decompressor of hzip format
+- hzip: compressor of hzip format
+- makealias: alias compression (Hunspell only, not backward compatible with MySpell)
+- munch: dictionary generation from vocabularies (it needs an affix file, too).
+- unmunch: list all recognized words of a MySpell dictionary
+- wordforms: word generation (Hunspell version of unmunch)
+
+After compiling and installing (see INSTALL) you can
+run the Hunspell spell checker (compiled with user interface)
+with a Hunspell or MySpell dictionary:
+
+    hunspell -d en_US text.txt
+
+or without interface:
+
+    hunspell
+    hunspell -d en_GB -l <text.txt
+
+Dictionaries consist of an affix and dictionary file, see tests/
+or http://wiki.services.openoffice.org/wiki/Dictionaries.
+
+Using Hunspell library with GCC
+===============================
+
+Including in your program:
+
+    #include <hunspell.hxx>
+
+Linking with Hunspell static library:
+
+    g++ example.cxx -lhunspell
+
+Dictionaries
+------------
+
+MySpell & Hunspell dictionaries:
+
+- http://extensions.libreoffice.org
+- http://cgit.freedesktop.org/libreoffice/dictionaries
+- http://extensions.openoffice.org
+- http://wiki.services.openoffice.org/wiki/Dictionaries
+
+Aspell dictionaries (need some conversion):
+
+- ftp://ftp.gnu.org/gnu/aspell/dict
+
+Conversion steps: see relevant feature request at http://hunspell.github.io/ .
+
+László Németh
+nemeth at numbertext org
diff --git a/libs/hunspell/docs/README.myspell b/libs/hunspell/docs/README.myspell
new file mode 100644
index 0000000000..25934eec01
--- /dev/null
+++ b/libs/hunspell/docs/README.myspell
@@ -0,0 +1,69 @@
+MySpell is a simple spell checker that uses affix
+compression and is modelled after the spell checker
+ispell.
+
+MySpell was written to explore how affix compression
+can be implemented.
+
+The main features of MySpell are:
+
+1. written in C++ to make it easier to interface with
+   Pspell, OpenOffice, AbiWord, etc
+
+2. it is stateless, uses no static variables and
+   should be completely reentrant with almost no
+   ifdefs
+
+3. it tries to be as compatible with ispell as
+   it can. It can read slightly modified
+   versions of munched ispell dictionaries (and it
+   comes with a munched English wordlist borrowed from
+   Kevin Atkinson's excellent Aspell).
+
+4. it uses a heavily modified aff file format that
+   can be derived from ispell aff files but uses
+   the iso-8859-X character sets only
+
+5. it is simple with *lots* of comments that
+   describe how the affixes are stored
+   and tested for (based on the approach used by
+   ispell).
+
+6. it supports improved suggestions with replacement
+   tables and ngram-scoring based mechanisms in addition
+   to the main suggestion mechanisms
+
+7.
like ispell it has a BSD license (and no + advertising clause) + +But ... it has *no* support for adding words +to a personal dictionary, *no* support for converting +between various text encodings, and *no* command line +interface (it is purely meant to be a library). + +It can not (in any way) replace all of the functionality +of ispell or aspell/pspell. It is meant as a learning +tool for understanding affix compression and for +being used by front ends like OpenOffice, Abiword, etc. + +MySpell has been tested under Linux and Solaris +and has the world's simplest Makefile and no +configure support. + +It does come with a simple example program that +spell checks some words and returns suggestions. + +To build a static library and an example +program under Linux simply type: + +tar -zxvf myspell.tar.gz +cd myspell2 +make + +To run the example program: +./example ./en_US.aff ./en_US.dic checkme.lst + +Please play around with it and let me know +what you think. + +Please see the file CONTRIBUTORS for more info. diff --git a/libs/hunspell/docs/THANKS b/libs/hunspell/docs/THANKS new file mode 100644 index 0000000000..761fa77438 --- /dev/null +++ b/libs/hunspell/docs/THANKS @@ -0,0 +1,136 @@ +Many thanks to the following contributors and supporters: + +Mehmet Akin +Göran Andersson +Lars Aronsson +Ruud Baars +Bartkó Zoltán +Mathias Bauer +Bencsáth Boldizsár +Bíró Árpád +Ingo H. de Boer +Simon Brouwer +Jeppe Bundsgaard +Ginn Chen +Tomáš Chvátal +Aaron Digulla +Dmitri Gabinski +Dvornik László +David Einstein +Rene Engelhard +Frederik Fouvry +Flemming Frandsen +Serge Gautherie +Marek Gleń +Gavins at OOo +Gefferth András +Godó Ferenc +Goldman Eleonóra +Steinar H. 
Gunderson +Halácsy Péter +Chris Halls +Khaled Hosny +Izsók András +Björn Jacke +Mike Tian-Jian Jiang +Dafydd Jones +Ryan Jones +Jean-Christophe Helary +Kevin Hendricks +Martin Hollmichel +Pavel Janík +John Winters +Mohamed Kebdani +Kelemen Gábor +Shewangizaw Gulilat +Kéménczy Kálmán +Dan Kenigsberg +Pham Ngoc Khanh +Khiraly László +Koblinger Egmont +Kornai András +Tor Lillqvist +Christian Lohmaier +Robert Longson +Marot at SF dot net +Mark McClain +Caolan McNamara +Michael Meeks +Moheb Mekhaiel +Laurie Mercer +Ladislav Michnovič +Ellis Miller +Giuseppe Modugno +János Mohácsi +Bram Moolenaar +Daniel Naber +Nagy Viktor +John Nisly +Noll János +S Page +Christophe Paris +Malcolm Parsons +Sylvain Paschein +Volkov Peter +Bryan Petty +Harri Pitkänen +Davide Prina +Kevin F. Quinn +Erdal Ronahi +Olivier Ronez +Bernhard Rosenkraenzer +Sarlós Tamás +Thobias Schlemmer +Jan Seeger +Jose da Silva +Paulo Ney de Souza +Roland Smith +Munzir Taha +Timeless at bemail dot org +Tímár András +Tonal at OOo +Török László +Trón Viktor +Gianluca Turconi +Ryan VanderMeulen +Varga Dániel +Elio Voci +Miha Vrhovnik +Martijn Wargers +Michel Weimerskirch +Brett Wilson +Friedel Wolff +Daniel Yacob +Gábor Zahemszky +Taha Zerrouki +and others (see also AUTHORS.myspell) + +FSF.hu Foundation +http://www.fsf.hu + +LibreOffice community +http://www.libreoffice.org + +MOKK Research Centre +Budapest University of Technology and Economics +Sociology and Communications Department +http://www.mokk.bme.hu + +Hungarian Ministry of Informatics and Telecommunications + +IMEDIA Kft. +http://www.imedia.hu + +OpenOffice.org community +http://www.openoffice.org + +OpenTaal Foundation, Netherlands and +Dutch Language Union (Nederlandse Taalunie) +http://opentaal.org + +UHU-Linux Kft. 
+ +Thanks, + +Németh László +nemeth at numbertext org diff --git a/libs/hunspell/docs/TODO b/libs/hunspell/docs/TODO new file mode 100644 index 0000000000..fb32e7ec89 --- /dev/null +++ b/libs/hunspell/docs/TODO @@ -0,0 +1,4 @@ +* shared dictionaries for multi-user environment +* improve compound handling +* Unicode unmunch (munch) +* forbiddenword and pseudoword support in unmunch diff --git a/libs/hunspell/docs/license.hunspell b/libs/hunspell/docs/license.hunspell index dc2ce9c1e8..18835adf03 100644 --- a/libs/hunspell/docs/license.hunspell +++ b/libs/hunspell/docs/license.hunspell @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Laszlo Nemeth (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
  *
  * Contributor(s):
  * David Einstein
@@ -24,21 +21,21 @@
  * Giuseppe Modugno
  * Gianluca Turconi
  * Simon Brouwer
- * Noll Janos
- * Biro Arpad
- * Goldman Eleonora
- * Sarlos Tamas
- * Bencsath Boldizsar
- * Halacsy Peter
- * Dvornik Laszlo
- * Gefferth Andras
+ * Noll János
+ * Bíró Árpád
+ * Goldman Eleonóra
+ * Sarlós Tamás
+ * Bencsáth Boldizsár
+ * Halácsy Péter
+ * Dvornik László
+ * Gefferth András
  * Nagy Viktor
- * Varga Daniel
+ * Varga Dániel
  * Chris Halls
  * Rene Engelhard
  * Bram Moolenaar
  * Dafydd Jones
- * Harri Pitkanen
+ * Harri Pitkänen
  * Andras Timar
  * Tor Lillqvist
  *
@@ -55,7 +52,3 @@
  * the terms of any one of the MPL, the GPL or the LGPL.
  *
  * ***** END LICENSE BLOCK ***** */
-
-#ifndef MOZILLA_CLIENT
-# include "config.h"
-#endif
diff --git a/libs/hunspell/hunspell.vcxproj b/libs/hunspell/hunspell.vcxproj
index 259a46402c..5bd2316a9a 100644
--- a/libs/hunspell/hunspell.vcxproj
+++ b/libs/hunspell/hunspell.vcxproj
@@ -31,7 +31,7 @@
 </ClCompile>
</ItemDefinitionGroup>
<ItemGroup>
- <ClCompile Include="src\*.c++">
+ <ClCompile Include="src\*.cxx">
<PreprocessorDefinitions>BUILDING_LIBHUNSPELL;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<PrecompiledHeader>NotUsing</PrecompiledHeader>
</ClCompile>
diff --git a/libs/hunspell/include/hunspell.hpp b/libs/hunspell/include/hunspell.hpp index 7d5ec5c771..27e2e235be 100644 --- a/libs/hunspell/include/hunspell.hpp +++ b/libs/hunspell/include/hunspell.hpp @@ -7,7 +7,6 @@ #include "../src/affixmgr.hxx"
#include "../src/langnum.hxx"
#include "../src/atypes.hxx"
-#include "../src/dictmgr.hxx"
#include "../src/filemgr.hxx"
#include "../src/hashmgr.hxx"
#include "../src/hunzip.hxx"
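Note on the API change in the source hunks of this commit: in affentry.cxx, methods such as `PfxEntry::add` stop returning a `char*` duplicated with `mystrdup` (or `NULL` when nothing matches) and return a `std::string` that is empty when nothing matches. The sketch below only illustrates that pattern. `add_prefix` and its parameters are hypothetical names chosen for illustration, not part of Hunspell, and the matching is simplified (no condition test, no full-strip handling).

```cpp
#include <string>

// Hypothetical stand-in for the pattern used by PfxEntry::add after this
// commit: remove `strip` from the front of `word`, prepend `appnd`, and
// return an empty string (instead of NULL) when the word does not match.
std::string add_prefix(const std::string& word,
                       const std::string& strip,
                       const std::string& appnd) {
  std::string result;  // stays empty when there is no match
  if (word.size() > strip.size() &&
      word.compare(0, strip.size(), strip) == 0) {
    result.assign(appnd);
    // append the part of the word that follows the stripped characters
    result.append(word, strip.size(), std::string::npos);
  }
  return result;
}
```

With this shape, callers test `result.empty()` rather than comparing against `NULL`, and the matching `free()` call goes away, which is what the updated call sites in the diff do.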
diff --git a/libs/hunspell/res/Hunspell.rc b/libs/hunspell/res/Hunspell.rc index 5b48608ef9..8fd119ebb7 100644 --- a/libs/hunspell/res/Hunspell.rc +++ b/libs/hunspell/res/Hunspell.rc @@ -1,31 +1,32 @@ -#include <winver.h>
-
-VS_VERSION_INFO VERSIONINFO
-FILEVERSION 1,3,4,0
-PRODUCTVERSION 1,3,4,0
-FILEFLAGSMASK 0x17L
-FILEFLAGS 0
-FILEOS VOS_NT_WINDOWS32
-FILETYPE VFT_APP
-FILESUBTYPE VFT2_UNKNOWN
-BEGIN
- BLOCK "VarFileInfo"
- BEGIN
- VALUE "Translation", 0x409, 1200
- END
- BLOCK "StringFileInfo"
- BEGIN
- BLOCK "040904b0"
- BEGIN
- VALUE "Comments", "Hunspell (http://hunspell.github.io/) by László Németh"
- VALUE "CompanyName", "http://hunspell.github.io/"
- VALUE "FileDescription", "libhunspell"
- VALUE "FileVersion", "1.3.4"
- VALUE "InternalName", "libhunspell"
- VALUE "LegalCopyright", "Copyright (c) 2007-2011"
- VALUE "OriginalFilename", "libhunspell.dll"
- VALUE "ProductName", "Hunspell Dynamic Link Library"
- VALUE "ProductVersion", "1.3.4"
- END
- END
-END
+
+#include <windows.h>
+
+VS_VERSION_INFO VERSIONINFO
+FILEVERSION 1,6,2,0
+PRODUCTVERSION 1,6,2,0
+FILEFLAGSMASK 0x17L
+FILEFLAGS 0
+FILEOS VOS_NT_WINDOWS32
+FILETYPE VFT_APP
+FILESUBTYPE VFT2_UNKNOWN
+BEGIN
+ BLOCK "VarFileInfo"
+ BEGIN
+ VALUE "Translation", 0x409, 1200
+ END
+ BLOCK "StringFileInfo"
+ BEGIN
+ BLOCK "040904b0"
+ BEGIN
+ VALUE "Comments", "Hunspell (http://hunspell.github.io/) by László Németh"
+ VALUE "CompanyName", "http://hunspell.github.io/"
+ VALUE "FileDescription", "libhunspell"
+ VALUE "FileVersion", "1.6.2"
+ VALUE "InternalName", "libhunspell"
+ VALUE "LegalCopyright", "Copyright (c) 2007-2017"
+ VALUE "OriginalFilename", "libhunspell.dll"
+ VALUE "ProductName", "Hunspell Dynamic Link Library"
+ VALUE "ProductVersion", "1.6.2"
+ END
+ END
+END
diff --git a/libs/hunspell/src/affentry.c++ b/libs/hunspell/src/affentry.cxx index bd28274368..4ef0c00d9b 100644 --- a/libs/hunspell/src/affentry.c++ +++ b/libs/hunspell/src/affentry.cxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -79,33 +76,7 @@ #include "affentry.hxx" #include "csutil.hxx" -PfxEntry::PfxEntry(AffixMgr* pmgr, affentry* dp) - // register affix manager - : pmyMgr(pmgr), - next(NULL), - nexteq(NULL), - nextne(NULL), - flgnxt(NULL) { - // set up its initial values - aflag = dp->aflag; // flag - strip = dp->strip; // string to strip - appnd = dp->appnd; // string to append - numconds = dp->numconds; // length of the condition - opts = dp->opts; // cross product flag - // then copy over all of the conditions - if (opts & aeLONGCOND) { - memcpy(c.conds, dp->c.l.conds1, MAXCONDLEN_1); - c.l.conds2 = dp->c.l.conds2; - } else - memcpy(c.conds, dp->c.conds, MAXCONDLEN); - morphcode = dp->morphcode; - contclass = dp->contclass; - contclasslen = dp->contclasslen; -} - -PfxEntry::~PfxEntry() { - aflag = 0; - pmyMgr = NULL; +AffEntry::~AffEntry() { if (opts & aeLONGCOND) free(c.l.conds2); if (morphcode && !(opts & aeALIASM)) @@ -114,17 +85,26 @@ PfxEntry::~PfxEntry() { free(contclass); } +PfxEntry::PfxEntry(AffixMgr* pmgr) + // register affix manager + : pmyMgr(pmgr), + next(NULL), + nexteq(NULL), + nextne(NULL), + flgnxt(NULL) { +} + // add prefix to this word assuming conditions hold -char* PfxEntry::add(const char* word, size_t len) { +std::string PfxEntry::add(const char* word, size_t len) { + std::string result; if ((len > strip.size() || (len == 0 && pmyMgr->get_fullstrip())) && (len >= numconds) && test_condition(word) && (!strip.size() || (strncmp(word, strip.c_str(), strip.size()) == 0))) { /* we have a match so add prefix */ - std::string tword(appnd); - tword.append(word + strip.size()); - return mystrdup(tword.c_str()); + result.assign(appnd); + result.append(word + strip.size()); } - return NULL; + return result; } inline char* PfxEntry::nextchar(char* p) { @@ -276,8 +256,7 @@ struct hentry* PfxEntry::checkword(const char* word, // if ((opts & 
aeXPRODUCT) && in_compound) { if ((opts & aeXPRODUCT)) { he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, aeXPRODUCT, this, - NULL, 0, NULL, FLAG_NULL, needflag, - in_compound); + FLAG_NULL, needflag, in_compound); if (he) return he; } @@ -291,8 +270,6 @@ struct hentry* PfxEntry::check_twosfx(const char* word, int len, char in_compound, const FLAG needflag) { - struct hentry* he; // hash entry of root word or NULL - // on entry prefix is 0 length or already matches the beginning of the word. // So if the remaining root word has positive length // and if there are enough chars in root word and added back strip chars @@ -324,8 +301,9 @@ struct hentry* PfxEntry::check_twosfx(const char* word, // cross checked combined with a suffix if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) { - he = pmyMgr->suffix_check_twosfx(tmpword.c_str(), tmpl, aeXPRODUCT, this, - needflag); + // hash entry of root word or NULL + struct hentry* he = pmyMgr->suffix_check_twosfx(tmpword.c_str(), tmpl, aeXPRODUCT, this, + needflag); if (he) return he; } @@ -335,15 +313,15 @@ struct hentry* PfxEntry::check_twosfx(const char* word, } // check if this prefix entry matches -char* PfxEntry::check_twosfx_morph(const char* word, - int len, - char in_compound, - const FLAG needflag) { +std::string PfxEntry::check_twosfx_morph(const char* word, + int len, + char in_compound, + const FLAG needflag) { + std::string result; // on entry prefix is 0 length or already matches the beginning of the word. 
// So if the remaining root word has positive length // and if there are enough chars in root word and added back strip chars // to meet the number of characters conditions, then test it - int tmpl = len - appnd.size(); // length of tmpword if ((tmpl > 0 || (tmpl == 0 && pmyMgr->get_fullstrip())) && @@ -370,22 +348,21 @@ char* PfxEntry::check_twosfx_morph(const char* word, // ross checked combined with a suffix if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) { - return pmyMgr->suffix_check_twosfx_morph(tmpword.c_str(), tmpl, - aeXPRODUCT, - this, needflag); + result = pmyMgr->suffix_check_twosfx_morph(tmpword.c_str(), tmpl, + aeXPRODUCT, + this, needflag); } } } - return NULL; + return result; } // check if this prefix entry matches -char* PfxEntry::check_morph(const char* word, - int len, - char in_compound, - const FLAG needflag) { - struct hentry* he; // hash entry of root word or NULL - char* st; +std::string PfxEntry::check_morph(const char* word, + int len, + char in_compound, + const FLAG needflag) { + std::string result; // on entry prefix is 0 length or already matches the beginning of the word. 
// So if the remaining root word has positive length @@ -411,9 +388,8 @@ char* PfxEntry::check_morph(const char* word, // root word in the dictionary if (test_condition(tmpword.c_str())) { - std::string result; - tmpl += strip.size(); + struct hentry* he; // hash entry of root word or NULL if ((he = pmyMgr->lookup(tmpword.c_str())) != NULL) { do { if (TESTAFF(he->astr, aflag, he->alen) && @@ -455,23 +431,19 @@ char* PfxEntry::check_morph(const char* word, // ross checked combined with a suffix if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) { - st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, aeXPRODUCT, this, - FLAG_NULL, needflag); - if (st) { + std::string st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, aeXPRODUCT, this, + FLAG_NULL, needflag); + if (!st.empty()) { result.append(st); - free(st); } } - - if (!result.empty()) - return mystrdup(result.c_str()); } } - return NULL; + return result; } -SfxEntry::SfxEntry(AffixMgr* pmgr, affentry* dp) +SfxEntry::SfxEntry(AffixMgr* pmgr) : pmyMgr(pmgr) // register affix manager , next(NULL), @@ -481,50 +453,21 @@ SfxEntry::SfxEntry(AffixMgr* pmgr, affentry* dp) l_morph(NULL), r_morph(NULL), eq_morph(NULL) { - // set up its initial values - aflag = dp->aflag; // char flag - strip = dp->strip; // string to strip - appnd = dp->appnd; // string to append - numconds = dp->numconds; // length of the condition - opts = dp->opts; // cross product flag - - // then copy over all of the conditions - if (opts & aeLONGCOND) { - memcpy(c.l.conds1, dp->c.l.conds1, MAXCONDLEN_1); - c.l.conds2 = dp->c.l.conds2; - } else - memcpy(c.conds, dp->c.conds, MAXCONDLEN); - rappnd = appnd; - reverseword(rappnd); - morphcode = dp->morphcode; - contclass = dp->contclass; - contclasslen = dp->contclasslen; -} - -SfxEntry::~SfxEntry() { - aflag = 0; - pmyMgr = NULL; - if (opts & aeLONGCOND) - free(c.l.conds2); - if (morphcode && !(opts & aeALIASM)) - free(morphcode); - if (contclass && !(opts & aeALIASF)) - free(contclass); } 
// add suffix to this word assuming conditions hold -char* SfxEntry::add(const char* word, size_t len) { +std::string SfxEntry::add(const char* word, size_t len) { + std::string result; /* make sure all conditions match */ if ((len > strip.size() || (len == 0 && pmyMgr->get_fullstrip())) && (len >= numconds) && test_condition(word + len, word) && (!strip.size() || (strcmp(word + len - strip.size(), strip.c_str()) == 0))) { - std::string tword(word); + result.assign(word); /* we have a match so add suffix */ - tword.replace(len - strip.size(), std::string::npos, appnd); - return mystrdup(tword.c_str()); + result.replace(len - strip.size(), std::string::npos, appnd); } - return NULL; + return result; } inline char* SfxEntry::nextchar(char* p) { @@ -669,9 +612,6 @@ struct hentry* SfxEntry::checkword(const char* word, int len, int optflags, PfxEntry* ppfx, - char** wlst, - int maxSug, - int* ns, const FLAG cclass, const FLAG needflag, const FLAG badflag) { @@ -742,27 +682,6 @@ struct hentry* SfxEntry::checkword(const char* word, return he; he = he->next_homonym; // check homonyms } while (he); - - // obsolote stemming code (used only by the - // experimental SuffixMgr:suggest_pos_stems) - // store resulting root in wlst - } else if (wlst && (*ns < maxSug)) { - int cwrd = 1; - for (int k = 0; k < *ns; k++) - if (strcmp(tmpword, wlst[k]) == 0) { - cwrd = 0; - break; - } - if (cwrd) { - wlst[*ns] = mystrdup(tmpword); - if (wlst[*ns] == NULL) { - for (int j = 0; j < *ns; j++) - free(wlst[j]); - *ns = -1; - return NULL; - } - (*ns)++; - } } } } @@ -775,7 +694,6 @@ struct hentry* SfxEntry::check_twosfx(const char* word, int optflags, PfxEntry* ppfx, const FLAG needflag) { - struct hentry* he; // hash entry pointer PfxEntry* ep = ppfx; // if this suffix is being cross checked with a prefix @@ -813,17 +731,18 @@ struct hentry* SfxEntry::check_twosfx(const char* word, // if all conditions are met then recall suffix_check if (test_condition(end, beg)) { + struct hentry* he; // 
hash entry pointer if (ppfx) { // handle conditional suffix if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen)) - he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, 0, NULL, NULL, 0, NULL, - (FLAG)aflag, needflag); + he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, 0, NULL, + (FLAG)aflag, needflag, IN_CPD_NOT); else - he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, optflags, ppfx, NULL, 0, - NULL, (FLAG)aflag, needflag); + he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, optflags, ppfx, + (FLAG)aflag, needflag, IN_CPD_NOT); } else { - he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, 0, NULL, NULL, 0, NULL, - (FLAG)aflag, needflag); + he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, 0, NULL, + (FLAG)aflag, needflag, IN_CPD_NOT); } if (he) return he; @@ -833,23 +752,20 @@ struct hentry* SfxEntry::check_twosfx(const char* word, } // see if two-level suffix is present in the word -char* SfxEntry::check_twosfx_morph(const char* word, - int len, - int optflags, - PfxEntry* ppfx, - const FLAG needflag) { +std::string SfxEntry::check_twosfx_morph(const char* word, + int len, + int optflags, + PfxEntry* ppfx, + const FLAG needflag) { PfxEntry* ep = ppfx; - char* st; - - char result[MAXLNLEN]; - *result = '\0'; + std::string result; // if this suffix is being cross checked with a prefix // but it does not support cross products skip it if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0) - return NULL; + return result; // upon entry suffix is 0 length or already matches the end of the word. 
// So if the remaining root word has positive length @@ -883,40 +799,34 @@ char* SfxEntry::check_twosfx_morph(const char* word, if (ppfx) { // handle conditional suffix if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen)) { - st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, 0, NULL, aflag, - needflag); - if (st) { + std::string st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, 0, NULL, aflag, + needflag); + if (!st.empty()) { if (ppfx->getMorph()) { - mystrcat(result, ppfx->getMorph(), MAXLNLEN); - mystrcat(result, " ", MAXLNLEN); + result.append(ppfx->getMorph()); + result.append(" "); } - mystrcat(result, st, MAXLNLEN); - free(st); + result.append(st); mychomp(result); } } else { - st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, optflags, ppfx, aflag, - needflag); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + std::string st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, optflags, ppfx, aflag, + needflag); + if (!st.empty()) { + result.append(st); mychomp(result); } } } else { - st = - pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, 0, NULL, aflag, needflag); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + std::string st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, 0, NULL, aflag, needflag); + if (!st.empty()) { + result.append(st); mychomp(result); } } - if (*result) - return mystrdup(result); } } - return NULL; + return result; } // get next homonym with same affix @@ -948,6 +858,11 @@ struct hentry* SfxEntry::get_next_homonym(struct hentry* he, return NULL; } +void SfxEntry::initReverseWord() { + rappnd = appnd; + reverseword(rappnd); +} + #if 0 Appendix: Understanding Affix Code diff --git a/libs/hunspell/src/affentry.hxx b/libs/hunspell/src/affentry.hxx index 6311d83fff..4bafc043f4 100644 --- a/libs/hunspell/src/affentry.hxx +++ b/libs/hunspell/src/affentry.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh 
László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -71,10 +68,8 @@ * SUCH DAMAGE. */ -#ifndef _AFFIX_HXX_ -#define _AFFIX_HXX_ - -#include "hunvisapi.h" +#ifndef AFFIX_HXX_ +#define AFFIX_HXX_ #include "atypes.hxx" #include "baseaffix.hxx" @@ -82,7 +77,7 @@ /* A Prefix Entry */ -class LIBHUNSPELL_DLL_EXPORTED PfxEntry : protected AffEntry { +class PfxEntry : public AffEntry { private: PfxEntry(const PfxEntry&); PfxEntry& operator=(const PfxEntry&); @@ -96,10 +91,9 @@ class LIBHUNSPELL_DLL_EXPORTED PfxEntry : protected AffEntry { PfxEntry* flgnxt; public: - PfxEntry(AffixMgr* pmgr, affentry* dp); - ~PfxEntry(); + explicit PfxEntry(AffixMgr* pmgr); - inline bool allowCross() { return ((opts & aeXPRODUCT) != 0); } + bool allowCross() const { return ((opts & aeXPRODUCT) != 0); } struct hentry* checkword(const char* word, int len, char in_compound, @@ -110,19 +104,19 @@ class LIBHUNSPELL_DLL_EXPORTED PfxEntry : protected AffEntry { char in_compound, const FLAG needflag = FLAG_NULL); - char* check_morph(const char* word, - int len, - char in_compound, - const FLAG needflag = FLAG_NULL); + std::string check_morph(const char* word, + int len, + char in_compound, + const FLAG needflag = FLAG_NULL); - char* 
check_twosfx_morph(const char* word, - int len, - char in_compound, - const FLAG needflag = FLAG_NULL); + std::string check_twosfx_morph(const char* word, + int len, + char in_compound, + const FLAG needflag = FLAG_NULL); - inline FLAG getFlag() { return aflag; } - inline const char* getKey() { return appnd.c_str(); } - char* add(const char* word, size_t len); + FLAG getFlag() { return aflag; } + const char* getKey() { return appnd.c_str(); } + std::string add(const char* word, size_t len); inline short getKeyLen() { return appnd.size(); } @@ -147,7 +141,7 @@ class LIBHUNSPELL_DLL_EXPORTED PfxEntry : protected AffEntry { /* A Suffix Entry */ -class LIBHUNSPELL_DLL_EXPORTED SfxEntry : protected AffEntry { +class SfxEntry : public AffEntry { private: SfxEntry(const SfxEntry&); SfxEntry& operator=(const SfxEntry&); @@ -166,20 +160,16 @@ class LIBHUNSPELL_DLL_EXPORTED SfxEntry : protected AffEntry { SfxEntry* eq_morph; public: - SfxEntry(AffixMgr* pmgr, affentry* dp); - ~SfxEntry(); + explicit SfxEntry(AffixMgr* pmgr); - inline bool allowCross() { return ((opts & aeXPRODUCT) != 0); } + bool allowCross() const { return ((opts & aeXPRODUCT) != 0); } struct hentry* checkword(const char* word, int len, int optflags, PfxEntry* ppfx, - char** wlst, - int maxSug, - int* ns, - const FLAG cclass = FLAG_NULL, - const FLAG needflag = FLAG_NULL, - const FLAG badflag = FLAG_NULL); + const FLAG cclass, + const FLAG needflag, + const FLAG badflag); struct hentry* check_twosfx(const char* word, int len, @@ -187,11 +177,11 @@ class LIBHUNSPELL_DLL_EXPORTED SfxEntry : protected AffEntry { PfxEntry* ppfx, const FLAG needflag = FLAG_NULL); - char* check_twosfx_morph(const char* word, - int len, - int optflags, - PfxEntry* ppfx, - const FLAG needflag = FLAG_NULL); + std::string check_twosfx_morph(const char* word, + int len, + int optflags, + PfxEntry* ppfx, + const FLAG needflag = FLAG_NULL); struct hentry* get_next_homonym(struct hentry* he); struct hentry* get_next_homonym(struct 
hentry* word, int optflags, @@ -199,9 +189,9 @@ class LIBHUNSPELL_DLL_EXPORTED SfxEntry : protected AffEntry { const FLAG cclass, const FLAG needflag); - inline FLAG getFlag() { return aflag; } - inline const char* getKey() { return rappnd.c_str(); } - char* add(const char* word, size_t len); + FLAG getFlag() { return aflag; } + const char* getKey() { return rappnd.c_str(); } + std::string add(const char* word, size_t len); inline const char* getMorph() { return morphcode; } @@ -224,6 +214,7 @@ class LIBHUNSPELL_DLL_EXPORTED SfxEntry : protected AffEntry { inline void setNextNE(SfxEntry* ptr) { nextne = ptr; } inline void setNextEQ(SfxEntry* ptr) { nexteq = ptr; } inline void setFlgNxt(SfxEntry* ptr) { flgnxt = ptr; } + void initReverseWord(); inline char* nextchar(char* p); inline int test_condition(const char* st, const char* begin); diff --git a/libs/hunspell/src/affixmgr.c++ b/libs/hunspell/src/affixmgr.cxx index d6bb677982..ffce7bb1bd 100644 --- a/libs/hunspell/src/affixmgr.c++ +++ b/libs/hunspell/src/affixmgr.cxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -88,33 +85,24 @@ #include "csutil.hxx" AffixMgr::AffixMgr(const char* affpath, - HashMgr** ptr, - int* md, - const char* key) { + const std::vector<HashMgr*>& ptr, + const char* key) + : alldic(ptr) + , pHMgr(ptr[0]) { + // register hash manager and load affix data from aff file - pHMgr = ptr[0]; - alldic = ptr; - maxdic = md; - keystring = NULL; - trystring = NULL; - encoding = NULL; csconv = NULL; utf8 = 0; complexprefixes = 0; - maptable = NULL; - nummap = 0; - breaktable = NULL; - numbreak = -1; - reptable = NULL; - numrep = 0; + parsedmaptable = false; + parsedbreaktable = false; + parsedrep = false; iconvtable = NULL; oconvtable = NULL; - checkcpdtable = NULL; // allow simplified compound forms (see 3rd field of CHECKCOMPOUNDPATTERN) simplifiedcpd = 0; - numcheckcpd = 0; - defcpdtable = NULL; - numdefcpd = 0; + parsedcheckcpd = false; + parseddefcpd = false; phone = NULL; compoundflag = FLAG_NULL; // permits word in compound forms compoundbegin = FLAG_NULL; // may be first word in compound forms @@ -135,25 +123,15 @@ AffixMgr::AffixMgr(const char* affpath, forbiddenword = FORBIDDENWORD; // forbidden word signing flag nosuggest = FLAG_NULL; // don't suggest words signed with NOSUGGEST flag nongramsuggest = FLAG_NULL; - lang = NULL; // language langnum = 0; // language code (see http://l10n.openoffice.org/languages.html) needaffix = FLAG_NULL; // forbidden root, allowed only with suffixes cpdwordmax = -1; // default: unlimited wordcount in compound words cpdmin = -1; // undefined cpdmaxsyllable = 0; // default: unlimited syllablecount in compound words - cpdvowels = NULL; // vowels (for calculating of Hungarian compounding limit, - // O(n) search! 
XXX) - cpdvowels_utf16 = - NULL; // vowels for UTF-8 encoding (bsearch instead of O(n) search) - cpdvowels_utf16_len = 0; // vowels pfxappnd = NULL; // previous prefix for counting syllables of the prefix BUG sfxappnd = NULL; // previous suffix for counting syllables of the suffix BUG sfxextra = 0; // modifier for syllable count of sfxappnd BUG - cpdsyllablenum = NULL; // syllable count incrementing flag checknum = 0; // checking numbers, and word with numbers - wordchars = NULL; // letters + spec. word characters - ignorechars = NULL; // letters + spec. word characters - version = NULL; // affix and dictionary file version string havecontclass = 0; // flags of possible continuing classes (double affix) // LEMMA_PRESENT: not put root into the morphological output. Lemma presents // in morhological description in dictionary file. It's often combined with @@ -225,83 +203,10 @@ AffixMgr::~AffixMgr() { sStart[j] = NULL; } - if (keystring) - free(keystring); - keystring = NULL; - if (trystring) - free(trystring); - trystring = NULL; - if (encoding) - free(encoding); - encoding = NULL; - if (maptable) { - for (int j = 0; j < nummap; j++) { - for (int k = 0; k < maptable[j].len; k++) { - if (maptable[j].set[k]) - free(maptable[j].set[k]); - } - free(maptable[j].set); - maptable[j].set = NULL; - maptable[j].len = 0; - } - free(maptable); - maptable = NULL; - } - nummap = 0; - if (breaktable) { - for (int j = 0; j < numbreak; j++) { - if (breaktable[j]) - free(breaktable[j]); - breaktable[j] = NULL; - } - free(breaktable); - breaktable = NULL; - } - numbreak = 0; - if (reptable) { - for (int j = 0; j < numrep; j++) { - free(reptable[j].pattern); - free(reptable[j].pattern2); - } - free(reptable); - reptable = NULL; - } - if (iconvtable) - delete iconvtable; - if (oconvtable) - delete oconvtable; - if (phone && phone->rules) { - for (int j = 0; j < phone->num + 1; j++) { - free(phone->rules[j * 2]); - free(phone->rules[j * 2 + 1]); - } - free(phone->rules); - free(phone); - 
phone = NULL; - } + delete iconvtable; + delete oconvtable; + delete phone; - if (defcpdtable) { - for (int j = 0; j < numdefcpd; j++) { - free(defcpdtable[j].def); - defcpdtable[j].def = NULL; - } - free(defcpdtable); - defcpdtable = NULL; - } - numrep = 0; - if (checkcpdtable) { - for (int j = 0; j < numcheckcpd; j++) { - free(checkcpdtable[j].pattern); - free(checkcpdtable[j].pattern2); - free(checkcpdtable[j].pattern3); - checkcpdtable[j].pattern = NULL; - checkcpdtable[j].pattern2 = NULL; - checkcpdtable[j].pattern3 = NULL; - } - free(checkcpdtable); - checkcpdtable = NULL; - } - numcheckcpd = 0; FREE_FLAG(compoundflag); FREE_FLAG(compoundbegin); FREE_FLAG(compoundmiddle); @@ -321,21 +226,7 @@ AffixMgr::~AffixMgr() { pHMgr = NULL; cpdmin = 0; cpdmaxsyllable = 0; - if (cpdvowels) - free(cpdvowels); - if (cpdvowels_utf16) - free(cpdvowels_utf16); - if (cpdsyllablenum) - free(cpdsyllablenum); free_utf_tbl(); - if (lang) - free(lang); - if (wordchars) - free(wordchars); - if (ignorechars) - free(ignorechars); - if (version) - free(version); checknum = 0; #ifdef MOZILLA_CLIENT delete[] csconv; @@ -352,8 +243,6 @@ void AffixMgr::finishFileMgr(FileMgr* afflst) { // read in aff file and build up prefix and suffix entry objects int AffixMgr::parse_file(const char* affpath, const char* key) { - char* line; // io buffers - char ft; // affix type // checking flag duplication char dupflags[CONTSIZE]; @@ -375,7 +264,8 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { // read in each line ignoring any that do not // start with a known line type indicator - while ((line = afflst->getline()) != NULL) { + std::string line; + while (afflst->getline(line)) { mychomp(line); /* remove byte order mark */ @@ -383,41 +273,38 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { firstline = 0; // Affix file begins with byte order mark: possible incompatibility with // old Hunspell versions - if (strncmp(line, "\xEF\xBB\xBF", 3) == 0) { - memmove(line, 
line + 3, strlen(line + 3) + 1); + if (line.compare(0, 3, "\xEF\xBB\xBF", 3) == 0) { + line.erase(0, 3); } } /* parse in the keyboard string */ - if (strncmp(line, "KEY", 3) == 0) { - if (parse_string(line, &keystring, afflst->getlinenum())) { + if (line.compare(0, 3, "KEY", 3) == 0) { + if (!parse_string(line, keystring, afflst->getlinenum())) { finishFileMgr(afflst); return 1; } } /* parse in the try string */ - if (strncmp(line, "TRY", 3) == 0) { - if (parse_string(line, &trystring, afflst->getlinenum())) { + if (line.compare(0, 3, "TRY", 3) == 0) { + if (!parse_string(line, trystring, afflst->getlinenum())) { finishFileMgr(afflst); return 1; } } /* parse in the name of the character set used by the .dict and .aff */ - if (strncmp(line, "SET", 3) == 0) { - if (parse_string(line, &encoding, afflst->getlinenum())) { + if (line.compare(0, 3, "SET", 3) == 0) { + if (!parse_string(line, encoding, afflst->getlinenum())) { finishFileMgr(afflst); return 1; } - if (strcmp(encoding, "UTF-8") == 0) { + if (encoding == "UTF-8") { utf8 = 1; #ifndef OPENOFFICEORG #ifndef MOZILLA_CLIENT - if (initialize_utf_tbl()) { - finishFileMgr(afflst); - return 1; - } + initialize_utf_tbl(); #endif #endif } @@ -425,26 +312,26 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { /* parse COMPLEXPREFIXES for agglutinative languages with right-to-left * writing system */ - if (strncmp(line, "COMPLEXPREFIXES", 15) == 0) + if (line.compare(0, 15, "COMPLEXPREFIXES", 15) == 0) complexprefixes = 1; /* parse in the flag used by the controlled compound words */ - if (strncmp(line, "COMPOUNDFLAG", 12) == 0) { - if (parse_flag(line, &compoundflag, afflst)) { + if (line.compare(0, 12, "COMPOUNDFLAG", 12) == 0) { + if (!parse_flag(line, &compoundflag, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by compound words */ - if (strncmp(line, "COMPOUNDBEGIN", 13) == 0) { + if (line.compare(0, 13, "COMPOUNDBEGIN", 13) == 0) { if (complexprefixes) { - if 
(parse_flag(line, &compoundend, afflst)) { + if (!parse_flag(line, &compoundend, afflst)) { finishFileMgr(afflst); return 1; } } else { - if (parse_flag(line, &compoundbegin, afflst)) { + if (!parse_flag(line, &compoundbegin, afflst)) { finishFileMgr(afflst); return 1; } @@ -452,21 +339,22 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { } /* parse in the flag used by compound words */ - if (strncmp(line, "COMPOUNDMIDDLE", 14) == 0) { - if (parse_flag(line, &compoundmiddle, afflst)) { + if (line.compare(0, 14, "COMPOUNDMIDDLE", 14) == 0) { + if (!parse_flag(line, &compoundmiddle, afflst)) { finishFileMgr(afflst); return 1; } } + /* parse in the flag used by compound words */ - if (strncmp(line, "COMPOUNDEND", 11) == 0) { + if (line.compare(0, 11, "COMPOUNDEND", 11) == 0) { if (complexprefixes) { - if (parse_flag(line, &compoundbegin, afflst)) { + if (!parse_flag(line, &compoundbegin, afflst)) { finishFileMgr(afflst); return 1; } } else { - if (parse_flag(line, &compoundend, afflst)) { + if (!parse_flag(line, &compoundend, afflst)) { finishFileMgr(afflst); return 1; } @@ -474,126 +362,126 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { } /* parse in the data used by compound_check() method */ - if (strncmp(line, "COMPOUNDWORDMAX", 15) == 0) { - if (parse_num(line, &cpdwordmax, afflst)) { + if (line.compare(0, 15, "COMPOUNDWORDMAX", 15) == 0) { + if (!parse_num(line, &cpdwordmax, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag sign compounds in dictionary */ - if (strncmp(line, "COMPOUNDROOT", 12) == 0) { - if (parse_flag(line, &compoundroot, afflst)) { + if (line.compare(0, 12, "COMPOUNDROOT", 12) == 0) { + if (!parse_flag(line, &compoundroot, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by compound_check() method */ - if (strncmp(line, "COMPOUNDPERMITFLAG", 18) == 0) { - if (parse_flag(line, &compoundpermitflag, afflst)) { + if (line.compare(0, 18, "COMPOUNDPERMITFLAG", 18) == 
0) { + if (!parse_flag(line, &compoundpermitflag, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by compound_check() method */ - if (strncmp(line, "COMPOUNDFORBIDFLAG", 18) == 0) { - if (parse_flag(line, &compoundforbidflag, afflst)) { + if (line.compare(0, 18, "COMPOUNDFORBIDFLAG", 18) == 0) { + if (!parse_flag(line, &compoundforbidflag, afflst)) { finishFileMgr(afflst); return 1; } } - if (strncmp(line, "COMPOUNDMORESUFFIXES", 20) == 0) { + if (line.compare(0, 20, "COMPOUNDMORESUFFIXES", 20) == 0) { compoundmoresuffixes = 1; } - if (strncmp(line, "CHECKCOMPOUNDDUP", 16) == 0) { + if (line.compare(0, 16, "CHECKCOMPOUNDDUP", 16) == 0) { checkcompounddup = 1; } - if (strncmp(line, "CHECKCOMPOUNDREP", 16) == 0) { + if (line.compare(0, 16, "CHECKCOMPOUNDREP", 16) == 0) { checkcompoundrep = 1; } - if (strncmp(line, "CHECKCOMPOUNDTRIPLE", 19) == 0) { + if (line.compare(0, 19, "CHECKCOMPOUNDTRIPLE", 19) == 0) { checkcompoundtriple = 1; } - if (strncmp(line, "SIMPLIFIEDTRIPLE", 16) == 0) { + if (line.compare(0, 16, "SIMPLIFIEDTRIPLE", 16) == 0) { simplifiedtriple = 1; } - if (strncmp(line, "CHECKCOMPOUNDCASE", 17) == 0) { + if (line.compare(0, 17, "CHECKCOMPOUNDCASE", 17) == 0) { checkcompoundcase = 1; } - if (strncmp(line, "NOSUGGEST", 9) == 0) { - if (parse_flag(line, &nosuggest, afflst)) { + if (line.compare(0, 9, "NOSUGGEST", 9) == 0) { + if (!parse_flag(line, &nosuggest, afflst)) { finishFileMgr(afflst); return 1; } } - if (strncmp(line, "NONGRAMSUGGEST", 14) == 0) { - if (parse_flag(line, &nongramsuggest, afflst)) { + if (line.compare(0, 14, "NONGRAMSUGGEST", 14) == 0) { + if (!parse_flag(line, &nongramsuggest, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by forbidden words */ - if (strncmp(line, "FORBIDDENWORD", 13) == 0) { - if (parse_flag(line, &forbiddenword, afflst)) { + if (line.compare(0, 13, "FORBIDDENWORD", 13) == 0) { + if (!parse_flag(line, &forbiddenword, afflst)) { finishFileMgr(afflst); return 1; 
} } /* parse in the flag used by forbidden words */ - if (strncmp(line, "LEMMA_PRESENT", 13) == 0) { - if (parse_flag(line, &lemma_present, afflst)) { + if (line.compare(0, 13, "LEMMA_PRESENT", 13) == 0) { + if (!parse_flag(line, &lemma_present, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by circumfixes */ - if (strncmp(line, "CIRCUMFIX", 9) == 0) { - if (parse_flag(line, &circumfix, afflst)) { + if (line.compare(0, 9, "CIRCUMFIX", 9) == 0) { + if (!parse_flag(line, &circumfix, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by fogemorphemes */ - if (strncmp(line, "ONLYINCOMPOUND", 14) == 0) { - if (parse_flag(line, &onlyincompound, afflst)) { + if (line.compare(0, 14, "ONLYINCOMPOUND", 14) == 0) { + if (!parse_flag(line, &onlyincompound, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by `needaffixs' */ - if (strncmp(line, "PSEUDOROOT", 10) == 0) { - if (parse_flag(line, &needaffix, afflst)) { + if (line.compare(0, 10, "PSEUDOROOT", 10) == 0) { + if (!parse_flag(line, &needaffix, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by `needaffixs' */ - if (strncmp(line, "NEEDAFFIX", 9) == 0) { - if (parse_flag(line, &needaffix, afflst)) { + if (line.compare(0, 9, "NEEDAFFIX", 9) == 0) { + if (!parse_flag(line, &needaffix, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the minimal length for words in compounds */ - if (strncmp(line, "COMPOUNDMIN", 11) == 0) { - if (parse_num(line, &cpdmin, afflst)) { + if (line.compare(0, 11, "COMPOUNDMIN", 11) == 0) { + if (!parse_num(line, &cpdmin, afflst)) { finishFileMgr(afflst); return 1; } @@ -602,29 +490,29 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { } /* parse in the max. 
words and syllables in compounds */ - if (strncmp(line, "COMPOUNDSYLLABLE", 16) == 0) { - if (parse_cpdsyllable(line, afflst)) { + if (line.compare(0, 16, "COMPOUNDSYLLABLE", 16) == 0) { + if (!parse_cpdsyllable(line, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by compound_check() method */ - if (strncmp(line, "SYLLABLENUM", 11) == 0) { - if (parse_string(line, &cpdsyllablenum, afflst->getlinenum())) { + if (line.compare(0, 11, "SYLLABLENUM", 11) == 0) { + if (!parse_string(line, cpdsyllablenum, afflst->getlinenum())) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by the controlled compound words */ - if (strncmp(line, "CHECKNUM", 8) == 0) { + if (line.compare(0, 8, "CHECKNUM", 8) == 0) { checknum = 1; } /* parse in the extra word characters */ - if (strncmp(line, "WORDCHARS", 9) == 0) { - if (!parse_array(line, &wordchars, wordchars_utf16, + if (line.compare(0, 9, "WORDCHARS", 9) == 0) { + if (!parse_array(line, wordchars, wordchars_utf16, utf8, afflst->getlinenum())) { finishFileMgr(afflst); return 1; @@ -633,8 +521,8 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { /* parse in the ignored characters (for example, Arabic optional diacretics * charachters */ - if (strncmp(line, "IGNORE", 6) == 0) { - if (!parse_array(line, &ignorechars, ignorechars_utf16, + if (line.compare(0, 6, "IGNORE", 6) == 0) { + if (!parse_array(line, ignorechars, ignorechars_utf16, utf8, afflst->getlinenum())) { finishFileMgr(afflst); return 1; @@ -642,172 +530,174 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { } /* parse in the typical fault correcting table */ - if (strncmp(line, "REP", 3) == 0) { - if (parse_reptable(line, afflst)) { + if (line.compare(0, 3, "REP", 3) == 0) { + if (!parse_reptable(line, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the input conversion table */ - if (strncmp(line, "ICONV", 5) == 0) { - if (parse_convtable(line, afflst, &iconvtable, "ICONV")) { + if 
(line.compare(0, 5, "ICONV", 5) == 0) { + if (!parse_convtable(line, afflst, &iconvtable, "ICONV")) { finishFileMgr(afflst); return 1; } } /* parse in the input conversion table */ - if (strncmp(line, "OCONV", 5) == 0) { - if (parse_convtable(line, afflst, &oconvtable, "OCONV")) { + if (line.compare(0, 5, "OCONV", 5) == 0) { + if (!parse_convtable(line, afflst, &oconvtable, "OCONV")) { finishFileMgr(afflst); return 1; } } /* parse in the phonetic translation table */ - if (strncmp(line, "PHONE", 5) == 0) { - if (parse_phonetable(line, afflst)) { + if (line.compare(0, 5, "PHONE", 5) == 0) { + if (!parse_phonetable(line, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the checkcompoundpattern table */ - if (strncmp(line, "CHECKCOMPOUNDPATTERN", 20) == 0) { - if (parse_checkcpdtable(line, afflst)) { + if (line.compare(0, 20, "CHECKCOMPOUNDPATTERN", 20) == 0) { + if (!parse_checkcpdtable(line, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the defcompound table */ - if (strncmp(line, "COMPOUNDRULE", 12) == 0) { - if (parse_defcpdtable(line, afflst)) { + if (line.compare(0, 12, "COMPOUNDRULE", 12) == 0) { + if (!parse_defcpdtable(line, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the related character map table */ - if (strncmp(line, "MAP", 3) == 0) { - if (parse_maptable(line, afflst)) { + if (line.compare(0, 3, "MAP", 3) == 0) { + if (!parse_maptable(line, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the word breakpoints table */ - if (strncmp(line, "BREAK", 5) == 0) { - if (parse_breaktable(line, afflst)) { + if (line.compare(0, 5, "BREAK", 5) == 0) { + if (!parse_breaktable(line, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the language for language specific codes */ - if (strncmp(line, "LANG", 4) == 0) { - if (parse_string(line, &lang, afflst->getlinenum())) { + if (line.compare(0, 4, "LANG", 4) == 0) { + if (!parse_string(line, lang, afflst->getlinenum())) { finishFileMgr(afflst); return 1; } 
langnum = get_lang_num(lang); } - if (strncmp(line, "VERSION", 7) == 0) { - for (line = line + 7; *line == ' ' || *line == '\t'; line++) - ; - version = mystrdup(line); + if (line.compare(0, 7, "VERSION", 7) == 0) { + size_t startpos = line.find_first_not_of(" \t", 7); + if (startpos != std::string::npos) { + version = line.substr(startpos); + } } - if (strncmp(line, "MAXNGRAMSUGS", 12) == 0) { - if (parse_num(line, &maxngramsugs, afflst)) { + if (line.compare(0, 12, "MAXNGRAMSUGS", 12) == 0) { + if (!parse_num(line, &maxngramsugs, afflst)) { finishFileMgr(afflst); return 1; } } - if (strncmp(line, "ONLYMAXDIFF", 11) == 0) + if (line.compare(0, 11, "ONLYMAXDIFF", 11) == 0) onlymaxdiff = 1; - if (strncmp(line, "MAXDIFF", 7) == 0) { - if (parse_num(line, &maxdiff, afflst)) { + if (line.compare(0, 7, "MAXDIFF", 7) == 0) { + if (!parse_num(line, &maxdiff, afflst)) { finishFileMgr(afflst); return 1; } } - if (strncmp(line, "MAXCPDSUGS", 10) == 0) { - if (parse_num(line, &maxcpdsugs, afflst)) { + if (line.compare(0, 10, "MAXCPDSUGS", 10) == 0) { + if (!parse_num(line, &maxcpdsugs, afflst)) { finishFileMgr(afflst); return 1; } } - if (strncmp(line, "NOSPLITSUGS", 11) == 0) { + if (line.compare(0, 11, "NOSPLITSUGS", 11) == 0) { nosplitsugs = 1; } - if (strncmp(line, "FULLSTRIP", 9) == 0) { + if (line.compare(0, 9, "FULLSTRIP", 9) == 0) { fullstrip = 1; } - if (strncmp(line, "SUGSWITHDOTS", 12) == 0) { + if (line.compare(0, 12, "SUGSWITHDOTS", 12) == 0) { sugswithdots = 1; } /* parse in the flag used by forbidden words */ - if (strncmp(line, "KEEPCASE", 8) == 0) { - if (parse_flag(line, &keepcase, afflst)) { + if (line.compare(0, 8, "KEEPCASE", 8) == 0) { + if (!parse_flag(line, &keepcase, afflst)) { finishFileMgr(afflst); return 1; } } /* parse in the flag used by `forceucase' */ - if (strncmp(line, "FORCEUCASE", 10) == 0) { - if (parse_flag(line, &forceucase, afflst)) { + if (line.compare(0, 10, "FORCEUCASE", 10) == 0) { + if (!parse_flag(line, &forceucase, afflst)) { 
finishFileMgr(afflst); return 1; } } /* parse in the flag used by `warn' */ - if (strncmp(line, "WARN", 4) == 0) { - if (parse_flag(line, &warn, afflst)) { + if (line.compare(0, 4, "WARN", 4) == 0) { + if (!parse_flag(line, &warn, afflst)) { finishFileMgr(afflst); return 1; } } - if (strncmp(line, "FORBIDWARN", 10) == 0) { + if (line.compare(0, 10, "FORBIDWARN", 10) == 0) { forbidwarn = 1; } /* parse in the flag used by the affix generator */ - if (strncmp(line, "SUBSTANDARD", 11) == 0) { - if (parse_flag(line, &substandard, afflst)) { + if (line.compare(0, 11, "SUBSTANDARD", 11) == 0) { + if (!parse_flag(line, &substandard, afflst)) { finishFileMgr(afflst); return 1; } } - if (strncmp(line, "CHECKSHARPS", 11) == 0) { + if (line.compare(0, 11, "CHECKSHARPS", 11) == 0) { checksharps = 1; } /* parse this affix: P - prefix, S - suffix */ - ft = ' '; - if (strncmp(line, "PFX", 3) == 0) + // affix type + char ft = ' '; + if (line.compare(0, 3, "PFX", 3) == 0) ft = complexprefixes ? 'S' : 'P'; - if (strncmp(line, "SFX", 3) == 0) + if (line.compare(0, 3, "SFX", 3) == 0) ft = complexprefixes ? 
'P' : 'S'; if (ft != ' ') { if (dupflags_ini) { memset(dupflags, 0, sizeof(dupflags)); dupflags_ini = 0; } - if (parse_affix(line, ft, afflst, dupflags)) { + if (!parse_affix(line, ft, afflst, dupflags)) { finishFileMgr(afflst); return 1; } @@ -848,37 +738,22 @@ int AffixMgr::parse_file(const char* affpath, const char* key) { /* get encoding for CHECKCOMPOUNDCASE */ if (!utf8) { - char* enc = get_encoding(); - csconv = get_current_cs(enc); - free(enc); - enc = NULL; - - std::string expw; - if (wordchars) { - expw.assign(wordchars); - free(wordchars); - } - + csconv = get_current_cs(get_encoding()); for (int i = 0; i <= 255; i++) { if ((csconv[i].cupper != csconv[i].clower) && - (expw.find((char)i) == std::string::npos)) { - expw.push_back((char)i); + (wordchars.find((char)i) == std::string::npos)) { + wordchars.push_back((char)i); } } - wordchars = mystrdup(expw.c_str()); } // default BREAK definition - if (numbreak == -1) { - breaktable = (char**)malloc(sizeof(char*) * 3); - if (!breaktable) - return 1; - breaktable[0] = mystrdup("-"); - breaktable[1] = mystrdup("^-"); - breaktable[2] = mystrdup("-$"); - if (breaktable[0] && breaktable[1] && breaktable[2]) - numbreak = 3; + if (!parsedbreaktable) { + breaktable.push_back("-"); + breaktable.push_back("^-"); + breaktable.push_back("-$"); + parsedbreaktable = true; } return 0; } @@ -949,6 +824,9 @@ int AffixMgr::build_pfxtree(PfxEntry* pfxptr) { // both by suffix flag, and sorted by the reverse of the // suffix string itself; so we need to set up two indexes int AffixMgr::build_sfxtree(SfxEntry* sfxptr) { + + sfxptr->initReverseWord(); + SfxEntry* ptr; SfxEntry* pptr; SfxEntry* ep = sfxptr; @@ -1143,17 +1021,6 @@ int AffixMgr::process_sfx_order() { } // add flags to the result for dictionary debugging -void AffixMgr::debugflag(char* result, unsigned short flag) { - char* st = encode_flag(flag); - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, MORPH_FLAG, MAXLNLEN); - if (st) { - mystrcat(result, st, MAXLNLEN); - 
free(st); - } -} - -// add flags to the result for dictionary debugging std::string& AffixMgr::debugflag(std::string& result, unsigned short flag) { char* st = encode_flag(flag); result.append(" "); @@ -1181,13 +1048,18 @@ int AffixMgr::condlen(const char* st) { return l; } -int AffixMgr::encodeit(affentry& entry, const char* cs) { +int AffixMgr::encodeit(AffEntry& entry, const char* cs) { if (strcmp(cs, ".") != 0) { entry.numconds = (char)condlen(cs); - // coverity[buffer_size_warning] - deliberate use of lack of end of conds - // padded by strncpy as long condition flag - strncpy(entry.c.conds, cs, MAXCONDLEN); - if (entry.c.conds[MAXCONDLEN - 1] && cs[MAXCONDLEN]) { + const size_t cslen = strlen(cs); + const size_t short_part = std::min<size_t>(MAXCONDLEN, cslen); + memcpy(entry.c.conds, cs, short_part); + if (short_part < MAXCONDLEN) { + //blank out the remaining space + memset(entry.c.conds + short_part, 0, MAXCONDLEN - short_part); + } else if (cs[MAXCONDLEN]) { + //there is more conditions than fit in fixed space, so its + //a long condition entry.opts += aeLONGCOND; entry.c.l.conds2 = mystrdup(cs + MAXCONDLEN_1); if (!entry.c.l.conds2) @@ -1316,13 +1188,12 @@ struct hentry* AffixMgr::prefix_check_twosfx(const char* word, } // check word for prefixes -char* AffixMgr::prefix_check_morph(const char* word, - int len, - char in_compound, - const FLAG needflag) { +std::string AffixMgr::prefix_check_morph(const char* word, + int len, + char in_compound, + const FLAG needflag) { - char result[MAXLNLEN]; - result[0] = '\0'; + std::string result; pfx = NULL; sfxappnd = NULL; @@ -1331,12 +1202,10 @@ char* AffixMgr::prefix_check_morph(const char* word, // first handle the special case of 0 length prefixes PfxEntry* pe = pStart[0]; while (pe) { - char* st = pe->check_morph(word, len, in_compound, needflag); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + std::string st = pe->check_morph(word, len, in_compound, needflag); + if (!st.empty()) { + 
result.append(st); } - // if (rv) return rv; pe = pe->getNext(); } @@ -1346,16 +1215,15 @@ char* AffixMgr::prefix_check_morph(const char* word, while (pptr) { if (isSubset(pptr->getKey(), word)) { - char* st = pptr->check_morph(word, len, in_compound, needflag); - if (st) { + std::string st = pptr->check_morph(word, len, in_compound, needflag); + if (!st.empty()) { // fogemorpheme if ((in_compound != IN_CPD_NOT) || !((pptr->getCont() && (TESTAFF(pptr->getCont(), onlyincompound, pptr->getContLen()))))) { - mystrcat(result, st, MAXLNLEN); + result.append(st); pfx = pptr; } - free(st); } pptr = pptr->getNextEQ(); } else { @@ -1363,18 +1231,15 @@ char* AffixMgr::prefix_check_morph(const char* word, } } - if (*result) - return mystrdup(result); - return NULL; + return result; } // check word for prefixes -char* AffixMgr::prefix_check_twosfx_morph(const char* word, - int len, - char in_compound, - const FLAG needflag) { - char result[MAXLNLEN]; - result[0] = '\0'; +std::string AffixMgr::prefix_check_twosfx_morph(const char* word, + int len, + char in_compound, + const FLAG needflag) { + std::string result; pfx = NULL; sfxappnd = NULL; @@ -1383,10 +1248,9 @@ char* AffixMgr::prefix_check_twosfx_morph(const char* word, // first handle the special case of 0 length prefixes PfxEntry* pe = pStart[0]; while (pe) { - char* st = pe->check_twosfx_morph(word, len, in_compound, needflag); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + std::string st = pe->check_twosfx_morph(word, len, in_compound, needflag); + if (!st.empty()) { + result.append(st); } pe = pe->getNext(); } @@ -1397,10 +1261,9 @@ char* AffixMgr::prefix_check_twosfx_morph(const char* word, while (pptr) { if (isSubset(pptr->getKey(), word)) { - char* st = pptr->check_twosfx_morph(word, len, in_compound, needflag); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + std::string st = pptr->check_twosfx_morph(word, len, in_compound, needflag); + if (!st.empty()) { + result.append(st); pfx = pptr; } 
pptr = pptr->getNextEQ(); @@ -1409,29 +1272,31 @@ char* AffixMgr::prefix_check_twosfx_morph(const char* word, } } - if (*result) - return mystrdup(result); - return NULL; + return result; } // Is word a non compound with a REP substitution (see checkcompoundrep)? int AffixMgr::cpdrep_check(const char* word, int wl) { - if ((wl < 2) || !numrep) + if ((wl < 2) || reptable.empty()) return 0; - for (int i = 0; i < numrep; i++) { + for (size_t i = 0; i < reptable.size(); ++i) { const char* r = word; - int lenp = strlen(reptable[i].pattern); + const size_t lenp = reptable[i].pattern.size(); // search every occurence of the pattern in the word - while ((r = strstr(r, reptable[i].pattern)) != NULL) { + while ((r = strstr(r, reptable[i].pattern.c_str())) != NULL) { std::string candidate(word); - candidate.replace(r - word, lenp, reptable[i].pattern2); + size_t type = r == word && langnum != LANG_hu ? 1 : 0; + if (r - word + reptable[i].pattern.size() == lenp && langnum != LANG_hu) + type += 2; + candidate.replace(r - word, lenp, reptable[i].outstrings[type]); if (candidate_check(candidate.c_str(), candidate.size())) return 1; - r++; // search for the next letter + ++r; // search for the next letter } } + return 0; } @@ -1441,21 +1306,21 @@ int AffixMgr::cpdpat_check(const char* word, hentry* r1, hentry* r2, const char /*affixed*/) { - int len; - for (int i = 0; i < numcheckcpd; i++) { - if (isSubset(checkcpdtable[i].pattern2, word + pos) && + for (size_t i = 0; i < checkcpdtable.size(); ++i) { + size_t len; + if (isSubset(checkcpdtable[i].pattern2.c_str(), word + pos) && (!r1 || !checkcpdtable[i].cond || (r1->astr && TESTAFF(r1->astr, checkcpdtable[i].cond, r1->alen))) && (!r2 || !checkcpdtable[i].cond2 || (r2->astr && TESTAFF(r2->astr, checkcpdtable[i].cond2, r2->alen))) && // zero length pattern => only TESTAFF // zero pattern (0/flag) => unmodified stem (zero affixes allowed) - (!*(checkcpdtable[i].pattern) || - ((*(checkcpdtable[i].pattern) == '0' && r1->blen <= pos && 
+ (checkcpdtable[i].pattern.empty() || + ((checkcpdtable[i].pattern[0] == '0' && r1->blen <= pos && strncmp(word + pos - r1->blen, r1->word, r1->blen) == 0) || - (*(checkcpdtable[i].pattern) != '0' && - ((len = strlen(checkcpdtable[i].pattern)) != 0) && - strncmp(word + pos - len, checkcpdtable[i].pattern, len) == 0)))) { + (checkcpdtable[i].pattern[0] != '0' && + ((len = checkcpdtable[i].pattern.size()) != 0) && + strncmp(word + pos - len, checkcpdtable[i].pattern.c_str(), len) == 0)))) { return 1; } } @@ -1513,7 +1378,6 @@ int AffixMgr::defcpd_check(hentry*** words, std::vector<metachar_data> btinfo(1); short bt = 0; - int i, j; (*words)[wnum] = rv; @@ -1525,10 +1389,10 @@ int AffixMgr::defcpd_check(hentry*** words, return 0; } int ok = 0; - for (i = 0; i < numdefcpd; i++) { - for (j = 0; j < defcpdtable[i].len; j++) { - if (defcpdtable[i].def[j] != '*' && defcpdtable[i].def[j] != '?' && - TESTAFF(rv->astr, defcpdtable[i].def[j], rv->alen)) { + for (size_t i = 0; i < defcpdtable.size(); ++i) { + for (size_t j = 0; j < defcpdtable[i].size(); ++j) { + if (defcpdtable[i][j] != '*' && defcpdtable[i][j] != '?' && + TESTAFF(rv->astr, defcpdtable[i][j], rv->alen)) { ok = 1; break; } @@ -1541,25 +1405,25 @@ int AffixMgr::defcpd_check(hentry*** words, return 0; } - for (i = 0; i < numdefcpd; i++) { - signed short pp = 0; // pattern position + for (size_t i = 0; i < defcpdtable.size(); ++i) { + size_t pp = 0; // pattern position signed short wp = 0; // "words" position int ok2; ok = 1; ok2 = 1; do { - while ((pp < defcpdtable[i].len) && (wp <= wnum)) { - if (((pp + 1) < defcpdtable[i].len) && - ((defcpdtable[i].def[pp + 1] == '*') || - (defcpdtable[i].def[pp + 1] == '?'))) { - int wend = (defcpdtable[i].def[pp + 1] == '?') ? wp : wnum; + while ((pp < defcpdtable[i].size()) && (wp <= wnum)) { + if (((pp + 1) < defcpdtable[i].size()) && + ((defcpdtable[i][pp + 1] == '*') || + (defcpdtable[i][pp + 1] == '?'))) { + int wend = (defcpdtable[i][pp + 1] == '?') ? 
wp : wnum; ok2 = 1; pp += 2; btinfo[bt].btpp = pp; btinfo[bt].btwp = wp; while (wp <= wend) { if (!(*words)[wp]->alen || - !TESTAFF((*words)[wp]->astr, defcpdtable[i].def[pp - 2], + !TESTAFF((*words)[wp]->astr, defcpdtable[i][pp - 2], (*words)[wp]->alen)) { ok2 = 0; break; @@ -1578,24 +1442,24 @@ int AffixMgr::defcpd_check(hentry*** words, } else { ok2 = 1; if (!(*words)[wp] || !(*words)[wp]->alen || - !TESTAFF((*words)[wp]->astr, defcpdtable[i].def[pp], + !TESTAFF((*words)[wp]->astr, defcpdtable[i][pp], (*words)[wp]->alen)) { ok = 0; break; } pp++; wp++; - if ((defcpdtable[i].len == pp) && !(wp > wnum)) + if ((defcpdtable[i].size() == pp) && !(wp > wnum)) ok = 0; } } if (ok && ok2) { - int r = pp; - while ((defcpdtable[i].len > r) && ((r + 1) < defcpdtable[i].len) && - ((defcpdtable[i].def[r + 1] == '*') || - (defcpdtable[i].def[r + 1] == '?'))) + size_t r = pp; + while ((defcpdtable[i].size() > r) && ((r + 1) < defcpdtable[i].size()) && + ((defcpdtable[i][r + 1] == '*') || + (defcpdtable[i][r + 1] == '?'))) r += 2; - if (defcpdtable[i].len <= r) + if (defcpdtable[i].size() <= r) return 1; } // backtrack @@ -1608,16 +1472,16 @@ int AffixMgr::defcpd_check(hentry*** words, } while ((btinfo[bt - 1].btnum < 0) && --bt); } while (bt); - if (ok && ok2 && (!all || (defcpdtable[i].len <= pp))) + if (ok && ok2 && (!all || (defcpdtable[i].size() <= pp))) return 1; // check zero ending - while (ok && ok2 && (defcpdtable[i].len > pp) && - ((pp + 1) < defcpdtable[i].len) && - ((defcpdtable[i].def[pp + 1] == '*') || - (defcpdtable[i].def[pp + 1] == '?'))) + while (ok && ok2 && (defcpdtable[i].size() > pp) && + ((pp + 1) < defcpdtable[i].size()) && + ((defcpdtable[i][pp + 1] == '*') || + (defcpdtable[i][pp + 1] == '?'))) pp += 2; - if (ok && ok2 && (defcpdtable[i].len <= pp)) + if (ok && ok2 && (defcpdtable[i].size() <= pp)) return 1; } (*words)[wnum] = NULL; @@ -1627,9 +1491,8 @@ int AffixMgr::defcpd_check(hentry*** words, } inline int AffixMgr::candidate_check(const char* 
word, int len) { - struct hentry* rv = NULL; - rv = lookup(word); + struct hentry* rv = lookup(word); if (rv) return 1; @@ -1651,20 +1514,23 @@ short AffixMgr::get_syllable(const std::string& word) { if (!utf8) { for (size_t i = 0; i < word.size(); ++i) { - if (strchr(cpdvowels, word[i])) - num++; + if (std::binary_search(cpdvowels.begin(), cpdvowels.end(), + word[i])) { + ++num; + } } - } else if (cpdvowels_utf16) { + } else if (!cpdvowels_utf16.empty()) { std::vector<w_char> w; - int i = u8_u16(w, word); - for (; i > 0; i--) { - if (std::binary_search(cpdvowels_utf16, - cpdvowels_utf16 + cpdvowels_utf16_len, - w[i - 1])) { + u8_u16(w, word); + for (size_t i = 0; i < w.size(); ++i) { + if (std::binary_search(cpdvowels_utf16.begin(), + cpdvowels_utf16.end(), + w[i])) { ++num; } } } + return num; } @@ -1687,8 +1553,7 @@ void AffixMgr::setcminmax(int* cmin, int* cmax, const char* word, int len) { // check if compound word is correctly spelled // hu_mov_rule = spec. Hungarian rule (XXX) -struct hentry* AffixMgr::compound_check(const char* word, - int len, +struct hentry* AffixMgr::compound_check(const std::string& word, short wordnum, short numsyllable, short maxwordnum, @@ -1707,19 +1572,19 @@ struct hentry* AffixMgr::compound_check(const char* word, int cmin; int cmax; int striple = 0; - int scpd = 0; + size_t scpd = 0; int soldi = 0; int oldcmin = 0; int oldcmax = 0; int oldlen = 0; int checkedstriple = 0; - int onlycpdrule; char affixed = 0; hentry** oldwords = words; + size_t len = word.size(); int checked_prefix; - setcminmax(&cmin, &cmax, word, len); + setcminmax(&cmin, &cmax, word.c_str(), len); st.assign(word); @@ -1733,7 +1598,7 @@ struct hentry* AffixMgr::compound_check(const char* word, } words = oldwords; - onlycpdrule = (words) ? 1 : 0; + int onlycpdrule = (words) ? 
1 : 0; do { // onlycpdrule loop @@ -1744,26 +1609,26 @@ struct hentry* AffixMgr::compound_check(const char* word, do { // simplified checkcompoundpattern loop if (scpd > 0) { - for (; scpd <= numcheckcpd && - (!checkcpdtable[scpd - 1].pattern3 || - strncmp(word + i, checkcpdtable[scpd - 1].pattern3, - strlen(checkcpdtable[scpd - 1].pattern3)) != 0); + for (; scpd <= checkcpdtable.size() && + (checkcpdtable[scpd - 1].pattern3.empty() || + strncmp(word.c_str() + i, checkcpdtable[scpd - 1].pattern3.c_str(), + checkcpdtable[scpd - 1].pattern3.size()) != 0); scpd++) ; - if (scpd > numcheckcpd) + if (scpd > checkcpdtable.size()) break; // break simplified checkcompoundpattern loop st.replace(i, std::string::npos, checkcpdtable[scpd - 1].pattern); soldi = i; - i += strlen(checkcpdtable[scpd - 1].pattern); + i += checkcpdtable[scpd - 1].pattern.size(); st.replace(i, std::string::npos, checkcpdtable[scpd - 1].pattern2); - st.replace(i + strlen(checkcpdtable[scpd - 1].pattern2), std::string::npos, - word + soldi + strlen(checkcpdtable[scpd - 1].pattern3)); + st.replace(i + checkcpdtable[scpd - 1].pattern2.size(), std::string::npos, + word.substr(soldi + checkcpdtable[scpd - 1].pattern3.size())); oldlen = len; - len += strlen(checkcpdtable[scpd - 1].pattern) + - strlen(checkcpdtable[scpd - 1].pattern2) - - strlen(checkcpdtable[scpd - 1].pattern3); + len += checkcpdtable[scpd - 1].pattern.size() + + checkcpdtable[scpd - 1].pattern2.size() - + checkcpdtable[scpd - 1].pattern3.size(); oldcmin = cmin; oldcmax = cmax; setcminmax(&cmin, &cmax, st.c_str(), len); @@ -1791,7 +1656,7 @@ struct hentry* AffixMgr::compound_check(const char* word, TESTAFF(rv->astr, compoundbegin, rv->alen)) || (compoundmiddle && wordnum && !words && !onlycpdrule && TESTAFF(rv->astr, compoundmiddle, rv->alen)) || - (numdefcpd && onlycpdrule && + (!defcpdtable.empty() && onlycpdrule && ((!words && !wordnum && defcpd_check(&words, wnum, rv, rwords, 0)) || (words && @@ -1812,7 +1677,7 @@ struct hentry* 
AffixMgr::compound_check(const char* word, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN, compoundflag))) { if (((rv = suffix_check( - st.c_str(), i, 0, NULL, NULL, 0, NULL, FLAG_NULL, compoundflag, + st.c_str(), i, 0, NULL, FLAG_NULL, compoundflag, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN)) || (compoundmoresuffixes && (rv = suffix_check_twosfx(st.c_str(), i, 0, NULL, compoundflag)))) && @@ -1829,7 +1694,7 @@ struct hentry* AffixMgr::compound_check(const char* word, if (rv || (((wordnum == 0) && compoundbegin && ((rv = suffix_check( - st.c_str(), i, 0, NULL, NULL, 0, NULL, FLAG_NULL, compoundbegin, + st.c_str(), i, 0, NULL, FLAG_NULL, compoundbegin, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN)) || (compoundmoresuffixes && (rv = suffix_check_twosfx( @@ -1840,7 +1705,7 @@ struct hentry* AffixMgr::compound_check(const char* word, compoundbegin)))) || ((wordnum > 0) && compoundmiddle && ((rv = suffix_check( - st.c_str(), i, 0, NULL, NULL, 0, NULL, FLAG_NULL, compoundmiddle, + st.c_str(), i, 0, NULL, FLAG_NULL, compoundmiddle, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN)) || (compoundmoresuffixes && (rv = suffix_check_twosfx( @@ -1911,8 +1776,7 @@ struct hentry* AffixMgr::compound_check(const char* word, ((oldwordnum == 0) && compoundbegin && TESTAFF(rv->astr, compoundbegin, rv->alen)) || ((oldwordnum > 0) && compoundmiddle && - TESTAFF(rv->astr, compoundmiddle, rv->alen)) // || - // (numdefcpd && ) + TESTAFF(rv->astr, compoundmiddle, rv->alen)) // LANG_hu section: spec. Hungarian rule || ((langnum == LANG_hu) && hu_mov_rule && @@ -1934,7 +1798,7 @@ struct hentry* AffixMgr::compound_check(const char* word, ((word[i - 1] == word[i + 1])) // may be word[i+1] == '\0' )) || (checkcompoundcase && scpd == 0 && !words && - cpdcase_check(word, i)))) + cpdcase_check(word.c_str(), i)))) // LANG_hu section: spec. 
Hungarian rule || ((!rv) && (langnum == LANG_hu) && hu_mov_rule && (rv = affix_check(st.c_str(), i)) && @@ -1949,7 +1813,7 @@ struct hentry* AffixMgr::compound_check(const char* word, // LANG_hu section: spec. Hungarian rule if (langnum == LANG_hu) { // calculate syllable number of the word - numsyllable += get_syllable(st.substr(i)); + numsyllable += get_syllable(st.substr(0, i)); // + 1 word, if syllable number of the prefix > 1 (hungarian // convention) if (pfx && (get_syllable(pfx->getKey()) > 1)) @@ -1968,7 +1832,7 @@ struct hentry* AffixMgr::compound_check(const char* word, if (striple) { checkedstriple = 1; i--; // check "fahrt" instead of "ahrt" in "Schiffahrt" - } else if (i > 2 && *(word + i - 1) == *(word + i - 2)) + } else if (i > 2 && word[i - 1] == word[i - 2]) striple = 1; } @@ -1981,7 +1845,7 @@ struct hentry* AffixMgr::compound_check(const char* word, TESTAFF(rv->astr, compoundflag, rv->alen)) || (compoundend && !words && TESTAFF(rv->astr, compoundend, rv->alen)) || - (numdefcpd && words && + (!defcpdtable.empty() && words && defcpd_check(&words, wnum + 1, rv, NULL, 1))) || (scpd != 0 && checkcpdtable[scpd - 1].cond2 != FLAG_NULL && !TESTAFF(rv->astr, checkcpdtable[scpd - 1].cond2, @@ -2034,12 +1898,12 @@ struct hentry* AffixMgr::compound_check(const char* word, (compoundend && TESTAFF(rv->astr, compoundend, rv->alen))) && (((cpdwordmax == -1) || (wordnum + 1 < cpdwordmax)) || ((cpdmaxsyllable != 0) && - (numsyllable + get_syllable(std::string(HENTRY_WORD(rv), rv->clen)) <= + (numsyllable + get_syllable(std::string(HENTRY_WORD(rv), rv->blen)) <= cpdmaxsyllable))) && ( // test CHECKCOMPOUNDPATTERN - !numcheckcpd || scpd != 0 || - !cpdpat_check(word, i, rv_first, rv, 0)) && + checkcpdtable.empty() || scpd != 0 || + !cpdpat_check(word.c_str(), i, rv_first, rv, 0)) && ((!checkcompounddup || (rv != rv_first))) // test CHECKCOMPOUNDPATTERN conditions && @@ -2047,7 +1911,7 @@ struct hentry* AffixMgr::compound_check(const char* word, TESTAFF(rv->astr, 
checkcpdtable[scpd - 1].cond2, rv->alen))) { // forbid compound word, if it is a non compound word with typical // fault - if (checkcompoundrep && cpdrep_check(word, len)) + if (checkcompoundrep && cpdrep_check(word.c_str(), len)) return NULL; return rv_first; } @@ -2059,18 +1923,18 @@ struct hentry* AffixMgr::compound_check(const char* word, sfx = NULL; sfxflag = FLAG_NULL; rv = (compoundflag && !onlycpdrule) - ? affix_check((word + i), strlen(word + i), compoundflag, + ? affix_check((word.c_str() + i), strlen(word.c_str() + i), compoundflag, IN_CPD_END) : NULL; if (!rv && compoundend && !onlycpdrule) { sfx = NULL; pfx = NULL; - rv = affix_check((word + i), strlen(word + i), compoundend, + rv = affix_check((word.c_str() + i), strlen(word.c_str() + i), compoundend, IN_CPD_END); } - if (!rv && numdefcpd && words) { - rv = affix_check((word + i), strlen(word + i), 0, IN_CPD_END); + if (!rv && !defcpdtable.empty() && words) { + rv = affix_check((word.c_str() + i), strlen(word.c_str() + i), 0, IN_CPD_END); if (rv && defcpd_check(&words, wnum + 1, rv, NULL, 1)) return rv_first; rv = NULL; @@ -2083,8 +1947,8 @@ struct hentry* AffixMgr::compound_check(const char* word, rv = NULL; // test CHECKCOMPOUNDPATTERN conditions (forbidden compounds) - if (rv && numcheckcpd && scpd == 0 && - cpdpat_check(word, i, rv_first, rv, affixed)) + if (rv && !checkcpdtable.empty() && scpd == 0 && + cpdpat_check(word.c_str(), i, rv_first, rv, affixed)) rv = NULL; // check non_compound flag in suffix and prefix @@ -2118,7 +1982,7 @@ struct hentry* AffixMgr::compound_check(const char* word, if (langnum == LANG_hu) { // calculate syllable number of the word - numsyllable += get_syllable(word + i); + numsyllable += get_syllable(word.c_str() + i); // - affix syllable num. 
// XXX only second suffix (inflections, not derivations) @@ -2136,7 +2000,7 @@ struct hentry* AffixMgr::compound_check(const char* word, // increment syllable num, if last word has a SYLLABLENUM flag // and the suffix is beginning `s' - if (cpdsyllablenum) { + if (!cpdsyllablenum.empty()) { switch (sfxflag) { case 'c': { numsyllable += 2; @@ -2171,7 +2035,7 @@ struct hentry* AffixMgr::compound_check(const char* word, ((!checkcompounddup || (rv != rv_first)))) { // forbid compound word, if it is a non compound word with typical // fault - if (checkcompoundrep && cpdrep_check(word, len)) + if (checkcompoundrep && cpdrep_check(word.c_str(), len)) return NULL; return rv_first; } @@ -2180,16 +2044,16 @@ struct hentry* AffixMgr::compound_check(const char* word, wordnum = oldwordnum2; // perhaps second word is a compound word (recursive call) - if (wordnum < maxwordnum) { - rv = compound_check(st.c_str() + i, strlen(st.c_str() + i), wordnum + 1, + if (wordnum + 2 < maxwordnum) { + rv = compound_check(st.substr(i), wordnum + 1, numsyllable, maxwordnum, wnum + 1, words, rwords, 0, is_sug, info); - if (rv && numcheckcpd && + if (rv && !checkcpdtable.empty() && ((scpd == 0 && - cpdpat_check(word, i, rv_first, rv, affixed)) || + cpdpat_check(word.c_str(), i, rv_first, rv, affixed)) || (scpd != 0 && - !cpdpat_check(word, i, rv_first, rv, affixed)))) + !cpdpat_check(word.c_str(), i, rv_first, rv, affixed)))) rv = NULL; } else { rv = NULL; @@ -2198,13 +2062,12 @@ struct hentry* AffixMgr::compound_check(const char* word, // forbid compound word, if it is a non compound word with typical // fault if (checkcompoundrep || forbiddenword) { - struct hentry* rv2 = NULL; - if (checkcompoundrep && cpdrep_check(word, len)) + if (checkcompoundrep && cpdrep_check(word.c_str(), len)) return NULL; // check first part - if (strncmp(rv->word, word + i, rv->blen) == 0) { + if (strncmp(rv->word, word.c_str() + i, rv->blen) == 0) { char r = st[i + rv->blen]; st[i + rv->blen] = '\0'; @@ -2214,9 
+2077,9 @@ struct hentry* AffixMgr::compound_check(const char* word, } if (forbiddenword) { - rv2 = lookup(word); + struct hentry* rv2 = lookup(word.c_str()); if (!rv2) - rv2 = affix_check(word, len); + rv2 = affix_check(word.c_str(), len); if (rv2 && rv2->astr && TESTAFF(rv2->astr, forbiddenword, rv2->alen) && (strncmp(rv2->word, st.c_str(), i + rv->blen) == 0)) { @@ -2248,7 +2111,7 @@ struct hentry* AffixMgr::compound_check(const char* word, scpd++; } while (!onlycpdrule && simplifiedcpd && - scpd <= numcheckcpd); // end of simplifiedcpd loop + scpd <= checkcpdtable.size()); // end of simplifiedcpd loop scpd = 0; wordnum = oldwordnum; @@ -2261,7 +2124,7 @@ struct hentry* AffixMgr::compound_check(const char* word, } else st[i] = ch; - } while (numdefcpd && oldwordnum == 0 && + } while (!defcpdtable.empty() && oldwordnum == 0 && onlycpdrule++ < 1); // end of onlycpd loop } @@ -2278,9 +2141,9 @@ int AffixMgr::compound_check_morph(const char* word, short wnum, hentry** words, hentry** rwords, - char hu_mov_rule = 0, - char** result = NULL, - char* partresult = NULL) { + char hu_mov_rule, + std::string& result, + const std::string* partresult) { int i; short oldnumsyllable, oldnumsyllable2, oldwordnum, oldwordnum2; int ok = 0; @@ -2291,12 +2154,11 @@ int AffixMgr::compound_check_morph(const char* word, char ch; int checked_prefix; - char presult[MAXLNLEN]; + std::string presult; int cmin; int cmax; - int onlycpdrule; char affixed = 0; hentry** oldwords = words; @@ -2314,7 +2176,7 @@ int AffixMgr::compound_check_morph(const char* word, } words = oldwords; - onlycpdrule = (words) ? 1 : 0; + int onlycpdrule = (words) ? 
1 : 0; do { // onlycpdrule loop @@ -2330,9 +2192,9 @@ int AffixMgr::compound_check_morph(const char* word, affixed = 1; - *presult = '\0'; + presult.clear(); if (partresult) - mystrcat(presult, partresult, MAXLNLEN); + presult.append(*partresult); rv = lookup(st.c_str()); // perhaps without prefix @@ -2345,7 +2207,7 @@ int AffixMgr::compound_check_morph(const char* word, TESTAFF(rv->astr, compoundbegin, rv->alen)) || (compoundmiddle && wordnum && !words && !onlycpdrule && TESTAFF(rv->astr, compoundmiddle, rv->alen)) || - (numdefcpd && onlycpdrule && + (!defcpdtable.empty() && onlycpdrule && ((!words && !wordnum && defcpd_check(&words, wnum, rv, rwords, 0)) || (words && @@ -2357,28 +2219,26 @@ int AffixMgr::compound_check_morph(const char* word, affixed = 0; if (rv) { - sprintf(presult + strlen(presult), "%c%s%s", MSEP_FLD, MORPH_PART, st.c_str()); + presult.push_back(MSEP_FLD); + presult.append(MORPH_PART); + presult.append(st.c_str()); if (!HENTRY_FIND(rv, MORPH_STEM)) { - sprintf(presult + strlen(presult), "%c%s%s", MSEP_FLD, MORPH_STEM, - st.c_str()); + presult.push_back(MSEP_FLD); + presult.append(MORPH_STEM); + presult.append(st.c_str()); } - // store the pointer of the hash entry - // sprintf(presult + strlen(presult), "%c%s%p", MSEP_FLD, - // MORPH_HENTRY, rv); if (HENTRY_DATA(rv)) { - sprintf(presult + strlen(presult), "%c%s", MSEP_FLD, - HENTRY_DATA2(rv)); + presult.push_back(MSEP_FLD); + presult.append(HENTRY_DATA2(rv)); } } if (!rv) { - if (onlycpdrule && strlen(*result) > MAXLNLEN / 10) - break; if (compoundflag && !(rv = prefix_check(st.c_str(), i, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN, compoundflag))) { - if (((rv = suffix_check(st.c_str(), i, 0, NULL, NULL, 0, NULL, FLAG_NULL, + if (((rv = suffix_check(st.c_str(), i, 0, NULL, FLAG_NULL, compoundflag, hu_mov_rule ? 
IN_CPD_OTHER : IN_CPD_BEGIN)) || (compoundmoresuffixes && @@ -2395,7 +2255,7 @@ int AffixMgr::compound_check_morph(const char* word, if (rv || (((wordnum == 0) && compoundbegin && - ((rv = suffix_check(st.c_str(), i, 0, NULL, NULL, 0, NULL, FLAG_NULL, + ((rv = suffix_check(st.c_str(), i, 0, NULL, FLAG_NULL, compoundbegin, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN)) || (compoundmoresuffixes && @@ -2406,7 +2266,7 @@ int AffixMgr::compound_check_morph(const char* word, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN, compoundbegin)))) || ((wordnum > 0) && compoundmiddle && - ((rv = suffix_check(st.c_str(), i, 0, NULL, NULL, 0, NULL, FLAG_NULL, + ((rv = suffix_check(st.c_str(), i, 0, NULL, FLAG_NULL, compoundmiddle, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN)) || (compoundmoresuffixes && @@ -2416,26 +2276,23 @@ int AffixMgr::compound_check_morph(const char* word, (rv = prefix_check(st.c_str(), i, hu_mov_rule ? IN_CPD_OTHER : IN_CPD_BEGIN, compoundmiddle)))))) { - // char * p = prefix_check_morph(st, i, 0, compound); - char* p = NULL; + std::string p; if (compoundflag) p = affix_check_morph(st.c_str(), i, compoundflag); - if (!p || (*p == '\0')) { - if (p) - free(p); - p = NULL; + if (p.empty()) { if ((wordnum == 0) && compoundbegin) { p = affix_check_morph(st.c_str(), i, compoundbegin); } else if ((wordnum > 0) && compoundmiddle) { p = affix_check_morph(st.c_str(), i, compoundmiddle); } } - if (p && (*p != '\0')) { - sprintf(presult + strlen(presult), "%c%s%s%s", MSEP_FLD, MORPH_PART, - st.c_str(), line_uniq_app(&p, MSEP_REC)); + if (!p.empty()) { + presult.push_back(MSEP_FLD); + presult.append(MORPH_PART); + presult.append(st.c_str()); + line_uniq_app(p, MSEP_REC); + presult.append(p); } - if (p) - free(p); checked_prefix = 1; } // else check forbiddenwords @@ -2507,7 +2364,7 @@ int AffixMgr::compound_check_morph(const char* word, )) || ( // test CHECKCOMPOUNDPATTERN - numcheckcpd && !words && + !checkcpdtable.empty() && !words && cpdpat_check(word, i, rv, NULL, affixed)) || 
(checkcompoundcase && !words && cpdcase_check(word, i)))) // LANG_hu section: spec. Hungarian rule @@ -2522,7 +2379,7 @@ int AffixMgr::compound_check_morph(const char* word, // LANG_hu section: spec. Hungarian rule if (langnum == LANG_hu) { // calculate syllable number of the word - numsyllable += get_syllable(st.substr(i)); + numsyllable += get_syllable(st.substr(0, i)); // + 1 word, if syllable number of the prefix > 1 (hungarian // convention) @@ -2541,31 +2398,29 @@ int AffixMgr::compound_check_morph(const char* word, TESTAFF(rv->astr, compoundflag, rv->alen)) || (compoundend && !words && TESTAFF(rv->astr, compoundend, rv->alen)) || - (numdefcpd && words && + (!defcpdtable.empty() && words && defcpd_check(&words, wnum + 1, rv, NULL, 1))))) { rv = rv->next_homonym; } if (rv && words && words[wnum + 1]) { - mystrcat(*result, presult, MAXLNLEN); - mystrcat(*result, " ", MAXLNLEN); - mystrcat(*result, MORPH_PART, MAXLNLEN); - mystrcat(*result, word + i, MAXLNLEN); + result.append(presult); + result.append(" "); + result.append(MORPH_PART); + result.append(word + i); if (complexprefixes && HENTRY_DATA(rv)) - mystrcat(*result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(HENTRY_DATA2(rv)); if (!HENTRY_FIND(rv, MORPH_STEM)) { - mystrcat(*result, " ", MAXLNLEN); - mystrcat(*result, MORPH_STEM, MAXLNLEN); - mystrcat(*result, HENTRY_WORD(rv), MAXLNLEN); + result.append(" "); + result.append(MORPH_STEM); + result.append(HENTRY_WORD(rv)); } // store the pointer of the hash entry - // sprintf(*result + strlen(*result), " %s%p", - // MORPH_HENTRY, rv); if (!complexprefixes && HENTRY_DATA(rv)) { - mystrcat(*result, " ", MAXLNLEN); - mystrcat(*result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(" "); + result.append(HENTRY_DATA2(rv)); } - mystrcat(*result, "\n", MAXLNLEN); + result.append("\n"); return 0; } @@ -2606,28 +2461,26 @@ int AffixMgr::compound_check_morph(const char* word, cpdmaxsyllable))) && ((!checkcompounddup || (rv != rv_first)))) { // bad compound word - 
mystrcat(*result, presult, MAXLNLEN); - mystrcat(*result, " ", MAXLNLEN); - mystrcat(*result, MORPH_PART, MAXLNLEN); - mystrcat(*result, word + i, MAXLNLEN); + result.append(presult); + result.append(" "); + result.append(MORPH_PART); + result.append(word + i); if (HENTRY_DATA(rv)) { if (complexprefixes) - mystrcat(*result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(HENTRY_DATA2(rv)); if (!HENTRY_FIND(rv, MORPH_STEM)) { - mystrcat(*result, " ", MAXLNLEN); - mystrcat(*result, MORPH_STEM, MAXLNLEN); - mystrcat(*result, HENTRY_WORD(rv), MAXLNLEN); + result.append(" "); + result.append(MORPH_STEM); + result.append(HENTRY_WORD(rv)); } // store the pointer of the hash entry - // sprintf(*result + strlen(*result), " - // %s%p", MORPH_HENTRY, rv); if (!complexprefixes) { - mystrcat(*result, " ", MAXLNLEN); - mystrcat(*result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(" "); + result.append(HENTRY_DATA2(rv)); } } - mystrcat(*result, "\n", MAXLNLEN); + result.append("\n"); ok = 1; } @@ -2649,27 +2502,24 @@ int AffixMgr::compound_check_morph(const char* word, rv = affix_check((word + i), strlen(word + i), compoundend); } - if (!rv && numdefcpd && words) { + if (!rv && !defcpdtable.empty() && words) { rv = affix_check((word + i), strlen(word + i), 0, IN_CPD_END); if (rv && words && defcpd_check(&words, wnum + 1, rv, NULL, 1)) { - char* m = NULL; + std::string m; if (compoundflag) m = affix_check_morph((word + i), strlen(word + i), compoundflag); - if ((!m || *m == '\0') && compoundend) { - if (m) - free(m); + if (m.empty() && compoundend) { m = affix_check_morph((word + i), strlen(word + i), compoundend); } - mystrcat(*result, presult, MAXLNLEN); - if (m || (*m != '\0')) { - char m2[MAXLNLEN]; - sprintf(m2, "%c%s%s%s", MSEP_FLD, MORPH_PART, word + i, - line_uniq_app(&m, MSEP_REC)); - mystrcat(*result, m2, MAXLNLEN); + result.append(presult); + if (!m.empty()) { + result.push_back(MSEP_FLD); + result.append(MORPH_PART); + result.append(word + i); + line_uniq_app(m, 
MSEP_REC); + result.append(m); } - if (m) - free(m); - mystrcat(*result, "\n", MAXLNLEN); + result.append("\n"); ok = 1; } } @@ -2713,7 +2563,7 @@ int AffixMgr::compound_check_morph(const char* word, // increment syllable num, if last word has a SYLLABLENUM flag // and the suffix is beginning `s' - if (cpdsyllablenum) { + if (!cpdsyllablenum.empty()) { switch (sfxflag) { case 'c': { numsyllable += 2; @@ -2745,25 +2595,21 @@ int AffixMgr::compound_check_morph(const char* word, (((cpdwordmax == -1) || (wordnum + 1 < cpdwordmax)) || ((cpdmaxsyllable != 0) && (numsyllable <= cpdmaxsyllable))) && ((!checkcompounddup || (rv != rv_first)))) { - char* m = NULL; + std::string m; if (compoundflag) m = affix_check_morph((word + i), strlen(word + i), compoundflag); - if ((!m || *m == '\0') && compoundend) { - if (m) - free(m); + if (m.empty() && compoundend) { m = affix_check_morph((word + i), strlen(word + i), compoundend); } - mystrcat(*result, presult, MAXLNLEN); - if (m && (*m != '\0')) { - char m2[MAXLNLEN]; - sprintf(m2, "%c%s%s%s", MSEP_FLD, MORPH_PART, word + i, - line_uniq_app(&m, MSEP_REC)); - mystrcat(*result, m2, MAXLNLEN); + result.append(presult); + if (!m.empty()) { + result.push_back(MSEP_FLD); + result.append(MORPH_PART); + result.append(word + 1); + line_uniq_app(m, MSEP_REC); + result.append(m); } - if (m) - free(m); - if (strlen(*result) + 1 < MAXLNLEN) - sprintf(*result + strlen(*result), "%c", MSEP_REC); + result.push_back(MSEP_REC); ok = 1; } @@ -2771,10 +2617,10 @@ int AffixMgr::compound_check_morph(const char* word, wordnum = oldwordnum2; // perhaps second word is a compound word (recursive call) - if ((wordnum < maxwordnum) && (ok == 0)) { + if ((wordnum + 2 < maxwordnum) && (ok == 0)) { compound_check_morph((word + i), strlen(word + i), wordnum + 1, numsyllable, maxwordnum, wnum + 1, words, rwords, 0, - result, presult); + result, &presult); } else { rv = NULL; } @@ -2783,26 +2629,13 @@ int AffixMgr::compound_check_morph(const char* word, wordnum = 
oldwordnum; numsyllable = oldnumsyllable; - } while (numdefcpd && oldwordnum == 0 && + } while (!defcpdtable.empty() && oldwordnum == 0 && onlycpdrule++ < 1); // end of onlycpd loop } return 0; } -// return 1 if s1 (reversed) is a leading subset of end of s2 -/* inline int AffixMgr::isRevSubset(const char * s1, const char * end_of_s2, int - len) - { - while ((len > 0) && *s1 && (*s1 == *end_of_s2)) { - s1++; - end_of_s2--; - len--; - } - return (*s1 == '\0'); - } - */ - inline int AffixMgr::isRevSubset(const char* s1, const char* end_of_s2, int len) { @@ -2815,14 +2648,10 @@ inline int AffixMgr::isRevSubset(const char* s1, } // check word for suffixes - struct hentry* AffixMgr::suffix_check(const char* word, int len, int sfxopts, PfxEntry* ppfx, - char** wlst, - int maxSug, - int* ns, const FLAG cclass, const FLAG needflag, char in_compound) { @@ -2861,7 +2690,7 @@ struct hentry* AffixMgr::suffix_check(const char* word, (ppfx && !((ep->getCont()) && TESTAFF(ep->getCont(), needaffix, ep->getContLen()))))) { - rv = se->checkword(word, len, sfxopts, ppfx, wlst, maxSug, ns, + rv = se->checkword(word, len, sfxopts, ppfx, (FLAG)cclass, needflag, (in_compound ? 0 : onlyincompound)); if (rv) { @@ -2912,7 +2741,7 @@ struct hentry* AffixMgr::suffix_check(const char* word, if (in_compound != IN_CPD_END || ppfx || !(sptr->getCont() && TESTAFF(sptr->getCont(), onlyincompound, sptr->getContLen()))) { - rv = sptr->checkword(word, len, sfxopts, ppfx, wlst, maxSug, ns, + rv = sptr->checkword(word, len, sfxopts, ppfx, cclass, needflag, (in_compound ? 
0 : onlyincompound)); if (rv) { @@ -2985,23 +2814,21 @@ struct hentry* AffixMgr::suffix_check_twosfx(const char* word, return NULL; } -char* AffixMgr::suffix_check_twosfx_morph(const char* word, - int len, - int sfxopts, - PfxEntry* ppfx, - const FLAG needflag) { +std::string AffixMgr::suffix_check_twosfx_morph(const char* word, + int len, + int sfxopts, + PfxEntry* ppfx, + const FLAG needflag) { std::string result; std::string result2; std::string result3; - char* st; - // first handle the special case of 0 length suffixes SfxEntry* se = sStart[0]; while (se) { if (contclasses[se->getFlag()]) { - st = se->check_twosfx_morph(word, len, sfxopts, ppfx, needflag); - if (st) { + std::string st = se->check_twosfx_morph(word, len, sfxopts, ppfx, needflag); + if (!st.empty()) { if (ppfx) { if (ppfx->getMorph()) { result.append(ppfx->getMorph()); @@ -3010,7 +2837,6 @@ char* AffixMgr::suffix_check_twosfx_morph(const char* word, debugflag(result, ppfx->getFlag()); } result.append(st); - free(st); if (se->getMorph()) { result.append(" "); result.append(se->getMorph()); @@ -3024,20 +2850,19 @@ char* AffixMgr::suffix_check_twosfx_morph(const char* word, // now handle the general case if (len == 0) - return NULL; // FULLSTRIP + return std::string(); // FULLSTRIP unsigned char sp = *((const unsigned char*)(word + len - 1)); SfxEntry* sptr = sStart[sp]; while (sptr) { if (isRevSubset(sptr->getKey(), word + len - 1, len)) { if (contclasses[sptr->getFlag()]) { - st = sptr->check_twosfx_morph(word, len, sfxopts, ppfx, needflag); - if (st) { + std::string st = sptr->check_twosfx_morph(word, len, sfxopts, ppfx, needflag); + if (!st.empty()) { sfxflag = sptr->getFlag(); // BUG: sfxflag not stateless if (!sptr->getCont()) sfxappnd = sptr->getKey(); // BUG: sfxappnd not stateless result2.assign(st); - free(st); result3.clear(); @@ -3057,25 +2882,20 @@ char* AffixMgr::suffix_check_twosfx_morph(const char* word, } } - if (!result.empty()) - return mystrdup(result.c_str()); - - return NULL; 
+ return result; } -char* AffixMgr::suffix_check_morph(const char* word, - int len, - int sfxopts, - PfxEntry* ppfx, - const FLAG cclass, - const FLAG needflag, - char in_compound) { - char result[MAXLNLEN]; +std::string AffixMgr::suffix_check_morph(const char* word, + int len, + int sfxopts, + PfxEntry* ppfx, + const FLAG cclass, + const FLAG needflag, + char in_compound) { + std::string result; struct hentry* rv = NULL; - result[0] = '\0'; - PfxEntry* ep = ppfx; // first handle the special case of 0 length suffixes @@ -3109,37 +2929,34 @@ char* AffixMgr::suffix_check_morph(const char* word, (ppfx && !((ep->getCont()) && TESTAFF(ep->getCont(), needaffix, ep->getContLen())))))) - rv = se->checkword(word, len, sfxopts, ppfx, NULL, 0, 0, cclass, - needflag); + rv = se->checkword(word, len, sfxopts, ppfx, cclass, + needflag, FLAG_NULL); while (rv) { if (ppfx) { if (ppfx->getMorph()) { - mystrcat(result, ppfx->getMorph(), MAXLNLEN); - mystrcat(result, " ", MAXLNLEN); + result.append(ppfx->getMorph()); + result.append(" "); } else debugflag(result, ppfx->getFlag()); } if (complexprefixes && HENTRY_DATA(rv)) - mystrcat(result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(HENTRY_DATA2(rv)); if (!HENTRY_FIND(rv, MORPH_STEM)) { - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, MORPH_STEM, MAXLNLEN); - mystrcat(result, HENTRY_WORD(rv), MAXLNLEN); + result.append(" "); + result.append(MORPH_STEM); + result.append(HENTRY_WORD(rv)); } - // store the pointer of the hash entry - // sprintf(result + strlen(result), " %s%p", MORPH_HENTRY, - // rv); if (!complexprefixes && HENTRY_DATA(rv)) { - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(" "); + result.append(HENTRY_DATA2(rv)); } if (se->getMorph()) { - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, se->getMorph(), MAXLNLEN); + result.append(" "); + result.append(se->getMorph()); } else debugflag(result, se->getFlag()); - mystrcat(result, "\n", MAXLNLEN); + 
result.append("\n"); rv = se->get_next_homonym(rv, sfxopts, ppfx, cclass, needflag); } } @@ -3148,7 +2965,7 @@ char* AffixMgr::suffix_check_morph(const char* word, // now handle the general case if (len == 0) - return NULL; // FULLSTRIP + return std::string(); // FULLSTRIP unsigned char sp = *((const unsigned char*)(word + len - 1)); SfxEntry* sptr = sStart[sp]; @@ -3179,38 +2996,35 @@ char* AffixMgr::suffix_check_morph(const char* word, (cclass || !(sptr->getCont() && TESTAFF(sptr->getCont(), needaffix, sptr->getContLen()))))) - rv = sptr->checkword(word, len, sfxopts, ppfx, NULL, 0, 0, cclass, - needflag); + rv = sptr->checkword(word, len, sfxopts, ppfx, cclass, + needflag, FLAG_NULL); while (rv) { if (ppfx) { if (ppfx->getMorph()) { - mystrcat(result, ppfx->getMorph(), MAXLNLEN); - mystrcat(result, " ", MAXLNLEN); + result.append(ppfx->getMorph()); + result.append(" "); } else debugflag(result, ppfx->getFlag()); } if (complexprefixes && HENTRY_DATA(rv)) - mystrcat(result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(HENTRY_DATA2(rv)); if (!HENTRY_FIND(rv, MORPH_STEM)) { - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, MORPH_STEM, MAXLNLEN); - mystrcat(result, HENTRY_WORD(rv), MAXLNLEN); + result.append(" "); + result.append(MORPH_STEM); + result.append(HENTRY_WORD(rv)); } - // store the pointer of the hash entry - // sprintf(result + strlen(result), " %s%p", - // MORPH_HENTRY, rv); if (!complexprefixes && HENTRY_DATA(rv)) { - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(" "); + result.append(HENTRY_DATA2(rv)); } if (sptr->getMorph()) { - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, sptr->getMorph(), MAXLNLEN); + result.append(" "); + result.append(sptr->getMorph()); } else debugflag(result, sptr->getFlag()); - mystrcat(result, "\n", MAXLNLEN); + result.append("\n"); rv = sptr->get_next_homonym(rv, sfxopts, ppfx, cclass, needflag); } sptr = sptr->getNextEQ(); @@ -3219,9 +3033,7 @@ char* 
AffixMgr::suffix_check_morph(const char* word, } } - if (*result) - return mystrdup(result); - return NULL; + return result; } // check if word with affixes is correctly spelled @@ -3229,16 +3041,14 @@ struct hentry* AffixMgr::affix_check(const char* word, int len, const FLAG needflag, char in_compound) { - struct hentry* rv = NULL; // check all prefixes (also crossed with suffixes if allowed) - rv = prefix_check(word, len, in_compound, needflag); + struct hentry* rv = prefix_check(word, len, in_compound, needflag); if (rv) return rv; // if still not found check all suffixes - rv = suffix_check(word, len, 0, NULL, NULL, 0, NULL, FLAG_NULL, needflag, - in_compound); + rv = suffix_check(word, len, 0, NULL, FLAG_NULL, needflag, in_compound); if (havecontclass) { sfx = NULL; @@ -3259,27 +3069,22 @@ struct hentry* AffixMgr::affix_check(const char* word, } // check if word with affixes is correctly spelled -char* AffixMgr::affix_check_morph(const char* word, +std::string AffixMgr::affix_check_morph(const char* word, int len, const FLAG needflag, char in_compound) { - char result[MAXLNLEN]; - char* st = NULL; - - *result = '\0'; + std::string result; // check all prefixes (also crossed with suffixes if allowed) - st = prefix_check_morph(word, len, in_compound); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + std::string st = prefix_check_morph(word, len, in_compound); + if (!st.empty()) { + result.append(st); } // if still not found check all suffixes st = suffix_check_morph(word, len, 0, NULL, '\0', needflag, in_compound); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + if (!st.empty()) { + result.append(st); } if (havecontclass) { @@ -3287,39 +3092,120 @@ char* AffixMgr::affix_check_morph(const char* word, pfx = NULL; // if still not found check all two-level suffixes st = suffix_check_twosfx_morph(word, len, 0, NULL, needflag); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + if (!st.empty()) { + result.append(st); } // if still not 
found check all two-level suffixes st = prefix_check_twosfx_morph(word, len, IN_CPD_NOT, needflag); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + if (!st.empty()) { + result.append(st); } } - return mystrdup(result); + return result; +} + +// morphcmp(): compare MORPH_DERI_SFX, MORPH_INFL_SFX and MORPH_TERM_SFX fields +// in the first line of the inputs +// return 0, if inputs equal +// return 1, if inputs may equal with a secondary suffix +// otherwise return -1 +static int morphcmp(const char* s, const char* t) { + int se = 0; + int te = 0; + const char* sl; + const char* tl; + const char* olds; + const char* oldt; + if (!s || !t) + return 1; + olds = s; + sl = strchr(s, '\n'); + s = strstr(s, MORPH_DERI_SFX); + if (!s || (sl && sl < s)) + s = strstr(olds, MORPH_INFL_SFX); + if (!s || (sl && sl < s)) { + s = strstr(olds, MORPH_TERM_SFX); + olds = NULL; + } + oldt = t; + tl = strchr(t, '\n'); + t = strstr(t, MORPH_DERI_SFX); + if (!t || (tl && tl < t)) + t = strstr(oldt, MORPH_INFL_SFX); + if (!t || (tl && tl < t)) { + t = strstr(oldt, MORPH_TERM_SFX); + oldt = NULL; + } + while (s && t && (!sl || sl > s) && (!tl || tl > t)) { + s += MORPH_TAG_LEN; + t += MORPH_TAG_LEN; + se = 0; + te = 0; + while ((*s == *t) && !se && !te) { + s++; + t++; + switch (*s) { + case ' ': + case '\n': + case '\t': + case '\0': + se = 1; + } + switch (*t) { + case ' ': + case '\n': + case '\t': + case '\0': + te = 1; + } + } + if (!se || !te) { + // not terminal suffix difference + if (olds) + return -1; + return 1; + } + olds = s; + s = strstr(s, MORPH_DERI_SFX); + if (!s || (sl && sl < s)) + s = strstr(olds, MORPH_INFL_SFX); + if (!s || (sl && sl < s)) { + s = strstr(olds, MORPH_TERM_SFX); + olds = NULL; + } + oldt = t; + t = strstr(t, MORPH_DERI_SFX); + if (!t || (tl && tl < t)) + t = strstr(oldt, MORPH_INFL_SFX); + if (!t || (tl && tl < t)) { + t = strstr(oldt, MORPH_TERM_SFX); + oldt = NULL; + } + } + if (!s && !t && se && te) + return 0; + return 1; } -char* 
AffixMgr::morphgen(const char* ts, - int wl, - const unsigned short* ap, - unsigned short al, - const char* morph, - const char* targetmorph, +std::string AffixMgr::morphgen(const char* ts, + int wl, + const unsigned short* ap, + unsigned short al, + const char* morph, + const char* targetmorph, int level) { // handle suffixes if (!morph) - return NULL; + return std::string(); // check substandard flag if (TESTAFF(ap, substandard, al)) - return NULL; + return std::string(); if (morphcmp(morph, targetmorph) == 0) - return mystrdup(ts); + return ts; size_t stemmorphcatpos; std::string mymorph; @@ -3352,41 +3238,36 @@ char* AffixMgr::morphgen(const char* ts, int cmp = morphcmp(stemmorph, targetmorph); if (cmp == 0) { - char* newword = sptr->add(ts, wl); - if (newword) { - hentry* check = pHMgr->lookup(newword); // XXX extra dic + std::string newword = sptr->add(ts, wl); + if (!newword.empty()) { + hentry* check = pHMgr->lookup(newword.c_str()); // XXX extra dic if (!check || !check->astr || !(TESTAFF(check->astr, forbiddenword, check->alen) || TESTAFF(check->astr, ONLYUPCASEFLAG, check->alen))) { return newword; } - free(newword); } } // recursive call for secondary suffixes if ((level == 0) && (cmp == 1) && (sptr->getContLen() > 0) && - // (get_sfxcount(stemmorph) < targetcount) && !TESTAFF(sptr->getCont(), substandard, sptr->getContLen())) { - char* newword = sptr->add(ts, wl); - if (newword) { - char* newword2 = - morphgen(newword, strlen(newword), sptr->getCont(), + std::string newword = sptr->add(ts, wl); + if (!newword.empty()) { + std::string newword2 = + morphgen(newword.c_str(), newword.size(), sptr->getCont(), sptr->getContLen(), stemmorph, targetmorph, 1); - if (newword2) { - free(newword); + if (!newword2.empty()) { return newword2; } - free(newword); - newword = NULL; } } } sptr = sptr->getFlgNxt(); } } - return NULL; + return std::string(); } int AffixMgr::expand_rootword(struct guessword* wlst, @@ -3406,7 +3287,7 @@ int AffixMgr::expand_rootword(struct 
guessword* wlst, wlst[nh].word = mystrdup(ts); if (!wlst[nh].word) return 0; - wlst[nh].allow = (1 == 0); + wlst[nh].allow = false; wlst[nh].orig = NULL; nh++; // add special phonetic version @@ -3414,7 +3295,7 @@ int AffixMgr::expand_rootword(struct guessword* wlst, wlst[nh].word = mystrdup(phon); if (!wlst[nh].word) return nh - 1; - wlst[nh].allow = (1 == 0); + wlst[nh].allow = false; wlst[nh].orig = mystrdup(ts); if (!wlst[nh].orig) return nh - 1; @@ -3439,10 +3320,10 @@ int AffixMgr::expand_rootword(struct guessword* wlst, TESTAFF(sptr->getCont(), circumfix, sptr->getContLen())) || (onlyincompound && TESTAFF(sptr->getCont(), onlyincompound, sptr->getContLen()))))) { - char* newword = sptr->add(ts, wl); - if (newword) { + std::string newword = sptr->add(ts, wl); + if (!newword.empty()) { if (nh < maxn) { - wlst[nh].word = newword; + wlst[nh].word = mystrdup(newword.c_str()); wlst[nh].allow = sptr->allowCross(); wlst[nh].orig = NULL; nh++; @@ -3455,14 +3336,12 @@ int AffixMgr::expand_rootword(struct guessword* wlst, wlst[nh].word = mystrdup(prefix.c_str()); if (!wlst[nh].word) return nh - 1; - wlst[nh].allow = (1 == 0); - wlst[nh].orig = mystrdup(newword); + wlst[nh].allow = false; + wlst[nh].orig = mystrdup(newword.c_str()); if (!wlst[nh].orig) return nh - 1; nh++; } - } else { - free(newword); } } } @@ -3484,15 +3363,13 @@ int AffixMgr::expand_rootword(struct guessword* wlst, ((badl > cptr->getKeyLen()) && (strncmp(cptr->getKey(), bad, cptr->getKeyLen()) == 0)))) { int l1 = strlen(wlst[j].word); - char* newword = cptr->add(wlst[j].word, l1); - if (newword) { + std::string newword = cptr->add(wlst[j].word, l1); + if (!newword.empty()) { if (nh < maxn) { - wlst[nh].word = newword; + wlst[nh].word = mystrdup(newword.c_str()); wlst[nh].allow = cptr->allowCross(); wlst[nh].orig = NULL; nh++; - } else { - free(newword); } } } @@ -3518,15 +3395,13 @@ int AffixMgr::expand_rootword(struct guessword* wlst, TESTAFF(ptr->getCont(), circumfix, ptr->getContLen())) || 
(onlyincompound && TESTAFF(ptr->getCont(), onlyincompound, ptr->getContLen()))))) { - char* newword = ptr->add(ts, wl); - if (newword) { + std::string newword = ptr->add(ts, wl); + if (!newword.empty()) { if (nh < maxn) { - wlst[nh].word = newword; + wlst[nh].word = mystrdup(newword.c_str()); wlst[nh].allow = ptr->allowCross(); wlst[nh].orig = NULL; nh++; - } else { - free(newword); } } } @@ -3537,15 +3412,8 @@ int AffixMgr::expand_rootword(struct guessword* wlst, return nh; } -// return length of replacing table -int AffixMgr::get_numrep() const { - return numrep; -} - // return replacing table -struct replentry* AffixMgr::get_reptable() const { - if (!reptable) - return NULL; +const std::vector<replentry>& AffixMgr::get_reptable() const { return reptable; } @@ -3570,35 +3438,21 @@ struct phonetable* AffixMgr::get_phonetable() const { return phone; } -// return length of character map table -int AffixMgr::get_nummap() const { - return nummap; -} - // return character map table -struct mapentry* AffixMgr::get_maptable() const { - if (!maptable) - return NULL; +const std::vector<mapentry>& AffixMgr::get_maptable() const { return maptable; } -// return length of word break table -int AffixMgr::get_numbreak() const { - return numbreak; -} - // return character map table -char** AffixMgr::get_breaktable() const { - if (!breaktable) - return NULL; +const std::vector<std::string>& AffixMgr::get_breaktable() const { return breaktable; } // return text encoding of dictionary -char* AffixMgr::get_encoding() { - if (!encoding) - encoding = mystrdup(SPELL_ENCODING); - return mystrdup(encoding); +const std::string& AffixMgr::get_encoding() { + if (encoding.empty()) + encoding = SPELL_ENCODING; + return encoding; } // return text encoding of dictionary @@ -3641,10 +3495,10 @@ char* AffixMgr::encode_flag(unsigned short aflag) const { } // return the preferred ignore string for suggestions -char* AffixMgr::get_ignore() const { - if (!ignorechars) +const char* 
AffixMgr::get_ignore() const { + if (ignorechars.empty()) return NULL; - return ignorechars; + return ignorechars.c_str(); } // return the preferred ignore string for suggestions @@ -3654,20 +3508,20 @@ const std::vector<w_char>& AffixMgr::get_ignore_utf16() const { // return the keyboard string for suggestions char* AffixMgr::get_key_string() { - if (!keystring) - keystring = mystrdup(SPELL_KEYSTRING); - return mystrdup(keystring); + if (keystring.empty()) + keystring = SPELL_KEYSTRING; + return mystrdup(keystring.c_str()); } // return the preferred try string for suggestions char* AffixMgr::get_try_string() const { - if (!trystring) + if (trystring.empty()) return NULL; - return mystrdup(trystring); + return mystrdup(trystring.c_str()); } // return the preferred try string for suggestions -const char* AffixMgr::get_wordchars() const { +const std::string& AffixMgr::get_wordchars() const { return wordchars; } @@ -3677,7 +3531,7 @@ const std::vector<w_char>& AffixMgr::get_wordchars_utf16() const { // is there compounding? 
int AffixMgr::get_compound() const { - return compoundflag || compoundbegin || numdefcpd; + return compoundflag || compoundbegin || !defcpdtable.empty(); } // return the compound words control flag @@ -3710,49 +3564,16 @@ FLAG AffixMgr::get_onlyincompound() const { return onlyincompound; } -// return the compound word signal flag -FLAG AffixMgr::get_compoundroot() const { - return compoundroot; -} - -// return the compound begin signal flag -FLAG AffixMgr::get_compoundbegin() const { - return compoundbegin; -} - -// return the value of checknum -int AffixMgr::get_checknum() const { - return checknum; -} - -// return the value of prefix -const char* AffixMgr::get_prefix() const { - if (pfx) - return pfx->getKey(); - return NULL; -} - // return the value of suffix -const char* AffixMgr::get_suffix() const { - return sfxappnd; -} - -// return the value of suffix -const char* AffixMgr::get_version() const { +const std::string& AffixMgr::get_version() const { return version; } -// return lemma_present flag -FLAG AffixMgr::get_lemma_present() const { - return lemma_present; -} - // utility method to look up root words in hash table struct hentry* AffixMgr::lookup(const char* word) { - int i; struct hentry* he = NULL; - for (i = 0; i < *maxdic && !he; i++) { - he = (alldic[i])->lookup(word); + for (size_t i = 0; i < alldic.size() && !he; ++i) { + he = alldic[i]->lookup(word); } return he; } @@ -3794,839 +3615,751 @@ int AffixMgr::get_sugswithdots(void) const { } /* parse flag */ -int AffixMgr::parse_flag(char* line, unsigned short* out, FileMgr* af) { - char* s = NULL; +bool AffixMgr::parse_flag(const std::string& line, unsigned short* out, FileMgr* af) { if (*out != FLAG_NULL && !(*out >= DEFAULTFLAGS)) { HUNSPELL_WARNING( stderr, "error: line %d: multiple definitions of an affix file parameter\n", af->getlinenum()); - return 1; + return false; } - if (parse_string(line, &s, af->getlinenum())) - return 1; - *out = pHMgr->decode_flag(s); - free(s); - return 0; + 
std::string s; + if (!parse_string(line, s, af->getlinenum())) + return false; + *out = pHMgr->decode_flag(s.c_str()); + return true; } /* parse num */ -int AffixMgr::parse_num(char* line, int* out, FileMgr* af) { - char* s = NULL; +bool AffixMgr::parse_num(const std::string& line, int* out, FileMgr* af) { if (*out != -1) { HUNSPELL_WARNING( stderr, "error: line %d: multiple definitions of an affix file parameter\n", af->getlinenum()); - return 1; + return false; } - if (parse_string(line, &s, af->getlinenum())) - return 1; - *out = atoi(s); - free(s); - return 0; + std::string s; + if (!parse_string(line, s, af->getlinenum())) + return false; + *out = atoi(s.c_str()); + return true; } /* parse in the max syllablecount of compound words and */ -int AffixMgr::parse_cpdsyllable(char* line, FileMgr* af) { - char* tp = line; - char* piece; +bool AffixMgr::parse_cpdsyllable(const std::string& line, FileMgr* af) { int i = 0; int np = 0; - piece = mystrsep(&tp, 0); - while (piece) { - if (*piece != '\0') { - switch (i) { - case 0: { - np++; - break; - } - case 1: { - cpdmaxsyllable = atoi(piece); - np++; - break; - } - case 2: { - if (!utf8) { - cpdvowels = mystrdup(piece); - } else { - std::vector<w_char> w; - u8_u16(w, piece); - if (!w.empty()) { - std::sort(w.begin(), w.end()); - cpdvowels_utf16 = (w_char*)malloc(w.size() * sizeof(w_char)); - if (!cpdvowels_utf16) - return 1; - memcpy(cpdvowels_utf16, &w[0], w.size()); - } - cpdvowels_utf16_len = w.size(); - } - np++; - break; + std::string::const_iterator iter = line.begin(); + std::string::const_iterator start_piece = mystrsep(line, iter); + while (start_piece != line.end()) { + switch (i) { + case 0: { + np++; + break; + } + case 1: { + cpdmaxsyllable = atoi(std::string(start_piece, iter).c_str()); + np++; + break; + } + case 2: { + if (!utf8) { + cpdvowels.assign(start_piece, iter); + std::sort(cpdvowels.begin(), cpdvowels.end()); + } else { + std::string piece(start_piece, iter); + u8_u16(cpdvowels_utf16, piece); 
+          std::sort(cpdvowels_utf16.begin(), cpdvowels_utf16.end());
         }
-        default:
-          break;
+        np++;
+        break;
       }
-      i++;
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   if (np < 2) {
     HUNSPELL_WARNING(stderr,
                      "error: line %d: missing compoundsyllable information\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
   if (np == 2)
-    cpdvowels = mystrdup("aeiouAEIOU");
-  return 0;
+    cpdvowels = "AEIOUaeiou";
+  return true;
 }
 /* parse in the typical fault correcting table */
-int AffixMgr::parse_reptable(char* line, FileMgr* af) {
-  if (numrep != 0) {
+bool AffixMgr::parse_reptable(const std::string& line, FileMgr* af) {
+  if (parsedrep) {
     HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
-  char* tp = line;
-  char* piece;
+  parsedrep = true;
+  int numrep = -1;
   int i = 0;
   int np = 0;
-  piece = mystrsep(&tp, 0);
-  while (piece) {
-    if (*piece != '\0') {
-      switch (i) {
-        case 0: {
-          np++;
-          break;
-        }
-        case 1: {
-          numrep = atoi(piece);
-          if (numrep < 1) {
-            HUNSPELL_WARNING(stderr, "error: line %d: incorrect entry number\n",
-                             af->getlinenum());
-            return 1;
-          }
-          reptable = (replentry*)malloc(numrep * sizeof(struct replentry));
-          if (!reptable)
-            return 1;
-          np++;
-          break;
+  std::string::const_iterator iter = line.begin();
+  std::string::const_iterator start_piece = mystrsep(line, iter);
+  while (start_piece != line.end()) {
+    switch (i) {
+      case 0: {
+        np++;
+        break;
+      }
+      case 1: {
+        numrep = atoi(std::string(start_piece, iter).c_str());
+        if (numrep < 1) {
+          HUNSPELL_WARNING(stderr, "error: line %d: incorrect entry number\n",
+                           af->getlinenum());
+          return false;
         }
-        default:
-          break;
+        reptable.reserve(numrep);
+        np++;
+        break;
       }
-      i++;
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   if (np != 2) {
     HUNSPELL_WARNING(stderr, "error: line %d: missing data\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
   /* now parse the numrep lines to read in the remainder of the table */
-  char* nl;
-  for (int j = 0; j < numrep; j++) {
-    if ((nl = af->getline()) == NULL)
-      return 1;
+  for (int j = 0; j < numrep; ++j) {
+    std::string nl;
+    if (!af->getline(nl))
+      return false;
     mychomp(nl);
-    tp = nl;
+    reptable.push_back(replentry());
+    iter = nl.begin();
     i = 0;
-    reptable[j].pattern = NULL;
-    reptable[j].pattern2 = NULL;
-    piece = mystrsep(&tp, 0);
-    while (piece) {
-      if (*piece != '\0') {
-        switch (i) {
-          case 0: {
-            if (strncmp(piece, "REP", 3) != 0) {
-              HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
-                               af->getlinenum());
-              numrep = 0;
-              return 1;
-            }
-            break;
-          }
-          case 1: {
-            if (*piece == '^')
-              reptable[j].start = true;
-            else
-              reptable[j].start = false;
-            reptable[j].pattern =
-                mystrrep(mystrdup(piece + int(reptable[j].start)), "_", " ");
-            int lr = strlen(reptable[j].pattern) - 1;
-            if (reptable[j].pattern[lr] == '$') {
-              reptable[j].end = true;
-              reptable[j].pattern[lr] = '\0';
-            } else
-              reptable[j].end = false;
-            break;
+    int type = 0;
+    start_piece = mystrsep(nl, iter);
+    while (start_piece != nl.end()) {
+      switch (i) {
+        case 0: {
+          if (nl.compare(start_piece - nl.begin(), 3, "REP", 3) != 0) {
+            HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
+                             af->getlinenum());
+            reptable.clear();
+            return false;
           }
-          case 2: {
-            reptable[j].pattern2 = mystrrep(mystrdup(piece), "_", " ");
-            break;
+          break;
+        }
+        case 1: {
+          if (*start_piece == '^')
+            type = 1;
+          reptable.back().pattern.assign(start_piece + type, iter);
+          mystrrep(reptable.back().pattern, "_", " ");
+          if (!reptable.back().pattern.empty() && reptable.back().pattern[reptable.back().pattern.size() - 1] == '$') {
+            type += 2;
+            reptable.back().pattern.resize(reptable.back().pattern.size() - 1);
           }
-          default:
-            break;
+          break;
         }
-        i++;
+        case 2: {
+          reptable.back().outstrings[type].assign(start_piece, iter);
+          mystrrep(reptable.back().outstrings[type], "_", " ");
+          break;
+        }
+        default:
+          break;
       }
-      piece = mystrsep(&tp, 0);
+      ++i;
+      start_piece = mystrsep(nl, iter);
     }
-    if ((!(reptable[j].pattern)) || (!(reptable[j].pattern2))) {
+    if (reptable.back().pattern.empty() || reptable.back().outstrings[type].empty()) {
       HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
                        af->getlinenum());
-      numrep = 0;
-      return 1;
+      reptable.clear();
+      return false;
     }
   }
-  return 0;
+  return true;
 }
 /* parse in the typical fault correcting table */
-int AffixMgr::parse_convtable(char* line,
+bool AffixMgr::parse_convtable(const std::string& line,
                               FileMgr* af,
                               RepList** rl,
-                              const char* keyword) {
+                               const std::string& keyword) {
   if (*rl) {
     HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
-  char* tp = line;
-  char* piece;
   int i = 0;
   int np = 0;
   int numrl = 0;
-  piece = mystrsep(&tp, 0);
-  while (piece) {
-    if (*piece != '\0') {
-      switch (i) {
-        case 0: {
-          np++;
-          break;
-        }
-        case 1: {
-          numrl = atoi(piece);
-          if (numrl < 1) {
-            HUNSPELL_WARNING(stderr, "error: line %d: incorrect entry number\n",
-                             af->getlinenum());
-            return 1;
-          }
-          *rl = new RepList(numrl);
-          if (!*rl)
-            return 1;
-          np++;
-          break;
+  std::string::const_iterator iter = line.begin();
+  std::string::const_iterator start_piece = mystrsep(line, iter);
+  while (start_piece != line.end()) {
+    switch (i) {
+      case 0: {
+        np++;
+        break;
+      }
+      case 1: {
+        numrl = atoi(std::string(start_piece, iter).c_str());
+        if (numrl < 1) {
+          HUNSPELL_WARNING(stderr, "error: line %d: incorrect entry number\n",
+                           af->getlinenum());
+          return false;
         }
-        default:
-          break;
+        *rl = new RepList(numrl);
+        if (!*rl)
+          return false;
+        np++;
+        break;
       }
-      i++;
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   if (np != 2) {
     HUNSPELL_WARNING(stderr, "error: line %d: missing data\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
   /* now parse the num lines to read in the remainder of the table */
-  char* nl;
   for (int j = 0; j < numrl; j++) {
-    if (!(nl = af->getline()))
-      return 1;
+    std::string nl;
+    if (!af->getline(nl))
+      return false;
     mychomp(nl);
-    tp = nl;
     i = 0;
-    char* pattern = NULL;
-    char* pattern2 = NULL;
-    piece = mystrsep(&tp, 0);
-    while (piece) {
-      if (*piece != '\0') {
+    std::string pattern;
+    std::string pattern2;
+    iter = nl.begin();
+    start_piece = mystrsep(nl, iter);
+    while (start_piece != nl.end()) {
+      {
         switch (i) {
           case 0: {
-            if (strncmp(piece, keyword, strlen(keyword)) != 0) {
+            if (nl.compare(start_piece - nl.begin(), keyword.size(), keyword, 0, keyword.size()) != 0) {
               HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
                                af->getlinenum());
               delete *rl;
               *rl = NULL;
-              return 1;
+              return false;
             }
             break;
           }
           case 1: {
-            pattern = mystrrep(mystrdup(piece), "_", " ");
+            pattern.assign(start_piece, iter);
             break;
           }
           case 2: {
-            pattern2 = mystrrep(mystrdup(piece), "_", " ");
+            pattern2.assign(start_piece, iter);
             break;
           }
           default:
             break;
         }
-        i++;
+        ++i;
       }
-      piece = mystrsep(&tp, 0);
+      start_piece = mystrsep(nl, iter);
     }
-    if (!pattern || !pattern2) {
-      if (pattern)
-        free(pattern);
-      if (pattern2)
-        free(pattern2);
+    if (pattern.empty() || pattern2.empty()) {
       HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
                        af->getlinenum());
-      return 1;
+      return false;
     }
     (*rl)->add(pattern, pattern2);
   }
-  return 0;
+  return true;
 }
 /* parse in the typical fault correcting table */
-int AffixMgr::parse_phonetable(char* line, FileMgr* af) {
+bool AffixMgr::parse_phonetable(const std::string& line, FileMgr* af) {
   if (phone) {
     HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
-  char* tp = line;
-  char* piece;
+  int num = -1;
   int i = 0;
   int np = 0;
-  piece = mystrsep(&tp, 0);
-  while (piece) {
-    if (*piece != '\0') {
-      switch (i) {
-        case 0: {
-          np++;
-          break;
-        }
-        case 1: {
-          phone = (phonetable*)malloc(sizeof(struct phonetable));
-          if (!phone)
-            return 1;
-          phone->num = atoi(piece);
-          phone->rules = NULL;
-          phone->utf8 = (char)utf8;
-          if (phone->num < 1) {
-            HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
-                             af->getlinenum());
-            return 1;
-          }
-          phone->rules = (char**)malloc(2 * (phone->num + 1) * sizeof(char*));
-          if (!phone->rules) {
-            free(phone);
-            phone = NULL;
-            return 1;
-          }
-          np++;
-          break;
+  std::string::const_iterator iter = line.begin();
+  std::string::const_iterator start_piece = mystrsep(line, iter);
+  while (start_piece != line.end()) {
+    switch (i) {
+      case 0: {
+        np++;
+        break;
+      }
+      case 1: {
+        num = atoi(std::string(start_piece, iter).c_str());
+        if (num < 1) {
+          HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
+                           af->getlinenum());
+          return false;
         }
-        default:
-          break;
+        phone = new phonetable;
+        phone->utf8 = (char)utf8;
+        np++;
+        break;
       }
-      i++;
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   if (np != 2) {
     HUNSPELL_WARNING(stderr, "error: line %d: missing data\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
   /* now parse the phone->num lines to read in the remainder of the table */
-  char* nl;
-  for (int j = 0; j < phone->num; j++) {
-    if (!(nl = af->getline()))
-      return 1;
+  for (int j = 0; j < num; ++j) {
+    std::string nl;
+    if (!af->getline(nl))
+      return false;
     mychomp(nl);
-    tp = nl;
     i = 0;
-    phone->rules[j * 2] = NULL;
-    phone->rules[j * 2 + 1] = NULL;
-    piece = mystrsep(&tp, 0);
-    while (piece) {
-      if (*piece != '\0') {
+    const size_t old_size = phone->rules.size();
+    iter = nl.begin();
+    start_piece = mystrsep(nl, iter);
+    while (start_piece != nl.end()) {
+      {
         switch (i) {
           case 0: {
-            if (strncmp(piece, "PHONE", 5) != 0) {
+            if (nl.compare(start_piece - nl.begin(), 5, "PHONE", 5) != 0) {
               HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
                                af->getlinenum());
-              phone->num = 0;
-              return 1;
+              return false;
             }
             break;
           }
           case 1: {
-            phone->rules[j * 2] = mystrrep(mystrdup(piece), "_", "");
+            phone->rules.push_back(std::string(start_piece, iter));
             break;
           }
           case 2: {
-            phone->rules[j * 2 + 1] = mystrrep(mystrdup(piece), "_", "");
+            phone->rules.push_back(std::string(start_piece, iter));
+            mystrrep(phone->rules.back(), "_", "");
             break;
           }
           default:
             break;
         }
-        i++;
+        ++i;
       }
-      piece = mystrsep(&tp, 0);
+      start_piece = mystrsep(nl, iter);
     }
-    if ((!(phone->rules[j * 2])) || (!(phone->rules[j * 2 + 1]))) {
+    if (phone->rules.size() != old_size + 2) {
       HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
                        af->getlinenum());
-      phone->num = 0;
-      return 1;
+      phone->rules.clear();
+      return false;
     }
   }
-  phone->rules[phone->num * 2] = mystrdup("");
-  phone->rules[phone->num * 2 + 1] = mystrdup("");
+  phone->rules.push_back("");
+  phone->rules.push_back("");
   init_phonet_hash(*phone);
-  return 0;
+  return true;
 }
 /* parse in the checkcompoundpattern table */
-int AffixMgr::parse_checkcpdtable(char* line, FileMgr* af) {
-  if (numcheckcpd != 0) {
+bool AffixMgr::parse_checkcpdtable(const std::string& line, FileMgr* af) {
+  if (parsedcheckcpd) {
     HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
-  char* tp = line;
-  char* piece;
+  parsedcheckcpd = true;
+  int numcheckcpd = -1;
   int i = 0;
   int np = 0;
-  piece = mystrsep(&tp, 0);
-  while (piece) {
-    if (*piece != '\0') {
-      switch (i) {
-        case 0: {
-          np++;
-          break;
-        }
-        case 1: {
-          numcheckcpd = atoi(piece);
-          if (numcheckcpd < 1) {
-            HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
-                             af->getlinenum());
-            return 1;
-          }
-          checkcpdtable =
-              (patentry*)malloc(numcheckcpd * sizeof(struct patentry));
-          if (!checkcpdtable)
-            return 1;
-          np++;
-          break;
+  std::string::const_iterator iter = line.begin();
+  std::string::const_iterator start_piece = mystrsep(line, iter);
+  while (start_piece != line.end()) {
+    switch (i) {
+      case 0: {
+        np++;
+        break;
+      }
+      case 1: {
+        numcheckcpd = atoi(std::string(start_piece, iter).c_str());
+        if (numcheckcpd < 1) {
+          HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
+                           af->getlinenum());
+          return false;
         }
-        default:
-          break;
+        checkcpdtable.reserve(numcheckcpd);
+        np++;
+        break;
       }
-      i++;
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   if (np != 2) {
     HUNSPELL_WARNING(stderr, "error: line %d: missing data\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
   /* now parse the numcheckcpd lines to read in the remainder of the table */
-  char* nl;
-  for (int j = 0; j < numcheckcpd; j++) {
-    if (!(nl = af->getline()))
-      return 1;
+  for (int j = 0; j < numcheckcpd; ++j) {
+    std::string nl;
+    if (!af->getline(nl))
+      return false;
     mychomp(nl);
-    tp = nl;
     i = 0;
-    checkcpdtable[j].pattern = NULL;
-    checkcpdtable[j].pattern2 = NULL;
-    checkcpdtable[j].pattern3 = NULL;
-    checkcpdtable[j].cond = FLAG_NULL;
-    checkcpdtable[j].cond2 = FLAG_NULL;
-    piece = mystrsep(&tp, 0);
-    while (piece) {
-      if (*piece != '\0') {
-        switch (i) {
-          case 0: {
-            if (strncmp(piece, "CHECKCOMPOUNDPATTERN", 20) != 0) {
-              HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
-                               af->getlinenum());
-              numcheckcpd = 0;
-              return 1;
-            }
-            break;
-          }
-          case 1: {
-            checkcpdtable[j].pattern = mystrdup(piece);
-            char* p = strchr(checkcpdtable[j].pattern, '/');
-            if (p) {
-              *p = '\0';
-              checkcpdtable[j].cond = pHMgr->decode_flag(p + 1);
-            }
-            break;
+    checkcpdtable.push_back(patentry());
+    iter = nl.begin();
+    start_piece = mystrsep(nl, iter);
+    while (start_piece != nl.end()) {
+      switch (i) {
+        case 0: {
+          if (nl.compare(start_piece - nl.begin(), 20, "CHECKCOMPOUNDPATTERN", 20) != 0) {
+            HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
+                             af->getlinenum());
+            return false;
           }
-          case 2: {
-            checkcpdtable[j].pattern2 = mystrdup(piece);
-            char* p = strchr(checkcpdtable[j].pattern2, '/');
-            if (p) {
-              *p = '\0';
-              checkcpdtable[j].cond2 = pHMgr->decode_flag(p + 1);
-            }
-            break;
+          break;
+        }
+        case 1: {
+          checkcpdtable.back().pattern.assign(start_piece, iter);
+          size_t slash_pos = checkcpdtable.back().pattern.find('/');
+          if (slash_pos != std::string::npos) {
+            std::string chunk(checkcpdtable.back().pattern, slash_pos + 1);
+            checkcpdtable.back().pattern.resize(slash_pos);
+            checkcpdtable.back().cond = pHMgr->decode_flag(chunk.c_str());
           }
-          case 3: {
-            checkcpdtable[j].pattern3 = mystrdup(piece);
-            simplifiedcpd = 1;
-            break;
+          break;
+        }
+        case 2: {
+          checkcpdtable.back().pattern2.assign(start_piece, iter);
+          size_t slash_pos = checkcpdtable.back().pattern2.find('/');
+          if (slash_pos != std::string::npos) {
+            std::string chunk(checkcpdtable.back().pattern2, slash_pos + 1);
+            checkcpdtable.back().pattern2.resize(slash_pos);
+            checkcpdtable.back().cond2 = pHMgr->decode_flag(chunk.c_str());
           }
-          default:
-            break;
+          break;
+        }
+        case 3: {
+          checkcpdtable.back().pattern3.assign(start_piece, iter);
+          simplifiedcpd = 1;
+          break;
         }
-        i++;
+        default:
+          break;
       }
-      piece = mystrsep(&tp, 0);
-    }
-    if ((!(checkcpdtable[j].pattern)) || (!(checkcpdtable[j].pattern2))) {
-      HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
-                       af->getlinenum());
-      numcheckcpd = 0;
-      return 1;
+      i++;
+      start_piece = mystrsep(nl, iter);
     }
   }
-  return 0;
+  return true;
 }
 /* parse in the compound rule table */
-int AffixMgr::parse_defcpdtable(char* line, FileMgr* af) {
-  if (numdefcpd != 0) {
+bool AffixMgr::parse_defcpdtable(const std::string& line, FileMgr* af) {
+  if (parseddefcpd) {
     HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
-  char* tp = line;
-  char* piece;
+  parseddefcpd = true;
+  int numdefcpd = -1;
   int i = 0;
   int np = 0;
-  piece = mystrsep(&tp, 0);
-  while (piece) {
-    if (*piece != '\0') {
-      switch (i) {
-        case 0: {
-          np++;
-          break;
-        }
-        case 1: {
-          numdefcpd = atoi(piece);
-          if (numdefcpd < 1) {
-            HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
-                             af->getlinenum());
-            return 1;
-          }
-          defcpdtable = (flagentry*)malloc(numdefcpd * sizeof(flagentry));
-          if (!defcpdtable)
-            return 1;
-          np++;
-          break;
+  std::string::const_iterator iter = line.begin();
+  std::string::const_iterator start_piece = mystrsep(line, iter);
+  while (start_piece != line.end()) {
+    switch (i) {
+      case 0: {
+        np++;
+        break;
+      }
+      case 1: {
+        numdefcpd = atoi(std::string(start_piece, iter).c_str());
+        if (numdefcpd < 1) {
+          HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
+                           af->getlinenum());
+          return false;
         }
-        default:
-          break;
+        defcpdtable.reserve(numdefcpd);
+        np++;
+        break;
       }
-      i++;
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   if (np != 2) {
     HUNSPELL_WARNING(stderr, "error: line %d: missing data\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
   /* now parse the numdefcpd lines to read in the remainder of the table */
-  char* nl;
-  for (int j = 0; j < numdefcpd; j++) {
-    if (!(nl = af->getline()))
-      return 1;
+  for (int j = 0; j < numdefcpd; ++j) {
+    std::string nl;
+    if (!af->getline(nl))
+      return false;
     mychomp(nl);
-    tp = nl;
     i = 0;
-    defcpdtable[j].def = NULL;
-    defcpdtable[j].len = 0;
-    piece = mystrsep(&tp, 0);
-    while (piece) {
-      if (*piece != '\0') {
-        switch (i) {
-          case 0: {
-            if (strncmp(piece, "COMPOUNDRULE", 12) != 0) {
-              HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
-                               af->getlinenum());
-              numdefcpd = 0;
-              return 1;
-            }
-            break;
+    defcpdtable.push_back(flagentry());
+    iter = nl.begin();
+    start_piece = mystrsep(nl, iter);
+    while (start_piece != nl.end()) {
+      switch (i) {
+        case 0: {
+          if (nl.compare(start_piece - nl.begin(), 12, "COMPOUNDRULE", 12) != 0) {
+            HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
+                             af->getlinenum());
+            numdefcpd = 0;
+            return false;
           }
-          case 1: {  // handle parenthesized flags
-            if (strchr(piece, '(')) {
-              defcpdtable[j].def = (FLAG*)malloc(strlen(piece) * sizeof(FLAG));
-              defcpdtable[j].len = 0;
-              int end = 0;
-              FLAG* conv;
-              while (!end) {
-                char* par = piece + 1;
-                while (*par != '(' && *par != ')' && *par != '\0')
-                  par++;
-                if (*par == '\0')
-                  end = 1;
-                else
-                  *par = '\0';
-                if (*piece == '(')
-                  piece++;
-                if (*piece == '*' || *piece == '?') {
-                  defcpdtable[j].def[defcpdtable[j].len++] = (FLAG)*piece;
-                } else if (*piece != '\0') {
-                  int l = pHMgr->decode_flags(&conv, piece, af);
-                  for (int k = 0; k < l; k++)
-                    defcpdtable[j].def[defcpdtable[j].len++] = conv[k];
-                  free(conv);
+          break;
+        }
+        case 1: {  // handle parenthesized flags
+          if (std::find(start_piece, iter, '(') != iter) {
+            for (std::string::const_iterator k = start_piece; k != iter; ++k) {
+              std::string::const_iterator chb = k;
+              std::string::const_iterator che = k + 1;
+              if (*k == '(') {
+                std::string::const_iterator parpos = std::find(k, iter, ')');
+                if (parpos != iter) {
+                  chb = k + 1;
+                  che = parpos;
+                  k = parpos;
                 }
-                piece = par + 1;
               }
-            } else {
-              defcpdtable[j].len =
-                  pHMgr->decode_flags(&(defcpdtable[j].def), piece, af);
+
+              if (*chb == '*' || *chb == '?') {
+                defcpdtable.back().push_back((FLAG)*chb);
+              } else {
+                pHMgr->decode_flags(defcpdtable.back(), std::string(chb, che), af);
+              }
             }
-            break;
+          } else {
+            pHMgr->decode_flags(defcpdtable.back(), std::string(start_piece, iter), af);
           }
-          default:
-            break;
+          break;
         }
-        i++;
+        default:
+          break;
       }
-      piece = mystrsep(&tp, 0);
+      ++i;
+      start_piece = mystrsep(nl, iter);
     }
-    if (!defcpdtable[j].len) {
+    if (defcpdtable.back().empty()) {
       HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
                        af->getlinenum());
-      numdefcpd = 0;
-      return 1;
+      return false;
     }
   }
-  return 0;
+  return true;
 }
 /* parse in the character map table */
-int AffixMgr::parse_maptable(char* line, FileMgr* af) {
-  if (nummap != 0) {
+bool AffixMgr::parse_maptable(const std::string& line, FileMgr* af) {
+  if (parsedmaptable) {
     HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
-  char* tp = line;
-  char* piece;
+  parsedmaptable = true;
+  int nummap = -1;
   int i = 0;
   int np = 0;
-  piece = mystrsep(&tp, 0);
-  while (piece) {
-    if (*piece != '\0') {
-      switch (i) {
-        case 0: {
-          np++;
-          break;
-        }
-        case 1: {
-          nummap = atoi(piece);
-          if (nummap < 1) {
-            HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
-                             af->getlinenum());
-            return 1;
-          }
-          maptable = (mapentry*)malloc(nummap * sizeof(struct mapentry));
-          if (!maptable)
-            return 1;
-          np++;
-          break;
+  std::string::const_iterator iter = line.begin();
+  std::string::const_iterator start_piece = mystrsep(line, iter);
+  while (start_piece != line.end()) {
+    switch (i) {
+      case 0: {
+        np++;
+        break;
+      }
+      case 1: {
+        nummap = atoi(std::string(start_piece, iter).c_str());
+        if (nummap < 1) {
+          HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
+                           af->getlinenum());
+          return false;
         }
-        default:
-          break;
+        maptable.reserve(nummap);
+        np++;
+        break;
       }
-      i++;
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   if (np != 2) {
     HUNSPELL_WARNING(stderr, "error: line %d: missing data\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
   /* now parse the nummap lines to read in the remainder of the table */
-  char* nl;
-  for (int j = 0; j < nummap; j++) {
-    if (!(nl = af->getline()))
-      return 1;
+  for (int j = 0; j < nummap; ++j) {
+    std::string nl;
+    if (!af->getline(nl))
+      return false;
     mychomp(nl);
-    tp = nl;
     i = 0;
-    maptable[j].set = NULL;
-    maptable[j].len = 0;
-    piece = mystrsep(&tp, 0);
-    while (piece) {
-      if (*piece != '\0') {
-        switch (i) {
-          case 0: {
-            if (strncmp(piece, "MAP", 3) != 0) {
-              HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
-                               af->getlinenum());
-              nummap = 0;
-              return 1;
-            }
-            break;
+    maptable.push_back(mapentry());
+    iter = nl.begin();
+    start_piece = mystrsep(nl, iter);
+    while (start_piece != nl.end()) {
+      switch (i) {
+        case 0: {
+          if (nl.compare(start_piece - nl.begin(), 3, "MAP", 3) != 0) {
+            HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
+                             af->getlinenum());
+            nummap = 0;
+            return false;
           }
-          case 1: {
-            int setn = 0;
-            maptable[j].len = strlen(piece);
-            maptable[j].set = (char**)malloc(maptable[j].len * sizeof(char*));
-            if (!maptable[j].set)
-              return 1;
-            for (int k = 0; k < maptable[j].len; k++) {
-              int chl = 1;
-              int chb = k;
-              if (piece[k] == '(') {
-                char* parpos = strchr(piece + k, ')');
-                if (parpos != NULL) {
-                  chb = k + 1;
-                  chl = (int)(parpos - piece) - k - 1;
-                  k = k + chl + 1;
-                }
-              } else {
-                if (utf8 && (piece[k] & 0xc0) == 0xc0) {
-                  for (k++; utf8 && (piece[k] & 0xc0) == 0x80; k++)
-                    ;
-                  chl = k - chb;
-                  k--;
-                }
+          break;
+        }
+        case 1: {
+          for (std::string::const_iterator k = start_piece; k != iter; ++k) {
+            std::string::const_iterator chb = k;
+            std::string::const_iterator che = k + 1;
+            if (*k == '(') {
+              std::string::const_iterator parpos = std::find(k, iter, ')');
+              if (parpos != iter) {
+                chb = k + 1;
+                che = parpos;
+                k = parpos;
+              }
+            } else {
+              if (utf8 && (*k & 0xc0) == 0xc0) {
+                ++k;
+                while (k != iter && (*k & 0xc0) == 0x80)
+                  ++k;
+                che = k;
+                --k;
               }
-              maptable[j].set[setn] = (char*)malloc(chl + 1);
-              if (!maptable[j].set[setn])
-                return 1;
-              strncpy(maptable[j].set[setn], piece + chb, chl);
-              maptable[j].set[setn][chl] = '\0';
-              setn++;
             }
-            maptable[j].len = setn;
-            break;
+            maptable.back().push_back(std::string(chb, che));
           }
-          default:
-            break;
+          break;
         }
-        i++;
+        default:
+          break;
       }
-      piece = mystrsep(&tp, 0);
+      ++i;
+      start_piece = mystrsep(nl, iter);
     }
-    if (!maptable[j].set || !maptable[j].len) {
+    if (maptable.back().empty()) {
       HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
                        af->getlinenum());
-      nummap = 0;
-      return 1;
+      return false;
     }
   }
-  return 0;
+  return true;
 }
 /* parse in the word breakpoint table */
-int AffixMgr::parse_breaktable(char* line, FileMgr* af) {
-  if (numbreak > -1) {
+bool AffixMgr::parse_breaktable(const std::string& line, FileMgr* af) {
+  if (parsedbreaktable) {
     HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
-  char* tp = line;
-  char* piece;
+  parsedbreaktable = true;
+  int numbreak = -1;
   int i = 0;
   int np = 0;
-  piece = mystrsep(&tp, 0);
-  while (piece) {
-    if (*piece != '\0') {
-      switch (i) {
-        case 0: {
-          np++;
-          break;
-        }
-        case 1: {
-          numbreak = atoi(piece);
-          if (numbreak < 0) {
-            HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
-                             af->getlinenum());
-            return 1;
-          }
-          if (numbreak == 0)
-            return 0;
-          breaktable = (char**)malloc(numbreak * sizeof(char*));
-          if (!breaktable)
-            return 1;
-          np++;
-          break;
+  std::string::const_iterator iter = line.begin();
+  std::string::const_iterator start_piece = mystrsep(line, iter);
+  while (start_piece != line.end()) {
+    switch (i) {
+      case 0: {
+        np++;
+        break;
+      }
+      case 1: {
+        numbreak = atoi(std::string(start_piece, iter).c_str());
+        if (numbreak < 0) {
+          HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
+                           af->getlinenum());
+          return false;
         }
-        default:
-          break;
+        if (numbreak == 0)
+          return true;
+        breaktable.reserve(numbreak);
+        np++;
+        break;
       }
-      i++;
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   if (np != 2) {
     HUNSPELL_WARNING(stderr, "error: line %d: missing data\n",
                      af->getlinenum());
-    return 1;
+    return false;
   }
   /* now parse the numbreak lines to read in the remainder of the table */
-  char* nl;
-  for (int j = 0; j < numbreak; j++) {
-    if (!(nl = af->getline()))
-      return 1;
+  for (int j = 0; j < numbreak; ++j) {
+    std::string nl;
+    if (!af->getline(nl))
+      return false;
     mychomp(nl);
-    tp = nl;
     i = 0;
-    piece = mystrsep(&tp, 0);
-    while (piece) {
-      if (*piece != '\0') {
-        switch (i) {
-          case 0: {
-            if (strncmp(piece, "BREAK", 5) != 0) {
-              HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
-                               af->getlinenum());
-              numbreak = 0;
-              return 1;
-            }
-            break;
-          }
-          case 1: {
-            breaktable[j] = mystrdup(piece);
-            break;
+    iter = nl.begin();
+    start_piece = mystrsep(nl, iter);
+    while (start_piece != nl.end()) {
+      switch (i) {
+        case 0: {
+          if (nl.compare(start_piece - nl.begin(), 5, "BREAK", 5) != 0) {
+            HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
+                             af->getlinenum());
+            numbreak = 0;
+            return false;
           }
-          default:
-            break;
+          break;
+        }
+        case 1: {
+          breaktable.push_back(std::string(start_piece, iter));
+          break;
         }
-        i++;
+        default:
+          break;
       }
-      piece = mystrsep(&tp, 0);
-    }
-    if (!breaktable) {
-      HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
-                       af->getlinenum());
-      numbreak = 0;
-      return 1;
+      ++i;
+      start_piece = mystrsep(nl, iter);
     }
   }
-  return 0;
+
+  if (breaktable.size() != static_cast<size_t>(numbreak)) {
+    HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n",
+                     af->getlinenum());
+    return false;
+  }
+
+  return true;
 }
 void AffixMgr::reverse_condition(std::string& piece) {
@@ -4665,20 +4398,68 @@ void AffixMgr::reverse_condition(std::string& piece) {
   }
 }
-int AffixMgr::parse_affix(char* line,
+class entries_container {
+  std::vector<AffEntry*> entries;
+  AffixMgr* m_mgr;
+  char m_at;
+public:
+  entries_container(char at, AffixMgr* mgr)
+    : m_mgr(mgr)
+    , m_at(at) {
+  }
+  void release() {
+    entries.clear();
+  }
+  void initialize(int numents,
+                  char opts, unsigned short aflag) {
+    entries.reserve(numents);
+
+    if (m_at == 'P') {
+      entries.push_back(new PfxEntry(m_mgr));
+    } else {
+      entries.push_back(new SfxEntry(m_mgr));
+    }
+
+    entries.back()->opts = opts;
+    entries.back()->aflag = aflag;
+  }
+
+  AffEntry* add_entry(char opts) {
+    if (m_at == 'P') {
+      entries.push_back(new PfxEntry(m_mgr));
+    } else {
+      entries.push_back(new SfxEntry(m_mgr));
+    }
+    AffEntry* ret = entries.back();
+    ret->opts = entries[0]->opts & opts;
+    return ret;
+  }
+
+  AffEntry* first_entry() {
+    return entries.empty() ? NULL : entries[0];
+  }
+
+  ~entries_container() {
+    for (size_t i = 0; i < entries.size(); ++i) {
+      delete entries[i];
+    }
+  }
+
+  std::vector<AffEntry*>::iterator begin() { return entries.begin(); }
+  std::vector<AffEntry*>::iterator end() { return entries.end(); }
+};
+
+bool AffixMgr::parse_affix(const std::string& line,
                           const char at,
                           FileMgr* af,
                           char* dupflags) {
-  int numents = 0;  // number of affentry structures to parse
+  int numents = 0;  // number of AffEntry structures to parse
   unsigned short aflag = 0;  // affix char identifier
   char ff = 0;
-  std::vector<affentry> affentries;
+  entries_container affentries(at, this);
-  char* tp = line;
-  char* nl = line;
-  char* piece;
   int i = 0;
   // checking lines with bad syntax
@@ -4689,71 +4470,68 @@ int AffixMgr::parse_affix(char* line,
   // split affix header line into pieces
   int np = 0;
+  std::string::const_iterator iter = line.begin();
+  std::string::const_iterator start_piece = mystrsep(line, iter);
+  while (start_piece != line.end()) {
+    switch (i) {
+      // piece 1 - is type of affix
+      case 0: {
+        np++;
+        break;
+      }
-  piece = mystrsep(&tp, 0);
-  while (piece) {
-    if (*piece != '\0') {
-      switch (i) {
-        // piece 1 - is type of affix
-        case 0: {
-          np++;
-          break;
-        }
-
-        // piece 2 - is affix char
-        case 1: {
-          np++;
-          aflag = pHMgr->decode_flag(piece);
-          if (((at == 'S') && (dupflags[aflag] & dupSFX)) ||
-              ((at == 'P') && (dupflags[aflag] & dupPFX))) {
-            HUNSPELL_WARNING(
-                stderr,
-                "error: line %d: multiple definitions of an affix flag\n",
-                af->getlinenum());
-            // return 1; XXX permissive mode for bad dictionaries
-          }
-          dupflags[aflag] += (char)((at == 'S') ? dupSFX : dupPFX);
-          break;
-        }
-        // piece 3 - is cross product indicator
-        case 2: {
-          np++;
-          if (*piece == 'Y')
-            ff = aeXPRODUCT;
-          break;
+      // piece 2 - is affix char
+      case 1: {
+        np++;
+        aflag = pHMgr->decode_flag(std::string(start_piece, iter).c_str());
+        if (((at == 'S') && (dupflags[aflag] & dupSFX)) ||
+            ((at == 'P') && (dupflags[aflag] & dupPFX))) {
+          HUNSPELL_WARNING(
+              stderr,
+              "error: line %d: multiple definitions of an affix flag\n",
+              af->getlinenum());
         }
+        dupflags[aflag] += (char)((at == 'S') ? dupSFX : dupPFX);
+        break;
+      }
+      // piece 3 - is cross product indicator
+      case 2: {
+        np++;
+        if (*start_piece == 'Y')
+          ff = aeXPRODUCT;
+        break;
+      }
-        // piece 4 - is number of affentries
-        case 3: {
-          np++;
-          numents = atoi(piece);
-          if ((numents <= 0) || ((std::numeric_limits<size_t>::max() /
-                                  sizeof(struct affentry)) < static_cast<size_t>(numents))) {
-            char* err = pHMgr->encode_flag(aflag);
-            if (err) {
-              HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
-                               af->getlinenum());
-              free(err);
-            }
-            return 1;
+      // piece 4 - is number of affentries
+      case 3: {
+        np++;
+        numents = atoi(std::string(start_piece, iter).c_str());
+        if ((numents <= 0) || ((std::numeric_limits<size_t>::max() /
+                                sizeof(AffEntry)) < static_cast<size_t>(numents))) {
+          char* err = pHMgr->encode_flag(aflag);
+          if (err) {
+            HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n",
+                             af->getlinenum());
+            free(err);
           }
-          affentries.resize(numents);
-          affentries[0].opts = ff;
-          if (utf8)
-            affentries[0].opts += aeUTF8;
-          if (pHMgr->is_aliasf())
-            affentries[0].opts += aeALIASF;
-          if (pHMgr->is_aliasm())
-            affentries[0].opts += aeALIASM;
-          affentries[0].aflag = aflag;
+          return false;
         }
-        default:
-          break;
+        char opts = ff;
+        if (utf8)
+          opts += aeUTF8;
+        if (pHMgr->is_aliasf())
+          opts += aeALIASF;
+        if (pHMgr->is_aliasm())
+          opts += aeALIASM;
+        affentries.initialize(numents, opts, aflag);
       }
-      i++;
+
+      default:
+        break;
     }
-    piece = mystrsep(&tp, 0);
+    ++i;
+    start_piece = mystrsep(line, iter);
   }
   // check to make sure we parsed enough pieces
   if (np != 4) {
@@ -4763,196 +4541,193 @@ int AffixMgr::parse_affix(char* line,
                        af->getlinenum());
       free(err);
     }
-    return 1;
+    return false;
   }
   // now parse numents affentries for this affix
-  std::vector<affentry>::iterator start = affentries.begin();
-  std::vector<affentry>::iterator end = affentries.end();
-  for (std::vector<affentry>::iterator entry = start; entry != end; ++entry) {
-    if ((nl = af->getline()) == NULL)
-      return 1;
+  AffEntry* entry = affentries.first_entry();
+  for (int ent = 0; ent < numents; ++ent) {
+    std::string nl;
+    if (!af->getline(nl))
+      return false;
     mychomp(nl);
-    tp = nl;
+
+    iter = nl.begin();
     i = 0;
     np = 0;
     // split line into pieces
-    piece = mystrsep(&tp, 0);
-    while (piece) {
-      if (*piece != '\0') {
-        switch (i) {
-          // piece 1 - is type
-          case 0: {
-            np++;
-            if (entry != start)
-              entry->opts = start->opts &
-                            (char)(aeXPRODUCT + aeUTF8 + aeALIASF + aeALIASM);
-            break;
-          }
+    start_piece = mystrsep(nl, iter);
+    while (start_piece != nl.end()) {
+      switch (i) {
+        // piece 1 - is type
+        case 0: {
+          np++;
+          if (ent != 0)
+            entry = affentries.add_entry((char)(aeXPRODUCT + aeUTF8 + aeALIASF + aeALIASM));
+          break;
+        }
-          // piece 2 - is affix char
-          case 1: {
-            np++;
-            if (pHMgr->decode_flag(piece) != aflag) {
-              char* err = pHMgr->encode_flag(aflag);
-              if (err) {
-                HUNSPELL_WARNING(stderr,
-                                 "error: line %d: affix %s is corrupt\n",
-                                 af->getlinenum(), err);
-                free(err);
-              }
-              return 1;
+        // piece 2 - is affix char
+        case 1: {
+          np++;
+          std::string chunk(start_piece, iter);
+          if (pHMgr->decode_flag(chunk.c_str()) != aflag) {
+            char* err = pHMgr->encode_flag(aflag);
+            if (err) {
+              HUNSPELL_WARNING(stderr,
+                               "error: line %d: affix %s is corrupt\n",
+                               af->getlinenum(), err);
+              free(err);
             }
-
-            if (entry != start)
-              entry->aflag = start->aflag;
-            break;
+            return false;
           }
-          // piece 3 - is string to strip or 0 for null
-          case 2: {
-            np++;
-            entry->strip = piece;
-            if (complexprefixes) {
-              if (utf8)
-                reverseword_utf(entry->strip);
-              else
-                reverseword(entry->strip);
-            }
-            if (entry->strip.compare("0") == 0) {
-              entry->strip.clear();
-            }
-            break;
+          if (ent != 0) {
+            AffEntry* start_entry = affentries.first_entry();
+            entry->aflag = start_entry->aflag;
           }
+          break;
+        }
-          // piece 4 - is affix string or 0 for null
-          case 3: {
-            char* dash;
-            entry->morphcode = NULL;
-            entry->contclass = NULL;
-            entry->contclasslen = 0;
-            np++;
-            dash = strchr(piece, '/');
-            if (dash) {
-              *dash = '\0';
-
-              entry->appnd = piece;
-
-              if (ignorechars) {
-                if (utf8) {
-                  remove_ignored_chars_utf(entry->appnd, ignorechars_utf16);
-                } else {
-                  remove_ignored_chars(entry->appnd, ignorechars);
-                }
-              }
-
-              if (complexprefixes) {
-                if (utf8)
-                  reverseword_utf(entry->appnd);
-                else
-                  reverseword(entry->appnd);
-              }
+        // piece 3 - is string to strip or 0 for null
+        case 2: {
+          np++;
+          entry->strip = std::string(start_piece, iter);
+          if (complexprefixes) {
+            if (utf8)
+              reverseword_utf(entry->strip);
+            else
+              reverseword(entry->strip);
+          }
+          if (entry->strip.compare("0") == 0) {
+            entry->strip.clear();
+          }
+          break;
+        }
-              if (pHMgr->is_aliasf()) {
-                int index = atoi(dash + 1);
-                entry->contclasslen = (unsigned short)pHMgr->get_aliasf(
-                    index, &(entry->contclass), af);
-                if (!entry->contclasslen)
-                  HUNSPELL_WARNING(stderr,
-                                   "error: bad affix flag alias: \"%s\"\n",
-                                   dash + 1);
+        // piece 4 - is affix string or 0 for null
+        case 3: {
+          entry->morphcode = NULL;
+          entry->contclass = NULL;
+          entry->contclasslen = 0;
+          np++;
+          std::string::const_iterator dash = std::find(start_piece, iter, '/');
+          if (dash != iter) {
+            entry->appnd = std::string(start_piece, dash);
+            std::string dash_str(dash + 1, iter);
+
+            if (!ignorechars.empty()) {
+              if (utf8) {
+                remove_ignored_chars_utf(entry->appnd, ignorechars_utf16);
               } else {
-                entry->contclasslen = (unsigned short)pHMgr->decode_flags(
-                    &(entry->contclass), dash + 1, af);
-                std::sort(entry->contclass, entry->contclass + entry->contclasslen);
+                remove_ignored_chars(entry->appnd, ignorechars);
               }
-              *dash = '/';
+            }
-              havecontclass = 1;
-              for (unsigned short _i = 0; _i < entry->contclasslen; _i++) {
-                contclasses[(entry->contclass)[_i]] = 1;
-              }
+            if (complexprefixes) {
+              if (utf8)
+                reverseword_utf(entry->appnd);
+              else
+                reverseword(entry->appnd);
+            }
+
+            if (pHMgr->is_aliasf()) {
+              int index = atoi(dash_str.c_str());
+              entry->contclasslen = (unsigned short)pHMgr->get_aliasf(
+                  index, &(entry->contclass), af);
+              if (!entry->contclasslen)
+                HUNSPELL_WARNING(stderr,
+                                 "error: bad affix flag alias: \"%s\"\n",
+                                 dash_str.c_str());
             } else {
-              entry->appnd = piece;
+              entry->contclasslen = (unsigned short)pHMgr->decode_flags(
+                  &(entry->contclass), dash_str.c_str(), af);
+              std::sort(entry->contclass, entry->contclass + entry->contclasslen);
+            }
-              if (ignorechars) {
-                if (utf8) {
-                  remove_ignored_chars_utf(entry->appnd, ignorechars_utf16);
-                } else {
-                  remove_ignored_chars(entry->appnd, ignorechars);
-                }
-              }
+            havecontclass = 1;
+            for (unsigned short _i = 0; _i < entry->contclasslen; _i++) {
+              contclasses[(entry->contclass)[_i]] = 1;
+            }
+          } else {
+            entry->appnd = std::string(start_piece, iter);
-              if (complexprefixes) {
-                if (utf8)
-                  reverseword_utf(entry->appnd);
-                else
-                  reverseword(entry->appnd);
+            if (!ignorechars.empty()) {
+              if (utf8) {
+                remove_ignored_chars_utf(entry->appnd, ignorechars_utf16);
+              } else {
+                remove_ignored_chars(entry->appnd, ignorechars);
               }
             }
-            if (entry->appnd.compare("0") == 0) {
-              entry->appnd.clear();
+            if (complexprefixes) {
+              if (utf8)
+                reverseword_utf(entry->appnd);
+              else
+                reverseword(entry->appnd);
             }
-            break;
           }
-          // piece 5 - is the conditions descriptions
-          case 4: {
-            std::string chunk(piece);
-            np++;
-            if (complexprefixes) {
+          if (entry->appnd.compare("0") == 0) {
+            entry->appnd.clear();
+          }
+          break;
+        }
+
+        // piece 5 - is the conditions descriptions
+        case 4: {
+          std::string chunk(start_piece, iter);
+          np++;
+          if (complexprefixes) {
+            if (utf8)
+
reverseword_utf(chunk); + else + reverseword(chunk); + reverse_condition(chunk); + } + if (!entry->strip.empty() && chunk != "." && + redundant_condition(at, entry->strip.c_str(), entry->strip.size(), chunk.c_str(), + af->getlinenum())) + chunk = "."; + if (at == 'S') { + reverseword(chunk); + reverse_condition(chunk); + } + if (encodeit(*entry, chunk.c_str())) + return false; + break; + } + + case 5: { + std::string chunk(start_piece, iter); + np++; + if (pHMgr->is_aliasm()) { + int index = atoi(chunk.c_str()); + entry->morphcode = pHMgr->get_aliasm(index); + } else { + if (complexprefixes) { // XXX - fix me for morph. gen. if (utf8) reverseword_utf(chunk); else reverseword(chunk); - reverse_condition(chunk); } - if (!entry->strip.empty() && chunk != "." && - redundant_condition(at, entry->strip.c_str(), entry->strip.size(), chunk.c_str(), - af->getlinenum())) - chunk = "."; - if (at == 'S') { - reverseword(chunk); - reverse_condition(chunk); - } - if (encodeit(*entry, chunk.c_str())) - return 1; - break; - } - - case 5: { - std::string chunk(piece); - np++; - if (pHMgr->is_aliasm()) { - int index = atoi(chunk.c_str()); - entry->morphcode = pHMgr->get_aliasm(index); - } else { - if (complexprefixes) { // XXX - fix me for morph. gen. 
- if (utf8) - reverseword_utf(chunk); - else - reverseword(chunk); - } - // add the remaining of the line - if (*tp) { - *(tp - 1) = ' '; - chunk.push_back(' '); - chunk.append(tp); - } - entry->morphcode = mystrdup(chunk.c_str()); - if (!entry->morphcode) - return 1; + // add the remaining of the line + std::string::const_iterator end = nl.end(); + if (iter != end) { + chunk.append(iter, end); } - break; + entry->morphcode = mystrdup(chunk.c_str()); + if (!entry->morphcode) + return false; } - default: - break; + break; } - i++; + default: + break; } - piece = mystrsep(&tp, 0); + i++; + start_piece = mystrsep(nl, iter); } // check to make sure we parsed enough pieces if (np < 4) { @@ -4962,7 +4737,7 @@ int AffixMgr::parse_affix(char* line, af->getlinenum(), err); free(err); } - return 1; + return false; } #ifdef DEBUG @@ -4982,16 +4757,20 @@ int AffixMgr::parse_affix(char* line, // now create SfxEntry or PfxEntry objects and use links to // build an ordered (sorted by affix string) list - for (std::vector<affentry>::iterator entry = start; entry != end; ++entry) { + std::vector<AffEntry*>::iterator start = affentries.begin(); + std::vector<AffEntry*>::iterator end = affentries.end(); + for (std::vector<AffEntry*>::iterator affentry = start; affentry != end; ++affentry) { if (at == 'P') { - PfxEntry* pfxptr = new PfxEntry(this, &(*entry)); - build_pfxtree(pfxptr); + build_pfxtree(static_cast<PfxEntry*>(*affentry)); } else { - SfxEntry* sfxptr = new SfxEntry(this, &(*entry)); - build_sfxtree(sfxptr); + build_sfxtree(static_cast<SfxEntry*>(*affentry)); } } - return 0; + + //contents belong to AffixMgr now + affentries.release(); + + return true; } int AffixMgr::redundant_condition(char ft, @@ -5088,11 +4867,10 @@ int AffixMgr::redundant_condition(char ft, return 0; } -int AffixMgr::get_suffix_words(short unsigned* suff, +std::vector<std::string> AffixMgr::get_suffix_words(short unsigned* suff, int len, - const char* root_word, - char** slst) { - int suff_words_cnt = 
0; + const char* root_word) { + std::vector<std::string> slst; short unsigned* start_ptr = suff; for (int j = 0; j < SETSIZE; j++) { SfxEntry* ptr = sStart[j]; @@ -5102,10 +4880,9 @@ int AffixMgr::get_suffix_words(short unsigned* suff, if ((*suff) == ptr->getFlag()) { std::string nw(root_word); nw.append(ptr->getAffix()); - hentry* ht = ptr->checkword(nw.c_str(), nw.size(), 0, NULL, NULL, 0, - NULL, 0, 0, 0); + hentry* ht = ptr->checkword(nw.c_str(), nw.size(), 0, NULL, 0, 0, 0); if (ht) { - slst[suff_words_cnt++] = mystrdup(nw.c_str()); + slst.push_back(nw); } } suff++; @@ -5113,5 +4890,5 @@ int AffixMgr::get_suffix_words(short unsigned* suff, ptr = ptr->getNext(); } } - return suff_words_cnt; + return slst; } diff --git a/libs/hunspell/src/affixmgr.hxx b/libs/hunspell/src/affixmgr.hxx index d70e853388..d41e69cfd2 100644 --- a/libs/hunspell/src/affixmgr.hxx +++ b/libs/hunspell/src/affixmgr.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -71,14 +68,13 @@ * SUCH DAMAGE. 
*/ -#ifndef _AFFIXMGR_HXX_ -#define _AFFIXMGR_HXX_ - -#include "hunvisapi.h" +#ifndef AFFIXMGR_HXX_ +#define AFFIXMGR_HXX_ #include <stdio.h> #include <string> +#include <vector> #include "atypes.hxx" #include "baseaffix.hxx" @@ -93,17 +89,16 @@ class PfxEntry; class SfxEntry; -class LIBHUNSPELL_DLL_EXPORTED AffixMgr { +class AffixMgr { PfxEntry* pStart[SETSIZE]; SfxEntry* sStart[SETSIZE]; PfxEntry* pFlag[SETSIZE]; SfxEntry* sFlag[SETSIZE]; - HashMgr* pHMgr; - HashMgr** alldic; - int* maxdic; - char* keystring; - char* trystring; - char* encoding; + const std::vector<HashMgr*>& alldic; + const HashMgr* pHMgr; + std::string keystring; + std::string trystring; + std::string encoding; struct cs_info* csconv; int utf8; int complexprefixes; @@ -125,19 +120,19 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { FLAG nongramsuggest; FLAG needaffix; int cpdmin; - int numrep; - replentry* reptable; + bool parsedrep; + std::vector<replentry> reptable; RepList* iconvtable; RepList* oconvtable; - int nummap; - mapentry* maptable; - int numbreak; - char** breaktable; - int numcheckcpd; - patentry* checkcpdtable; + bool parsedmaptable; + std::vector<mapentry> maptable; + bool parsedbreaktable; + std::vector<std::string> breaktable; + bool parsedcheckcpd; + std::vector<patentry> checkcpdtable; int simplifiedcpd; - int numdefcpd; - flagentry* defcpdtable; + bool parseddefcpd; + std::vector<flagentry> defcpdtable; phonetable* phone; int maxngramsugs; int maxcpdsugs; @@ -147,10 +142,9 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { int sugswithdots; int cpdwordmax; int cpdmaxsyllable; - char* cpdvowels; - w_char* cpdvowels_utf16; - int cpdvowels_utf16_len; - char* cpdsyllablenum; + std::string cpdvowels; // vowels (for calculating of Hungarian compounding limit, + std::vector<w_char> cpdvowels_utf16; //vowels for UTF-8 encoding + std::string cpdsyllablenum; // syllable count incrementing flag const char* pfxappnd; // BUG: not stateless const char* sfxappnd; // BUG: not stateless int sfxextra; 
// BUG: not stateless @@ -159,12 +153,12 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { SfxEntry* sfx; // BUG: not stateless PfxEntry* pfx; // BUG: not stateless int checknum; - char* wordchars; + std::string wordchars; // letters + spec. word characters std::vector<w_char> wordchars_utf16; - char* ignorechars; + std::string ignorechars; // letters + spec. word characters std::vector<w_char> ignorechars_utf16; - char* version; - char* lang; + std::string version; // affix and dictionary file version string + std::string lang; // language int langnum; FLAG lemma_present; FLAG circumfix; @@ -182,7 +176,7 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { // affix) public: - AffixMgr(const char* affpath, HashMgr** ptr, int* md, const char* key = NULL); + AffixMgr(const char* affpath, const std::vector<HashMgr*>& ptr, const char* key = NULL); ~AffixMgr(); struct hentry* affix_check(const char* word, int len, @@ -202,9 +196,6 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { int len, int sfxopts, PfxEntry* ppfx, - char** wlst, - int maxSug, - int* ns, const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT); @@ -214,39 +205,39 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { PfxEntry* ppfx, const FLAG needflag = FLAG_NULL); - char* affix_check_morph(const char* word, - int len, - const FLAG needflag = FLAG_NULL, - char in_compound = IN_CPD_NOT); - char* prefix_check_morph(const char* word, - int len, - char in_compound, - const FLAG needflag = FLAG_NULL); - char* suffix_check_morph(const char* word, - int len, - int sfxopts, - PfxEntry* ppfx, - const FLAG cclass = FLAG_NULL, - const FLAG needflag = FLAG_NULL, - char in_compound = IN_CPD_NOT); + std::string affix_check_morph(const char* word, + int len, + const FLAG needflag = FLAG_NULL, + char in_compound = IN_CPD_NOT); + std::string prefix_check_morph(const char* word, + int len, + char in_compound, + const FLAG needflag = FLAG_NULL); + std::string suffix_check_morph(const char* word, + int len, + 
int sfxopts, + PfxEntry* ppfx, + const FLAG cclass = FLAG_NULL, + const FLAG needflag = FLAG_NULL, + char in_compound = IN_CPD_NOT); - char* prefix_check_twosfx_morph(const char* word, - int len, - char in_compound, - const FLAG needflag = FLAG_NULL); - char* suffix_check_twosfx_morph(const char* word, - int len, - int sfxopts, - PfxEntry* ppfx, - const FLAG needflag = FLAG_NULL); + std::string prefix_check_twosfx_morph(const char* word, + int len, + char in_compound, + const FLAG needflag = FLAG_NULL); + std::string suffix_check_twosfx_morph(const char* word, + int len, + int sfxopts, + PfxEntry* ppfx, + const FLAG needflag = FLAG_NULL); - char* morphgen(const char* ts, - int wl, - const unsigned short* ap, - unsigned short al, - const char* morph, - const char* targetmorph, - int level); + std::string morphgen(const char* ts, + int wl, + const unsigned short* ap, + unsigned short al, + const char* morph, + const char* targetmorph, + int level); int expand_rootword(struct guessword* wlst, int maxn, @@ -273,8 +264,7 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { int cpdcase_check(const char* word, int len); inline int candidate_check(const char* word, int len); void setcminmax(int* cmin, int* cmax, const char* word, int len); - struct hentry* compound_check(const char* word, - int len, + struct hentry* compound_check(const std::string& word, short wordnum, short numsyllable, short maxwordnum, @@ -294,47 +284,37 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { hentry** words, hentry** rwords, char hu_mov_rule, - char** result, - char* partresult); + std::string& result, + const std::string* partresult); - int get_suffix_words(short unsigned* suff, + std::vector<std::string> get_suffix_words(short unsigned* suff, int len, - const char* root_word, - char** slst); + const char* root_word); struct hentry* lookup(const char* word); - int get_numrep() const; - struct replentry* get_reptable() const; + const std::vector<replentry>& get_reptable() const; RepList* get_iconvtable() 
const; RepList* get_oconvtable() const; struct phonetable* get_phonetable() const; - int get_nummap() const; - struct mapentry* get_maptable() const; - int get_numbreak() const; - char** get_breaktable() const; - char* get_encoding(); + const std::vector<mapentry>& get_maptable() const; + const std::vector<std::string>& get_breaktable() const; + const std::string& get_encoding(); int get_langnum() const; char* get_key_string(); char* get_try_string() const; - const char* get_wordchars() const; + const std::string& get_wordchars() const; const std::vector<w_char>& get_wordchars_utf16() const; - char* get_ignore() const; + const char* get_ignore() const; const std::vector<w_char>& get_ignore_utf16() const; int get_compound() const; FLAG get_compoundflag() const; - FLAG get_compoundbegin() const; FLAG get_forbiddenword() const; FLAG get_nosuggest() const; FLAG get_nongramsuggest() const; FLAG get_needaffix() const; FLAG get_onlyincompound() const; - FLAG get_compoundroot() const; - FLAG get_lemma_present() const; - int get_checknum() const; - const char* get_prefix() const; - const char* get_suffix() const; const char* get_derived() const; - const char* get_version() const; + const std::string& get_version() const; int have_contclass() const; int get_utf8() const; int get_complexprefixes() const; @@ -355,26 +335,25 @@ class LIBHUNSPELL_DLL_EXPORTED AffixMgr { private: int parse_file(const char* affpath, const char* key); - int parse_flag(char* line, unsigned short* out, FileMgr* af); - int parse_num(char* line, int* out, FileMgr* af); - int parse_cpdsyllable(char* line, FileMgr* af); - int parse_reptable(char* line, FileMgr* af); - int parse_convtable(char* line, + bool parse_flag(const std::string& line, unsigned short* out, FileMgr* af); + bool parse_num(const std::string& line, int* out, FileMgr* af); + bool parse_cpdsyllable(const std::string& line, FileMgr* af); + bool parse_reptable(const std::string& line, FileMgr* af); + bool parse_convtable(const std::string& 
line, FileMgr* af, RepList** rl, - const char* keyword); - int parse_phonetable(char* line, FileMgr* af); - int parse_maptable(char* line, FileMgr* af); - int parse_breaktable(char* line, FileMgr* af); - int parse_checkcpdtable(char* line, FileMgr* af); - int parse_defcpdtable(char* line, FileMgr* af); - int parse_affix(char* line, const char at, FileMgr* af, char* dupflags); + const std::string& keyword); + bool parse_phonetable(const std::string& line, FileMgr* af); + bool parse_maptable(const std::string& line, FileMgr* af); + bool parse_breaktable(const std::string& line, FileMgr* af); + bool parse_checkcpdtable(const std::string& line, FileMgr* af); + bool parse_defcpdtable(const std::string& line, FileMgr* af); + bool parse_affix(const std::string& line, const char at, FileMgr* af, char* dupflags); void reverse_condition(std::string&); - void debugflag(char* result, unsigned short flag); std::string& debugflag(std::string& result, unsigned short flag); int condlen(const char*); - int encodeit(affentry& entry, const char* cs); + int encodeit(AffEntry& entry, const char* cs); int build_pfxtree(PfxEntry* pfxptr); int build_sfxtree(SfxEntry* sfxptr); int process_pfx_order(); diff --git a/libs/hunspell/src/atypes.hxx b/libs/hunspell/src/atypes.hxx index 60826af20e..f841523189 100644 --- a/libs/hunspell/src/atypes.hxx +++ b/libs/hunspell/src/atypes.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). 
- * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -38,8 +35,8 @@ * * ***** END LICENSE BLOCK ***** */ -#ifndef _ATYPES_HXX_ -#define _ATYPES_HXX_ +#ifndef ATYPES_HXX_ +#define ATYPES_HXX_ #ifndef HUNSPELL_WARNING #include <stdio.h> @@ -55,15 +52,15 @@ static inline void HUNSPELL_WARNING(FILE*, const char*, ...) {} // HUNSTEM def. #define HUNSTEM -#include "hashmgr.hxx" #include "w_char.hxx" #include <algorithm> #include <string> +#include <vector> #define SETSIZE 256 #define CONTSIZE 65536 -// affentry options +// AffEntry options #define aeXPRODUCT (1 << 0) #define aeUTF8 (1 << 1) #define aeALIASF (1 << 2) @@ -85,8 +82,6 @@ static inline void HUNSPELL_WARNING(FILE*, const char*, ...) {} #define SPELL_ORIGCAP (1 << 5) #define SPELL_WARN (1 << 6) -#define MAXLNLEN 8192 - #define MINCPDLEN 3 #define MAXCOMPOUND 10 #define MAXCONDLEN 20 @@ -100,46 +95,25 @@ static inline void HUNSPELL_WARNING(FILE*, const char*, ...) 
{} #define TESTAFF(a, b, c) (std::binary_search(a, a + c, b)) -struct affentry { - std::string strip; - std::string appnd; - char numconds; - char opts; - unsigned short aflag; - unsigned short* contclass; - short contclasslen; - union { - char conds[MAXCONDLEN]; - struct { - char conds1[MAXCONDLEN_1]; - char* conds2; - } l; - } c; - char* morphcode; -}; - struct guessword { char* word; bool allow; char* orig; }; -struct mapentry { - char** set; - int len; -}; - -struct flagentry { - FLAG* def; - int len; -}; +typedef std::vector<std::string> mapentry; +typedef std::vector<FLAG> flagentry; struct patentry { - char* pattern; - char* pattern2; - char* pattern3; + std::string pattern; + std::string pattern2; + std::string pattern3; FLAG cond; FLAG cond2; + patentry() + : cond(FLAG_NULL) + , cond2(FLAG_NULL) { + } }; #endif diff --git a/libs/hunspell/src/baseaffix.hxx b/libs/hunspell/src/baseaffix.hxx index 59256e92f3..9191dba475 100644 --- a/libs/hunspell/src/baseaffix.hxx +++ b/libs/hunspell/src/baseaffix.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -38,18 +35,17 @@ * * ***** END LICENSE BLOCK ***** */ -#ifndef _BASEAFF_HXX_ -#define _BASEAFF_HXX_ +#ifndef BASEAFF_HXX_ +#define BASEAFF_HXX_ -#include "hunvisapi.h" #include <string> -class LIBHUNSPELL_DLL_EXPORTED AffEntry { +class AffEntry { private: AffEntry(const AffEntry&); AffEntry& operator=(const AffEntry&); - protected: + public: AffEntry() : numconds(0), opts(0), @@ -57,6 +53,7 @@ class LIBHUNSPELL_DLL_EXPORTED AffEntry { morphcode(0), contclass(NULL), contclasslen(0) {} + virtual ~AffEntry(); std::string appnd; std::string strip; unsigned char numconds; diff --git a/libs/hunspell/src/config.h b/libs/hunspell/src/config.h index 1230ed0be7..f3b64fb819 100644 --- a/libs/hunspell/src/config.h +++ b/libs/hunspell/src/config.h @@ -201,5 +201,5 @@ #define PACKAGE_TARNAME /* Define to the version of this package. */ -#define PACKAGE_VERSION "1.4.0" -#define VERSION "1.4.0" +#define PACKAGE_VERSION "1.6.2" +#define VERSION "1.6.2" diff --git a/libs/hunspell/src/csutil.c++ b/libs/hunspell/src/csutil.cxx index 1948e4a3b3..be43a5b597 100644 --- a/libs/hunspell/src/csutil.c++ +++ b/libs/hunspell/src/csutil.cxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. 
+ * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -76,6 +73,7 @@ #include <string.h> #include <stdio.h> #include <ctype.h> +#include <sstream> #include "csutil.hxx" #include "atypes.hxx" @@ -97,7 +95,7 @@ struct unicode_info { #include <unicode/uchar.h> #else #ifndef MOZILLA_CLIENT -#include "utf_info.cxx" +#include "utf_info.c++" #define UTF_LST_LEN (sizeof(utf_lst) / (sizeof(unicode_info))) #endif #endif @@ -122,26 +120,24 @@ static struct unicode_info2* utf_tbl = NULL; static int utf_tbl_count = 0; // utf_tbl can be used by multiple Hunspell instances -FILE* myfopen(const char* path, const char* mode) { -#ifdef _WIN32 +void myopen(std::ifstream& stream, const char* path, std::ios_base::openmode mode) +{ +#if defined(_WIN32) && defined(_MSC_VER) #define WIN32_LONG_PATH_PREFIX "\\\\?\\" if (strncmp(path, WIN32_LONG_PATH_PREFIX, 4) == 0) { int len = MultiByteToWideChar(CP_UTF8, 0, path, -1, NULL, 0); - wchar_t* buff = (wchar_t*)malloc(len * sizeof(wchar_t)); - wchar_t* buff2 = (wchar_t*)malloc(len * sizeof(wchar_t)); - FILE* f = NULL; - if (buff && buff2) { - MultiByteToWideChar(CP_UTF8, 0, path, -1, buff, len); - if (_wfullpath(buff2, buff, len) != NULL) { - f = _wfopen(buff2, (strcmp(mode, "r") == 0) ? 
L"r" : L"rb"); - } - free(buff); - free(buff2); + wchar_t* buff = new wchar_t[len]; + wchar_t* buff2 = new wchar_t[len]; + MultiByteToWideChar(CP_UTF8, 0, path, -1, buff, len); + if (_wfullpath(buff2, buff, len) != NULL) { + stream.open(buff2, mode); } - return f; + delete [] buff; + delete [] buff2; } + else #endif - return fopen(path, mode); + stream.open(path, mode); } std::string& u16_u8(std::string& dest, const std::vector<w_char>& src) { @@ -218,7 +214,7 @@ int u8_u16(std::vector<w_char>& dest, const std::string& src) { case 0xd0: { // 2-byte UTF-8 codes if ((*(u8 + 1) & 0xc0) == 0x80) { u2.h = (*u8 & 0x1f) >> 2; - u2.l = (*u8 << 6) + (*(u8 + 1) & 0x3f); + u2.l = (static_cast<unsigned char>(*u8) << 6) + (*(u8 + 1) & 0x3f); ++u8; } else { HUNSPELL_WARNING(stderr, @@ -275,34 +271,35 @@ int u8_u16(std::vector<w_char>& dest, const std::string& src) { return dest.size(); } -// strip strings into token based on single char delimiter -// acts like strsep() but only uses a delim char and not -// a delim string -// default delimiter: white space characters - -char* mystrsep(char** stringp, const char delim) { - char* mp = *stringp; - if (*mp != '\0') { - char* dp; - if (delim) { - dp = strchr(mp, delim); - } else { - // don't use isspace() here, the string can be in some random charset - // that's way different than the locale's - for (dp = mp; (*dp && *dp != ' ' && *dp != '\t'); dp++) - ; - if (!*dp) - dp = NULL; - } - if (dp) { - *stringp = dp + 1; - *dp = '\0'; - } else { - *stringp = mp + strlen(mp); - } - return mp; - } - return NULL; +namespace { +class is_any_of { + public: + explicit is_any_of(const std::string& in) : chars(in) {} + + bool operator()(char c) { return chars.find(c) != std::string::npos; } + + private: + std::string chars; +}; +} + +std::string::const_iterator mystrsep(const std::string &str, + std::string::const_iterator& start) { + std::string::const_iterator end = str.end(); + + is_any_of op(" \t"); + // don't use isspace() here, the string 
can be in some random charset + // that's way different than the locale's + std::string::const_iterator sp = start; + while (sp != end && op(*sp)) + ++sp; + + std::string::const_iterator dp = sp; + while (dp != end && !op(*dp)) + ++dp; + + start = dp; + return sp; } // replaces strdup with ansi version @@ -320,142 +317,98 @@ char* mystrdup(const char* s) { return d; } -// strcat for limited length destination string -char* mystrcat(char* dest, const char* st, int max) { - int len; - int len2; - if (dest == NULL || st == NULL) - return dest; - len = strlen(dest); - len2 = strlen(st); - if (len + len2 + 1 > max) - return dest; - strcpy(dest + len, st); - return dest; -} - // remove cross-platform text line end characters -void mychomp(char* s) { - size_t k = strlen(s); - if ((k > 0) && ((*(s + k - 1) == '\r') || (*(s + k - 1) == '\n'))) - *(s + k - 1) = '\0'; - if ((k > 1) && (*(s + k - 2) == '\r')) - *(s + k - 2) = '\0'; +void mychomp(std::string& s) { + size_t k = s.size(); + size_t newsize = k; + if ((k > 0) && ((s[k - 1] == '\r') || (s[k - 1] == '\n'))) + --newsize; + if ((k > 1) && (s[k - 2] == '\r')) + --newsize; + s.resize(newsize); } // break text to lines -// return number of lines -int line_tok(const char* text, char*** lines, char breakchar) { - int linenum = 0; - if (!text) { - return linenum; - } - char* dup = mystrdup(text); - char* p = strchr(dup, breakchar); - while (p) { - linenum++; - *p = '\0'; - p++; - p = strchr(p, breakchar); - } - linenum++; - *lines = (char**)malloc(linenum * sizeof(char*)); - if (!(*lines)) { - free(dup); - return 0; +std::vector<std::string> line_tok(const std::string& text, char breakchar) { + std::vector<std::string> ret; + if (text.empty()) { + return ret; } - p = dup; - int l = 0; - for (int i = 0; i < linenum; i++) { - if (*p != '\0') { - (*lines)[l] = mystrdup(p); - if (!(*lines)[l]) { - for (i = 0; i < l; i++) - free((*lines)[i]); - free(dup); - return 0; - } - l++; + std::stringstream ss(text); + std::string tok; + 
while(std::getline(ss, tok, breakchar)) { + if (!tok.empty()) { + ret.push_back(tok); } - p += strlen(p) + 1; } - free(dup); - if (!l) { - free(*lines); - *lines = NULL; - } - return l; + + return ret; } // uniq line in place -char* line_uniq(char* text, char breakchar) { - char** lines; - int linenum = line_tok(text, &lines, breakchar); - int i; - strcpy(text, lines[0]); - for (i = 1; i < linenum; i++) { - int dup = 0; - for (int j = 0; j < i; j++) { - if (strcmp(lines[i], lines[j]) == 0) { - dup = 1; +void line_uniq(std::string& text, char breakchar) +{ + std::vector<std::string> lines = line_tok(text, breakchar); + text.clear(); + if (lines.empty()) { + return; + } + text = lines[0]; + for (size_t i = 1; i < lines.size(); ++i) { + bool dup = false; + for (size_t j = 0; j < i; ++j) { + if (lines[i] == lines[j]) { + dup = true; break; } } if (!dup) { - if ((i > 1) || (*(lines[0]) != '\0')) { - sprintf(text + strlen(text), "%c", breakchar); - } - strcat(text, lines[i]); + if (!text.empty()) + text.push_back(breakchar); + text.append(lines[i]); } } - for (i = 0; i < linenum; i++) { - free(lines[i]); - } - free(lines); - return text; } // uniq and boundary for compound analysis: "1\n\2\n\1" -> " ( \1 | \2 ) " -char* line_uniq_app(char** text, char breakchar) { - if (!strchr(*text, breakchar)) { - return *text; +void line_uniq_app(std::string& text, char breakchar) { + if (text.find(breakchar) == std::string::npos) { + return; } - char** lines; - int i; - int linenum = line_tok(*text, &lines, breakchar); - int dup = 0; - for (i = 0; i < linenum; i++) { - for (int j = 0; j < (i - 1); j++) { - if (strcmp(lines[i], lines[j]) == 0) { - *(lines[i]) = '\0'; - dup++; + std::vector<std::string> lines = line_tok(text, breakchar); + text.clear(); + if (lines.empty()) { + return; + } + text = lines[0]; + for (size_t i = 1; i < lines.size(); ++i) { + bool dup = false; + for (size_t j = 0; j < i; ++j) { + if (lines[i] == lines[j]) { + dup = true; break; } } + if (!dup) { + if 
(!text.empty()) + text.push_back(breakchar); + text.append(lines[i]); + } } - if ((linenum - dup) == 1) { - strcpy(*text, lines[0]); - freelist(&lines, linenum); - return *text; + + if (lines.size() == 1) { + text = lines[0]; + return; } - char* newtext = (char*)malloc(strlen(*text) + 2 * linenum + 3 + 1); - if (newtext) { - free(*text); - *text = newtext; - } else { - freelist(&lines, linenum); - return *text; + + text.assign(" ( "); + for (size_t i = 0; i < lines.size(); ++i) { + text.append(lines[i]); + text.append(" | "); } - strcpy(*text, " ( "); - for (i = 0; i < linenum; i++) - if (*(lines[i])) { - sprintf(*text + strlen(*text), "%s%s", lines[i], " | "); - } - (*text)[strlen(*text) - 2] = ')'; // " ) " - freelist(&lines, linenum); - return *text; + text[text.size() - 2] = ')'; // " ) " } // append s to ends of every lines in text @@ -469,111 +422,6 @@ std::string& strlinecat(std::string& str, const std::string& apd) { return str; } -// morphcmp(): compare MORPH_DERI_SFX, MORPH_INFL_SFX and MORPH_TERM_SFX fields -// in the first line of the inputs -// return 0, if inputs equal -// return 1, if inputs may equal with a secondary suffix -// otherwise return -1 -int morphcmp(const char* s, const char* t) { - int se = 0; - int te = 0; - const char* sl; - const char* tl; - const char* olds; - const char* oldt; - if (!s || !t) - return 1; - olds = s; - sl = strchr(s, '\n'); - s = strstr(s, MORPH_DERI_SFX); - if (!s || (sl && sl < s)) - s = strstr(olds, MORPH_INFL_SFX); - if (!s || (sl && sl < s)) { - s = strstr(olds, MORPH_TERM_SFX); - olds = NULL; - } - oldt = t; - tl = strchr(t, '\n'); - t = strstr(t, MORPH_DERI_SFX); - if (!t || (tl && tl < t)) - t = strstr(oldt, MORPH_INFL_SFX); - if (!t || (tl && tl < t)) { - t = strstr(oldt, MORPH_TERM_SFX); - oldt = NULL; - } - while (s && t && (!sl || sl > s) && (!tl || tl > t)) { - s += MORPH_TAG_LEN; - t += MORPH_TAG_LEN; - se = 0; - te = 0; - while ((*s == *t) && !se && !te) { - s++; - t++; - switch (*s) { - case ' ': - 
case '\n': - case '\t': - case '\0': - se = 1; - } - switch (*t) { - case ' ': - case '\n': - case '\t': - case '\0': - te = 1; - } - } - if (!se || !te) { - // not terminal suffix difference - if (olds) - return -1; - return 1; - } - olds = s; - s = strstr(s, MORPH_DERI_SFX); - if (!s || (sl && sl < s)) - s = strstr(olds, MORPH_INFL_SFX); - if (!s || (sl && sl < s)) { - s = strstr(olds, MORPH_TERM_SFX); - olds = NULL; - } - oldt = t; - t = strstr(t, MORPH_DERI_SFX); - if (!t || (tl && tl < t)) - t = strstr(oldt, MORPH_INFL_SFX); - if (!t || (tl && tl < t)) { - t = strstr(oldt, MORPH_TERM_SFX); - oldt = NULL; - } - } - if (!s && !t && se && te) - return 0; - return 1; -} - -int get_sfxcount(const char* morph) { - if (!morph || !*morph) - return 0; - int n = 0; - const char* old = morph; - morph = strstr(morph, MORPH_DERI_SFX); - if (!morph) - morph = strstr(old, MORPH_INFL_SFX); - if (!morph) - morph = strstr(old, MORPH_TERM_SFX); - while (morph) { - n++; - old = morph; - morph = strstr(morph + 1, MORPH_DERI_SFX); - if (!morph) - morph = strstr(old + 1, MORPH_INFL_SFX); - if (!morph) - morph = strstr(old + 1, MORPH_TERM_SFX); - } - return n; -} - int fieldlen(const char* r) { int n = 0; while (r && *r != ' ' && *r != '\t' && *r != '\0' && *r != '\n') { @@ -615,33 +463,6 @@ std::string& mystrrep(std::string& str, return str; } -char* mystrrep(char* word, const char* pat, const char* rep) { - char* pos = strstr(word, pat); - if (pos) { - int replen = strlen(rep); - int patlen = strlen(pat); - while (pos) { - if (replen < patlen) { - char* end = word + strlen(word); - char* next = pos + replen; - char* prev = pos + strlen(pat); - for (; prev < end;* next = *prev, prev++, next++) - ; - *next = '\0'; - } else if (replen > patlen) { - char* end = pos + patlen; - char* next = word + strlen(word) + replen - patlen; - char* prev = next - replen + patlen; - for (; prev >= end;* next = *prev, prev--, next--) - ; - } - strncpy(pos, rep, replen); - pos = strstr(word, pat); - } 
- } - return word; -} - // reverse word size_t reverseword(std::string& word) { std::reverse(word.begin(), word.end()); @@ -657,35 +478,19 @@ size_t reverseword_utf(std::string& word) { return w.size(); } -int uniqlist(char** list, int n) { - int i; - if (n < 2) - return n; - for (i = 0; i < n; i++) { - for (int j = 0; j < i; j++) { - if (list[j] && list[i] && (strcmp(list[j], list[i]) == 0)) { - free(list[i]); - list[i] = NULL; - break; - } - } - } - int m = 1; - for (i = 1; i < n; i++) - if (list[i]) { - list[m] = list[i]; - m++; - } - return m; -} +void uniqlist(std::vector<std::string>& list) { + if (list.size() < 2) + return; -void freelist(char*** list, int n) { - if (list && *list) { - for (int i = 0; i < n; i++) - free((*list)[i]); - free(*list); - *list = NULL; + std::vector<std::string> ret; + ret.push_back(list[0]); + + for (size_t i = 1; i < list.size(); ++i) { + if (std::find(ret.begin(), ret.end(), list[i]) == ret.end()) + ret.push_back(list[i]); } + + list.swap(ret); } namespace { @@ -710,18 +515,20 @@ unsigned char ccase(const struct cs_info* csconv, int nIndex) { w_char upper_utf(w_char u, int langnum) { unsigned short idx = (u.h << 8) + u.l; - if (idx != unicodetoupper(idx, langnum)) { - u.h = (unsigned char)(unicodetoupper(idx, langnum) >> 8); - u.l = (unsigned char)(unicodetoupper(idx, langnum) & 0x00FF); + unsigned short upridx = unicodetoupper(idx, langnum); + if (idx != upridx) { + u.h = (unsigned char)(upridx >> 8); + u.l = (unsigned char)(upridx & 0x00FF); } return u; } w_char lower_utf(w_char u, int langnum) { unsigned short idx = (u.h << 8) + u.l; - if (idx != unicodetolower(idx, langnum)) { - u.h = (unsigned char)(unicodetolower(idx, langnum) >> 8); - u.l = (unsigned char)(unicodetolower(idx, langnum) & 0x00FF); + unsigned short lwridx = unicodetolower(idx, langnum); + if (idx != lwridx) { + u.h = (unsigned char)(lwridx >> 8); + u.l = (unsigned char)(lwridx & 0x00FF); } return u; } @@ -743,12 +550,13 @@ std::string& 
mkallsmall(std::string& s, const struct cs_info* csconv) { } std::vector<w_char>& mkallsmall_utf(std::vector<w_char>& u, - int langnum) { + int langnum) { for (size_t i = 0; i < u.size(); ++i) { unsigned short idx = (u[i].h << 8) + u[i].l; - if (idx != unicodetolower(idx, langnum)) { - u[i].h = (unsigned char)(unicodetolower(idx, langnum) >> 8); - u[i].l = (unsigned char)(unicodetolower(idx, langnum) & 0x00FF); + unsigned short lwridx = unicodetolower(idx, langnum); + if (idx != lwridx) { + u[i].h = (unsigned char)(lwridx >> 8); + u[i].l = (unsigned char)(lwridx & 0x00FF); } } return u; @@ -757,9 +565,10 @@ std::vector<w_char>& mkallsmall_utf(std::vector<w_char>& u, std::vector<w_char>& mkallcap_utf(std::vector<w_char>& u, int langnum) { for (size_t i = 0; i < u.size(); i++) { unsigned short idx = (u[i].h << 8) + u[i].l; - if (idx != unicodetoupper(idx, langnum)) { - u[i].h = (unsigned char)(unicodetoupper(idx, langnum) >> 8); - u[i].l = (unsigned char)(unicodetoupper(idx, langnum) & 0x00FF); + unsigned short upridx = unicodetoupper(idx, langnum); + if (idx != upridx) { + u[i].h = (unsigned char)(upridx >> 8); + u[i].l = (unsigned char)(upridx & 0x00FF); } } return u; @@ -775,9 +584,10 @@ std::string& mkinitcap(std::string& s, const struct cs_info* csconv) { std::vector<w_char>& mkinitcap_utf(std::vector<w_char>& u, int langnum) { if (!u.empty()) { unsigned short idx = (u[0].h << 8) + u[0].l; - if (idx != unicodetoupper(idx, langnum)) { - u[0].h = (unsigned char)(unicodetoupper(idx, langnum) >> 8); - u[0].l = (unsigned char)(unicodetoupper(idx, langnum) & 0x00FF); + unsigned short upridx = unicodetoupper(idx, langnum); + if (idx != upridx) { + u[0].h = (unsigned char)(upridx >> 8); + u[0].l = (unsigned char)(upridx & 0x00FF); } } return u; @@ -793,9 +603,10 @@ std::string& mkinitsmall(std::string& s, const struct cs_info* csconv) { std::vector<w_char>& mkinitsmall_utf(std::vector<w_char>& u, int langnum) { if (!u.empty()) { unsigned short idx = (u[0].h << 8) + 
u[0].l; - if (idx != unicodetolower(idx, langnum)) { - u[0].h = (unsigned char)(unicodetolower(idx, langnum) >> 8); - u[0].l = (unsigned char)(unicodetolower(idx, langnum) & 0x00FF); + unsigned short lwridx = unicodetolower(idx, langnum); + if (idx != lwridx) { + u[0].h = (unsigned char)(lwridx >> 8); + u[0].l = (unsigned char)(lwridx & 0x00FF); } } return u; @@ -2457,9 +2268,9 @@ static void toAsciiLowerAndRemoveNonAlphanumeric(const char* pName, *pBuf = '\0'; } -struct cs_info* get_current_cs(const char* es) { - char* normalized_encoding = new char[strlen(es) + 1]; - toAsciiLowerAndRemoveNonAlphanumeric(es, normalized_encoding); +struct cs_info* get_current_cs(const std::string& es) { + char* normalized_encoding = new char[es.size() + 1]; + toAsciiLowerAndRemoveNonAlphanumeric(es.c_str(), normalized_encoding); struct cs_info* ccs = NULL; int n = sizeof(encds) / sizeof(encds[0]); @@ -2474,7 +2285,7 @@ struct cs_info* get_current_cs(const char* es) { if (!ccs) { HUNSPELL_WARNING(stderr, - "error: unknown encoding %s: using %s as fallback\n", es, + "error: unknown encoding %s: using %s as fallback\n", es.c_str(), encds[0].enc_name); ccs = encds[0].cs_table; } @@ -2485,7 +2296,7 @@ struct cs_info* get_current_cs(const char* es) { // XXX This function was rewritten for mozilla. Instead of storing the // conversion tables static in this file, create them when needed // with help the mozilla backend. -struct cs_info* get_current_cs(const char* es) { +struct cs_info* get_current_cs(const std::string& es) { struct cs_info* ccs = new cs_info[256]; // Initialze the array with dummy data so that we wouldn't need // to return null in case of failures. 
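Reviewer note: the case-conversion hunks above (`upper_utf`, `lower_utf`, `mkallsmall_utf`, `mkallcap_utf`, `mkinitcap_utf`, `mkinitsmall_utf`) all make the same change: the result of `unicodetoupper()` or `unicodetolower()` is stored in a local and reused, instead of calling the function up to three times per code unit. Below is a minimal sketch of that pattern, not the library code. The `w_char` struct is a simplified stand-in, and `to_lower_stub` and `lower_all` are hypothetical names introduced here for illustration; the stub covers ASCII only and ignores `langnum`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for Hunspell's w_char: one UTF-16 code unit
// stored as two separate bytes.
struct w_char {
  unsigned char l;  // low byte
  unsigned char h;  // high byte
};

// Hypothetical substitute for unicodetolower(): handles ASCII A-Z only
// and ignores langnum. The real function consults a Unicode table.
unsigned short to_lower_stub(unsigned short c, int /*langnum*/) {
  if (c >= 'A' && c <= 'Z')
    return static_cast<unsigned short>(c + ('a' - 'A'));
  return c;
}

// Same shape as the patched mkallsmall_utf(): look each unit up once,
// keep the result in a local, and reuse it for the comparison and for
// both output bytes.
std::vector<w_char>& lower_all(std::vector<w_char>& u, int langnum) {
  for (std::size_t i = 0; i < u.size(); ++i) {
    unsigned short idx = static_cast<unsigned short>((u[i].h << 8) + u[i].l);
    unsigned short lwridx = to_lower_stub(idx, langnum);
    if (idx != lwridx) {
      u[i].h = static_cast<unsigned char>(lwridx >> 8);
      u[i].l = static_cast<unsigned char>(lwridx & 0x00FF);
    }
  }
  return u;
}
```

The behavior is unchanged by this refactor as long as the lookup is a pure function of its arguments; the gain is fewer table lookups per character and a single place where the converted value is named.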
@@ -2500,7 +2311,7 @@ struct cs_info* get_current_cs(const char* es) { nsresult rv; - nsAutoCString label(es); + nsAutoCString label(es.c_str()); nsAutoCString encoding; if (!EncodingUtils::FindEncodingForLabelNoReplacement(label, encoding)) { return ccs; @@ -2565,21 +2376,18 @@ struct cs_info* get_current_cs(const char* es) { #endif // primitive isalpha() replacement for tokenization -char* get_casechars(const char* enc) { +std::string get_casechars(const char* enc) { struct cs_info* csconv = get_current_cs(enc); - char expw[MAXLNLEN]; - char* p = expw; - for (int i = 0; i <= 255; i++) { + std::string expw; + for (int i = 0; i <= 255; ++i) { if (cupper(csconv, i) != clower(csconv, i)) { - *p = static_cast<char>(i); - p++; + expw.push_back(static_cast<char>(i)); } } - *p = '\0'; #ifdef MOZILLA_CLIENT delete[] csconv; #endif - return mystrdup(expw); + return expw; } // language to encoding default map @@ -2606,10 +2414,10 @@ static struct lang_map lang2enc[] = {"tr_TR", LANG_tr}, // for back-compatibility {"ru", LANG_ru}, {"uk", LANG_uk}}; -int get_lang_num(const char* lang) { +int get_lang_num(const std::string& lang) { int n = sizeof(lang2enc) / sizeof(lang2enc[0]); for (int i = 0; i < n; i++) { - if (strcmp(lang, lang2enc[i].lang) == 0) { + if (strcmp(lang.c_str(), lang2enc[i].lang) == 0) { return lang2enc[i].num; } } @@ -2618,26 +2426,21 @@ int get_lang_num(const char* lang) { #ifndef OPENOFFICEORG #ifndef MOZILLA_CLIENT -int initialize_utf_tbl() { +void initialize_utf_tbl() { utf_tbl_count++; if (utf_tbl) - return 0; - utf_tbl = (unicode_info2*)malloc(CONTSIZE * sizeof(unicode_info2)); - if (utf_tbl) { - size_t j; - for (j = 0; j < CONTSIZE; j++) { - utf_tbl[j].cletter = 0; - utf_tbl[j].clower = (unsigned short)j; - utf_tbl[j].cupper = (unsigned short)j; - } - for (j = 0; j < UTF_LST_LEN; j++) { - utf_tbl[utf_lst[j].c].cletter = 1; - utf_tbl[utf_lst[j].c].clower = utf_lst[j].clower; - utf_tbl[utf_lst[j].c].cupper = utf_lst[j].cupper; - } - } else - return 1; - 
return 0; + return; + utf_tbl = new unicode_info2[CONTSIZE]; + for (size_t j = 0; j < CONTSIZE; ++j) { + utf_tbl[j].cletter = 0; + utf_tbl[j].clower = (unsigned short)j; + utf_tbl[j].cupper = (unsigned short)j; + } + for (size_t j = 0; j < UTF_LST_LEN; ++j) { + utf_tbl[utf_lst[j].c].cletter = 1; + utf_tbl[utf_lst[j].c].clower = utf_lst[j].clower; + utf_tbl[utf_lst[j].c].cupper = utf_lst[j].cupper; + } } #endif #endif @@ -2646,7 +2449,7 @@ void free_utf_tbl() { if (utf_tbl_count > 0) utf_tbl_count--; if (utf_tbl && (utf_tbl_count == 0)) { - free(utf_tbl); + delete[] utf_tbl; utf_tbl = NULL; } } @@ -2731,12 +2534,17 @@ int get_captype_utf8(const std::vector<w_char>& word, int langnum) { size_t ncap = 0; size_t nneutral = 0; size_t firstcap = 0; - for (size_t i = 0; i < word.size(); ++i) { - unsigned short idx = (word[i].h << 8) + word[i].l; - if (idx != unicodetolower(idx, langnum)) + + std::vector<w_char>::const_iterator it = word.begin(); + std::vector<w_char>::const_iterator it_end = word.end(); + while (it != it_end) { + unsigned short idx = (it->h << 8) + it->l; + unsigned short lwridx = unicodetolower(idx, langnum); + if (idx != lwridx) ncap++; - if (unicodetoupper(idx, langnum) == unicodetolower(idx, langnum)) + if (unicodetoupper(idx, langnum) == lwridx) nneutral++; + ++it; } if (ncap) { unsigned short idx = (word[0].h << 8) + word[0].l; @@ -2775,18 +2583,6 @@ size_t remove_ignored_chars_utf(std::string& word, return w2.size(); } -namespace { -class is_any_of { - public: - is_any_of(const std::string& in) : chars(in) {} - - bool operator()(char c) { return chars.find(c) != std::string::npos; } - - private: - std::string chars; -}; -} - // strip all ignored characters in the string size_t remove_ignored_chars(std::string& word, const std::string& ignored_chars) { @@ -2796,54 +2592,48 @@ size_t remove_ignored_chars(std::string& word, return word.size(); } -int parse_string(char* line, char** out, int ln) { - char* tp = line; - char* piece; - int i = 0; - int np 
= 0; - if (*out) { +bool parse_string(const std::string& line, std::string& out, int ln) { + if (!out.empty()) { HUNSPELL_WARNING(stderr, "error: line %d: multiple definitions\n", ln); - return 1; + return false; } - piece = mystrsep(&tp, 0); - while (piece) { - if (*piece != '\0') { - switch (i) { - case 0: { - np++; - break; - } - case 1: { - *out = mystrdup(piece); - if (!*out) - return 1; - np++; - break; - } - default: - break; + int i = 0; + int np = 0; + std::string::const_iterator iter = line.begin(); + std::string::const_iterator start_piece = mystrsep(line, iter); + while (start_piece != line.end()) { + switch (i) { + case 0: { + np++; + break; + } + case 1: { + out.assign(start_piece, iter); + np++; + break; } - i++; + default: + break; } - // free(piece); - piece = mystrsep(&tp, 0); + ++i; + start_piece = mystrsep(line, iter); } if (np != 2) { HUNSPELL_WARNING(stderr, "error: line %d: missing data\n", ln); - return 1; + return false; } - return 0; + return true; } -bool parse_array(char* line, - char** out, +bool parse_array(const std::string& line, + std::string& out, std::vector<w_char>& out_utf16, int utf8, int ln) { - if (parse_string(line, out, ln)) + if (!parse_string(line, out, ln)) return false; if (utf8) { - u8_u16(out_utf16, *out); + u8_u16(out_utf16, out); std::sort(out_utf16.begin(), out_utf16.end()); } return true; diff --git a/libs/hunspell/src/csutil.hxx b/libs/hunspell/src/csutil.hxx index ce7091df55..5d83f80970 100644 --- a/libs/hunspell/src/csutil.hxx +++ b/libs/hunspell/src/csutil.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. 
* - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -71,13 +68,14 @@ * SUCH DAMAGE. */ -#ifndef __CSUTILHXX__ -#define __CSUTILHXX__ +#ifndef CSUTIL_HXX_ +#define CSUTIL_HXX_ #include "hunvisapi.h" // First some base level utility routines +#include <fstream> #include <string> #include <vector> #include <string.h> @@ -127,8 +125,9 @@ #define FORBIDDENWORD 65510 #define ONLYUPCASEFLAG 65511 -// fopen or optional _wfopen to fix long pathname problem of WIN32 -LIBHUNSPELL_DLL_EXPORTED FILE* myfopen(const char* path, const char* mode); +// fix long pathname problem of WIN32 by using w_char std::fstream::open override +LIBHUNSPELL_DLL_EXPORTED void myopen(std::ifstream& stream, const char* path, + std::ios_base::openmode mode); // convert UTF-16 characters to UTF-8 LIBHUNSPELL_DLL_EXPORTED std::string& u16_u8(std::string& dest, @@ -139,21 +138,16 @@ LIBHUNSPELL_DLL_EXPORTED int u8_u16(std::vector<w_char>& dest, const std::string& src); // remove end of line char(s) -LIBHUNSPELL_DLL_EXPORTED void mychomp(char* s); +LIBHUNSPELL_DLL_EXPORTED void mychomp(std::string& s); // duplicate string LIBHUNSPELL_DLL_EXPORTED char* mystrdup(const char* s); -// strcat for limited length destination string -LIBHUNSPELL_DLL_EXPORTED char* mystrcat(char* dest, const char* st, int max); - // parse into tokens with char delimiter -LIBHUNSPELL_DLL_EXPORTED char* mystrsep(char** sptr, const char delim); +LIBHUNSPELL_DLL_EXPORTED std::string::const_iterator mystrsep(const std::string &str, + std::string::const_iterator& start); // replace pat by rep 
in word and return word -LIBHUNSPELL_DLL_EXPORTED char* mystrrep(char* word, - const char* pat, - const char* rep); LIBHUNSPELL_DLL_EXPORTED std::string& mystrrep(std::string& str, const std::string& search, const std::string& replace); @@ -163,13 +157,13 @@ LIBHUNSPELL_DLL_EXPORTED std::string& strlinecat(std::string& str, const std::string& apd); // tokenize into lines with new line -LIBHUNSPELL_DLL_EXPORTED int line_tok(const char* text, - char*** lines, - char breakchar); +LIBHUNSPELL_DLL_EXPORTED std::vector<std::string> line_tok(const std::string& text, + char breakchar); // tokenize into lines with new line and uniq in place -LIBHUNSPELL_DLL_EXPORTED char* line_uniq(char* text, char breakchar); -LIBHUNSPELL_DLL_EXPORTED char* line_uniq_app(char** text, char breakchar); +LIBHUNSPELL_DLL_EXPORTED void line_uniq(std::string& text, char breakchar); + +LIBHUNSPELL_DLL_EXPORTED void line_uniq_app(std::string& text, char breakchar); // reverse word LIBHUNSPELL_DLL_EXPORTED size_t reverseword(std::string& word); @@ -178,10 +172,7 @@ LIBHUNSPELL_DLL_EXPORTED size_t reverseword(std::string& word); LIBHUNSPELL_DLL_EXPORTED size_t reverseword_utf(std::string&); // remove duplicates -LIBHUNSPELL_DLL_EXPORTED int uniqlist(char** list, int n); - -// free character array list -LIBHUNSPELL_DLL_EXPORTED void freelist(char*** list, int n); +LIBHUNSPELL_DLL_EXPORTED void uniqlist(std::vector<std::string>& list); // character encoding information struct cs_info { @@ -190,7 +181,7 @@ struct cs_info { unsigned char cupper; }; -LIBHUNSPELL_DLL_EXPORTED int initialize_utf_tbl(); +LIBHUNSPELL_DLL_EXPORTED void initialize_utf_tbl(); LIBHUNSPELL_DLL_EXPORTED void free_utf_tbl(); LIBHUNSPELL_DLL_EXPORTED unsigned short unicodetoupper(unsigned short c, int langnum); @@ -200,13 +191,13 @@ LIBHUNSPELL_DLL_EXPORTED unsigned short unicodetolower(unsigned short c, int langnum); LIBHUNSPELL_DLL_EXPORTED int unicodeisalpha(unsigned short c); -LIBHUNSPELL_DLL_EXPORTED struct cs_info* 
get_current_cs(const char* es); +LIBHUNSPELL_DLL_EXPORTED struct cs_info* get_current_cs(const std::string& es); // get language identifiers of language codes -LIBHUNSPELL_DLL_EXPORTED int get_lang_num(const char* lang); +LIBHUNSPELL_DLL_EXPORTED int get_lang_num(const std::string& lang); // get characters of the given 8bit encoding with lower- and uppercase forms -LIBHUNSPELL_DLL_EXPORTED char* get_casechars(const char* enc); +LIBHUNSPELL_DLL_EXPORTED std::string get_casechars(const char* enc); // convert std::string to all caps LIBHUNSPELL_DLL_EXPORTED std::string& mkallcap(std::string& s, @@ -256,10 +247,12 @@ LIBHUNSPELL_DLL_EXPORTED size_t remove_ignored_chars( std::string& word, const std::string& ignored_chars); -LIBHUNSPELL_DLL_EXPORTED int parse_string(char* line, char** out, int ln); +LIBHUNSPELL_DLL_EXPORTED bool parse_string(const std::string& line, + std::string& out, + int ln); -LIBHUNSPELL_DLL_EXPORTED bool parse_array(char* line, - char** out, +LIBHUNSPELL_DLL_EXPORTED bool parse_array(const std::string& line, + std::string& out, std::vector<w_char>& out_utf16, int utf8, int ln); @@ -270,10 +263,6 @@ LIBHUNSPELL_DLL_EXPORTED bool copy_field(std::string& dest, const std::string& morph, const std::string& var); -LIBHUNSPELL_DLL_EXPORTED int morphcmp(const char* s, const char* t); - -LIBHUNSPELL_DLL_EXPORTED int get_sfxcount(const char* morph); - // conversion function for protected memory LIBHUNSPELL_DLL_EXPORTED void store_pointer(char* dest, char* source); diff --git a/libs/hunspell/src/dictmgr.c++ b/libs/hunspell/src/dictmgr.c++ deleted file mode 100644 index 473c09acfe..0000000000 --- a/libs/hunspell/src/dictmgr.c++ +++ /dev/null @@ -1,216 +0,0 @@ -/* ***** BEGIN LICENSE BLOCK ***** - * Version: MPL 1.1/GPL 2.0/LGPL 2.1 - * - * The contents of this file are subject to the Mozilla Public License Version - * 1.1 (the "License"); you may not use this file except in compliance with - * the License. 
You may obtain a copy of the License at - * http://www.mozilla.org/MPL/ - * - * Software distributed under the License is distributed on an "AS IS" basis, - * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License - * for the specific language governing rights and limitations under the - * License. - * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. - * - * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, - * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, - * Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter, - * Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls, - * Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen - * - * Alternatively, the contents of this file may be used under the terms of - * either the GNU General Public License Version 2 or later (the "GPL"), or - * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"), - * in which case the provisions of the GPL or the LGPL are applicable instead - * of those above. If you wish to allow use of your version of this file only - * under the terms of either the GPL or the LGPL, and not to allow others to - * use your version of this file under the terms of the MPL, indicate your - * decision by deleting the provisions above and replace them with the notice - * and other provisions required by the GPL or the LGPL. If you do not delete - * the provisions above, a recipient may use your version of this file under - * the terms of any one of the MPL, the GPL or the LGPL. 
- * - * ***** END LICENSE BLOCK ***** */ - -#include <stdlib.h> -#include <string.h> -#include <ctype.h> -#include <stdio.h> - -#include "dictmgr.hxx" -#include "csutil.hxx" - -DictMgr::DictMgr(const char* dictpath, const char* etype) : numdict(0) { - // load list of etype entries - pdentry = (dictentry*)malloc(MAXDICTIONARIES * sizeof(struct dictentry)); - if (pdentry) { - if (parse_file(dictpath, etype)) { - numdict = 0; - // no dictionary.lst found is okay - } - } -} - -DictMgr::~DictMgr() { - dictentry* pdict = NULL; - if (pdentry) { - pdict = pdentry; - for (int i = 0; i < numdict; i++) { - if (pdict->lang) { - free(pdict->lang); - pdict->lang = NULL; - } - if (pdict->region) { - free(pdict->region); - pdict->region = NULL; - } - if (pdict->filename) { - free(pdict->filename); - pdict->filename = NULL; - } - pdict++; - } - free(pdentry); - pdentry = NULL; - pdict = NULL; - } - numdict = 0; -} - -// read in list of etype entries and build up structure to describe them -int DictMgr::parse_file(const char* dictpath, const char* etype) { - int i; - char line[MAXDICTENTRYLEN + 1]; - dictentry* pdict = pdentry; - - // open the dictionary list file - FILE* dictlst; - dictlst = myfopen(dictpath, "r"); - if (!dictlst) { - return 1; - } - - // step one is to parse the dictionary list building up the - // descriptive structures - - // read in each line ignoring any that dont start with etype - while (fgets(line, MAXDICTENTRYLEN, dictlst)) { - mychomp(line); - - /* parse in a dictionary entry */ - if (strncmp(line, etype, 4) == 0) { - if (numdict < MAXDICTIONARIES) { - char* tp = line; - char* piece; - i = 0; - while ((piece = mystrsep(&tp, ' '))) { - if (*piece != '\0') { - switch (i) { - case 0: - break; - case 1: - pdict->lang = mystrdup(piece); - break; - case 2: - if (strcmp(piece, "ANY") == 0) - pdict->region = mystrdup(""); - else - pdict->region = mystrdup(piece); - break; - case 3: - pdict->filename = mystrdup(piece); - break; - default: - break; - } - i++; - } - 
free(piece); - } - if (i == 4) { - numdict++; - pdict++; - } else { - switch (i) { - case 3: - free(pdict->region); - pdict->region = NULL; - /* FALLTHROUGH */ - case 2: - free(pdict->lang); - pdict->lang = NULL; - default: - break; - } - fprintf(stderr, "dictionary list corruption in line \"%s\"\n", line); - fflush(stderr); - } - } - } - } - fclose(dictlst); - return 0; -} - -// return text encoding of dictionary -int DictMgr::get_list(dictentry** ppentry) { - *ppentry = pdentry; - return numdict; -} - -// strip strings into token based on single char delimiter -// acts like strsep() but only uses a delim char and not -// a delim string - -char* DictMgr::mystrsep(char** stringp, const char delim) { - char* rv = NULL; - char* mp = *stringp; - size_t n = strlen(mp); - if (n > 0) { - char* dp = (char*)memchr(mp, (int)((unsigned char)delim), n); - if (dp) { - *stringp = dp + 1; - size_t nc = dp - mp; - rv = (char*)malloc(nc + 1); - if (rv) { - memcpy(rv, mp, nc); - *(rv + nc) = '\0'; - } - } else { - rv = (char*)malloc(n + 1); - if (rv) { - memcpy(rv, mp, n); - *(rv + n) = '\0'; - *stringp = mp + n; - } - } - } - return rv; -} - -// replaces strdup with ansi version -char* DictMgr::mystrdup(const char* s) { - char* d = NULL; - if (s) { - int sl = strlen(s) + 1; - d = (char*)malloc(sl); - if (d) - memcpy(d, s, sl); - } - return d; -} - -// remove cross-platform text line end characters -void DictMgr::mychomp(char* s) { - int k = strlen(s); - if ((k > 0) && ((*(s + k - 1) == '\r') || (*(s + k - 1) == '\n'))) - *(s + k - 1) = '\0'; - if ((k > 1) && (*(s + k - 2) == '\r')) - *(s + k - 2) = '\0'; -} diff --git a/libs/hunspell/src/dictmgr.hxx b/libs/hunspell/src/dictmgr.hxx deleted file mode 100644 index 98134c3b2f..0000000000 --- a/libs/hunspell/src/dictmgr.hxx +++ /dev/null @@ -1,76 +0,0 @@ -/* ***** BEGIN LICENSE BLOCK ***** - * Version: MPL 1.1/GPL 2.0/LGPL 2.1 - * - * The contents of this file are subject to the Mozilla Public License Version - * 1.1 (the "License"); 
you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * http://www.mozilla.org/MPL/ - * - * Software distributed under the License is distributed on an "AS IS" basis, - * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License - * for the specific language governing rights and limitations under the - * License. - * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. - * - * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, - * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, - * Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter, - * Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls, - * Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen - * - * Alternatively, the contents of this file may be used under the terms of - * either the GNU General Public License Version 2 or later (the "GPL"), or - * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"), - * in which case the provisions of the GPL or the LGPL are applicable instead - * of those above. If you wish to allow use of your version of this file only - * under the terms of either the GPL or the LGPL, and not to allow others to - * use your version of this file under the terms of the MPL, indicate your - * decision by deleting the provisions above and replace them with the notice - * and other provisions required by the GPL or the LGPL. If you do not delete - * the provisions above, a recipient may use your version of this file under - * the terms of any one of the MPL, the GPL or the LGPL. 
- * - * ***** END LICENSE BLOCK ***** */ - -#ifndef _DICTMGR_HXX_ -#define _DICTMGR_HXX_ - -#include "hunvisapi.h" - -#define MAXDICTIONARIES 100 -#define MAXDICTENTRYLEN 1024 - -struct dictentry { - char* filename; - char* lang; - char* region; -}; - -class LIBHUNSPELL_DLL_EXPORTED DictMgr { - private: - DictMgr(const DictMgr&); - DictMgr& operator=(const DictMgr&); - - private: - int numdict; - dictentry* pdentry; - - public: - DictMgr(const char* dictpath, const char* etype); - ~DictMgr(); - int get_list(dictentry** ppentry); - - private: - int parse_file(const char* dictpath, const char* etype); - char* mystrsep(char** stringp, const char delim); - char* mystrdup(const char* s); - void mychomp(char* s); -}; - -#endif diff --git a/libs/hunspell/src/filemgr.c++ b/libs/hunspell/src/filemgr.cxx index 2218bc79e1..4a14de8762 100644 --- a/libs/hunspell/src/filemgr.c++ +++ b/libs/hunspell/src/filemgr.cxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -86,33 +83,33 @@ int FileMgr::fail(const char* err, const char* par) { FileMgr::FileMgr(const char* file, const char* key) : hin(NULL), linenum(0) { in[0] = '\0'; - fin = myfopen(file, "r"); - if (!fin) { + myopen(fin, file, std::ios_base::in); + if (!fin.is_open()) { // check hzipped file std::string st(file); st.append(HZIP_EXTENSION); hin = new Hunzip(st.c_str(), key); } - if (!fin && !hin) + if (!fin.is_open() && !hin->is_open()) fail(MSG_OPEN, file); } FileMgr::~FileMgr() { - if (fin) - fclose(fin); - if (hin) - delete hin; + delete hin; } -char* FileMgr::getline() { - const char* l; - linenum++; - if (fin) - return fgets(in, BUFSIZE - 1, fin); - if (hin && ((l = hin->getline()) != NULL)) - return strcpy(in, l); - linenum--; - return NULL; +bool FileMgr::getline(std::string& dest) { + bool ret = false; + ++linenum; + if (fin.is_open()) { + ret = static_cast<bool>(std::getline(fin, dest)); + } else if (hin->is_open()) { + ret = hin->getline(dest); + } + if (!ret) { + --linenum; + } + return ret; } int FileMgr::getlinenum() { diff --git a/libs/hunspell/src/filemgr.hxx b/libs/hunspell/src/filemgr.hxx index 8b69931ddb..62433aeefe 100644 --- a/libs/hunspell/src/filemgr.hxx +++ b/libs/hunspell/src/filemgr.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). 
- * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -72,21 +69,21 @@ */ /* file manager class - read lines of files [filename] OR [filename.hz] */ -#ifndef _FILEMGR_HXX_ -#define _FILEMGR_HXX_ - -#include "hunvisapi.h" +#ifndef FILEMGR_HXX_ +#define FILEMGR_HXX_ #include "hunzip.hxx" #include <stdio.h> +#include <string> +#include <fstream> -class LIBHUNSPELL_DLL_EXPORTED FileMgr { +class FileMgr { private: FileMgr(const FileMgr&); FileMgr& operator=(const FileMgr&); protected: - FILE* fin; + std::ifstream fin; Hunzip* hin; char in[BUFSIZE + 50]; // input buffer int fail(const char* err, const char* par); @@ -95,7 +92,7 @@ class LIBHUNSPELL_DLL_EXPORTED FileMgr { public: FileMgr(const char* filename, const char* key = NULL); ~FileMgr(); - char* getline(); + bool getline(std::string&); int getlinenum(); }; #endif diff --git a/libs/hunspell/src/hashmgr.c++ b/libs/hunspell/src/hashmgr.cxx index c3cd95420f..23421b567a 100644 --- a/libs/hunspell/src/hashmgr.c++ +++ b/libs/hunspell/src/hashmgr.cxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). 
- * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -98,20 +95,19 @@ HashMgr::HashMgr(const char* tpath, const char* apath, const char* key) numaliasm(0), aliasm(NULL) { langnum = 0; - lang = NULL; - enc = NULL; csconv = 0; - ignorechars = NULL; load_config(apath, key); int ec = load_tables(tpath, key); if (ec) { /* error condition - what should we do here */ HUNSPELL_WARNING(stderr, "Hash Manager Error : %d\n", ec); - if (tableptr) { - free(tableptr); - tableptr = NULL; + free(tableptr); + //keep tablesize to 1 to fix possible division with zero + tablesize = 1; + tableptr = (struct hentry**)calloc(tablesize, sizeof(struct hentry*)); + if (!tableptr) { + tablesize = 0; } - tablesize = 0; } } @@ -159,14 +155,6 @@ HashMgr::~HashMgr() { #endif #endif - if (enc) - free(enc); - if (lang) - free(lang); - - if (ignorechars) - free(ignorechars); - #ifdef MOZILLA_CLIENT delete[] csconv; #endif @@ -189,20 +177,21 @@ struct hentry* HashMgr::lookup(const char* word) const { } // add a word to the hash table (private) -int HashMgr::add_word(const char* word, - int wbl, +int HashMgr::add_word(const std::string& in_word, int wcl, unsigned short* aff, int al, - const char* desc, + const std::string* in_desc, bool onlyupcase) { + const std::string* word = &in_word; + const std::string* desc = in_desc; std::string *word_copy = NULL; std::string *desc_copy = NULL; - if (ignorechars || complexprefixes) { - word_copy = new std::string(word, wbl); + if (!ignorechars.empty() || complexprefixes) { + word_copy = new std::string(in_word); - if (ignorechars != NULL) { + if (!ignorechars.empty()) { if (utf8) { wcl = remove_ignored_chars_utf(*word_copy, ignorechars_utf16); } else { @@ -216,8 +205,8 @@ int 
HashMgr::add_word(const char* word, else reverseword(*word_copy); - if (desc && !aliasm) { - desc_copy = new std::string(desc); + if (in_desc && !aliasm) { + desc_copy = new std::string(*in_desc); if (complexprefixes) { if (utf8) @@ -225,19 +214,18 @@ int HashMgr::add_word(const char* word, else reverseword(*desc_copy); } - desc = desc_copy->c_str(); + desc = desc_copy; } } - wbl = word_copy->size(); - word = word_copy->c_str(); + word = word_copy; } bool upcasehomonym = false; - int descl = desc ? (aliasm ? sizeof(char*) : strlen(desc) + 1) : 0; + int descl = desc ? (aliasm ? sizeof(char*) : desc->size() + 1) : 0; // variable-length hash record with word and optional fields struct hentry* hp = - (struct hentry*)malloc(sizeof(struct hentry) + wbl + descl); + (struct hentry*)malloc(sizeof(struct hentry) + word->size() + descl); if (!hp) { delete desc_copy; delete word_copy; @@ -245,11 +233,11 @@ int HashMgr::add_word(const char* word, } char* hpw = hp->word; - strcpy(hpw, word); + strcpy(hpw, word->c_str()); int i = hash(hpw); - hp->blen = (unsigned char)wbl; + hp->blen = (unsigned char)word->size(); hp->clen = (unsigned char)wcl; hp->alen = (short)al; hp->astr = aff; @@ -261,9 +249,9 @@ int HashMgr::add_word(const char* word, hp->var = H_OPT; if (aliasm) { hp->var += H_OPT_ALIASM; - store_pointer(hpw + wbl + 1, get_aliasm(atoi(desc))); + store_pointer(hpw + word->size() + 1, get_aliasm(atoi(desc->c_str()))); } else { - strcpy(hpw + wbl + 1, desc); + strcpy(hpw + word->size() + 1, desc->c_str()); } if (strstr(HENTRY_DATA(hp), MORPH_PHON)) hp->var += H_OPT_PHON; @@ -334,7 +322,7 @@ int HashMgr::add_hidden_capitalized_word(const std::string& word, int wcl, unsigned short* flags, int flagslen, - char* dp, + const std::string* dp, int captype) { if (flags == NULL) flagslen = 0; @@ -359,12 +347,12 @@ int HashMgr::add_hidden_capitalized_word(const std::string& word, mkallsmall_utf(w, langnum); mkinitcap_utf(w, langnum); u16_u8(st, w); - return add_word(st.c_str(), 
st.size(), wcl, flags2, flagslen + 1, dp, true); + return add_word(st, wcl, flags2, flagslen + 1, dp, true); } else { std::string new_word(word); mkallsmall(new_word, csconv); mkinitcap(new_word, csconv); - int ret = add_word(new_word.c_str(), new_word.size(), wcl, flags2, flagslen + 1, dp, true); + int ret = add_word(new_word, wcl, flags2, flagslen + 1, dp, true); return ret; } } @@ -372,12 +360,11 @@ int HashMgr::add_hidden_capitalized_word(const std::string& word, } // detect captype and modify word length for UTF-8 encoding -int HashMgr::get_clen_and_captype(const std::string& word, int* captype) { +int HashMgr::get_clen_and_captype(const std::string& word, int* captype, std::vector<w_char> &workbuf) { int len; if (utf8) { - std::vector<w_char> dest_utf; - len = u8_u16(dest_utf, word); - *captype = get_captype_utf8(dest_utf, langnum); + len = u8_u16(workbuf, word); + *captype = get_captype_utf8(workbuf, langnum); } else { len = word.size(); *captype = get_captype(word, csconv); @@ -385,9 +372,14 @@ int HashMgr::get_clen_and_captype(const std::string& word, int* captype) { return len; } +int HashMgr::get_clen_and_captype(const std::string& word, int* captype) { + std::vector<w_char> workbuf; + return get_clen_and_captype(word, captype, workbuf); +} + // remove word (personal dictionary function for standalone applications) -int HashMgr::remove(const char* word) { - struct hentry* dp = lookup(word); +int HashMgr::remove(const std::string& word) { + struct hentry* dp = lookup(word.c_str()); while (dp) { if (dp->alen == 0 || !TESTAFF(dp->astr, forbiddenword, dp->alen)) { unsigned short* flags = @@ -397,6 +389,7 @@ int HashMgr::remove(const char* word) { for (int i = 0; i < dp->alen; i++) flags[i] = dp->astr[i]; flags[dp->alen] = forbiddenword; + free(dp->astr); dp->astr = flags; dp->alen++; std::sort(flags, flags + dp->alen); @@ -426,6 +419,7 @@ int HashMgr::remove_forbidden_flag(const std::string& word) { flags2[j++] = dp->astr[i]; } dp->alen--; + free(dp->astr); 
dp->astr = flags2; // XXX allowed forbidden words } } @@ -436,36 +430,34 @@ int HashMgr::remove_forbidden_flag(const std::string& word) { // add a custom dic. word to the hash table (public) int HashMgr::add(const std::string& word) { - unsigned short* flags = NULL; - int al = 0; if (remove_forbidden_flag(word)) { int captype; - int wbl = word.size(); + int al = 0; + unsigned short* flags = NULL; int wcl = get_clen_and_captype(word, &captype); - add_word(word.c_str(), wbl, wcl, flags, al, NULL, false); + add_word(word, wcl, flags, al, NULL, false); return add_hidden_capitalized_word(word, wcl, flags, al, NULL, captype); } return 0; } -int HashMgr::add_with_affix(const char* word, const char* example) { +int HashMgr::add_with_affix(const std::string& word, const std::string& example) { // detect captype and modify word length for UTF-8 encoding - struct hentry* dp = lookup(example); + struct hentry* dp = lookup(example.c_str()); remove_forbidden_flag(word); if (dp && dp->astr) { int captype; - int wbl = strlen(word); int wcl = get_clen_and_captype(word, &captype); if (aliasf) { - add_word(word, wbl, wcl, dp->astr, dp->alen, NULL, false); + add_word(word, wcl, dp->astr, dp->alen, NULL, false); } else { unsigned short* flags = (unsigned short*)malloc(dp->alen * sizeof(unsigned short)); if (flags) { memcpy((void*)flags, (void*)dp->astr, dp->alen * sizeof(unsigned short)); - add_word(word, wbl, wcl, flags, dp->alen, NULL, false); + add_word(word, wcl, flags, dp->alen, NULL, false); } else return 1; } @@ -491,20 +483,14 @@ struct hentry* HashMgr::walk_hashtable(int& col, struct hentry* hp) const { // load a munched word list and build a hash table on the fly int HashMgr::load_tables(const char* tpath, const char* key) { - int al; - char* ap; - char* dp; - char* dp2; - unsigned short* flags; - char* ts; - // open dictionary file FileMgr* dict = new FileMgr(tpath, key); if (dict == NULL) return 1; // first read the first line of file to get hash table size */ - if ((ts = 
dict->getline()) == NULL) { + std::string ts; + if (!dict->getline(ts)) { HUNSPELL_WARNING(stderr, "error: empty dic file %s\n", tpath); delete dict; return 2; @@ -512,13 +498,11 @@ int HashMgr::load_tables(const char* tpath, const char* key) { mychomp(ts); /* remove byte order mark */ - if (strncmp(ts, "\xEF\xBB\xBF", 3) == 0) { - memmove(ts, ts + 3, strlen(ts + 3) + 1); - // warning: dic file begins with byte order mark: possible incompatibility - // with old Hunspell versions + if (ts.compare(0, 3, "\xEF\xBB\xBF", 3) == 0) { + ts.erase(0, 3); } - tablesize = atoi(ts); + tablesize = atoi(ts.c_str()); int nExtra = 5 + USERWORD; @@ -544,60 +528,67 @@ int HashMgr::load_tables(const char* tpath, const char* key) { // loop through all words on much list and add to hash // table and create word and affix strings - while ((ts = dict->getline()) != NULL) { + std::vector<w_char> workbuf; + + while (dict->getline(ts)) { mychomp(ts); // split each line into word and morphological description - dp = ts; - while ((dp = strchr(dp, ':')) != NULL) { - if ((dp > ts + 3) && (*(dp - 3) == ' ' || *(dp - 3) == '\t')) { - for (dp -= 4; dp >= ts && (*dp == ' ' || *dp == '\t'); dp--) + size_t dp_pos = 0; + while ((dp_pos = ts.find(':', dp_pos)) != std::string::npos) { + if ((dp_pos > 3) && (ts[dp_pos - 3] == ' ' || ts[dp_pos - 3] == '\t')) { + for (dp_pos -= 3; dp_pos > 0 && (ts[dp_pos-1] == ' ' || ts[dp_pos-1] == '\t'); --dp_pos) ; - if (dp < ts) { // missing word - dp = NULL; + if (dp_pos == 0) { // missing word + dp_pos = std::string::npos; } else { - *(dp + 1) = '\0'; - dp = dp + 2; + ++dp_pos; } break; } - dp++; + ++dp_pos; } // tabulator is the old morphological field separator - dp2 = strchr(ts, '\t'); - if (dp2 && (!dp || dp2 < dp)) { - *dp2 = '\0'; - dp = dp2 + 1; + size_t dp2_pos = ts.find('\t'); + if (dp2_pos != std::string::npos && (dp_pos == std::string::npos || dp2_pos < dp_pos)) { + dp_pos = dp2_pos + 1; + } + + std::string dp; + if (dp_pos != std::string::npos) { + 
dp.assign(ts.substr(dp_pos)); + ts.resize(dp_pos - 1); } // split each line into word and affix char strings // "\/" signs slash in words (not affix separator) // "/" at beginning of the line is word character (not affix separator) - ap = strchr(ts, '/'); - while (ap) { - if (ap == ts) { - ap++; + size_t ap_pos = ts.find('/'); + while (ap_pos != std::string::npos) { + if (ap_pos == 0) { + ++ap_pos; continue; - } else if (*(ap - 1) != '\\') + } else if (ts[ap_pos - 1] != '\\') break; // replace "\/" with "/" - for (char *sp = ap - 1; *sp; *sp = *(sp + 1), sp++) - ; - ap = strchr(ap, '/'); + ts.erase(ap_pos - 1, 1); + ap_pos = ts.find('/', ap_pos); } - if (ap) { - *ap = '\0'; + unsigned short* flags; + int al; + if (ap_pos != std::string::npos && ap_pos != ts.size()) { + std::string ap(ts.substr(ap_pos + 1)); + ts.resize(ap_pos); if (aliasf) { - int index = atoi(ap + 1); + int index = atoi(ap.c_str()); al = get_aliasf(index, &flags, dict); if (!al) { HUNSPELL_WARNING(stderr, "error: line %d: bad flag vector alias\n", dict->getlinenum()); - *ap = '\0'; } } else { - al = decode_flags(&flags, ap + 1, dict); + al = decode_flags(&flags, ap.c_str(), dict); if (al == -1) { HUNSPELL_WARNING(stderr, "Can't allocate memory.\n"); delete dict; @@ -607,16 +598,15 @@ int HashMgr::load_tables(const char* tpath, const char* key) { } } else { al = 0; - ap = NULL; flags = NULL; } int captype; - int wbl = strlen(ts); - int wcl = get_clen_and_captype(ts, &captype); + int wcl = get_clen_and_captype(ts, &captype, workbuf); + const std::string *dp_str = dp.empty() ? 
NULL : &dp; // add the word and its index plus its capitalized form optionally - if (add_word(ts, wbl, wcl, flags, al, dp, false) || - add_hidden_capitalized_word(ts, wcl, flags, al, dp, captype)) { + if (add_word(ts, wcl, flags, al, dp_str, false) || + add_hidden_capitalized_word(ts, wcl, flags, al, dp_str, captype)) { delete dict; return 5; } @@ -639,15 +629,15 @@ int HashMgr::hash(const char* word) const { return (unsigned long)hv % tablesize; } -int HashMgr::decode_flags(unsigned short** result, char* flags, FileMgr* af) { +int HashMgr::decode_flags(unsigned short** result, const std::string& flags, FileMgr* af) const { int len; - if (*flags == '\0') { + if (flags.empty()) { *result = NULL; return 0; } switch (flag_mode) { case FLAG_LONG: { // two-character flags (1x2yZz -> 1x 2y Zz) - len = strlen(flags); + len = flags.size(); if (len % 2 == 1) HUNSPELL_WARNING(stderr, "error: line %d: bad flagvector\n", af->getlinenum()); @@ -656,29 +646,27 @@ int HashMgr::decode_flags(unsigned short** result, char* flags, FileMgr* af) { if (!*result) return -1; for (int i = 0; i < len; i++) { - (*result)[i] = (((unsigned short)flags[i * 2]) << 8) + - (unsigned short)flags[i * 2 + 1]; + (*result)[i] = ((unsigned short)((unsigned char)flags[i * 2]) << 8) + + (unsigned char)flags[i * 2 + 1]; } break; } case FLAG_NUM: { // decimal numbers separated by comma (4521,23,233 -> 4521 // 23 233) - int i; len = 1; - char* src = flags; unsigned short* dest; - char* p; - for (p = flags; *p; p++) { - if (*p == ',') + for (size_t i = 0; i < flags.size(); ++i) { + if (flags[i] == ',') len++; } *result = (unsigned short*)malloc(len * sizeof(unsigned short)); if (!*result) return -1; dest = *result; - for (p = flags; *p; p++) { + const char* src = flags.c_str(); + for (const char* p = src; *p; p++) { if (*p == ',') { - i = atoi(src); + int i = atoi(src); if (i >= DEFAULTFLAGS) HUNSPELL_WARNING( stderr, "error: line %d: flag id %d is too large (max: %d)\n", @@ -691,7 +679,7 @@ int 
HashMgr::decode_flags(unsigned short** result, char* flags, FileMgr* af) { dest++; } } - i = atoi(src); + int i = atoi(src); if (i >= DEFAULTFLAGS) HUNSPELL_WARNING(stderr, "error: line %d: flag id %d is too large (max: %d)\n", @@ -714,13 +702,13 @@ int HashMgr::decode_flags(unsigned short** result, char* flags, FileMgr* af) { } default: { // Ispell's one-character flags (erfg -> e r f g) unsigned short* dest; - len = strlen(flags); + len = flags.size(); *result = (unsigned short*)malloc(len * sizeof(unsigned short)); if (!*result) return -1; dest = *result; - for (unsigned char* p = (unsigned char*)flags; *p; p++) { - *dest = (unsigned short)*p; + for (size_t i = 0; i < flags.size(); ++i) { + *dest = (unsigned char)flags[i]; dest++; } } @@ -728,12 +716,77 @@ int HashMgr::decode_flags(unsigned short** result, char* flags, FileMgr* af) { return len; } -unsigned short HashMgr::decode_flag(const char* f) { +bool HashMgr::decode_flags(std::vector<unsigned short>& result, const std::string& flags, FileMgr* af) const { + if (flags.empty()) { + return false; + } + switch (flag_mode) { + case FLAG_LONG: { // two-character flags (1x2yZz -> 1x 2y Zz) + size_t len = flags.size(); + if (len % 2 == 1) + HUNSPELL_WARNING(stderr, "error: line %d: bad flagvector\n", + af->getlinenum()); + len /= 2; + result.reserve(result.size() + len); + for (size_t i = 0; i < len; ++i) { + result.push_back(((unsigned short)((unsigned char)flags[i * 2]) << 8) + + (unsigned char)flags[i * 2 + 1]); + } + break; + } + case FLAG_NUM: { // decimal numbers separated by comma (4521,23,233 -> 4521 + // 23 233) + const char* src = flags.c_str(); + for (const char* p = src; *p; p++) { + if (*p == ',') { + int i = atoi(src); + if (i >= DEFAULTFLAGS) + HUNSPELL_WARNING( + stderr, "error: line %d: flag id %d is too large (max: %d)\n", + af->getlinenum(), i, DEFAULTFLAGS - 1); + result.push_back((unsigned short)i); + if (result.back() == 0) + HUNSPELL_WARNING(stderr, "error: line %d: 0 is wrong flag id\n", + 
af->getlinenum()); + src = p + 1; + } + } + int i = atoi(src); + if (i >= DEFAULTFLAGS) + HUNSPELL_WARNING(stderr, + "error: line %d: flag id %d is too large (max: %d)\n", + af->getlinenum(), i, DEFAULTFLAGS - 1); + result.push_back((unsigned short)i); + if (result.back() == 0) + HUNSPELL_WARNING(stderr, "error: line %d: 0 is wrong flag id\n", + af->getlinenum()); + break; + } + case FLAG_UNI: { // UTF-8 characters + std::vector<w_char> w; + u8_u16(w, flags); + size_t len = w.size(); + size_t origsize = result.size(); + result.resize(origsize + len); + memcpy(&result[origsize], &w[0], len * sizeof(short)); + break; + } + default: { // Ispell's one-character flags (erfg -> e r f g) + result.reserve(flags.size()); + for (size_t i = 0; i < flags.size(); ++i) { + result.push_back((unsigned char)flags[i]); + } + } + } + return true; +} + +unsigned short HashMgr::decode_flag(const char* f) const { unsigned short s = 0; int i; switch (flag_mode) { case FLAG_LONG: - s = ((unsigned short)f[0] << 8) + (unsigned short)f[1]; + s = ((unsigned short)((unsigned char)f[0]) << 8) + (unsigned char)f[1]; break; case FLAG_NUM: i = atoi(f); @@ -750,14 +803,14 @@ unsigned short HashMgr::decode_flag(const char* f) { break; } default: - s = (unsigned short)*((unsigned char*)f); + s = *(unsigned char*)f; } if (s == 0) HUNSPELL_WARNING(stderr, "error: 0 is wrong flag id\n"); return s; } -char* HashMgr::encode_flag(unsigned short f) { +char* HashMgr::encode_flag(unsigned short f) const { if (f == 0) return mystrdup("(NULL)"); std::string ch; @@ -780,7 +833,6 @@ char* HashMgr::encode_flag(unsigned short f) { // read in aff file and set flag mode int HashMgr::load_config(const char* affpath, const char* key) { - char* line; // io buffers int firstline = 1; // open the affix file @@ -794,29 +846,31 @@ int HashMgr::load_config(const char* affpath, const char* key) { // read in each line ignoring any that do not // start with a known line type indicator - while ((line = afflst->getline()) != 
NULL) { + std::string line; + while (afflst->getline(line)) { mychomp(line); /* remove byte order mark */ if (firstline) { firstline = 0; - if (strncmp(line, "\xEF\xBB\xBF", 3) == 0) - memmove(line, line + 3, strlen(line + 3) + 1); + if (line.compare(0, 3, "\xEF\xBB\xBF", 3) == 0) { + line.erase(0, 3); + } } /* parse in the try string */ - if ((strncmp(line, "FLAG", 4) == 0) && isspace(line[4])) { + if ((line.compare(0, 4, "FLAG", 4) == 0) && line.size() > 4 && isspace(line[4])) { if (flag_mode != FLAG_CHAR) { HUNSPELL_WARNING(stderr, "error: line %d: multiple definitions of the FLAG " "affix file parameter\n", afflst->getlinenum()); } - if (strstr(line, "long")) + if (line.find("long") != std::string::npos) flag_mode = FLAG_LONG; - if (strstr(line, "num")) + if (line.find("num") != std::string::npos) flag_mode = FLAG_NUM; - if (strstr(line, "UTF-8")) + if (line.find("UTF-8") != std::string::npos) flag_mode = FLAG_UNI; if (flag_mode == FLAG_CHAR) { HUNSPELL_WARNING( @@ -825,21 +879,22 @@ int HashMgr::load_config(const char* affpath, const char* key) { afflst->getlinenum()); } } - if (strncmp(line, "FORBIDDENWORD", 13) == 0) { - char* st = NULL; - if (parse_string(line, &st, afflst->getlinenum())) { + + if (line.compare(0, 13, "FORBIDDENWORD", 13) == 0) { + std::string st; + if (!parse_string(line, st, afflst->getlinenum())) { delete afflst; return 1; } - forbiddenword = decode_flag(st); - free(st); + forbiddenword = decode_flag(st.c_str()); } - if (strncmp(line, "SET", 3) == 0) { - if (parse_string(line, &enc, afflst->getlinenum())) { + + if (line.compare(0, 3, "SET", 3) == 0) { + if (!parse_string(line, enc, afflst->getlinenum())) { delete afflst; return 1; } - if (strcmp(enc, "UTF-8") == 0) { + if (enc == "UTF-8") { utf8 = 1; #ifndef OPENOFFICEORG #ifndef MOZILLA_CLIENT @@ -849,8 +904,9 @@ int HashMgr::load_config(const char* affpath, const char* key) { } else csconv = get_current_cs(enc); } - if (strncmp(line, "LANG", 4) == 0) { - if (parse_string(line, &lang, 
afflst->getlinenum())) { + + if (line.compare(0, 4, "LANG", 4) == 0) { + if (!parse_string(line, lang, afflst->getlinenum())) { delete afflst; return 1; } @@ -859,34 +915,36 @@ int HashMgr::load_config(const char* affpath, const char* key) { /* parse in the ignored characters (for example, Arabic optional diacritics * characters */ - if (strncmp(line, "IGNORE", 6) == 0) { - if (!parse_array(line, &ignorechars, ignorechars_utf16, + if (line.compare(0, 6, "IGNORE", 6) == 0) { + if (!parse_array(line, ignorechars, ignorechars_utf16, utf8, afflst->getlinenum())) { delete afflst; return 1; } } - if ((strncmp(line, "AF", 2) == 0) && isspace(line[2])) { - if (parse_aliasf(line, afflst)) { + if ((line.compare(0, 2, "AF", 2) == 0) && line.size() > 2 && isspace(line[2])) { + if (!parse_aliasf(line, afflst)) { delete afflst; return 1; } } - if ((strncmp(line, "AM", 2) == 0) && isspace(line[2])) { - if (parse_aliasm(line, afflst)) { + if ((line.compare(0, 2, "AM", 2) == 0) && line.size() > 2 && isspace(line[2])) { + if (!parse_aliasm(line, afflst)) { delete afflst; return 1; } } - if (strncmp(line, "COMPLEXPREFIXES", 15) == 0) + if (line.compare(0, 15, "COMPLEXPREFIXES", 15) == 0) complexprefixes = 1; - if (((strncmp(line, "SFX", 3) == 0) || (strncmp(line, "PFX", 3) == 0)) && - isspace(line[3])) + + if (((line.compare(0, 3, "SFX", 3) == 0) || + (line.compare(0, 3, "PFX", 3) == 0)) && line.size() > 3 && isspace(line[3])) break; } + if (csconv == NULL) csconv = get_current_cs(SPELL_ENCODING); delete afflst; @@ -894,57 +952,54 @@ int HashMgr::load_config(const char* affpath, const char* key) { } /* parse in the ALIAS table */ -int HashMgr::parse_aliasf(char* line, FileMgr* af) { +bool HashMgr::parse_aliasf(const std::string& line, FileMgr* af) { if (numaliasf != 0) { HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n", af->getlinenum()); - return 1; + return false; } - char* tp = line; - char* piece; int i = 0; int np = 0; - piece = mystrsep(&tp, 0); - while 
(piece) { - if (*piece != '\0') { - switch (i) { - case 0: { - np++; - break; + std::string::const_iterator iter = line.begin(); + std::string::const_iterator start_piece = mystrsep(line, iter); + while (start_piece != line.end()) { + switch (i) { + case 0: { + np++; + break; + } + case 1: { + numaliasf = atoi(std::string(start_piece, iter).c_str()); + if (numaliasf < 1) { + numaliasf = 0; + aliasf = NULL; + aliasflen = NULL; + HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n", + af->getlinenum()); + return false; } - case 1: { - numaliasf = atoi(piece); - if (numaliasf < 1) { - numaliasf = 0; - aliasf = NULL; - aliasflen = NULL; - HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n", - af->getlinenum()); - return 1; - } - aliasf = - (unsigned short**)malloc(numaliasf * sizeof(unsigned short*)); - aliasflen = - (unsigned short*)malloc(numaliasf * sizeof(unsigned short)); - if (!aliasf || !aliasflen) { - numaliasf = 0; - if (aliasf) - free(aliasf); - if (aliasflen) - free(aliasflen); - aliasf = NULL; - aliasflen = NULL; - return 1; - } - np++; - break; + aliasf = + (unsigned short**)malloc(numaliasf * sizeof(unsigned short*)); + aliasflen = + (unsigned short*)malloc(numaliasf * sizeof(unsigned short)); + if (!aliasf || !aliasflen) { + numaliasf = 0; + if (aliasf) + free(aliasf); + if (aliasflen) + free(aliasflen); + aliasf = NULL; + aliasflen = NULL; + return false; } - default: - break; + np++; + break; } - i++; + default: + break; } - piece = mystrsep(&tp, 0); + ++i; + start_piece = mystrsep(line, iter); } if (np != 2) { numaliasf = 0; @@ -954,48 +1009,47 @@ int HashMgr::parse_aliasf(char* line, FileMgr* af) { aliasflen = NULL; HUNSPELL_WARNING(stderr, "error: line %d: missing data\n", af->getlinenum()); - return 1; + return false; } /* now parse the numaliasf lines to read in the remainder of the table */ - char* nl; for (int j = 0; j < numaliasf; j++) { - if ((nl = af->getline()) == NULL) - return 1; + std::string nl; + if (!af->getline(nl)) 
+ return false; mychomp(nl); - tp = nl; i = 0; aliasf[j] = NULL; aliasflen[j] = 0; - piece = mystrsep(&tp, 0); - while (piece) { - if (*piece != '\0') { - switch (i) { - case 0: { - if (strncmp(piece, "AF", 2) != 0) { - numaliasf = 0; - free(aliasf); - free(aliasflen); - aliasf = NULL; - aliasflen = NULL; - HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", - af->getlinenum()); - return 1; - } - break; - } - case 1: { - aliasflen[j] = - (unsigned short)decode_flags(&(aliasf[j]), piece, af); - std::sort(aliasf[j], aliasf[j] + aliasflen[j]); - break; + iter = nl.begin(); + start_piece = mystrsep(nl, iter); + while (start_piece != nl.end()) { + switch (i) { + case 0: { + if (nl.compare(start_piece - nl.begin(), 2, "AF", 2) != 0) { + numaliasf = 0; + free(aliasf); + free(aliasflen); + aliasf = NULL; + aliasflen = NULL; + HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", + af->getlinenum()); + return false; } - default: - break; + break; + } + case 1: { + std::string piece(start_piece, iter); + aliasflen[j] = + (unsigned short)decode_flags(&(aliasf[j]), piece, af); + std::sort(aliasf[j], aliasf[j] + aliasflen[j]); + break; } - i++; + default: + break; } - piece = mystrsep(&tp, 0); + ++i; + start_piece = mystrsep(nl, iter); } if (!aliasf[j]) { free(aliasf); @@ -1005,17 +1059,17 @@ int HashMgr::parse_aliasf(char* line, FileMgr* af) { numaliasf = 0; HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", af->getlinenum()); - return 1; + return false; } } - return 0; + return true; } -int HashMgr::is_aliasf() { +int HashMgr::is_aliasf() const { return (aliasf != NULL); } -int HashMgr::get_aliasf(int index, unsigned short** fvec, FileMgr* af) { +int HashMgr::get_aliasf(int index, unsigned short** fvec, FileMgr* af) const { if ((index > 0) && (index <= numaliasf)) { *fvec = aliasf[index - 1]; return aliasflen[index - 1]; @@ -1027,45 +1081,42 @@ int HashMgr::get_aliasf(int index, unsigned short** fvec, FileMgr* af) { } /* parse morph alias 
definitions */ -int HashMgr::parse_aliasm(char* line, FileMgr* af) { +bool HashMgr::parse_aliasm(const std::string& line, FileMgr* af) { if (numaliasm != 0) { HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n", af->getlinenum()); - return 1; + return false; } - char* tp = line; - char* piece; int i = 0; int np = 0; - piece = mystrsep(&tp, 0); - while (piece) { - if (*piece != '\0') { - switch (i) { - case 0: { - np++; - break; + std::string::const_iterator iter = line.begin(); + std::string::const_iterator start_piece = mystrsep(line, iter); + while (start_piece != line.end()) { + switch (i) { + case 0: { + np++; + break; + } + case 1: { + numaliasm = atoi(std::string(start_piece, iter).c_str()); + if (numaliasm < 1) { + HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n", + af->getlinenum()); + return false; } - case 1: { - numaliasm = atoi(piece); - if (numaliasm < 1) { - HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n", - af->getlinenum()); - return 1; - } - aliasm = (char**)malloc(numaliasm * sizeof(char*)); - if (!aliasm) { - numaliasm = 0; - return 1; - } - np++; - break; + aliasm = (char**)malloc(numaliasm * sizeof(char*)); + if (!aliasm) { + numaliasm = 0; + return false; } - default: - break; + np++; + break; } - i++; + default: + break; } - piece = mystrsep(&tp, 0); + ++i; + start_piece = mystrsep(line, iter); } if (np != 2) { numaliasm = 0; @@ -1073,55 +1124,50 @@ int HashMgr::parse_aliasm(char* line, FileMgr* af) { aliasm = NULL; HUNSPELL_WARNING(stderr, "error: line %d: missing data\n", af->getlinenum()); - return 1; + return false; } /* now parse the numaliasm lines to read in the remainder of the table */ - char* nl = line; for (int j = 0; j < numaliasm; j++) { - if ((nl = af->getline()) == NULL) - return 1; + std::string nl; + if (!af->getline(nl)) + return false; mychomp(nl); - tp = nl; - i = 0; aliasm[j] = NULL; - piece = mystrsep(&tp, ' '); - while (piece) { - if (*piece != '\0') { - switch (i) { - case 
0: { - if (strncmp(piece, "AM", 2) != 0) { - HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", - af->getlinenum()); - numaliasm = 0; - free(aliasm); - aliasm = NULL; - return 1; - } - break; + iter = nl.begin(); + i = 0; + start_piece = mystrsep(nl, iter); + while (start_piece != nl.end()) { + switch (i) { + case 0: { + if (nl.compare(start_piece - nl.begin(), 2, "AM", 2) != 0) { + HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", + af->getlinenum()); + numaliasm = 0; + free(aliasm); + aliasm = NULL; + return false; } - case 1: { - // add the remaining of the line - if (*tp) { - *(tp - 1) = ' '; - tp = tp + strlen(tp); - } - std::string chunk(piece); - if (complexprefixes) { - if (utf8) - reverseword_utf(chunk); - else - reverseword(chunk); - } - aliasm[j] = mystrdup(chunk.c_str()); - break; + break; + } + case 1: { + // add the remaining of the line + std::string::const_iterator end = nl.end(); + std::string chunk(start_piece, end); + if (complexprefixes) { + if (utf8) + reverseword_utf(chunk); + else + reverseword(chunk); } - default: - break; + aliasm[j] = mystrdup(chunk.c_str()); + break; } - i++; + default: + break; } - piece = mystrsep(&tp, ' '); + ++i; + start_piece = mystrsep(nl, iter); } if (!aliasm[j]) { numaliasm = 0; @@ -1129,17 +1175,17 @@ int HashMgr::parse_aliasm(char* line, FileMgr* af) { aliasm = NULL; HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", af->getlinenum()); - return 1; + return false; } } - return 0; + return true; } -int HashMgr::is_aliasm() { +int HashMgr::is_aliasm() const { return (aliasm != NULL); } -char* HashMgr::get_aliasm(int index) { +char* HashMgr::get_aliasm(int index) const { if ((index > 0) && (index <= numaliasm)) return aliasm[index - 1]; HUNSPELL_WARNING(stderr, "error: bad morph. 
alias index: %d\n", index); diff --git a/libs/hunspell/src/hashmgr.hxx b/libs/hunspell/src/hashmgr.hxx index 95b06b13f9..da485d7afa 100644 --- a/libs/hunspell/src/hashmgr.hxx +++ b/libs/hunspell/src/hashmgr.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -71,10 +68,8 @@ * SUCH DAMAGE. 
  */
-#ifndef _HASHMGR_HXX_
-#define _HASHMGR_HXX_
-
-#include "hunvisapi.h"
+#ifndef HASHMGR_HXX_
+#define HASHMGR_HXX_
 #include <stdio.h>
 #include <string>
@@ -86,7 +81,7 @@
 enum flag { FLAG_CHAR, FLAG_LONG, FLAG_NUM, FLAG_UNI };
-class LIBHUNSPELL_DLL_EXPORTED HashMgr {
+class HashMgr {
   int tablesize;
   struct hentry** tableptr;
   flag flag_mode;
@@ -94,10 +89,10 @@ class LIBHUNSPELL_DLL_EXPORTED HashMgr {
   int utf8;
   unsigned short forbiddenword;
   int langnum;
-  char* enc;
-  char* lang;
+  std::string enc;
+  std::string lang;
   struct cs_info* csconv;
-  char* ignorechars;
+  std::string ignorechars;
   std::vector<w_char> ignorechars_utf16;
   int numaliasf;  // flag vector `compression' with aliases
   unsigned short** aliasf;
@@ -114,35 +109,36 @@ class LIBHUNSPELL_DLL_EXPORTED HashMgr {
   struct hentry* walk_hashtable(int& col, struct hentry* hp) const;
   int add(const std::string& word);
-  int add_with_affix(const char* word, const char* pattern);
-  int remove(const char* word);
-  int decode_flags(unsigned short** result, char* flags, FileMgr* af);
-  unsigned short decode_flag(const char* flag);
-  char* encode_flag(unsigned short flag);
-  int is_aliasf();
-  int get_aliasf(int index, unsigned short** fvec, FileMgr* af);
-  int is_aliasm();
-  char* get_aliasm(int index);
+  int add_with_affix(const std::string& word, const std::string& pattern);
+  int remove(const std::string& word);
+  int decode_flags(unsigned short** result, const std::string& flags, FileMgr* af) const;
+  bool decode_flags(std::vector<unsigned short>& result, const std::string& flags, FileMgr* af) const;
+  unsigned short decode_flag(const char* flag) const;
+  char* encode_flag(unsigned short flag) const;
+  int is_aliasf() const;
+  int get_aliasf(int index, unsigned short** fvec, FileMgr* af) const;
+  int is_aliasm() const;
+  char* get_aliasm(int index) const;
  private:
   int get_clen_and_captype(const std::string& word, int* captype);
+  int get_clen_and_captype(const std::string& word, int* captype, std::vector<w_char> &workbuf);
   int load_tables(const char* tpath, const char* key);
-  int add_word(const char* word,
-               int wbl,
+  int add_word(const std::string& word,
                int wcl,
                unsigned short* ap,
                int al,
-               const char* desc,
+               const std::string* desc,
                bool onlyupcase);
   int load_config(const char* affpath, const char* key);
-  int parse_aliasf(char* line, FileMgr* af);
+  bool parse_aliasf(const std::string& line, FileMgr* af);
   int add_hidden_capitalized_word(const std::string& word,
                                   int wcl,
                                   unsigned short* flags,
                                   int al,
-                                  char* dp,
+                                  const std::string* dp,
                                   int captype);
-  int parse_aliasm(char* line, FileMgr* af);
+  bool parse_aliasm(const std::string& line, FileMgr* af);
   int remove_forbidden_flag(const std::string& word);
 };
diff --git a/libs/hunspell/src/htypes.hxx b/libs/hunspell/src/htypes.hxx
index d244394416..8f66a0080e 100644
--- a/libs/hunspell/src/htypes.hxx
+++ b/libs/hunspell/src/htypes.hxx
@@ -1,6 +1,8 @@
 /* ***** BEGIN LICENSE BLOCK *****
  * Version: MPL 1.1/GPL 2.0/LGPL 2.1
  *
+ * Copyright (C) 2002-2017 Németh László
+ *
  * The contents of this file are subject to the Mozilla Public License Version
  * 1.1 (the "License"); you may not use this file except in compliance with
  * the License. You may obtain a copy of the License at
@@ -11,12 +13,7 @@
  * for the specific language governing rights and limitations under the
  * License.
  *
- * The Original Code is Hunspell, based on MySpell.
- *
- * The Initial Developers of the Original Code are
- * Kevin Hendricks (MySpell) and Németh László (Hunspell).
- * Portions created by the Initial Developers are Copyright (C) 2002-2005
- * the Initial Developers. All Rights Reserved.
+ * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
  *
  * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
  * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
@@ -38,8 +35,8 @@
  *
  * ***** END LICENSE BLOCK ***** */
 
-#ifndef _HTYPES_HXX_
-#define _HTYPES_HXX_
+#ifndef HTYPES_HXX_
+#define HTYPES_HXX_
 
 #define ROTATE_LEN 5
 
diff --git a/libs/hunspell/src/hunspell.c++ b/libs/hunspell/src/hunspell.cxx
index f7c1581087..1ef11df341 100644
--- a/libs/hunspell/src/hunspell.c++
+++ b/libs/hunspell/src/hunspell.cxx
@@ -1,6 +1,8 @@
 /* ***** BEGIN LICENSE BLOCK *****
  * Version: MPL 1.1/GPL 2.0/LGPL 2.1
  *
+ * Copyright (C) 2002-2017 Németh László
+ *
  * The contents of this file are subject to the Mozilla Public License Version
  * 1.1 (the "License"); you may not use this file except in compliance with
  * the License. You may obtain a copy of the License at
@@ -11,12 +13,7 @@
  * for the specific language governing rights and limitations under the
  * License.
  *
- * The Original Code is Hunspell, based on MySpell.
- *
- * The Initial Developers of the Original Code are
- * Kevin Hendricks (MySpell) and Németh László (Hunspell).
- * Portions created by the Initial Developers are Copyright (C) 2002-2005
- * the Initial Developers. All Rights Reserved.
+ * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -75,35 +72,100 @@ #include <string.h> #include <stdio.h> +#include "affixmgr.hxx" #include "hunspell.hxx" +#include "suggestmgr.hxx" #include "hunspell.h" -#ifndef MOZILLA_CLIENT -#include "config.h" -#endif #include "csutil.hxx" #include <limits> #include <string> -#define MAXWORDLEN 176 #define MAXWORDUTF8LEN (MAXWORDLEN * 3) -Hunspell::Hunspell(const char* affpath, const char* dpath, const char* key) { - encoding = NULL; +class HunspellImpl +{ +public: + HunspellImpl(const char* affpath, const char* dpath, const char* key); + ~HunspellImpl(); + int add_dic(const char* dpath, const char* key); + std::vector<std::string> suffix_suggest(const std::string& root_word); + std::vector<std::string> generate(const std::string& word, const std::vector<std::string>& pl); + std::vector<std::string> generate(const std::string& word, const std::string& pattern); + std::vector<std::string> stem(const std::string& word); + std::vector<std::string> stem(const std::vector<std::string>& morph); + std::vector<std::string> analyze(const std::string& word); + int get_langnum() const; + bool input_conv(const std::string& word, std::string& dest); + bool spell(const std::string& word, int* info = NULL, std::string* root = NULL); + std::vector<std::string> suggest(const std::string& word); + const std::string& get_wordchars() const; + const std::vector<w_char>& get_wordchars_utf16() const; + const std::string& get_dict_encoding() const; + int add(const std::string& word); + int add_with_affix(const std::string& word, const std::string& example); + int remove(const std::string& word); + const std::string& get_version() const; + struct cs_info* get_csconv(); + std::vector<char> dic_encoding_vec; + +private: + AffixMgr* pAMgr; + std::vector<HashMgr*> m_HMgrs; + SuggestMgr* pSMgr; + char* affixpath; + std::string encoding; + struct cs_info* csconv; + int langnum; 
+ int utf8; + int complexprefixes; + std::vector<std::string> wordbreak; + +private: + void cleanword(std::string& dest, const std::string&, int* pcaptype, int* pabbrev); + size_t cleanword2(std::string& dest, + std::vector<w_char>& dest_u, + const std::string& src, + int* pcaptype, + size_t* pabbrev); + void mkinitcap(std::string& u8); + int mkinitcap2(std::string& u8, std::vector<w_char>& u16); + int mkinitsmall2(std::string& u8, std::vector<w_char>& u16); + void mkallcap(std::string& u8); + int mkallsmall2(std::string& u8, std::vector<w_char>& u16); + struct hentry* checkword(const std::string& source, int* info, std::string* root); + std::string sharps_u8_l1(const std::string& source); + hentry* + spellsharps(std::string& base, size_t start_pos, int, int, int* info, std::string* root); + int is_keepcase(const hentry* rv); + void insert_sug(std::vector<std::string>& slst, const std::string& word); + void cat_result(std::string& result, const std::string& st); + std::vector<std::string> spellml(const std::string& word); + std::string get_xml_par(const char* par); + const char* get_xml_pos(const char* s, const char* attr); + std::vector<std::string> get_xml_list(const char* list, const char* tag); + int check_xml_par(const char* q, const char* attr, const char* value); +private: + HunspellImpl(const HunspellImpl&); + HunspellImpl& operator=(const HunspellImpl&); +}; + +Hunspell::Hunspell(const char* affpath, const char* dpath, const char* key) + : m_Impl(new HunspellImpl(affpath, dpath, key)) { +} + +HunspellImpl::HunspellImpl(const char* affpath, const char* dpath, const char* key) { csconv = NULL; utf8 = 0; complexprefixes = 0; affixpath = mystrdup(affpath); - maxdic = 0; /* first set up the hash manager */ - pHMgr[0] = new HashMgr(dpath, affpath, key); - if (pHMgr[0]) - maxdic = 1; + m_HMgrs.push_back(new HashMgr(dpath, affpath, key)); /* next set up the affix manager */ /* it needs access to the hash manager lookup methods */ - pAMgr = new AffixMgr(affpath, 
pHMgr, &maxdic, key); + pAMgr = new AffixMgr(affpath, m_HMgrs, key); /* get the preferred try string and the dictionary */ /* encoding from the Affix Manager for that dictionary */ @@ -116,6 +178,9 @@ Hunspell::Hunspell(const char* affpath, const char* dpath, const char* key) { complexprefixes = pAMgr->get_complexprefixes(); wordbreak = pAMgr->get_breaktable(); + dic_encoding_vec.resize(encoding.size()+1); + strcpy(&dic_encoding_vec[0], encoding.c_str()); + /* and finally set up the suggestion manager */ pSMgr = new SuggestMgr(try_string, MAXSUGGESTION, pAMgr); if (try_string) @@ -123,20 +188,20 @@ Hunspell::Hunspell(const char* affpath, const char* dpath, const char* key) { } Hunspell::~Hunspell() { + delete m_Impl; +} + +HunspellImpl::~HunspellImpl() { delete pSMgr; delete pAMgr; - for (int i = 0; i < maxdic; i++) - delete pHMgr[i]; - maxdic = 0; + for (size_t i = 0; i < m_HMgrs.size(); ++i) + delete m_HMgrs[i]; pSMgr = NULL; pAMgr = NULL; #ifdef MOZILLA_CLIENT delete[] csconv; #endif csconv = NULL; - if (encoding) - free(encoding); - encoding = NULL; if (affixpath) free(affixpath); affixpath = NULL; @@ -144,13 +209,14 @@ Hunspell::~Hunspell() { // load extra dictionaries int Hunspell::add_dic(const char* dpath, const char* key) { - if (maxdic == MAXDIC || !affixpath) - return 1; - pHMgr[maxdic] = new HashMgr(dpath, affixpath, key); - if (pHMgr[maxdic]) - maxdic++; - else + return m_Impl->add_dic(dpath, key); +} + +// load extra dictionaries +int HunspellImpl::add_dic(const char* dpath, const char* key) { + if (!affixpath) return 1; + m_HMgrs.push_back(new HashMgr(dpath, affixpath, key)); return 0; } @@ -161,20 +227,19 @@ int Hunspell::add_dic(const char* dpath, const char* key) { // set the capitalization type // return the length of the "cleaned" (and UTF-8 encoded) word -size_t Hunspell::cleanword2(std::string& dest, +size_t HunspellImpl::cleanword2(std::string& dest, std::vector<w_char>& dest_utf, - const char* src, - int* nc, + const std::string& src, int* 
pcaptype, size_t* pabbrev) { dest.clear(); dest_utf.clear(); - const char* q = src; + const char* q = src.c_str(); // first skip over any leading blanks - while ((*q != '\0') && (*q == ' ')) - q++; + while (*q == ' ') + ++q; // now strip off any trailing periods (recording their presence) *pabbrev = 0; @@ -193,26 +258,25 @@ size_t Hunspell::cleanword2(std::string& dest, dest.append(q, nl); nl = dest.size(); if (utf8) { - *nc = u8_u16(dest_utf, dest); + u8_u16(dest_utf, dest); *pcaptype = get_captype_utf8(dest_utf, langnum); } else { *pcaptype = get_captype(dest, csconv); - *nc = nl; } return nl; } -void Hunspell::cleanword(std::string& dest, - const char* src, +void HunspellImpl::cleanword(std::string& dest, + const std::string& src, int* pcaptype, int* pabbrev) { dest.clear(); - const unsigned char* q = (const unsigned char*)src; + const unsigned char* q = (const unsigned char*)src.c_str(); int firstcap = 0; // first skip over any leading blanks - while ((*q != '\0') && (*q == ' ')) - q++; + while (*q == ' ') + ++q; // now strip off any trailing periods (recording their presence) *pabbrev = 0; @@ -277,7 +341,7 @@ void Hunspell::cleanword(std::string& dest, } } -void Hunspell::mkallcap(std::string& u8) { +void HunspellImpl::mkallcap(std::string& u8) { if (utf8) { std::vector<w_char> u16; u8_u16(u16, u8); @@ -288,7 +352,7 @@ void Hunspell::mkallcap(std::string& u8) { } } -int Hunspell::mkallsmall2(std::string& u8, std::vector<w_char>& u16) { +int HunspellImpl::mkallsmall2(std::string& u8, std::vector<w_char>& u16) { if (utf8) { ::mkallsmall_utf(u16, langnum); u16_u8(u8, u16); @@ -299,19 +363,19 @@ int Hunspell::mkallsmall2(std::string& u8, std::vector<w_char>& u16) { } // convert UTF-8 sharp S codes to latin 1 -std::string Hunspell::sharps_u8_l1(const std::string& source) { +std::string HunspellImpl::sharps_u8_l1(const std::string& source) { std::string dest(source); mystrrep(dest, "\xC3\x9F", "\xDF"); return dest; } // recursive search for right ss - sharp s 
permutations -hentry* Hunspell::spellsharps(std::string& base, +hentry* HunspellImpl::spellsharps(std::string& base, size_t n_pos, int n, int repnum, int* info, - char** root) { + std::string* root) { size_t pos = base.find("ss", n_pos); if (pos != std::string::npos && (n < MAXSHARPS)) { base[pos] = '\xC3'; @@ -326,36 +390,28 @@ hentry* Hunspell::spellsharps(std::string& base, return h; } else if (repnum > 0) { if (utf8) - return checkword(base.c_str(), info, root); + return checkword(base, info, root); std::string tmp(sharps_u8_l1(base)); - return checkword(tmp.c_str(), info, root); + return checkword(tmp, info, root); } return NULL; } -int Hunspell::is_keepcase(const hentry* rv) { +int HunspellImpl::is_keepcase(const hentry* rv) { return pAMgr && rv->astr && pAMgr->get_keepcase() && TESTAFF(rv->astr, pAMgr->get_keepcase(), rv->alen); } -/* insert a word to the beginning of the suggestion array and return ns */ -int Hunspell::insert_sug(char*** slst, const char* word, int ns) { - if (!*slst) - return ns; - char* dup = mystrdup(word); - if (!dup) - return ns; - if (ns == MAXSUGGESTION) { - ns--; - free((*slst)[ns]); - } - for (int k = ns; k > 0; k--) - (*slst)[k] = (*slst)[k - 1]; - (*slst)[0] = dup; - return ns + 1; +/* insert a word to the beginning of the suggestion array */ +void HunspellImpl::insert_sug(std::vector<std::string>& slst, const std::string& word) { + slst.insert(slst.begin(), word); } -int Hunspell::spell(const char* word, int* info, char** root) { +bool Hunspell::spell(const std::string& word, int* info, std::string* root) { + return m_Impl->spell(word, info, root); +} + +bool HunspellImpl::spell(const std::string& word, int* info, std::string* root) { struct hentry* rv = NULL; int info2 = 0; @@ -365,15 +421,14 @@ int Hunspell::spell(const char* word, int* info, char** root) { *info = 0; // Hunspell supports XML input of the simplified API (see manual) - if (strcmp(word, SPELL_XML) == 0) - return 1; - int nc = strlen(word); + if (word == 
SPELL_XML) + return true; if (utf8) { - if (nc >= MAXWORDUTF8LEN) - return 0; + if (word.size() >= MAXWORDUTF8LEN) + return false; } else { - if (nc >= MAXWORDLEN) - return 0; + if (word.size() >= MAXWORDLEN) + return false; } int captype = NOCAP; size_t abbv = 0; @@ -383,17 +438,15 @@ int Hunspell::spell(const char* word, int* info, char** root) { std::vector<w_char> sunicw; // input conversion - RepList* rl = (pAMgr) ? pAMgr->get_iconvtable() : NULL; + RepList* rl = pAMgr ? pAMgr->get_iconvtable() : NULL; { std::string wspace; - int convstatus = rl ? rl->conv(word, wspace) : 0; - if (convstatus < 0) - return 0; - else if (convstatus > 0) - wl = cleanword2(scw, sunicw, wspace.c_str(), &nc, &captype, &abbv); + bool convstatus = rl ? rl->conv(word, wspace) : false; + if (convstatus) + wl = cleanword2(scw, sunicw, wspace, &captype, &abbv); else - wl = cleanword2(scw, sunicw, word, &nc, &captype, &abbv); + wl = cleanword2(scw, sunicw, word, &captype, &abbv); } #ifdef MOZILLA_CLIENT @@ -402,10 +455,10 @@ int Hunspell::spell(const char* word, int* info, char** root) { abbv = 1; #endif - if (wl == 0 || maxdic == 0) - return 1; + if (wl == 0 || m_HMgrs.empty()) + return true; if (root) - *root = NULL; + root->clear(); // allow numbers with dots, dashes and commas (but forbid double separators: // "..", "--" etc.) 
@@ -424,7 +477,7 @@ int Hunspell::spell(const char* word, int* info, char** root) { break; } if ((i == wl) && (nstate == NNUM)) - return 1; + return true; switch (captype) { case HUHCAP: @@ -433,22 +486,22 @@ int Hunspell::spell(const char* word, int* info, char** root) { *info += SPELL_ORIGCAP; /* FALLTHROUGH */ case NOCAP: - rv = checkword(scw.c_str(), info, root); + rv = checkword(scw, info, root); if ((abbv) && !(rv)) { std::string u8buffer(scw); u8buffer.push_back('.'); - rv = checkword(u8buffer.c_str(), info, root); + rv = checkword(u8buffer, info, root); } break; case ALLCAP: { *info += SPELL_ORIGCAP; - rv = checkword(scw.c_str(), info, root); + rv = checkword(scw, info, root); if (rv) break; if (abbv) { std::string u8buffer(scw); u8buffer.push_back('.'); - rv = checkword(u8buffer.c_str(), info, root); + rv = checkword(u8buffer, info, root); if (rv) break; } @@ -470,18 +523,18 @@ int Hunspell::spell(const char* word, int* info, char** root) { scw = part1 + part2; sunicw = part1u; sunicw.insert(sunicw.end(), part2u.begin(), part2u.end()); - rv = checkword(scw.c_str(), info, root); + rv = checkword(scw, info, root); if (rv) break; } else { mkinitcap2(part2, sunicw); scw = part1 + part2; - rv = checkword(scw.c_str(), info, root); + rv = checkword(scw, info, root); if (rv) break; } mkinitcap2(scw, sunicw); - rv = checkword(scw.c_str(), info, root); + rv = checkword(scw, info, root); if (rv) break; } @@ -516,7 +569,7 @@ int Hunspell::spell(const char* word, int* info, char** root) { mkinitcap2(scw, sunicw); if (captype == INITCAP) *info += SPELL_INITCAP; - rv = checkword(scw.c_str(), info, root); + rv = checkword(scw, info, root); if (captype == INITCAP) *info -= SPELL_INITCAP; // forbid bad capitalization @@ -531,16 +584,16 @@ int Hunspell::spell(const char* word, int* info, char** root) { if (rv) break; - rv = checkword(u8buffer.c_str(), info, root); + rv = checkword(u8buffer, info, root); if (abbv && !rv) { u8buffer.push_back('.'); - rv = 
checkword(u8buffer.c_str(), info, root); + rv = checkword(u8buffer, info, root); if (!rv) { u8buffer = scw; u8buffer.push_back('.'); if (captype == INITCAP) *info += SPELL_INITCAP; - rv = checkword(u8buffer.c_str(), info, root); + rv = checkword(u8buffer, info, root); if (captype == INITCAP) *info -= SPELL_INITCAP; if (rv && is_keepcase(rv) && (captype == ALLCAP)) @@ -565,89 +618,86 @@ int Hunspell::spell(const char* word, int* info, char** root) { TESTAFF(rv->astr, pAMgr->get_warn(), rv->alen)) { *info += SPELL_WARN; if (pAMgr->get_forbidwarn()) - return 0; - return HUNSPELL_OK_WARN; + return false; + return true; } - return HUNSPELL_OK; + return true; } // recursive breaking at break points - if (wordbreak) { + if (!wordbreak.empty()) { int nbr = 0; wl = scw.size(); - int numbreak = pAMgr ? pAMgr->get_numbreak() : 0; // calculate break points for recursion limit - for (int j = 0; j < numbreak; j++) { - size_t len = strlen(wordbreak[j]); + for (size_t j = 0; j < wordbreak.size(); ++j) { size_t pos = 0; - while ((pos = scw.find(wordbreak[j], pos, len)) != std::string::npos) { + while ((pos = scw.find(wordbreak[j], pos)) != std::string::npos) { ++nbr; - pos += len; + pos += wordbreak[j].size(); } } if (nbr >= 10) - return 0; + return false; // check boundary patterns (^begin and end$) - for (int j = 0; j < numbreak; j++) { - size_t plen = strlen(wordbreak[j]); + for (size_t j = 0; j < wordbreak.size(); ++j) { + size_t plen = wordbreak[j].size(); if (plen == 1 || plen > wl) continue; if (wordbreak[j][0] == '^' && - scw.compare(0, plen - 1, wordbreak[j] + 1, plen -1) == 0 && spell(scw.c_str() + plen - 1)) - return 1; + scw.compare(0, plen - 1, wordbreak[j], 1, plen -1) == 0 && spell(scw.substr(plen - 1))) + return true; if (wordbreak[j][plen - 1] == '$' && - scw.compare(wl - plen + 1, plen - 1, wordbreak[j], plen - 1) == 0) { - char r = scw[wl - plen + 1]; - scw[wl - plen + 1] = '\0'; - if (spell(scw.c_str())) - return 1; - scw[wl - plen + 1] = r; + scw.compare(wl - 
plen + 1, plen - 1, wordbreak[j], 0, plen - 1) == 0) { + std::string suffix(scw.substr(wl - plen + 1)); + scw.resize(wl - plen + 1); + if (spell(scw)) + return true; + scw.append(suffix); } } // other patterns - for (int j = 0; j < numbreak; j++) { - size_t plen = strlen(wordbreak[j]); + for (size_t j = 0; j < wordbreak.size(); ++j) { + size_t plen = wordbreak[j].size(); size_t found = scw.find(wordbreak[j]); if ((found > 0) && (found < wl - plen)) { - if (!spell(scw.c_str() + found + plen)) + if (!spell(scw.substr(found + plen))) continue; - char r = scw[found]; - scw[found] = '\0'; + std::string suffix(scw.substr(found)); + scw.resize(found); // examine 2 sides of the break point - if (spell(scw.c_str())) - return 1; - scw[found] = r; + if (spell(scw)) + return true; + scw.append(suffix); // LANG_hu: spec. dash rule - if (langnum == LANG_hu && strcmp(wordbreak[j], "-") == 0) { - r = scw[found + 1]; - scw[found + 1] = '\0'; - if (spell(scw.c_str())) - return 1; // check the first part with dash - scw[found + 1] = r; + if (langnum == LANG_hu && wordbreak[j] == "-") { + suffix = scw.substr(found + 1); + scw.resize(found + 1); + if (spell(scw)) + return true; // check the first part with dash + scw.append(suffix); } // end of LANG specific region } } } - return 0; + return false; } -struct hentry* Hunspell::checkword(const char* w, int* info, char** root) { - struct hentry* he = NULL; +struct hentry* HunspellImpl::checkword(const std::string& w, int* info, std::string* root) { bool usebuffer = false; - int len, i; std::string w2; const char* word; + int len; - char* ignoredchars = pAMgr ? pAMgr->get_ignore() : NULL; + const char* ignoredchars = pAMgr ? 
pAMgr->get_ignore() : NULL; if (ignoredchars != NULL) { w2.assign(w); if (utf8) { @@ -658,11 +708,12 @@ struct hentry* Hunspell::checkword(const char* w, int* info, char** root) { remove_ignored_chars(w2, ignoredchars); } word = w2.c_str(); + len = w2.size(); usebuffer = true; - } else - word = w; - - len = strlen(word); + } else { + word = w.c_str(); + len = w.size(); + } if (!len) return NULL; @@ -684,8 +735,9 @@ struct hentry* Hunspell::checkword(const char* w, int* info, char** root) { } // look word in hash table - for (i = 0; (i < maxdic) && !he; i++) { - he = (pHMgr[i])->lookup(word); + struct hentry* he = NULL; + for (size_t i = 0; (i < m_HMgrs.size()) && !he; ++i) { + he = m_HMgrs[i]->lookup(word); // check forbidden and onlyincompound words if ((he) && (he->astr) && (pAMgr) && @@ -736,40 +788,33 @@ struct hentry* Hunspell::checkword(const char* w, int* info, char** root) { return NULL; } if (root) { - std::string word_root(he->word); + root->assign(he->word); if (complexprefixes) { if (utf8) - reverseword_utf(word_root); + reverseword_utf(*root); else - reverseword(word_root); + reverseword(*root); } - *root = mystrdup(word_root.c_str()); } // try check compound word } else if (pAMgr->get_compound()) { struct hentry* rwords[100]; // buffer for COMPOUND pattern checking - he = pAMgr->compound_check(word, len, 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, 0, info); + he = pAMgr->compound_check(word, 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, 0, info); // LANG_hu section: `moving rule' with last dash if ((!he) && (langnum == LANG_hu) && (word[len - 1] == '-')) { - char* dup = mystrdup(word); - if (!dup) - return NULL; - dup[len - 1] = '\0'; - he = pAMgr->compound_check(dup, len - 1, -5, 0, 100, 0, NULL, (hentry**)&rwords, 1, 0, - info); - free(dup); + std::string dup(word, len - 1); + he = pAMgr->compound_check(dup, -5, 0, 100, 0, NULL, (hentry**)&rwords, 1, 0, info); } // end of LANG specific region if (he) { if (root) { - std::string word_root(he->word); + 
root->assign(he->word); if (complexprefixes) { if (utf8) - reverseword_utf(word_root); + reverseword_utf(*root); else - reverseword(word_root); + reverseword(*root); } - *root = mystrdup(word_root.c_str()); } if (info) *info += SPELL_COMPOUND; @@ -780,22 +825,27 @@ struct hentry* Hunspell::checkword(const char* w, int* info, char** root) { return he; } -int Hunspell::suggest(char*** slst, const char* word) { +std::vector<std::string> Hunspell::suggest(const std::string& word) { + return m_Impl->suggest(word); +} + +std::vector<std::string> HunspellImpl::suggest(const std::string& word) { + std::vector<std::string> slst; + int onlycmpdsug = 0; - if (!pSMgr || maxdic == 0) - return 0; - *slst = NULL; + if (!pSMgr || m_HMgrs.empty()) + return slst; + // process XML input of the simplified API (see manual) - if (strncmp(word, SPELL_XML, sizeof(SPELL_XML) - 3) == 0) { - return spellml(slst, word); + if (word.compare(0, sizeof(SPELL_XML) - 3, SPELL_XML, sizeof(SPELL_XML) - 3) == 0) { + return spellml(word); } - int nc = strlen(word); if (utf8) { - if (nc >= MAXWORDUTF8LEN) - return 0; + if (word.size() >= MAXWORDUTF8LEN) + return slst; } else { - if (nc >= MAXWORDLEN) - return 0; + if (word.size() >= MAXWORDLEN) + return slst; } int captype = NOCAP; size_t abbv = 0; @@ -809,121 +859,102 @@ int Hunspell::suggest(char*** slst, const char* word) { { std::string wspace; - int convstatus = rl ? rl->conv(word, wspace) : 0; - if (convstatus < 0) - return 0; - else if (convstatus > 0) - wl = cleanword2(scw, sunicw, wspace.c_str(), &nc, &captype, &abbv); + bool convstatus = rl ? 
rl->conv(word, wspace) : false; + if (convstatus) + wl = cleanword2(scw, sunicw, wspace, &captype, &abbv); else - wl = cleanword2(scw, sunicw, word, &nc, &captype, &abbv); + wl = cleanword2(scw, sunicw, word, &captype, &abbv); if (wl == 0) - return 0; + return slst; } - int ns = 0; int capwords = 0; // check capitalized form for FORCEUCASE if (pAMgr && captype == NOCAP && pAMgr->get_forceucase()) { int info = SPELL_ORIGCAP; - if (checkword(scw.c_str(), &info, NULL)) { + if (checkword(scw, &info, NULL)) { std::string form(scw); mkinitcap(form); - - char** wlst = (char**)malloc(MAXSUGGESTION * sizeof(char*)); - if (wlst == NULL) - return -1; - *slst = wlst; - wlst[0] = mystrdup(form.c_str()); - for (int i = 1; i < MAXSUGGESTION; ++i) { - wlst[i] = NULL; - } - - return 1; + slst.push_back(form); + return slst; } } switch (captype) { case NOCAP: { - ns = pSMgr->suggest(slst, scw.c_str(), ns, &onlycmpdsug); + pSMgr->suggest(slst, scw.c_str(), &onlycmpdsug); break; } case INITCAP: { capwords = 1; - ns = pSMgr->suggest(slst, scw.c_str(), ns, &onlycmpdsug); - if (ns == -1) - break; + pSMgr->suggest(slst, scw.c_str(), &onlycmpdsug); std::string wspace(scw); mkallsmall2(wspace, sunicw); - ns = pSMgr->suggest(slst, wspace.c_str(), ns, &onlycmpdsug); + pSMgr->suggest(slst, wspace.c_str(), &onlycmpdsug); break; } case HUHINITCAP: capwords = 1; case HUHCAP: { - ns = pSMgr->suggest(slst, scw.c_str(), ns, &onlycmpdsug); - if (ns != -1) { - // something.The -> something. 
The - size_t dot_pos = scw.find('.'); - if (dot_pos != std::string::npos) { - std::string postdot = scw.substr(dot_pos + 1); - int captype_; - if (utf8) { - std::vector<w_char> postdotu; - u8_u16(postdotu, postdot); - captype_ = get_captype_utf8(postdotu, langnum); - } else { - captype_ = get_captype(postdot, csconv); - } - if (captype_ == INITCAP) { - std::string str(scw); - str.insert(dot_pos + 1, 1, ' '); - ns = insert_sug(slst, str.c_str(), ns); - } + pSMgr->suggest(slst, scw.c_str(), &onlycmpdsug); + // something.The -> something. The + size_t dot_pos = scw.find('.'); + if (dot_pos != std::string::npos) { + std::string postdot = scw.substr(dot_pos + 1); + int captype_; + if (utf8) { + std::vector<w_char> postdotu; + u8_u16(postdotu, postdot); + captype_ = get_captype_utf8(postdotu, langnum); + } else { + captype_ = get_captype(postdot, csconv); + } + if (captype_ == INITCAP) { + std::string str(scw); + str.insert(dot_pos + 1, 1, ' '); + insert_sug(slst, str); } + } - std::string wspace; + std::string wspace; - if (captype == HUHINITCAP) { - // TheOpenOffice.org -> The OpenOffice.org - wspace = scw; - mkinitsmall2(wspace, sunicw); - ns = pSMgr->suggest(slst, wspace.c_str(), ns, &onlycmpdsug); - } + if (captype == HUHINITCAP) { + // TheOpenOffice.org -> The OpenOffice.org wspace = scw; - mkallsmall2(wspace, sunicw); + mkinitsmall2(wspace, sunicw); + pSMgr->suggest(slst, wspace.c_str(), &onlycmpdsug); + } + wspace = scw; + mkallsmall2(wspace, sunicw); + if (spell(wspace.c_str())) + insert_sug(slst, wspace); + size_t prevns = slst.size(); + pSMgr->suggest(slst, wspace.c_str(), &onlycmpdsug); + if (captype == HUHINITCAP) { + mkinitcap2(wspace, sunicw); if (spell(wspace.c_str())) - ns = insert_sug(slst, wspace.c_str(), ns); - int prevns = ns; - ns = pSMgr->suggest(slst, wspace.c_str(), ns, &onlycmpdsug); - if (captype == HUHINITCAP) { - mkinitcap2(wspace, sunicw); - if (spell(wspace.c_str())) - ns = insert_sug(slst, wspace.c_str(), ns); - ns = pSMgr->suggest(slst, 
wspace.c_str(), ns, &onlycmpdsug); - } - // aNew -> "a New" (instead of "a new") - for (int j = prevns; j < ns; j++) { - char* space = strchr((*slst)[j], ' '); - if (space) { - size_t slen = strlen(space + 1); - // different case after space (need capitalisation) - if ((slen < wl) && strcmp(scw.c_str() + wl - slen, space + 1)) { - std::string first((*slst)[j], space + 1); - std::string second(space + 1); - std::vector<w_char> w; - if (utf8) - u8_u16(w, second); - mkinitcap2(second, w); - // set as first suggestion - char* r = (*slst)[j]; - for (int k = j; k > 0; k--) - (*slst)[k] = (*slst)[k - 1]; - free(r); - (*slst)[0] = mystrdup((first + second).c_str()); - } + insert_sug(slst, wspace); + pSMgr->suggest(slst, wspace.c_str(), &onlycmpdsug); + } + // aNew -> "a New" (instead of "a new") + for (size_t j = prevns; j < slst.size(); ++j) { + const char* space = strchr(slst[j].c_str(), ' '); + if (space) { + size_t slen = strlen(space + 1); + // different case after space (need capitalisation) + if ((slen < wl) && strcmp(scw.c_str() + wl - slen, space + 1)) { + std::string first(slst[j].c_str(), space + 1); + std::string second(space + 1); + std::vector<w_char> w; + if (utf8) + u8_u16(w, second); + mkinitcap2(second, w); + // set as first suggestion + slst.erase(slst.begin() + j); + slst.insert(slst.begin(), first + second); } } } @@ -933,28 +964,20 @@ int Hunspell::suggest(char*** slst, const char* word) { case ALLCAP: { std::string wspace(scw); mkallsmall2(wspace, sunicw); - ns = pSMgr->suggest(slst, wspace.c_str(), ns, &onlycmpdsug); - if (ns == -1) - break; + pSMgr->suggest(slst, wspace.c_str(), &onlycmpdsug); if (pAMgr && pAMgr->get_keepcase() && spell(wspace.c_str())) - ns = insert_sug(slst, wspace.c_str(), ns); + insert_sug(slst, wspace); mkinitcap2(wspace, sunicw); - ns = pSMgr->suggest(slst, wspace.c_str(), ns, &onlycmpdsug); - for (int j = 0; j < ns; j++) { - std::string form((*slst)[j]); - mkallcap(form); - + pSMgr->suggest(slst, wspace.c_str(), 
&onlycmpdsug); + for (size_t j = 0; j < slst.size(); ++j) { + mkallcap(slst[j]); if (pAMgr && pAMgr->get_checksharps()) { if (utf8) { - mystrrep(form, "\xC3\x9F", "SS"); + mystrrep(slst[j], "\xC3\x9F", "SS"); } else { - mystrrep(form, "\xDF", "SS"); + mystrrep(slst[j], "\xDF", "SS"); } } - - free((*slst)[j]); - (*slst)[j] = mystrdup(form.c_str()); - } break; } @@ -962,29 +985,27 @@ int Hunspell::suggest(char*** slst, const char* word) { // LANG_hu section: replace '-' with ' ' in Hungarian if (langnum == LANG_hu) { - for (int j = 0; j < ns; j++) { - char* pos = strchr((*slst)[j], '-'); - if (pos) { + for (size_t j = 0; j < slst.size(); ++j) { + size_t pos = slst[j].find('-'); + if (pos != std::string::npos) { int info; - *pos = '\0'; - std::string w((*slst)[j]); - w.append(pos + 1); - (void)spell(w.c_str(), &info, NULL); + std::string w(slst[j].substr(0, pos)); + w.append(slst[j].substr(pos + 1)); + (void)spell(w, &info, NULL); if ((info & SPELL_COMPOUND) && (info & SPELL_FORBIDDEN)) { - *pos = ' '; + slst[j][pos] = ' '; } else - *pos = '-'; + slst[j][pos] = '-'; } } } // END OF LANG_hu section // try ngram approach since found nothing or only compound words - if (pAMgr && (ns == 0 || onlycmpdsug) && (pAMgr->get_maxngramsugs() != 0) && - (*slst)) { + if (pAMgr && (slst.empty() || onlycmpdsug) && (pAMgr->get_maxngramsugs() != 0)) { switch (captype) { case NOCAP: { - ns = pSMgr->ngsuggest(*slst, scw.c_str(), ns, pHMgr, maxdic); + pSMgr->ngsuggest(slst, scw.c_str(), m_HMgrs); break; } case HUHINITCAP: @@ -992,26 +1013,23 @@ int Hunspell::suggest(char*** slst, const char* word) { case HUHCAP: { std::string wspace(scw); mkallsmall2(wspace, sunicw); - ns = pSMgr->ngsuggest(*slst, wspace.c_str(), ns, pHMgr, maxdic); + pSMgr->ngsuggest(slst, wspace.c_str(), m_HMgrs); break; } case INITCAP: { capwords = 1; std::string wspace(scw); mkallsmall2(wspace, sunicw); - ns = pSMgr->ngsuggest(*slst, wspace.c_str(), ns, pHMgr, maxdic); + pSMgr->ngsuggest(slst, wspace.c_str(), 
m_HMgrs); break; } case ALLCAP: { std::string wspace(scw); mkallsmall2(wspace, sunicw); - int oldns = ns; - ns = pSMgr->ngsuggest(*slst, wspace.c_str(), ns, pHMgr, maxdic); - for (int j = oldns; j < ns; j++) { - std::string form((*slst)[j]); - mkallcap(form); - free((*slst)[j]); - (*slst)[j] = mystrdup(form.c_str()); + size_t oldns = slst.size(); + pSMgr->ngsuggest(slst, wspace.c_str(), m_HMgrs); + for (size_t j = oldns; j < slst.size(); ++j) { + mkallcap(slst[j]); } break; } @@ -1022,8 +1040,8 @@ int Hunspell::suggest(char*** slst, const char* word) { size_t dash_pos = scw.find('-'); if (dash_pos != std::string::npos) { int nodashsug = 1; - for (int j = 0; j < ns && nodashsug == 1; j++) { - if (strchr((*slst)[j], '-')) + for (size_t j = 0; j < slst.size() && nodashsug == 1; ++j) { + if (slst[j].find('-') != std::string::npos) nodashsug = 0; } @@ -1035,20 +1053,16 @@ int Hunspell::suggest(char*** slst, const char* word) { last = 1; std::string chunk = scw.substr(prev_pos, dash_pos - prev_pos); if (!spell(chunk.c_str())) { - char** nlst = NULL; - int nn = suggest(&nlst, chunk.c_str()); - for (int j = nn - 1; j >= 0; j--) { + std::vector<std::string> nlst = suggest(chunk.c_str()); + for (std::vector<std::string>::reverse_iterator j = nlst.rbegin(); j != nlst.rend(); ++j) { std::string wspace = scw.substr(0, prev_pos); - wspace.append(nlst[j]); + wspace.append(*j); if (!last) { wspace.append("-"); wspace.append(scw.substr(dash_pos + 1)); } - ns = insert_sug(slst, wspace.c_str(), ns); - free(nlst[j]); + insert_sug(slst, wspace); } - if (nlst != NULL) - free(nlst); nodashsug = 0; } if (!last) { @@ -1062,31 +1076,24 @@ int Hunspell::suggest(char*** slst, const char* word) { // word reversing wrapper for complex prefixes if (complexprefixes) { - for (int j = 0; j < ns; j++) { - std::string root((*slst)[j]); - free((*slst)[j]); + for (size_t j = 0; j < slst.size(); ++j) { if (utf8) - reverseword_utf(root); + reverseword_utf(slst[j]); else - reverseword(root); - (*slst)[j] 
= mystrdup(root.c_str()); + reverseword(slst[j]); } } // capitalize if (capwords) - for (int j = 0; j < ns; j++) { - std::string form((*slst)[j]); - free((*slst)[j]); - mkinitcap(form); - (*slst)[j] = mystrdup(form.c_str()); + for (size_t j = 0; j < slst.size(); ++j) { + mkinitcap(slst[j]); } // expand suggestions with dot(s) if (abbv && pAMgr && pAMgr->get_sugswithdots()) { - for (int j = 0; j < ns; j++) { - (*slst)[j] = (char*)realloc((*slst)[j], strlen((*slst)[j]) + 1 + abbv); - strcat((*slst)[j], word + strlen(word) - abbv); + for (size_t j = 0; j < slst.size(); ++j) { + slst[j].append(word.substr(word.size() - abbv)); } } @@ -1095,96 +1102,90 @@ int Hunspell::suggest(char*** slst, const char* word) { switch (captype) { case INITCAP: case ALLCAP: { - int l = 0; - for (int j = 0; j < ns; j++) { - if (!strchr((*slst)[j], ' ') && !spell((*slst)[j])) { + size_t l = 0; + for (size_t j = 0; j < slst.size(); ++j) { + if (slst[j].find(' ') == std::string::npos && !spell(slst[j])) { std::string s; std::vector<w_char> w; if (utf8) { - u8_u16(w, (*slst)[j]); + u8_u16(w, slst[j]); } else { - s = (*slst)[j]; + s = slst[j]; } mkallsmall2(s, w); - free((*slst)[j]); - if (spell(s.c_str())) { - (*slst)[l] = mystrdup(s.c_str()); - if ((*slst)[l]) - l++; + if (spell(s)) { + slst[l] = s; + ++l; } else { mkinitcap2(s, w); - if (spell(s.c_str())) { - (*slst)[l] = mystrdup(s.c_str()); - if ((*slst)[l]) - l++; + if (spell(s)) { + slst[l] = s; + ++l; } } } else { - (*slst)[l] = (*slst)[j]; - l++; + slst[l] = slst[j]; + ++l; } } - ns = l; + slst.resize(l); } } } // remove duplications - int l = 0; - for (int j = 0; j < ns; j++) { - (*slst)[l] = (*slst)[j]; - for (int k = 0; k < l; k++) { - if (strcmp((*slst)[k], (*slst)[j]) == 0) { - free((*slst)[j]); - l--; + size_t l = 0; + for (size_t j = 0; j < slst.size(); ++j) { + slst[l] = slst[j]; + for (size_t k = 0; k < l; ++k) { + if (slst[k] == slst[j]) { + --l; break; } } - l++; + ++l; } - ns = l; + slst.resize(l); // output conversion rl = 
(pAMgr) ? pAMgr->get_oconvtable() : NULL; - for (int j = 0; rl && j < ns; j++) { + for (size_t j = 0; rl && j < slst.size(); ++j) { std::string wspace; - if (rl->conv((*slst)[j], wspace) > 0) { - free((*slst)[j]); - (*slst)[j] = mystrdup(wspace.c_str()); + if (rl->conv(slst[j], wspace)) { + slst[j] = wspace; } } - // if suggestions removed by nosuggest, onlyincompound parameters - if (l == 0 && *slst) { - free(*slst); - *slst = NULL; - } - return l; + return slst; } -void Hunspell::free_list(char*** slst, int n) { - freelist(slst, n); +const std::string& Hunspell::get_dict_encoding() const { + return m_Impl->get_dict_encoding(); } -char* Hunspell::get_dic_encoding() { +const std::string& HunspellImpl::get_dict_encoding() const { return encoding; } -int Hunspell::stem(char*** slst, char** desc, int n) { +std::vector<std::string> Hunspell::stem(const std::vector<std::string>& desc) { + return m_Impl->stem(desc); +} + +std::vector<std::string> HunspellImpl::stem(const std::vector<std::string>& desc) { + std::vector<std::string> slst; std::string result2; - *slst = NULL; - if (n == 0) - return 0; - for (int i = 0; i < n; i++) { + if (desc.empty()) + return slst; + for (size_t i = 0; i < desc.size(); ++i) { std::string result; // add compound word parts (except the last one) - char* s = (char*)desc[i]; - char* part = strstr(s, MORPH_PART); + const char* s = desc[i].c_str(); + const char* part = strstr(s, MORPH_PART); if (part) { - char* nextpart = strstr(part + 1, MORPH_PART); + const char* nextpart = strstr(part + 1, MORPH_PART); while (nextpart) { std::string field; copy_field(field, part, MORPH_PART); @@ -1195,36 +1196,34 @@ int Hunspell::stem(char*** slst, char** desc, int n) { s = part; } - char** pl; std::string tok(s); size_t alt = 0; while ((alt = tok.find(" | ", alt)) != std::string::npos) { tok[alt + 1] = MSEP_ALT; } - int pln = line_tok(tok.c_str(), &pl, MSEP_ALT); - for (int k = 0; k < pln; k++) { + std::vector<std::string> pl = line_tok(tok, MSEP_ALT); + 
for (size_t k = 0; k < pl.size(); ++k) { // add derivational suffixes - if (strstr(pl[k], MORPH_DERI_SFX)) { + if (pl[k].find(MORPH_DERI_SFX) != std::string::npos) { // remove inflectional suffixes - char* is = strstr(pl[k], MORPH_INFL_SFX); - if (is) - *is = '\0'; - char* sg = pSMgr->suggest_gen(&(pl[k]), 1, pl[k]); - if (sg) { - char** gen; - int genl = line_tok(sg, &gen, MSEP_REC); - free(sg); - for (int j = 0; j < genl; j++) { + const size_t is = pl[k].find(MORPH_INFL_SFX); + if (is != std::string::npos) + pl[k].resize(is); + std::vector<std::string> singlepl; + singlepl.push_back(pl[k]); + std::string sg = pSMgr->suggest_gen(singlepl, pl[k]); + if (!sg.empty()) { + std::vector<std::string> gen = line_tok(sg, MSEP_REC); + for (size_t j = 0; j < gen.size(); ++j) { result2.push_back(MSEP_REC); result2.append(result); result2.append(gen[j]); } - freelist(&gen, genl); } } else { result2.push_back(MSEP_REC); result2.append(result); - if (strstr(pl[k], MORPH_SURF_PFX)) { + if (pl[k].find(MORPH_SURF_PFX) != std::string::npos) { std::string field; copy_field(field, pl[k], MORPH_SURF_PFX); result2.append(field); @@ -1234,29 +1233,41 @@ int Hunspell::stem(char*** slst, char** desc, int n) { result2.append(field); } } - freelist(&pl, pln); } - int sln = line_tok(result2.c_str(), slst, MSEP_REC); - return uniqlist(*slst, sln); + slst = line_tok(result2, MSEP_REC); + uniqlist(slst); + return slst; } -int Hunspell::stem(char*** slst, const char* word) { - char** pl; - int pln = analyze(&pl, word); - int pln2 = stem(slst, pl, pln); - freelist(&pl, pln); - return pln2; +std::vector<std::string> Hunspell::stem(const std::string& word) { + return m_Impl->stem(word); +} + +std::vector<std::string> HunspellImpl::stem(const std::string& word) { + return stem(analyze(word)); +} + +const char* Hunspell::get_wordchars() const { + return m_Impl->get_wordchars().c_str(); } -const char* Hunspell::get_wordchars() { +const std::string& Hunspell::get_wordchars_cpp() const { + return 
m_Impl->get_wordchars(); +} + +const std::string& HunspellImpl::get_wordchars() const { return pAMgr->get_wordchars(); } -const std::vector<w_char>& Hunspell::get_wordchars_utf16() { +const std::vector<w_char>& Hunspell::get_wordchars_utf16() const { + return m_Impl->get_wordchars_utf16(); +} + +const std::vector<w_char>& HunspellImpl::get_wordchars_utf16() const { return pAMgr->get_wordchars_utf16(); } -void Hunspell::mkinitcap(std::string& u8) { +void HunspellImpl::mkinitcap(std::string& u8) { if (utf8) { std::vector<w_char> u16; u8_u16(u16, u8); @@ -1267,7 +1278,7 @@ void Hunspell::mkinitcap(std::string& u8) { } } -int Hunspell::mkinitcap2(std::string& u8, std::vector<w_char>& u16) { +int HunspellImpl::mkinitcap2(std::string& u8, std::vector<w_char>& u16) { if (utf8) { ::mkinitcap_utf(u16, langnum); u16_u8(u8, u16); @@ -1277,7 +1288,7 @@ int Hunspell::mkinitcap2(std::string& u8, std::vector<w_char>& u16) { return u8.size(); } -int Hunspell::mkinitsmall2(std::string& u8, std::vector<w_char>& u16) { +int HunspellImpl::mkinitsmall2(std::string& u8, std::vector<w_char>& u16) { if (utf8) { ::mkinitsmall_utf(u16, langnum); u16_u8(u8, u16); @@ -1287,52 +1298,78 @@ int Hunspell::mkinitsmall2(std::string& u8, std::vector<w_char>& u16) { return u8.size(); } -int Hunspell::add(const char* word) { - if (pHMgr[0]) - return (pHMgr[0])->add(word); +int Hunspell::add(const std::string& word) { + return m_Impl->add(word); +} + +int HunspellImpl::add(const std::string& word) { + if (!m_HMgrs.empty()) + return m_HMgrs[0]->add(word); return 0; } -int Hunspell::add_with_affix(const char* word, const char* example) { - if (pHMgr[0]) - return (pHMgr[0])->add_with_affix(word, example); +int Hunspell::add_with_affix(const std::string& word, const std::string& example) { + return m_Impl->add_with_affix(word, example); +} + +int HunspellImpl::add_with_affix(const std::string& word, const std::string& example) { + if (!m_HMgrs.empty()) + return m_HMgrs[0]->add_with_affix(word, example); 
return 0; } -int Hunspell::remove(const char* word) { - if (pHMgr[0]) - return (pHMgr[0])->remove(word); +int Hunspell::remove(const std::string& word) { + return m_Impl->remove(word); +} + +int HunspellImpl::remove(const std::string& word) { + if (!m_HMgrs.empty()) + return m_HMgrs[0]->remove(word); return 0; } -const char* Hunspell::get_version() { +const char* Hunspell::get_version() const { + return m_Impl->get_version().c_str(); +} + +const std::string& Hunspell::get_version_cpp() const { + return m_Impl->get_version(); +} + +const std::string& HunspellImpl::get_version() const { return pAMgr->get_version(); } -struct cs_info* Hunspell::get_csconv() { +struct cs_info* HunspellImpl::get_csconv() { return csconv; } -void Hunspell::cat_result(std::string& result, char* st) { - if (st) { +struct cs_info* Hunspell::get_csconv() { + return m_Impl->get_csconv(); +} + +void HunspellImpl::cat_result(std::string& result, const std::string& st) { + if (!st.empty()) { if (!result.empty()) result.append("\n"); result.append(st); - free(st); } } -int Hunspell::analyze(char*** slst, const char* word) { - *slst = NULL; - if (!pSMgr || maxdic == 0) - return 0; - int nc = strlen(word); +std::vector<std::string> Hunspell::analyze(const std::string& word) { + return m_Impl->analyze(word); +} + +std::vector<std::string> HunspellImpl::analyze(const std::string& word) { + std::vector<std::string> slst; + if (!pSMgr || m_HMgrs.empty()) + return slst; if (utf8) { - if (nc >= MAXWORDUTF8LEN) - return 0; + if (word.size() >= MAXWORDUTF8LEN) + return slst; } else { - if (nc >= MAXWORDLEN) - return 0; + if (word.size() >= MAXWORDLEN) + return slst; } int captype = NOCAP; size_t abbv = 0; @@ -1346,13 +1383,11 @@ int Hunspell::analyze(char*** slst, const char* word) { { std::string wspace; - int convstatus = rl ? 
rl->conv(word, wspace) : 0; - if (convstatus < 0) - return 0; - else if (convstatus > 0) - wl = cleanword2(scw, sunicw, wspace.c_str(), &nc, &captype, &abbv); + bool convstatus = rl ? rl->conv(word, wspace) : false; + if (convstatus) + wl = cleanword2(scw, sunicw, wspace, &captype, &abbv); else - wl = cleanword2(scw, sunicw, word, &nc, &captype, &abbv); + wl = cleanword2(scw, sunicw, word, &captype, &abbv); } if (wl == 0) { @@ -1362,18 +1397,18 @@ int Hunspell::analyze(char*** slst, const char* word) { scw.push_back('.'); abbv = 0; } else - return 0; + return slst; } std::string result; size_t n = 0; - size_t n2 = 0; - size_t n3 = 0; - // test numbers // LANG_hu section: set dash information for suggestions if (langnum == LANG_hu) { + size_t n2 = 0; + size_t n3 = 0; + while ((n < wl) && (((scw[n] <= '9') && (scw[n] >= '0')) || (((scw[n] == '.') || (scw[n] == ',')) && (n > 0)))) { n++; @@ -1387,22 +1422,20 @@ int Hunspell::analyze(char*** slst, const char* word) { } if ((n == wl) && (n3 > 0) && (n - n3 > 3)) - return 0; + return slst; if ((n == wl) || ((n > 0) && ((scw[n] == '%') || (scw[n] == '\xB0')) && - checkword(scw.c_str() + n, NULL, NULL))) { + checkword(scw.substr(n), NULL, NULL))) { result.append(scw); result.resize(n - 1); if (n == wl) - cat_result(result, pSMgr->suggest_morph(scw.c_str() + n - 1)); + cat_result(result, pSMgr->suggest_morph(scw.substr(n - 1))); else { - char sign = scw[n]; - scw[n] = '\0'; - cat_result(result, pSMgr->suggest_morph(scw.c_str() + n - 1)); + std::string chunk = scw.substr(n - 1, 1); + cat_result(result, pSMgr->suggest_morph(chunk)); result.push_back('+'); // XXX SPEC. 
MORPHCODE - scw[n] = sign; - cat_result(result, pSMgr->suggest_morph(scw.c_str() + n)); + cat_result(result, pSMgr->suggest_morph(scw.substr(n))); } - return line_tok(result.c_str(), slst, MSEP_REC); + return line_tok(result, MSEP_REC); } } // END OF LANG_hu section @@ -1411,52 +1444,52 @@ int Hunspell::analyze(char*** slst, const char* word) { case HUHCAP: case HUHINITCAP: case NOCAP: { - cat_result(result, pSMgr->suggest_morph(scw.c_str())); + cat_result(result, pSMgr->suggest_morph(scw)); if (abbv) { std::string u8buffer(scw); u8buffer.push_back('.'); - cat_result(result, pSMgr->suggest_morph(u8buffer.c_str())); + cat_result(result, pSMgr->suggest_morph(u8buffer)); } break; } case INITCAP: { - wl = mkallsmall2(scw, sunicw); + mkallsmall2(scw, sunicw); std::string u8buffer(scw); mkinitcap2(scw, sunicw); - cat_result(result, pSMgr->suggest_morph(u8buffer.c_str())); - cat_result(result, pSMgr->suggest_morph(scw.c_str())); + cat_result(result, pSMgr->suggest_morph(u8buffer)); + cat_result(result, pSMgr->suggest_morph(scw)); if (abbv) { u8buffer.push_back('.'); - cat_result(result, pSMgr->suggest_morph(u8buffer.c_str())); + cat_result(result, pSMgr->suggest_morph(u8buffer)); u8buffer = scw; u8buffer.push_back('.'); - cat_result(result, pSMgr->suggest_morph(u8buffer.c_str())); + cat_result(result, pSMgr->suggest_morph(u8buffer)); } break; } case ALLCAP: { - cat_result(result, pSMgr->suggest_morph(scw.c_str())); + cat_result(result, pSMgr->suggest_morph(scw)); if (abbv) { std::string u8buffer(scw); u8buffer.push_back('.'); - cat_result(result, pSMgr->suggest_morph(u8buffer.c_str())); + cat_result(result, pSMgr->suggest_morph(u8buffer)); } mkallsmall2(scw, sunicw); std::string u8buffer(scw); mkinitcap2(scw, sunicw); - cat_result(result, pSMgr->suggest_morph(u8buffer.c_str())); - cat_result(result, pSMgr->suggest_morph(scw.c_str())); + cat_result(result, pSMgr->suggest_morph(u8buffer)); + cat_result(result, pSMgr->suggest_morph(scw)); if (abbv) { u8buffer.push_back('.'); 
- cat_result(result, pSMgr->suggest_morph(u8buffer.c_str())); + cat_result(result, pSMgr->suggest_morph(u8buffer)); u8buffer = scw; u8buffer.push_back('.'); - cat_result(result, pSMgr->suggest_morph(u8buffer.c_str())); + cat_result(result, pSMgr->suggest_morph(u8buffer)); } break; } @@ -1470,62 +1503,58 @@ int Hunspell::analyze(char*** slst, const char* word) { else reverseword(result); } - return line_tok(result.c_str(), slst, MSEP_REC); + return line_tok(result, MSEP_REC); } // compound word with dash (HU) I18n // LANG_hu section: set dash information for suggestions size_t dash_pos = langnum == LANG_hu ? scw.find('-') : std::string::npos; - int nresult = 0; if (dash_pos != std::string::npos) { + int nresult = 0; + std::string part1 = scw.substr(0, dash_pos); std::string part2 = scw.substr(dash_pos+1); // examine 2 sides of the dash if (part2.empty()) { // base word ending with dash - if (spell(part1.c_str())) { - char* p = pSMgr->suggest_morph(part1.c_str()); - if (p) { - int ret = line_tok(p, slst, MSEP_REC); - free(p); - return ret; + if (spell(part1)) { + std::string p = pSMgr->suggest_morph(part1); + if (!p.empty()) { + slst = line_tok(p, MSEP_REC); + return slst; } } } else if (part2.size() == 1 && part2[0] == 'e') { // XXX (HU) -e hat. - if (spell(part1.c_str()) && (spell("-e"))) { - char* st = pSMgr->suggest_morph(part1.c_str()); - if (st) { + if (spell(part1) && (spell("-e"))) { + std::string st = pSMgr->suggest_morph(part1); + if (!st.empty()) { result.append(st); - free(st); } result.push_back('+'); // XXX spec. separator in MORPHCODE st = pSMgr->suggest_morph("-e"); - if (st) { + if (!st.empty()) { result.append(st); - free(st); } - return line_tok(result.c_str(), slst, MSEP_REC); + return line_tok(result, MSEP_REC); } } else { // first word ending with dash: word- XXX ??? 
part1.push_back(' '); - nresult = spell(part1.c_str()); + nresult = spell(part1); part1.erase(part1.size() - 1); - if (nresult && spell(part2.c_str()) && + if (nresult && spell(part2) && ((part2.size() > 1) || ((part2[0] > '0') && (part2[0] < '9')))) { - char* st = pSMgr->suggest_morph(part1.c_str()); - if (st) { + std::string st = pSMgr->suggest_morph(part1); + if (!st.empty()) { result.append(st); - free(st); result.push_back('+'); // XXX spec. separator in MORPHCODE } - st = pSMgr->suggest_morph(part2.c_str()); - if (st) { + st = pSMgr->suggest_morph(part2); + if (!st.empty()) { result.append(st); - free(st); } - return line_tok(result.c_str(), slst, MSEP_REC); + return line_tok(result, MSEP_REC); } } // affixed number in correct word @@ -1550,37 +1579,38 @@ int Hunspell::analyze(char*** slst, const char* word) { continue; } std::string chunk = scw.substr(dash_pos - n); - if (checkword(chunk.c_str(), NULL, NULL)) { + if (checkword(chunk, NULL, NULL)) { result.append(chunk); - char* st = pSMgr->suggest_morph(chunk.c_str()); - if (st) { + std::string st = pSMgr->suggest_morph(chunk); + if (!st.empty()) { result.append(st); - free(st); } - return line_tok(result.c_str(), slst, MSEP_REC); + return line_tok(result, MSEP_REC); } } } } - return 0; + return slst; } -int Hunspell::generate(char*** slst, const char* word, char** pl, int pln) { - *slst = NULL; - if (!pSMgr || !pln) - return 0; - char** pl2; - int pl2n = analyze(&pl2, word); +std::vector<std::string> Hunspell::generate(const std::string& word, const std::vector<std::string>& pl) { + return m_Impl->generate(word, pl); +} + +std::vector<std::string> HunspellImpl::generate(const std::string& word, const std::vector<std::string>& pl) { + std::vector<std::string> slst; + if (!pSMgr || pl.empty()) + return slst; + std::vector<std::string> pl2 = analyze(word); int captype = NOCAP; int abbv = 0; std::string cw; cleanword(cw, word, &captype, &abbv); std::string result; - for (int i = 0; i < pln; i++) { - 
cat_result(result, pSMgr->suggest_gen(pl2, pl2n, pl[i])); + for (size_t i = 0; i < pl.size(); ++i) { + cat_result(result, pSMgr->suggest_gen(pl2, pl[i])); } - freelist(&pl2, pl2n); if (!result.empty()) { // allcap @@ -1588,50 +1618,42 @@ int Hunspell::generate(char*** slst, const char* word, char** pl, int pln) { mkallcap(result); // line split - int linenum = line_tok(result.c_str(), slst, MSEP_REC); + slst = line_tok(result, MSEP_REC); // capitalize if (captype == INITCAP || captype == HUHINITCAP) { - for (int j = 0; j < linenum; j++) { - std::string form((*slst)[j]); - free((*slst)[j]); - mkinitcap(form); - (*slst)[j] = mystrdup(form.c_str()); + for (size_t j = 0; j < slst.size(); ++j) { + mkinitcap(slst[j]); } } // temporary filtering of prefix related errors (eg. // generate("undrinkable", "eats") --> "undrinkables" and "*undrinks") - - int r = 0; - for (int j = 0; j < linenum; j++) { - if (!spell((*slst)[j])) { - free((*slst)[j]); - (*slst)[j] = NULL; - } else { - if (r < j) - (*slst)[r] = (*slst)[j]; - r++; + std::vector<std::string>::iterator it = slst.begin(); + while (it != slst.end()) { + if (!spell(*it)) { + it = slst.erase(it); + } else { + ++it; } } - if (r > 0) - return r; - free(*slst); - *slst = NULL; } - return 0; + return slst; } -int Hunspell::generate(char*** slst, const char* word, const char* pattern) { - char** pl; - int pln = analyze(&pl, pattern); - int n = generate(slst, word, pl, pln); - freelist(&pl, pln); - return uniqlist(*slst, n); +std::vector<std::string> Hunspell::generate(const std::string& word, const std::string& pattern) { + return m_Impl->generate(word, pattern); +} + +std::vector<std::string> HunspellImpl::generate(const std::string& word, const std::string& pattern) { + std::vector<std::string> pl = analyze(pattern); + std::vector<std::string> slst = generate(word, pl); + uniqlist(slst); + return slst; } // minimal XML parser functions -std::string Hunspell::get_xml_par(const char* par) { +std::string 
HunspellImpl::get_xml_par(const char* par) { std::string dest; if (!par) return dest; @@ -1639,7 +1661,7 @@ std::string Hunspell::get_xml_par(const char* par) { if (end == '>') end = '<'; else if (end != '\'' && end != '"') - return 0; // bad XML + return dest; // bad XML for (par++; *par != '\0' && *par != end; ++par) { dest.push_back(*par); } @@ -1649,29 +1671,54 @@ std::string Hunspell::get_xml_par(const char* par) { } int Hunspell::get_langnum() const { + return m_Impl->get_langnum(); +} + +int HunspellImpl::get_langnum() const { return langnum; } +bool Hunspell::input_conv(const std::string& word, std::string& dest) { + return m_Impl->input_conv(word, dest); +} + int Hunspell::input_conv(const char* word, char* dest, size_t destsize) { - RepList* rl = (pAMgr) ? pAMgr->get_iconvtable() : NULL; - return (rl && rl->conv(word, dest, destsize) > 0); + std::string d; + bool ret = input_conv(word, d); + if (ret && d.size() < destsize) { + strncpy(dest, d.c_str(), destsize); + return 1; + } + return 0; +} + +bool HunspellImpl::input_conv(const std::string& word, std::string& dest) { + RepList* rl = pAMgr ? 
pAMgr->get_iconvtable() : NULL; + if (rl) { + return rl->conv(word, dest); + } + dest.assign(word); + return false; } // return the beginning of the element (attr == NULL) or the attribute -const char* Hunspell::get_xml_pos(const char* s, const char* attr) { +const char* HunspellImpl::get_xml_pos(const char* s, const char* attr) { const char* end = strchr(s, '>'); - const char* p = s; if (attr == NULL) return end; - do { + const char* p = s; + while (1) { p = strstr(p, attr); if (!p || p >= end) return 0; - } while (*(p - 1) != ' ' && *(p - 1) != '\n'); + if (*(p - 1) == ' ' || *(p - 1) == '\n') + break; + p += strlen(attr); + } return p + strlen(attr); } -int Hunspell::check_xml_par(const char* q, +int HunspellImpl::check_xml_par(const char* q, const char* attr, const char* value) { std::string cw = get_xml_par(get_xml_pos(q, attr)); @@ -1680,53 +1727,48 @@ int Hunspell::check_xml_par(const char* q, return 0; } -int Hunspell::get_xml_list(char*** slst, const char* list, const char* tag) { +std::vector<std::string> HunspellImpl::get_xml_list(const char* list, const char* tag) { + std::vector<std::string> slst; if (!list) - return 0; - int n = 0; - const char* p; - for (p = list; ((p = strstr(p, tag)) != NULL); p++) - n++; - if (n == 0) - return 0; - *slst = (char**)malloc(sizeof(char*) * n); - if (!*slst) - return 0; - for (p = list, n = 0; ((p = strstr(p, tag)) != NULL); p++, n++) { + return slst; + const char* p = list; + for (size_t n = 0; ((p = strstr(p, tag)) != NULL); ++p, ++n) { std::string cw = get_xml_par(p + strlen(tag) - 1); if (cw.empty()) { break; } - (*slst)[n] = mystrdup(cw.c_str()); + slst.push_back(cw); } - return n; + return slst; } -int Hunspell::spellml(char*** slst, const char* word) { +std::vector<std::string> HunspellImpl::spellml(const std::string& in_word) { + std::vector<std::string> slst; + + const char* word = in_word.c_str(); + const char* q = strstr(word, "<query"); if (!q) - return 0; // bad XML input + return slst; // bad XML input 
const char* q2 = strchr(q, '>'); if (!q2) - return 0; // bad XML input + return slst; // bad XML input q2 = strstr(q2, "<word"); if (!q2) - return 0; // bad XML input + return slst; // bad XML input if (check_xml_par(q, "type=", "analyze")) { - int n = 0; std::string cw = get_xml_par(strchr(q2, '>')); if (!cw.empty()) - n = analyze(slst, cw.c_str()); - if (n == 0) - return 0; + slst = analyze(cw); + if (slst.empty()) + return slst; // convert the result to <code><a>ana1</a><a>ana2</a></code> format std::string r; r.append("<code>"); - for (int i = 0; i < n; i++) { + for (size_t i = 0; i < slst.size(); ++i) { r.append("<a>"); - std::string entry((*slst)[i]); - free((*slst)[i]); + std::string entry(slst[i]); mystrrep(entry, "\t", " "); mystrrep(entry, "&", "&amp;"); mystrrep(entry, "<", "&lt;"); @@ -1735,36 +1777,101 @@ int Hunspell::spellml(char*** slst, const char* word) { r.append("</a>"); } r.append("</code>"); - (*slst)[0] = mystrdup(r.c_str()); - return 1; + slst.clear(); + slst.push_back(r); + return slst; } else if (check_xml_par(q, "type=", "stem")) { std::string cw = get_xml_par(strchr(q2, '>')); if (!cw.empty()) - return stem(slst, cw.c_str()); + return stem(cw); } else if (check_xml_par(q, "type=", "generate")) { std::string cw = get_xml_par(strchr(q2, '>')); if (cw.empty()) - return 0; + return slst; const char* q3 = strstr(q2 + 1, "<word"); if (q3) { std::string cw2 = get_xml_par(strchr(q3, '>')); if (!cw2.empty()) { - return generate(slst, cw.c_str(), cw2.c_str()); + return generate(cw, cw2); } } else { if ((q2 = strstr(q2 + 1, "<code")) != NULL) { - char** slst2; - int n = get_xml_list(&slst2, strchr(q2, '>'), "<a>"); - if (n != 0) { - int n2 = generate(slst, cw.c_str(), slst2, n); - freelist(&slst2, n); - return uniqlist(*slst, n2); + std::vector<std::string> slst2 = get_xml_list(strchr(q2, '>'), "<a>"); + if (!slst2.empty()) { + slst = generate(cw, slst2); + uniqlist(slst); + return slst; } - freelist(&slst2, n); } } } - return 0; + return slst; +} + +int
Hunspell::spell(const char* word, int* info, char** root) { + std::string sroot; + bool ret = m_Impl->spell(word, info, root ? &sroot : NULL); + if (root) { + if (sroot.empty()) { + *root = NULL; + } else { + *root = mystrdup(sroot.c_str()); + } + } + return ret; +} + +namespace { + int munge_vector(char*** slst, const std::vector<std::string>& items) { + if (items.empty()) { + *slst = NULL; + return 0; + } else { + *slst = (char**)malloc(sizeof(char*) * items.size()); + if (!*slst) + return 0; + for (size_t i = 0; i < items.size(); ++i) + (*slst)[i] = mystrdup(items[i].c_str()); + } + return items.size(); + } +} + +void Hunspell::free_list(char*** slst, int n) { + Hunspell_free_list((Hunhandle*)(this), slst, n); +} + +int Hunspell::suggest(char*** slst, const char* word) { + return Hunspell_suggest((Hunhandle*)(this), slst, word); +} + +int Hunspell::suffix_suggest(char*** slst, const char* root_word) { + std::vector<std::string> stems = m_Impl->suffix_suggest(root_word); + return munge_vector(slst, stems); +} + +char* Hunspell::get_dic_encoding() { + return &(m_Impl->dic_encoding_vec[0]); +} + +int Hunspell::stem(char*** slst, char** desc, int n) { + return Hunspell_stem2((Hunhandle*)(this), slst, desc, n); +} + +int Hunspell::stem(char*** slst, const char* word) { + return Hunspell_stem((Hunhandle*)(this), slst, word); +} + +int Hunspell::analyze(char*** slst, const char* word) { + return Hunspell_analyze((Hunhandle*)(this), slst, word); +} + +int Hunspell::generate(char*** slst, const char* word, char** pl, int pln) { + return Hunspell_generate2((Hunhandle*)(this), slst, word, pl, pln); +} + +int Hunspell::generate(char*** slst, const char* word, const char* pattern) { + return Hunspell_generate((Hunhandle*)(this), slst, word, pattern); } Hunhandle* Hunspell_create(const char* affpath, const char* dpath) { @@ -1774,46 +1881,56 @@ Hunhandle* Hunspell_create(const char* affpath, const char* dpath) { Hunhandle* Hunspell_create_key(const char* affpath, const char* 
dpath, const char* key) { - return (Hunhandle*)(new Hunspell(affpath, dpath, key)); + return reinterpret_cast<Hunhandle*>(new Hunspell(affpath, dpath, key)); } void Hunspell_destroy(Hunhandle* pHunspell) { - delete (Hunspell*)(pHunspell); + delete reinterpret_cast<Hunspell*>(pHunspell); } int Hunspell_add_dic(Hunhandle* pHunspell, const char* dpath) { - return ((Hunspell*)pHunspell)->add_dic(dpath); + return reinterpret_cast<Hunspell*>(pHunspell)->add_dic(dpath); } int Hunspell_spell(Hunhandle* pHunspell, const char* word) { - return ((Hunspell*)pHunspell)->spell(word); + return reinterpret_cast<Hunspell*>(pHunspell)->spell(std::string(word)); } char* Hunspell_get_dic_encoding(Hunhandle* pHunspell) { - return ((Hunspell*)pHunspell)->get_dic_encoding(); + return reinterpret_cast<Hunspell*>(pHunspell)->get_dic_encoding(); } int Hunspell_suggest(Hunhandle* pHunspell, char*** slst, const char* word) { - return ((Hunspell*)pHunspell)->suggest(slst, word); + std::vector<std::string> suggests = reinterpret_cast<Hunspell*>(pHunspell)->suggest(word); + return munge_vector(slst, suggests); } int Hunspell_analyze(Hunhandle* pHunspell, char*** slst, const char* word) { - return ((Hunspell*)pHunspell)->analyze(slst, word); + std::vector<std::string> stems = reinterpret_cast<Hunspell*>(pHunspell)->analyze(word); + return munge_vector(slst, stems); } int Hunspell_stem(Hunhandle* pHunspell, char*** slst, const char* word) { - return ((Hunspell*)pHunspell)->stem(slst, word); + + std::vector<std::string> stems = reinterpret_cast<Hunspell*>(pHunspell)->stem(word); + return munge_vector(slst, stems); } int Hunspell_stem2(Hunhandle* pHunspell, char*** slst, char** desc, int n) { - return ((Hunspell*)pHunspell)->stem(slst, desc, n); + std::vector<std::string> morph; + for (int i = 0; i < n; ++i) + morph.push_back(desc[i]); + + std::vector<std::string> stems = reinterpret_cast<Hunspell*>(pHunspell)->stem(morph); + return munge_vector(slst, stems); } int Hunspell_generate(Hunhandle* 
pHunspell, char*** slst, const char* word, - const char* word2) { - return ((Hunspell*)pHunspell)->generate(slst, word, word2); + const char* pattern) { + std::vector<std::string> stems = reinterpret_cast<Hunspell*>(pHunspell)->generate(word, pattern); + return munge_vector(slst, stems); } int Hunspell_generate2(Hunhandle* pHunspell, @@ -1821,7 +1938,12 @@ int Hunspell_generate2(Hunhandle* pHunspell, const char* word, char** desc, int n) { - return ((Hunspell*)pHunspell)->generate(slst, word, desc, n); + std::vector<std::string> morph; + for (int i = 0; i < n; ++i) + morph.push_back(desc[i]); + + std::vector<std::string> stems = reinterpret_cast<Hunspell*>(pHunspell)->generate(word, morph); + return munge_vector(slst, stems); } /* functions for run-time modification of the dictionary */ @@ -1829,7 +1951,7 @@ int Hunspell_generate2(Hunhandle* pHunspell, /* add word to the run-time dictionary */ int Hunspell_add(Hunhandle* pHunspell, const char* word) { - return ((Hunspell*)pHunspell)->add(word); + return reinterpret_cast<Hunspell*>(pHunspell)->add(word); } /* add word to the run-time dictionary with affix flags of @@ -1840,25 +1962,35 @@ int Hunspell_add(Hunhandle* pHunspell, const char* word) { int Hunspell_add_with_affix(Hunhandle* pHunspell, const char* word, const char* example) { - return ((Hunspell*)pHunspell)->add_with_affix(word, example); + return reinterpret_cast<Hunspell*>(pHunspell)->add_with_affix(word, example); } /* remove word from the run-time dictionary */ int Hunspell_remove(Hunhandle* pHunspell, const char* word) { - return ((Hunspell*)pHunspell)->remove(word); + return reinterpret_cast<Hunspell*>(pHunspell)->remove(word); } -void Hunspell_free_list(Hunhandle*, char*** slst, int n) { - freelist(slst, n); +void Hunspell_free_list(Hunhandle*, char*** list, int n) { + if (list && *list) { + for (int i = 0; i < n; i++) + free((*list)[i]); + free(*list); + *list = NULL; + } } -int Hunspell::suffix_suggest(char*** slst, const char* root_word) { 
+std::vector<std::string> Hunspell::suffix_suggest(const std::string& root_word) { + return m_Impl->suffix_suggest(root_word); +} + +std::vector<std::string> HunspellImpl::suffix_suggest(const std::string& root_word) { + std::vector<std::string> slst; struct hentry* he = NULL; int len; std::string w2; const char* word; - char* ignoredchars = pAMgr->get_ignore(); + const char* ignoredchars = pAMgr->get_ignore(); if (ignoredchars != NULL) { w2.assign(root_word); if (utf8) { @@ -1870,26 +2002,18 @@ int Hunspell::suffix_suggest(char*** slst, const char* root_word) { } word = w2.c_str(); } else - word = root_word; + word = root_word.c_str(); len = strlen(word); if (!len) - return 0; + return slst; - char** wlst = (char**)malloc(MAXSUGGESTION * sizeof(char*)); - if (wlst == NULL) - return -1; - *slst = wlst; - for (int i = 0; i < MAXSUGGESTION; i++) { - wlst[i] = NULL; - } - - for (int i = 0; (i < maxdic) && !he; i++) { - he = (pHMgr[i])->lookup(word); + for (size_t i = 0; (i < m_HMgrs.size()) && !he; ++i) { + he = m_HMgrs[i]->lookup(word); } if (he) { - return pAMgr->get_suffix_words(he->astr, he->alen, root_word, *slst); + slst = pAMgr->get_suffix_words(he->astr, he->alen, root_word.c_str()); } - return 0; + return slst; } diff --git a/libs/hunspell/src/hunspell.h b/libs/hunspell/src/hunspell.h index 726bbe2077..3aca30ab2f 100644 --- a/libs/hunspell/src/hunspell.h +++ b/libs/hunspell/src/hunspell.h @@ -38,8 +38,8 @@ * * ***** END LICENSE BLOCK ***** */ -#ifndef _MYSPELLMGR_H_ -#define _MYSPELLMGR_H_ +#ifndef MYSPELLMGR_H_ +#define MYSPELLMGR_H_ #include "hunvisapi.h" diff --git a/libs/hunspell/src/hunspell.hxx b/libs/hunspell/src/hunspell.hxx index 3bcf75e39c..a06bdd43ab 100644 --- a/libs/hunspell/src/hunspell.hxx +++ b/libs/hunspell/src/hunspell.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 
(the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -70,26 +67,33 @@ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ +#ifndef MYSPELLMGR_HXX_ +#define MYSPELLMGR_HXX_ #include "hunvisapi.h" - -#include "hashmgr.hxx" -#include "affixmgr.hxx" -#include "suggestmgr.hxx" -#include "langnum.hxx" +#include "w_char.hxx" +#include "atypes.hxx" +#include <string> #include <vector> #define SPELL_XML "<?xml?>" -#define MAXDIC 20 #define MAXSUGGESTION 15 #define MAXSHARPS 5 -#define HUNSPELL_OK (1 << 0) -#define HUNSPELL_OK_WARN (1 << 1) +#ifndef MAXWORDLEN +#define MAXWORDLEN 100 +#endif -#ifndef _MYSPELLMGR_HXX_ -#define _MYSPELLMGR_HXX_ +#if defined __GNUC__ && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1)) +# define H_DEPRECATED __attribute__((__deprecated__)) +#elif defined(_MSC_VER) && (_MSC_VER >= 1300) +# define H_DEPRECATED __declspec(deprecated) +#else +# define H_DEPRECATED +#endif + +class HunspellImpl; class LIBHUNSPELL_DLL_EXPORTED Hunspell { private: @@ -97,17 +101,7 @@ class LIBHUNSPELL_DLL_EXPORTED Hunspell { Hunspell& operator=(const Hunspell&); private: - AffixMgr* pAMgr; - HashMgr* pHMgr[MAXDIC]; - int maxdic; - SuggestMgr* pSMgr; - char* affixpath; - char* encoding; - struct cs_info* csconv; - int langnum; - int utf8; - int complexprefixes; 
- char** wordbreak; + HunspellImpl* m_Impl; public: /* Hunspell(aff, dic) - constructor of Hunspell class @@ -125,7 +119,7 @@ class LIBHUNSPELL_DLL_EXPORTED Hunspell { int add_dic(const char* dpath, const char* key = NULL); /* spell(word) - spellcheck word - * output: 0 = bad word, not 0 = good word + * output: false = bad word, true = good word * * plus output: * info: information bit array, fields: @@ -133,8 +127,8 @@ class LIBHUNSPELL_DLL_EXPORTED Hunspell { * SPELL_FORBIDDEN = an explicit forbidden word * root: root (stem), when input is a word with affix(es) */ - - int spell(const char* word, int* info = NULL, char** root = NULL); + bool spell(const std::string& word, int* info = NULL, std::string* root = NULL); + H_DEPRECATED int spell(const char* word, int* info = NULL, char** root = NULL); /* suggest(suggestions, word) - search suggestions * input: pointer to an array of strings pointer and the (bad) word @@ -143,8 +137,8 @@ class LIBHUNSPELL_DLL_EXPORTED Hunspell { * a newly allocated array of strings (*slts will be NULL when number * of suggestion equals 0.) */ - - int suggest(char*** slst, const char* word); + std::vector<std::string> suggest(const std::string& word); + H_DEPRECATED int suggest(char*** slst, const char* word); /* Suggest words from suffix rules * suffix_suggest(suggestions, root_word) @@ -154,36 +148,37 @@ class LIBHUNSPELL_DLL_EXPORTED Hunspell { * a newly allocated array of strings (*slts will be NULL when number * of suggestion equals 0.) 
*/ - int suffix_suggest(char*** slst, const char* root_word); + std::vector<std::string> suffix_suggest(const std::string& root_word); + H_DEPRECATED int suffix_suggest(char*** slst, const char* root_word); /* deallocate suggestion lists */ + H_DEPRECATED void free_list(char*** slst, int n); - void free_list(char*** slst, int n); - + const std::string& get_dict_encoding() const; char* get_dic_encoding(); /* morphological functions */ /* analyze(result, word) - morphological analysis of the word */ + std::vector<std::string> analyze(const std::string& word); + H_DEPRECATED int analyze(char*** slst, const char* word); - int analyze(char*** slst, const char* word); + /* stem(word) - stemmer function */ + std::vector<std::string> stem(const std::string& word); + H_DEPRECATED int stem(char*** slst, const char* word); - /* stem(result, word) - stemmer function */ - - int stem(char*** slst, const char* word); - - /* stem(result, analysis, n) - get stems from a morph. analysis + /* stem(analysis, n) - get stems from a morph. analysis * example: * char ** result, result2; * int n1 = analyze(&result, "words"); * int n2 = stem(&result2, result, n1); */ - - int stem(char*** slst, char** morph, int n); + std::vector<std::string> stem(const std::vector<std::string>& morph); + H_DEPRECATED int stem(char*** slst, char** morph, int n); /* generate(result, word, word2) - morphological generation by example(s) */ - - int generate(char*** slst, const char* word, const char* word2); + std::vector<std::string> generate(const std::string& word, const std::string& word2); + H_DEPRECATED int generate(char*** slst, const char* word, const char* word2); /* generate(result, word, desc, n) - generation by morph. 
description(s) * example: @@ -192,71 +187,43 @@ class LIBHUNSPELL_DLL_EXPORTED Hunspell { * int n = generate(&result, "word", &affix, 1); * for (int i = 0; i < n; i++) printf("%s\n", result[i]); */ - - int generate(char*** slst, const char* word, char** desc, int n); + std::vector<std::string> generate(const std::string& word, const std::vector<std::string>& pl); + H_DEPRECATED int generate(char*** slst, const char* word, char** desc, int n); /* functions for run-time modification of the dictionary */ /* add word to the run-time dictionary */ - int add(const char* word); + int add(const std::string& word); /* add word to the run-time dictionary with affix flags of * the example (a dictionary word): Hunspell will recognize * affixed forms of the new word, too. */ - int add_with_affix(const char* word, const char* example); + int add_with_affix(const std::string& word, const std::string& example); /* remove word from the run-time dictionary */ - int remove(const char* word); + int remove(const std::string& word); /* other */ /* get extra word characters definied in affix file for tokenization */ - const char* get_wordchars(); - const std::vector<w_char>& get_wordchars_utf16(); + const char* get_wordchars() const; + const std::string& get_wordchars_cpp() const; + const std::vector<w_char>& get_wordchars_utf16() const; struct cs_info* get_csconv(); - const char* get_version(); + + const char* get_version() const; + const std::string& get_version_cpp() const; int get_langnum() const; /* need for putdic */ - int input_conv(const char* word, char* dest, size_t destsize); - - inline char *get_try_string() - { - return pAMgr->get_try_string(); - } - - private: - void cleanword(std::string& dest, const char*, int* pcaptype, int* pabbrev); - size_t cleanword2(std::string& dest, - std::vector<w_char>& dest_u, - const char*, - int* w_len, - int* pcaptype, - size_t* pabbrev); - void mkinitcap(std::string& u8); - int mkinitcap2(std::string& u8, std::vector<w_char>& u16); - int 
mkinitsmall2(std::string& u8, std::vector<w_char>& u16); - void mkallcap(std::string& u8); - int mkallsmall2(std::string& u8, std::vector<w_char>& u16); - struct hentry* checkword(const char*, int* info, char** root); - std::string sharps_u8_l1(const std::string& source); - hentry* - spellsharps(std::string& base, size_t start_pos, int, int, int* info, char** root); - int is_keepcase(const hentry* rv); - int insert_sug(char*** slst, const char* word, int ns); - void cat_result(std::string& result, char* st); - char* stem_description(const char* desc); - int spellml(char*** slst, const char* word); - std::string get_xml_par(const char* par); - const char* get_xml_pos(const char* s, const char* attr); - int get_xml_list(char*** slst, const char* list, const char* tag); - int check_xml_par(const char* q, const char* attr, const char* value); + bool input_conv(const std::string& word, std::string& dest); + H_DEPRECATED int input_conv(const char* word, char* dest, size_t destsize); }; #endif diff --git a/libs/hunspell/src/hunspelldll.h b/libs/hunspell/src/hunspelldll.h deleted file mode 100644 index 32d168236a..0000000000 --- a/libs/hunspell/src/hunspelldll.h +++ /dev/null @@ -1,39 +0,0 @@ -/* ***** BEGIN LICENSE BLOCK *****
- * Version: MPL 1.1/GPL 2.0/LGPL 2.1
- *
- * The contents of this file are subject to the Mozilla Public License Version
- * 1.1 (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- * http://www.mozilla.org/MPL/
- *
- * Software distributed under the License is distributed on an "AS IS" basis,
- * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
- * for the specific language governing rights and limitations under the
- * License.
- *
- * Copyright (C) 2006
- * Miha Vrhovnik (http://simail.sf.net, http://xcollect.sf.net)
- * All Rights Reserved.
- *
- * Contributor(s):
- *
- *
- * Alternatively, the contents of this file may be used under the terms of
- * either the GNU General Public License Version 2 or later (the "GPL"), or
- * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
- * in which case the provisions of the GPL or the LGPL are applicable instead
- * of those above. If you wish to allow use of your version of this file only
- * under the terms of either the GPL or the LGPL, and not to allow others to
- * use your version of this file under the terms of the MPL, indicate your
- * decision by deleting the provisions above and replace them with the notice
- * and other provisions required by the GPL or the LGPL. If you do not delete
- * the provisions above, a recipient may use your version of this file under
- * the terms of any one of the MPL, the GPL or the LGPL.
- *
- * ***** END LICENSE BLOCK ***** **/
-#include "hunspell.hxx"
-
-#ifndef _DLL_H_
-#define _DLL_H_
-
-#endif /* _DLL_H_ */
diff --git a/libs/hunspell/src/hunvisapi.h b/libs/hunspell/src/hunvisapi.h index abf025ae97..eb2b348091 100644 --- a/libs/hunspell/src/hunvisapi.h +++ b/libs/hunspell/src/hunvisapi.h @@ -1,5 +1,5 @@ -#ifndef _HUNSPELL_VISIBILITY_H_ -#define _HUNSPELL_VISIBILITY_H_ +#ifndef HUNSPELL_VISIBILITY_H_ +#define HUNSPELL_VISIBILITY_H_ #if defined(HUNSPELL_STATIC) # define LIBHUNSPELL_DLL_EXPORTED @@ -9,7 +9,7 @@ # else # define LIBHUNSPELL_DLL_EXPORTED __declspec(dllimport) # endif -#elif defined(BUILDING_LIBHUNSPELL) && @HAVE_VISIBILITY@ +#elif defined(BUILDING_LIBHUNSPELL) && 1 # define LIBHUNSPELL_DLL_EXPORTED __attribute__((__visibility__("default"))) #else # define LIBHUNSPELL_DLL_EXPORTED diff --git a/libs/hunspell/src/hunzip.c++ b/libs/hunspell/src/hunzip.cxx index b2788a1055..8962b100b1 100644 --- a/libs/hunspell/src/hunzip.c++ +++ b/libs/hunspell/src/hunzip.cxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -59,7 +56,7 @@ int Hunzip::fail(const char* err, const char* par) { } Hunzip::Hunzip(const char* file, const char* key) - : fin(NULL), bufsiz(0), lastbit(0), inc(0), inbits(0), outc(0), dec(NULL) { + : bufsiz(0), lastbit(0), inc(0), inbits(0), outc(0) { in[0] = out[0] = line[0] = '\0'; filename = mystrdup(file); if (getcode(key) == -1) @@ -70,19 +67,19 @@ Hunzip::Hunzip(const char* file, const char* key) int Hunzip::getcode(const char* key) { unsigned char c[2]; - int i, j, n, p; + int i, j, n; int allocatedbit = BASEBITREC; const char* enc = key; if (!filename) return -1; - fin = myfopen(filename, "rb"); - if (!fin) + myopen(fin, filename, std::ios_base::in | std::ios_base::binary); + if (!fin.is_open()) return -1; // read magic number - if ((fread(in, 1, 3, fin) < MAGICLEN) || + if (!fin.read(in, 3) || !(strncmp(MAGIC, in, MAGICLEN) == 0 || strncmp(MAGIC_ENCRYPT, in, MAGICLEN) == 0)) { return fail(MSG_FORMAT, filename); @@ -93,7 +90,7 @@ int Hunzip::getcode(const char* key) { unsigned char cs; if (!key) return fail(MSG_KEY, filename); - if (fread(&c, 1, 1, fin) < 1) + if (!fin.read(reinterpret_cast<char*>(c), 1)) return fail(MSG_FORMAT, filename); for (cs = 0; *enc; enc++) cs ^= *enc; @@ -104,7 +101,7 @@ int Hunzip::getcode(const char* key) { key = NULL; // read record count - if (fread(&c, 1, 2, fin) < 2) + if (!fin.read(reinterpret_cast<char*>(c), 2)) return fail(MSG_FORMAT, filename); if (key) { @@ -115,16 +112,14 @@ int Hunzip::getcode(const char* key) { } n = ((int)c[0] << 8) + c[1]; - dec = (struct bit*)malloc(BASEBITREC * sizeof(struct bit)); - if (!dec) - return fail(MSG_MEMORY, filename); + dec.resize(BASEBITREC); dec[0].v[0] = 0; dec[0].v[1] = 0; // read codes for (i = 0; i < n; i++) { unsigned char l; - if (fread(c, 1, 2, fin) < 2) + if (!fin.read(reinterpret_cast<char*>(c), 2)) return fail(MSG_FORMAT, filename); if (key) { 
if (*(++enc) == '\0') @@ -134,14 +129,14 @@ int Hunzip::getcode(const char* key) { enc = key; c[1] ^= *enc; } - if (fread(&l, 1, 1, fin) < 1) + if (!fin.read(reinterpret_cast<char*>(&l), 1)) return fail(MSG_FORMAT, filename); if (key) { if (*(++enc) == '\0') enc = key; l ^= *enc; } - if (fread(in, 1, l / 8 + 1, fin) < (size_t)l / 8 + 1) + if (!fin.read(in, l / 8 + 1)) return fail(MSG_FORMAT, filename); if (key) for (j = 0; j <= l / 8; j++) { @@ -149,7 +144,7 @@ int Hunzip::getcode(const char* key) { enc = key; in[j] ^= *enc; } - p = 0; + int p = 0; for (j = 0; j < l; j++) { int b = (in[j / 8] & (1 << (7 - (j % 8)))) ? 1 : 0; int oldp = p; @@ -158,7 +153,7 @@ int Hunzip::getcode(const char* key) { lastbit++; if (lastbit == allocatedbit) { allocatedbit += BASEBITREC; - dec = (struct bit*)realloc(dec, allocatedbit * sizeof(struct bit)); + dec.resize(allocatedbit); } dec[lastbit].v[0] = 0; dec[lastbit].v[1] = 0; @@ -173,10 +168,6 @@ int Hunzip::getcode(const char* key) { } Hunzip::~Hunzip() { - if (dec) - free(dec); - if (fin) - fclose(fin); if (filename) free(filename); } @@ -185,16 +176,17 @@ int Hunzip::getbuf() { int p = 0; int o = 0; do { - if (inc == 0) - inbits = fread(in, 1, BUFSIZE, fin) * 8; + if (inc == 0) { + fin.read(in, BUFSIZE); + inbits = fin.gcount() * 8; + } for (; inc < inbits; inc++) { int b = (in[inc / 8] & (1 << (7 - (inc % 8)))) ? 
1 : 0; int oldp = p; p = dec[p].v[b]; if (p == 0) { if (oldp == lastbit) { - fclose(fin); - fin = NULL; + fin.close(); // add last odd byte if (dec[lastbit].c[0]) out[o++] = dec[lastbit].c[1]; @@ -212,11 +204,11 @@ int Hunzip::getbuf() { return fail(MSG_FORMAT, filename); } -const char* Hunzip::getline() { +bool Hunzip::getline(std::string& dest) { char linebuf[BUFSIZE]; int l = 0, eol = 0, left = 0, right = 0; if (bufsiz == -1) - return NULL; + return false; while (l < bufsiz && !eol) { linebuf[l++] = out[outc]; switch (out[outc]) { @@ -251,7 +243,7 @@ const char* Hunzip::getline() { } if (++outc == bufsiz) { outc = 0; - bufsiz = fin ? getbuf() : -1; + bufsiz = fin.is_open() ? getbuf() : -1; } } if (right) @@ -259,5 +251,6 @@ const char* Hunzip::getline() { else linebuf[l] = '\0'; strcpy(line + left, linebuf); - return line; + dest.assign(line); + return true; } diff --git a/libs/hunspell/src/hunzip.hxx b/libs/hunspell/src/hunzip.hxx index 5082adddb0..ea2bc58d26 100644 --- a/libs/hunspell/src/hunzip.hxx +++ b/libs/hunspell/src/hunzip.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -41,12 +38,14 @@ /* hunzip: file decompression for sorted dictionaries with optional encryption, * algorithm: prefix-suffix encoding and 16-bit Huffman encoding */ -#ifndef _HUNZIP_HXX_ -#define _HUNZIP_HXX_ +#ifndef HUNZIP_HXX_ +#define HUNZIP_HXX_ #include "hunvisapi.h" #include <stdio.h> +#include <fstream> +#include <vector> #define BUFSIZE 65536 #define HZIP_EXTENSION ".hz" @@ -68,9 +67,9 @@ class LIBHUNSPELL_DLL_EXPORTED Hunzip { protected: char* filename; - FILE* fin; + std::ifstream fin; int bufsiz, lastbit, inc, inbits, outc; - struct bit* dec; // code table + std::vector<bit> dec; // code table char in[BUFSIZE]; // input buffer char out[BUFSIZE + 1]; // Huffman-decoded buffer char line[BUFSIZE + 50]; // decoded line @@ -81,7 +80,8 @@ class LIBHUNSPELL_DLL_EXPORTED Hunzip { public: Hunzip(const char* filename, const char* key = NULL); ~Hunzip(); - const char* getline(); + bool is_open() { return fin.is_open(); } + bool getline(std::string& dest); }; #endif diff --git a/libs/hunspell/src/langnum.hxx b/libs/hunspell/src/langnum.hxx index af5c86e4fe..a64d3d7869 100644 --- a/libs/hunspell/src/langnum.hxx +++ b/libs/hunspell/src/langnum.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). 
- * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -38,12 +35,12 @@ * * ***** END LICENSE BLOCK ***** */ -#ifndef _LANGNUM_HXX_ -#define _LANGNUM_HXX_ +#ifndef LANGNUM_HXX_ +#define LANGNUM_HXX_ /* language numbers for language specific codes - see http://l10n.openoffice.org/languages.html + see https://wiki.openoffice.org/w/index.php?title=Languages&oldid=230199 */ enum { diff --git a/libs/hunspell/src/phonet.c++ b/libs/hunspell/src/phonet.cxx index 17350e74a7..69601a2872 100644 --- a/libs/hunspell/src/phonet.c++ +++ b/libs/hunspell/src/phonet.cxx @@ -36,15 +36,13 @@ #include "phonet.hxx" void init_phonet_hash(phonetable& parms) { - int i, k; - - for (i = 0; i < HASHSIZE; i++) { + for (int i = 0; i < HASHSIZE; i++) { parms.hash[i] = -1; } - for (i = 0; parms.rules[i][0] != '\0'; i += 2) { + for (int i = 0; parms.rules[i][0] != '\0'; i += 2) { /** set hash value **/ - k = (unsigned char)parms.rules[i][0]; + int k = (unsigned char)parms.rules[i][0]; if (parms.hash[k] < 0) { parms.hash[k] = i; @@ -73,9 +71,8 @@ static int myisalpha(char ch) { std::string phonet(const std::string& inword, phonetable& parms) { int i, k = 0, p, z; - int k0, n0, p0 = -333, z0; + int k0, n0, p0 = -333; char c; - const char* s; typedef unsigned char uchar; size_t len = inword.size(); @@ -90,15 +87,15 @@ std::string phonet(const std::string& inword, phonetable& parms) { i = z = 0; while ((c = word[i]) != '\0') { int n = parms.hash[(uchar)c]; - z0 = 0; + int z0 = 0; - if (n >= 0) { + if (n >= 0 && !parms.rules[n].empty()) { /** check all rules for the same letter **/ while (parms.rules[n][0] == c) { /** check whole string **/ k = 1; /** number of found letters **/ p = 5; /** default priority **/ - s = 
parms.rules[n]; + const char*s = parms.rules[n].c_str(); s++; /** important for (see below) "*(s-1)" **/ while (*s != '\0' && word[i + k] == *s && !isdigit((unsigned char)*s) && @@ -142,13 +139,13 @@ std::string phonet(const std::string& inword, phonetable& parms) { n0 = parms.hash[(uchar)c0]; // if (parms.followup && k > 1 && n0 >= 0 - if (k > 1 && n0 >= 0 && p0 != (int)'-' && word[i + k] != '\0') { + if (k > 1 && n0 >= 0 && p0 != (int)'-' && word[i + k] != '\0' && !parms.rules[n0].empty()) { /** test follow-up rule for "word[i+k]" **/ while (parms.rules[n0][0] == c0) { /** check whole string **/ k0 = k; p0 = 5; - s = parms.rules[n0]; + s = parms.rules[n0].c_str(); s++; while (*s != '\0' && word[i + k0] == *s && !isdigit((unsigned char)*s) && @@ -206,9 +203,9 @@ std::string phonet(const std::string& inword, phonetable& parms) { } /** end of follow-up stuff **/ /** replace string **/ - s = parms.rules[n + 1]; - p0 = (parms.rules[n][0] != '\0' && - strchr(parms.rules[n] + 1, '<') != NULL) + s = parms.rules[n + 1].c_str(); + p0 = (!parms.rules[n].empty() && + strchr(parms.rules[n].c_str() + 1, '<') != NULL) ? 
1 : 0; if (p0 == 1 && z == 0) { @@ -241,8 +238,8 @@ std::string phonet(const std::string& inword, phonetable& parms) { } /** new "actual letter" **/ c = *s; - if (parms.rules[n][0] != '\0' && - strstr(parms.rules[n] + 1, "^^") != NULL) { + if (!parms.rules[n].empty() && + strstr(parms.rules[n].c_str() + 1, "^^") != NULL) { if (c != '\0') { target.push_back(c); } @@ -257,8 +254,7 @@ std::string phonet(const std::string& inword, phonetable& parms) { } /** end of while (parms.rules[n][0] == c) **/ } /** end of if (n >= 0) **/ if (z0 == 0) { - if (k && !p0 && target.size() < len && c != '\0' && - (1 || target.empty() || target[target.size()-1] != c)) { + if (k && !p0 && target.size() < len && c != '\0') { /** condense only double letters **/ target.push_back(c); /// printf("\n setting \n"); diff --git a/libs/hunspell/src/phonet.hxx b/libs/hunspell/src/phonet.hxx index eb9fd0c628..2d58b3ba1b 100644 --- a/libs/hunspell/src/phonet.hxx +++ b/libs/hunspell/src/phonet.hxx @@ -27,8 +27,8 @@ Porting from Aspell to Hunspell using C-like structs */ -#ifndef __PHONETHXX__ -#define __PHONETHXX__ +#ifndef PHONET_HXX_ +#define PHONET_HXX_ #define HASHSIZE 256 #define MAXPHONETLEN 256 @@ -38,9 +38,7 @@ struct phonetable { char utf8; - cs_info* lang; - int num; - char** rules; + std::vector<std::string> rules; int hash[HASHSIZE]; }; diff --git a/libs/hunspell/src/replist.c++ b/libs/hunspell/src/replist.cxx index b3e6b37d20..cabe382bfd 100644 --- a/libs/hunspell/src/replist.c++ +++ b/libs/hunspell/src/replist.cxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. 
* - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. * * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -90,104 +87,110 @@ RepList::RepList(int n) { RepList::~RepList() { for (int i = 0; i < pos; i++) { - free(dat[i]->pattern); - free(dat[i]->pattern2); - free(dat[i]); + delete dat[i]; } free(dat); } -int RepList::get_pos() { - return pos; -} - replentry* RepList::item(int n) { return dat[n]; } -int RepList::near(const char* word) { +int RepList::find(const char* word) { int p1 = 0; - int p2 = pos; - while ((p2 - p1) > 1) { - int m = (p1 + p2) / 2; - int c = strcmp(word, dat[m]->pattern); - if (c <= 0) { - if (c < 0) - p2 = m; - else - p1 = p2 = m; - } else - p1 = m; + int p2 = pos - 1; + int ret = -1; + while (p1 <= p2) { + int m = ((unsigned)p1 + (unsigned)p2) >> 1; + int c = strncmp(word, dat[m]->pattern.c_str(), dat[m]->pattern.size()); + if (c < 0) + p2 = m - 1; + else if (c > 0) + p1 = m + 1; + else { // scan in the right half for a longer match + ret = m; + p1 = m + 1; + } } - return p1; + return ret; } -int RepList::match(const char* word, int n) { - if (strncmp(word, dat[n]->pattern, strlen(dat[n]->pattern)) == 0) - return strlen(dat[n]->pattern); - return 0; +std::string RepList::replace(const char* word, int ind, bool atstart) { + int type = atstart ? 1 : 0; + if (ind < 0) + return std::string(); + if (strlen(word) == dat[ind]->pattern.size()) + type = atstart ? 3 : 2; + while (type && dat[ind]->outstrings[type].empty()) + type = (type == 2 && !atstart) ? 
0 : type - 1; + return dat[ind]->outstrings[type]; } -int RepList::add(char* pat1, char* pat2) { - if (pos >= size || pat1 == NULL || pat2 == NULL) +int RepList::add(const std::string& in_pat1, const std::string& pat2) { + if (pos >= size || in_pat1.empty() || pat2.empty()) { return 1; - replentry* r = (replentry*)malloc(sizeof(replentry)); + } + // analyse word context + int type = 0; + std::string pat1(in_pat1); + if (pat1[0] == '_') { + pat1.erase(0, 1); + type = 1; + } + if (!pat1.empty() && pat1[pat1.size() - 1] == '_') { + type = type + 2; + pat1.erase(pat1.size() - 1); + } + mystrrep(pat1, "_", " "); + + // find existing entry + int m = find(pat1.c_str()); + if (m >= 0 && dat[m]->pattern == pat1) { + // since already used + dat[m]->outstrings[type] = pat2; + mystrrep(dat[m]->outstrings[type], "_", " "); + return 0; + } + + // make a new entry if none exists + replentry* r = new replentry; if (r == NULL) return 1; - r->pattern = mystrrep(pat1, "_", " "); - r->pattern2 = mystrrep(pat2, "_", " "); - r->start = false; - r->end = false; + r->pattern = pat1; + r->outstrings[type] = pat2; + mystrrep(r->outstrings[type], "_", " "); dat[pos++] = r; - for (int i = pos - 1; i > 0; i--) { - r = dat[i]; - if (strcmp(r->pattern, dat[i - 1]->pattern) < 0) { + // sort to the right place in the list + int i; + for (i = pos - 1; i > 0; i--) { + if (strcmp(r->pattern.c_str(), dat[i - 1]->pattern.c_str()) < 0) { dat[i] = dat[i - 1]; - dat[i - 1] = r; } else break; } + dat[i] = r; return 0; } -int RepList::conv(const char* word, char* dest, size_t destsize) { - size_t stl = 0; - int change = 0; - for (size_t i = 0; i < strlen(word); i++) { - int n = near(word + i); - int l = match(word + i, n); - if (l) { - size_t replen = strlen(dat[n]->pattern2); - if (stl + replen >= destsize) - return -1; - strcpy(dest + stl, dat[n]->pattern2); - stl += replen; - i += l - 1; - change = 1; - } else { - if (stl + 1 >= destsize) - return -1; - dest[stl++] = word[i]; - } - } - dest[stl] = '\0'; 
- return change; -} - -bool RepList::conv(const char* word, std::string& dest) { +bool RepList::conv(const std::string& in_word, std::string& dest) { dest.clear(); + size_t wordlen = in_word.size(); + const char* word = in_word.c_str(); + bool change = false; - for (size_t i = 0; i < strlen(word); i++) { - int n = near(word + i); - int l = match(word + i, n); - if (l) { - dest.append(dat[n]->pattern2); - i += l - 1; + for (size_t i = 0; i < wordlen; ++i) { + int n = find(word + i); + std::string l = replace(word + i, n, i == 0); + if (!l.empty()) { + dest.append(l); + i += dat[n]->pattern.size() - 1; change = true; } else { dest.push_back(word[i]); } } + return change; } + diff --git a/libs/hunspell/src/replist.hxx b/libs/hunspell/src/replist.hxx index 59366e9e02..1e3efa4131 100644 --- a/libs/hunspell/src/replist.hxx +++ b/libs/hunspell/src/replist.hxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -72,17 +69,15 @@ */ /* string replacement list class */ -#ifndef _REPLIST_HXX_ -#define _REPLIST_HXX_ - -#include "hunvisapi.h" +#ifndef REPLIST_HXX_ +#define REPLIST_HXX_ #include "w_char.hxx" #include <string> #include <vector> -class LIBHUNSPELL_DLL_EXPORTED RepList { +class RepList { private: RepList(const RepList&); RepList& operator=(const RepList&); @@ -93,16 +88,13 @@ class LIBHUNSPELL_DLL_EXPORTED RepList { int pos; public: - RepList(int n); + explicit RepList(int n); ~RepList(); - int get_pos(); - int add(char* pat1, char* pat2); + int add(const std::string& pat1, const std::string& pat2); replentry* item(int n); -#undef near - int near(const char* word); - int match(const char* word, int n); - int conv(const char* word, char* dest, size_t destsize); - bool conv(const char* word, std::string& dest); + int find(const char* word); + std::string replace(const char* word, int n, bool atstart); + bool conv(const std::string& word, std::string& dest); }; #endif diff --git a/libs/hunspell/src/resource.h b/libs/hunspell/src/resource.h deleted file mode 100644 index e1df211357..0000000000 --- a/libs/hunspell/src/resource.h +++ /dev/null @@ -1,14 +0,0 @@ -//{{NO_DEPENDENCIES}}
-// Microsoft Visual C++ generated include file.
-// Used by hunspell.rc
-
-// Next default values for new objects
-//
-#ifdef APSTUDIO_INVOKED
-#ifndef APSTUDIO_READONLY_SYMBOLS
-#define _APS_NEXT_RESOURCE_VALUE 101
-#define _APS_NEXT_COMMAND_VALUE 40001
-#define _APS_NEXT_CONTROL_VALUE 1001
-#define _APS_NEXT_SYMED_VALUE 101
-#endif
-#endif
diff --git a/libs/hunspell/src/suggestmgr.c++ b/libs/hunspell/src/suggestmgr.cxx index 17becd7582..73ea91e3a3 100644 --- a/libs/hunspell/src/suggestmgr.c++ +++ b/libs/hunspell/src/suggestmgr.cxx @@ -1,6 +1,8 @@ /* ***** BEGIN LICENSE BLOCK ***** * Version: MPL 1.1/GPL 2.0/LGPL 2.1 * + * Copyright (C) 2002-2017 Németh László + * * The contents of this file are subject to the Mozilla Public License Version * 1.1 (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at @@ -11,12 +13,7 @@ * for the specific language governing rights and limitations under the * License. * - * The Original Code is Hunspell, based on MySpell. - * - * The Initial Developers of the Original Code are - * Kevin Hendricks (MySpell) and Németh László (Hunspell). - * Portions created by the Initial Developers are Copyright (C) 2002-2005 - * the Initial Developers. All Rights Reserved. + * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks. 
* * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno, * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád, @@ -82,7 +79,7 @@ const w_char W_VLINE = {'\0', '|'}; -SuggestMgr::SuggestMgr(const char* tryme, int maxn, AffixMgr* aptr) { +SuggestMgr::SuggestMgr(const char* tryme, unsigned int maxn, AffixMgr* aptr) { // register affix manager and check in string of chars to // try when building candidate suggestions pAMgr = aptr; @@ -91,11 +88,9 @@ SuggestMgr::SuggestMgr(const char* tryme, int maxn, AffixMgr* aptr) { ckeyl = 0; ckey = NULL; - ckey_utf = NULL; ctryl = 0; ctry = NULL; - ctry_utf = NULL; utf8 = 0; langnum = 0; @@ -116,22 +111,14 @@ SuggestMgr::SuggestMgr(const char* tryme, int maxn, AffixMgr* aptr) { if (pAMgr->get_maxcpdsugs() >= 0) maxcpdsugs = pAMgr->get_maxcpdsugs(); if (!utf8) { - char* enc = pAMgr->get_encoding(); - csconv = get_current_cs(enc); - free(enc); + csconv = get_current_cs(pAMgr->get_encoding()); } complexprefixes = pAMgr->get_complexprefixes(); } if (ckey) { if (utf8) { - std::vector<w_char> t; - ckeyl = u8_u16(t, ckey); - ckey_utf = (w_char*)malloc(ckeyl * sizeof(w_char)); - if (ckey_utf) - memcpy(ckey_utf, &t[0], ckeyl * sizeof(w_char)); - else - ckeyl = 0; + ckeyl = u8_u16(ckey_utf, ckey); } else { ckeyl = strlen(ckey); } @@ -142,13 +129,7 @@ SuggestMgr::SuggestMgr(const char* tryme, int maxn, AffixMgr* aptr) { if (ctry) ctryl = strlen(ctry); if (ctry && utf8) { - std::vector<w_char> t; - ctryl = u8_u16(t, tryme); - ctry_utf = (w_char*)malloc(ctryl * sizeof(w_char)); - if (ctry_utf) - memcpy(ctry_utf, &t[0], ctryl * sizeof(w_char)); - else - ctryl = 0; + ctryl = u8_u16(ctry_utf, tryme); } } } @@ -158,16 +139,10 @@ SuggestMgr::~SuggestMgr() { if (ckey) free(ckey); ckey = NULL; - if (ckey_utf) - free(ckey_utf); - ckey_utf = NULL; ckeyl = 0; if (ctry) free(ctry); ctry = NULL; - if (ctry_utf) - free(ctry_utf); - ctry_utf = NULL; ctryl = 0; maxSug = 0; #ifdef MOZILLA_CLIENT @@ -175,50 +150,38 @@ SuggestMgr::~SuggestMgr() { 
#endif } -int SuggestMgr::testsug(char** wlst, - const char* candidate, - int wl, - int ns, +void SuggestMgr::testsug(std::vector<std::string>& wlst, + const std::string& candidate, int cpdsuggest, int* timer, clock_t* timelimit) { int cwrd = 1; - if (ns == maxSug) - return maxSug; - for (int k = 0; k < ns; k++) { - if (strcmp(candidate, wlst[k]) == 0) { + if (wlst.size() == maxSug) + return; + for (size_t k = 0; k < wlst.size(); ++k) { + if (wlst[k] == candidate) { cwrd = 0; break; } } - if ((cwrd) && checkword(candidate, wl, cpdsuggest, timer, timelimit)) { - wlst[ns] = mystrdup(candidate); - if (wlst[ns] == NULL) { - for (int j = 0; j < ns; j++) - free(wlst[j]); - return -1; - } - ns++; + if ((cwrd) && checkword(candidate, cpdsuggest, timer, timelimit)) { + wlst.push_back(candidate); } - return ns; } // generate suggestions for a misspelled word // pass in address of array of char * pointers // onlycompoundsug: probably bad suggestions (need for ngram sugs, too) - -int SuggestMgr::suggest(char*** slst, +void SuggestMgr::suggest(std::vector<std::string>& slst, const char* w, - int nsug, int* onlycompoundsug) { int nocompoundtwowords = 0; - char** wlst; std::vector<w_char> word_utf; int wl = 0; - int nsugorig = nsug; + size_t nsugorig = slst.size(); std::string w2; const char* word = w; - int oldSug = 0; + size_t oldSug = 0; // word reversing wrapper for complex prefixes if (complexprefixes) { @@ -230,22 +193,10 @@ int SuggestMgr::suggest(char*** slst, word = w2.c_str(); } - if (*slst) { - wlst = *slst; - } else { - wlst = (char**)malloc(maxSug * sizeof(char*)); - if (wlst == NULL) - return -1; - for (int i = 0; i < maxSug; i++) { - wlst[i] = NULL; - } - } - if (utf8) { wl = u8_u16(word_utf, word); if (wl == -1) { - *slst = wlst; - return nsug; + return; } } @@ -253,139 +204,131 @@ int SuggestMgr::suggest(char*** slst, cpdsuggest++) { // limit compound suggestion if (cpdsuggest > 0) - oldSug = nsug; + oldSug = slst.size(); // suggestions for an uppercase word 
(html -> HTML) - if ((nsug < maxSug) && (nsug > -1)) { - nsug = (utf8) ? capchars_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : capchars(wlst, word, nsug, cpdsuggest); + if (slst.size() < maxSug) { + if (utf8) + capchars_utf(slst, &word_utf[0], wl, cpdsuggest); + else + capchars(slst, word, cpdsuggest); } // perhaps we made a typical fault of spelling - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = replchars(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + replchars(slst, word, cpdsuggest); } // perhaps we made chose the wrong char from a related set - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = mapchars(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && + (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + mapchars(slst, word, cpdsuggest); } // only suggest compound words when no other suggestion - if ((cpdsuggest == 0) && (nsug > nsugorig)) + if ((cpdsuggest == 0) && (slst.size() > nsugorig)) nocompoundtwowords = 1; // did we swap the order of chars by mistake - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = (utf8) ? swapchar_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : swapchar(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + if (utf8) + swapchar_utf(slst, &word_utf[0], wl, cpdsuggest); + else + swapchar(slst, word, cpdsuggest); } // did we swap the order of non adjacent chars by mistake - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = (utf8) ? 
longswapchar_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : longswapchar(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + if (utf8) + longswapchar_utf(slst, &word_utf[0], wl, cpdsuggest); + else + longswapchar(slst, word, cpdsuggest); } // did we just hit the wrong key in place of a good char (case and keyboard) - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = (utf8) ? badcharkey_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : badcharkey(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + if (utf8) + badcharkey_utf(slst, &word_utf[0], wl, cpdsuggest); + else + badcharkey(slst, word, cpdsuggest); } // did we add a char that should not be there - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = (utf8) ? extrachar_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : extrachar(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + if (utf8) + extrachar_utf(slst, &word_utf[0], wl, cpdsuggest); + else + extrachar(slst, word, cpdsuggest); } // did we forgot a char - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = (utf8) ? forgotchar_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : forgotchar(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + if (utf8) + forgotchar_utf(slst, &word_utf[0], wl, cpdsuggest); + else + forgotchar(slst, word, cpdsuggest); } // did we move a char - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = (utf8) ? 
movechar_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : movechar(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + if (utf8) + movechar_utf(slst, &word_utf[0], wl, cpdsuggest); + else + movechar(slst, word, cpdsuggest); } // did we just hit the wrong key in place of a good char - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = (utf8) ? badchar_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : badchar(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + if (utf8) + badchar_utf(slst, &word_utf[0], wl, cpdsuggest); + else + badchar(slst, word, cpdsuggest); } // did we double two characters - if ((nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = (utf8) ? doubletwochars_utf(wlst, &word_utf[0], wl, nsug, cpdsuggest) - : doubletwochars(wlst, word, nsug, cpdsuggest); + if ((slst.size() < maxSug) && (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + if (utf8) + doubletwochars_utf(slst, &word_utf[0], wl, cpdsuggest); + else + doubletwochars(slst, word, cpdsuggest); } // perhaps we forgot to hit space and two words ran together - if (!nosplitsugs && (nsug < maxSug) && (nsug > -1) && - (!cpdsuggest || (nsug < oldSug + maxcpdsugs))) { - nsug = twowords(wlst, word, nsug, cpdsuggest); + if (!nosplitsugs && (slst.size() < maxSug) && + (!cpdsuggest || (slst.size() < oldSug + maxcpdsugs))) { + twowords(slst, word, cpdsuggest); } } // repeating ``for'' statement compounding support - if (nsug < 0) { - // we ran out of memory - we should free up as much as possible - for (int i = 0; i < maxSug; i++) - if (wlst[i] != NULL) - free(wlst[i]); - free(wlst); - wlst = NULL; - } - - if (!nocompoundtwowords && (nsug > 0) && onlycompoundsug) + if (!nocompoundtwowords && (!slst.empty()) && onlycompoundsug) *onlycompoundsug = 1; - - *slst = wlst; - return 
nsug; } // suggestions for an uppercase word (html -> HTML) -int SuggestMgr::capchars_utf(char** wlst, - const w_char* word, - int wl, - int ns, - int cpdsuggest) { +void SuggestMgr::capchars_utf(std::vector<std::string>& wlst, + const w_char* word, + int wl, + int cpdsuggest) { std::vector<w_char> candidate_utf(word, word + wl); mkallcap_utf(candidate_utf, langnum); std::string candidate; u16_u8(candidate, candidate_utf); - return testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } // suggestions for an uppercase word (html -> HTML) -int SuggestMgr::capchars(char** wlst, - const char* word, - int ns, - int cpdsuggest) { +void SuggestMgr::capchars(std::vector<std::string>& wlst, + const char* word, + int cpdsuggest) { std::string candidate(word); mkallcap(candidate, csconv); - return testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } // suggestions for when chose the wrong char out of a related set -int SuggestMgr::mapchars(char** wlst, +int SuggestMgr::mapchars(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { std::string candidate; clock_t timelimit; @@ -393,120 +336,108 @@ int SuggestMgr::mapchars(char** wlst, int wl = strlen(word); if (wl < 2 || !pAMgr) - return ns; + return wlst.size(); - int nummap = pAMgr->get_nummap(); - struct mapentry* maptable = pAMgr->get_maptable(); - if (maptable == NULL) - return ns; + const std::vector<mapentry>& maptable = pAMgr->get_maptable(); + if (maptable.empty()) + return wlst.size(); timelimit = clock(); timer = MINTIMER; - return map_related(word, candidate, 0, wlst, cpdsuggest, ns, - maptable, nummap, &timer, &timelimit); + return map_related(word, candidate, 0, wlst, cpdsuggest, + maptable, &timer, &timelimit); } int SuggestMgr::map_related(const char* word, std::string& candidate, int wn, - char** wlst, + 
std::vector<std::string>& wlst, int cpdsuggest, - int ns, - const mapentry* maptable, - int nummap, + const std::vector<mapentry>& maptable, int* timer, clock_t* timelimit) { if (*(word + wn) == '\0') { int cwrd = 1; - for (int m = 0; m < ns; m++) { - if (candidate == wlst[m]) { + for (size_t m = 0; m < wlst.size(); ++m) { + if (wlst[m] == candidate) { cwrd = 0; break; } } - if ((cwrd) && checkword(candidate.c_str(), candidate.size(), cpdsuggest, timer, timelimit)) { - if (ns < maxSug) { - wlst[ns] = mystrdup(candidate.c_str()); - if (wlst[ns] == NULL) - return -1; - ns++; + if ((cwrd) && checkword(candidate, cpdsuggest, timer, timelimit)) { + if (wlst.size() < maxSug) { + wlst.push_back(candidate); } } - return ns; + return wlst.size(); } int in_map = 0; - for (int j = 0; j < nummap; j++) { - for (int k = 0; k < maptable[j].len; k++) { - int len = strlen(maptable[j].set[k]); - if (strncmp(maptable[j].set[k], word + wn, len) == 0) { + for (size_t j = 0; j < maptable.size(); ++j) { + for (size_t k = 0; k < maptable[j].size(); ++k) { + size_t len = maptable[j][k].size(); + if (strncmp(maptable[j][k].c_str(), word + wn, len) == 0) { in_map = 1; size_t cn = candidate.size(); - for (int l = 0; l < maptable[j].len; l++) { + for (size_t l = 0; l < maptable[j].size(); ++l) { candidate.resize(cn); - candidate.append(maptable[j].set[l]); - ns = map_related(word, candidate, wn + len, wlst, - cpdsuggest, ns, maptable, nummap, timer, timelimit); + candidate.append(maptable[j][l]); + map_related(word, candidate, wn + len, wlst, + cpdsuggest, maptable, timer, timelimit); if (!(*timer)) - return ns; + return wlst.size(); } } } } if (!in_map) { candidate.push_back(*(word + wn)); - ns = map_related(word, candidate, wn + 1, wlst, cpdsuggest, ns, - maptable, nummap, timer, timelimit); + map_related(word, candidate, wn + 1, wlst, cpdsuggest, + maptable, timer, timelimit); } - return ns; + return wlst.size(); } // suggestions for a typical fault of spelling, that // differs with more, 
than 1 letter from the right form. -int SuggestMgr::replchars(char** wlst, +int SuggestMgr::replchars(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { std::string candidate; int wl = strlen(word); if (wl < 2 || !pAMgr) - return ns; - int numrep = pAMgr->get_numrep(); - struct replentry* reptable = pAMgr->get_reptable(); - if (reptable == NULL) - return ns; - for (int i = 0; i < numrep; i++) { + return wlst.size(); + const std::vector<replentry>& reptable = pAMgr->get_reptable(); + for (size_t i = 0; i < reptable.size(); ++i) { const char* r = word; // search every occurence of the pattern in the word - while ((r = strstr(r, reptable[i].pattern)) != NULL && - (!reptable[i].end || strlen(r) == strlen(reptable[i].pattern)) && - (!reptable[i].start || r == word)) { + while ((r = strstr(r, reptable[i].pattern.c_str())) != NULL) { + int type = (r == word) ? 1 : 0; + if (r - word + reptable[i].pattern.size() == strlen(word)) + type += 2; + while (type && reptable[i].outstrings[type].empty()) + type = (type == 2 && r != word) ? 
0 : type - 1; + const std::string&out = reptable[i].outstrings[type]; + if (out.empty()) { + ++r; + continue; + } candidate.assign(word); candidate.resize(r - word); - candidate.append(reptable[i].pattern2); - int lenp = strlen(reptable[i].pattern); - candidate.append(r + lenp); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; + candidate.append(reptable[i].outstrings[type]); + candidate.append(r + reptable[i].pattern.size()); + testsug(wlst, candidate, cpdsuggest, NULL, NULL); // check REP suggestions with space size_t sp = candidate.find(' '); if (sp != std::string::npos) { size_t prev = 0; while (sp != std::string::npos) { std::string prev_chunk = candidate.substr(prev, sp - prev); - if (checkword(prev_chunk.c_str(), prev_chunk.size(), 0, NULL, NULL)) { - int oldns = ns; + if (checkword(prev_chunk, 0, NULL, NULL)) { + size_t oldns = wlst.size(); std::string post_chunk = candidate.substr(sp + 1); - ns = testsug(wlst, post_chunk.c_str(), post_chunk.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; - if (oldns < ns) { - free(wlst[ns - 1]); - wlst[ns - 1] = mystrdup(candidate.c_str()); - if (!wlst[ns - 1]) - return -1; + testsug(wlst, post_chunk, cpdsuggest, NULL, NULL); + if (oldns < wlst.size()) { + wlst[wlst.size() - 1] = candidate; } } prev = sp + 1; @@ -516,47 +447,43 @@ int SuggestMgr::replchars(char** wlst, r++; // search for the next letter } } - return ns; + return wlst.size(); } // perhaps we doubled two characters (pattern aba -> ababa, for example vacation // -> vacacation) -int SuggestMgr::doubletwochars(char** wlst, +int SuggestMgr::doubletwochars(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { int state = 0; int wl = strlen(word); if (wl < 5 || !pAMgr) - return ns; + return wlst.size(); for (int i = 2; i < wl; i++) { if (word[i] == word[i - 2]) { state++; if (state == 3) { std::string candidate(word, word + i - 1); 
candidate.insert(candidate.end(), word + i + 1, word + wl); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); state = 0; } } else { state = 0; } } - return ns; + return wlst.size(); } // perhaps we doubled two characters (pattern aba -> ababa, for example vacation // -> vacacation) -int SuggestMgr::doubletwochars_utf(char** wlst, +int SuggestMgr::doubletwochars_utf(std::vector<std::string>& wlst, const w_char* word, int wl, - int ns, int cpdsuggest) { int state = 0; if (wl < 5 || !pAMgr) - return ns; + return wlst.size(); for (int i = 2; i < wl; i++) { if (word[i] == word[i - 2]) { state++; @@ -565,24 +492,20 @@ int SuggestMgr::doubletwochars_utf(char** wlst, candidate_utf.insert(candidate_utf.end(), word + i + 1, word + wl); std::string candidate; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); state = 0; } } else { state = 0; } } - return ns; + return wlst.size(); } // error is wrong char in place of correct one (case and keyboard related // version) -int SuggestMgr::badcharkey(char** wlst, +int SuggestMgr::badcharkey(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { std::string candidate(word); @@ -593,9 +516,7 @@ int SuggestMgr::badcharkey(char** wlst, // check with uppercase letters candidate[i] = csconv[((unsigned char)tmpc)].cupper; if (tmpc != candidate[i]) { - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); candidate[i] = tmpc; } // check neighbor characters in keyboard string @@ -605,29 +526,24 @@ int SuggestMgr::badcharkey(char** wlst, while (loc) { if ((loc > ckey) && (*(loc - 1) != '|')) { candidate[i] = *(loc - 1); - ns = testsug(wlst, 
candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } if ((*(loc + 1) != '|') && (*(loc + 1) != '\0')) { candidate[i] = *(loc + 1); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } loc = strchr(loc + 1, tmpc); } candidate[i] = tmpc; } - return ns; + return wlst.size(); } // error is wrong char in place of correct one (case and keyboard related // version) -int SuggestMgr::badcharkey_utf(char** wlst, +int SuggestMgr::badcharkey_utf(std::vector<std::string>& wlst, const w_char* word, int wl, - int ns, int cpdsuggest) { std::string candidate; std::vector<w_char> candidate_utf(word, word + wl); @@ -639,73 +555,61 @@ int SuggestMgr::badcharkey_utf(char** wlst, candidate_utf[i] = upper_utf(candidate_utf[i], 1); if (tmpc != candidate_utf[i]) { u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); candidate_utf[i] = tmpc; } // check neighbor characters in keyboard string if (!ckey) continue; - w_char* loc = ckey_utf; - while ((loc < (ckey_utf + ckeyl)) && *loc != tmpc) - loc++; - while (loc < (ckey_utf + ckeyl)) { - if ((loc > ckey_utf) && *(loc - 1) != W_VLINE) { - candidate_utf[i] = *(loc - 1); + size_t loc = 0; + while ((loc < ckeyl) && ckey_utf[loc] != tmpc) + ++loc; + while (loc < ckeyl) { + if ((loc > 0) && ckey_utf[loc - 1] != W_VLINE) { + candidate_utf[i] = ckey_utf[loc - 1]; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } - if (((loc + 1) < (ckey_utf + ckeyl)) && (*(loc + 1) != W_VLINE)) { - candidate_utf[i] = *(loc + 1); + if (((loc + 1) < ckeyl) && 
(ckey_utf[loc + 1] != W_VLINE)) { + candidate_utf[i] = ckey_utf[loc + 1]; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } do { loc++; - } while ((loc < (ckey_utf + ckeyl)) && *loc != tmpc); + } while ((loc < ckeyl) && ckey_utf[loc] != tmpc); } candidate_utf[i] = tmpc; } - return ns; + return wlst.size(); } // error is wrong char in place of correct one -int SuggestMgr::badchar(char** wlst, const char* word, int ns, int cpdsuggest) { +int SuggestMgr::badchar(std::vector<std::string>& wlst, const char* word, int cpdsuggest) { std::string candidate(word); clock_t timelimit = clock(); int timer = MINTIMER; // swap out each char one by one and try all the tryme // chars in its place to see if that makes a good word - for (int j = 0; j < ctryl; j++) { + for (size_t j = 0; j < ctryl; ++j) { for (std::string::reverse_iterator aI = candidate.rbegin(), aEnd = candidate.rend(); aI != aEnd; ++aI) { char tmpc = *aI; if (ctry[j] == tmpc) continue; *aI = ctry[j]; - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, &timer, &timelimit); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, &timer, &timelimit); if (!timer) - return ns; + return wlst.size(); *aI = tmpc; } } - return ns; + return wlst.size(); } // error is wrong char in place of correct one -int SuggestMgr::badchar_utf(char** wlst, +int SuggestMgr::badchar_utf(std::vector<std::string>& wlst, const w_char* word, int wl, - int ns, int cpdsuggest) { std::vector<w_char> candidate_utf(word, word + wl); std::string candidate; @@ -713,34 +617,30 @@ int SuggestMgr::badchar_utf(char** wlst, int timer = MINTIMER; // swap out each char one by one and try all the tryme // chars in its place to see if that makes a good word - for (int j = 0; j < ctryl; j++) { + for (size_t j = 0; j < ctryl; ++j) { for (int i = wl - 1; i >= 0; i--) { w_char 
tmpc = candidate_utf[i]; if (tmpc == ctry_utf[j]) continue; candidate_utf[i] = ctry_utf[j]; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, &timer, - &timelimit); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, &timer, &timelimit); if (!timer) - return ns; + return wlst.size(); candidate_utf[i] = tmpc; } } - return ns; + return wlst.size(); } // error is word has an extra letter it does not need -int SuggestMgr::extrachar_utf(char** wlst, +int SuggestMgr::extrachar_utf(std::vector<std::string>& wlst, const w_char* word, int wl, - int ns, int cpdsuggest) { std::vector<w_char> candidate_utf(word, word + wl); if (candidate_utf.size() < 2) - return ns; + return wlst.size(); // try omitting one char of word at a time for (size_t i = 0; i < candidate_utf.size(); ++i) { size_t index = candidate_utf.size() - 1 - i; @@ -748,39 +648,33 @@ int SuggestMgr::extrachar_utf(char** wlst, candidate_utf.erase(candidate_utf.begin() + index); std::string candidate; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); candidate_utf.insert(candidate_utf.begin() + index, tmpc); } - return ns; + return wlst.size(); } // error is word has an extra letter it does not need -int SuggestMgr::extrachar(char** wlst, +int SuggestMgr::extrachar(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { std::string candidate(word); if (candidate.size() < 2) - return ns; + return wlst.size(); // try omitting one char of word at a time for (size_t i = 0; i < candidate.size(); ++i) { size_t index = candidate.size() - 1 - i; char tmpc = candidate[index]; candidate.erase(candidate.begin() + index); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, 
NULL); candidate.insert(candidate.begin() + index, tmpc); } - return ns; + return wlst.size(); } // error is missing a letter it needs -int SuggestMgr::forgotchar(char** wlst, +int SuggestMgr::forgotchar(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { std::string candidate(word); clock_t timelimit = clock(); @@ -788,26 +682,23 @@ int SuggestMgr::forgotchar(char** wlst, // try inserting a tryme character before every letter (and the null // terminator) - for (int k = 0; k < ctryl; ++k) { + for (size_t k = 0; k < ctryl; ++k) { for (size_t i = 0; i <= candidate.size(); ++i) { size_t index = candidate.size() - i; candidate.insert(candidate.begin() + index, ctry[k]); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, &timer, &timelimit); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, &timer, &timelimit); if (!timer) - return ns; + return wlst.size(); candidate.erase(candidate.begin() + index); } } - return ns; + return wlst.size(); } // error is missing a letter it needs -int SuggestMgr::forgotchar_utf(char** wlst, +int SuggestMgr::forgotchar_utf(std::vector<std::string>& wlst, const w_char* word, int wl, - int ns, int cpdsuggest) { std::vector<w_char> candidate_utf(word, word + wl); clock_t timelimit = clock(); @@ -815,36 +706,32 @@ int SuggestMgr::forgotchar_utf(char** wlst, // try inserting a tryme character at the end of the word and before every // letter - for (int k = 0; k < ctryl; ++k) { + for (size_t k = 0; k < ctryl; ++k) { for (size_t i = 0; i <= candidate_utf.size(); ++i) { size_t index = candidate_utf.size() - i; candidate_utf.insert(candidate_utf.begin() + index, ctry_utf[k]); std::string candidate; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, &timer, - &timelimit); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, &timer, &timelimit); if (!timer) - return ns; + return wlst.size(); 
candidate_utf.erase(candidate_utf.begin() + index); } } - return ns; + return wlst.size(); } /* error is should have been two words */ -int SuggestMgr::twowords(char** wlst, +int SuggestMgr::twowords(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { - int c1, c2; + int c2; int forbidden = 0; int cwrd; int wl = strlen(word); if (wl < 3) - return ns; + return wlst.size(); if (langnum == LANG_hu) forbidden = check_forbidden(word, wl); @@ -864,9 +751,9 @@ int SuggestMgr::twowords(char** wlst, if (utf8 && p[1] == '\0') break; // last UTF-8 character *p = '\0'; - c1 = checkword(candidate, strlen(candidate), cpdsuggest, NULL, NULL); + int c1 = checkword(candidate, cpdsuggest, NULL, NULL); if (c1) { - c2 = checkword((p + 1), strlen(p + 1), cpdsuggest, NULL, NULL); + c2 = checkword((p + 1), cpdsuggest, NULL, NULL); if (c2) { *p = ' '; @@ -880,24 +767,19 @@ int SuggestMgr::twowords(char** wlst, *p = '-'; cwrd = 1; - for (int k = 0; k < ns; k++) { - if (strcmp(candidate, wlst[k]) == 0) { + for (size_t k = 0; k < wlst.size(); ++k) { + if (wlst[k] == candidate) { cwrd = 0; break; } } - if (ns < maxSug) { + if (wlst.size() < maxSug) { if (cwrd) { - wlst[ns] = mystrdup(candidate); - if (wlst[ns] == NULL) { - free(candidate); - return -1; - } - ns++; + wlst.push_back(candidate); } } else { free(candidate); - return ns; + return wlst.size(); } // add two word suggestion with dash, if TRY string contains // "a" or "-" @@ -905,48 +787,40 @@ int SuggestMgr::twowords(char** wlst, if (ctry && (strchr(ctry, 'a') || strchr(ctry, '-')) && mystrlen(p + 1) > 1 && mystrlen(candidate) - mystrlen(p) > 1) { *p = '-'; - for (int k = 0; k < ns; k++) { - if (strcmp(candidate, wlst[k]) == 0) { + for (size_t k = 0; k < wlst.size(); ++k) { + if (wlst[k] == candidate) { cwrd = 0; break; } } - if (ns < maxSug) { + if (wlst.size() < maxSug) { if (cwrd) { - wlst[ns] = mystrdup(candidate); - if (wlst[ns] == NULL) { - free(candidate); - return -1; - } - ns++; + 
wlst.push_back(candidate); } } else { free(candidate); - return ns; + return wlst.size(); } } } } } free(candidate); - return ns; + return wlst.size(); } // error is adjacent letter were swapped -int SuggestMgr::swapchar(char** wlst, +int SuggestMgr::swapchar(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { std::string candidate(word); if (candidate.size() < 2) - return ns; + return wlst.size(); // try swapping adjacent chars one by one for (size_t i = 0; i < candidate.size() - 1; ++i) { std::swap(candidate[i], candidate[i+1]); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); std::swap(candidate[i], candidate[i+1]); } @@ -958,40 +832,33 @@ int SuggestMgr::swapchar(char** wlst, candidate[2] = word[2]; candidate[candidate.size() - 2] = word[candidate.size() - 1]; candidate[candidate.size() - 1] = word[candidate.size() - 2]; - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); if (candidate.size() == 5) { candidate[0] = word[0]; candidate[1] = word[2]; candidate[2] = word[1]; - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } } - return ns; + return wlst.size(); } // error is adjacent letter were swapped -int SuggestMgr::swapchar_utf(char** wlst, +int SuggestMgr::swapchar_utf(std::vector<std::string>& wlst, const w_char* word, int wl, - int ns, int cpdsuggest) { std::vector<w_char> candidate_utf(word, word + wl); if (candidate_utf.size() < 2) - return ns; + return wlst.size(); std::string candidate; // try swapping adjacent chars one by one for (size_t i = 0; i < candidate_utf.size() - 1; ++i) { std::swap(candidate_utf[i], candidate_utf[i+1]); u16_u8(candidate, candidate_utf); - ns = 
testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); std::swap(candidate_utf[i], candidate_utf[i+1]); } @@ -1004,76 +871,64 @@ int SuggestMgr::swapchar_utf(char** wlst, candidate_utf[candidate_utf.size() - 2] = word[candidate_utf.size() - 1]; candidate_utf[candidate_utf.size() - 1] = word[candidate_utf.size() - 2]; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); if (candidate_utf.size() == 5) { candidate_utf[0] = word[0]; candidate_utf[1] = word[2]; candidate_utf[2] = word[1]; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } } - return ns; + return wlst.size(); } // error is not adjacent letter were swapped -int SuggestMgr::longswapchar(char** wlst, +int SuggestMgr::longswapchar(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { std::string candidate(word); // try swapping not adjacent chars one by one for (std::string::iterator p = candidate.begin(); p < candidate.end(); ++p) { for (std::string::iterator q = candidate.begin(); q < candidate.end(); ++q) { - if (abs(std::distance(q, p)) > 1) { + if (std::abs(std::distance(q, p)) > 1) { std::swap(*p, *q); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); std::swap(*p, *q); } } } - return ns; + return wlst.size(); } // error is adjacent letter were swapped -int SuggestMgr::longswapchar_utf(char** wlst, +int SuggestMgr::longswapchar_utf(std::vector<std::string>& wlst, const w_char* word, int wl, - int ns, int cpdsuggest) { std::vector<w_char> candidate_utf(word, word 
+ wl); // try swapping not adjacent chars for (std::vector<w_char>::iterator p = candidate_utf.begin(); p < candidate_utf.end(); ++p) { for (std::vector<w_char>::iterator q = candidate_utf.begin(); q < candidate_utf.end(); ++q) { - if (abs(std::distance(q, p)) > 1) { + if (std::abs(std::distance(q, p)) > 1) { std::swap(*p, *q); std::string candidate; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); std::swap(*p, *q); } } } - return ns; + return wlst.size(); } // error is a letter was moved -int SuggestMgr::movechar(char** wlst, +int SuggestMgr::movechar(std::vector<std::string>& wlst, const char* word, - int ns, int cpdsuggest) { std::string candidate(word); if (candidate.size() < 2) - return ns; + return wlst.size(); // try moving a char for (std::string::iterator p = candidate.begin(); p < candidate.end(); ++p) { @@ -1081,9 +936,7 @@ int SuggestMgr::movechar(char** wlst, std::swap(*q, *(q - 1)); if (std::distance(p, q) < 2) continue; // omit swap char - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } std::copy(word, word + candidate.size(), candidate.begin()); } @@ -1093,25 +946,22 @@ int SuggestMgr::movechar(char** wlst, std::swap(*q, *(q - 1)); if (std::distance(p, q) < 2) continue; // omit swap char - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } std::copy(word, word + candidate.size(), candidate.begin()); } - return ns; + return wlst.size(); } // error is a letter was moved -int SuggestMgr::movechar_utf(char** wlst, +int SuggestMgr::movechar_utf(std::vector<std::string>& wlst, const w_char* word, int wl, - int ns, int cpdsuggest) { std::vector<w_char> 
candidate_utf(word, word + wl); if (candidate_utf.size() < 2) - return ns; + return wlst.size(); // try moving a char for (std::vector<w_char>::iterator p = candidate_utf.begin(); p < candidate_utf.end(); ++p) { @@ -1121,39 +971,30 @@ int SuggestMgr::movechar_utf(char** wlst, continue; // omit swap char std::string candidate; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } std::copy(word, word + candidate_utf.size(), candidate_utf.begin()); } - for (std::vector<w_char>::iterator p = candidate_utf.begin() + candidate_utf.size() - 1; p > candidate_utf.begin(); --p) { - for (std::vector<w_char>::iterator q = p - 1; q >= candidate_utf.begin() && std::distance(q, p) < 10; --q) { - std::swap(*q, *(q + 1)); - if (std::distance(q, p) < 2) + for (std::vector<w_char>::reverse_iterator p = candidate_utf.rbegin(); p < candidate_utf.rend(); ++p) { + for (std::vector<w_char>::reverse_iterator q = p + 1; q < candidate_utf.rend() && std::distance(p, q) < 10; ++q) { + std::swap(*q, *(q - 1)); + if (std::distance(p, q) < 2) continue; // omit swap char std::string candidate; u16_u8(candidate, candidate_utf); - ns = testsug(wlst, candidate.c_str(), candidate.size(), ns, cpdsuggest, NULL, - NULL); - if (ns == -1) - return -1; + testsug(wlst, candidate, cpdsuggest, NULL, NULL); } std::copy(word, word + candidate_utf.size(), candidate_utf.begin()); } - return ns; + return wlst.size(); } // generate a set of suggestions for very poorly spelled words -int SuggestMgr::ngsuggest(char** wlst, +void SuggestMgr::ngsuggest(std::vector<std::string>& wlst, const char* w, - int ns, - HashMgr** pHMgr, - int md) { - int i, j; + const std::vector<HashMgr*>& rHMgr) { int lval; int sc; int lp, lpphon; @@ -1165,7 +1006,7 @@ int SuggestMgr::ngsuggest(char** wlst, char* rootsphon[MAX_ROOTS]; int scores[MAX_ROOTS]; int scoresphon[MAX_ROOTS]; - for (i = 
0; i < MAX_ROOTS; i++) { + for (int i = 0; i < MAX_ROOTS; i++) { roots[i] = NULL; scores[i] = -100 * i; rootsphon[i] = NULL; @@ -1206,12 +1047,12 @@ int SuggestMgr::ngsuggest(char** wlst, phonetable* ph = (pAMgr) ? pAMgr->get_phonetable() : NULL; std::string target; std::string candidate; + std::vector<w_char> w_candidate; if (ph) { if (utf8) { - std::vector<w_char> _w; - u8_u16(_w, word); - mkallcap_utf(_w, langnum); - u16_u8(candidate, _w); + u8_u16(w_candidate, word); + mkallcap_utf(w_candidate, langnum); + u16_u8(candidate, w_candidate); } else { candidate.assign(word); if (!nonbmp) @@ -1225,8 +1066,17 @@ int SuggestMgr::ngsuggest(char** wlst, FLAG nongramsuggest = pAMgr ? pAMgr->get_nongramsuggest() : FLAG_NULL; FLAG onlyincompound = pAMgr ? pAMgr->get_onlyincompound() : FLAG_NULL; - for (i = 0; i < md; i++) { - while (0 != (hp = (pHMgr[i])->walk_hashtable(col, hp))) { + std::vector<w_char> w_word, w_target; + if (utf8) { + u8_u16(w_word, word); + u8_u16(w_target, target); + } + + std::string f; + std::vector<w_char> w_f; + + for (size_t i = 0; i < rHMgr.size(); ++i) { + while (0 != (hp = rHMgr[i]->walk_hashtable(col, hp))) { if ((hp->astr) && (pAMgr) && (TESTAFF(hp->astr, forbiddenword, hp->alen) || TESTAFF(hp->astr, ONLYUPCASEFLAG, hp->alen) || @@ -1235,15 +1085,48 @@ int SuggestMgr::ngsuggest(char** wlst, TESTAFF(hp->astr, onlyincompound, hp->alen))) continue; - sc = ngram(3, word, HENTRY_WORD(hp), NGRAM_LONGER_WORSE + low) + - leftcommonsubstring(word, HENTRY_WORD(hp)); + if (utf8) { + u8_u16(w_f, HENTRY_WORD(hp)); + + int leftcommon = leftcommonsubstring(w_word, w_f); + if (low) { + // lowering dictionary word + mkallsmall_utf(w_f, langnum); + } + sc = ngram(3, w_word, w_f, NGRAM_LONGER_WORSE) + leftcommon; + } else { + f.assign(HENTRY_WORD(hp)); + + int leftcommon = leftcommonsubstring(word, f.c_str()); + if (low) { + // lowering dictionary word + mkallsmall(f, csconv); + } + sc = ngram(3, word, f, NGRAM_LONGER_WORSE) + leftcommon; + } // check special 
pronounciation - std::string f; + f.clear(); if ((hp->var & H_OPT_PHON) && copy_field(f, HENTRY_DATA(hp), MORPH_PHON)) { - int sc2 = ngram(3, word, f, NGRAM_LONGER_WORSE + low) + - +leftcommonsubstring(word, f.c_str()); + int sc2; + if (utf8) { + u8_u16(w_f, f); + + int leftcommon = leftcommonsubstring(w_word, w_f); + if (low) { + // lowering dictionary word + mkallsmall_utf(w_f, langnum); + } + sc2 = ngram(3, w_word, w_f, NGRAM_LONGER_WORSE) + leftcommon; + } else { + int leftcommon = leftcommonsubstring(word, f.c_str()); + if (low) { + // lowering dictionary word + mkallsmall(f, csconv); + } + sc2 = ngram(3, word, f, NGRAM_LONGER_WORSE) + leftcommon; + } if (sc2 > sc) sc = sc2; } @@ -1251,23 +1134,29 @@ int SuggestMgr::ngsuggest(char** wlst, int scphon = -20000; if (ph && (sc > 2) && (abs(n - (int)hp->clen) <= 3)) { if (utf8) { - std::vector<w_char> _w; - u8_u16(_w, HENTRY_WORD(hp)); - mkallcap_utf(_w, langnum); - u16_u8(candidate, _w); + u8_u16(w_candidate, HENTRY_WORD(hp)); + mkallcap_utf(w_candidate, langnum); + u16_u8(candidate, w_candidate); } else { - candidate.assign(HENTRY_WORD(hp)); + candidate = HENTRY_WORD(hp); mkallcap(candidate, csconv); } - std::string target2 = phonet(candidate, *ph); - scphon = 2 * ngram(3, target, target2, NGRAM_LONGER_WORSE); + f = phonet(candidate, *ph); + if (utf8) { + u8_u16(w_f, f); + scphon = 2 * ngram(3, w_target, w_f, + NGRAM_LONGER_WORSE); + } else { + scphon = 2 * ngram(3, target, f, + NGRAM_LONGER_WORSE); + } } if (sc > scores[lp]) { scores[lp] = sc; roots[lp] = hp; lval = sc; - for (j = 0; j < MAX_ROOTS; j++) + for (int j = 0; j < MAX_ROOTS; j++) if (scores[j] < lval) { lp = j; lval = scores[j]; @@ -1278,7 +1167,7 @@ int SuggestMgr::ngsuggest(char** wlst, scoresphon[lpphon] = scphon; rootsphon[lpphon] = HENTRY_WORD(hp); lval = scphon; - for (j = 0; j < MAX_ROOTS; j++) + for (int j = 0; j < MAX_ROOTS; j++) if (scoresphon[j] < lval) { lpphon = j; lval = scoresphon[j]; @@ -1290,21 +1179,33 @@ int 
SuggestMgr::ngsuggest(char** wlst, // find minimum threshold for a passable suggestion // mangle original word three differnt ways // and score them to generate a minimum acceptable score + std::vector<w_char> w_mw; int thresh = 0; for (int sp = 1; sp < 4; sp++) { if (utf8) { + w_mw = w_word; for (int k = sp; k < n; k += 4) { - u8[k].l = '*'; - u8[k].h = 0; + w_mw[k].l = '*'; + w_mw[k].h = 0; } - std::string mw; - u16_u8(mw, u8); - thresh = thresh + ngram(n, word, mw, NGRAM_ANY_MISMATCH + low); + + if (low) { + // lowering dictionary word + mkallsmall_utf(w_mw, langnum); + } + + thresh += ngram(n, w_word, w_mw, NGRAM_ANY_MISMATCH); } else { - std::string mw(word); + std::string mw = word; for (int k = sp; k < n; k += 4) mw[k] = '*'; - thresh = thresh + ngram(n, word, mw, NGRAM_ANY_MISMATCH + low); + + if (low) { + // lowering dictionary word + mkallsmall(mw, csconv); + } + + thresh += ngram(n, word, mw, NGRAM_ANY_MISMATCH); } } thresh = thresh / 3; @@ -1316,7 +1217,7 @@ int SuggestMgr::ngsuggest(char** wlst, char* guess[MAX_GUESS]; char* guessorig[MAX_GUESS]; int gscore[MAX_GUESS]; - for (i = 0; i < MAX_GUESS; i++) { + for (int i = 0; i < MAX_GUESS; i++) { guess[i] = NULL; guessorig[i] = NULL; gscore[i] = -100 * i; @@ -1329,14 +1230,14 @@ int SuggestMgr::ngsuggest(char** wlst, if (!glst) { if (nonbmp) utf8 = 1; - return ns; + return; } - for (i = 0; i < MAX_ROOTS; i++) { + for (int i = 0; i < MAX_ROOTS; i++) { if (roots[i]) { struct hentry* rp = roots[i]; - std::string f; + f.clear(); const char *field = NULL; if ((rp->var & H_OPT_PHON) && copy_field(f, HENTRY_DATA(rp), MORPH_PHON)) field = f.c_str(); @@ -1345,8 +1246,27 @@ int SuggestMgr::ngsuggest(char** wlst, nc, field); for (int k = 0; k < nw; k++) { - sc = ngram(n, word, glst[k].word, NGRAM_ANY_MISMATCH + low) + - leftcommonsubstring(word, glst[k].word); + if (utf8) { + u8_u16(w_f, glst[k].word); + + int leftcommon = leftcommonsubstring(w_word, w_f); + if (low) { + // lowering dictionary word + 
mkallsmall_utf(w_f, langnum); + } + + sc = ngram(n, w_word, w_f, NGRAM_ANY_MISMATCH) + leftcommon; + } else { + f = glst[k].word; + + int leftcommon = leftcommonsubstring(word, f.c_str()); + if (low) { + // lowering dictionary word + mkallsmall(f, csconv); + } + + sc = ngram(n, word, f, NGRAM_ANY_MISMATCH) + leftcommon; + } if (sc > thresh) { if (sc > gscore[lp]) { @@ -1361,7 +1281,7 @@ int SuggestMgr::ngsuggest(char** wlst, guess[lp] = glst[k].word; guessorig[lp] = glst[k].orig; lval = sc; - for (j = 0; j < MAX_GUESS; j++) + for (int j = 0; j < MAX_GUESS; j++) if (gscore[j] < lval) { lp = j; lval = gscore[j]; @@ -1400,16 +1320,16 @@ int SuggestMgr::ngsuggest(char** wlst, fact = (10.0 - maxd) / 5.0; } - for (i = 0; i < MAX_GUESS; i++) { + std::vector<w_char> w_gl; + for (int i = 0; i < MAX_GUESS; i++) { if (guess[i]) { // lowering guess[i] std::string gl; int len; if (utf8) { - std::vector<w_char> _w; - len = u8_u16(_w, guess[i]); - mkallsmall_utf(_w, langnum); - u16_u8(gl, _w); + len = u8_u16(w_gl, guess[i]); + mkallsmall_utf(w_gl, langnum); + u16_u8(gl, w_gl); } else { gl.assign(guess[i]); if (!nonbmp) @@ -1426,14 +1346,46 @@ int SuggestMgr::ngsuggest(char** wlst, } // using 2-gram instead of 3, and other weightening - re = ngram(2, word, gl, NGRAM_ANY_MISMATCH + low + NGRAM_WEIGHTED) + - ngram(2, gl, word, NGRAM_ANY_MISMATCH + low + NGRAM_WEIGHTED); + if (utf8) { + u8_u16(w_gl, gl); + //w_gl is lowercase already at this point + re = ngram(2, w_word, w_gl, NGRAM_ANY_MISMATCH + NGRAM_WEIGHTED); + if (low) { + w_f = w_word; + // lowering dictionary word + mkallsmall_utf(w_f, langnum); + re += ngram(2, w_gl, w_f, NGRAM_ANY_MISMATCH + NGRAM_WEIGHTED); + } else { + re += ngram(2, w_gl, w_word, NGRAM_ANY_MISMATCH + NGRAM_WEIGHTED); + } + } else { + //gl is lowercase already at this point + re = ngram(2, word, gl, NGRAM_ANY_MISMATCH + NGRAM_WEIGHTED); + if (low) { + f = word; + // lowering dictionary word + mkallsmall(f, csconv); + re += ngram(2, gl, f, 
NGRAM_ANY_MISMATCH + NGRAM_WEIGHTED); + } else { + re += ngram(2, gl, word, NGRAM_ANY_MISMATCH + NGRAM_WEIGHTED); + } + } + int ngram_score, leftcommon_score; + if (utf8) { + //w_gl is lowercase already at this point + ngram_score = ngram(4, w_word, w_gl, NGRAM_ANY_MISMATCH); + leftcommon_score = leftcommonsubstring(w_word, w_gl); + } else { + //gl is lowercase already at this point + ngram_score = ngram(4, word, gl, NGRAM_ANY_MISMATCH); + leftcommon_score = leftcommonsubstring(word, gl.c_str()); + } gscore[i] = // length of longest common subsequent minus length difference 2 * _lcs - abs((int)(n - len)) + // weight length of the left common substring - leftcommonsubstring(word, gl.c_str()) + + leftcommon_score + // weight equal character positions (!nonbmp && commoncharacterpositions(word, gl.c_str(), &is_swap) ? 1 @@ -1441,7 +1393,7 @@ int SuggestMgr::ngsuggest(char** wlst, // swap character (not neighboring) ((is_swap) ? 10 : 0) + // ngram - ngram(4, word, gl, NGRAM_ANY_MISMATCH + low) + + ngram_score + // weighted ngrams re + // different limit for dictionaries with PHONE rules @@ -1454,16 +1406,15 @@ int SuggestMgr::ngsuggest(char** wlst, // phonetic version if (ph) - for (i = 0; i < MAX_ROOTS; i++) { + for (int i = 0; i < MAX_ROOTS; i++) { if (rootsphon[i]) { // lowering rootphon[i] std::string gl; int len; if (utf8) { - std::vector<w_char> _w; - len = u8_u16(_w, rootsphon[i]); - mkallsmall_utf(_w, langnum); - u16_u8(gl, _w); + len = u8_u16(w_gl, rootsphon[i]); + mkallsmall_utf(w_gl, langnum); + u16_u8(gl, w_gl); } else { gl.assign(rootsphon[i]); if (!nonbmp) @@ -1471,10 +1422,15 @@ int SuggestMgr::ngsuggest(char** wlst, len = strlen(rootsphon[i]); } + // weight length of the left common substring + int leftcommon_score; + if (utf8) + leftcommon_score = leftcommonsubstring(w_word, w_gl); + else + leftcommon_score = leftcommonsubstring(word, gl.c_str()); // heuristic weigthing of ngram scores scoresphon[i] += 2 * lcslen(word, gl) - abs((int)(n - len)) + - // 
weight length of the left common substring - leftcommonsubstring(word, gl.c_str()); + leftcommon_score; } } @@ -1482,12 +1438,12 @@ int SuggestMgr::ngsuggest(char** wlst, bubblesort(&rootsphon[0], NULL, &scoresphon[0], MAX_ROOTS); // copy over - int oldns = ns; + size_t oldns = wlst.size(); int same = 0; - for (i = 0; i < MAX_GUESS; i++) { + for (int i = 0; i < MAX_GUESS; i++) { if (guess[i]) { - if ((ns < oldns + maxngramsugs) && (ns < maxSug) && + if ((wlst.size() < oldns + maxngramsugs) && (wlst.size() < maxSug) && (!same || (gscore[i] > 1000))) { int unique = 1; // leave only excellent suggestions, if exists @@ -1496,35 +1452,34 @@ int SuggestMgr::ngsuggest(char** wlst, else if (gscore[i] < -100) { same = 1; // keep the best ngram suggestions, unless in ONLYMAXDIFF mode - if (ns > oldns || (pAMgr && pAMgr->get_onlymaxdiff())) { + if (wlst.size() > oldns || (pAMgr && pAMgr->get_onlymaxdiff())) { free(guess[i]); if (guessorig[i]) free(guessorig[i]); continue; } } - for (j = 0; j < ns; j++) { + for (size_t j = 0; j < wlst.size(); ++j) { // don't suggest previous suggestions or a previous suggestion with // prefixes or affixes - if ((!guessorig[i] && strstr(guess[i], wlst[j])) || - (guessorig[i] && strstr(guessorig[i], wlst[j])) || + if ((!guessorig[i] && strstr(guess[i], wlst[j].c_str())) || + (guessorig[i] && strstr(guessorig[i], wlst[j].c_str())) || // check forbidden words - !checkword(guess[i], strlen(guess[i]), 0, NULL, NULL)) { + !checkword(guess[i], 0, NULL, NULL)) { unique = 0; break; } } if (unique) { - wlst[ns++] = guess[i]; if (guessorig[i]) { - free(guess[i]); - wlst[ns - 1] = guessorig[i]; + wlst.push_back(guessorig[i]); + } else { + wlst.push_back(guess[i]); } - } else { - free(guess[i]); - if (guessorig[i]) - free(guessorig[i]); } + free(guess[i]); + if (guessorig[i]) + free(guessorig[i]); } else { free(guess[i]); if (guessorig[i]) @@ -1533,26 +1488,24 @@ int SuggestMgr::ngsuggest(char** wlst, } } - oldns = ns; + oldns = wlst.size(); if (ph) - for 
(i = 0; i < MAX_ROOTS; i++) { + for (int i = 0; i < MAX_ROOTS; i++) { if (rootsphon[i]) { - if ((ns < oldns + MAXPHONSUGS) && (ns < maxSug)) { + if ((wlst.size() < oldns + MAXPHONSUGS) && (wlst.size() < maxSug)) { int unique = 1; - for (j = 0; j < ns; j++) { + for (size_t j = 0; j < wlst.size(); ++j) { // don't suggest previous suggestions or a previous suggestion with // prefixes or affixes - if (strstr(rootsphon[i], wlst[j]) || + if (strstr(rootsphon[i], wlst[j].c_str()) || // check forbidden words - !checkword(rootsphon[i], strlen(rootsphon[i]), 0, NULL, NULL)) { + !checkword(rootsphon[i], 0, NULL, NULL)) { unique = 0; break; } } if (unique) { - wlst[ns++] = mystrdup(rootsphon[i]); - if (!wlst[ns - 1]) - return ns - 1; + wlst.push_back(rootsphon[i]); } } } @@ -1560,7 +1513,6 @@ int SuggestMgr::ngsuggest(char** wlst, if (nonbmp) utf8 = 1; - return ns; } // see if a candidate suggestion is spelled correctly @@ -1569,15 +1521,10 @@ int SuggestMgr::ngsuggest(char** wlst, // obsolote MySpell-HU modifications: // return value 2 and 3 marks compounding with hyphen (-) // `3' marks roots without suffix -int SuggestMgr::checkword(const char* word, - int len, +int SuggestMgr::checkword(const std::string& word, int cpdsuggest, int* timer, clock_t* timelimit) { - struct hentry* rv = NULL; - struct hentry* rv2 = NULL; - int nosuffix = 0; - // check time limit if (timer) { (*timer)--; @@ -1589,13 +1536,16 @@ int SuggestMgr::checkword(const char* word, } if (pAMgr) { + struct hentry* rv = NULL; + int nosuffix = 0; + if (cpdsuggest == 1) { if (pAMgr->get_compound()) { + struct hentry* rv2 = NULL; struct hentry* rwords[100]; // buffer for COMPOUND pattern checking - rv = pAMgr->compound_check(word, len, 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, 1, - 0); // EXT + rv = pAMgr->compound_check(word, 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, 1, 0); // EXT if (rv && - (!(rv2 = pAMgr->lookup(word)) || !rv2->astr || + (!(rv2 = pAMgr->lookup(word.c_str())) || !rv2->astr || 
!(TESTAFF(rv2->astr, pAMgr->get_forbiddenword(), rv2->alen) || TESTAFF(rv2->astr, pAMgr->get_nosuggest(), rv2->alen)))) return 3; // XXX obsolote categorisation + only ICONV needs affix @@ -1604,7 +1554,7 @@ int SuggestMgr::checkword(const char* word, return 0; } - rv = pAMgr->lookup(word); + rv = pAMgr->lookup(word.c_str()); if (rv) { if ((rv->astr) && @@ -1621,20 +1571,20 @@ int SuggestMgr::checkword(const char* word, break; } } else - rv = pAMgr->prefix_check(word, len, + rv = pAMgr->prefix_check(word.c_str(), word.size(), 0); // only prefix, and prefix + suffix XXX if (rv) { nosuffix = 1; } else { - rv = pAMgr->suffix_check(word, len, 0, NULL, NULL, 0, - NULL); // only suffix + rv = pAMgr->suffix_check(word.c_str(), word.size(), 0, NULL, + FLAG_NULL, FLAG_NULL, IN_CPD_NOT); // only suffix } if (!rv && pAMgr->have_contclass()) { - rv = pAMgr->suffix_check_twosfx(word, len, 0, NULL, FLAG_NULL); + rv = pAMgr->suffix_check_twosfx(word.c_str(), word.size(), 0, NULL, FLAG_NULL); if (!rv) - rv = pAMgr->prefix_check_twosfx(word, len, 1, FLAG_NULL); + rv = pAMgr->prefix_check_twosfx(word.c_str(), word.size(), 1, FLAG_NULL); } // check forbidden words @@ -1656,17 +1606,15 @@ int SuggestMgr::checkword(const char* word, } int SuggestMgr::check_forbidden(const char* word, int len) { - struct hentry* rv = NULL; - if (pAMgr) { - rv = pAMgr->lookup(word); + struct hentry* rv = pAMgr->lookup(word); if (rv && rv->astr && (TESTAFF(rv->astr, pAMgr->get_needaffix(), rv->alen) || TESTAFF(rv->astr, pAMgr->get_onlyincompound(), rv->alen))) rv = NULL; if (!(pAMgr->prefix_check(word, len, 1))) - rv = pAMgr->suffix_check(word, len, 0, NULL, NULL, 0, - NULL); // prefix+suffix, suffix + rv = pAMgr->suffix_check(word, len, 0, NULL, + FLAG_NULL, FLAG_NULL, IN_CPD_NOT); // prefix+suffix, suffix // check forbidden words if ((rv) && (rv->astr) && TESTAFF(rv->astr, pAMgr->get_forbiddenword(), rv->alen)) @@ -1675,32 +1623,25 @@ int SuggestMgr::check_forbidden(const char* word, int len) { return 
0; } -char* SuggestMgr::suggest_morph(const char* w) { - char result[MAXLNLEN]; - char* r = (char*)result; - char* st; +std::string SuggestMgr::suggest_morph(const std::string& in_w) { + std::string result; struct hentry* rv = NULL; - *result = '\0'; - if (!pAMgr) - return NULL; + return std::string(); - std::string w2; - const char* word = w; + std::string w(in_w); // word reversing wrapper for complex prefixes if (complexprefixes) { - w2.assign(w); if (utf8) - reverseword_utf(w2); + reverseword_utf(w); else - reverseword(w2); - word = w2.c_str(); + reverseword(w); } - rv = pAMgr->lookup(word); + rv = pAMgr->lookup(w.c_str()); while (rv) { if ((!rv->astr) || @@ -1708,65 +1649,83 @@ char* SuggestMgr::suggest_morph(const char* w) { TESTAFF(rv->astr, pAMgr->get_needaffix(), rv->alen) || TESTAFF(rv->astr, pAMgr->get_onlyincompound(), rv->alen))) { if (!HENTRY_FIND(rv, MORPH_STEM)) { - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, MORPH_STEM, MAXLNLEN); - mystrcat(result, word, MAXLNLEN); + result.append(" "); + result.append(MORPH_STEM); + result.append(w); } if (HENTRY_DATA(rv)) { - mystrcat(result, " ", MAXLNLEN); - mystrcat(result, HENTRY_DATA2(rv), MAXLNLEN); + result.append(" "); + result.append(HENTRY_DATA2(rv)); } - mystrcat(result, "\n", MAXLNLEN); + result.append("\n"); } rv = rv->next_homonym; } - st = pAMgr->affix_check_morph(word, strlen(word)); - if (st) { - mystrcat(result, st, MAXLNLEN); - free(st); + std::string st = pAMgr->affix_check_morph(w.c_str(), w.size()); + if (!st.empty()) { + result.append(st); } - if (pAMgr->get_compound() && (*result == '\0')) { + if (pAMgr->get_compound() && result.empty()) { struct hentry* rwords[100]; // buffer for COMPOUND pattern checking - pAMgr->compound_check_morph(word, strlen(word), 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, &r, + pAMgr->compound_check_morph(w.c_str(), w.size(), 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, result, NULL); } - return (*result) ? 
mystrdup(line_uniq(result, MSEP_REC)) : NULL; + line_uniq(result, MSEP_REC); + + return result; +} + +static int get_sfxcount(const char* morph) { + if (!morph || !*morph) + return 0; + int n = 0; + const char* old = morph; + morph = strstr(morph, MORPH_DERI_SFX); + if (!morph) + morph = strstr(old, MORPH_INFL_SFX); + if (!morph) + morph = strstr(old, MORPH_TERM_SFX); + while (morph) { + n++; + old = morph; + morph = strstr(morph + 1, MORPH_DERI_SFX); + if (!morph) + morph = strstr(old + 1, MORPH_INFL_SFX); + if (!morph) + morph = strstr(old + 1, MORPH_TERM_SFX); + } + return n; } /* affixation */ -char* SuggestMgr::suggest_hentry_gen(hentry* rv, const char* pattern) { - char result[MAXLNLEN]; - *result = '\0'; +std::string SuggestMgr::suggest_hentry_gen(hentry* rv, const char* pattern) { + std::string result; int sfxcount = get_sfxcount(pattern); if (get_sfxcount(HENTRY_DATA(rv)) > sfxcount) - return NULL; + return result; if (HENTRY_DATA(rv)) { - char* aff = pAMgr->morphgen(HENTRY_WORD(rv), rv->blen, rv->astr, rv->alen, - HENTRY_DATA(rv), pattern, 0); - if (aff) { - mystrcat(result, aff, MAXLNLEN); - mystrcat(result, "\n", MAXLNLEN); - free(aff); + std::string aff = pAMgr->morphgen(HENTRY_WORD(rv), rv->blen, rv->astr, rv->alen, + HENTRY_DATA(rv), pattern, 0); + if (!aff.empty()) { + result.append(aff); + result.append("\n"); } } // check all allomorphs - char allomorph[MAXLNLEN]; char* p = NULL; if (HENTRY_DATA(rv)) p = (char*)strstr(HENTRY_DATA2(rv), MORPH_ALLOMORPH); while (p) { - struct hentry* rv2 = NULL; p += MORPH_TAG_LEN; int plen = fieldlen(p); - strncpy(allomorph, p, plen); - allomorph[plen] = '\0'; - rv2 = pAMgr->lookup(allomorph); + std::string allomorph(p, plen); + struct hentry* rv2 = pAMgr->lookup(allomorph.c_str()); while (rv2) { // if (HENTRY_DATA(rv2) && get_sfxcount(HENTRY_DATA(rv2)) <= // sfxcount) { @@ -1774,12 +1733,11 @@ char* SuggestMgr::suggest_hentry_gen(hentry* rv, const char* pattern) { char* st = (char*)strstr(HENTRY_DATA2(rv2), 
MORPH_STEM); if (st && (strncmp(st + MORPH_TAG_LEN, HENTRY_WORD(rv), fieldlen(st + MORPH_TAG_LEN)) == 0)) { - char* aff = pAMgr->morphgen(HENTRY_WORD(rv2), rv2->blen, rv2->astr, - rv2->alen, HENTRY_DATA(rv2), pattern, 0); - if (aff) { - mystrcat(result, aff, MAXLNLEN); - mystrcat(result, "\n", MAXLNLEN); - free(aff); + std::string aff = pAMgr->morphgen(HENTRY_WORD(rv2), rv2->blen, rv2->astr, + rv2->alen, HENTRY_DATA(rv2), pattern, 0); + if (!aff.empty()) { + result.append(aff); + result.append("\n"); } } } @@ -1788,27 +1746,28 @@ char* SuggestMgr::suggest_hentry_gen(hentry* rv, const char* pattern) { p = strstr(p + plen, MORPH_ALLOMORPH); } - return (*result) ? mystrdup(result) : NULL; + return result; } -char* SuggestMgr::suggest_gen(char** desc, int n, const char* pattern) { - if (n == 0 || !pAMgr) - return NULL; +std::string SuggestMgr::suggest_gen(const std::vector<std::string>& desc, const std::string& in_pattern) { + if (desc.empty() || !pAMgr) + return std::string(); + const char* pattern = in_pattern.c_str(); std::string result2; std::string newpattern; struct hentry* rv = NULL; // search affixed forms with and without derivational suffixes while (1) { - for (int k = 0; k < n; k++) { + for (size_t k = 0; k < desc.size(); ++k) { std::string result; // add compound word parts (except the last one) - char* s = (char*)desc[k]; - char* part = strstr(s, MORPH_PART); + const char* s = desc[k].c_str(); + const char* part = strstr(s, MORPH_PART); if (part) { - char* nextpart = strstr(part + 1, MORPH_PART); + const char* nextpart = strstr(part + 1, MORPH_PART); while (nextpart) { std::string field; copy_field(field, part, MORPH_PART); @@ -1819,56 +1778,50 @@ char* SuggestMgr::suggest_gen(char** desc, int n, const char* pattern) { s = part; } - char** pl; std::string tok(s); size_t pos = tok.find(" | "); while (pos != std::string::npos) { tok[pos + 1] = MSEP_ALT; pos = tok.find(" | ", pos); } - int pln = line_tok(tok.c_str(), &pl, MSEP_ALT); - for (int i = 0; i < pln; 
i++) { + std::vector<std::string> pl = line_tok(tok, MSEP_ALT); + for (size_t i = 0; i < pl.size(); ++i) { // remove inflectional and terminal suffixes - char* is = strstr(pl[i], MORPH_INFL_SFX); - if (is) - *is = '\0'; - char* ts = strstr(pl[i], MORPH_TERM_SFX); - while (ts) { - *ts = '_'; - ts = strstr(pl[i], MORPH_TERM_SFX); + size_t is = pl[i].find(MORPH_INFL_SFX); + if (is != std::string::npos) + pl[i].resize(is); + size_t ts = pl[i].find(MORPH_TERM_SFX); + while (ts != std::string::npos) { + pl[i][ts] = '_'; + ts = pl[i].find(MORPH_TERM_SFX); } - char* st = strstr(s, MORPH_STEM); + const char* st = strstr(s, MORPH_STEM); if (st) { copy_field(tok, st, MORPH_STEM); rv = pAMgr->lookup(tok.c_str()); while (rv) { std::string newpat(pl[i]); newpat.append(pattern); - char* sg = suggest_hentry_gen(rv, newpat.c_str()); - if (!sg) + std::string sg = suggest_hentry_gen(rv, newpat.c_str()); + if (sg.empty()) sg = suggest_hentry_gen(rv, pattern); - if (sg) { - char** gen; - int genl = line_tok(sg, &gen, MSEP_REC); - free(sg); - sg = NULL; - for (int j = 0; j < genl; j++) { + if (!sg.empty()) { + std::vector<std::string> gen = line_tok(sg, MSEP_REC); + for (size_t j = 0; j < gen.size(); ++j) { result2.push_back(MSEP_REC); result2.append(result); - if (strstr(pl[i], MORPH_SURF_PFX)) { + if (pl[i].find(MORPH_SURF_PFX) != std::string::npos) { std::string field; copy_field(field, pl[i], MORPH_SURF_PFX); result2.append(field); } result2.append(gen[j]); } - freelist(&gen, genl); } rv = rv->next_homonym; } } } - freelist(&pl, pln); } if (!result2.empty() || !strstr(pattern, MORPH_DERI_SFX)) @@ -1878,13 +1831,13 @@ char* SuggestMgr::suggest_gen(char** desc, int n, const char* pattern) { mystrrep(newpattern, MORPH_DERI_SFX, MORPH_TERM_SFX); pattern = newpattern.c_str(); } - return (!result2.empty() ? 
mystrdup(result2.c_str()) : NULL); + return result2; } -// generate an n-gram score comparing s1 and s2 +// generate an n-gram score comparing s1 and s2, UTF16 version int SuggestMgr::ngram(int n, - const std::string& s1, - const std::string& s2, + const std::vector<w_char>& su1, + const std::vector<w_char>& su2, int opt) { int nscore = 0; int ns; @@ -1892,68 +1845,36 @@ int SuggestMgr::ngram(int n, int l2; int test = 0; - if (utf8) { - std::vector<w_char> su1; - std::vector<w_char> su2; - l1 = u8_u16(su1, s1); - l2 = u8_u16(su2, s2); - if ((l2 <= 0) || (l1 == -1)) - return 0; - // lowering dictionary word - if (opt & NGRAM_LOWERING) - mkallsmall_utf(su2, langnum); - for (int j = 1; j <= n; j++) { - ns = 0; - for (int i = 0; i <= (l1 - j); i++) { - int k = 0; - for (int l = 0; l <= (l2 - j); l++) { - for (k = 0; k < j; k++) { - w_char& c1 = su1[i + k]; - w_char& c2 = su2[l + k]; - if ((c1.l != c2.l) || (c1.h != c2.h)) - break; - } - if (k == j) { - ns++; + l1 = su1.size(); + l2 = su2.size(); + if (l2 == 0) + return 0; + for (int j = 1; j <= n; j++) { + ns = 0; + for (int i = 0; i <= (l1 - j); i++) { + int k = 0; + for (int l = 0; l <= (l2 - j); l++) { + for (k = 0; k < j; k++) { + const w_char& c1 = su1[i + k]; + const w_char& c2 = su2[l + k]; + if ((c1.l != c2.l) || (c1.h != c2.h)) break; - } - } - if (k != j && opt & NGRAM_WEIGHTED) { - ns--; - test++; - if (i == 0 || i == l1 - j) - ns--; // side weight } - } - nscore = nscore + ns; - if (ns < 2 && !(opt & NGRAM_WEIGHTED)) - break; - } - } else { - l2 = s2.size(); - if (l2 == 0) - return 0; - l1 = s1.size(); - std::string t(s2); - if (opt & NGRAM_LOWERING) - mkallsmall(t, csconv); - for (int j = 1; j <= n; j++) { - ns = 0; - for (int i = 0; i <= (l1 - j); i++) { - std::string temp(s1.substr(i, j)); - if (t.find(temp) != std::string::npos) { + if (k == j) { ns++; - } else if (opt & NGRAM_WEIGHTED) { - ns--; - test++; - if (i == 0 || i == l1 - j) - ns--; // side weight + break; } } - nscore = nscore + ns; - if (ns 
< 2 && !(opt & NGRAM_WEIGHTED))
-        break;
+      if (k != j && opt & NGRAM_WEIGHTED) {
+        ns--;
+        test++;
+        if (i == 0 || i == l1 - j)
+          ns--;  // side weight
+      }
     }
+    nscore = nscore + ns;
+    if (ns < 2 && !(opt & NGRAM_WEIGHTED))
+      break;
   }

   ns = 0;
@@ -1965,46 +1886,92 @@ int SuggestMgr::ngram(int n,
   return ns;
 }

-// length of the left common substring of s1 and (decapitalised) s2
-int SuggestMgr::leftcommonsubstring(const char* s1, const char* s2) {
-  if (utf8) {
-    std::vector<w_char> su1;
-    std::vector<w_char> su2;
-    int l1 = u8_u16(su1, s1);
-    int l2 = u8_u16(su2, s2);
-    // decapitalize dictionary word
-    if (complexprefixes) {
-      if (su1[l1 - 1] == su2[l2 - 1])
-        return 1;
-    } else {
-      unsigned short idx = su2.empty() ? 0 : (su2[0].h << 8) + su2[0].l;
-      unsigned short otheridx = su1.empty() ? 0 : (su1[0].h << 8) + su1[0].l;
-      if (otheridx != idx && (otheridx != unicodetolower(idx, langnum)))
-        return 0;
-      int i;
-      for (i = 1; (i < l1) && (i < l2) && (su1[i].l == su2[i].l) &&
-                  (su1[i].h == su2[i].h);
-           i++)
-        ;
-      return i;
+// generate an n-gram score comparing s1 and s2, non-UTF16 version
+int SuggestMgr::ngram(int n,
+                      const std::string& s1,
+                      const std::string& s2,
+                      int opt) {
+  int nscore = 0;
+  int ns;
+  int l1;
+  int l2;
+  int test = 0;
+
+  l2 = s2.size();
+  if (l2 == 0)
+    return 0;
+  l1 = s1.size();
+  for (int j = 1; j <= n; j++) {
+    ns = 0;
+    for (int i = 0; i <= (l1 - j); i++) {
+      //s2 is haystack, s1[i..i+j) is needle
+      if (s2.find(s1.c_str()+i, 0, j) != std::string::npos) {
+        ns++;
+      } else if (opt & NGRAM_WEIGHTED) {
+        ns--;
+        test++;
+        if (i == 0 || i == l1 - j)
+          ns--;  // side weight
+      }
     }
+    nscore = nscore + ns;
+    if (ns < 2 && !(opt & NGRAM_WEIGHTED))
+      break;
+  }
+
+  ns = 0;
+  if (opt & NGRAM_LONGER_WORSE)
+    ns = (l2 - l1) - 2;
+  if (opt & NGRAM_ANY_MISMATCH)
+    ns = abs(l2 - l1) - 2;
+  ns = (nscore - ((ns > 0) ? ns : 0));
+  return ns;
+}
+
+// length of the left common substring of s1 and (decapitalised) s2, UTF version
+int SuggestMgr::leftcommonsubstring(
+    const std::vector<w_char>& su1,
+    const std::vector<w_char>& su2) {
+  int l1 = su1.size();
+  int l2 = su2.size();
+  // decapitalize dictionary word
+  if (complexprefixes) {
+    if (su1[l1 - 1] == su2[l2 - 1])
+      return 1;
   } else {
-    if (complexprefixes) {
-      int l1 = strlen(s1);
-      int l2 = strlen(s2);
-      if (l1 <= l2 && s2[l1 - 1] == s2[l2 - 1])
-        return 1;
-    } else if (csconv) {
-      const char* olds = s1;
-      // decapitalise dictionary word
-      if ((*s1 != *s2) && (*s1 != csconv[((unsigned char)*s2)].clower))
-        return 0;
-      do {
-        s1++;
-        s2++;
-      } while ((*s1 == *s2) && (*s1 != '\0'));
-      return (int)(s1 - olds);
-    }
+    unsigned short idx = su2.empty() ? 0 : (su2[0].h << 8) + su2[0].l;
+    unsigned short otheridx = su1.empty() ? 0 : (su1[0].h << 8) + su1[0].l;
+    if (otheridx != idx && (otheridx != unicodetolower(idx, langnum)))
+      return 0;
+    int i;
+    for (i = 1; (i < l1) && (i < l2) && (su1[i].l == su2[i].l) &&
+                (su1[i].h == su2[i].h);
+         i++)
+      ;
+    return i;
+  }
+  return 0;
+}
+
+// length of the left common substring of s1 and (decapitalised) s2, non-UTF
+int SuggestMgr::leftcommonsubstring(
+    const char* s1,
+    const char* s2) {
+  if (complexprefixes) {
+    int l1 = strlen(s1);
+    int l2 = strlen(s2);
+    if (l1 <= l2 && s2[l1 - 1] == s2[l2 - 1])
+      return 1;
+  } else if (csconv) {
+    const char* olds = s1;
+    // decapitalise dictionary word
+    if ((*s1 != *s2) && (*s1 != csconv[((unsigned char)*s2)].clower))
+      return 0;
+    do {
+      s1++;
+      s2++;
+    } while ((*s1 == *s2) && (*s1 != '\0'));
+    return (int)(s1 - olds);
   }
   return 0;
 }
@@ -2054,7 +2021,7 @@ int SuggestMgr::commoncharacterpositions(const char* s1,
     } else {
       mkallsmall(t, csconv);
     }
-    for (i = 0; (*(s1 + i) != 0) && i < t.size(); i++) {
+    for (i = 0; i < t.size() && (*(s1 + i) != 0); ++i) {
       if (*(s1 + i) == t[i]) {
         num++;
       } else {
diff --git a/libs/hunspell/src/suggestmgr.hxx b/libs/hunspell/src/suggestmgr.hxx
index 675d98eb8f..19ffc03a84 100644
--- a/libs/hunspell/src/suggestmgr.hxx
+++ b/libs/hunspell/src/suggestmgr.hxx
@@ -1,6 +1,8 @@
 /* ***** BEGIN LICENSE BLOCK *****
  * Version: MPL 1.1/GPL 2.0/LGPL 2.1
  *
+ * Copyright (C) 2002-2017 Németh László
+ *
  * The contents of this file are subject to the Mozilla Public License Version
  * 1.1 (the "License"); you may not use this file except in compliance with
  * the License. You may obtain a copy of the License at
@@ -11,12 +13,7 @@
  * for the specific language governing rights and limitations under the
  * License.
  *
- * The Original Code is Hunspell, based on MySpell.
- *
- * The Initial Developers of the Original Code are
- * Kevin Hendricks (MySpell) and Németh László (Hunspell).
- * Portions created by the Initial Developers are Copyright (C) 2002-2005
- * the Initial Developers. All Rights Reserved.
+ * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
  *
  * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
  * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
@@ -71,8 +68,8 @@
  * SUCH DAMAGE.
  */

-#ifndef _SUGGESTMGR_HXX_
-#define _SUGGESTMGR_HXX_
+#ifndef SUGGESTMGR_HXX_
+#define SUGGESTMGR_HXX_

 #define MAX_ROOTS 100
 #define MAX_WORDS 100
@@ -91,8 +88,6 @@
 #define NGRAM_LOWERING (1 << 2)
 #define NGRAM_WEIGHTED (1 << 3)

-#include "hunvisapi.h"
-
 #include "atypes.hxx"
 #include "affixmgr.hxx"
 #include "hashmgr.hxx"
@@ -101,22 +96,22 @@

 enum { LCS_UP, LCS_LEFT, LCS_UPLEFT };

-class LIBHUNSPELL_DLL_EXPORTED SuggestMgr {
+class SuggestMgr {
  private:
   SuggestMgr(const SuggestMgr&);
   SuggestMgr& operator=(const SuggestMgr&);

  private:
   char* ckey;
-  int ckeyl;
-  w_char* ckey_utf;
+  size_t ckeyl;
+  std::vector<w_char> ckey_utf;

   char* ctry;
-  int ctryl;
-  w_char* ctry_utf;
+  size_t ctryl;
+  std::vector<w_char> ctry_utf;

   AffixMgr* pAMgr;
-  int maxSug;
+  unsigned int maxSug;
   struct cs_info* csconv;
   int utf8;
   int langnum;
@@ -126,73 +121,68 @@ class LIBHUNSPELL_DLL_EXPORTED SuggestMgr {
   int complexprefixes;

  public:
-  SuggestMgr(const char* tryme, int maxn, AffixMgr* aptr);
+  SuggestMgr(const char* tryme, unsigned int maxn, AffixMgr* aptr);
   ~SuggestMgr();

-  int suggest(char*** slst, const char* word, int nsug, int* onlycmpdsug);
-  int ngsuggest(char** wlst, const char* word, int ns, HashMgr** pHMgr, int md);
-  int suggest_auto(char*** slst, const char* word, int nsug);
-  int suggest_stems(char*** slst, const char* word, int nsug);
-  int suggest_pos_stems(char*** slst, const char* word, int nsug);
+  void suggest(std::vector<std::string>& slst, const char* word, int* onlycmpdsug);
+  void ngsuggest(std::vector<std::string>& slst, const char* word, const std::vector<HashMgr*>& rHMgr);

-  char* suggest_morph(const char* word);
-  char* suggest_gen(char** pl, int pln, const char* pattern);
-  char* suggest_morph_for_spelling_error(const char* word);
+  std::string suggest_morph(const std::string& word);
+  std::string suggest_gen(const std::vector<std::string>& pl, const std::string& pattern);

  private:
-  int testsug(char** wlst,
-              const char* candidate,
-              int wl,
-              int ns,
-              int cpdsuggest,
-              int* timer,
-              clock_t* timelimit);
-  int checkword(const char*, int, int, int*, clock_t*);
+  void testsug(std::vector<std::string>& wlst,
+               const std::string& candidate,
+               int cpdsuggest,
+               int* timer,
+               clock_t* timelimit);
+  int checkword(const std::string& word, int, int*, clock_t*);
   int check_forbidden(const char*, int);

-  int capchars(char**, const char*, int, int);
-  int replchars(char**, const char*, int, int);
-  int doubletwochars(char**, const char*, int, int);
-  int forgotchar(char**, const char*, int, int);
-  int swapchar(char**, const char*, int, int);
-  int longswapchar(char**, const char*, int, int);
-  int movechar(char**, const char*, int, int);
-  int extrachar(char**, const char*, int, int);
-  int badcharkey(char**, const char*, int, int);
-  int badchar(char**, const char*, int, int);
-  int twowords(char**, const char*, int, int);
-  int fixstems(char**, const char*, int);
-
-  int capchars_utf(char**, const w_char*, int wl, int, int);
-  int doubletwochars_utf(char**, const w_char*, int wl, int, int);
-  int forgotchar_utf(char**, const w_char*, int wl, int, int);
-  int extrachar_utf(char**, const w_char*, int wl, int, int);
-  int badcharkey_utf(char**, const w_char*, int wl, int, int);
-  int badchar_utf(char**, const w_char*, int wl, int, int);
-  int swapchar_utf(char**, const w_char*, int wl, int, int);
-  int longswapchar_utf(char**, const w_char*, int, int, int);
-  int movechar_utf(char**, const w_char*, int, int, int);
-
-  int mapchars(char**, const char*, int, int);
+  void capchars(std::vector<std::string>&, const char*, int);
+  int replchars(std::vector<std::string>&, const char*, int);
+  int doubletwochars(std::vector<std::string>&, const char*, int);
+  int forgotchar(std::vector<std::string>&, const char*, int);
+  int swapchar(std::vector<std::string>&, const char*, int);
+  int longswapchar(std::vector<std::string>&, const char*, int);
+  int movechar(std::vector<std::string>&, const char*, int);
+  int extrachar(std::vector<std::string>&, const char*, int);
+  int badcharkey(std::vector<std::string>&, const char*, int);
+  int badchar(std::vector<std::string>&, const char*, int);
+  int twowords(std::vector<std::string>&, const char*, int);
+
+  void capchars_utf(std::vector<std::string>&, const w_char*, int wl, int);
+  int doubletwochars_utf(std::vector<std::string>&, const w_char*, int wl, int);
+  int forgotchar_utf(std::vector<std::string>&, const w_char*, int wl, int);
+  int extrachar_utf(std::vector<std::string>&, const w_char*, int wl, int);
+  int badcharkey_utf(std::vector<std::string>&, const w_char*, int wl, int);
+  int badchar_utf(std::vector<std::string>&, const w_char*, int wl, int);
+  int swapchar_utf(std::vector<std::string>&, const w_char*, int wl, int);
+  int longswapchar_utf(std::vector<std::string>&, const w_char*, int, int);
+  int movechar_utf(std::vector<std::string>&, const w_char*, int, int);
+
+  int mapchars(std::vector<std::string>&, const char*, int);
   int map_related(const char*,
                   std::string&,
                   int,
-                  char** wlst,
-                  int,
-                  int,
-                  const mapentry*,
+                  std::vector<std::string>& wlst,
                   int,
+                  const std::vector<mapentry>&,
                   int*,
                   clock_t*);
+  int ngram(int n, const std::vector<w_char>& su1,
+            const std::vector<w_char>& su2, int opt);
   int ngram(int n, const std::string& s1, const std::string& s2, int opt);
   int mystrlen(const char* word);
+  int leftcommonsubstring(const std::vector<w_char>& su1,
+                          const std::vector<w_char>& su2);
   int leftcommonsubstring(const char* s1, const char* s2);
   int commoncharacterpositions(const char* s1, const char* s2, int* is_swap);
   void bubblesort(char** rwd, char** rwd2, int* rsc, int n);
   void lcs(const char* s, const char* s2, int* l1, int* l2, char** result);
   int lcslen(const char* s, const char* s2);
   int lcslen(const std::string& s, const std::string& s2);
-  char* suggest_hentry_gen(hentry* rv, const char* pattern);
+  std::string suggest_hentry_gen(hentry* rv, const char* pattern);
 };

 #endif
diff --git a/libs/hunspell/src/utf_info.cxx b/libs/hunspell/src/utf_info.c++
index 74742b8e43..6bb847f2a6 100644
--- a/libs/hunspell/src/utf_info.cxx
+++ b/libs/hunspell/src/utf_info.c++
@@ -1,6 +1,8 @@
 /* ***** BEGIN LICENSE BLOCK *****
  * Version: MPL 1.1/GPL 2.0/LGPL 2.1
  *
+ * Copyright (C) 2002-2017 Németh László
+ *
  * The contents of this file are subject to the Mozilla Public License Version
  * 1.1 (the "License"); you may not use this file except in compliance with
  * the License. You may obtain a copy of the License at
@@ -11,12 +13,7 @@
  * for the specific language governing rights and limitations under the
  * License.
  *
- * The Original Code is Hunspell, based on MySpell.
- *
- * The Initial Developers of the Original Code are
- * Kevin Hendricks (MySpell) and Németh László (Hunspell).
- * Portions created by the Initial Developers are Copyright (C) 2002-2005
- * the Initial Developers. All Rights Reserved.
+ * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
  *
  * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
  * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
@@ -39,7 +36,6 @@
  * ***** END LICENSE BLOCK ***** */

 #include "csutil.hxx"
-
 /* fields: Unicode letter, toupper, tolower */
 static struct unicode_info utf_lst[] = {
     {0x0041, 0x0041, 0x0061}, {0x0042, 0x0042, 0x0062},
@@ -9878,4 +9874,3 @@ static struct unicode_info utf_lst[] = {
     {0xFFD5, 0xFFD5, 0xFFD5}, {0xFFD6, 0xFFD6, 0xFFD6},
     {0xFFD7, 0xFFD7, 0xFFD7}, {0xFFDA, 0xFFDA, 0xFFDA},
     {0xFFDB, 0xFFDB, 0xFFDB}, {0xFFDC, 0xFFDC, 0xFFDC}};
-
\ No newline at end of file
diff --git a/libs/hunspell/src/w_char.hxx b/libs/hunspell/src/w_char.hxx
index 336c454f79..5accb7568f 100644
--- a/libs/hunspell/src/w_char.hxx
+++ b/libs/hunspell/src/w_char.hxx
@@ -1,6 +1,8 @@
 /* ***** BEGIN LICENSE BLOCK *****
  * Version: MPL 1.1/GPL 2.0/LGPL 2.1
  *
+ * Copyright (C) 2002-2017 Németh László
+ *
  * The contents of this file are subject to the Mozilla Public License Version
  * 1.1 (the "License"); you may not use this file except in compliance with
  * the License. You may obtain a copy of the License at
@@ -11,12 +13,7 @@
  * for the specific language governing rights and limitations under the
  * License.
  *
- * The Original Code is Hunspell, based on MySpell.
- *
- * The Initial Developers of the Original Code are
- * Kevin Hendricks (MySpell) and Németh László (Hunspell).
- * Portions created by the Initial Developers are Copyright (C) 2002-2005
- * the Initial Developers. All Rights Reserved.
+ * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
  *
  * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
  * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
@@ -38,8 +35,10 @@
  *
  * ***** END LICENSE BLOCK ***** */

-#ifndef __WCHARHXX__
-#define __WCHARHXX__
+#ifndef W_CHAR_HXX_
+#define W_CHAR_HXX_
+
+#include <string>

 #ifndef GCC
 struct w_char {
@@ -66,10 +65,8 @@ struct __attribute__((packed)) w_char {

 // two character arrays
 struct replentry {
-  char* pattern;
-  char* pattern2;
-  bool start;
-  bool end;
+  std::string pattern;
+  std::string outstrings[4];  // med, ini, fin, isol
 };

 #endif
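
For orientation, the scoring logic of the non-UTF16 `ngram` overload added in the suggestmgr.cxx hunk can be sketched as a free function. This is a minimal sketch, not Hunspell's API: the name `ngram_score` is hypothetical, the `test` counter is dropped because the code shown only increments it, and the values of `NGRAM_LONGER_WORSE` and `NGRAM_ANY_MISMATCH` are assumed, since only `NGRAM_LOWERING` and `NGRAM_WEIGHTED` are visible in the header diff.

```cpp
#include <cstdlib>
#include <string>

// NGRAM_WEIGHTED matches the header diff; the other two values are
// assumptions made for this illustration.
enum {
  NGRAM_LONGER_WORSE = 1 << 0,  // assumed value
  NGRAM_ANY_MISMATCH = 1 << 1,  // assumed value
  NGRAM_WEIGHTED = 1 << 3
};

// Hypothetical stand-alone version of the byte-string n-gram score:
// count substrings of s1 (lengths 1..n) that occur anywhere in s2.
int ngram_score(int n, const std::string& s1, const std::string& s2, int opt) {
  const int l2 = static_cast<int>(s2.size());
  if (l2 == 0)
    return 0;
  const int l1 = static_cast<int>(s1.size());

  int nscore = 0;
  for (int j = 1; j <= n; j++) {
    int ns = 0;
    for (int i = 0; i <= l1 - j; i++) {
      // s2 is the haystack, s1[i..i+j) is the needle
      if (s2.find(s1.c_str() + i, 0, j) != std::string::npos) {
        ns++;
      } else if (opt & NGRAM_WEIGHTED) {
        ns--;
        if (i == 0 || i == l1 - j)
          ns--;  // a miss at either edge of s1 costs double
      }
    }
    nscore += ns;
    // unweighted mode stops early once a length yields fewer than two hits
    if (ns < 2 && !(opt & NGRAM_WEIGHTED))
      break;
  }

  // optional penalty for a length difference larger than two
  int penalty = 0;
  if (opt & NGRAM_LONGER_WORSE)
    penalty = (l2 - l1) - 2;
  if (opt & NGRAM_ANY_MISMATCH)
    penalty = std::abs(l2 - l1) - 2;
  return nscore - (penalty > 0 ? penalty : 0);
}
```

With these assumptions, `ngram_score(2, "abc", "abd", 0)` evaluates to 3 (two single-character hits plus the bigram `ab`), while the same call with `NGRAM_WEIGHTED` evaluates to -1 because both misses fall on the right edge of `s1`.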