Configurable word characters
As of v0.0.22, you can configure the characters that check-spelling handles.
Previously, check-spelling only looked at characters matching /[A-Za-z']/,
generally requiring a minimum run length of 3.
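To see why the old behavior falls short for other languages, here is a minimal Python sketch of the legacy word shape (ASCII letters and apostrophes in runs of at least three). The function name `legacy_words` is hypothetical and this only approximates what check-spelling did internally.

```python
import re

# Legacy word shape described above: ASCII letters and apostrophes,
# in runs of at least three characters.
LEGACY_WORD = re.compile(r"[A-Za-z']{3,}")

def legacy_words(text):
    """Return the runs that the legacy character set would treat as words."""
    return LEGACY_WORD.findall(text)

# Accented letters break words apart, and the fragments are often
# too short to be checked at all.
print(legacy_words("don't panic, señor; el corazón"))
```

With this input, "señor" disappears entirely (its ASCII fragments are shorter than three characters) and "corazón" is truncated to "coraz", which is what configurable word characters are meant to address.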
Caveats
Certain escaped characters are decoded first (e.g. escaped forms of the apostrophe, ').
Other, similarly HTML-encoded entities aren't currently supported.
Spanish
To support Spanish, the character set needs to be extended to allow some accented characters and ñ.
extra_dictionaries:
  cspell:es_ES/src/hunspell/index.dic
ignore-pattern: "[^'a-záéíóúñçüA-ZÁÉÍÓÚÑÇÜ]"
upper-pattern: '[A-ZÁÉÍÓÚÑÇÜ]'
lower-pattern: '[a-záéíóúñçü]'
not-lower-pattern: '[^a-záéíóúñçü]'
not-upper-or-lower-pattern: '[^A-ZÁÉÍÓÚÑÇÜa-záéíóúñçü]'
punctuation-pattern: "'"
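The effect of these patterns can be approximated in Python. This is a sketch only: it assumes the ignore pattern is used to blank out non-word characters and that the upper and lower patterns drive splitting at camelCase boundaries. The function name `spanish_words` is hypothetical.

```python
import re

# Character classes copied from the Spanish configuration above.
UPPER = "[A-ZÁÉÍÓÚÑÇÜ]"
LOWER = "[a-záéíóúñçü]"
IGNORE = re.compile("[^'a-záéíóúñçüA-ZÁÉÍÓÚÑÇÜ]")

def spanish_words(text):
    """Split text into candidate words using the Spanish character set."""
    # Blank out everything that is not a configured word character.
    text = IGNORE.sub(" ", text)
    # Assumption: the case patterns are used to break camelCase
    # at a lower-to-upper boundary.
    text = re.sub(f"({LOWER})({UPPER})", r"\1 \2", text)
    return text.split()

print(spanish_words("El niñoPequeño come piña_colada"))
```

Because ñ and the accented vowels are part of both the lower and upper classes, words such as "niño" and "Pequeño" stay whole instead of being cut at the non-ASCII letter.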
Unicode
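Instead of listing letters explicitly, the patterns can use Unicode General Category properties (see Perl Unicode: General Category):
- Ll (lowercase letter), Lm (modifier letter), Lt (titlecase letter), Lu (uppercase letter)
As a single character class:
[\p{Ll}\p{Lm}\p{Lt}\p{Lu}]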
The general configuration is:
ignore-pattern: '[^\p{Ll}\p{Lm}\p{Lt}\p{Lu}]'
upper-pattern: '[\p{Lu}\p{Lt}\p{Lm}]'
lower-pattern: '[\p{Ll}\p{Lm}]'
not-lower-pattern: '[^\p{Ll}\p{Lm}]'
not-upper-or-lower-pattern: '[^\p{Lu}\p{Lt}\p{Lm}\p{Ll}]'
punctuation-pattern: "'"
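Python's standard `re` module does not understand `\p{...}`, so the sketch below emulates the ignore pattern with `unicodedata.category` instead. It is only an illustration of which characters the four categories admit; the names `is_word_char` and `unicode_words` are hypothetical.

```python
import unicodedata

# The four General Categories used by the configuration above.
LETTER_CATEGORIES = {"Ll", "Lm", "Lt", "Lu"}

def is_word_char(ch):
    """True if ch falls in one of the configured letter categories."""
    return unicodedata.category(ch) in LETTER_CATEGORIES

def unicode_words(text):
    """Blank every non-letter character, then split on whitespace."""
    blanked = "".join(ch if is_word_char(ch) else " " for ch in text)
    return blanked.split()

print(unicode_words("señor Привет, x2"))
```

Digits and punctuation fall outside these categories, so "x2" contributes only "x" and the comma after "Привет" is dropped.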
With some selection from available dictionaries:
extra_dictionaries:
  cspell:ar/src/ayaspell/ar.dic
  cspell:bg_BG/bg_BG.dic
  cspell:ca/ca.dic
  cspell:cs_CZ/Czech.dic
  cspell:da_DK/da_DK.dic
  cspell:de_CH/src/hunspell/index.dic
  cspell:de_DE/src/German_de_DE.dic
  cspell:de_DE/src/hunspell/index.dic
  cspell:el/src/hunspell/el-GR.dic
  cspell:en_GB/src/aoo-mozilla-en-dict/en-GB.dic
  cspell:en_GB/src/hunspell/en_GB.dic
  cspell:en_US/src/aoo-mozilla-en-dict/en_US.dic
  cspell:en_US/src/hunspell/en_US.dic
  cspell:eo/eo.dic
  cspell:es_ES/src/hunspell/index.dic
  cspell:et-EE/src/index.dic
  cspell:fa_IR/hunspell/fa-IR.dic
  cspell:fr_FR/src/hunspell-french-dictionaries-v7.0/fr-classique.dic
  cspell:fr_FR/src/hunspell-french-dictionaries-v7.0/fr-reforme1990.dic
  cspell:fr_FR/src/hunspell-french-dictionaries-v7.0/fr-toutesvariantes.dic
  cspell:fr_FR_90/src/hunspell-french-dictionaries-v7.0/fr-classique.dic
  cspell:fr_FR_90/src/hunspell-french-dictionaries-v7.0/fr-reforme1990.dic
  cspell:fr_FR_90/src/hunspell-french-dictionaries-v7.0/fr-toutesvariantes.dic
  cspell:he/hunspell/he.dic
  cspell:hr_HR/src/hr_HR.dic
  cspell:it_IT/it_IT.dic
  cspell:lt_LT/lt_LT.dic
  cspell:nb_NO/src/nb.dic
  cspell:nl_NL/src/hunspell/index.dic
  cspell:pl_PL/pl_pl.dic
  cspell:pt_BR/src/hunspell/index.dic
  cspell:pt_PT/Portuguese-European.dic
  cspell:ru_RU/src/Russian.dic
  cspell:ru_RU/src/hunspell/index.dic
  cspell:ru_RU/src/ru_ru.dic
  cspell:ru_RU/src/russian-aot.dic
  cspell:sl_SI/src/sl_SI.dic
  cspell:sv/src/hunspell/index.dic
  cspell:sv/src/ooo-swedish-dict-2-42/dictionaries/sv_FI.dic
  cspell:sv/src/ooo-swedish-dict-2-42/dictionaries/sv_SE.dic
  cspell:sv/src/open-office-2008/Swedish.dic
  cspell:tr_TR/Turkish.dic
  cspell:uk_UA/uk_ua.dic
  cspell:vi_VN/vi.dic
Dictionaries
In order for this to work reasonably well, support for hunspell .dic and .aff files has been added (in v0.0.22).
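For orientation, a hunspell .dic file starts with an approximate entry count, followed by one entry per line, optionally suffixed with `/FLAGS` that refer to rules in the matching .aff file. The sketch below only extracts the stems and does not expand affixes; it is not how check-spelling itself reads these files, and `dic_stems` is a hypothetical helper.

```python
def dic_stems(dic_text):
    """Return the stems listed in the text of a hunspell .dic file."""
    stems = []
    # The first line holds the approximate entry count, so skip it.
    for line in dic_text.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Simplification: drop any fields after whitespace, then drop
        # the affix flags after "/". Escaped slashes are not handled.
        entry = line.split()[0]
        stems.append(entry.split("/", 1)[0])
    return stems

sample = "3\ncasa/S\nñu\ncorazón/S po:noun\n"
print(dic_stems(sample))
```

The flags (here `S`) only become meaningful together with the .aff file, which defines the prefix and suffix rules they select.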
Related
Right now, characters that fall outside the recognized set are effectively blanked (replaced with a non-word character, currently =). I might switch to only parsing characters that match the regex. That'd save me a pass.
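The two strategies can be compared with a small Python sketch. It uses the default ASCII character set and replaces whole runs at once, both of which are simplifications; the function names are hypothetical.

```python
import re

# Default word characters; any configured set could be substituted here.
WORD_CHARS = "A-Za-z'"

def words_by_blanking(text):
    """Blank runs outside the set with "=", then split on it."""
    blanked = re.sub(f"[^{WORD_CHARS}]+", "=", text)
    return [word for word in blanked.split("=") if word]

def words_by_matching(text):
    """Collect runs of word characters directly, in a single pass."""
    return re.findall(f"[{WORD_CHARS}]+", text)

sample = "foo_bar = baz42 it's"
print(words_by_blanking(sample))
print(words_by_matching(sample))
```

Both functions yield the same words for this input; the matching variant simply skips the intermediate blanked string.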