
I am trying to run full text search operations such as to_tsvector and to_tsquery, and I need dictionaries for more than 80 languages.

Postgres seems to come with only 16 text search configurations (15 languages plus simple). I am also testing two additional configurations for Chinese (jiebacfg, and testzhcfg from zhparser). I'm looking for documentation or a repository of dictionaries for other languages.

mydatabase=# \dF
               List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
 public     | jiebacfg   | configuration for jieba
 public     | testzhcfg  |
(18 rows)
  • @a_horse_with_no_name Typo, it's 9.6.1. Commented Jan 18, 2017 at 9:32
  • You may want to look at some of OpenOffice's Ispell (MySpell/Hunspell) dictionaries (the PostgreSQL docs have some directions on how to import them, but I have never done it myself). Commented Jan 18, 2017 at 13:52
  • @pozs yikes, installing even one additional language dictionary (lasr.cs.ucla.edu/geoff/ispell-dictionaries.html) is arduous Commented Jan 18, 2017 at 18:04

1 Answer


As pozs commented, you can get dictionary files from OpenOffice (or LibreOffice) extensions. From the PostgreSQL documentation:

To create an Ispell dictionary perform these steps:

  • Download the dictionary configuration files. OpenOffice extension files have the .oxt extension. Extract the .aff and .dic files and change their extensions to .affix and .dict. Some dictionary files also need their characters converted to UTF-8, for example with these commands (for a Norwegian language dictionary):

iconv -f ISO_8859-1 -t UTF-8 -o nn_no.affix nn_NO.aff
iconv -f ISO_8859-1 -t UTF-8 -o nn_no.dict nn_NO.dic

  • Copy the files to the $SHAREDIR/tsearch_data directory (pg_config --sharedir prints the value of $SHAREDIR).

  • Load the files into PostgreSQL with the following command:

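-- DictFile, AffFile and Stopwords name files in $SHAREDIR/tsearch_data:
-- here en_us.dict, en_us.affix and english.stop
CREATE TEXT SEARCH DICTIONARY english_hunspell (
    TEMPLATE = ispell,
    DictFile = en_us,
    AffFile = en_us,
    Stopwords = english);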
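With 80+ languages, the extract/rename/convert steps are worth scripting. The following is a minimal sketch, not an official tool: the script name convert_hunspell.sh and the function convert_hunspell_dir are hypothetical. It assumes the .aff/.dic files were already pulled out of each .oxt (an .oxt is a ZIP archive, so something like unzip -j nn_NO.oxt '*.aff' '*.dic' -d src/ should work), that all files in one directory share a single source encoding (ISO_8859-1 unless you pass another), and it lower-cases names the way the Norwegian example does (nn_NO.aff becomes nn_no.affix).

```shell
#!/bin/sh
# convert_hunspell.sh (hypothetical name): convert every Hunspell .aff/.dic
# pair in SRC_DIR to the UTF-8 .affix/.dict files that PostgreSQL's ispell
# template reads. Assumes the files were already extracted from the .oxt
# archive and that they all use the same source encoding.
set -eu

convert_hunspell_dir() {
    src_dir=$1                   # directory holding the extracted *.aff / *.dic
    out_dir=$2                   # target, e.g. "$(pg_config --sharedir)/tsearch_data"
    from_enc=${3:-ISO_8859-1}    # assumed default; check the SET line in the .aff

    for f in "$src_dir"/*.aff "$src_dir"/*.dic; do
        [ -e "$f" ] || continue  # glob matched nothing, skip the literal pattern
        base=$(basename "$f")
        # nn_NO.aff -> nn_no, matching the lower-case names in the docs example
        stem=$(printf '%s' "${base%.*}" | tr '[:upper:]' '[:lower:]')
        case $base in
            *.aff) ext=affix ;;
            *.dic) ext=dict ;;
        esac
        # Same conversion as the iconv commands in the answer, but writing
        # through a shell redirect rather than the -o option.
        iconv -f "$from_enc" -t UTF-8 "$f" > "$out_dir/$stem.$ext"
        echo "wrote $out_dir/$stem.$ext"
    done
}

# When run as a script: convert_hunspell.sh SRC_DIR OUT_DIR [FROM_ENCODING]
if [ "$#" -ge 2 ]; then
    convert_hunspell_dir "$@"
fi
```

For example: sh convert_hunspell.sh src/ "$(pg_config --sharedir)/tsearch_data" ISO_8859-1 (writing into the share directory usually needs root). Some dictionaries are already UTF-8; the SET line in the .aff file declares the encoding, so check it before choosing the source encoding. Note that loading the dictionary is only part of the job: to_tsvector and to_tsquery work through a text search configuration, so the new dictionary still has to be mapped into one (CREATE TEXT SEARCH CONFIGURATION, then ALTER TEXT SEARCH CONFIGURATION ... ALTER MAPPING).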

There is also a list of extensions that make installing dictionaries easier. You can download them from GitHub.
