2

I'm really at my wits end, with this Problem, and I really hope someone could help me. I am using a Postgresql 9.3. My Database contains mostly german texts but not only, so it's encoded in utf-8. I want to establish a fulltextsearch wich supports german language, nothing special so far. But the search is behaving really strange,, and I can't find out what I am doing wrong.

So, given the following table given as example

select * from test; a ------------- ein Baum viele Bäume Überleben Tisch Tische Café \d test Tabelle »public.test« Spalte | Typ | Attribute --------+------+----------- a | text | sintext=# \d Liste der Relationen Schema | Name | Typ | Eigentümer --------+---------------------+---------+------------ (...) public | test | Tabelle | paf 

Now, lets have a look at some textsearch examples:

select * from test where to_tsvector('german', a) @@ plainto_tsquery('Baum'); a ------------- ein Baum viele Bäume select * from test where to_tsvector('german', a) @@ plainto_tsquery('Bäume'); --> No Hits select * from test where to_tsvector('german', a) @@ plainto_tsquery('Überleben'); --> No Hits select * from test where to_tsvector('german', a) @@ plainto_tsquery('Tisch'); a -------- Tisch Tische 

Whereas Tische is Plural of Tisch (table) and Bäume is plural of Baum (tree). So, Obviously Umlauts does not work while textsearch perfoms well.

But what really confuses me is, that a) non-german special characters are matching

select * from test where to_tsvector('german', a) @@ plainto_tsquery('Café'); a ------ Café 

and b) if I don't use the german dictionary, there is no Problem with umlauts (but of course no real textsearch as well)

select * from test where to_tsvector(a) @@ plainto_tsquery('Bäume'); a ------------- viele Bäume 

So, if I use the german dictionary for Text-Search, just the german special characters do not work? Seriously? What the hell is wrong here? I Really can't figure it out, please help!

1

1 Answer 1

7

You're explicitly using the German dictionary for the to_tsvector calls, but not for the to_tsquery or plainto_tsquery calls. Presumably your default dictionary isn't set to german; check with SHOW default_text_search_config.

Compare:

regress=> select plainto_tsquery('simple', 'Bäume'), plainto_tsquery('english','Bäume'), plainto_tsquery('german', 'Bäume'); plainto_tsquery | plainto_tsquery | plainto_tsquery -----------------+-----------------+----------------- 'bäume' | 'bäume' | 'baum' (1 row) 

The language setting affects word simplification and root extraction, so a vector from one language won't necessarily match a query from another:

regress=> SELECT to_tsvector('german', 'viele Bäume'), plainto_tsquery('Bäume'), to_tsvector('german', 'viele Bäume') @@ plainto_tsquery('Bäume'); to_tsvector | plainto_tsquery | ?column? -------------------+-----------------+---------- 'baum':2 'viel':1 | 'bäume' | f (1 row) 

If you use a consistent language setting, all is well:

regress=> SELECT to_tsvector('german', 'viele Bäume'), plainto_tsquery('german', 'Bäume'), to_tsvector('german', 'viele Bäume') @@ plainto_tsquery('german', 'Bäume'); to_tsvector | plainto_tsquery | ?column? -------------------+-----------------+---------- 'baum':2 'viel':1 | 'baum' | t (1 row) 
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you so much! I don't know how I could oversee that... Now it works perfectly, the day is saved :)

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.