PostgreSQL accent + case insensitive search

Question

I'm looking for a way to support case-insensitive and accent-insensitive search with good performance. Until now we had no issue with this on MS SQL Server; on Oracle we had to use Oracle Text, and now we need it on PostgreSQL.

I've found this post about it, but we need to combine it with case insensitivity. We also need the search to use indexes, otherwise performance could suffer. Does anyone have real-world experience with the best approach for large databases?

Did you check the full text search functions in PostgreSQL? postgresql.org/docs/current/interactive/textsearch.html
– Frank Heikens, Commented Feb 20, 2015 at 12:07
The upcoming PostgreSQL 10 release will add support for this via ICU Collation Support. rhaas.blogspot.com/2017/04/…
– Ben Claar, Commented Aug 16, 2017 at 15:19

Erwin Brandstetter · Accepted Answer · 2022-05-16 22:51:06Z

If you need to "combine with case insensitive", there are a number of options, depending on your exact requirements.

Perhaps the simplest option: make the expression index case-insensitive.

Building on the function f_unaccent() laid out in the referenced answer:

Does PostgreSQL support "accent insensitive" collations?

CREATE INDEX users_lower_unaccent_name_idx ON users(lower(f_unaccent(name)));

Then:

SELECT * FROM users WHERE lower(f_unaccent(name)) = lower(f_unaccent('João'));

Or you could build the lower() into the function f_unaccent(), to derive something like f_lower_unaccent().
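
A minimal sketch of such a combined function follows. The name f_lower_unaccent() is only the one suggested above, and the body assumes the unaccent extension is installed in schema public; the referenced answer may define f_unaccent() differently. Declaring the function IMMUTABLE is an assertion on your part (unaccent() itself is only STABLE), which is why the dictionary is schema-qualified:

```sql
-- Assumption: the unaccent extension lives in schema "public".
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Hypothetical wrapper: strips accents, then folds to lower case.
-- IMMUTABLE is needed so the function can be used in an index expression.
CREATE OR REPLACE FUNCTION f_lower_unaccent(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT lower(public.unaccent('public.unaccent', $1))
$func$;

-- Index and query must use the exact same expression.
CREATE INDEX users_f_lower_unaccent_name_idx ON users (f_lower_unaccent(name));

SELECT * FROM users WHERE f_lower_unaccent(name) = f_lower_unaccent('João');
```

If the underlying unaccent dictionary is changed later, an index built on this function would need to be rebuilt.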

Or (especially if you need to do fuzzy pattern matching anyway) you can use a trigram index, provided by the additional module pg_trgm, building on the above function. A trigram index also supports ILIKE. Details:

LOWER LIKE vs iLIKE

I added a note to the referenced answer.
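
As a rough illustration of the trigram option, assuming the f_unaccent() function from the referenced answer already exists and is IMMUTABLE, and using a hypothetical index name:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on the unaccented name (index name is illustrative).
CREATE INDEX users_unaccent_name_trgm_idx
  ON users USING gin (f_unaccent(name) gin_trgm_ops);

-- ILIKE handles case; f_unaccent() handles accents on both sides.
-- A non-anchored pattern like this can still use the trigram index.
SELECT *
FROM   users
WHERE  f_unaccent(name) ILIKE '%' || f_unaccent('lois') || '%';
```

Trigram indexes work best with search terms of at least three characters; shorter patterns yield no usable trigrams.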

Or you could use the additional module citext (but I'd rather avoid it):

Deferrable, case-insensitive unique constraint

That's quite a good list of hints to look into. Our requirements are "simple" in theory: if a column contains, let's say, Firstname = "Aloïse", we want to be able to find the row using, for instance, "Aloise" or "aloise" or even "lois".

Evan Carroll · Accepted Answer · 2018-05-30 02:00:58Z

Full-text search configuration that is accent- and case-insensitive

FTS is case-insensitive by default. The PostgreSQL documentation describes the normalization step like this:

"Converting tokens into lexemes. A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word are made alike. For example, normalization almost always includes folding upper-case letters to lower-case, and often involves removal of suffixes (such as s or es in English)."

And you can define your own text search configuration that uses unaccent,

CREATE EXTENSION unaccent;

CREATE TEXT SEARCH CONFIGURATION mydict ( COPY = simple );
ALTER TEXT SEARCH CONFIGURATION mydict
  ALTER MAPPING FOR hword, hword_part, word
  WITH unaccent, simple;

Which you can then index with a functional index,

-- Just some sample data...
CREATE TABLE myTable ( myCol )
  AS VALUES ('fóó bar baz'),('qux quz');

-- No index required, but feel free to create one
CREATE INDEX ON myTable
  USING GIST (to_tsvector('mydict', myCol));

You can now query it very simply

SELECT *
FROM myTable
WHERE to_tsvector('mydict', myCol) @@ 'foo & bar';

    mycol
-------------
 fóó bar baz
(1 row)
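
Note that the string literal in the query above is cast to tsquery as-is, so the search terms themselves are not unaccented or lower-cased. A hedged sketch, assuming the mydict configuration and sample table defined above, that normalizes user input with the same configuration:

```sql
-- plainto_tsquery() runs the input through 'mydict', so accented or
-- upper-case search terms are normalized the same way as the column.
SELECT *
FROM   myTable
WHERE  to_tsvector('mydict', myCol) @@ plainto_tsquery('mydict', 'FÓÓ Bar');

-- Prefix matching is available through to_tsquery() and the :* suffix.
SELECT *
FROM   myTable
WHERE  to_tsvector('mydict', myCol) @@ to_tsquery('mydict', 'fo:*');
```

Full-text search matches whole words or word prefixes only. Finding "lois" inside "Aloïse" is an infix match, which needs a different tool such as a trigram index.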
