Similar to what @willglynn already posted, I would consider the pg_trgm module. But preferably with a GiST index:
CREATE INDEX tbl_location_name_trgm_idx ON tbl USING gist(location_name COLLATE "C" gist_trgm_ops);
The gist_trgm_ops operator class generally ignores case, so ILIKE is just as fast as LIKE. Quoting the source code:
Caution: IGNORECASE macro means that trigrams are case-insensitive.
I use COLLATE "C" here, which is effectively no special collation (byte order instead), because you obviously have a mix of various collations in your column. Collation is relevant for ordering or ranges; for a basic similarity search you can do without it. I would consider setting COLLATE "C" for your column to begin with.
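A minimal setup sketch for the points above, assuming the table is called tbl and location_name is of type text (neither is confirmed by the question). As far as I know, the planner only considers an index whose collation matches that of the query expression, which is one more reason to put COLLATE "C" on the column itself:

```sql
-- pg_trgm ships with PostgreSQL as an additional module; install it once per database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigrams are extracted case-insensitively on a standard build,
-- so both calls should produce the same trigram set.
SELECT show_trgm('Cafe') = show_trgm('cafe') AS same_trigrams;

-- Assumption: the column is plain text. Changing the collation
-- may require dependent indexes to be rebuilt.
ALTER TABLE tbl
  ALTER COLUMN location_name TYPE text COLLATE "C";
```

Run the ALTER TABLE before creating the index, so the index does not have to be rebuilt right away.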
This index would lend support to your first, simple form of the query:
SELECT * FROM tbl WHERE location_name ILIKE '%cafe%';
- Very fast.
- Retains capability to find partial matches.
- Adds capability for fuzzy search.
  Check out the % operator and set_limit().
- A GiST index is also very fast for queries with LIMIT n to select the n "best" matches. You could add to the above query:
ORDER BY location_name <-> 'cafe' LIMIT 20
Read more about the "distance" operator <-> in the pg_trgm chapter of the manual.
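To make the fuzzy-search part concrete, here is a small sketch combining the % operator, set_limit() and the distance operator. It assumes pg_trgm is installed and reuses the tbl / location_name names from above; the threshold 0.4 is an arbitrary value for illustration, not a recommendation:

```sql
SELECT show_limit();     -- current similarity threshold used by % (0.3 by default)
SELECT set_limit(0.4);   -- stricter threshold for the current session (example value)

SELECT location_name
     , similarity(location_name, 'cafe') AS sim   -- 0 = nothing in common, 1 = identical trigram sets
FROM   tbl
WHERE  location_name % 'cafe'                     -- similarity at or above the threshold
ORDER  BY location_name <-> 'cafe'                -- distance = 1 - similarity, nearest first
LIMIT  20;
```

A lower threshold returns more, looser matches; a higher one returns fewer and closer ones.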
Or even:
SELECT *
FROM   tbl
WHERE  location_name ILIKE '%cafe%'         -- exact partial match
OR     location_name % 'cafe'               -- fuzzy match
ORDER  BY
       (location_name ILIKE 'cafe%') DESC   -- exact beginning first
      ,(location_name ILIKE '%cafe%') DESC  -- exact partial match next
      ,(location_name <-> 'cafe')           -- then "best" matches
      ,location_name                        -- break remaining ties (collation!)
LIMIT  20;
I use something like that in several applications for (to me) satisfactory results. Of course, it gets a bit slower with multiple features applied in combination. Find your sweet spot ...
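One way to look for that sweet spot is to compare query plans and timings while adding features one at a time. A sketch, again assuming the tbl / location_name names used above; the actual plans and numbers depend on your data and PostgreSQL version:

```sql
-- Baseline: plain partial match, nearest matches first.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   tbl
WHERE  location_name ILIKE '%cafe%'
ORDER  BY location_name <-> 'cafe'
LIMIT  20;

-- Same query with the fuzzy condition added; compare execution time
-- and check whether the trigram index still appears in the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   tbl
WHERE  location_name ILIKE '%cafe%'
OR     location_name % 'cafe'
ORDER  BY location_name <-> 'cafe'
LIMIT  20;
```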
You could go one step further and create a separate partial index for every language and use a matching collation for each:
CREATE INDEX location_name_trgm_idx ON tbl USING gist(location_name COLLATE "de_DE" gist_trgm_ops)
WHERE location_name_language = 'German';  -- repeat for each language, with a distinct index name
That is only useful if each query asks for results in one specific language, but in that case it would be very fast.
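A query meant to hit such a partial index would have to repeat the index predicate and, as far as I can tell, use the same collation as the index. A sketch under those assumptions (the "de_DE" collation must exist on your system, and its exact name depends on the operating system):

```sql
SELECT *
FROM   tbl
WHERE  location_name_language = 'German'               -- must match the partial index predicate
AND    location_name COLLATE "de_DE" ILIKE '%cafe%'    -- same collation as in the index definition
ORDER  BY location_name COLLATE "de_DE"                -- German sort order for the result
LIMIT  20;
```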