π Fuzzy String Matching in PostgreSQL with pg_trgm
When working with user data β names, addresses, search queries β exact string matching isnβt always enough. People make typos, use alternate spellings, or abbreviations.
PostgreSQL provides a powerful extension called pg_trgm (trigram search) that allows you to perform fuzzy text matching and find βclose enoughβ results.
π§© What is a Trigram?
A trigram is simply a group of three consecutive characters from a string.
For example, the word:
Talemul Would be split into trigrams (with start/end markers):
" T", "Tal", "ale", "lem", "emu", "mul", "ul " π Why the spaces?
- PostgreSQL pads the string at the beginning and end with spaces to catch edge cases.
Now, when comparing two strings (Talemul vs Talimul), PostgreSQL compares how many trigrams they share:
-
Talemul:[" T","Tal","ale","lem","emu","mul","ul "] -
Talimul:[" T","Tal","ali","lim","imu","mul","ul "]
They share trigrams like [" T","Tal","mul","ul "].
More shared trigrams = higher similarity score.
This makes trigram search excellent for catching typos or small spelling differences.
β Setup
Enable the extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm; Check installed extensions:
\dx π Example: Finding Similar Names
Letβs say we have a table of names:
CREATE TABLE people ( id SERIAL PRIMARY KEY, name TEXT ); INSERT INTO people (name) VALUES ('Talemul'), ('Talimul'), ('Talimul Islam'), ('Tamim'), ('Talim'); Now, search for names similar to "Talemul":
SELECT name, similarity(name, 'Talemul') AS score FROM people WHERE name % 'Talemul' -- % means "is similar to" ORDER BY score DESC; π₯ Actual Output:
name | score ----------------+------------- Talemul | 1.00000000 Talimul | 0.45454547 Talimul Islam | 0.29411766 Talim | 0.27272728 π Notice how "Talimul Islam" and "Talim" get lower scores, and may even be excluded depending on the similarity threshold.
β‘ Boost Performance with Indexes
For large datasets, use a GIN index to speed up similarity searches:
CREATE INDEX idx_people_name_trgm ON people USING gin (name gin_trgm_ops); Now your searches with % and similarity() will scale much better.
π― Real-World Use Cases
- User search: Find users by name despite typos (
JonvsJohn). - Autocomplete: Suggest terms even when slightly misspelled.
- Deduplication: Detect near-duplicate entries.
- Search engines: Power βDid you mean?β features.
π Wrap Up
Trigrams are a simple but powerful idea: break words into three-character chunks, compare overlaps, and compute similarity.
PostgreSQLβs pg_trgm extension brings this technique directly into SQL, letting you build typo-tolerant search features without needing an external search engine.
π References:
Top comments (0)