DEV Community

Talemul Islam
Talemul Islam

Posted on

Fuzzy String Matching in PostgreSQL with pg_trgm (Trigram Search Tutorial)

πŸ” Fuzzy String Matching in PostgreSQL with pg_trgm

When working with user data β€” names, addresses, search queries β€” exact string matching isn’t always enough. People make typos, use alternate spellings, or abbreviations.

PostgreSQL provides a powerful extension called pg_trgm (trigram search) that allows you to perform fuzzy text matching and find β€œclose enough” results.


🧩 What is a Trigram?

A trigram is simply a group of three consecutive characters from a string.

For example, the word:

Talemul 
Enter fullscreen mode Exit fullscreen mode

Would be split into trigrams (with start/end markers):

" T", "Tal", "ale", "lem", "emu", "mul", "ul " 
Enter fullscreen mode Exit fullscreen mode

πŸ‘‰ Why the spaces?

  • PostgreSQL pads the string at the beginning and end with spaces to catch edge cases.

Now, when comparing two strings (Talemul vs Talimul), PostgreSQL compares how many trigrams they share:

  • Talemul: [" T","Tal","ale","lem","emu","mul","ul "]
  • Talimul: [" T","Tal","ali","lim","imu","mul","ul "]

They share trigrams like [" T","Tal","mul","ul "].

More shared trigrams = higher similarity score.

This makes trigram search excellent for catching typos or small spelling differences.


βœ… Setup

Enable the extension:

CREATE EXTENSION IF NOT EXISTS pg_trgm; 
Enter fullscreen mode Exit fullscreen mode

Check installed extensions:

\dx 
Enter fullscreen mode Exit fullscreen mode

πŸ“Š Example: Finding Similar Names

Let’s say we have a table of names:

CREATE TABLE people ( id SERIAL PRIMARY KEY, name TEXT ); INSERT INTO people (name) VALUES ('Talemul'), ('Talimul'), ('Talimul Islam'), ('Tamim'), ('Talim'); 
Enter fullscreen mode Exit fullscreen mode

Now, search for names similar to "Talemul":

SELECT name, similarity(name, 'Talemul') AS score FROM people WHERE name % 'Talemul' -- % means "is similar to" ORDER BY score DESC; 
Enter fullscreen mode Exit fullscreen mode

πŸ–₯ Actual Output:

 name | score ----------------+------------- Talemul | 1.00000000 Talimul | 0.45454547 Talimul Islam | 0.29411766 Talim | 0.27272728 
Enter fullscreen mode Exit fullscreen mode

πŸ‘‰ Notice how "Talimul Islam" and "Talim" get lower scores, and may even be excluded depending on the similarity threshold.


⚑ Boost Performance with Indexes

For large datasets, use a GIN index to speed up similarity searches:

CREATE INDEX idx_people_name_trgm ON people USING gin (name gin_trgm_ops); 
Enter fullscreen mode Exit fullscreen mode

Now your searches with % and similarity() will scale much better.


🎯 Real-World Use Cases

  • User search: Find users by name despite typos (Jon vs John).
  • Autocomplete: Suggest terms even when slightly misspelled.
  • Deduplication: Detect near-duplicate entries.
  • Search engines: Power β€œDid you mean?” features.

πŸ“ Wrap Up

Trigrams are a simple but powerful idea: break words into three-character chunks, compare overlaps, and compute similarity.

PostgreSQL’s pg_trgm extension brings this technique directly into SQL, letting you build typo-tolerant search features without needing an external search engine.

πŸ”— References:

Top comments (0)