Return to Revisions

3 of 6

more instructive demo, add more

edited May 14, 2018 at 16:49

186.6k
28
465
639

Trigram similarity and distance operators put more weight on leading matches automatically and to a lesser extent on trailing matches, due to the way trigrams are extracted from strings. The manual:

Each word is considered to have two spaces prefixed and one space suffixed when determining the set of trigrams contained in the string.

There is more, including examples. Read the manual. And consider the demo in my fiddle at the bottom.

###Test setup Based on your table definition and example:

CREATE TABLE search (id int PRIMARY KEY, search_on text, comment text); INSERT INTO search (id, search_on, comment) VALUES ( 1, 'abc123456789', 'leading') , ( 2, '123abc456789', 'nested') , ( 3, '123456789abc', 'trailing') , ( 4, 'abc123abc456', 'leading, nested 1x') , ( 5, '123abc456abc', 'trailing,nested 1x') , ( 6, 'abcabcabc123', 'leading, nested 2x') , ( 7, '123abcabcabc', 'trailing nested 2x') , ( 8, '1abcabcabc23', 'nested 3x') , (10, 'abc12' , 'leading short') , (11, '12abc' , 'trailing short') , (12, '1abc2' , 'nested short'); CREATE INDEX index_search_search_on ON search USING gist (search_on gist_trgm_ops);

Not using your odd type bpchar (blank padded character type) for id - and I suggest you don't either. text or varchar should serve better:

Any downsides of using data type “text” for storing strings?

###Queries We need a low threshold for the demo:

SET pg_trgm.similarity_threshold = .01; -- show weak matches, too

Demonstrating the built-in bias in your favor:

SELECT *, search_on <-> 'abc' AS distance FROM search WHERE search_on % 'abc' ORDER BY search_on <-> 'abc';

 id | search_on | comment | distance -: | :----------- | :----------------- | :------- 10 | abc12 | leading short | 0.571429 6 | abcabcabc123 | leading, nested 2x | 0.7 11 | 12abc | trailing short | 0.75 4 | abc123abc456 | leading, nested 1x | 0.769231 1 | abc123456789 | leading | 0.785714 7 | 123abcabcabc | trailing nested 2x | 0.818182 5 | 123abc456abc | trailing,nested 1x | 0.857143 3 | 123456789abc | trailing | 0.866667 12 | 1abc2 | nested short | 0.888889 8 | 1abcabcabc23 | nested 3x | 0.916667 2 | 123abc456789 | nested | 0.9375

As you can see, leading matches have more weight. But it's still just a relative bias.

To make this absolute:

... at the beginning of my column's text, I'd want that factored into the ordering to come higher ...

SELECT *, search_on <-> 'abc' AS distance, search_on ILIKE 'abc%' AS prefix_match FROM search WHERE search_on % 'abc' ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on <-> 'abc'; -- then sort by distance

 id | search_on | comment | distance | prefix_match -: | :----------- | :----------------- | :------- | :----------- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f

I chose the expression search_on NOT ILIKE 'abc%' to still sort NULL values last. Equivalent: search_on ILIKE 'abc%' DESC NULLS LAST. Related:

PostgreSQL sort by datetime asc, null first?

db<>fiddle here

You could sort trailing matches in a similar fashion or combine both:

... ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on NOT ILIKE '%abc' -- suffix matches next , search_on <-> 'abc'; -- then sort by distance

BTW 1: Full Text Search also supports prefix matching.
BTW 2: The "C" collation COLLATE "C" allows plain btree index support for prefix matches.

answered May 14, 2018 at 2:38

Erwin Brandstetter

186.6k
28
465
639