Skip to main content
links, format, markup, fiddle
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 639

Trigram similarity and distance operatorsTrigram similarity and distance operators put more weight on leading matches (prefix) automatically and to a lesser extent on trailing matches (suffix), due to the way trigrams are extracted from strings. The manual:The manual:

CREATE TABLE search (id int PRIMARY KEY, search_on text, comment text); INSERT INTO search (id, search_on, comment) VALUES ( 1, 'abc123456789', 'leading') , ( 2, '123abc456789', 'nested') , ( 3, '123456789abc', 'trailing') , ( 4, 'abc123abc456', 'leading, nested 1x') , ( 5, '123abc456abc', 'trailing,nested 1x') , ( 6, 'abcabcabc123', 'leading, nested 2x') , ( 7, '123abcabcabc', 'trailing nested 2x') , ( 8, '1abcabcabc23', 'nested 3x') , (10, 'abc12' , 'leading short') , (11, '12abc' , 'trailing short') , (12, '1abc2' , 'nested short'); CREATE INDEX index_search_search_on ON search USING gist (search_on gist_trgm_ops); 
CREATE TABLE search (id int PRIMARY KEY, search_on text, comment text); INSERT INTO search (id, search_on, comment) VALUES ( 1, 'abc123456789', 'leading') , ( 2, '123abc456789', 'nested') , ( 3, '123456789abc', 'trailing') , ( 4, 'abc123abc456', 'leading, nested 1x') , ( 5, '123abc456abc', 'trailing,nested 1x') , ( 6, 'abcabcabc123', 'leading, nested 2x') , ( 7, '123abcabcabc', 'trailing nested 2x') , ( 8, '1abcabcabc23', 'nested 3x') , (10, 'abc12' , 'leading short') , (11, '12abc' , 'trailing short') , (12, '1abc2' , 'nested short') ; CREATE INDEX index_search_search_on ON search USING gist (search_on gist_trgm_ops); 
SELECT *, search_on <-> 'abc' AS distance FROM search WHERE search_on % 'abc' ORDER BY search_on <-> 'abc'; 
SELECT *, search_on <-> 'abc' AS distance FROM search WHERE search_on % 'abc' ORDER BY search_on <-> 'abc'; 
 id | search_on | comment | distance -: | :----------- | :----------------- | :------- 10 | abc12 | leading short | 0.571429 6 | abcabcabc123 | leading, nested 2x | 0.7 11 | 12abc | trailing short | 0.75 4 | abc123abc456 | leading, nested 1x | 0.769231 1 | abc123456789 | leading | 0.785714 7 | 123abcabcabc | trailing nested 2x | 0.818182 5 | 123abc456abc | trailing,nested 1x | 0.857143 3 | 123456789abc | trailing | 0.866667 12 | 1abc2 | nested short | 0.888889 8 | 1abcabcabc23 | nested 3x | 0.916667 2 | 123abc456789 | nested | 0.9375 
id | search_on | comment | distance -: | :----------- | :----------------- | :------- 10 | abc12 | leading short | 0.571429 6 | abcabcabc123 | leading, nested 2x | 0.7 11 | 12abc | trailing short | 0.75 4 | abc123abc456 | leading, nested 1x | 0.769231 1 | abc123456789 | leading | 0.785714 7 | 123abcabcabc | trailing nested 2x | 0.818182 5 | 123abc456abc | trailing,nested 1x | 0.857143 3 | 123456789abc | trailing | 0.866667 12 | 1abc2 | nested short | 0.888889 8 | 1abcabcabc23 | nested 3x | 0.916667 2 | 123abc456789 | nested | 0.9375 
SELECT *, search_on <-> 'abc' AS distance, search_on ILIKE 'abc%' AS prefix_match FROM search WHERE search_on % 'abc' ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on <-> 'abc'; -- then sort by distance 
SELECT *, search_on <-> 'abc' AS distance, search_on ILIKE 'abc%' AS prefix_match FROM search WHERE search_on % 'abc' ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on <-> 'abc'; -- then sort by distance 
 id | search_on | comment | distance | prefix -: | :----------- | :----------------- | :------- | :----- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f 
id | search_on | comment | distance | prefix -: | :----------- | :----------------- | :------- | :----- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f 
... ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on NOT ILIKE '%abc' -- suffix matches next , search_on <-> 'abc'; -- then sort by distance 
... ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on NOT ILIKE '%abc' -- suffix matches next , search_on <-> 'abc'; -- then sort by distance 

db<>fiddle herefiddle

Trigram similarity and distance operators put more weight on leading matches (prefix) automatically and to a lesser extent on trailing matches (suffix), due to the way trigrams are extracted from strings. The manual:

CREATE TABLE search (id int PRIMARY KEY, search_on text, comment text); INSERT INTO search (id, search_on, comment) VALUES ( 1, 'abc123456789', 'leading') , ( 2, '123abc456789', 'nested') , ( 3, '123456789abc', 'trailing') , ( 4, 'abc123abc456', 'leading, nested 1x') , ( 5, '123abc456abc', 'trailing,nested 1x') , ( 6, 'abcabcabc123', 'leading, nested 2x') , ( 7, '123abcabcabc', 'trailing nested 2x') , ( 8, '1abcabcabc23', 'nested 3x') , (10, 'abc12' , 'leading short') , (11, '12abc' , 'trailing short') , (12, '1abc2' , 'nested short'); CREATE INDEX index_search_search_on ON search USING gist (search_on gist_trgm_ops); 
SELECT *, search_on <-> 'abc' AS distance FROM search WHERE search_on % 'abc' ORDER BY search_on <-> 'abc'; 
 id | search_on | comment | distance -: | :----------- | :----------------- | :------- 10 | abc12 | leading short | 0.571429 6 | abcabcabc123 | leading, nested 2x | 0.7 11 | 12abc | trailing short | 0.75 4 | abc123abc456 | leading, nested 1x | 0.769231 1 | abc123456789 | leading | 0.785714 7 | 123abcabcabc | trailing nested 2x | 0.818182 5 | 123abc456abc | trailing,nested 1x | 0.857143 3 | 123456789abc | trailing | 0.866667 12 | 1abc2 | nested short | 0.888889 8 | 1abcabcabc23 | nested 3x | 0.916667 2 | 123abc456789 | nested | 0.9375 
SELECT *, search_on <-> 'abc' AS distance, search_on ILIKE 'abc%' AS prefix_match FROM search WHERE search_on % 'abc' ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on <-> 'abc'; -- then sort by distance 
 id | search_on | comment | distance | prefix -: | :----------- | :----------------- | :------- | :----- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f 
... ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on NOT ILIKE '%abc' -- suffix matches next , search_on <-> 'abc'; -- then sort by distance 

db<>fiddle here

Trigram similarity and distance operators put more weight on leading matches (prefix) automatically and to a lesser extent on trailing matches (suffix), due to the way trigrams are extracted from strings. The manual:

CREATE TABLE search (id int PRIMARY KEY, search_on text, comment text); INSERT INTO search (id, search_on, comment) VALUES ( 1, 'abc123456789', 'leading') , ( 2, '123abc456789', 'nested') , ( 3, '123456789abc', 'trailing') , ( 4, 'abc123abc456', 'leading, nested 1x') , ( 5, '123abc456abc', 'trailing,nested 1x') , ( 6, 'abcabcabc123', 'leading, nested 2x') , ( 7, '123abcabcabc', 'trailing nested 2x') , ( 8, '1abcabcabc23', 'nested 3x') , (10, 'abc12' , 'leading short') , (11, '12abc' , 'trailing short') , (12, '1abc2' , 'nested short') ; CREATE INDEX index_search_search_on ON search USING gist (search_on gist_trgm_ops); 
SELECT *, search_on <-> 'abc' AS distance FROM search WHERE search_on % 'abc' ORDER BY search_on <-> 'abc'; 
id | search_on | comment | distance -: | :----------- | :----------------- | :------- 10 | abc12 | leading short | 0.571429 6 | abcabcabc123 | leading, nested 2x | 0.7 11 | 12abc | trailing short | 0.75 4 | abc123abc456 | leading, nested 1x | 0.769231 1 | abc123456789 | leading | 0.785714 7 | 123abcabcabc | trailing nested 2x | 0.818182 5 | 123abc456abc | trailing,nested 1x | 0.857143 3 | 123456789abc | trailing | 0.866667 12 | 1abc2 | nested short | 0.888889 8 | 1abcabcabc23 | nested 3x | 0.916667 2 | 123abc456789 | nested | 0.9375 
SELECT *, search_on <-> 'abc' AS distance, search_on ILIKE 'abc%' AS prefix_match FROM search WHERE search_on % 'abc' ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on <-> 'abc'; -- then sort by distance 
id | search_on | comment | distance | prefix -: | :----------- | :----------------- | :------- | :----- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f 
... ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on NOT ILIKE '%abc' -- suffix matches next , search_on <-> 'abc'; -- then sort by distance 

fiddle

Commonmark migration
Source Link

Test setup

###Test setup BasedBased on your table definition and example:

Queries

###Queries WeWe need a low threshold for the demo:

###Test setup Based on your table definition and example:

###Queries We need a low threshold for the demo:

Test setup

Based on your table definition and example:

Queries

We need a low threshold for the demo:

more instructive demo, add more
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 639

Trigram similarity and distance operators put more weight on leading matches (prefix) automatically and to a lesser extent on trailingtrailing matches (suffix), due to the way trigrams are extracted from strings. The manual:

There is more, including examples. Read the manual. And consider the demodemos in my fiddle at the bottom of this answer.

 id | search_on | comment | distance | prefix_matchprefix -: | :----------- | :----------------- | :------- | :----------- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f 

db<>fiddle here

You could sort trailing matches in a similar fashion or combine both:

db<>fiddle here

Trigram similarity and distance operators put more weight on leading matches automatically and to a lesser extent on trailing matches, due to the way trigrams are extracted from strings. The manual:

There is more, including examples. Read the manual. And consider the demo in my fiddle at the bottom.

 id | search_on | comment | distance | prefix_match -: | :----------- | :----------------- | :------- | :----------- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f 

db<>fiddle here

You could sort trailing matches in a similar fashion or combine both:

Trigram similarity and distance operators put more weight on leading matches (prefix) automatically and to a lesser extent on trailing matches (suffix), due to the way trigrams are extracted from strings. The manual:

There is more, including examples. Read the manual. And consider the demos in my fiddle at the bottom of this answer.

 id | search_on | comment | distance | prefix -: | :----------- | :----------------- | :------- | :----- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f 

You could sort trailing matches in a similar fashion or combine both:

db<>fiddle here

more instructive demo, add more
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 639
Loading
clarify
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 639
Loading
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 639
Loading