Trigram similarity and distance operatorsTrigram similarity and distance operators put more weight on leading matches (prefix) automatically and to a lesser extent on trailing matches (suffix), due to the way trigrams are extracted from strings. The manual:The manual:
CREATE TABLE search (id int PRIMARY KEY, search_on text, comment text); INSERT INTO search (id, search_on, comment) VALUES ( 1, 'abc123456789', 'leading') , ( 2, '123abc456789', 'nested') , ( 3, '123456789abc', 'trailing') , ( 4, 'abc123abc456', 'leading, nested 1x') , ( 5, '123abc456abc', 'trailing,nested 1x') , ( 6, 'abcabcabc123', 'leading, nested 2x') , ( 7, '123abcabcabc', 'trailing nested 2x') , ( 8, '1abcabcabc23', 'nested 3x') , (10, 'abc12' , 'leading short') , (11, '12abc' , 'trailing short') , (12, '1abc2' , 'nested short'); CREATE INDEX index_search_search_on ON search USING gist (search_on gist_trgm_ops); CREATE TABLE search (id int PRIMARY KEY, search_on text, comment text); INSERT INTO search (id, search_on, comment) VALUES ( 1, 'abc123456789', 'leading') , ( 2, '123abc456789', 'nested') , ( 3, '123456789abc', 'trailing') , ( 4, 'abc123abc456', 'leading, nested 1x') , ( 5, '123abc456abc', 'trailing,nested 1x') , ( 6, 'abcabcabc123', 'leading, nested 2x') , ( 7, '123abcabcabc', 'trailing nested 2x') , ( 8, '1abcabcabc23', 'nested 3x') , (10, 'abc12' , 'leading short') , (11, '12abc' , 'trailing short') , (12, '1abc2' , 'nested short') ; CREATE INDEX index_search_search_on ON search USING gist (search_on gist_trgm_ops); - Any downsides of using data type “text” for storing strings?Any downsides of using data type “text” for storing strings?
SELECT *, search_on <-> 'abc' AS distance FROM search WHERE search_on % 'abc' ORDER BY search_on <-> 'abc'; SELECT *, search_on <-> 'abc' AS distance FROM search WHERE search_on % 'abc' ORDER BY search_on <-> 'abc'; id | search_on | comment | distance -: | :----------- | :----------------- | :------- 10 | abc12 | leading short | 0.571429 6 | abcabcabc123 | leading, nested 2x | 0.7 11 | 12abc | trailing short | 0.75 4 | abc123abc456 | leading, nested 1x | 0.769231 1 | abc123456789 | leading | 0.785714 7 | 123abcabcabc | trailing nested 2x | 0.818182 5 | 123abc456abc | trailing,nested 1x | 0.857143 3 | 123456789abc | trailing | 0.866667 12 | 1abc2 | nested short | 0.888889 8 | 1abcabcabc23 | nested 3x | 0.916667 2 | 123abc456789 | nested | 0.9375 id | search_on | comment | distance -: | :----------- | :----------------- | :------- 10 | abc12 | leading short | 0.571429 6 | abcabcabc123 | leading, nested 2x | 0.7 11 | 12abc | trailing short | 0.75 4 | abc123abc456 | leading, nested 1x | 0.769231 1 | abc123456789 | leading | 0.785714 7 | 123abcabcabc | trailing nested 2x | 0.818182 5 | 123abc456abc | trailing,nested 1x | 0.857143 3 | 123456789abc | trailing | 0.866667 12 | 1abc2 | nested short | 0.888889 8 | 1abcabcabc23 | nested 3x | 0.916667 2 | 123abc456789 | nested | 0.9375 SELECT *, search_on <-> 'abc' AS distance, search_on ILIKE 'abc%' AS prefix_match FROM search WHERE search_on % 'abc' ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on <-> 'abc'; -- then sort by distance SELECT *, search_on <-> 'abc' AS distance, search_on ILIKE 'abc%' AS prefix_match FROM search WHERE search_on % 'abc' ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on <-> 'abc'; -- then sort by distance id | search_on | comment | distance | prefix -: | :----------- | :----------------- | :------- | :----- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f id | search_on | comment | distance | prefix -: | :----------- | :----------------- | :------- | :----- 10 | abc12 | leading short | 0.571429 | t 6 | abcabcabc123 | leading, nested 2x | 0.7 | t 4 | abc123abc456 | leading, nested 1x | 0.769231 | t 1 | abc123456789 | leading | 0.785714 | t 11 | 12abc | trailing short | 0.75 | f 7 | 123abcabcabc | trailing nested 2x | 0.818182 | f 5 | 123abc456abc | trailing,nested 1x | 0.857143 | f 3 | 123456789abc | trailing | 0.866667 | f 12 | 1abc2 | nested short | 0.888889 | f 8 | 1abcabcabc23 | nested 3x | 0.916667 | f 2 | 123abc456789 | nested | 0.9375 | f ... ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on NOT ILIKE '%abc' -- suffix matches next , search_on <-> 'abc'; -- then sort by distance ... ORDER BY search_on NOT ILIKE 'abc%' -- prefix matches first , search_on NOT ILIKE '%abc' -- suffix matches next , search_on <-> 'abc'; -- then sort by distance - Pattern matching with LIKE, SIMILAR TO or regular expressionsPattern matching with LIKE, SIMILAR TO or regular expressions
- Is unique index better than unique constraint when an index with an operator class is required
- Get partial match from GIN indexed TSVECTOR column
- Best index for similarity function