I have a table with English sentences. Given a sentence which may contain an additional word, or a distorted word, can I find the closest sentence in the table using Postgres' Text-Search capabilities?
to_tsvector('a b c') @@ plainto_tsquery('a b') returns true
to_tsvector('a b') @@ plainto_tsquery('a b c') returns false
I would like scenario 2 to return true as well.
Notes:
- The length of the sentences may be dozens of words. I'm looking for an efficient solution..
- Other text search engines such as Elastic/Solr will successfully return the closest result.
More information regrading performance of the trigram index.
EXPLAIN (ANALYSE, BUFFERS) SELECT similarity(title, 'electode paste composition') as sml, title FROM table WHERE title % 'electode paste composition' ORDER BY sml DESC; returns:
Gather Merge (cost=1880112.22..1902381.94 rows=190870 width=93) (actual time=36355.303..36356.143 rows=5 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=407649 -> Sort (cost=1879112.20..1879350.78 rows=95435 width=93) (actual time=36344.180..36344.180 rows=2 loops=3) Sort Key: (similarity(title, 'electode paste composition'::text)) DESC" Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB Buffers: shared hit=407649 -> Parallel Bitmap Heap Scan on table (cost=2759.10..1866325.66 rows=95435 width=93) (actual time=35940.284..36344.141 rows=2 loops=3) Recheck Cond: (title % 'electode paste composition'::text)" Rows Removed by Index Recheck: 14904 Heap Blocks: exact=16199 Buffers: shared hit=407635 -> Bitmap Index Scan on title_trgm (cost=0.00..2701.84 rows=229045 width=0) (actual time=35543.907..35543.907 rows=44716 loops=1) Index Cond: (title % 'electode paste composition'::text)" Buffers: shared hit=362988 Planning Time: 0.084 ms Execution Time: 36356.187 ms The same query using a tsquery takes less than 2.5 seconds.