added 235 characters in body
Evan Carroll

There is also to_tsquery, which does support conditionals. It takes the exact form you used, except with & and | instead of AND and OR. ;)

```sql
SELECT to_tsquery('english', $$( 'audit' & 'trial' ) | 'automate'$$ );
         to_tsquery
-----------------------------
 'audit' & 'trial' | 'autom'
(1 row)
```

```sql
SELECT *
FROM pulledtext
WHERE to_tsvector('english', basetext)
   @@ to_tsquery('english', $$('audit' & 'trial') | 'automate'$$);
 id |        basetext
----+-------------------------
  1 | automate business audit
  2 | audit trial
  7 | automate this script
  9 | automate this business
(4 rows)
```
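Since the only difference from your syntax is how the operators are spelled, search strings that arrive in the AND/OR form could be rewritten on the client before they reach to_tsquery. Here is a minimal Python sketch under that assumption. `to_tsquery_syntax` is a hypothetical helper, not part of PostgreSQL or any driver. It only rewrites standalone AND and OR that appear outside single-quoted terms.

```python
import re

# A single-quoted term (with '' as an escaped quote), or AND / OR as a whole word.
_TOKEN = re.compile(r"'(?:[^']|'')*'|\b(?:AND|OR)\b", re.IGNORECASE)


def to_tsquery_syntax(expr: str) -> str:
    """Rewrite AND/OR outside quoted terms into to_tsquery's & and |."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        if token.startswith("'"):
            return token  # leave quoted search terms untouched
        return "&" if token.upper() == "AND" else "|"

    return _TOKEN.sub(replace, expr)


print(to_tsquery_syntax("('audit' AND 'trial') OR 'automate'"))
# ('audit' & 'trial') | 'automate'
```

Pass the result to to_tsquery as a bind parameter instead of splicing it into the SQL text. Keep in mind that to_tsquery still raises a syntax error on malformed input, so either validate the string first or catch that error.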

On naming, and not using double-quotes, I agree with all the advice given in Erwin's answer.

added 336 characters in body
```sql
CREATE TABLE pulledtext (
  id       int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  basetext text
);

-- GIN expression index; queries must use the same expression to use it.
CREATE INDEX ON pulledtext USING gin ( to_tsvector('english', basetext) );

-- Index search like this ('search_term' is a placeholder):
SELECT *
FROM pulledtext
WHERE to_tsvector('english', basetext) @@ to_tsquery('english', 'search_term');
```

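There is also plainto_tsquery, but it does not actually support conditionals, even though it accepts the exact form you used. It ignores punctuation and operators, the english configuration drops AND and OR as stop words, and whatever survives is ANDed together:

```sql
SELECT plainto_tsquery('english', $$('audit' AND 'trial') OR 'automate'$$ );
       plainto_tsquery
-----------------------------
 'audit' & 'trial' & 'autom'
(1 row)
```

Converting that into a query also works with the index, but it only matches rows that contain all three of audit, trial and automate.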

```sql
SELECT *
FROM pulledtext
WHERE to_tsvector('english', basetext)
   @@ plainto_tsquery('english', $$('audit' AND 'trial') OR 'automate'$$);
```
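PostgreSQL's documentation describes plainto_tsquery this way: it parses and normalizes the text much as to_tsvector does, then inserts & between the surviving words, and it does not recognize tsquery operators. The toy Python model below mimics only that shape. `toy_plainto_tsquery` and its four-word stop list are hypothetical, and it skips stemming, which is why it prints 'automate' where PostgreSQL prints 'autom'.

```python
import re

# Hypothetical tiny stop list; PostgreSQL's english stop list is much longer
# but, like this one, contains "and" and "or".
STOP_WORDS = {"a", "and", "or", "the"}


def toy_plainto_tsquery(text: str) -> str:
    """Keep only words, drop stop words, and AND whatever survives."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " & ".join(f"'{word}'" for word in words if word not in STOP_WORDS)


print(toy_plainto_tsquery("('audit' AND 'trial') OR 'automate'"))
# 'audit' & 'trial' & 'automate'
```

The parentheses, the quotes and the OR all disappear, so the intended condition collapses into a plain conjunction.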
added 336 characters in body

Full Text Search

What you're doing is building Full Text Search out of normalized relational tables. That seems problematic and wasteful: PostgreSQL (and most other professional databases) has built-in functionality to handle all of this.

```sql
SELECT txt, to_tsvector(txt)
FROM ( VALUES
  ( 'automate business audit' ),
  ( 'audit trial' ),
  ( 'trial' ),
  ( 'audit' ),
  ( 'fresh report' ),
  ( 'fresh audit' ),
  ( 'automate this script' ),
  ( 'im trying here' ),
  ( 'automate this business' ),
  ( 'lateral' )
) AS t(txt);
           txt           |         to_tsvector
-------------------------+------------------------------
 automate business audit | 'audit':3 'autom':1 'busi':2
 audit trial             | 'audit':1 'trial':2
 trial                   | 'trial':1
 audit                   | 'audit':1
 fresh report            | 'fresh':1 'report':2
 fresh audit             | 'audit':2 'fresh':1
 automate this script    | 'autom':1 'script':3
 im trying here          | 'im':1 'tri':2
 automate this business  | 'autom':1 'busi':3
 lateral                 | 'later':1
```

You can see here that not all of these stems make sense. That's because we're using the default algorithmic stemmer, Snowball. The important thing is that you can still search these terms effectively, because queries undergo the same stemming: both queries and input get reduced to lexemes. If you need something more accurate that doesn't reduce words to 'later' or 'autom', then a Hunspell dictionary can take you the full distance. It will still be much faster than your method, but slower than the Snowball method.
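The key point is that documents and queries go through the same normalization, so odd stems are harmless. The Python sketch below is a toy model of that idea only. `toy_stem` is a crude suffix stripper standing in for Snowball, and its stop list is hypothetical. It does reproduce one real detail from the output above: a stop word is dropped but still uses up a position, which is why 'script' is at position 3 in "automate this script".

```python
from __future__ import annotations

# Hypothetical stop list and suffix list for illustration only; PostgreSQL's
# english configuration uses the Snowball stemmer and a much longer stop list.
STOP_WORDS = {"this", "here", "and", "or"}
SUFFIXES = ("ing", "ed", "es", "s", "e")


def toy_stem(word: str) -> str:
    """Crude suffix stripping standing in for Snowball (much less careful)."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_tsvector(text: str) -> dict[str, list[int]]:
    """Map each stemmed, non-stop word to its 1-based positions."""
    vector: dict[str, list[int]] = {}
    for position, word in enumerate(text.lower().split(), start=1):
        if word in STOP_WORDS:
            continue  # dropped, but it still used up a position
        vector.setdefault(toy_stem(word), []).append(position)
    return vector


def matches(document: str, query_word: str) -> bool:
    """Normalize the query word with the same stemmer as the document."""
    return toy_stem(query_word.lower()) in toy_tsvector(document)


print(toy_tsvector("automate this script"))          # {'automat': [1], 'script': [3]}
print(matches("automate this script", "automated"))   # True
print(matches("automate this script", "automating"))  # True
print(matches("audit trial", "automate"))             # False
```

'automat' is no more a real word than Snowball's 'autom'. Matching still works because both sides collapse to the same token.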

Using Full Text Search can drastically simplify your schema too. If you don't actually need the internal mechanisms, you can just do this:

```sql
CREATE TABLE pulledtext (
  id       int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  basetext text
);

-- GIN expression index; queries must use the same expression to use it.
CREATE INDEX ON pulledtext USING gin ( to_tsvector('english', basetext) );

-- Index search like this ('search_term' is a placeholder):
SELECT *
FROM pulledtext
WHERE to_tsvector('english', basetext) @@ to_tsquery('english', 'search_term');
```
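The schema can stay this small because the lexeme-to-row bookkeeping moves into the index. Conceptually, a GIN index (the text search index type PostgreSQL's documentation prefers) is an inverted index: each lexeme points at the rows that contain it, so a boolean tsquery can be answered with set operations instead of a scan. The Python sketch below is a toy model of that idea, not PostgreSQL's implementation. It reuses the lexemes from the to_tsvector output above. It also assumes the rows have ids 1 to 10 in VALUES order, which is consistent with the ids 1, 2, 7 and 9 returned by the ('audit' & 'trial') | 'automate' query.

```python
from __future__ import annotations

from collections import defaultdict

# Lexemes per row, copied from the to_tsvector output above. The ids 1..10 are
# an assumption: they follow the VALUES order.
ROWS: dict[int, set[str]] = {
    1: {"audit", "autom", "busi"},
    2: {"audit", "trial"},
    3: {"trial"},
    4: {"audit"},
    5: {"fresh", "report"},
    6: {"audit", "fresh"},
    7: {"autom", "script"},
    8: {"im", "tri"},
    9: {"autom", "busi"},
    10: {"later"},
}


def build_inverted_index(rows: dict[int, set[str]]) -> dict[str, set[int]]:
    """Map each lexeme to the set of row ids whose tsvector contains it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, lexemes in rows.items():
        for lexeme in lexemes:
            index[lexeme].add(row_id)
    return index


def evaluate(query: str | tuple, index: dict[str, set[int]]) -> set[int]:
    """Evaluate a lexeme, or an ("&" | "|", left, right) tuple, with set algebra."""
    if isinstance(query, str):
        return set(index.get(query, set()))  # .get avoids growing the defaultdict
    op, left, right = query
    lhs, rhs = evaluate(left, index), evaluate(right, index)
    if op == "&":
        return lhs & rhs  # AND: rows on both posting lists
    if op == "|":
        return lhs | rhs  # OR: rows on either posting list
    raise ValueError(f"unsupported operator: {op!r}")


index = build_inverted_index(ROWS)
# ('audit' & 'trial') | 'autom', written as a nested tuple instead of parsed text
print(sorted(evaluate(("|", ("&", "audit", "trial"), "autom"), index)))  # [1, 2, 7, 9]
```

A real GIN index stores compressed lists of row pointers and works with the planner. The toy leaves all of that out and keeps only the lookup-and-combine step that makes the search fast.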
