added 235 characters in body

edited Dec 24, 2017 at 21:14


There is also `to_tsquery`, which supports conditionals. It takes the exact form you used, except with `&` and `|` instead of `AND` and `OR`. ;)

    SELECT to_tsquery('english', $$( 'audit' & 'trial' ) | 'automate'$$ );
             to_tsquery
    -----------------------------
     'audit' & 'trial' | 'autom'
    (1 row)
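The output above drops the parentheses because, in a `tsquery`, `&` binds more tightly than `|`. The following is a small sketch of that precedence rule; the second query is an illustrative variation and not part of the original question.

```sql
-- & binds more tightly than |, so this parses the same way as
-- ( 'audit' & 'trial' ) | 'automate'.
SELECT to_tsquery('english', $$'audit' & 'trial' | 'automate'$$);

-- Parentheses are only needed when the OR should be grouped first.
SELECT to_tsquery('english', $$'audit' & ( 'trial' | 'automate' )$$);
```

In the second form the parentheses should survive in the printed `tsquery`, since removing them would change the meaning.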

    SELECT *
    FROM pulledtext
    WHERE to_tsvector('english', basetext)
      @@ to_tsquery( 'english', $$('audit' & 'trial') | 'automate'$$ );
     id |        basetext
    ----+-------------------------
      1 | automate business audit
      2 | audit trial
      7 | automate this script
      9 | automate this business
    (4 rows)

On naming, and not using double-quotes, I agree with all the advice given in Erwin's answer.

added 336 characters in body


edited Dec 24, 2017 at 21:05

Evan Carroll


    CREATE TABLE pulledtext (
      id        int   GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
      basetext  text
    );

    -- Use GIN: a default btree index on a tsvector cannot serve @@ searches.
    CREATE INDEX ON pulledtext
      USING gin ( to_tsvector('english', basetext) );

    -- Index search like this,
    SELECT *
    FROM pulledtext
    WHERE to_tsvector('english', basetext) @@ to_tsquery('english', 'search_term');
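One way to check whether a search can use the index is to look at the plan. This is only a sketch: it assumes `pulledtext` exists and that the expression index on `to_tsvector('english', basetext)` was built `USING gin`. On a table this small the planner will normally prefer a sequential scan anyway, so the example discourages that for the current transaction.

```sql
-- Assumes pulledtext exists with a GIN index on to_tsvector('english', basetext).
BEGIN;
SET LOCAL enable_seqscan = off;  -- testing aid only, reverted at ROLLBACK
EXPLAIN
SELECT *
FROM pulledtext
WHERE to_tsvector('english', basetext) @@ to_tsquery('english', 'audit & trial');
ROLLBACK;
```

If the index is usable, the plan would be expected to contain a Bitmap Index Scan on it rather than a Seq Scan. The `WHERE` expression has to match the indexed expression, including the explicit `'english'` configuration.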

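There is also `plainto_tsquery`, which accepts the exact form you used. Note, though, that it does not treat `AND` and `OR` as operators: as the output shows, they are discarded as stop words and the remaining terms are all joined with `&`. For real conditionals, use `to_tsquery` with `&` and `|`.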

    SELECT plainto_tsquery('english', $$('audit' AND 'trial') OR 'automate'$$ );
           plainto_tsquery
    -----------------------------
     'audit' & 'trial' & 'autom'
    (1 row)

Converting that into a query (it'll also work on the index):

    SELECT *
    FROM pulledtext
    WHERE to_tsvector('english', basetext)
      @@ plainto_tsquery( 'english', $$('audit' AND 'trial') OR 'automate'$$ );
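To compare how the different query parsers treat the same intent, the sketch below runs the search terms through three built-in functions. `websearch_to_tsquery` is an addition not mentioned in the original answer and is only available in PostgreSQL 11 and later; the input string passed to it is an illustrative rewording of the same search.

```sql
-- plainto_tsquery: AND / OR are parsed as ordinary words, not operators.
SELECT plainto_tsquery('english', $$('audit' AND 'trial') OR 'automate'$$);

-- to_tsquery: & and | are real operators.
SELECT to_tsquery('english', $$('audit' & 'trial') | 'automate'$$);

-- websearch_to_tsquery (PostgreSQL 11+): the bare word "or" is read as |,
-- and adjacent unquoted words are ANDed together.
SELECT websearch_to_tsquery('english', 'audit trial or automate');
```

The last form can be convenient when the search string comes straight from a user, because it does not raise syntax errors on malformed input the way `to_tsquery` does.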

added 336 characters in body


edited Dec 24, 2017 at 20:58

Evan Carroll


Full Text Search

What you're doing is building a normalized, relational Full Text Search by hand. That seems problematic and wasteful: PostgreSQL (like most other professional databases) already has the functionality to handle all of this.

    SELECT txt, to_tsvector(txt)
    FROM ( VALUES
      ( 'automate business audit' ),
      ( 'audit trial' ),
      ( 'trial' ),
      ( 'audit' ),
      ( 'fresh report' ),
      ( 'fresh audit' ),
      ( 'automate this script' ),
      ( 'im trying here' ),
      ( 'automate this business' ),
      ( 'lateral' )
    ) AS t(txt);
               txt           |         to_tsvector
    -------------------------+------------------------------
     automate business audit | 'audit':3 'autom':1 'busi':2
     audit trial             | 'audit':1 'trial':2
     trial                   | 'trial':1
     audit                   | 'audit':1
     fresh report            | 'fresh':1 'report':2
     fresh audit             | 'audit':2 'fresh':1
     automate this script    | 'autom':1 'script':3
     im trying here          | 'im':1 'tri':2
     automate this business  | 'autom':1 'busi':3
     lateral                 | 'later':1

You can see here that not all of these stems make sense. That's because we're using the default algorithmic stemmer, Snowball. The important thing is that you can still search these terms effectively because queries undergo the same stemming: both the query and the input get reduced to lexemes. If you need something more accurate that doesn't reduce words to `later` or `autom`, then Hunspell can take you the full distance: it'll be massively faster than your method, but slower than the Snowball method.
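To see the stemming on its own, the sketch below asks the Snowball dictionary directly and then checks that a query term matches a document through the shared lexeme. It assumes the built-in `english` configuration and its `english_stem` dictionary, which ship with a standard PostgreSQL install.

```sql
-- english_stem is the Snowball dictionary behind the built-in english configuration.
SELECT word, ts_lexize('english_stem', word) AS lexeme
FROM ( VALUES ('automate'), ('business'), ('lateral'), ('trying') ) AS t(word);

-- The query side is stemmed the same way, so both sides meet on the same lexeme.
SELECT to_tsvector('english', 'automate this business')
    @@ to_tsquery('english', 'automate') AS matches;
```

The first query should give the same lexemes that appear in the `to_tsvector` output above, and the second should report a match.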

Using Full Text Search can drastically simplify your schema too. If you don't actually need the internal mechanisms, you can just do this:

    CREATE TABLE pulledtext (
      id        int   GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
      basetext  text
    );

    -- Use GIN: a default btree index on a tsvector cannot serve @@ searches.
    CREATE INDEX ON pulledtext
      USING gin ( to_tsvector('english', basetext) );

    -- Index search like this,
    SELECT *
    FROM pulledtext
    WHERE to_tsvector('english', basetext) @@ to_tsquery('english', 'search_term');
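To try the table out, it can be loaded with the same ten sample strings used in the `to_tsvector` demonstration earlier in this answer. This is a minimal sketch that assumes a freshly created, empty `pulledtext`; the search term is just an example.

```sql
-- Load the ten sample strings; id is filled in by the identity column.
INSERT INTO pulledtext (basetext) VALUES
  ('automate business audit'),
  ('audit trial'),
  ('trial'),
  ('audit'),
  ('fresh report'),
  ('fresh audit'),
  ('automate this script'),
  ('im trying here'),
  ('automate this business'),
  ('lateral');

-- Rows whose text contains both lexemes, in any order.
SELECT id, basetext
FROM pulledtext
WHERE to_tsvector('english', basetext) @@ to_tsquery('english', 'audit & trial');
```

Of the sample rows, only `audit trial` contains both terms, so that is the one row this search should return.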



answered Dec 24, 2017 at 20:51

Evan Carroll
