Postgres full-text search: make a tsvector by splitting on whitespace or by providing an array of tokens

Question

I have a text search problem where I need to search systematically-generated text, i.e. not human-written natural language text.

The typical to_tsvector('english', 'foo bar baz') is not particularly helpful. In some cases it generates tokens which I know will lead to false-positive search results.

Instead I'd really just like to either provide the tokens in a string where each token is separated by whitespace, or provide an array of ordered tokens.

For example, something along the lines of to_tsvector(array['foo', 'bar', 'baz']) should produce three tokens: foo, bar, and baz. This seems like a pretty basic thing, but so far I haven't found any explicit documentation of this functionality.

Laurenz Albe · Accepted Answer · 2021-04-13 05:58:29Z

This is indeed a basic thing, and all you have to do is use the simple text search configuration:

to_tsvector('simple', 'foo bar baz')
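
To see what changes between the two configurations, here is a small comparison. The sample string is my own, not from the question, and the outputs shown are what a stock PostgreSQL install should give:

```sql
-- 'english' stems words and drops stop words
SELECT to_tsvector('english', 'running the tests');
-- 'run':1 'test':3

-- 'simple' only lowercases: no stemming, no stop-word removal
SELECT to_tsvector('simple', 'running the tests');
-- 'running':1 'tests':3 'the':2

-- Inspect how the parser tokenizes a given string before relying on it
SELECT alias, token, lexemes FROM ts_debug('simple', 'foo-bar baz');
```

One caveat: 'simple' still runs the default parser, so this is not strictly a whitespace split. Tokens containing punctuation such as hyphens may be broken up further, which is worth checking with ts_debug on your generated text.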

edited Apr 13, 2021 at 5:58

answered Apr 13, 2021 at 3:21

Laurenz Albe

2 Comments

Alex Klibisz Over a year ago

Thanks. I also found array_to_tsvector works nicely for taking an array of pre-computed tokens.
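
A minimal sketch of that approach, using the tokens from the question. The whitespace-splitting variant with regexp_split_to_array is my own assumption about how one might combine the two ideas, not something stated above:

```sql
-- Build a tsvector directly from pre-computed tokens.
-- The result is sorted and de-duplicated, and carries no positions.
SELECT array_to_tsvector(ARRAY['foo', 'bar', 'baz']);
-- 'bar' 'baz' 'foo'

-- Assumed variant: split a string on runs of whitespace only,
-- bypassing the text search parser entirely.
SELECT array_to_tsvector(regexp_split_to_array('foo-bar baz_1 qux', '\s+'));

-- Tokens are stored as-is (no lowercasing or stemming),
-- so the query has to use exactly the same tokens.
SELECT array_to_tsvector(ARRAY['foo', 'bar', 'baz']) @@ 'foo'::tsquery;
-- true
```

Because array_to_tsvector does not record positions, phrase search and position-based ranking will not work on these vectors. NULL array elements raise an error, and as far as I know recent versions also reject empty strings, so leading or trailing whitespace should be trimmed before splitting.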

Laurenz Albe Over a year ago

Ah, good. I have removed the misleading part of my answer.
