Given the string:

'I think that PostgreSQL is nifty'

I would like to operate on the individual words found within that string. Essentially, I have a separate dictionary table from which I can get word details, and I would like to join the unnested array of words from that string against this dictionary.

So far I have:

select t.word, d.meaning, d.partofspeech
from unnest(string_to_array('I think that PostgreSQL is nifty',' ')) as t(word)
join dictionary d on t.word = d.wordname;

This accomplishes the fundamentals of what I was hoping to do, but it does not preserve the original word order.

Related question:
PostgreSQL unnest() with element number

  • Do you want to process one string or a whole table of strings? If so, does the table have a primary key? Commented Oct 19, 2012 at 21:35
  • @ErwinBrandstetter one string in a table (that does have a primary key) Commented Oct 19, 2012 at 22:14

1 Answer

WITH ORDINALITY and string_to_table() in Postgres 14 or later

SELECT s.string_id, x.* FROM strings s , string_to_table(s.string, ' ') WITH ORDINALITY x(word, rn) ORDER BY s.string_id, x.rn; 
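For a single string literal, the same idea can be sketched as follows. This is a minimal example assuming Postgres 14 or later; the alias names x, word and rn are only illustrative:

```sql
-- Split on single spaces and number the words in their original order.
-- WITH ORDINALITY appends a bigint column holding each row's position.
SELECT *
FROM   string_to_table('I think Postgres is nifty', ' ')
       WITH ORDINALITY AS x(word, rn);

--    word   | rn
-- ----------+----
--  I        |  1
--  think    |  2
--  Postgres |  3
--  is       |  4
--  nifty    |  5
```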

WITH ORDINALITY in Postgres 9.4 or later

The query can now simply be:

SELECT * FROM regexp_split_to_table('I think Postgres is nifty', ' ') WITH ORDINALITY x(word, rn); 

Applied to a table:

SELECT s.string_id, x.* FROM strings s , regexp_split_to_table(s.string, ' ') WITH ORDINALITY x(word, rn) ORDER BY s.string_id, x.rn; 

Or:

SELECT s.string_id, x.* FROM strings s , unnest(string_to_array(s.string, ' ')) WITH ORDINALITY x(word, rn) ORDER BY s.string_id, x.rn; 

Details:

About the implicit LATERAL join:

Postgres 9.3 or older - and more general explanation

For a single string

You can apply the window function row_number() to remember the order of elements. However, with the usual row_number() OVER (ORDER BY col) you get numbers according to the sort order, not the original position in the string.

You could simply omit ORDER BY to get the position "as is":

SELECT *, row_number() OVER () AS rn FROM regexp_split_to_table('I think Postgres is nifty', ' ') AS x(word); 

Performance of regexp_split_to_table() degrades with long strings. unnest(string_to_array(...)) scales better:

SELECT *, row_number() OVER () AS rn FROM unnest(string_to_array('I think Postgres is nifty', ' ')) AS x(word); 

However, while this normally works and I have never seen it break in simple queries, Postgres asserts nothing as to the order of rows without an explicit ORDER BY.

To guarantee ordinal numbers of elements in the original string, use generate_subscripts() (improved with comment by @deszo):

SELECT arr[rn] AS word, rn
FROM  (
   SELECT *, generate_subscripts(arr, 1) AS rn
   FROM   string_to_array('I think Postgres is nifty', ' ') AS x(arr)
   ) y;

For a table of strings

Add PARTITION BY id to the OVER clause ...

Demo table:

CREATE TABLE strings(string text);
INSERT INTO strings VALUES
  ('I think Postgres is nifty')
, ('And it keeps getting better')
;

I use ctid as ad-hoc substitute for a primary key. If you have one (or any unique column) use that instead.

SELECT *, row_number() OVER (PARTITION BY ctid) AS rn
FROM  (
   SELECT ctid, unnest(string_to_array(string, ' ')) AS word
   FROM   strings
   ) x;
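With a real key, the same pattern might look like the sketch below. It assumes the table has a unique column named string_id, which the demo table above does not define, so treat that name as hypothetical. As with the ctid variant, the numbering relies on rows arriving in array order, which Postgres does not formally guarantee:

```sql
-- Assumption: strings has a unique column string_id (not in the demo table).
SELECT *, row_number() OVER (PARTITION BY string_id) AS rn
FROM  (
   SELECT string_id, unnest(string_to_array(string, ' ')) AS word
   FROM   strings
   ) x;
```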

This works without any distinct ID:

SELECT arr[rn] AS word, rn
FROM  (
   SELECT *, generate_subscripts(arr, 1) AS rn
   FROM  (
      SELECT string_to_array(string, ' ') AS arr
      FROM   strings
      ) x
   ) y;

Answer to original question

SELECT z.arr, z.rn, z.word, d.meaning   -- , partofspeech -- ?
FROM  (
   SELECT *, arr[rn] AS word
   FROM  (
      SELECT *, generate_subscripts(arr, 1) AS rn
      FROM  (
         SELECT string_to_array(string, ' ') AS arr
         FROM   strings
         ) x
      ) y
   ) z
JOIN   dictionary d ON d.wordname = z.word
ORDER  BY z.arr, z.rn;
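
On Postgres 9.4 or later, the same join can probably be written more simply with WITH ORDINALITY. The following is a sketch for the single string from the question. It assumes the dictionary table has the columns wordname, meaning and partofspeech, as the question's query suggests:

```sql
-- Assumption: dictionary(wordname, meaning, partofspeech) as in the question.
SELECT x.rn, x.word, d.meaning, d.partofspeech
FROM   unnest(string_to_array('I think that PostgreSQL is nifty', ' '))
       WITH ORDINALITY AS x(word, rn)
JOIN   dictionary d ON d.wordname = x.word
ORDER  BY x.rn;  -- restores the original word order
```

Words without a dictionary entry drop out of the result with a plain JOIN; a LEFT JOIN would keep them, with NULL for the dictionary columns.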

  • You can also exploit Pg's quirky SRF-in-SELECT-list behaviour: SELECT generate_series(1,array_length(word_array,1)), unnest(word_array) FROM ..... 9.3's LATERAL might provide nicer solutions to this problem. Commented Oct 21, 2012 at 8:46
  • Wouldn't generate_subscripts(arr, 1) work instead of generate_series(1, array_upper(arr, 1))? I'd prefer the former for clarity. Commented Nov 8, 2012 at 13:46
  • @Erwin have you seen this WITH ORDINALITY post from depesz? Commented Sep 28, 2013 at 15:03
  • @JackDouglas: As it happens, we had a discussion about a related topic on Friday, which led me to a similar discovery. I added a bit to the answer. Commented Sep 30, 2013 at 13:00
  • The link for "details" just links to this same page. That's confusing. Commented Dec 30, 2019 at 21:21