
Sample data from my table test_table:

```
date        symbol   value  created_time
2010-01-09  symbol1  101    3847474847
2010-01-10  symbol1  102    3847474847
2010-01-10  symbol1  102.5  3847475500
2010-01-10  symbol2  204    3847474847
2010-01-11  symbol1  109    3847474847
2010-01-12  symbol1  105    3847474847
2010-01-12  symbol2  206    3847474847
```

Given the table above, I am trying to find the optimal index for the table (the combination of date, symbol, value and created_time should be unique) and a query to go with it that returns the following:

```
date        symbol   value  created_time
2010-01-09  symbol1  101    3847474847
2010-01-10  symbol1  102.5  3847475500
2010-01-10  symbol2  204    3847474847
2010-01-11  symbol1  109    3847474847
2010-01-12  symbol1  105    3847474847
2010-01-12  symbol2  206    3847474847
```

That is, for each (date, symbol) group I want the date, symbol and value of the row with the maximum created_time (essentially rows 1, 3, 4, 5, 6 and 7 of the sample data above).
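To reproduce this locally, the sample rows can be loaded into a scratch table with a script along these lines. The column types are an assumption (the question does not give the table definition), so adjust them to match the real schema.

```sql
-- Scratch copy of the sample data from the question.
-- Column types are assumed: date, text, numeric, bigint.
CREATE TABLE test_table (
    date         date    NOT NULL,
    symbol       text    NOT NULL,
    value        numeric NOT NULL,
    created_time bigint  NOT NULL
);

INSERT INTO test_table (date, symbol, value, created_time) VALUES
    ('2010-01-09', 'symbol1', 101,   3847474847),
    ('2010-01-10', 'symbol1', 102,   3847474847),
    ('2010-01-10', 'symbol1', 102.5, 3847475500),
    ('2010-01-10', 'symbol2', 204,   3847474847),
    ('2010-01-11', 'symbol1', 109,   3847474847),
    ('2010-01-12', 'symbol1', 105,   3847474847),
    ('2010-01-12', 'symbol2', 206,   3847474847);
```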

So far I have tried this index:

```sql
CREATE UNIQUE INDEX "test_table_date_symbol_value_created_time"
    ON "test_table" USING btree (date, symbol, value, created_time)
```

I am using this query with it, but I am not sure it is the most efficient approach; it still seems pretty slow.

```sql
select *
from (
    select date, symbol, value, created_time,
           max(created_time) over (partition by date, symbol) as max_created_time
    from "test_table"
) t
where symbol in ('symbol1', 'symbol2')
  and created_time = max_created_time
```
  • So you want the latest row for every unique date, symbol and value combination? Commented Jan 25, 2017 at 18:17
  • Yes, exactly. Sorry, I am pretty inexperienced with SQL, so I have been trying to read a lot and make an educated guess before asking. Commented Jan 25, 2017 at 18:25
  • Sorry I meant to say the latest row for each date, symbol combination. Commented Jan 25, 2017 at 19:11

3 Answers

1

Postgres supports window functions that suit this situation:

```sql
select date, symbol, value, created_time
from (
    select *,
           rank() over (partition by date, symbol order by created_time desc) as rownum
    from test_table
) x
where rownum = 1
```

For every combination of date and symbol, this query returns the value and created_time from the row with the highest (i.e. latest) created_time for that date and symbol.
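One detail worth checking: rank() gives tied rows the same rank, and the unique index only guarantees that (date, symbol, value, created_time) is unique, so two rows with the same date, symbol and created_time but different values would both get rank 1 and both be returned. If exactly one row per (date, symbol) is wanted, row_number() can be used instead. The sketch below assumes test_table has the columns shown in the question; it also applies the question's symbol filter inside the subquery, so only those symbols are ranked. The `value desc` tie-breaker is an assumption about which tied row to prefer, not something the question specifies.

```sql
-- Latest row per (date, symbol); row_number() yields exactly one row per group.
select date, symbol, value, created_time
from (
    select date, symbol, value, created_time,
           row_number() over (
               partition by date, symbol
               order by created_time desc, value desc  -- value desc: assumed tie-breaker
           ) as rownum
    from test_table
    where symbol in ('symbol1', 'symbol2')  -- filter before ranking
) x
where rownum = 1
order by date, symbol;
```

Without a tie-breaker, row_number() would still return one row per group, but which of the tied rows it picks would be arbitrary.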

I would suggest this index:

```sql
CREATE UNIQUE INDEX test_table_idx ON test_table (date, symbol, created_time, value)
```

It's a covering index: like the one you already had, it contains every column the query needs, so PostgreSQL may be able to answer the query from the index without reading the table itself. Note, however, that created_time comes before value, so within each (date, symbol) group the index entries are already ordered by created_time, the column the window function orders by. value goes last because it is the least important attribute: it plays no part in deciding which row to return.
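Whether the planner actually reads only the index depends on up-to-date statistics and on the visibility map, so checking the plan is more reliable than comparing timings. A sketch, assuming PostgreSQL and the test_table_idx index above: ANALYZE refreshes planner statistics, and VACUUM updates the visibility map that index-only scans rely on.

```sql
-- Refresh planner statistics and the visibility map (PostgreSQL).
-- VACUUM cannot run inside a transaction block.
VACUUM ANALYZE test_table;

-- Show the chosen plan with actual row counts and buffer usage.
-- Look for "Index Only Scan using test_table_idx" versus "Seq Scan",
-- for the "Heap Fetches" count, and for a Sort node above the scan.
EXPLAIN (ANALYZE, BUFFERS)
select date, symbol, value, created_time
from (
    select *,
           rank() over (partition by date, symbol order by created_time desc) as rownum
    from test_table
) x
where rownum = 1;
```

On a large table the planner may still prefer a sequential scan plus sort for a query that reads every row; comparing the plans produced with each candidate index is a more direct way to see which one helps.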


4 Comments

Interesting, thank you for the query. How could I modify it to return only the latest row for each unique (date, symbol) combination, so that only the latest value is returned? I apologize, I answered your comment above incorrectly.
@Trevor I have removed value from the partition.
Thank you for the explanations; they make a lot of sense, and in my testing so far this seems faster than my previous attempts. Moving created_time before value did seem to make it slower, but I don't know much about indexes, so perhaps it is faster in the long run with more testing (I have about 2,400 symbols and about 11 million rows right now).
@Trevor try running ANALYZE test_table, then re-run the queries.
1

PostgreSQL has the very handy DISTINCT ON:

```sql
select distinct on (symbol, date) *
from test_table
order by symbol, date, created_time desc
```

https://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
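DISTINCT ON keeps the first row of each (symbol, date) group in ORDER BY order, so an index whose columns and sort directions match that ORDER BY can let PostgreSQL read the rows already sorted instead of sorting them. A sketch, assuming PostgreSQL; the index name is hypothetical, and the symbol filter is taken from the question's query.

```sql
-- Index matching ORDER BY symbol, date, created_time DESC.
-- A plain ascending index on (symbol, date, created_time) can be read
-- forwards (all ASC) or backwards (all DESC), but not in this mixed order.
CREATE INDEX test_table_symbol_date_created_idx  -- hypothetical name
    ON test_table (symbol, date, created_time DESC);

-- DISTINCT ON keeps the first row of each (symbol, date) group,
-- i.e. the one with the largest created_time.
select distinct on (symbol, date) date, symbol, value, created_time
from test_table
where symbol in ('symbol1', 'symbol2')
order by symbol, date, created_time desc;
```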


0

Alternative:


```sql
SELECT *
FROM test_table tt
WHERE NOT EXISTS (
    SELECT *
    FROM test_table nx
    WHERE nx."date" = tt."date"
      AND nx.symbol = tt.symbol
      AND nx.created_time > tt.created_time
);
```
