Query Optimization

Question

I've tested the following query in 2 databases with exactly the same structure, in the first one, with has 4M entries, it returned me the result in 33 seconds. The second table has 29M rows and it's been 16 hours since I executed the query and I haven't got a return yet.

SELECT sbvpip*4 as smallbvpip,btnvpip*4 as buttonvpip, sum(amt_won)*400/count(*) AS winrate, count(*) as count FROM holdem_hand_player_statistics JOIN ( SELECT id_player AS pid2, id_hand AS hid, sbvpip FROM holdem_hand_player_statistics JOIN ( SELECT id_player AS pid, ROUND(avg(flg_vpip::int)*25) AS sbvpip FROM holdem_hand_player_statistics WHERE position = 8 AND cnt_players = 6 GROUP BY id_player ) AS auxtable ON pid = id_player WHERE position = 8 AND cnt_players = 6 ) AS auxtable2 ON hid = id_hand JOIN ( SELECT id_player AS pid4, id_hand AS hid2, btnvpip FROM holdem_hand_player_statistics JOIN ( SELECT id_player AS pid3, ROUND(avg(flg_vpip::int)*25) AS btnvpip FROM holdem_hand_player_statistics WHERE position = 0 AND cnt_players = 6 GROUP BY id_player ) AS auxtable3 ON pid3 = id_player WHERE position = 0 AND cnt_players = 6 ) AS auxtable4 ON hid2 = id_hand WHERE POSITION = 0 and cnt_players = 6 GROUP BY sbvpip,btnvpip ORDER BY 1,2;

What can I do to make this query execute faster ?

Is it likely that the table is corrupted or something like that ? One table is only 7~8 times bigger than the other, but the it is taking 15000x more time to process, is this normal ?

Any other comments are welcome !

If my english is not clear, just let me know that I'll try to express myself in a different way.

Thank you very much for any help,

ADDITIONAL INFO:

From the variables that I am using, 3 of them are indexes: id_hand,id_player,position. The primary key is (id_hand, id_player). The table has 129 columns and 6 indexes in total.

I also ran EXPLAIN in both tables and I got different results. Both the results are on a gdocs spreadsheet: https://spreadsheets.google.com/ccc?key=tGxqxVNzHYznb1VVjtKyAuw&authkey=CJ-BiYkN&authkey=CJ-BiYkN#gid=0

Without any idea about your datamodel and indexes, it's nearly impossible to help you. Could you also show us the result from EXPLAIN? — Frank Heikens
– Frank Heikens, Commented Dec 7, 2010 at 20:20
Frank, when I arrive home I'll get this info and post it. Thanks. — joaoavf
– joaoavf, Commented Dec 7, 2010 at 21:48
If there were a lot of deletes/updates then "a few days ago" is not enough. I hope you did not disable auto-vacuum! — user330315
– user330315, Commented Dec 8, 2010 at 19:12

HLGEM · Accepted Answer · 2010-12-07 21:55:27Z

3

I would suggest that indexing is either nonexistant or incorrect on one of the servers.

There also could be blocking preventing the query from finishing. Espcially if there isa an uncommitted transacation sitting out there.

edited Dec 7, 2010 at 21:55

answered Dec 7, 2010 at 20:23

HLGEM

97k15 gold badges120 silver badges191 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

joaoavf Over a year ago

The two tables were created by the same application, so I expect them to have exactly the same indexes. Other queries utilizing this columns are working fine. But in case there is a problem with indexing, how can I check it ?

HLGEM Over a year ago

I'm not sure how you would look for the indexes in PostgreSQl.

araqnid · Accepted Answer · 2010-12-07 20:44:49Z

Possibly you're using a lot more sort memory for the greater number of rows: what is your work_mem setting? Similarly with the buffercache- since you are scanning the same table multiple times, fitting rows into the cache is likely to be crucial.

Also, you should re-examine that query and try and find ways of not having to join the statistics table back onto itself so many times. It's hard to advise without at least some small test data and expected output. Which version of PostgreSQL are you using? With 8.4, you might be able to get both auxtable and auxtable3 coming from a single CTE at least...

Araqnid, I'll check this settings when I arrive home. I'll also check my PostgreSQL version, but I believe it is 8.4, so I'll try to build the query using a single CTE. What do you mean by small test data and expected output ? I can try to get it. Thanks.

Bonshington · Accepted Answer · 2010-12-07 20:28:11Z

the query looks fine. to improve performance try to do indexing like @HLGEM said. Also try to execute each individual sub-query to see which one has low performance.

Jason Goemaat · Accepted Answer · 2010-12-08 07:31:49Z

I could easily believe the queries taking that much longer. You have a 29M row table that you are doing multiple groups on and linking back to itself multiple times on different columns. If the entire table doesn't fit into memory there may be a lot of paging involved that wasn't needed with 1/7th the rows. Working inward, you are:

Selecting from a 29M row table on position = 0 and cnt_players = 6
Linking back to a 29M row table on the id_hand column twice
Filtering the 29M row table twice for cnt_players = 6 and positions 0 and 8 and calculating the average flg_vpip by player
Linking to the grouped results on id_hand for millions of rows

Could you split the table into separate ones? What exactly do your fields mean and what would a sample hand look like?

You would need indexes on id_player, id_hand, position and cnt_players at least.

It might be good to include all fields in the index. I'm not sure about postgresql, but SQL Server can skip loading the actual table data pages if all the data it needs for a query is in the index. So if you had an index on position, cnt_players, id_player and flg_vpip, your inner-most selects would probably be a lot faster.

A better way I think would be to calculate those inner selects ahead of time into one table or two if you aren't going to be running the query often.

select id_player, position, cnt_players, ROUND(avg(flg_vpip::int)*25) AS avg_vpip into auxtable from holdem oldem group by id_player, position, cnt_players alter table auxtable add constraint PK_auxtable primary key clustered (id_player, position, cnt_players)

Like this:

SELECT sbvpip*4 as smallbvpip,btnvpip*4 as buttonvpip, sum(amt_won)*400/count(*) AS winrate, count(*) as count FROM holdem JOIN ( SELECT id_player AS pid2, id_hand AS hid, sbvpip FROM holdem JOIN auxtable ON auxtable.id_payer = holdem.id_player and auxtable.position = holdem.position and auxtable.cnt_players = holdem.cnt_players WHERE holdem.position = 8 AND holdem.cnt_players = 6 ) AS auxtable2 ON hid = id_hand

The covering index will not help in Postgres. Unfortunately it does not have a "index only scan"

Collectives™ on Stack Overflow

Query Optimization

4 Answers 4

2 Comments

1 Comment

Comments

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

2 Comments

1 Comment

Comments

1 Comment

Related