I have a website for showing matches of an RTS game.
One of the main use cases is to show recent matches of a player or a group of players.
PostgreSQL 14.13
Table structure:
```sql
create table public.match (
    match_id integer primary key not null,
    started  timestamp(3) without time zone,
    ...
);
create index "IDX_e7b6cfca8139b9aa85880aab9e" on match using btree (started);

create table public.player (
    match_id   integer  not null,
    profile_id integer  not null,
    slot       smallint not null,
    ...,
    primary key (match_id, profile_id, slot),
    foreign key (match_id) references public.match (match_id)
        match simple on update cascade on delete restrict,
    foreign key (profile_id) references public.profile (profile_id)
        match simple on update cascade on delete restrict
);
create index idx_player_profile_match on player using btree (profile_id, match_id);
create index "IDX_58afd2c450f166eacbdf982841" on player using btree (match_id);
create index "IDX_ba3de28aa98207f3a21145feb8" on player using btree (profile_id);
```

The match table has 50 million records, the player table 200 million.
There are about 50000 matches with 4 players on average added per day to the database.
The tables currently contain data for the last 3 years. The match table has a relation_size of 6 GB and a total_relation_size of 12 GB; for the player table it is 15 GB and 37 GB.
Querying the DB for the most recent 50 matches of a player, or for all matches of a player in the last month, takes almost a second. Repeated executions are faster because of caching, but that does not help me much, since different players are requested most of the time.
```sql
-- most recent 50 matches for a player
EXPLAIN ANALYSE
SELECT *
FROM   match m
JOIN   player p ON p.match_id = m.match_id
WHERE  p.profile_id = 582058
ORDER  BY started DESC
LIMIT  50;
```

```
Limit  (cost=12075.34..12081.09 rows=50 width=107) (actual time=675.517..682.833 rows=50 loops=1)
  ->  Gather Merge  (cost=12075.34..12158.48 rows=723 width=107) (actual time=675.516..682.828 rows=50 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Sort  (cost=11075.33..11077.14 rows=723 width=107) (actual time=663.726..663.731 rows=38 loops=2)
              Sort Key: m.started DESC
              Sort Method: top-N heapsort  Memory: 39kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 37kB
              ->  Nested Loop  (cost=22.66..11051.31 rows=723 width=107) (actual time=3.286..661.017 rows=1882 loops=2)
                    ->  Parallel Bitmap Heap Scan on player p  (cost=22.09..4853.22 rows=723 width=40) (actual time=2.774..210.891 rows=1882 loops=2)
                          Recheck Cond: (profile_id = 582058)
                          Heap Blocks: exact=1855
                          ->  Bitmap Index Scan on "IDX_ba3de28aa98207f3a21145feb8"  (cost=0.00..21.79 rows=1229 width=0) (actual time=4.818..4.819 rows=3772 loops=1)
                                Index Cond: (profile_id = 582058)
                    ->  Index Scan using match_pkey on match m  (cost=0.56..8.57 rows=1 width=67) (actual time=0.237..0.237 rows=1 loops=3764)
                          Index Cond: (match_id = p.match_id)
Planning Time: 0.305 ms
Execution Time: 682.897 ms
```

```sql
-- matches in the last month for a player
EXPLAIN ANALYSE
SELECT *
FROM   match m
JOIN   player p ON p.match_id = m.match_id
WHERE  m.started > '2024-09-01'
AND    p.profile_id = 271202
ORDER  BY started DESC;
```

```
Gather Merge  (cost=12053.93..12057.61 rows=32 width=107) (actual time=797.710..797.881 rows=161 loops=1)
  Workers Planned: 1
  Workers Launched: 1
  ->  Sort  (cost=11053.92..11054.00 rows=32 width=107) (actual time=782.572..782.578 rows=80 loops=2)
        Sort Key: m.started DESC
        Sort Method: quicksort  Memory: 36kB
        Worker 0:  Sort Method: quicksort  Memory: 35kB
        ->  Nested Loop  (cost=22.66..11053.12 rows=32 width=107) (actual time=171.842..781.756 rows=80 loops=2)
              ->  Parallel Bitmap Heap Scan on player p  (cost=22.09..4853.22 rows=723 width=40) (actual time=1.254..268.523 rows=1682 loops=2)
                    Recheck Cond: (profile_id = 271202)
                    Heap Blocks: exact=1721
                    ->  Bitmap Index Scan on "IDX_ba3de28aa98207f3a21145feb8"  (cost=0.00..21.79 rows=1229 width=0) (actual time=1.712..1.712 rows=3378 loops=1)
                          Index Cond: (profile_id = 271202)
              ->  Index Scan using match_pkey on match m  (cost=0.56..8.58 rows=1 width=67) (actual time=0.304..0.304 rows=0 loops=3365)
                    Index Cond: (match_id = p.match_id)
                    Filter: (started > '2024-09-01 00:00:00'::timestamp without time zone)
                    Rows Removed by Filter: 1
Planning Time: 0.446 ms
Execution Time: 797.931 ms
```

When executed again, the execution time drops to about 40 ms because of caching.
As far as I understand, Postgres first finds all player records for the given profile_id, then joins them to match and sorts by the started date, and only then applies the limit / date filter. This way the query becomes more and more inefficient as more matches per player are added to the database.
My idea is to add the started date of the match redundantly to the player table and create an index on (profile_id, started).
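The denormalization I have in mind would look roughly like this (column and index names are my own; on 200 million rows the backfill would need to be done in batches in practice):

```sql
-- Add the redundant column; adding a nullable column is cheap (no table rewrite)
ALTER TABLE player ADD COLUMN started timestamp(3);

-- Backfill from match (batched in practice to limit lock time and bloat)
UPDATE player p
SET    started = m.started
FROM   match m
WHERE  m.match_id = p.match_id;

-- Index supporting "recent matches of a player"
CREATE INDEX player_profile_started_idx ON player (profile_id, started DESC);
```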
Do you think this is a good solution? Are there other options?
I also thought about partitioning the match/player tables by started date, but this seems to be a lot of effort.
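For reference, declarative range partitioning of match would look something like the sketch below; migrating the existing 50 million rows and reworking the foreign keys is where the real effort lies. Note that the partition key has to be part of the primary key, which changes the match_pkey the player table's foreign key relies on:

```sql
-- Sketch only: a partitioned replacement for match
CREATE TABLE match_partitioned (
    match_id integer NOT NULL,
    started  timestamp(3) NOT NULL,
    PRIMARY KEY (match_id, started)   -- partition key must be included
) PARTITION BY RANGE (started);

-- One partition per year (could also be per month)
CREATE TABLE match_y2024 PARTITION OF match_partitioned
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
```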
Update using the answer from Erwin Brandstetter:
Thank you for your advice!
I created the index and filled the started column of the player table:
```sql
CREATE INDEX player_profile_id_started_match_id
ON player (profile_id, started) INCLUDE (match_id);
```

The recent-matches query for one player is very fast now:
```sql
-- get recent 50 matches for one player
EXPLAIN ANALYSE
SELECT *
FROM  (
   SELECT p.match_id
   FROM   player p
   WHERE  p.profile_id = 199325
   ORDER  BY p.started DESC
   LIMIT  50
   ) p
JOIN  match m USING (match_id);
```

```
Nested Loop  (cost=1.14..500.47 rows=50 width=67) (actual time=2.721..21.442 rows=50 loops=1)
  ->  Limit  (cost=0.57..70.85 rows=50 width=12) (actual time=2.702..14.494 rows=50 loops=1)
        ->  Index Only Scan Backward using player_profile_id_started_match_id on player p  (cost=0.57..1739.19 rows=1237 width=12) (actual time=2.701..14.480 rows=50 loops=1)
              Index Cond: (profile_id = 199325)
              Heap Fetches: 50
  ->  Index Scan using match_pkey on match m  (cost=0.56..8.58 rows=1 width=67) (actual time=0.138..0.138 rows=1 loops=50)
        Index Cond: (match_id = p.match_id)
Planning Time: 0.222 ms
Execution Time: 21.482 ms
```

I also have the use case of querying the recent matches for multiple players (one or more of the given players needs to be in the match).
```sql
-- get recent 50 matches from multiple players
EXPLAIN ANALYSE
SELECT *
FROM  (
   SELECT p.match_id
   FROM   player p
   WHERE  p.profile_id IN (1136191, 19455781)
   ORDER  BY p.started DESC
   LIMIT  50
   ) p
JOIN  match m USING (match_id);
```

```
Nested Loop  (cost=3557.08..3986.26 rows=50 width=67) (actual time=511.692..515.601 rows=50 loops=1)
  ->  Limit  (cost=3556.51..3556.64 rows=50 width=12) (actual time=511.658..511.671 rows=50 loops=1)
        ->  Sort  (cost=3556.51..3562.70 rows=2473 width=12) (actual time=511.657..511.663 rows=50 loops=1)
              Sort Key: p.started DESC
              Sort Method: top-N heapsort  Memory: 29kB
              ->  Index Only Scan using player_profile_id_started_match_id on player p  (cost=0.57..3474.36 rows=2473 width=12) (actual time=0.592..508.846 rows=5277 loops=1)
                    Index Cond: (profile_id = ANY ('{1136191,19455781}'::integer[]))
                    Heap Fetches: 2965
  ->  Index Scan using match_pkey on match m  (cost=0.56..8.58 rows=1 width=67) (actual time=0.078..0.078 rows=1 loops=50)
        Index Cond: (match_id = p.match_id)
Planning Time: 1.090 ms
Execution Time: 515.662 ms
```

In this example it was very slow at first. But after running `VACUUM ANALYZE public.player;`, which took 7 minutes, it runs fast (about 30 ms).
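An alternative I am considering for the multi-player case (my own sketch, not from the accepted answer): fetch the top 50 per player in separate `LATERAL` branches, so each branch can walk the (profile_id, started) index backward and stop after 50 rows, then merge, deduplicate, and re-limit:

```sql
SELECT m.*, d.started
FROM  (
   SELECT DISTINCT ON (t.match_id) t.match_id, t.started
   FROM   unnest('{1136191,19455781}'::int[]) AS ids(profile_id)
   CROSS  JOIN LATERAL (
      SELECT p.match_id, p.started
      FROM   player p
      WHERE  p.profile_id = ids.profile_id
      ORDER  BY p.started DESC   -- walks the index, stops after 50 rows
      LIMIT  50
      ) t
   ORDER  BY t.match_id, t.started DESC   -- DISTINCT ON removes duplicate matches
   ) d
JOIN   match m USING (match_id)
ORDER  BY d.started DESC
LIMIT  50;
```

This reads at most 50 index tuples per given player instead of all rows for all given players, at the cost of a more verbose query.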
Note: I have not yet updated the foreign key and unique index because the started date will never change after the creation of match and player records.
Comments:

- `SELECT *`. Which columns do you actually need? Postgres version? `SELECT version();`
- Can there be multiple entries for the same `(match_id, profile_id)` in table `player`? (If so, why?)
- `WHERE p.profile_id IN (1136191, 19455781)` finds any matches for the given players. Didn't you want matches that all of the given players played together? That would be a different query, key word "relational division". See: stackoverflow.com/a/7774879/939860
- Also, I hope you see index-only scans after `VACUUM`? That's the whole point of `INCLUDE (match_id)`.
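If matches where all of the given players took part are wanted, relational division in the self-join style of the linked answer could look like this (my sketch, using the same profile_ids as above):

```sql
-- Matches that players 1136191 AND 19455781 both played in
SELECT m.*
FROM   match m
JOIN   player p1 ON p1.match_id = m.match_id AND p1.profile_id = 1136191
JOIN   player p2 ON p2.match_id = m.match_id AND p2.profile_id = 19455781
ORDER  BY m.started DESC
LIMIT  50;
```

One join per required player; for a variable-length list of players, a `GROUP BY match_id HAVING count(*) = <number of players>` variant is the usual alternative.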