Revisions to PostgreSQL - Working with array of thousands of elements

added [postgresql-performance] to 571 questions - Shog9 (Id=1924)

Link

edited Jan 7, 2020 at 22:25

Community Bot

1

Bumped by Community user

occurred Jun 24, 2019 at 5:01

Bumped by Community user

occurred Feb 24, 2019 at 5:00

added 2487 characters in body

Source Link

edited Apr 6, 2018 at 19:25

Jukurrpa

195
1
5

Update: following Evan's comments, another approach I tried: each item is part of several groups, so instead of passing group's item ids, I tried adding the group ids in mytable:

 Column | Type | Collation | Nullable | Default ---------------+-----------------------------+-----------+----------+--------- item_id | integer | | not null | allowed | boolean | | not null | start_date | timestamp without time zone | | not null | end_date | timestamp without time zone | | not null | group_ids | integer[] | | not null | ... Indexes: "idx_dtr_query" btree (item_id, start_date, allowed, end_date) "idx_dtr_group_ids" gin (group_ids) ...

New query ($1 is the targeted group id):

SELECT item_id, other_stuff, ... FROM ( SELECT -- Partitioned row number as we only want N rows per id ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY start_date) AS r, item_id, other_stuff, ... FROM mytable WHERE $1 = ANY (group_ids) AND end_date > $2 ORDER BY item_id ASC, start_date ASC, allowed ASC ) x WHERE x.r <= 12

Explain analyze:

Subquery Scan on x (cost=123356.60..137112.58 rows=131009 width=74) (actual time=811.337..1087.880 rows=172023 loops=1) Filter: (x.r <= 12) Rows Removed by Filter: 219726 -> WindowAgg (cost=123356.60..132199.73 rows=393028 width=74) (actual time=811.330..1040.121 rows=391749 loops=1) -> Sort (cost=123356.60..124339.17 rows=393028 width=74) (actual time=811.311..868.127 rows=391749 loops=1) Sort Key: item_id, start_date, allowed Sort Method: external sort Disk: 29176kB -> Seq Scan on mytable (cost=0.00..69370.90 rows=393028 width=74) (actual time=0.105..464.126 rows=391749 loops=1) Filter: ((end_date > '2018-04-06 12:00:00'::timestamp without time zone) AND (2928 = ANY (group_ids))) Rows Removed by Filter: 1482567 Planning time: 0.756 ms Execution time: 1098.348 ms

There might be room for improvement with indexes but I'm having a hard time understanding how postgres uses them, so I'm not sure what to change.

Update: following Evan's comments, another approach I tried: each item is part of several groups, so instead of passing group's item ids, I tried adding the group ids in mytable:

 Column | Type | Collation | Nullable | Default ---------------+-----------------------------+-----------+----------+--------- item_id | integer | | not null | allowed | boolean | | not null | start_date | timestamp without time zone | | not null | end_date | timestamp without time zone | | not null | group_ids | integer[] | | not null | ... Indexes: "idx_dtr_query" btree (item_id, start_date, allowed, end_date) "idx_dtr_group_ids" gin (group_ids) ...

New query ($1 is the targeted group id):

SELECT item_id, other_stuff, ... FROM ( SELECT -- Partitioned row number as we only want N rows per id ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY start_date) AS r, item_id, other_stuff, ... FROM mytable WHERE $1 = ANY (group_ids) AND end_date > $2 ORDER BY item_id ASC, start_date ASC, allowed ASC ) x WHERE x.r <= 12

Explain analyze:

Subquery Scan on x (cost=123356.60..137112.58 rows=131009 width=74) (actual time=811.337..1087.880 rows=172023 loops=1) Filter: (x.r <= 12) Rows Removed by Filter: 219726 -> WindowAgg (cost=123356.60..132199.73 rows=393028 width=74) (actual time=811.330..1040.121 rows=391749 loops=1) -> Sort (cost=123356.60..124339.17 rows=393028 width=74) (actual time=811.311..868.127 rows=391749 loops=1) Sort Key: item_id, start_date, allowed Sort Method: external sort Disk: 29176kB -> Seq Scan on mytable (cost=0.00..69370.90 rows=393028 width=74) (actual time=0.105..464.126 rows=391749 loops=1) Filter: ((end_date > '2018-04-06 12:00:00'::timestamp without time zone) AND (2928 = ANY (group_ids))) Rows Removed by Filter: 1482567 Planning time: 0.756 ms Execution time: 1098.348 ms

There might be room for improvement with indexes but I'm having a hard time understanding how postgres uses them, so I'm not sure what to change.