In this example it queries ~135k rows, this can go up to almost ~600k. As you can see 99% of the time is spend on the Bitmap heap scan. I run a db.t2.large RDS database at AWS with 100gb (SSD) of storage, which runs postgres 9.4.7
In this example it queries ~135k rows, this can go up to almost ~600k. As you can see 99% of the time is spend on the Bitmap heap scan. I run a db.t2.large RDS database at AWS with 100gb of storage, which runs postgres 9.4.7
In this example it queries ~135k rows, this can go up to almost ~600k. As you can see 99% of the time is spend on the Bitmap heap scan. I run a db.t2.large RDS database at AWS with 100gb (SSD) of storage, which runs postgres 9.4.7
Column | Type | Modifiers --------------+-----------------------------+--------------------------------------------------- id | integer | pkey project_id | integer | not null deleted | boolean | active | boolean | created_at | timestamp with time zone | Indexes: "mailings_pkey" PRIMARY KEY, btree (id) "mailing_on_project_id_where_not_deleted_and_active" btree (project_id) (WHERE deleted IS NOT true AND active IS TRUE) + individual indexes on almost all columns Column | Type | Modifiers --------------+-----------------------------+--------------------------------------------------- id | integer | pkey project_id | integer | not null deleted | boolean | active | boolean | created_at | timestamp with time zone | Indexes: "mailings_pkey" PRIMARY KEY, btree (id) "mailing_on_project_id_where_not_deleted_and_active" btree (project_id) (WHERE deleted IS NOT true) + individual indexes on almost all columns Column | Type | Modifiers --------------+-----------------------------+--------------------------------------------------- id | integer | pkey project_id | integer | not null deleted | boolean | active | boolean | created_at | timestamp with time zone | Indexes: "mailings_pkey" PRIMARY KEY, btree (id) "mailing_on_project_id_where_not_deleted_and_active" btree (project_id) (WHERE deleted IS NOT true AND active IS TRUE) + individual indexes on almost all columns Bitmap Heapscan is slow
I have 2 tables: mailings and form_responses. The form_responses table has a foreign key with the mailings table, so a mailing has one-to-many form_responses. What I want to do is calculate mailing stats: per mailing get total form_responses and when the last form_response was received. I want to return the first 20 mailings that have last received a form_response. But when it has to query 50k+ rows, the query gets really slow.
mailings (~100k rows)
Column | Type | Modifiers --------------+-----------------------------+--------------------------------------------------- id | integer | pkey project_id | integer | not null deleted | boolean | active | boolean | created_at | timestamp with time zone | Indexes: "mailings_pkey" PRIMARY KEY, btree (id) "mailing_on_project_id_where_not_deleted_and_active" btree (project_id) (WHERE deleted IS NOT true) + individual indexes on almost all columns form_responses (~6m rows)
Column | Type | Modifiers --------------+-----------------------------+--------------------------------------------------- id | integer | pkey project_id | integer | not null mailing_id | integer | deleted | boolean | is_unanswered| boolean | created_at | timestamp with time zone | Indexes: "form_responses_pkey" PRIMARY KEY, btree (id) "form_responses_on_pid_cr_time_where_not_del_not_is_unans" btree (project_id, created_time) (WHERE deleted IS NOT true AND is_uanswered IS NOT TRUE) + individual indexes on almost all columns I want to get the mailing stats of a project (id=1), within a date range. I only want the stats of mailings that are not deleted and active and only count non deleted responses.
So this gives me the following query
EXPLAIN (ANALYZE, BUFFERS) SELECT count(DISTINCT form_responses.id) AS total, max(form_responses.created_time) AS form_response_last_received, (SELECT mailings.name FROM mailings WHERE id = form_responses.mailing_id) AS name, form_responses.mailing_id AS mailing_id FROM form_responses JOIN mailings ON mailings.id = form_responses.mailing_id WHERE form_responses.deleted IS NOT true AND form_responses.is_unanswered IS NOT true AND form_responses.created_time >= '2017-08-02 00:00:00' AND form_responses.created_time <= '2018-01-29 00:00:00' AND form_responses.project_id = 1 AND mailings.deleted IS NOT true AND mailings.project_id = 1 AND mailings.active IS true GROUP BY form_responses.mailing_id ORDER BY form_response_last_received DESC LIMIT 20 OFFSET 0 This gives me the following output:
Limit (cost=308997.69..308997.74 rows=20 width=16) (actual time=7762.554..7762.565 rows=20 loops=1) Buffers: shared hit=74535 read=13513 dirtied=1 -> Sort (cost=308997.69..308998.19 rows=197 width=16) (actual time=7762.552..7762.557 rows=20 loops=1) Sort Key: (max(form_responses.created_time)) Sort Method: top-N heapsort Memory: 26kB Buffers: shared hit=74535 read=13513 dirtied=1 -> GroupAggregate (cost=307039.18..308992.45 rows=197 width=16) (actual time=7673.523..7762.278 rows=686 loops=1) Group Key: form_responses.mailing_id Buffers: shared hit=74535 read=13513 dirtied=1 -> Sort (cost=307039.18..307111.58 rows=28961 width=16) (actual time=7673.277..7702.783 rows=137542 loops=1) Sort Key: form_responses.mailing_id Sort Method: quicksort Memory: 12592kB Buffers: shared hit=71790 read=13512 dirtied=1 -> Hash Join (cost=7376.42..304892.90 rows=28961 width=16) (actual time=44.576..7609.818 rows=137542 loops=1) Hash Cond: (form_responses.mailing_id = mailings.id) Buffers: shared hit=71790 read=13512 dirtied=1 -> Bitmap Heap Scan on form_responses (cost=3815.27..299862.55 rows=134811 width=16) (actual time=41.439..7499.305 rows=137543 loops=1) Recheck Cond: ((project_id = 1) AND (created_time >= '2017-08-02 00:00:00+00'::timestamp with time zone) AND (created_time <= '2018-01-29 00:00:00+00'::timestamp with time zone) AND (deleted IS NOT TRUE) AND (is_unanswered IS NOT TRUE)) Heap Blocks: exact=84014 Buffers: shared hit=71032 read=13512 -> Bitmap Index Scan on form_responses_on_pid_cr_time_where_not_del_not_is_unans (cost=0.00..3781.57 rows=134811 width=0) (actual time=23.892..23.892 rows=137543 loops=1) Index Cond: ((project_id = 1) AND (created_time >= '2017-08-02 00:00:00+00'::timestamp with time zone) AND (created_time <= '2018-01-29 00:00:00+00'::timestamp with time zone)) Buffers: shared hit=530 -> Hash (cost=3286.82..3286.82 rows=21946 width=4) (actual time=3.093..3.093 rows=2866 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 101kB Buffers: shared hit=758 dirtied=1 -> Bitmap Heap Scan on mailings (cost=874.50..3286.82 rows=21946 width=4) (actual time=0.403..2.145 rows=2866 loops=1) Recheck Cond: ((project_id = 1) AND (deleted IS NOT TRUE) AND (active IS TRUE)) Heap Blocks: exact=732 Buffers: shared hit=758 dirtied=1 -> Bitmap Index Scan on mailing_on_project_id_where_not_deleted_and_active (cost=0.00..869.01 rows=21946 width=0) (actual time=0.315..0.315 rows=2866 loops=1) Index Cond: (project_id = 1) Buffers: shared hit=26 SubPlan 1 -> Index Scan using mailings_pkey on mailings mailings_1 (cost=0.42..8.44 rows=1 width=26) (actual time=0.003..0.003 rows=1 loops=686) Index Cond: (id = form_responses.mailing_id) Buffers: shared hit=2745 read=1 Planning time: 0.865 ms Execution time: 7764.951 ms In this example it queries ~135k rows, this can go up to almost ~600k. As you can see 99% of the time is spend on the Bitmap heap scan. I run a db.t2.large RDS database at AWS with 100gb of storage, which runs postgres 9.4.7
- Is there anyway to avoid doing the bitmap heap scan?
- Or a way to improve the speed of this?
- Can I create a covering index, so it doesn't have to read the form responses table itself?
- Should I cluster the form_responses table on project_id and/or mailing_id?
- Should I upgrade my hardware?
- Am I asking too much of my Database?