I have around 1 billion rows and 6 columns (id, col1, col2, ..., col5) in my Postgres database table, with id as the PRIMARY KEY. The database is read-only, and each query only needs to retrieve 10,000 rows based on a list of 10,000 ids. The query takes 40-60 seconds to return those 10,000 rows, which is very slow. Below is my query:

SELECT * FROM table_1b WHERE id IN (18792, 8718, 15010, 16671,.....*list of 10000 ids*....., 290224379, 192494270) 
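For illustration, the same lookup can be issued with the id list passed as a single array parameter instead of an inline IN list; the planner treats both the same way, rewriting the IN list to the "= ANY(...)" condition visible in the plan below. This is only a sketch (fetch_by_ids is just an illustrative name, and the 10,000 ids are abbreviated to three placeholder values):

-- Sketch only: the real call would pass the full list of 10,000 ids.
PREPARE fetch_by_ids (bigint[]) AS
  SELECT * FROM table_1b WHERE id = ANY ($1);

EXECUTE fetch_by_ids ('{18792, 8718, 15010}');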

EXPLAIN ANALYZE for the query:

Index Scan using table_1b_pkey on table_1b  (cost=0.57..46197.28 rows=10000 width=239) (actual time=13.305..55927.060 rows=10000 loops=1)
  Index Cond: (id = ANY ('{18792, 8718, 15010, 16671,............,290224379,192494270}'::bigint[]))
Planning Time: 13.380 ms
Execution Time: 55935.614 ms

I would like to significantly speed up retrieval of the 10,000 rows. I am fairly new to Postgres. I have read other posts and tried a few things, including increasing work_mem from 4 MB to 256 MB, but it did not help. The Postgres server is running on a machine with 384 GB of RAM.
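For context, a per-session change of that kind looks like the sketch below (the setting could equally be changed in postgresql.conf and reloaded):

SET work_mem = '256MB';
SHOW work_mem;   -- confirm the value in effect for this session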

Appreciate any help in advance.

Edit 1: EXPLAIN ANALYZE for 10 records

Index Scan using table_1b_pkey on table_1b  (cost=0.57..49.95 rows=10 width=239) (actual time=17.204..123.489 rows=10 loops=1)
Planning Time: 0.378 ms
Execution Time: 123.540 ms

Edit 2: track_io_timing = ON; EXPLAIN (ANALYZE, BUFFERS)

Index Scan using table_1b_pkey on table_1b  (cost=0.57..17243.33 rows=10000 width=239) (actual time=17.304..55371.511 rows=10000 loops=1)
  Buffers: shared hit=29891 read=20108
  I/O Timings: read=55032.422
Planning Time: 13.113 ms
Execution Time: 55380.644 ms
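For completeness, output like the above is produced roughly as follows; track_io_timing must be enabled before the query runs for the I/O Timings line to appear, and changing it normally requires superuser privileges (id list abbreviated here):

SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM table_1b WHERE id = ANY ('{18792, 8718, 15010}'::bigint[]);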

Edit 3: Updated Server Configuration

max_connections = 20
shared_buffers = 25GB
effective_cache_size = 75GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 640MB
min_wal_size = 1GB
max_wal_size = 4GB
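To confirm which of these values are actually in effect for the backend running the query (rather than just present in postgresql.conf), something like this can be used:

SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size',
               'random_page_cost', 'effective_io_concurrency');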
