
In my PostgreSQL database I have a table named "product". In this table I have a column named "date_touched" of type timestamp. I created a simple btree index on this column. This is the schema of my table (I omitted irrelevant column & index definitions):

 Table "public.product" Column | Type | Modifiers ---------------------------+--------------------------+------------------- id | integer | not null default nextval('product_id_seq'::regclass) date_touched | timestamp with time zone | not null Indexes: "product_pkey" PRIMARY KEY, btree (id) "product_date_touched_59b16cfb121e9f06_uniq" btree (date_touched) 

The table has ~300,000 rows and I want to get the n-th element from the table, ordered by "date_touched". When I want to get the 1,000th element it takes 0.2 s, but when I want to get the 100,000th element it takes about 6 s. My question is: why does it take so much time to retrieve the 100,000th element, although I've defined a btree index?

Here are my queries with EXPLAIN ANALYZE. The output shows that for the large offset PostgreSQL does not use the btree index and instead sorts all rows to find the 100,000th element:

  • first query (1,000th element):
    explain analyze SELECT product.id
    FROM product
    ORDER BY product.date_touched ASC
    LIMIT 1 OFFSET 1000;

                                              QUERY PLAN
    -----------------------------------------------------------------------------------------------------
     Limit  (cost=3035.26..3038.29 rows=1 width=12) (actual time=160.208..160.209 rows=1 loops=1)
       ->  Index Scan using product_date_touched_59b16cfb121e9f06_uniq on product
           (cost=0.42..1000880.59 rows=329797 width=12) (actual time=16.651..159.766 rows=1001 loops=1)
     Total runtime: 160.395 ms
  • second query (100,000th element):
    explain analyze SELECT product.id
    FROM product
    ORDER BY product.date_touched ASC
    LIMIT 1 OFFSET 100000;

                                              QUERY PLAN
    ------------------------------------------------------------------------------------------------------
     Limit  (cost=106392.87..106392.88 rows=1 width=12) (actual time=6621.947..6621.950 rows=1 loops=1)
       ->  Sort  (cost=106142.87..106967.37 rows=329797 width=12) (actual time=6381.174..6568.802 rows=100001 loops=1)
             Sort Key: date_touched
             Sort Method: external merge  Disk: 8376kB
             ->  Seq Scan on product  (cost=0.00..64637.97 rows=329797 width=12) (actual time=1.357..4184.115 rows=329613 loops=1)
     Total runtime: 6629.903 ms
1 Comment

  • Looks like a similar problem to the one mentioned here? Check your Postgres and machine configuration. postgresql.org/message-id/… Commented Jan 1, 2016 at 8:33

1 Answer


It is actually a very good thing that a Seq Scan is used here. Your OFFSET 100000 is a bad fit for an Index Scan.

A bit of theory

B-tree indexes contain two structures inside:

  1. a balanced tree, and
  2. a doubly linked list of keys.

The first structure allows fast key lookups; the second is responsible for the ordering. For bigger tables, the linked list cannot fit into a single page, so it becomes a chain of linked pages, where each page's entries maintain the ordering specified during index creation.

It is wrong to think, though, that such pages sit next to each other on disk. In fact, it is more likely that they are spread across different locations, so reading pages in the index's order forces the system to perform random disk reads. Random disk I/O is expensive compared to sequential access, and therefore a good optimizer will prefer a Seq Scan instead.
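If you want to see these structures for yourself, PostgreSQL's contrib extension pageinspect can dump btree pages. A minimal sketch, assuming the contrib modules are installed and you have sufficient privileges (block 0 of a btree index is the metapage, so block 1 is usually the first leaf):

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    -- Index-wide metadata: root block, number of tree levels, etc.
    SELECT * FROM bt_metap('product_date_touched_59b16cfb121e9f06_uniq');

    -- Per-page stats: btpo_prev and btpo_next are the neighbouring
    -- blocks in the doubly linked list of leaf pages
    SELECT btpo_prev, btpo_next, live_items
      FROM bt_page_stats('product_date_touched_59b16cfb121e9f06_uniq', 1);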

I highly recommend the book “SQL Performance Explained” for a better understanding of indexes. It is also available on-line.

What is going on?

Your OFFSET clause causes the database to read the index's linked list of keys (incurring lots of random disk reads) and then discard all of those rows until it reaches the wanted offset. It is good, in fact, that Postgres decided to use a Seq Scan + Sort here; this should be faster.

You can check this assumption as follows (a full session sketch follows the list):

  • running EXPLAIN (analyze, buffers) of your big-OFFSET query
  • then do SET enable_seqscan TO 'off';
  • and run EXPLAIN (analyze, buffers) again, comparing the results.
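A minimal sketch of that session, using the big-OFFSET query from the question. Note that enable_seqscan does not hard-disable sequential scans, it only makes the planner avoid them, and the setting affects the current session only:

    -- 1. Plan chosen by default (expected here: Seq Scan + Sort)
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT product.id FROM product
    ORDER BY product.date_touched ASC
    LIMIT 1 OFFSET 100000;

    -- 2. Make the planner avoid sequential scans in this session
    SET enable_seqscan TO 'off';

    -- 3. The same query again, now via the index; compare the timings
    --    and the "Buffers:" lines of the two plans
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT product.id FROM product
    ORDER BY product.date_touched ASC
    LIMIT 1 OFFSET 100000;

    -- 4. Restore the default planner behaviour
    RESET enable_seqscan;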

In general, it is better to avoid OFFSET, as DBMSes do not always pick the right approach here. (BTW, which version of PostgreSQL are you using?) Here's a comparison of how it performs for different offset values.


EDIT: In order to avoid OFFSET, one has to base pagination on the real data that exists in the table and is part of the index. For this particular case, the following is possible:

  • show the first N (say, 20) elements;
  • include the maximal date_touched shown on the page in all the “Next” links. You can compute this value on the application side. Do the same for the “Previous” links, except with the minimal date_touched;
  • on the server side you receive this limiting value, so for the “Next” case you can do a query like this:
    SELECT id
      FROM product
     WHERE date_touched > $max_date_seen_on_the_page
     ORDER BY date_touched ASC
     LIMIT 20;

This query makes best use of the index.
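A corresponding sketch for the “Previous” direction, under the same assumptions ($min_date_seen_on_the_page is an illustrative placeholder, analogous to $max_date_seen_on_the_page above): scan backwards from the smallest visible date_touched, then restore ascending order in an outer query:

    SELECT id, date_touched
      FROM (SELECT id, date_touched
              FROM product
             WHERE date_touched < $min_date_seen_on_the_page
             ORDER BY date_touched DESC
             LIMIT 20) AS prev_page
     ORDER BY date_touched ASC;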

Of course, you can adjust this example to your needs. I used pagination, as it is a typical use case for OFFSET.

One more note: querying 1 row many times, increasing the offset by 1 each time, will be much more time consuming than a single batch query that returns all those records, which are then iterated over on the application side.
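To illustrate the difference, a small sketch (the anti-pattern is shown in comments; the batch size of 100 is arbitrary):

    -- Anti-pattern: one round trip per row, each query re-reading
    -- and discarding an ever-growing prefix of rows:
    --   SELECT id FROM product ORDER BY date_touched LIMIT 1 OFFSET 0;
    --   SELECT id FROM product ORDER BY date_touched LIMIT 1 OFFSET 1;
    --   ...

    -- Better: fetch the whole batch once and iterate client-side
    SELECT id, date_touched
      FROM product
     ORDER BY date_touched ASC
     LIMIT 100;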


2 Comments

So how can I avoid OFFSET, as you said? Is there an alternative query? Also, why does it not use the balanced tree to find the n-th element? I know this kind of data structure can be used both for key lookups and for finding the n-th element (if it stores the number of children in each node of the tree). Is there such a kind of index in PostgreSQL that can find the n-th element in logarithmic time instead of with a linear scan? (Thanks for your complete answer!)
Thanks for your example. Do you know if there is any index for PostgreSQL that answers the n-th-element query using a balanced tree in logarithmic time (the same as a key lookup query)?
