132

Let's say I have a table with fields A and B. I make regular queries on A+B, so I created a composite index on (A,B). Would queries on only A also be fully optimized by the composite index?

Additionally, I created an index on A, but Postgres still uses the composite index for queries on only A. If the previous answer is positive, I guess it doesn't really matter, but why does it select the composite index by default, if the single A index is available?

4
  • I tried to set up a small test for this. In my case, however, the two-column index was used only when I dropped the single-column one, unrelated to which one was created first. It is interesting that if I created the two-column index first, the initial plan used a bitmap heap scan. If I created the one-column index, then run the query (used index scan) and dropped the newly created index, the plan involving the two-column index switched to index scan. See the steps on SQLFiddle Commented Oct 23, 2012 at 19:38
  • @dezso Interesting. Where are the costs for each query? Commented Oct 23, 2012 at 20:05
  • Bitmap index scan cost: 107.98, 43 ms execution time. Index scan one-column: cost 8.69, two-column: 43.69. Execution times do not differ significantly (the fluctuation is bigger than the difference between the two). Commented Oct 23, 2012 at 20:10
  • @Luciano Can you show the explain analyze and the query text? Commented Oct 23, 2012 at 22:32

3 Answers 3

136

All of this applies to (default) B-tree indices, not other types.

A composite index on (a,b) supports every query that an index on just (a) would. The same is not true for (b,a) vs. (a). Related:

So an additional index on just (a) is redundant. Only the matter of index size remains.

Space is allocated in multiples of MAXALIGN, which is typically 8 bytes on a 64-bit OS or (much less common) 4 bytes on a 32-bit OS. If you are not sure, check pg_controldata. It also depends on data types of indexed columns (some require alignment padding) and actual content.

An index on, say, two integer columns (4 bytes each) typically ends up to be exactly as big as an index on just one, where another 4 bytes are lost to alignment padding.

Apart from that, there is really no downside for the query planner to use an index on (a,b) - compared to an index on just (a). And it is generally preferable for multiple queries to use the same index. The chance for it (or parts of it) to reside in (fast) cache grows when shared.

Since Postgres 13, index deduplication changes the picture gradually. Repeated index values can be compressed. An index on (a,b) is often less duplicative, so it gains less from deduplication. If a is a unique column, deduplication doesn't change much. (There can still be duplicative entries due to the MVCC model, only one visible to any transaction at the same time, so adding b can still have a minor effect.) There is a lot of fine print to when and how deduplication actually takes effect.

Coming from the opposite direction, when you need an additional index on (a,b), consider dropping an existing index on just (a). Often not possible for a PRIMARY KEY or UNIQUE constraint. Since Postgres 11 you might just append b to the constraint definition with an INCLUDE clause instead.

Or create the new index on (b,a) instead to cover queries on just b additionally. For only equality conditions the order of index expressions does not matter. It does, though, when involving range conditions. See:

There are potential downsides to including additional columns in indexes, even if that only uses space otherwise lost to alignment padding:

  • Whenever the additional column is updated, the index now needs an update, too, which might add cost to write operations and create more index bloat.

  • HOT updates ("Heap Only Tuple") on the table are not possible while any index column is involved. See:

How to measure object sizes:

2
  • 2
    Could you extend this to say that, If I have an Index on column A, and a need arises to add a compound index (A,B), the index A should be dropped? If reusing an index improves cache efficiency, and (A,B) fully optimizes, A, then it seems like an additional index on A would waste space and potentially slow things down Commented Apr 2, 2019 at 17:05
  • 1
    @jvans: Generally true - with notable exceptions and alternatives. I added a paragraph to address that. Commented Apr 2, 2019 at 17:32
4

According to your question you have a table with field A and B. If you your query is:

SELECT * FROM [YOUR TBL] WHERE A='XXXX' 

Optimizer will chose the Composite index to avoid Extract random access!

0
-4

It is in the case if you just use just first in the predicate.

It will do scan if you use first columns of composite key and non-key column of composite key.

To trick it you can just dummy predicates like this and then non-key column:

[A,B] is your index, [C] - another column

To utilize index you write as:

SELECT A,B,C,D,E FROM test WHERE A=1 AND B=B AND C=3 

...why does it select the composite index by default, if the single A index is available?

It will use index only in the case if there are one or two predicates [A] Or [A],[B]. It will not use it in the order [B],[A] or [A],[C]. To be able to utilize index with additional column [C], you need to enforce index by ordering predicates as [A],[B] and [C].

2
  • 2
    What exactly do you achieve with B=B? I think you achieve nothing so I'm voting down absent any evidence this isn't just ignored by the optimizer Commented Oct 24, 2012 at 9:57
  • 2
    B=B is effectively the same as B IS NOT NULL, which seems uncalled for. Certainly not needed to use an index on (a,b). Commented May 27, 2014 at 15:05

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.