0

Is there a way to use both an index for join and a separate index for a different where clause? Whenever I add the join query- it causes the where index to be ignored.

I have Users table (80,000 records)

id company_id
1 4
2 4
3 5

with btree index on company_id: PRIMARY KEY, btree (id) Index btree on company_id

Comments table (7,000,000 records)

id user_id
11 2
22 2
33 3

with btree index on user_id:

  • PRIMARY KEY, btree (id)
  • Index btree on user_id

If I use a simple where - it uses the company index:

 SELECT users.id FROM users WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333) > Index Scan using index_users_on_company_id on users (cost=0.29..213.88 rows=97 width=4) Index Cond: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[])) 

But when I add, although it uses user_id index at comments, it stopped using the company_id index (Parallel Seq Scan instead of previously used Index Scan):

SELECT users.id FROM users INNER JOIN comments ON comments.user_id = users.id WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333) Gather (cost=1000.43..9632.88 rows=1410 width=4) Workers Planned: 1 -> Nested Loop (cost=0.43..8491.88 rows=829 width=4) -> Parallel Seq Scan on users (cost=0.00..2742.06 rows=57 width=4) Filter: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[])) -> Index Only Scan using index_comments_on_user_id on comments (cost=0.43..98.21 rows=266 width=4) Index Cond: (user_id = users.id) 

To summarize:

Is there a way to use both the company_id index and the user_id index in a single query?

Basically substitute the Parallel Seq Scan on users with Index Scan using index_users_on_company_id on users at the second query?

UPDATE:

After setting max_parallel_workers_per_gather to 0, as @Anton Grig suggested, The same query, uses index instead of Parallel Seq Scan It does performs faster. I wonder if there's a less intrusive way..

Nested Loop (cost=0.72..9998.67 rows=1410 width=4) -> Index Scan using index_users_on_company_id on users (cost=0.29..213.88 rows=97 width=4) Index Cond: (company_id = ANY ('{16111,16198,16113,16133,16135,16134,16136,16137,16138,16139,16141,16142,16143,16144,16145,16146,16147,16333}'::integer[])) -> Index Only Scan using index_comments_on_user_id on comments (cost=0.43..98.21 rows=266 width=4) Index Cond: (user_id = users.id) 

UPDATE2:

I'll add the full query (uses left outer join) I'm running, not a minified version for question

SELECT COUNT(DISTINCT comments.tag_id) FILTER (WHERE comments.archived = FALSE ) AS tags_count FROM users LEFT OUTER JOIN comments ON comments.user_id = users.id WHERE users.company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333) GROUP BY users.id 
3
  • 1
    Unfortunately, in my test example, the Index Cond was used in both cases. I would try to disable the parallel query by using SET max_parallel_workers_per_gather = 0, this should prompt the engine to use the Index Cond instead and don't forget to restore the default value for the Set after. Commented Dec 22, 2021 at 13:23
  • @AntonGrig updated answer, Like you said it uses index when I'm setting max_parallel_workers_per_gather to 0 Commented Dec 22, 2021 at 15:24
  • if users->comments is 1->N (what I assume) the query makes no sense anyway. Commented Dec 22, 2021 at 16:15

1 Answer 1

1

If using max_parallel_workers_per_gather seems intrusive, try changing the query while maintaining logic. I would try something like this:

SELECT user_id FROM comments WHERE user_id In (SELECT id FROM users WHERE company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333)) 

In my case, it still uses Index Cond and is executed in about the same time.

I put it in the answer block to ease the perception of the query.

UPDATE:

If the selectivity of column user_id in table comments is low, I would suggest using Exist operator as follows:

Select id From users Where Exists (Select user_id From comments Where user_id=users.id) And company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333) 

Note: It also removes duplicates from the output.

UPDATE2:

I would create another index in table User instead of the previous one to provide Index Scan only -> Index Cond.

CREATE INDEX idx ON Users(company_id, id); 

and I would change the query as follows:

Select id, (Select COUNT(DISTINCT comments.tag_id) From comments Where comments.archived = FALSE And user_id=users.id) As tags_count From users Where company_id IN (16111, 16198, 16113, 16133, 16135, 16134, 16136, 16137, 16138, 16139, 16141, 16142, 16143, 16144, 16145, 16146, 16147, 16333) 
Sign up to request clarification or add additional context in comments.

1 Comment

Added full query, it uses left outer join, I'll try to apply what you wrote

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.