How to optimize multiply join query?

Question

I've got a query like that

SELECT DISTINCT p.id FROM person p INNER JOIN person_func pf1 ON p.id = pf1.person_id INNER JOIN Func f1 ON f1.id = pf1.func_id LEFT JOIN person_location pf2 ON p.id = pf2.person_id LEFT JOIN Location f2 ON f2.id = pf2.location_id AND f2.val='1' LEFT JOIN person_location pf3 ON p.id = pf3.person_id LEFT JOIN Location f3 ON f3.id = pf3.location_id and f3.val='3' WHERE f2.val IS NOT NULL OR f3.val IS NOT NULL;

And in common there are 9-10 joins like these. And it runs very-very slow. I've added indexes on person_func(person_id), person_location(person_id) but it didn't help. What can I do to optimise it?

An example - SQLFiddle Example

If columns val can consist only numeric values, change column data type to INT or SMALLINT, and cover this columns with indexes, because it's common that WHERE clause is covered with indexes. Also you need to create indexes for all foreign keys that appear in JOINs (PERSON(func_id), PERSON(location_id)) — veljasije
– veljasije, Commented Mar 26, 2013 at 7:51
no, it contains varchar(20) data, that's just for simplicity — zobber
– zobber, Commented Mar 26, 2013 at 7:52
ok, whatever, you need to cover it with indexes, because it appears in WHERE clause — veljasije
– veljasije, Commented Mar 26, 2013 at 7:54

Community · Accepted Answer · 2020-06-20 09:12:55Z

If more than ~ 5% of the table have val IS NOT NULL (which is very likely the case) an index on val will be useless for the WHERE clause.

It might help with the JOIN clause though, where you join on a particular value. However, even better: a multi-column index:

CREATE INDEX some_name_idx ON Location (location_id, val)

Depending on the specific situation (in particular, if val = 1 and val = 3 are rare) , partial indexes may serve you even better:

CREATE INDEX some_name1_idx ON Location (location_id) WHERE val = 1 CREATE INDEX some_name3_idx ON Location (location_id) WHERE val = 3

Other than that you need an index on every column that is used in a JOIN or WHERE condition, if the values used are not common-place (less than ~ 5% of the table) - which is true, normally.

All the usual advice for performance optimization applies.

And if you need more specific advice, you need to post a lot mor information. Read the tag wiki of postgresql-performance.

Since you are joining so many tables, you may surpass some limit where Postgres just can't find the best plan any more (too many possible variations). It may become important to write the most selective JOINs first. Generally JOIN comes before LEFT JOIN in this case. Read more in the manual here.

`CROSS JOIN` by proxy

So you have like 10 LEFT JOINs. Example: If half of those have 3 matches, you multiply the row count by 3^5 = 243. Or if all of them have 5 matches, you multiply by 5^10 = 9765625. This has to result in terrible performance. And all for nothing, since you only want DISTINCT id in the end.

The icing on the cake: to get DISTINCT person.id, all those LEFT JOINs are 100 % useless. They don't change a thing. Just remove them all.

As for JOIN: Replace those with EXISTS to avoid multiplying your rows. Like:

EXISTS (SELECT 1 FROM Func f1 WHERE f1.id = pf1.func_id)

Location and so on joint tables are very small - 10-11 entries. These are the properties for person (e.g. Person is connected via One-To-Many for them and I am querying Person that must have some criteria (which I join by INNER JOIN) and NICE TO have some critreia (which I join using LEFT JOIN and putting a DISTINCT to p.Id). Then I am just counting the score using rows (the number of not nulls in NICE to have).
@zobber: For tiny tables with 10 entries, indexes are completely useless.

Community · Accepted Answer · 2020-06-20 09:12:55Z

The WHERE clause actually seems to be redundant. You've already joined based on the values being specific values, then your WHERE clause verifies those values are NOT NULL, so all that does is verify that the join occurred. So, if you are only taking the rows where joins occurred, you can just use an INNER JOIN rather than a LEFT JOIN. ==> f2.val will only be null if the LEFT JOIN did not occur.

LEFT JOIN Location f2 ON f2.id = pf2.location_id AND f2.val='1'

WHERE f2.val IS NOT NULL

So, you could probably just try an inner join in this case.

You could also identify the ID's that correspond to the values you want and join on those instead. Those would be primary keys and could be much more efficient.

Identifying The Real Issue

I would also suggest that you take out each join one by one and rerun the query, noting the time reduction each time. This will hopefully show you which join or where clause element is causing the issue.

Run the query as it is.
Remove first where clause expression.
Remove second where clause expression.
Remove one join
Remove another join...

Collectives™ on Stack Overflow

How to optimize multiply join query?

2 Answers 2

`CROSS JOIN` by proxy

2 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

CROSS JOIN by proxy

2 Comments

Comments

Related

`CROSS JOIN` by proxy