2

Looking at the following example:

SELECT * FROM customers WHERE EXISTS (SELECT * FROM order_details WHERE customers.customer_id = order_details.customer_id) ; 

How does this differ from an equivalent inner-join query between the two tables that retrieves the same result set?

I'm concerned about the technical/performance aspect, not the readability/maintainability of the code.

2
  • The JOIN may produce duplicates. Commented Dec 12, 2018 at 10:17
  • In addition to the previous comment, a join with SELECT * will return all the columns from both tables, whereas your query returns only the columns from customers. Commented Dec 12, 2018 at 10:21

3 Answers

4

With an EXISTS clause you select all customers for which at least one order_details record exists.

SELECT * FROM customers c WHERE EXISTS (SELECT * FROM order_details od WHERE od.customer_id = c.customer_id); 

With a join you'd select those same customers again. However, you'd select each customer once per matching order_details row, i.e. you'd get many duplicates.

SELECT c.* FROM customers c JOIN order_details od ON c.customer_id = od.customer_id; 

You can remove the duplicates from your results with DISTINCT so as to get each customer only once again:

SELECT DISTINCT c.* FROM customers c JOIN order_details od ON c.customer_id = od.customer_id; 

But why generate all the duplicates only to have to remove them again? Don't do this. Only join when you really want joined results.
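To make the duplicate issue concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table contents (Ann, Bob, Cy and their orders) are made-up sample data, not from the question, and other database engines may of course behave differently in terms of performance, though the row counts should be the same.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE order_details (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann has two orders, Bob has one, Cy has none.
    INSERT INTO order_details VALUES (10, 1), (11, 1), (12, 2);
""")

exists_rows = conn.execute("""
    SELECT c.* FROM customers c
    WHERE EXISTS (SELECT * FROM order_details od
                  WHERE od.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()

join_rows = conn.execute("""
    SELECT c.* FROM customers c
    JOIN order_details od ON c.customer_id = od.customer_id
    ORDER BY c.customer_id
""").fetchall()

distinct_rows = conn.execute("""
    SELECT DISTINCT c.* FROM customers c
    JOIN order_details od ON c.customer_id = od.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(exists_rows)    # [(1, 'Ann'), (2, 'Bob')]
print(join_rows)      # [(1, 'Ann'), (1, 'Ann'), (2, 'Bob')]
print(distinct_rows)  # [(1, 'Ann'), (2, 'Bob')]
```

The plain join returns Ann twice because she has two order rows; EXISTS and the DISTINCT join agree with each other.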

Another option, which I consider even more readable than an EXISTS clause, is the IN clause. This would be my way of writing the query:

SELECT * FROM customers WHERE customer_id IN (SELECT customer_id FROM order_details); 

1 Comment

I think EXISTS is more readable and more easily optimized. Plus, it discourages use of NOT IN with a subquery.
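The NOT IN concern is presumably about NULLs: if the subquery returns even one NULL, NOT IN matches no rows at all, while NOT EXISTS behaves as most people expect. A small sketch with Python's sqlite3 module, using made-up sample data in which one order_details row has a NULL customer_id:

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE order_details (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    -- One order belongs to customer 1, one has no customer recorded (NULL).
    INSERT INTO order_details VALUES (10, 1), (11, NULL);
""")

# "x NOT IN (1, NULL)" evaluates to NULL (unknown) for x = 2 or 3,
# so the WHERE clause filters out every row.
not_in_rows = conn.execute("""
    SELECT customer_id FROM customers
    WHERE customer_id NOT IN (SELECT customer_id FROM order_details)
    ORDER BY customer_id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL row is harmless.
not_exists_rows = conn.execute("""
    SELECT c.customer_id FROM customers c
    WHERE NOT EXISTS (SELECT * FROM order_details od
                      WHERE od.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```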
2

EXISTS() is what is called a "semi-join". It starts a JOIN, but for each outer row it stops as soon as it finds the first match. For this reason, EXISTS is usually at least as fast as an equivalent JOIN, and often faster when a customer has many matching rows.

Also, EXISTS( SELECT * ... WHERE ... ) does not really care about the *. It will use whatever index is optimal to discover the presence or absence of rows matching the WHERE, then it returns 1 or 0 (meaning "true" or "false").
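A quick way to check that the select list inside EXISTS does not matter, sketched with Python's sqlite3 module. The helper name has_orders and the sample rows are hypothetical, and this only demonstrates the result, not which index an engine would pick.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE INDEX idx_od_customer ON order_details (customer_id);
    INSERT INTO order_details VALUES (10, 1), (11, 1), (12, 2);
""")

def has_orders(select_list, customer_id):
    """Return the raw value of EXISTS(...) for the given select list."""
    sql = (
        f"SELECT EXISTS (SELECT {select_list} FROM order_details "
        "WHERE customer_id = ?)"
    )
    return conn.execute(sql, (customer_id,)).fetchone()[0]

# Customer 1 has orders, customer 3 has none; the select list changes nothing.
results = {
    select_list: (has_orders(select_list, 1), has_orders(select_list, 3))
    for select_list in ("*", "1", "NULL")
}
print(results)  # {'*': (1, 0), '1': (1, 0), 'NULL': (1, 0)}
```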

Of course, if a LEFT JOIN is going to return 0 or 1 row per customer, never more, then there is not much performance difference, except that the LEFT JOIN also returns values from the joined table.


1

Logically, the EXISTS query works as follows:

for x in (select * from customers) loop
  -- check whether x.customer_id exists in the order_details table
  -- if yes
  --   output the customers row
  -- else
  --   ignore
  -- end if;
end loop;

So for the EXISTS query the plan would generally use a nested loop (not a hard-and-fast rule, though).

The JOIN query does the logical equivalent of the following:

for x in (select * from customers) loop
  -- for each row in customers,
  -- fetch the records from order_details that match this condition
  select * from order_details where customer_id = x.customer_id
end loop;
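The two loops above can be sketched in plain Python to show the logical difference. This is only an illustration of the semantics, not of how any particular engine executes the queries; the function names and sample rows are made up.

```python
# Hypothetical sample data: customer 1 has two orders, 2 has one, 3 has none.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 3, "name": "Cy"},
]
order_details = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]

def exists_style(customers, order_details):
    """Semi-join: emit each customer at most once."""
    result = []
    for c in customers:
        # any() stops scanning order_details at the first match.
        if any(od["customer_id"] == c["customer_id"] for od in order_details):
            result.append(c)
    return result

def join_style(customers, order_details):
    """Inner join: emit the customer once per matching order_details row."""
    result = []
    for c in customers:
        for od in order_details:
            if od["customer_id"] == c["customer_id"]:
                result.append(c)  # customer columns only, like SELECT c.*
    return result

exists_ids = [c["customer_id"] for c in exists_style(customers, order_details)]
join_ids = [c["customer_id"] for c in join_style(customers, order_details)]
print(exists_ids)  # [1, 2]
print(join_ids)    # [1, 1, 2]
```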

1 Comment

If your statistics are up to date for customers and order_details, I wouldn't necessarily worry about the performance of EXISTS versus JOIN. SQL is a declarative language; working out "how" to get the data is best left to the optimizer. As users of SQL, we should focus on whether a query is logically and semantically well formed.
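If you want to see what the optimizer actually decides, most engines can show the plan. Below is a rough sketch for SQLite via Python's sqlite3 module, using ANALYZE to refresh statistics and EXPLAIN QUERY PLAN to print the plan. The sample data and the index name are assumptions, the plan text varies between SQLite versions, and other engines have their own commands (for example EXPLAIN in MySQL and PostgreSQL).

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE order_details (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE INDEX idx_od_customer ON order_details (customer_id);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO order_details VALUES (10, 1), (11, 1), (12, 2);
    ANALYZE;  -- refresh the statistics the planner uses
""")

queries = {
    "exists": """
        SELECT c.* FROM customers c
        WHERE EXISTS (SELECT * FROM order_details od
                      WHERE od.customer_id = c.customer_id)
    """,
    "join_distinct": """
        SELECT DISTINCT c.* FROM customers c
        JOIN order_details od ON c.customer_id = od.customer_id
    """,
}

plans = {}
for label, sql in queries.items():
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    plans[label] = [row[-1] for row in rows]
    print(label)
    for step in plans[label]:
        print("   ", step)
```

Comparing the two plans on your own data and engine tells you more than any general rule about EXISTS versus JOIN.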
