Can I get better performance using a JOIN or using EXISTS?

Question

I have two tables, Institutions and Results, and I want to see whether there are any results for each institution so that I can exclude the ones that don't have results.

Thank you,
-Nimesh

It's not a real question because it's extremely generic. He needs to specify at least a little more context.
– Onorio Catenacci, Commented Oct 22, 2008 at 18:45
I added the subjective tag because of the extremely generic nature of the question.
– Onorio Catenacci, Commented Oct 22, 2008 at 18:45
@NImesh--in this particular case, details are your friend. The more details you can provide, the more likely you are to get a useful, constructive answer. Which flavor of SQL (i.e. Oracle or MS SQL Server)? What's the structure of the tables you're querying? Etc.
– Onorio Catenacci, Commented Oct 22, 2008 at 18:47
I hope my edit is reasonable. Feel free to roll it back if that's not in the spirit of your original question.
– Mark Biek, Commented Oct 22, 2008 at 18:47
You should show the full SQL query using each method so we can get a better idea of what you're trying to do.
– Bill the Lizard, Commented Oct 22, 2008 at 18:48

Keith · Accepted Answer · 2008-10-22 19:39:56Z

Depending on the statement, statistics and DB server it may make no difference - the same optimised query plan may be produced.

There are basically 3 ways that DBs join tables under the hood:

Nested loop - for one table much bigger than the second. Every row in the smaller table is checked for every row in the larger.
Merge - for two tables in the same sort order. Both are run through in order and matched up where they correspond.
Hash - everything else. A temporary hash table is built from one input and probed with the other to find the matches.

By using exists you may effectively force the query plan to do a nested loop. This may be the quickest way, but really you want the query planner to decide.

I would say that you need to write both SQL statements and compare the query plans. You may find that they change quite a bit depending on what data you have.

For instance if [Institutions] and [Results] are similar sizes and both are clustered on InstitutionID a merge join would be quickest. If [Results] is much bigger than [Institutions] a nested loop may be quicker.
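
As a rough illustration of "write both statements and compare the query plans", here is a minimal sketch using Python's built-in sqlite3 module. The schema, the index and the sample rows are assumptions (the question never shows the real tables), and EXPLAIN QUERY PLAN is SQLite-specific; on Oracle or SQL Server you would use that server's own plan viewer instead. The plan text varies by engine and version, so treat the printed steps as something to read, not as a fixed result.

```python
import sqlite3

# Hypothetical schema and data: the question does not show the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Institutions (institution_id INTEGER PRIMARY KEY, institution_name TEXT);
    CREATE TABLE Results (result_id INTEGER PRIMARY KEY, institution_id INTEGER, score INTEGER);
    CREATE INDEX idx_results_institution ON Results (institution_id);
    INSERT INTO Institutions VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Results VALUES (10, 1, 90), (11, 1, 75), (12, 3, 60);
""")

exists_sql = """
    SELECT i.institution_id
    FROM Institutions i
    WHERE EXISTS (SELECT 1 FROM Results r WHERE r.institution_id = i.institution_id)
    ORDER BY i.institution_id
"""
join_sql = """
    SELECT DISTINCT i.institution_id
    FROM Institutions i
    INNER JOIN Results r ON r.institution_id = i.institution_id
    ORDER BY i.institution_id
"""

def query_plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Both statements should return the same institutions; only the plans may differ.
exists_rows = conn.execute(exists_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
exists_plan = query_plan(exists_sql)
join_plan = query_plan(join_sql)

for label, plan in (("EXISTS", exists_plan), ("JOIN", join_plan)):
    print(label)
    for step in plan:
        print("  ", step)
```

With only three rows the plans say little about speed; the point is the workflow. Rerun the comparison against realistic data volumes and statistics before drawing conclusions.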

VVS · Accepted Answer · 2009-08-12 08:10:58Z

It depends.

Ultimately the 2 serve entirely different purposes.

You JOIN 2 tables to access related records. If you don't need to access the data in the related records then you have no need to join them.

EXISTS can be used to determine if a token exists in a given dataset but won't allow you to access the related records.

Post an example of the 2 methods you have in mind and I might be able to give you a better idea.

With your two tables Institutions and Results if you want a list of institutions that have results, this query will be most efficient:

select Institutions.institution_name
from Institutions
inner join Results on (Institutions.institution_id = Results.institution_id)

If you have an institution_id and just want to know if it has results, using EXISTS might be faster:

if exists (select 1 from Results where institution_id = 2)
    print 'institution_id 2 has results'
else
    print 'institution_id 2 does not have results'

While your explanations are fine, "might be faster" is a rather thin answer to the question of which option is faster.
You said "You JOIN 2 tables to access related records. If you don't need to access the data in the related records then you have no need to join them." But if joining is faster than using EXISTS, then that IS a reason to use JOIN. It just begs the original question.

Kon · Accepted Answer · 2008-10-22 18:46:43Z

I'd say a JOIN is slower, because your query execution stops as soon as an EXISTS call finds something, while a JOIN will continue until the very end.

EDIT: But it depends on the query. This is something that should be judged on a case-by-case basis.

3 Comments

Bob Probst Over a year ago

In his case he's looking for an absence of data so it will still need to scan the entire Results table (or, hopefully, index)

Kon Over a year ago

True. I posted that before he clarified.. I might as well delete this answer.. not sure if I should. :)

Oscar Bravo Over a year ago

Glad you didn't delete it - it provided a useful insight! (that EXISTS short-circuits...)

Kon · Accepted Answer · 2008-10-22 18:50:54Z

Whether there's a performance difference or not, you need to use what's more appropriate for your purpose. Your purpose is to get a list of Institutions (not Results - you don't need that extra data). So select Institutions that have no Results... translation - use EXISTS.

JosephStyons · Accepted Answer · 2008-10-22 19:36:34Z

It depends on your optimizer. I tried the below two in Oracle 10g and 11g. In 10g, the second one was slightly faster. In 11g, they were identical.

However, #1 is really a misuse of the EXISTS clause. Use joins to find matches.

-- #1
select *
from table_one t1
where exists (
    select *
    from table_two t2
    where t2.id_field = t1.id_field
)
order by t1.id_field desc

-- #2
select t1.*
from table_one t1, table_two t2
where t1.id_field = t2.id_field
order by t1.id_field desc

5 Comments

tanovellino Over a year ago

Maybe if the sub-query in the 1st option returned only t2.id_field instead of *, it could make up the time. Sometimes the size of the returned data also affects performance, especially in a case like this where the difference is only slight. Good luck and thanks!!!

ARLibertarian Over a year ago

Style of the 2nd query has been deprecated. Don't do that. And don't use splats. But do use schema.

Stevoisiak May 5 at 19:34

Why would #1 be considered a misuse of EXISTS? Seems like the exact scenario you would want to use EXISTS for.

JosephStyons May 6 at 14:58

@Stevoisiak an EXISTS statement might be what you want there. A JOIN will give you all the matches in the joined table. An EXISTS will only tell you if there is a match (not how many). So if that's what you want to know, then EXISTS is correct; but the original question was about JOIN vs EXISTS. If you are looking for an equivalent to JOIN, then EXISTS is not the right answer.

JosephStyons May 6 at 14:59

@ARLibertarian; you are correct of course. I'll leave the answer as it is for posterity, but I posted it in 2008, when that style was still considered acceptable.

Tom H · Accepted Answer · 2008-10-22 20:13:52Z

A LEFT OUTER JOIN will tend to perform better than a NOT EXISTS**, but in your case you want to do EXISTS and using a simple INNER JOIN doesn't exactly replicate the EXISTS behavior. If you have multiple Results for an Institution, doing the INNER JOIN will return multiple rows for that institution. You could get around that by using DISTINCT, but then the EXISTS will probably be better for performance anyway.

** For those not familiar with this method:

SELECT T1.MyTableID
FROM dbo.MyTable T1
LEFT OUTER JOIN dbo.MyOtherTable T2 ON T2.MyTableID = T1.MyTableID
WHERE T2.MyOtherTableID IS NULL

is equivalent to

SELECT T1.MyTableID
FROM dbo.MyTable T1
WHERE NOT EXISTS (SELECT * FROM dbo.MyOtherTable T2 WHERE T2.MyTableID = T1.MyTableID)

assuming that MyOtherTableID is a NOT NULL column. The first method generally performs faster than the NOT EXISTS method though.
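
To check the claimed equivalence on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the answer; the dbo schema prefix is dropped because SQLite has no such schema, and the column types and sample rows are assumptions. It only shows that both forms return the same rows. It says nothing about which one is faster on a given server.

```python
import sqlite3

# Hypothetical data; MyOtherTableID is a primary key, so it is never NULL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (MyTableID INTEGER PRIMARY KEY);
    CREATE TABLE MyOtherTable (
        MyOtherTableID INTEGER PRIMARY KEY,
        MyTableID INTEGER NOT NULL
    );
    INSERT INTO MyTable VALUES (1), (2), (3), (4);
    INSERT INTO MyOtherTable VALUES (100, 1), (101, 1), (102, 3);
""")

# Anti-join written as LEFT OUTER JOIN ... IS NULL.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT T1.MyTableID
    FROM MyTable T1
    LEFT OUTER JOIN MyOtherTable T2 ON T2.MyTableID = T1.MyTableID
    WHERE T2.MyOtherTableID IS NULL
    ORDER BY T1.MyTableID
""")]

# The same anti-join written as NOT EXISTS.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT T1.MyTableID
    FROM MyTable T1
    WHERE NOT EXISTS (SELECT * FROM MyOtherTable T2 WHERE T2.MyTableID = T1.MyTableID)
    ORDER BY T1.MyTableID
""")]

print(left_join_ids)
print(not_exists_ids)
```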

See dba.stackexchange.com/a/4010/630 + subsequent links. LEFT JOIN may require a DISTINCT which will bollix you too. Generally EXISTS is quicker and semantically correct

Barry Brown · Accepted Answer · 2008-10-22 18:50:05Z

Are you using EXISTS as part of a correlated subquery? If so, the join will almost always be faster.

Your database should have ways of benchmarking queries. Use them to see which query runs faster.

charles bretana · Accepted Answer · 2008-10-22 19:28:37Z

If you want the institutions that did not have results, then a 'Where Not Exists' subquery will be faster, as it will stop as soon as it finds a single result for those that have results...

If you want the institutions with results, but you don't actually want the results, same thing. Use a 'Where Exists' subquery. It will stop as soon as it finds a single result. This also ensures that the result set will only have one record per institution, whereas if you had an institution with multiple results, using the join approach would require that you add the 'distinct' keyword or a 'Group By' clause to eliminate the duplicate rows that would be produced from the multiple Result records that matched a single institution.

If you need the Results, then do a JOIN - an Inner Join if you don't want to see the institutions without results, and an outer join if you want to see ALL institutions, including the ones with no Results.
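
The duplicate-row point can be seen with a small sketch using Python's built-in sqlite3 module. The column names and sample rows are assumptions, since the question does not show the real tables. One institution has two results, so the plain inner join lists it twice, while EXISTS and JOIN plus DISTINCT list it once.

```python
import sqlite3

# Hypothetical schema: 'Alpha' has two results, 'Beta' has none, 'Gamma' has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Institutions (institution_id INTEGER PRIMARY KEY, institution_name TEXT);
    CREATE TABLE Results (result_id INTEGER PRIMARY KEY, institution_id INTEGER NOT NULL);
    INSERT INTO Institutions VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Results VALUES (10, 1), (11, 1), (12, 3);
""")

def names(sql):
    # Return the first column of every row as a plain list.
    return [row[0] for row in conn.execute(sql)]

# Plain inner join: one output row per matching Results row.
join_names = names("""
    SELECT i.institution_name
    FROM Institutions i
    INNER JOIN Results r ON r.institution_id = i.institution_id
    ORDER BY i.institution_name
""")

# DISTINCT collapses the duplicates again.
distinct_names = names("""
    SELECT DISTINCT i.institution_name
    FROM Institutions i
    INNER JOIN Results r ON r.institution_id = i.institution_id
    ORDER BY i.institution_name
""")

# EXISTS: at most one output row per institution.
exists_names = names("""
    SELECT i.institution_name
    FROM Institutions i
    WHERE EXISTS (SELECT 1 FROM Results r WHERE r.institution_id = i.institution_id)
    ORDER BY i.institution_name
""")

# NOT EXISTS: the institutions without any results.
not_exists_names = names("""
    SELECT i.institution_name
    FROM Institutions i
    WHERE NOT EXISTS (SELECT 1 FROM Results r WHERE r.institution_id = i.institution_id)
    ORDER BY i.institution_name
""")

print(join_names, distinct_names, exists_names, not_exists_names)
```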

Dave Costa · Accepted Answer · 2008-10-22 20:00:29Z

Actually, from your vague description of the problem, it sounds to me like a NOT IN query is the most obvious way to code it:

SELECT *
FROM Institutions
WHERE InstitutionID NOT IN (
    SELECT DISTINCT InstitutionID
    FROM Results
)

2 Comments

Mark Brady Over a year ago

Why in the world would you add a distinct? Either the number 1 is in this list ( 1, 1, 1, 2, 2, 5, 5, 7) or it is not. It's completely unimportant to sort and filter the list. In fact, when i do it in 10.2.0.3, it's completely ignored.

Dave Costa Over a year ago

In my experience looking at execution plans, Oracle will filter it down to unique values whether you have the DISTINCT keyword or not. Therefore I like to include it since it makes the purpose of the code clearer.
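
On the DISTINCT question, a small sketch with Python's built-in sqlite3 module shows that the keyword does not change which rows the NOT IN query returns. The schema and sample rows are assumptions, and InstitutionID in Results is declared NOT NULL here. Whether the optimizer ignores or applies the DISTINCT internally is engine-specific and is not something this sketch can show.

```python
import sqlite3

# Hypothetical data: institution 2 is the only one without results.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Institutions (InstitutionID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Results (ResultID INTEGER PRIMARY KEY, InstitutionID INTEGER NOT NULL);
    INSERT INTO Institutions VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Results VALUES (10, 1), (11, 1), (12, 3);
""")

# NOT IN with DISTINCT in the subquery, as in the answer.
with_distinct = [row[0] for row in conn.execute("""
    SELECT InstitutionID
    FROM Institutions
    WHERE InstitutionID NOT IN (SELECT DISTINCT InstitutionID FROM Results)
    ORDER BY InstitutionID
""")]

# The same query without DISTINCT.
without_distinct = [row[0] for row in conn.execute("""
    SELECT InstitutionID
    FROM Institutions
    WHERE InstitutionID NOT IN (SELECT InstitutionID FROM Results)
    ORDER BY InstitutionID
""")]

print(with_distinct, without_distinct)
```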

Pk9 · Accepted Answer · 2012-04-03 06:55:35Z

In cases like the one above, EXISTS works faster than a JOIN. EXISTS only needs to find a single matching record, which saves time. With a JOIN, the number of records is larger and all of them must be processed.

Welcome to Stack Overflow. This question is quite old and was well covered. Before resurrecting such an old thread, please be sure your response adds something significant to the conversation.

Chandra Sekhar K · Accepted Answer · 2012-04-03 10:05:38Z

If the RESULTS table has more than one row per INSTITUTION, EXISTS() has the added benefit of not requiring you to select distinct Institutions.

As for performance, I have seen joins, IN(), and EXISTS() each be fastest in a variety of uses. To find the best method for your purposes you must test.

BrynJ · Accepted Answer · 2008-10-22 18:42:57Z

If you're referring to using a left (or right) outer join or a not exists subquery, I'm fairly certain the left outer join wins performance-wise. For example:

SELECT t1.*
FROM table1 t1
LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
WHERE t2.id IS NULL

The above should be quicker than the equivalent sub-query, and if you're referring specifically to exists - well, where structure allows, an inner join will always be the preferred option.
