1

Let's start with the following two tables:

CREATE TABLE people AS ( SELECT * FROM VALUES ('david', 10), ('george', 20) AS tmp(name, age) ); CREATE TABLE position AS ( SELECT * FROM VALUES ('george', 'c++'), ('george', 'frontend') AS tmp(name, job) ); 

Is the following two equivalent ways to write the SEMI JOIN or ANTI JOIN pattern? If not, what is missing?

[SEMI-JOIN == WHERE EXISTS] SELECT * FROM people SEMI JOIN position ON (people.name=position.name) SELECT * FROM people WHERE EXISTS (SELECT * FROM position WHERE people.name=position.name) // or like this? SELECT * FROM people WHERE name IN (SELECT name FROM position) [ANTI-JOIN == WHERE NOT EXISTS] SELECT * FROM people ANTI JOIN position ON (people.name=position.name) SELECT * FROM people WHERE NOT EXISTS (SELECT * FROM position WHERE people.name=position.name) // or like this? SELECT * FROM people WHERE name NOT IN (SELECT name FROM position) 
3
  • Could you tag the rdbms you are using ? Commented Apr 1, 2024 at 23:03
  • @SelVazi not really any, and no DBs I'm using actually support it. Commented Apr 1, 2024 at 23:05
  • Never use NOT IN, it has weird and unexpected semantics when nulls are involved. Commented Apr 2, 2024 at 2:05

1 Answer 1

3

In each example (semi and anti joins), the following pair of options are not the same because of the SELECT * at the start - which will get all columns from all the joined tables.

If you used SELECT people.* as the start, they are functionally the same as far as I can see.

In SQL Server at least, the bottom two options of each have identical plans in an example I tried (SQL Server referred to them as Left semi join and Left anti semi join respectively).

Edit: ANTI JOIN Note that a more analagous way may be to use LEFT OUTER JOIN and then WHERE the joined table doesn't have any rows e.g.,

SELECT * FROM people LEFT OUTER JOIN position ON (people.name=position.name) WHERE position.name IS NULL 

All will give the same results, but the EXISTS versions tend to be closer to the theoretical approach.

Sign up to request clarification or add additional context in comments.

4 Comments

thanks is that basic SQL or do you need to use T-SQL or U-SQL for that in sqlserver?
The left join example given above is standard SQL although the word OUTER is optional and the parentheses are unnecessary
The above is based on my data knowledge and SQL Server in particular. However, if the database supports EXISTS, this appears to be the logical way it would function.
LEFT JOIN... IS NULL is generally less efficient on modern optimizers as they can't reason out that the join is unique (can only return max 1 row). Also NOT IN has different semantics.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.