
I'm working on merging an old database into a new one. In the new database I have four database tables: 'task_clone', 'potential_task', 'task' and 'task_archive'.

'task_clone' contains all the database entries of type task imported from the older database and I'm trying to distribute these entries across the other three tables in the new database. 'task_clone' is therefore a temporary table.

'task_clone' contains 649 entries. The structure of the data does not map easily onto the new database, and after copying the rows from 'task_clone' the other three tables contain 566 entries in total, which means 83 entries in 'task_clone' have yet to be mapped into the new structure.

I'm trying to query 'task_clone' to figure out which of its entries are not in any of the other three tables.

All four tables contain the column 'task_id', which is a unique id for each task entry. I should therefore be able to query the database for all the 'task_id' values in 'task_clone' and return the entries that do not match any of those in the other three tables.

I know this should be possible in a single query but I can't quite seem to get the syntax correct. Where am I going wrong and how should this be written? I initially tried:

SELECT task_clone.task_id FROM task_clone WHERE task_clone.task_id != potential_task.task_id AND task_clone.task_id != task.task_id AND task_clone.task_id != task_archive.task_id; 

I also looked at some other approaches to doing this with two tables (i.e. returning values from one that were not in the other), but I couldn't find an example that I could translate cleanly into a solution for more than two tables without getting error messages. Thanks for reading.

NOTE in response to this being marked as a duplicate: This question is not a duplicate of those previous questions, which ask specifically about two tables, since my question specifically asks about working with four tables. The solution supplied on the cited question, while using roughly the same syntax, does not cover the case of four tables. Further, in my question I clearly state that I've looked at previous Stack Overflow answers that deal with two tables and I couldn't translate them to four without getting error messages.

  • Hint: ... table1 LEFT JOIN table2 WHERE table1.column IS NULL or simply ... table 1 WHERE id NOT IN(SELECT id FROM table2) Commented Feb 3, 2019 at 23:52
  • yes but that's two tables, not four. Commented Feb 3, 2019 at 23:53
  • If that doesn't solve it, I advise you to provide example data and expected results. I also advise you to read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" before providing it. Commented Feb 3, 2019 at 23:54
  • Thanks for that Raymond. I was not aware of that advice on meta. I thought the issue was not so much based on the structure of the database but the structure of the query hence posting only the code. Commented Feb 4, 2019 at 0:02
  • Possible duplicate of Select from one table where not in another Commented Feb 4, 2019 at 0:48

3 Answers


Given that task_id is a primary key in all tables, the LEFT JOIN approach seems more efficient and concise:

SELECT tc.*
FROM task_clone tc
LEFT JOIN potential_task pt ON pt.task_id = tc.task_id
LEFT JOIN task t ON t.task_id = tc.task_id
LEFT JOIN task_archive ta ON ta.task_id = tc.task_id
WHERE pt.task_id IS NULL
  AND t.task_id IS NULL
  AND ta.task_id IS NULL
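As a rough illustration of how this anti-join behaves, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and SQLite is only an assumption for the demo, since the question does not say which database engine is in use. A row of task_clone survives the WHERE clause only when none of the three joins found a match.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task_clone     (task_id INTEGER PRIMARY KEY);
    CREATE TABLE potential_task (task_id INTEGER PRIMARY KEY);
    CREATE TABLE task           (task_id INTEGER PRIMARY KEY);
    CREATE TABLE task_archive   (task_id INTEGER PRIMARY KEY);
    INSERT INTO task_clone     VALUES (1), (2), (3), (4), (5);
    INSERT INTO potential_task VALUES (1);
    INSERT INTO task           VALUES (2);
    INSERT INTO task_archive   VALUES (3);
""")

# Each LEFT JOIN yields NULL on the right side when there is no match,
# so requiring all three to be NULL keeps only the unmapped task_ids.
unmapped = [row[0] for row in conn.execute("""
    SELECT tc.task_id
    FROM task_clone tc
    LEFT JOIN potential_task pt ON pt.task_id = tc.task_id
    LEFT JOIN task t            ON t.task_id  = tc.task_id
    LEFT JOIN task_archive ta   ON ta.task_id = tc.task_id
    WHERE pt.task_id IS NULL
      AND t.task_id  IS NULL
      AND ta.task_id IS NULL
    ORDER BY tc.task_id
""")]

print(unmapped)  # [4, 5]
```

With the real data, the number of rows returned should equal the count of entries still waiting to be mapped.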

Could you use NOT IN?

SELECT task_clone.task_id
FROM task_clone
WHERE task_clone.task_id NOT IN (SELECT task_id FROM potential_task)
  AND task_clone.task_id NOT IN (SELECT task_id FROM task)
  AND task_clone.task_id NOT IN (SELECT task_id FROM task_archive)

1 Comment

Thanks! This solved my problem. I was not aware of the NOT IN clause. Thank you. It won't let me click accept right now. I keep getting a message saying I can accept in 5 minutes.

I would use NOT EXISTS:

SELECT tc.task_id
FROM task_clone tc
WHERE NOT EXISTS (SELECT 1 FROM potential_task pt WHERE pt.task_id = tc.task_id)
  AND NOT EXISTS (SELECT 1 FROM task t WHERE t.task_id = tc.task_id)
  AND NOT EXISTS (SELECT 1 FROM task_archive ta WHERE ta.task_id = tc.task_id);

I much prefer NOT EXISTS instead of NOT IN with subqueries because the latter does not handle NULLs in an intuitive manner. If any task_id in any of the tables is NULL, then the outer query will return no rows at all. This is consistent with what NULL means in SQL, but it is counter-intuitive.

NOT EXISTS treats NULLs as you would expect -- they don't match on a given row but they don't affect results in other rows.
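To see that difference concretely, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented, the columns are deliberately left nullable so that a NULL can be inserted, and SQLite itself is an assumption, since the question does not name the database engine. Only one of the three lookup tables is used to keep it short.

```python
import sqlite3

# Hypothetical data: 'task' contains a NULL task_id alongside a real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task_clone (task_id INTEGER);
    CREATE TABLE task       (task_id INTEGER);
    INSERT INTO task_clone VALUES (1), (2), (3);
    INSERT INTO task       VALUES (1), (NULL);
""")

# NOT IN: comparing against a set that contains NULL never evaluates to
# true, so every candidate row is filtered out.
not_in = conn.execute("""
    SELECT tc.task_id
    FROM task_clone tc
    WHERE tc.task_id NOT IN (SELECT task_id FROM task)
    ORDER BY tc.task_id
""").fetchall()

# NOT EXISTS: the NULL row simply fails to match and affects nothing else.
not_exists = conn.execute("""
    SELECT tc.task_id
    FROM task_clone tc
    WHERE NOT EXISTS (SELECT 1 FROM task t WHERE t.task_id = tc.task_id)
    ORDER BY tc.task_id
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```

If task_id is declared as a primary key or NOT NULL in every table, the two forms should return the same rows and the choice is mostly one of preference.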
