238

I need to implement the following query in SQL Server:

SELECT *
FROM table1
WHERE (CM_PLAN_ID, Individual_ID) IN (
    SELECT CM_PLAN_ID, Individual_ID
    FROM CRM_VCM_CURRENT_LEAD_STATUS
    WHERE Lead_Key = :_Lead_Key
)

But the WHERE..IN clause allows only 1 column. How can I compare 2 or more columns with another inner SELECT?

15 Answers

160

You'll want to use the WHERE EXISTS syntax instead.

SELECT *
FROM table1
WHERE EXISTS (
    SELECT *
    FROM table2
    WHERE Lead_Key = @Lead_Key
      AND table1.CM_PLAN_ID = table2.CM_PLAN_ID
      AND table1.Individual_ID = table2.Individual_ID
)
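
To see the EXISTS pattern in action without a SQL Server instance, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module. The table contents are made up for illustration, SQLite stands in for SQL Server, and the `?` placeholder plays the role of `@Lead_Key`; treat it as an assumption-laden demo rather than a reference implementation.

```python
import sqlite3

# In-memory SQLite database used only for illustration (not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (CM_PLAN_ID INTEGER, Individual_ID INTEGER, note TEXT);
    CREATE TABLE table2 (CM_PLAN_ID INTEGER, Individual_ID INTEGER, Lead_Key INTEGER);
    INSERT INTO table1 VALUES
        (1, 10, 'match'),
        (1, 20, 'plan only'),
        (2, 10, 'individual only');
    -- The matching pair (1, 10) appears twice for Lead_Key 7.
    INSERT INTO table2 VALUES (1, 10, 7), (1, 10, 7), (2, 20, 7), (1, 20, 8);
""")

# Both columns must match on the same row of table2 for the chosen Lead_Key.
rows = conn.execute("""
    SELECT t1.CM_PLAN_ID, t1.Individual_ID, t1.note
    FROM table1 AS t1
    WHERE EXISTS (
        SELECT 1
        FROM table2 AS t2
        WHERE t2.Lead_Key = ?
          AND t2.CM_PLAN_ID = t1.CM_PLAN_ID
          AND t2.Individual_ID = t1.Individual_ID
    )
""", (7,)).fetchall()

print(rows)
```

Only the row whose pair occurs together in table2 comes back, and it comes back once even though table2 holds that pair twice.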

10 Comments

While this would work, it converts the uncorrelated query in the question into a correlated query. Unless the query optimizer is clever, this might give you O(n^2) performance :-(. But maybe I'm underestimating the optimizer...
I use syntaxes like this all the time without issue. Unless you are using an older optimizer (6.5, 7, 8, etc) it shouldn't have a problem with this syntax.
@sleske: EXISTS is by far better: see my comments in my answer. And test it first. @mrdenny: I misread your answer at first, I'd use EXISTS too
This is most efficient, +1. See this article in my blog for performance comparison: explainextended.com/2009/06/17/efficient-exists
Even SQL 2000 could handle most correlated subqueries without turning the query into an O(n^2). Might have been a problem back on 6.5.
139

You can make a derived table from the subquery, and join table1 to this derived table:

SELECT *
FROM table1
LEFT JOIN (
    SELECT CM_PLAN_ID, Individual_ID
    FROM CRM_VCM_CURRENT_LEAD_STATUS
    WHERE Lead_Key = :_Lead_Key
) table2
    ON table1.CM_PLAN_ID = table2.CM_PLAN_ID
   AND table1.Individual_ID = table2.Individual_ID
WHERE table2.CM_PLAN_ID IS NOT NULL

6 Comments

or more generally SELECT * FROM table INNER JOIN otherTable ON ( table.x = otherTable.a AND table.y = otherTable.b)
What about the multiple rows that would exist if table 2 is a child of table 1? And why LEFT JOIN?
Yeah, INNER JOIN would be more performant here. Doing a LEFT JOIN and filtering the nulls from table 2 is just a verbose way to use an INNER JOIN
Wrong: this returns a row multiple times whenever it matches several rows in the joined table. Otherwise, do an inner join and you can spare yourself the WHERE.
I solved the problem of possible duplicates in this answer: stackoverflow.com/a/54389589/983722
57

WARNING ABOUT SOLUTIONS:

MANY EXISTING SOLUTIONS WILL GIVE THE WRONG OUTPUT IF ROWS ARE NOT UNIQUE

If you are the only person creating tables, this may not be relevant, but several solutions will give a different number of output rows from the code in question, when one of the tables may not contain unique rows.

WARNING ABOUT PROBLEM STATEMENT:
IN WITH MULTIPLE COLUMNS DOES NOT EXIST, THINK CAREFULLY WHAT YOU WANT

When I see an IN with two columns, I can imagine it to mean two things:

  1. The value of column a and column b appear in the other table independently
  2. The values of column a and column b appear in the other table together on the same row

Scenario 1 is fairly trivial, simply use two IN statements.

In line with most existing answers, I hereby provide an overview of mentioned and additional approaches for Scenario 2 (and a brief judgement):

## EXISTS (Safe, recommended for SQL Server)

As provided by @mrdenny, EXISTS sounds like exactly what you are looking for. Here is his example:

SELECT * FROM T1 WHERE EXISTS (SELECT * FROM T2 WHERE T1.a=T2.a and T1.b=T2.b) 

## LEFT SEMI JOIN (Safe, recommended for dialects that support it)

This is a very concise way to join, but unfortunately most SQL dialects, including SQL Server, do not currently support it.

SELECT * FROM T1 LEFT SEMI JOIN T2 ON T1.a=T2.a and T1.b=T2.b 

## Multiple IN statements (Safe, but beware of code duplication)

As mentioned by @cataclysm, using two IN statements can do the trick as well; perhaps it will even outperform the other solutions. However, you should be very careful about code duplication. If you ever want to select from a different table, or change the WHERE clause, there is an increased risk of creating inconsistencies in your logic.

Basic solution

SELECT * from T1 WHERE a IN (SELECT a FROM T2 WHERE something) AND b IN (SELECT b FROM T2 WHERE something) 

Solution without code duplication (a common table expression; this works in SQL Server as long as no semicolon separates the WITH clause from the SELECT that uses it)

WITH mytmp AS (SELECT a, b FROM T2 WHERE something) SELECT * from T1 WHERE a IN (SELECT a FROM mytmp) AND b IN (SELECT b FROM mytmp) 
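
The gap between Scenario 1 (values appear independently) and Scenario 2 (values appear together on the same row) is easy to miss, so here is a small sketch that runs both query shapes on the same data. It uses Python's built-in sqlite3 module with SQLite standing in for SQL Server, and the rows in T1 and T2 are invented purely for illustration.

```python
import sqlite3

# In-memory SQLite database; T1/T2 contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (a INTEGER, b TEXT);
    CREATE TABLE T2 (a INTEGER, b TEXT);
    INSERT INTO T1 VALUES (1, 'x'), (1, 'y'), (2, 'x'), (3, 'z');
    INSERT INTO T2 VALUES (1, 'x'), (2, 'y');
""")

# Scenario 1: each column is checked on its own, so the two values
# may come from different rows of T2.
two_ins = conn.execute("""
    SELECT a, b FROM T1
    WHERE a IN (SELECT a FROM T2)
      AND b IN (SELECT b FROM T2)
    ORDER BY a, b
""").fetchall()

# Scenario 2: both columns must match on the same row of T2.
same_row = conn.execute("""
    SELECT a, b FROM T1
    WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.a = T1.a AND T2.b = T1.b)
    ORDER BY a, b
""").fetchall()

print(two_ins)
print(same_row)
```

With this data the two IN statements accept (1, 'y') and (2, 'x') even though neither pair exists as a row in T2, while EXISTS returns only (1, 'x').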

## INNER JOIN (technically it can be made safe, but often this is not done)

The reason I don't recommend using an inner join as a filter is that, in practice, people often let duplicates in the right table cause duplicates in the left table. To make matters worse, they sometimes then make the end result distinct, even though the left table may not actually need to be unique (or not unique in the columns you select). Furthermore, it gives you the chance to select a column that does not exist in the left table.

SELECT T1.* FROM T1 INNER JOIN (SELECT DISTINCT a, b FROM T2) AS T2sub ON T1.a=T2sub.a AND T1.b=T2sub.b 

Most common mistakes:

  1. Joining directly on T2 without a safe subquery (resulting in the risk of duplication)
  2. SELECT * (Guaranteed to get columns from T2)
  3. SELECT c (Does not guarantee that your column comes and always will come from T1)
  4. No DISTINCT or DISTINCT in the wrong place
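
The first mistake in the list can be reproduced in a few lines. This is a rough sketch using Python's built-in sqlite3 module (SQLite instead of SQL Server, with made-up rows): T2 contains the same pair twice, and the direct join doubles the T1 row while the DISTINCT subquery does not.

```python
import sqlite3

# In-memory SQLite database; the single T1 row and duplicated T2 rows are sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (a INTEGER, b TEXT, c TEXT);
    CREATE TABLE T2 (a INTEGER, b TEXT);
    INSERT INTO T1 VALUES (1, 'x', 'only row in T1');
    INSERT INTO T2 VALUES (1, 'x'), (1, 'x');
""")

# Joining directly on T2: every duplicate in T2 repeats the T1 row.
direct_join = conn.execute("""
    SELECT T1.*
    FROM T1
    INNER JOIN T2 ON T1.a = T2.a AND T1.b = T2.b
""").fetchall()

# Joining on a de-duplicated subquery: the T1 row appears once.
safe_join = conn.execute("""
    SELECT T1.*
    FROM T1
    INNER JOIN (SELECT DISTINCT a, b FROM T2) AS T2sub
        ON T1.a = T2sub.a AND T1.b = T2sub.b
""").fetchall()

print(len(direct_join))
print(len(safe_join))
```

Placing the DISTINCT inside the subquery, rather than on the outer SELECT, leaves any genuine duplicates in T1 untouched.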

## CONCATENATION OF COLUMNS WITH SEPARATOR (Not very safe, horrible performance)

The functional problem is that if you use a separator which might occur in a column, it gets tricky to ensure that the outcome is 100% accurate. The technical problem is that this method often incurs type conversions and completely ignores indexes, resulting in possibly horrible performance. Despite these problems, I have to admit that I sometimes still use it for ad-hoc queries on small datasets.

SELECT * FROM T1 WHERE CONCAT(a,'_',b) IN (SELECT CONCAT(a,'_',b) FROM T2) 

Note that if your columns are numeric, some SQL dialects will require you to cast them to strings first. I believe SQL Server will do this automatically.
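
To make the accuracy problem concrete, here is a small sketch with Python's built-in sqlite3 module. SQLite's `||` operator is used for concatenation in place of CONCAT, and the rows are invented so that no pair in T1 actually exists in T2; any row returned by the concatenation queries is therefore a false positive.

```python
import sqlite3

# In-memory SQLite database; rows are chosen so that no (a, b) pair is shared.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (a TEXT, b TEXT);
    CREATE TABLE T2 (a TEXT, b TEXT);
    INSERT INTO T1 VALUES ('45', '3'), ('x_', 'y');
    INSERT INTO T2 VALUES ('4', '53'), ('x', '_y');
""")

# No separator: '45' || '3' and '4' || '53' both produce '453'.
no_separator = conn.execute(
    "SELECT a, b FROM T1 WHERE a || b IN (SELECT a || b FROM T2) ORDER BY a"
).fetchall()

# Separator that also occurs in the data: 'x_' || '_' || 'y' equals 'x' || '_' || '_y'.
with_separator = conn.execute(
    "SELECT a, b FROM T1 "
    "WHERE a || '_' || b IN (SELECT a || '_' || b FROM T2) ORDER BY a"
).fetchall()

# Column-by-column comparison for reference.
exists = conn.execute("""
    SELECT a, b FROM T1
    WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.a = T1.a AND T2.b = T1.b)
""").fetchall()

print(no_separator)
print(with_separator)
print(exists)
```

The separator removes the first collision but not the second, and the EXISTS query correctly finds nothing.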


To wrap things up: as usual, there are many ways to do this in SQL. Using safe choices will avoid surprises and save you time and headaches in the long run.

5 Comments

I'm not sure that "multiple in statements" method will work properly as written. Not only must both values be present in T2, they must be on the same row of T2. This doesn't check for that second condition.
What about common table expressions? (CTE)
The question does not state whether both values must be on the same row. (It simply shows an invalid coding attempt which could try to represent anything). So I don't think the answer is 'incorrect', but it is true that multiple IN statements do not ensure the values are on the same row, so definitely be aware of that!
@jla As I understand it CTE is mainly about how the data is persisted, I do not see how that would change anything in terms of join logic and such.
@DennisJaheruddin I've re-read your various reasons and all the other answers and see why you don't use it in your WITH mytmp CTE. Thanks! When I see a multi-column WHERE IN (SELECT) I only see case 2, since they would be returned as a list of N column tuples so the multiple IN solutions don't seem to match. The question's query looks like a case of matching on composite foreign keys to get all table1 records associated with :_Lead_Key and I would have gone to CTE/JOIN. The simple answer to their actual question is, currently, you can't. Use EXISTS. :)
18
select * from tab1 where (col1,col2) in (select col1,col2 from tab2) 

Note:
Oracle ignores rows where one or more of the selected columns is NULL. In these cases you probably want to use the NVL function to map NULL to a special value (one that should not occur in the data):

select * from tab1 where (col1, NVL(col2, '---')) in (select col1, NVL(col2, '---') from tab2) 

6 Comments

postgres supports where (colA,colB) in (... some list of tuples...) but I'm not sure what other databases do the same. I'd be interested to know.
This syntax is supported in Oracle and DB2/400 as well (probably DB2, too). Wish SQL Server supported it.
DB2 supports this.
Even SQLite supports it.
Supported in Databricks SQL
17

A simple EXISTS clause is cleanest

SELECT *
FROM table1 t1
WHERE EXISTS (
    SELECT * -- or 1. No difference...
    FROM CRM_VCM_CURRENT_LEAD_STATUS Ex
    WHERE Lead_Key = :_Lead_Key
      -- correlation here...
      AND t1.CM_PLAN_ID = Ex.CM_PLAN_ID
      AND t1.Individual_ID = Ex.Individual_ID
)

If you have multiple rows in the correlation then a JOIN gives multiple rows in the output, so you'd need distinct. Which usually makes the EXISTS more efficient.

Note SELECT * with a JOIN would also include columns from the row limiting tables

2

Why use WHERE EXISTS or DERIVED TABLES when you can just do a normal inner join:

SELECT t.*
FROM table1 t
INNER JOIN CRM_VCM_CURRENT_LEAD_STATUS s
    ON t.CM_PLAN_ID = s.CM_PLAN_ID
   AND t.Individual_ID = s.Individual_ID
WHERE s.Lead_Key = :_Lead_Key

If the pair of (CM_PLAN_ID, Individual_ID) isn't unique in the status table, you might need a SELECT DISTINCT t.* instead.

1 Comment

And the DISTINCT usually means an EXISTS is more efficient
0

I used string_agg as a cheap hack to get some pseudo-normalization.

Here's a sample:

select vendorId, affiliate_type_code, parent_vendor_id, state_abbr, county_abbr,
       litigation_activity_indicator,
       string_agg(employee_id, ',') as employee_ids,
       string_agg(employee_in_deep_doodoo, ',') as 'employee-inventory connections'
from (
    -- top 10000 so I could pre-order my employee id's - didn't want mixed sorting in those concats
    select distinct top 10000
           mi.missing_invintory_identifier as rqid,
           vendorId, affiliate_type_code, parent_vendor_id, state_abbr, county_abbr,
           litigation_activity_indicator,
           employee_identifier as employee_id,
           concat(employee_identifier, '-', mi.missing_invintory_identifier) as employee_in_deep_doodoo
    from missing_invintory as mi
    inner join vendor_employee_view as ev
        on mi.missing_invintory_identifier = ev.missing_invintory_identifier
    where ev.litigation_activity_indicator = 'N'
    order by employee_identifier desc
) as x
group by vendorId, affiliate_type_code, parent_vendor_id, state_abbr, county_abbr,
         litigation_activity_indicator
having count(employee_id) > 1

┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ vendorId ┃ affiliate_type ┃ parent_vendor_id ┃ state_abbr ┃ county_abbr ┃ litigation_indicator ┃ employee_ids  ┃ employee-inventory connections ┃
┣━━━━━━━━━━╋━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ 123      ┃ EXP            ┃ 17               ┃ CA         ┃ SDG         ┃ N                    ┃ 112358,445678 ┃ 112358-1212,1534490-1212       ┃
┣━━━━━━━━━━╋━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ 4567     ┃ PRI            ┃ 202              ┃ TX         ┃ STB         ┃ Y                    ┃ 998754,332165 ┃ 998754-4545,332165-4545        ┃
┗━━━━━━━━━━┻━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━┻━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

1 Comment

Hi Ryan what do you use to generate that table to show the results of your query? Pretty cool!
0
Postgres SQL: version 9.6
Total records in tables: mjr_agent = 145, mjr_transaction_item = 91800

  1. Using EXISTS [Average Query Time: 1.42s]

 SELECT count(txi.id)
 FROM mjr_transaction_item txi
 WHERE EXISTS (
     SELECT 1
     FROM mjr_agent agnt
     WHERE agnt.agent_group = 0
       AND (txi.src_id = agnt.code OR txi.dest_id = agnt.code)
 )

  2. Using two IN clauses [Average Query Time: 0.37s]

 SELECT count(txi.id)
 FROM mjr_transaction_item txi
 WHERE txi.src_id IN (SELECT agnt.code FROM mjr_agent agnt WHERE agnt.agent_group = 0)
    OR txi.dest_id IN (SELECT agnt.code FROM mjr_agent agnt WHERE agnt.agent_group = 0)

  3. Using the INNER JOIN pattern [Average Query Time: 2.9s]

 SELECT count(DISTINCT(txi.id))
 FROM mjr_transaction_item txi
 INNER JOIN mjr_agent agnt
     ON agnt.code = txi.src_id OR agnt.code = txi.dest_id
 WHERE agnt.agent_group = 0

So, I chose the second option.

2 Comments

Warning for future readers: In line with the question, you will probably want to use AND statements rather than OR statements.
@DennisJaheruddin Thank you for your comment and the very detailed explanations in your answer. You are right, the OR statement can produce duplicates. In my case, there are no rows that contain the same src_id and dest_id in a single row, so duplicates won't happen.
-1
with free_seats AS (
    SELECT seat_id,
           free as cureent_seat,
           LAG(free,1,0) OVER() previous_seat,
           LEAD(free,1,0) OVER() next_seat
    FROM seat
)
SELECT seat_id, cureent_seat, previous_seat, next_seat
from free_seats
WHERE cureent_seat=1 AND (previous_seat=1 OR next_seat=1)
ORDER BY seat_id;

1 Comment

Thank you for contributing to the Stack Overflow community. This may be a correct answer, but it’d be really useful to provide additional explanation of your code so developers can understand your reasoning. This is especially useful for new developers who aren’t as familiar with the syntax or struggling to understand the concepts. Would you kindly edit your answer to include additional details for the benefit of the community?
-2

If you want to do this for a single table, use the following query:

SELECT S.* FROM Student_info S INNER JOIN Student_info UT ON S.id = UT.id AND S.studentName = UT.studentName where S.id in (1,2) and S.studentName in ('a','b') 

With the table data as follows:

id|name|adde|city
1|a|ad|ca
2|b|bd|bd
3|a|ad|ad
4|b|bd|bd
5|c|cd|cd

the output is as follows:

id|name|adde|city
1|a|ad|ca
2|b|bd|bd

1 Comment

id in (1,2) and studentName in ('a','b') is totally not the same as (id, studentName) in ((1,'a'),(2,'b')). Just think of a record having id=2 and name='a'. Of course, if ID is unique, then the effect is diminished, but then, if ID is unique, we don't need to filter over names at all.
-2

We can simply do this.

 select * from table1 t, CRM_VCM_CURRENT_LEAD_STATUS c WHERE t.CM_PLAN_ID = c.CM_PLAN_ID and t.Individual_ID = c.Individual_ID 

-2

Query:

select ord_num, agent_code, ord_date, ord_amount from orders where (agent_code, ord_amount) IN (SELECT agent_code, MIN(ord_amount) FROM orders GROUP BY agent_code); 

The above query worked for me in MySQL. Refer to the following link:

https://www.w3resource.com/sql/subqueries/multiplee-row-column-subqueries.php

1 Comment

This is in Oracle DB
-2

Concatenating the columns together in some form is a "hack", but when the product doesn't support semi-joins for more than one column, sometimes you have no choice.

Example of where inner/outer join solution would not work:

select * from T1 where <boolean expression> and (<boolean expression> OR (ColA, ColB) in (select A, B ...)) and <boolean expression> ... 

When the queries aren't trivial in nature sometimes you don't have access to the base table set to perform regular inner/outer joins.

If you do use this "hack", when you combine fields just be sure to add a distinctive enough delimiter in between them to avoid misinterpretations, e.g. ColA + ':-:' + ColB

1 Comment

This answer appears to be inconsistent (mentions concatenation and then provides a different example). Also, on a lighter note: We always have a choice ;-) I did add the concatenation example to my overview here, with the relevant footnotes: stackoverflow.com/a/54389589/983722
-3

I found it easier this way

Select *
from table1
WHERE (convert(VARCHAR,CM_PLAN_ID) + convert(VARCHAR,Individual_ID)) IN (
    Select convert(VARCHAR,CM_PLAN_ID) + convert(VARCHAR,Individual_ID)
    From CRM_VCM_CURRENT_LEAD_STATUS
    Where Lead_Key = :_Lead_Key
)

Hope this helps :)

4 Comments

Ouch, no index use here due to the string concat.
I've voted this down as it's plain dangerous! If CM_PLAN_ID = 45 and Individual_ID = 3 then concatenation results in 453 - which is indistinguishable from the case where CM_PLAN_ID = 4 and Individual_ID = 53... asking for trouble I would have thought
..of course you could concatenate with an arbitrary special char eg 45_3 or 45:3 but it's still not a nice solution and of course as @mrdenny says indexes will not be utilised now that a transform has taken place on the columns.
I also voted this down, as this solution is really a quick "hack" only. It's slow and as El Ronnoco said, it can lead to bugs.
-3

A simple but wrong way would be to combine the two columns using + or a concatenation function and make them one column.

Select * from XX where col1+col2 in (Select col1+col2 from YY) 

This would of course be pretty slow. It should not be used in production code, but it may be acceptable if you are just running a one-off query to verify something.

1 Comment

Indeed, and it can lead to errors, since e.g. 'ab' + 'c' = 'a'+'bc'
