0

What are the differences between the following two queries?

SELECT distinct(Invalid_Emails), [leads_id] FROM [dbo].[InvalidEmails_stg] ORDER BY LEADS_ID DESC 

vs

select invalid_emails, max(leads_id) as id from invalidEmails_stg group by invalid_emails having count(*) < 2 order by id desc 

The second one gave me fewer rows than the first.

2 Comments

  • Maybe some invalid_emails values have count(*) >= 2? Commented Nov 9, 2019 at 4:46
  • What do you want to do? Please give some more details. Commented Nov 9, 2019 at 7:25

3 Answers

2

The parentheses in the first query are misleading. DISTINCT is not a function; it applies to the whole select list, so the parentheses do nothing. Write the query as:

SELECT DISTINCT Invalid_Emails, leads_id FROM [dbo].[InvalidEmails_stg] ORDER BY LEADS_ID DESC; 

This returns all Invalid_Emails/leads_id pairs that appear in the table. No matter how many times a given pair appears, it is in the result set exactly once.

This query:

select invalid_emails, max(leads_id) as id from invalidEmails_stg group by invalid_emails having count(*) < 2 order by id desc; 

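Returns the invalid_emails values that occur in exactly one row, together with that row's leads_id. Because the grouping is on invalid_emails alone, any email that appears in two or more rows is filtered out, even when its leads_id values differ.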

Here is a simple example:

invalid_emails      leads_id
[email protected]     1
[email protected]     1
[email protected]     2
[email protected]     3
[email protected]     1

The first query will return:

[email protected]     1
[email protected]     2
[email protected]     3
[email protected]     1

The address in the first two input rows is returned once because the duplicate pair is removed.

The second will return:

[email protected]     2
[email protected]     3
[email protected]     1

The address in the first two input rows is not returned because it appears twice, so its group fails the having count(*) < 2 condition.
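To try this out, here is a minimal T-SQL sketch that rebuilds a five-row sample in a table variable and runs both queries. The addresses (a@example.com and so on) and the table variable name are hypothetical stand-ins, since the original sample addresses are not readable here.

```sql
-- Hypothetical sample data; the addresses are placeholders, not from the original post.
DECLARE @InvalidEmails_stg TABLE (invalid_emails varchar(100), leads_id int);

INSERT INTO @InvalidEmails_stg (invalid_emails, leads_id)
VALUES ('a@example.com', 1),
       ('a@example.com', 1),
       ('b@example.com', 2),
       ('c@example.com', 3),
       ('d@example.com', 1);

-- Query 1: one row per distinct (invalid_emails, leads_id) pair.
-- The duplicated a@example.com/1 pair collapses to one row, so 4 rows come back.
SELECT DISTINCT invalid_emails, leads_id
FROM @InvalidEmails_stg
ORDER BY leads_id DESC;

-- Query 2: only emails that occur in exactly one row.
-- a@example.com occurs twice, so its group is dropped and 3 rows come back.
SELECT invalid_emails, MAX(leads_id) AS id
FROM @InvalidEmails_stg
GROUP BY invalid_emails
HAVING COUNT(*) < 2
ORDER BY id DESC;
```

Run the whole script as one batch, because a table variable only lives for the batch that declares it.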

1

In the first query:

SELECT distinct(Invalid_Emails),[leads_id] FROM [dbo].[InvalidEmails_stg] ORDER BY LEADS_ID DESC 

there is no count(*) < 2 condition, so no rows are filtered out by count.

In the second query:

select invalid_emails, max(leads_id) as id from invalidEmails_stg group by invalid_emails having count(*)<2 order by id desc 

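if an invalid_emails value appears in two or more rows, the having count(*) < 2 clause filters that group out of the result.

Another difference is how NULL values behave. If the Invalid_Emails column contains NULLs, every distinct NULL/leads_id pair appears in the first query. In the second query, group by puts all NULLs into a single group, and having removes that group when it contains two or more rows.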
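The NULL behaviour can be checked with a small T-SQL sketch. The table variable @t and its three rows are hypothetical and only serve to illustrate the point.

```sql
-- Hypothetical data: two rows with a NULL address and one row with a real address.
DECLARE @t TABLE (invalid_emails varchar(100), leads_id int);

INSERT INTO @t (invalid_emails, leads_id)
VALUES (NULL, 10),
       (NULL, 11),
       ('x@example.com', 12);

-- DISTINCT keeps both NULL rows because their leads_id values differ: 3 rows.
SELECT DISTINCT invalid_emails, leads_id
FROM @t;

-- GROUP BY places both NULLs in one group with COUNT(*) = 2,
-- so HAVING COUNT(*) < 2 removes it: only the x@example.com row remains.
SELECT invalid_emails, MAX(leads_id) AS id
FROM @t
GROUP BY invalid_emails
HAVING COUNT(*) < 2;
```

With only a single NULL row in the table, the NULL group would have a count of 1 and would survive the having clause.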

1 Comment

There's no NULL or None in the table.

0

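The queries have a similar intent: to list invalid_emails together with leads_id.

The second query uses the aggregate max() to bring back only the maximum leads_id per email, and uses a having clause to drop any email that appears in more than one row. It removes those emails entirely rather than keeping one copy of each.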

2 Comments

Does the first one not remove dups?
The first uses distinct, so it removes duplicate invalid_emails/leads_id rows but keeps one copy of each; it doesn't drop repeated emails entirely. The two queries do similar things in different ways. You can compare the query results and look at the delta records to understand how they work, then choose the one that best meets your need.
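One way to look at the delta is to list the emails that the second query drops. This is a rough sketch against the table from the question; the column aliases (row_count, distinct_leads, max_leads_id) are made up for readability.

```sql
-- Emails that appear in more than one row. These are exactly the groups
-- that HAVING COUNT(*) < 2 removes from the second query.
SELECT invalid_emails,
       COUNT(*)                 AS row_count,       -- rows for this email
       COUNT(DISTINCT leads_id) AS distinct_leads,  -- rows the first query returns for it
       MAX(leads_id)            AS max_leads_id
FROM [dbo].[InvalidEmails_stg]
GROUP BY invalid_emails
HAVING COUNT(*) >= 2
ORDER BY row_count DESC;
```

If this returns any rows, they account for the gap in row counts: the first query returns distinct_leads rows for each such email, and the second returns none.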
