Group by and Select Distinct in SQL Server

Question

What are the differences between the following two queries?

SELECT distinct(Invalid_Emails), [leads_id] FROM [dbo].[InvalidEmails_stg] ORDER BY LEADS_ID DESC

vs

select invalid_emails, max(leads_id) as id from invalidEmails_stg group by invalid_emails having count(*) < 2 order by id desc

The second one gave me fewer rows than the first.

Maybe some invalid_emails values occur two or more times, so HAVING count(*) < 2 filters them out?

– LukStorms, Commented Nov 9, 2019 at 4:46
What do you want to do? Please give some more details.

– Masoud Sedghi, Commented Nov 9, 2019 at 7:25

Thorsten Kettner · Accepted Answer · 2019-11-09 13:04:58Z

You are confused by the parentheses in the first query. DISTINCT is not a function, so the parentheses do nothing: DISTINCT always applies to the whole select list. Write the query as:

SELECT DISTINCT Invalid_Emails, leads_id FROM [dbo].[InvalidEmails_stg] ORDER BY LEADS_ID DESC;

This returns all pairs of Invalid_Emails/leads_id that appear in the table. No matter how many times a given pair appears, it will be in the result set exactly one time.
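To check that the parentheses change nothing, here is a minimal sketch. The table variable @emails and the addresses in it are made up for illustration; they are not from the question.

```sql
-- Hypothetical sample rows; names and addresses are assumptions.
DECLARE @emails TABLE (Invalid_Emails varchar(100), leads_id int);

INSERT INTO @emails (Invalid_Emails, leads_id)
VALUES ('a@example.com', 1),
       ('a@example.com', 1),   -- exact duplicate of the row above
       ('a@example.com', 2);   -- same email, different leads_id

-- Parenthesized form: DISTINCT still applies to the whole select list.
SELECT DISTINCT (Invalid_Emails), leads_id
FROM @emails
ORDER BY leads_id DESC;

-- Plain form: returns the same two rows,
-- (a@example.com, 2) and (a@example.com, 1).
SELECT DISTINCT Invalid_Emails, leads_id
FROM @emails
ORDER BY leads_id DESC;
```

Both statements remove only the exact duplicate pair; the email itself still appears twice because it has two different leads_id values.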

This query:

select invalid_emails, max(leads_id) as id from invalidEmails_stg group by invalid_emails having count(*) < 2 order by id desc;

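Returns the invalid_emails values that occur in exactly one row of your data, each with that row's leads_id (MAX over a single row is just that row's value). It filters out every email that occurs in more than one row, whether the leads_id values are identical or different.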
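To see which emails the HAVING clause drops, something like the following sketch may help. It assumes the table from the question; the aliases row_count and distinct_leads are made up for illustration.

```sql
-- Lists the emails that the second query filters out:
-- every email that occurs in two or more rows.
SELECT invalid_emails,
       COUNT(*)                 AS row_count,       -- rows for this email
       COUNT(DISTINCT leads_id) AS distinct_leads   -- different leads_id values
FROM dbo.InvalidEmails_stg
GROUP BY invalid_emails
HAVING COUNT(*) >= 2
ORDER BY row_count DESC;
```

An email with distinct_leads greater than 1 contributes several rows to the DISTINCT query but none to the GROUP BY query, which is one way the row counts can differ.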

Here is a simple example:

invalid_emails      leads_id
[email protected]   1
[email protected]   1
[email protected]   2
[email protected]   3
[email protected]   1

The first query will return:

[email protected]   1
[email protected]   2
[email protected]   3
[email protected]   1

[email protected] is returned once because duplicates are removed.

The second will return:

[email protected]   2
[email protected]   3
[email protected]   1

[email protected] is not returned because it appears twice.
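The example can be reproduced with a sketch like this one. The addresses a@example.com through d@example.com are stand-ins chosen for illustration, and a table variable is used in place of the real table.

```sql
-- Hypothetical stand-in for dbo.InvalidEmails_stg; addresses are assumptions.
DECLARE @InvalidEmails_stg TABLE (invalid_emails varchar(100), leads_id int);

INSERT INTO @InvalidEmails_stg (invalid_emails, leads_id)
VALUES ('a@example.com', 1),
       ('a@example.com', 1),   -- duplicate pair
       ('b@example.com', 2),
       ('c@example.com', 3),
       ('d@example.com', 1);

-- First query: four rows, the duplicate (a@example.com, 1) pair collapses to one.
SELECT DISTINCT invalid_emails, leads_id
FROM @InvalidEmails_stg
ORDER BY leads_id DESC;

-- Second query: three rows (c, b, d); a@example.com is dropped
-- because its group contains two rows.
SELECT invalid_emails, MAX(leads_id) AS id
FROM @InvalidEmails_stg
GROUP BY invalid_emails
HAVING COUNT(*) < 2
ORDER BY id DESC;
```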

mhd.cs · Accepted Answer · 2019-11-09 05:26:20Z

In the first query

SELECT distinct(Invalid_Emails),[leads_id] FROM [dbo].[InvalidEmails_stg] ORDER BY LEADS_ID DESC

you don't apply the count(*) < 2 condition.

In the second query:

select invalid_emails, max(leads_id) as id from invalidEmails_stg group by invalid_emails having count(*)<2 order by id desc

HAVING count(*) < 2 filters out every invalid_emails group that contains two or more rows.

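Another difference concerns NULL values. If the Invalid_Emails column contains NULLs, the first query returns them like any other value. In the second query, GROUP BY puts all NULLs into a single group, so HAVING count(*) < 2 removes that group whenever more than one row has a NULL email.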
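A small sketch of the NULL behavior, using a made-up table variable and sample rows that are not from the question:

```sql
-- Hypothetical sample rows; names and values are assumptions.
DECLARE @emails TABLE (Invalid_Emails varchar(100), leads_id int);

INSERT INTO @emails (Invalid_Emails, leads_id)
VALUES (NULL, 1),
       (NULL, 2),
       ('a@example.com', 3);

-- DISTINCT treats NULL like any other value: all three pairs are returned.
SELECT DISTINCT Invalid_Emails, leads_id
FROM @emails;

-- GROUP BY puts both NULL rows into one group of size 2,
-- so HAVING drops it and only (a@example.com, 3) remains.
SELECT Invalid_Emails, MAX(leads_id) AS id
FROM @emails
GROUP BY Invalid_Emails
HAVING COUNT(*) < 2;
```

With only a single NULL row, the NULL group would have one row and would survive the HAVING filter, so it is the row count and not the NULL itself that decides the outcome.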

Snuka · Accepted Answer · 2019-11-09 05:44:28Z

The queries have similar intent: to get the invalid_emails values together with their leads_id.

The 2nd query uses the aggregate function MAX to bring back only the maximum leads_id per email, and uses a HAVING clause to drop every email that occurs in more than one row.

2 Comments

MCO Over a year ago

does the first one not remove dups?

Snuka Over a year ago

The first removes exact duplicate rows with DISTINCT, but it still returns every email. The second drops any email that occurs more than once, so the two are not equivalent. You can look at the query results and compare the delta records to understand how they work, and choose the one that best meets your need.
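One way to compare the delta records is EXCEPT, which returns the rows of the first result that are missing from the second. This is only a sketch and assumes the table from the question:

```sql
-- Rows returned by the DISTINCT query but not by the GROUP BY query.
SELECT DISTINCT Invalid_Emails, leads_id
FROM dbo.InvalidEmails_stg

EXCEPT

SELECT invalid_emails, MAX(leads_id)
FROM dbo.InvalidEmails_stg
GROUP BY invalid_emails
HAVING COUNT(*) < 2

ORDER BY leads_id DESC;   -- ORDER BY applies to the combined result
```

Every row in the output belongs to an email that occurs in more than one row of the table.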
