DISTINCT not reducing two equal values to one

Question

Can anyone explain the below situation, where two seemingly equal values are not reduced by DISTINCT?

The query above is SELECT DISTINCT name FROM master.sys.dm_os_spinlock_stats where name = 'SBS_UCS_DISPATCH';

The equivalent SELECT name FROM master.sys.dm_os_spinlock_stats where name = 'SBS_UCS_DISPATCH' GROUP BY name; also does the same, and adding HAVING COUNT(1) > 1 does not yield the rows.

@@VERSION is Microsoft SQL Server 2019 (RTM-CU13) (KB5005679) - 15.0.4178.1 (X64) Sep 23 2021 16:47:49 Copyright (C) 2019 Microsoft Corporation Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2016 Standard 10.0 (Build 14393: )

I suspect @MartinSmith is right about bad internal meta-data so this looks to be a bug. FWIW, the same occurs with SQL 2022 and Azure SQL Database. The underlying internal view query is SELECT * FROM OpenRowSet(TABLE DM_OS_SPINLOCKSTATS). — Dan Guzman
– Dan Guzman, Commented Aug 3, 2023 at 15:41
Replicated on Microsoft SQL Server 2019 (RTM-CU21) (KB5025808) - 15.0.4316.3 (X64) Jun 1 2023 16:32:31 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows Server 2019 Datacenter 10.0 <X64> (Build 17763: ) (Hypervisor) . — Ian Kemp
– Ian Kemp, Commented Aug 4, 2023 at 11:21

Paul White · Accepted Answer · 2023-08-04 13:59:09Z

It looks like the DISTINCT gets optimised out during simplification.

select distinct name FROM master.sys.dm_os_spinlock_stats OPTION (QUERYTRACEON 8606,QUERYTRACEON 3604)

Gives

*** Input Tree: *** LogOp_GbAgg OUT(QCOL: DM_OS_SPINLOCKSTATS.name,) BY(QCOL: DM_OS_SPINLOCKSTATS.name,) LogOp_Project LogOp_Project QCOL: DM_OS_SPINLOCKSTATS.name LogOp_Project LogOp_StreamingTabUdfRESULT(QCOL: DM_OS_SPINLOCKSTATS.name QCOL: DM_OS_SPINLOCKSTATS.collisions QCOL: DM_OS_SPINLOCKSTATS.spins QCOL: DM_OS_SPINLOCKSTATS.spins_per_collision QCOL: DM_OS_SPINLOCKSTATS.sleep_time QCOL: DM_OS_SPINLOCKSTATS.backoffs) AncOp_PrjList AncOp_PrjList AncOp_PrjList AncOp_PrjList *******************

Then the GbAgg disappears in the Simplified Tree

*** Simplified Tree: *** LogOp_StreamingTabUdfRESULT(QCOL: DM_OS_SPINLOCKSTATS.name QCOL: DM_OS_SPINLOCKSTATS.collisions QCOL: DM_OS_SPINLOCKSTATS.spins QCOL: DM_OS_SPINLOCKSTATS.spins_per_collision QCOL: DM_OS_SPINLOCKSTATS.sleep_time QCOL: DM_OS_SPINLOCKSTATS.backoffs)

I assume that there is some metadata indicating name is unique despite the query results to the contrary but I can't see how to prove that.

One way of working around this is the below.

SELECT DISTINCT name COLLATE Latin1_General_100_CI_AS FROM master.sys.dm_os_spinlock_stats WHERE name COLLATE Latin1_General_100_CI_AS = 'SBS_UCS_DISPATCH'

In general even if name was unique under one collation then it could still have duplicates with different collate semantics. The original column collation for me is Latin1_General_CI_AS so this doesn't change the case insensitive accent sensitive nature but it is enough to preserve the GbAgg in the plan.

SELECT ca.name, COUNT(*) AS Count, SERVERPROPERTY('ProductVersion') AS [version] FROM master.sys.dm_os_spinlock_stats CROSS APPLY (VALUES(name COLLATE Latin1_General_100_BIN)) ca(name) GROUP BY ca.name HAVING COUNT(*) > 1

Returns the following on my dev instance

name	Count	version
LOGPOOL_FREEBUFMGR	2	16.0.1050.5
SBS_UCS_DISPATCH	2	16.0.1050.5

So SBS_UCS_DISPATCH is not alone in being duplicated.

For me the values in the other columns are all 0 for these. So I'm unclear if the entire rows will be duplicated or not in the event that these spin locks are encountered - or whether they would need to be aggregated.

The affected types aren't ones that are documented other than "for internal use" anyway though.

So I'm unclear if the entire rows will be duplicated or not in the event that these spin locks are encountered -- I encountered it originally when I was trying to INSERT/SELECT into a history table where I had made name part of the PK. I figured I would aggregate with SUM/GROUP BY name and it still happened, so I reduced it to the most straightforward example (i.e. SELECT DISTINCT) — jimbobmcgee
– jimbobmcgee, Commented Aug 4, 2023 at 18:20
The affected types aren't ones that are documented other than "for internal use" anyway though. -- INSERT/SELECT cares not for your "internal use" :-) — jimbobmcgee
– jimbobmcgee, Commented Aug 4, 2023 at 18:22
In any case, binary-collating the column is more than sufficient for this case -- I was more surprised that there was a property of equality that was being applied in WHERE that was not being applied in DISTINCT, GROUP BY, etc. — jimbobmcgee
– jimbobmcgee, Commented Aug 4, 2023 at 18:30

Stack Exchange Network

DISTINCT not reducing two equal values to one

1 Answer 1

Hot Network Questions

DISTINCT not reducing two equal values to one

1 Answer 1

Related

Hot Network Questions