Improve SQL Query to find redundant data

Question

the following shows my sample dataset

PatientID PatientName XXX-037070002 Riger, Jens^Wicki XXX-037070002 Riger^Wicki XXX-10052 Weier,Nicole^Peggy XXX-10052 Weier,Nicole^Peppy XXX-23310 Rodem^Sieglinde XXX-23310 Sauberger, Birgit^Finja XXX-23343 Je, Ronny^Wilma XXX-23343 Jer, Ronny^Wilma XXX-2349 Kel,Andy^Juka XXX-2349 Kel^Juka XXX-2998 Hel, Frank XXX-2998 Hel,Frank^Fenris XXX-3188 Mey, Marion XXX-3188 Mey, Marion^Paula XXX-3188 Schulz^Roma XXX-3218 Böntgen-Simnet,Dr. Regine^Cara XXX-3218 Simnet,Dr. Regine^Cara XXX-3826 Mertes, Bernd Uwe^Ellie XXX-3826 Mertes,Bernd^Ellie XXX-3826 Mertes^Ellie

This is the query I got from my last request:

with d as ( select distinct patid, patname from dicomstudys ) select * from d where d.patid in ( select d.patid from d group by d.patid having count(*) > 1 )

Now I want to adjust the query that only the following data get's an output:

PatientID PatientName XXX-23310 Rodem^Sieglinde XXX-23310 Sauberger, Birgit^Finja XXX-23343 Je, Ronny^Wilma XXX-23343 Jer, Ronny^Wilma XXX-3188 Mey, Marion XXX-3188 Mey, Marion^Paula XXX-3188 Schulz^Roma XXX-3218 Böntgen-Simnet,Dr. Regine^Cara XXX-3218 Simnet,Dr. Regine^Cara

Last names are either seperated with a ',' or '^' . If last names are the same for the same PatientID then I dont want them being displayed. I tried fiddling with a sub select statement featuring a combination of CHARINDEX commands and others but my SQL syntax knowledge is very limited with the complexity of the request.

Please also note that for the case for XXX-3188 has two datasets with the same last name but also another dataset with a complete new patientName and thus it needs to be in the output.

I can't figure out what logic you want to implement. How do you choose the rows that you want returned? — Gordon Linoff
– Gordon Linoff, Commented Oct 25, 2018 at 11:07

gotqn · Accepted Answer · 2018-10-25 11:15:31Z

Try this:

DECLARE @DataSource TABLE ( [ID] VARCHAR(32) ,[Name] VARCHAR(256) ); INSERT INTO @DataSource ([ID], [Name]) VALUES ('XXX-037070002', 'Riger, Jens^Wicki') ,('XXX-037070002', 'Riger^Wicki') ,('XXX-10052', 'Weier,Nicole^Peggy') ,('XXX-10052', 'Weier,Nicole^Peppy') ,('XXX-23310', 'Rodem^Sieglinde') ,('XXX-23310', 'Sauberger, Birgit^Finja') ,('XXX-23343', 'Je, Ronny^Wilma') ,('XXX-23343', 'Jer, Ronny^Wilma') ,('XXX-2349', 'Kel,Andy^Juka') ,('XXX-2349', 'Kel^Juka') ,('XXX-2998', 'Hel, Frank') ,('XXX-2998', 'Hel,Frank^Fenris') ,('XXX-3188', 'Mey, Marion') ,('XXX-3188', 'Mey, Marion^Paula') ,('XXX-3188', 'Schulz^Roma') ,('XXX-3218', 'Böntgen-Simnet,Dr. Regine^Cara') ,('XXX-3218', 'Simnet,Dr. Regine^Cara') ,('XXX-3826', 'Mertes, Bernd Uwe^Ellie') ,('XXX-3826', 'Mertes,Bernd^Ellie') ,('XXX-3826', 'Mertes^Ellie'); WITH DataSource AS ( SELECT [ID] ,[Name] ,COUNT(*) OVER (PARTITION BY [ID], LTRIM(RTRIM(SUBSTRING([Name], 0, CHARINDEX(',', REPLACE([Name], '^', ',')))))) AS [ID_Name_Count] ,COUNT(*) OVER (PARTITION BY [ID]) AS [ID_Count] ,LTRIM(RTRIM(SUBSTRING([Name], 0, CHARINDEX(',', REPLACE([Name], '^', ','))))) AS [FamilyName] FROM @DataSource ) SELECT [ID] ,[Name] FROM DataSource WHERE [ID_Name_Count] = 1 AND [ID_Count] = 2 OR [ID] IN ( SELECT [ID] FROM DataSource GROUP BY [ID] HAVING COUNT(DISTINCT [FamilyName]) > 1 );

Тhe solution is pretty easy. Here are the interesting parts:

replace the ^ with , in order to simplify the last name extraction
extract the last name and calculation count based on ID and last name
in the final select check for unique id-last name pairs with id count equal to 2 and add ids with more then one unique family names (your special case)

Jacek Wróbel · Accepted Answer · 2018-10-25 09:11:48Z

You can try something like that:

Test data

drop table if exists #Patient; create table #Patient ( PatientID varchar(20), PatientName varchar(50) ); insert into #Patient(PatientID,PatientName) values ('XXX-037070002' ,'Riger, Jens^Wicki'), ('XXX-037070002' ,'Riger^Wicki'), ('XXX-10052' ,'Weier,Nicole^Peggy'), ('XXX-10052' ,'Weier,Nicole^Peppy'), ('XXX-23310' ,'Rodem^Sieglinde'), ('XXX-23310' ,'Sauberger, Birgit^Finja'), ('XXX-23343' ,'Je, Ronny^Wilma'), ('XXX-23343' ,'Jer, Ronny^Wilma'), ('XXX-2349' ,'Kel,Andy^Juka'), ('XXX-2349' ,'Kel^Juka'), ('XXX-2998' ,'Hel, Frank'), ('XXX-2998' ,'Hel,Frank^Fenris'), ('XXX-3188' ,'Mey, Marion'), ('XXX-3188' ,'Mey, Marion^Paula'), ('XXX-3188' ,'Schulz^Roma'), ('XXX-3218' ,'Böntgen-Simnet,Dr. Regine^Cara'), ('XXX-3218' ,'Simnet,Dr. Regine^Cara'), ('XXX-3826' ,'Mertes, Bernd Uwe^Ellie'), ('XXX-3826' ,'Mertes,Bernd^Ellie'), ('XXX-3826' ,'Mertes^Ellie');

My solution

with q1 as ( select PatientID, PatientName, case when CHARINDEX(',',REPLACE( PatientName, '^',',')) > 0 then LEFT(PatientName,CHARINDEX(',',REPLACE( PatientName, '^',','))-1) else PatientName end as FullName from #Patient ) , q2 as ( select PatientID from q1 group by PatientID having COUNT(1) > 1 and COUNT(DISTINCT FullName) > 1 ) select t.PatientID,t.PatientName from #Patient t join q2 on t.PatientID = q2.PatientID;

Thanks, but doesnt work as intended. It still gives me double results.

Collectives™ on Stack Overflow

Improve SQL Query to find redundant data

2 Answers 2

Comments

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Linked

Related