Removing duplicate rows that have the same keys in different columns

Question

I am writing a query to find duplicate Sub_Brand_Descriptions in the same table that share the same brand_code. I was able to produce the table below (which is what I need); however, the first and second rows are the same pair because I am joining the table with itself. Is there a way to remove one of the duplicated rows? DISTINCT will not work: each row has a different sub-brand ID in a given column, but because the same IDs appear in the other row in swapped columns, the two rows are technically the same.

select distinct brands.BRAND_ID as Brand_Code,
       sub.SUB_BRAND_ID as Sub_Brand_ID1,
       sub.SUB_BRAND as Sub_Brand_Descrption1,
       sub2.SUB_BRAND_ID as Sub_Brand_ID2,
       sub2.SUB_BRAND as Sub_Brand_Descrption2
from table1 as brands
inner join table2 as sub
        on sub.BRAND_ID = brands.BRAND_ID and sub.LANGU = 'E'
inner join table2 as sub2
        on sub2.SUB_BRAND = sub.SUB_BRAND and sub2.LANGU = 'E'
where sub.SUB_BRAND_ID != sub2.SUB_BRAND_ID
  and sub.BRAND_ID = sub2.BRAND_ID

Brand_Code	Sub_Brand_ID1	Sub_Brand_Descrption1	Sub_Brand_ID2	Sub_Brand_Descrption2
ABC	X123	X123ABC	Y123	X123ABC
ABC	Y123	X123ABC	X123	X123ABC

Desired output:

Brand_Code	Sub_Brand_ID1	Sub_Brand_Descrption1	Sub_Brand_ID2	Sub_Brand_Descrption2
ABC	X123	X123ABC	Y123	X123ABC

Source data: Table 1:

Brand_ID	label
ABC	1
CDE	1
EFG	2

Source Table 2:

Brand_ID	Sub_Brand_ID	Sub_Brand	Language
ABC	X123	X123ABC	E
ABC	Y123	X123ABC	E
BBC	X223	H23ABC	E
BBC	Y223	H23ABC	E
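A common way to drop mirrored pairs from a self-join, which is not taken from either answer below, is to replace "!=" with "<" so that only the ordering with the smaller ID on the left survives. The sketch below runs the question's own query with that one change against an in-memory SQLite copy of the sample data, using Python's standard sqlite3 module as a harness. It assumes the "Language" column in the sample table is the LANGU column used in the query; the SQL itself is plain enough that it should carry over to SQL Server unchanged, though that is untested here.

```python
import sqlite3

# In-memory copy of the question's sample data. Column names follow the
# question's query (BRAND_ID, SUB_BRAND_ID, SUB_BRAND, LANGU); the "Language"
# header in the tabular sample is assumed to be the LANGU column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (BRAND_ID TEXT, label INTEGER);
    INSERT INTO table1 VALUES ('ABC', 1), ('CDE', 1), ('EFG', 2);

    CREATE TABLE table2 (BRAND_ID TEXT, SUB_BRAND_ID TEXT,
                         SUB_BRAND TEXT, LANGU TEXT);
    INSERT INTO table2 VALUES
        ('ABC', 'X123', 'X123ABC', 'E'),
        ('ABC', 'Y123', 'X123ABC', 'E'),
        ('BBC', 'X223', 'H23ABC', 'E'),
        ('BBC', 'Y223', 'H23ABC', 'E');
""")

# The question's self-join with "<" in place of "!=": of the two mirrored
# rows (X123, Y123) and (Y123, X123), only the one with the smaller ID
# on the left is kept.
rows = conn.execute("""
    SELECT brands.BRAND_ID   AS Brand_Code,
           sub.SUB_BRAND_ID  AS Sub_Brand_ID1,
           sub.SUB_BRAND     AS Sub_Brand_Descrption1,
           sub2.SUB_BRAND_ID AS Sub_Brand_ID2,
           sub2.SUB_BRAND    AS Sub_Brand_Descrption2
    FROM table1 AS brands
    INNER JOIN table2 AS sub
            ON sub.BRAND_ID = brands.BRAND_ID AND sub.LANGU = 'E'
    INNER JOIN table2 AS sub2
            ON sub2.SUB_BRAND = sub.SUB_BRAND AND sub2.LANGU = 'E'
    WHERE sub.SUB_BRAND_ID < sub2.SUB_BRAND_ID
      AND sub.BRAND_ID = sub2.BRAND_ID
    ORDER BY Brand_Code, Sub_Brand_ID1, Sub_Brand_ID2
""").fetchall()

for row in rows:
    print(row)  # BBC is absent because it has no row in table1
```

If three sub-brand IDs share one description, this version returns each of the three possible pairs once instead of collapsing them to a single row, which may or may not be what you want.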

Please provide sample data and actual desired results, ideally a Minimal, Reproducible Example.
– Stu, Commented Nov 1, 2021 at 21:04
@Stu - added the desired output. Essentially, table2 contains both the sub_brandID and the sub_brand_Description, and I want to look for duplicate sub_brand_descriptions within the table itself.
– BKB, Commented Nov 1, 2021 at 21:11
No, you've added the desired output for your current output, which is not the same as the desired output for your actual source data; with what you've provided, the answer is simply select distinct.
– Stu, Commented Nov 1, 2021 at 21:13
How will you handle situations where there are more than two Sub_Brand_ID values for one Sub_Brand value?
– Eric Brandt, Commented Nov 1, 2021 at 21:29

Himanshu Agrawal · Accepted Answer · 2021-11-01 21:42:32Z

Add a row_number() column to your query and keep only the rows where it equals 1, as in the example below (adjust the order by clause to suit your requirements):

with cte as (
    select distinct brands.BRAND_ID as Brand_Code,
           sub.SUB_BRAND_ID as Sub_Brand_ID1,
           sub.SUB_BRAND as Sub_Brand_Descrption1,
           sub2.SUB_BRAND_ID as Sub_Brand_ID2,
           sub2.SUB_BRAND as Sub_Brand_Descrption2,
           -- a window function cannot use aliases from the same SELECT
           row_number() over (partition by sub.BRAND_ID, sub.SUB_BRAND
                              order by sub.SUB_BRAND_ID, sub2.SUB_BRAND_ID) as rn
    from table1 as brands
    inner join table2 as sub
            on sub.BRAND_ID = brands.BRAND_ID and sub.LANGU = 'E'
    inner join table2 as sub2
            on sub2.SUB_BRAND = sub.SUB_BRAND and sub2.LANGU = 'E'
    where sub.SUB_BRAND_ID != sub2.SUB_BRAND_ID
      and sub.BRAND_ID = sub2.BRAND_ID
)
select * from cte where rn = 1;

OR

select * from (
    select distinct brands.BRAND_ID as Brand_Code,
           sub.SUB_BRAND_ID as Sub_Brand_ID1,
           sub.SUB_BRAND as Sub_Brand_Descrption1,
           sub2.SUB_BRAND_ID as Sub_Brand_ID2,
           sub2.SUB_BRAND as Sub_Brand_Descrption2,
           -- a window function cannot use aliases from the same SELECT
           row_number() over (partition by sub.BRAND_ID, sub.SUB_BRAND
                              order by sub.SUB_BRAND_ID, sub2.SUB_BRAND_ID) as rn
    from table1 as brands
    inner join table2 as sub
            on sub.BRAND_ID = brands.BRAND_ID and sub.LANGU = 'E'
    inner join table2 as sub2
            on sub2.SUB_BRAND = sub.SUB_BRAND and sub2.LANGU = 'E'
    where sub.SUB_BRAND_ID != sub2.SUB_BRAND_ID
      and sub.BRAND_ID = sub2.BRAND_ID
) as temp
where rn = 1
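To check the derived-table (no CTE) form of the row_number() approach, the sketch below runs it against an in-memory SQLite copy of the question's sample data via Python's sqlite3 module. Window functions need SQLite 3.25 or later. Note that it partitions on the underlying columns (sub.BRAND_ID, sub.SUB_BRAND) rather than on the Sub_Brand_Descrption1 alias, since an alias defined in a SELECT list is not visible to a window function in that same list; this partitioning choice is an assumption about the intended grouping.

```python
import sqlite3

# In-memory copy of the question's sample data (LANGU assumed to be the
# "Language" column shown in the question's table).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (BRAND_ID TEXT, label INTEGER);
    INSERT INTO table1 VALUES ('ABC', 1), ('CDE', 1), ('EFG', 2);

    CREATE TABLE table2 (BRAND_ID TEXT, SUB_BRAND_ID TEXT,
                         SUB_BRAND TEXT, LANGU TEXT);
    INSERT INTO table2 VALUES
        ('ABC', 'X123', 'X123ABC', 'E'),
        ('ABC', 'Y123', 'X123ABC', 'E'),
        ('BBC', 'X223', 'H23ABC', 'E'),
        ('BBC', 'Y223', 'H23ABC', 'E');
""")

# Number the mirrored rows within each (brand, description) group and keep
# the first one; the outer SELECT over a derived table replaces the CTE.
rows = conn.execute("""
    SELECT Brand_Code, Sub_Brand_ID1, Sub_Brand_Descrption1,
           Sub_Brand_ID2, Sub_Brand_Descrption2
    FROM (
        SELECT brands.BRAND_ID   AS Brand_Code,
               sub.SUB_BRAND_ID  AS Sub_Brand_ID1,
               sub.SUB_BRAND     AS Sub_Brand_Descrption1,
               sub2.SUB_BRAND_ID AS Sub_Brand_ID2,
               sub2.SUB_BRAND    AS Sub_Brand_Descrption2,
               ROW_NUMBER() OVER (
                   PARTITION BY sub.BRAND_ID, sub.SUB_BRAND
                   ORDER BY sub.SUB_BRAND_ID, sub2.SUB_BRAND_ID
               ) AS rn
        FROM table1 AS brands
        INNER JOIN table2 AS sub
                ON sub.BRAND_ID = brands.BRAND_ID AND sub.LANGU = 'E'
        INNER JOIN table2 AS sub2
                ON sub2.SUB_BRAND = sub.SUB_BRAND AND sub2.LANGU = 'E'
        WHERE sub.SUB_BRAND_ID <> sub2.SUB_BRAND_ID
          AND sub.BRAND_ID = sub2.BRAND_ID
    ) AS x
    WHERE x.rn = 1
""").fetchall()
print(rows)
```

With three or more IDs sharing a description, this keeps only one pair per brand and description, whereas a "<" self-join would keep every distinct pair.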

Thank you, but is there a way to do this without a temp table? This will need to be used in another tool, which does not allow CTEs.
Based on your source data, you can apply row_number() to the rows in table2 first and then join with table1, keeping only the rows where row_number() equals 1.
Replace "with cte as (" with "SELECT * FROM (", and ") Select * from cte where rn=1" with ") AS x WHERE x.rn=1", and you eliminate the need for a CTE.
@criticalerror Yes, I was trying to point the OP to the same thing, in case the other tool supports subqueries. I have edited the code accordingly. Thanks!

Caius Jard · Accepted Answer · 2021-11-01 22:09:50Z

You don't have to join twice; joining once and pivoting the result will do:

SELECT brands.brand_code,
       MAX(CASE WHEN rn = 1 THEN sub.sub_brand_id END) as Sub_Brand_ID1,
       MAX(CASE WHEN rn = 1 THEN sub.sub_brand END) as Sub_Brand_Descrption1,
       MAX(CASE WHEN rn = 2 THEN sub.sub_brand_id END) as Sub_Brand_ID2,
       MAX(CASE WHEN rn = 2 THEN sub.sub_brand END) as Sub_Brand_Descrption2
from table1 brands
INNER JOIN (SELECT *, row_number() over (partition by Brand_Code order by sub_brand_id) as rn
            from table2
            where langu = 'E') sub  -- filter before numbering so other languages don't take rn 1 or 2
        on sub.brand_code = brands.brand_code
GROUP BY brands.brand_code

I'm presuming you want other columns from table1; otherwise, ditch it and run the whole query against table2, without a join:

SELECT sub.brand_code,
       MAX(CASE WHEN rn = 1 THEN sub.sub_brand_id END) as Sub_Brand_ID1,
       MAX(CASE WHEN rn = 1 THEN sub.sub_brand END) as Sub_Brand_Descrption1,
       MAX(CASE WHEN rn = 2 THEN sub.sub_brand_id END) as Sub_Brand_ID2,
       MAX(CASE WHEN rn = 2 THEN sub.sub_brand END) as Sub_Brand_Descrption2
from (SELECT *, row_number() over (partition by Brand_Code order by sub_brand_id) as rn
      from table2
      WHERE langu = 'E') sub  -- filter before numbering so other languages don't take rn 1 or 2
GROUP BY sub.brand_code

PS: your question has some apparent typos:

You'll need to resolve these yourself, as I have no idea which version is accurate.
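The pivot above is conditional aggregation: number each brand's sub-brands, then use MAX(CASE WHEN rn = n ...) to spread rows 1 and 2 into side-by-side columns. The sketch below runs the table2-only variant against an in-memory SQLite copy of the sample data via Python's sqlite3 module (SQLite 3.25+ for window functions). It uses the question's column names (BRAND_ID, SUB_BRAND_ID, SUB_BRAND, LANGU) instead of the answer's brand_code, on the assumption that these refer to the same columns, as the comments below suggest.

```python
import sqlite3

# Sample table2 from the question (LANGU assumed to be its "Language" column).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (BRAND_ID TEXT, SUB_BRAND_ID TEXT,
                         SUB_BRAND TEXT, LANGU TEXT);
    INSERT INTO table2 VALUES
        ('ABC', 'X123', 'X123ABC', 'E'),
        ('ABC', 'Y123', 'X123ABC', 'E'),
        ('BBC', 'X223', 'H23ABC', 'E'),
        ('BBC', 'Y223', 'H23ABC', 'E');
""")

# Number each brand's sub-brands (language filter applied first), then pivot
# rn = 1 and rn = 2 into side-by-side columns with MAX(CASE ...).
rows = conn.execute("""
    SELECT sub.BRAND_ID AS Brand_Code,
           MAX(CASE WHEN rn = 1 THEN sub.SUB_BRAND_ID END) AS Sub_Brand_ID1,
           MAX(CASE WHEN rn = 1 THEN sub.SUB_BRAND END)    AS Sub_Brand_Descrption1,
           MAX(CASE WHEN rn = 2 THEN sub.SUB_BRAND_ID END) AS Sub_Brand_ID2,
           MAX(CASE WHEN rn = 2 THEN sub.SUB_BRAND END)    AS Sub_Brand_Descrption2
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY BRAND_ID
                                  ORDER BY SUB_BRAND_ID) AS rn
        FROM table2
        WHERE LANGU = 'E'
    ) AS sub
    GROUP BY sub.BRAND_ID
    ORDER BY sub.BRAND_ID
""").fetchall()

for row in rows:
    print(row)
```

As written, this pairs the first two sub-brands of each brand whatever their descriptions. If only sub-brands with matching descriptions should be paired, one option (a variant, not part of the answer) is to partition and group by both BRAND_ID and SUB_BRAND and add HAVING COUNT(*) > 1.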

I tried the first solution (table1 is still required, as I need to display the label from it), but I got this error: Msg 4104, Level 16, State 1, Line 10 The multi-part identifier "brands.brand_id" could not be bound.
@bkb Yeah, your column names are really confusing; it's easy to make a typo with so many similar names. My answer doesn't contain brand_id, though; are you sure you ran it correctly, or did you make a mistake when adjusting the column names? (Your question claims the table1 column name is brand_code in the tabular representation, but brand_id in the code; see screenshot.) In summary: I'm confident in the logic, but you might have to fix some typos in the column names.
Thanks for pointing that out. Yes, brand_code and brand_id are the same. I've managed to get results from your query; however, I'm not getting results where sub_brand_description1 and sub_brand_description2 are identical, and some of the results have null values. Should there be an extra step to match the sub_brand_descriptions?
Can you create an SQL Fiddle with some example data? I can't quite work out what the problem is from the way it is described. As long as there are at least two sub-brand rows, this query returns both. You'll get nulls in sub_brand_x_2 if there is only one sub-brand record for a given brand. Personally, I would leave it null so you can tell there is only one, or handle it in your front end, but if you want the data repeated you can use COALESCE(MAX(CASE WHEN rn = 2 ... END), MAX(CASE WHEN rn = 1 ... END)), which effectively means "if there is no second sub-brand, use sub-brand 1's details again".
Oh, and if you modify an answer when integrating it and it develops a problem, make sure it's not because of your modifications. One of the classic problems with changing example data (e.g. to hide the real column/table names) is that any answers you get have to be back-converted by you, which introduces extra potential for error.
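The COALESCE fallback suggested in the comments can be checked with a small sketch in Python's sqlite3 module (SQLite 3.25+ for window functions). The brand 'CDE' with a single sub-brand 'Z999' is hypothetical, added only to produce the one-sub-brand case; the Sub_Brand_ID2_raw column name is likewise made up to show the plain pivot's NULL next to the COALESCE result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ABC rows come from the question; the CDE row is hypothetical, added so one
# brand has only a single sub-brand.
conn.executescript("""
    CREATE TABLE table2 (BRAND_ID TEXT, SUB_BRAND_ID TEXT,
                         SUB_BRAND TEXT, LANGU TEXT);
    INSERT INTO table2 VALUES
        ('ABC', 'X123', 'X123ABC', 'E'),
        ('ABC', 'Y123', 'X123ABC', 'E'),
        ('CDE', 'Z999', 'Z999CDE', 'E');
""")

rows = conn.execute("""
    SELECT BRAND_ID AS Brand_Code,
           MAX(CASE WHEN rn = 1 THEN SUB_BRAND_ID END) AS Sub_Brand_ID1,
           -- Plain pivot: NULL when the brand has no second sub-brand.
           MAX(CASE WHEN rn = 2 THEN SUB_BRAND_ID END) AS Sub_Brand_ID2_raw,
           -- COALESCE falls back to sub-brand 1 when there is no second one.
           COALESCE(MAX(CASE WHEN rn = 2 THEN SUB_BRAND_ID END),
                    MAX(CASE WHEN rn = 1 THEN SUB_BRAND_ID END)) AS Sub_Brand_ID2
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY BRAND_ID
                                  ORDER BY SUB_BRAND_ID) AS rn
        FROM table2
        WHERE LANGU = 'E'
    ) AS sub
    GROUP BY BRAND_ID
    ORDER BY BRAND_ID
""").fetchall()

for row in rows:
    print(row)
```

Keeping the raw column alongside the COALESCE result makes the trade-off in the comment visible: the NULL tells you a brand has only one sub-brand, while the repeated value hides that fact.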
