finding duplicate records in a table in oracle

Question

Consider the scenario of loading of a table from a flat file. the table has no constraints or indexes defined.Somehow in between loading was interrupted and after some time the table was again loaded from the same file. So this time the records already inserted during first loading were duplicated. how to find the duplicate rows now ? assume there are 150 columns in the table so group by each and every column is tedious

Is there a primary key, or even fields which indicate a unique constraint, if only logical? Why not just delete all the records and start over? use a natural join a cte and row number? — xQbert
– xQbert, Commented Dec 26, 2014 at 20:58
Maybe calculate the hash value of your record at insert, you can use the hash value to quickly find duplicates. you can even use this hash as a primary if you like so you don't insert duplicates — Tien Dinh
– Tien Dinh, Commented Dec 27, 2014 at 20:11

Community · Accepted Answer · 2017-05-23 12:28:27Z

A record is truly duplicate only if all the column values match. It becomes different or unique even if 1 column has a different value. If your table has no primary constraints, you must compare all columns. An alternative way could be that you could do your 2nd load on a new temp table and populate your old table with records from this temp table where the records do not exist in the old table. In any case you have to compare all columns between the 2 tables to identify truly unique records.

You could also consider adding a primary key to your table and then running your delete query. Check the accepted answer on this link

void · Accepted Answer · 2014-12-26 21:05:27Z

You can use ROWID for deleting duplicate rows;

Select * FROM table_name A WHERE a.rowid > ANY ( SELECT B.rowid FROM table_name B WHERE A.col1 = B.col1 AND A.col2 = B.col2 );

here is a useful link: [http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm

xQbert · Accepted Answer · 2014-12-26 21:23:25Z

Tested... Appears to work...

1st we get a list of the table columns in a comma separated list

SELECT wm_concat(column_Name) FROM all_tab_cols WHERE table_name = 'TABLENAME'Select and Column_ID is not null;

copy the results into query below where ResultList is defined. adjust 'Tablename' to your table.

WITH CTE AS (SELECT TN.*, RowNum RN from 'TableName' TN order by ResultList), SELECT * FROM CTE A INNER JOIN CTE B using (ResultList) WHERE A.RN <> B.RN

The above uses natrual joins to join all the tables columns to the same table columns and since duplicate rows will have different row numbers, the result set will list both offending records.

Gary_W · Accepted Answer · 2014-12-26 21:43:00Z

0

I got this snippet somewhere along the line for deleting dups:

DELETE FROM TABLE_NAME WHERE ROWID IN (SELECT ROWID FROM TABLE_NAME MINUS SELECT MIN(ROWID) FROM TABLE_NAME GROUP BY <column list> );

Note the column_list lists the columns that are used to determine uniqueness.

answered Dec 26, 2014 at 21:43

Gary_W

10.4k1 gold badge26 silver badges42 bronze badges

2 Comments

Sundararaman P Over a year ago

yes, i am aware of this method, as I mentioned in the question if suppose the table contains 150 columns it is tedious to add each of them in the group by list. also this is a table with no constraints or indexes defined on it. in such a case is there any means of acheiving the same.

Gary_W Over a year ago

If you loaded with SQL*Loader the log file will tell you how many rows were loaded. When reloading, use the SKIP= option to skip what was already loaded. Another method to ease the hassle if you ever have to reload, make sure your table has a load_date column or a load sequence ID you set for each load. Then its a snap to delete where the date or ID matches the last load, then reload.

Eric Hauenstein · Accepted Answer · 2014-12-29 15:34:37Z

0

Select * FROM table_name A WHERE a.rowid > ( SELECT min (B.rowid) FROM table_name B WHERE A.row_id = B.row_id );

edited Dec 29, 2014 at 15:34

Eric Hauenstein

2,5656 gold badges35 silver badges43 bronze badges

answered Dec 29, 2014 at 15:04

arpit

111 silver badge4 bronze badges

1 Comment

Eric Hauenstein Over a year ago

Perhaps you could explain what is going on in your code, to help others.

Dharmesh Porwal · Accepted Answer · 2015-01-03 07:25:54Z

Suppose you are having a test table(table in which you moved the record using flat file) dummd which is having multiple columns (like 150 and you are not sure which column is unique or primary )and duplicate rows so to find all the unique records you can use union and then create a view or new table like i did as test1 :-

 create table test1 as select * from dummd union select * from dummd

Collectives™ on Stack Overflow

finding duplicate records in a table in oracle

6 Answers 6

Comments

1 Comment

Comments

2 Comments

1 Comment

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

Comments

1 Comment

Comments

2 Comments

1 Comment

Comments

Linked

Related