Consider the scenario of loading of a table from a flat file. the table has no constraints or indexes defined.Somehow in between loading was interrupted and after some time the table was again loaded from the same file. So this time the records already inserted during first loading were duplicated. how to find the duplicate rows now ? assume there are 150 columns in the table so group by each and every column is tedious
- Is there a primary key, or even fields which indicate a unique constraint, if only logical? Why not just delete all the records and start over? use a natural join a cte and row number?xQbert– xQbert2014-12-26 20:58:23 +00:00Commented Dec 26, 2014 at 20:58
- Maybe calculate the hash value of your record at insert, you can use the hash value to quickly find duplicates. you can even use this hash as a primary if you like so you don't insert duplicatesTien Dinh– Tien Dinh2014-12-27 20:11:58 +00:00Commented Dec 27, 2014 at 20:11
6 Answers
A record is truly duplicate only if all the column values match. It becomes different or unique even if 1 column has a different value. If your table has no primary constraints, you must compare all columns. An alternative way could be that you could do your 2nd load on a new temp table and populate your old table with records from this temp table where the records do not exist in the old table. In any case you have to compare all columns between the 2 tables to identify truly unique records.
You could also consider adding a primary key to your table and then running your delete query. Check the accepted answer on this link
Comments
You can use ROWID for deleting duplicate rows;
Select * FROM table_name A WHERE a.rowid > ANY ( SELECT B.rowid FROM table_name B WHERE A.col1 = B.col1 AND A.col2 = B.col2 ); here is a useful link: [http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm
1 Comment
Tested... Appears to work...
1st we get a list of the table columns in a comma separated list
SELECT wm_concat(column_Name) FROM all_tab_cols WHERE table_name = 'TABLENAME'Select and Column_ID is not null; copy the results into query below where ResultList is defined. adjust 'Tablename' to your table.
WITH CTE AS (SELECT TN.*, RowNum RN from 'TableName' TN order by ResultList), SELECT * FROM CTE A INNER JOIN CTE B using (ResultList) WHERE A.RN <> B.RN The above uses natrual joins to join all the tables columns to the same table columns and since duplicate rows will have different row numbers, the result set will list both offending records.
Comments
I got this snippet somewhere along the line for deleting dups:
DELETE FROM TABLE_NAME WHERE ROWID IN (SELECT ROWID FROM TABLE_NAME MINUS SELECT MIN(ROWID) FROM TABLE_NAME GROUP BY <column list> ); Note the column_list lists the columns that are used to determine uniqueness.
2 Comments
Select * FROM table_name A WHERE a.rowid > ( SELECT min (B.rowid) FROM table_name B WHERE A.row_id = B.row_id ); 1 Comment
Suppose you are having a test table(table in which you moved the record using flat file) dummd which is having multiple columns (like 150 and you are not sure which column is unique or primary )and duplicate rows so to find all the unique records you can use union and then create a view or new table like i did as test1 :-
create table test1 as select * from dummd union select * from dummd