Remove duplicate from a table

Question

The database type is PostGres 8.3.

If I wrote:

SELECT field1, field2, field3, count(*) FROM table1 GROUP BY field1, field2, field3 having count(*) > 1;

I have some rows that have a count over 1. How can I take out the duplicate (I do still want 1 row for each of them instead of +1 row... I do not want to delete them all.)

Example:

1-2-3 1-2-3 1-2-3 2-3-4 4-5-6

Should become :

1-2-3 2-3-4 4-5-6

The only answer I found is there but I am wondering if I could do it without hash column.

Warning I do not have a PK with an unique number so I can't use the technique of min(...). The PK is the 3 fields.

I am not sure whether I understand it right. You say "The PK is the 3 fields" - then how come you have duplicates in the table like multiple 1-2-3 records. Correct me if I am wrong. — Vijay Dev
– Vijay Dev, Commented Oct 28, 2008 at 15:02
The PK was in the 3 fields, we had to remove them for a merge (long story) now we need to put it back. We have some duplicate that we would like to take off. — Patrick Desjardins
– Patrick Desjardins, Commented Oct 28, 2008 at 15:10

Tom H · Accepted Answer · 2008-10-28 19:45:53Z

This is one of many reasons that all tables should have a primary key (not necessarily an ID number or IDENTITY, but a combination of one or more columns that uniquely identifies a row and which has its uniqueness enforced in the database).

Your best bet is something like this:

SELECT field1, field2, field3, count(*) INTO temp_table1 FROM table1 GROUP BY field1, field2, field3 having count(*) > 1 DELETE T1 FROM table1 T1 INNER JOIN (SELECT field1, field2, field3 FROM table1 GROUP BY field1, field2, field3 having count(*) > 1) SQ ON SQ.field1 = T1.field1 AND SQ.field2 = T1.field2 AND SQ.field3 = T1.field3 INSERT INTO table1 (field1, field2, field3) SELECT field1, field2, field3 FROM temp_table1 DROP TABLE temp_table1

It works thx, I have modified few thing like DELETE T1 doesnt work, I had to put the T1 after FROM table1 as T1. Few things like that. But the solution was fast and successful here.
I'm glad that it worked for you. Thanks for pointing out my overlooking the T1 alias. I've corrected it in the script.

Jonathan Leffler · Accepted Answer · 2008-10-28 14:58:14Z

One possible answer is:

CREATE <temporary table> (<correct structure for table being cleaned>); BEGIN WORK; -- if needed INSERT INTO <temporary table> SELECT DISTINCT * FROM <source table>; DELETE FROM <source table> INSERT INTO <source table> SELECT * FROM <temporary table>; COMMIT WORK; -- needed DROP <temporary table>;

I'm not sure whether the 'work' is needed on transaction statements, nor whether the explicit BEGIN is necessary in PostgreSQL. But the concept applies to any DBMS.

The only thing to beware of is referential constraints and in particular triggered delete operations. If those exist, this may prove less satisfactory.

Brian Schmitt · Accepted Answer · 2008-10-28 15:00:57Z

This will use the OID Object ID (if the table was created with it):

DELETE FROM table1 WHERE OID NOT IN (SELECT MIN (OID) FROM table1 GROUP BY field1, field2, field3)

Guillaume · Accepted Answer · 2008-10-28 15:04:28Z

0

Well I should misunderstand something but I'll say :

SELECT DISTINCT field1, field2, field3 FROM table1

Too easy to be good? ^^

answered Oct 28, 2008 at 15:04

Guillaume

1 Comment

Vijay Dev Over a year ago

I think Daok wants to delete all but one of the duplicate entries.

Aaron Palmer · Accepted Answer · 2008-10-28 15:30:26Z

This is the simplest method I've found:

Postgre SQL syntax:

CREATE TABLE tmp AS SELECT distinct * FROM table1 truncate table table1 insert into table1 select * from tmp drop table tmp

T-SQL syntax:

select distinct * into #tmp from table1 truncate table table1 insert into table1 select * from #tmp drop table #tmp

Vijay Dev · Accepted Answer · 2014-03-04 20:36:24Z

A good Answer for this problem, but for SQL Server. It uses the ROWCOUNT that SQL Server offers, to good effect. I have never used PostgreSQL and hence don't know the equivalent of ROWCOUNT in PostgreSQL.

Ta01 · Accepted Answer · 2014-03-04 20:36:42Z

Using TSQL, no idea if Postgres supports temp tables but you could select into a temp table, and then loop through and delete and insert your results back into the original

-- **Disclaimer** using TSQL -- You could select your records into a temp table with a pk Create Table #dupes ([id] int not null identity(1,1), f1 int, f2 int, f3 int) Insert Into #dupes (f1,f2,f3) values (1,2,3) Insert Into #dupes (f1,f2,f3) values (1,2,3) Insert Into #dupes (f1,f2,f3) values (1,2,3) Insert Into #dupes (f1,f2,f3) values (2,3,4) Insert Into #dupes (f1,f2,f3) values (4,5,6) Insert Into #dupes (f1,f2,f3) values (4,5,6) Insert Into #dupes (f1,f2,f3) values (4,5,6) Insert Into #dupes (f1,f2,f3) values (7,8,9) Select f1,f2,f3 From #dupes Declare @rowCount int Declare @counter int Set @counter = 1 Set @rowCount = (Select Count([id]) from #dupes) while (@counter < @rowCount + 1) Begin Delete From #dupes Where [Id] <> (Select [id] From #dupes where [id]=@counter) and ( [f1] = (Select [f1] from #dupes where [id]=@counter) and [f2] = (Select [f2] from #dupes where [id]=@counter) and [f3] = (Select [f3] from #dupes where [id]=@counter) ) Set @counter = @counter + 1 End Select f1,f2,f3 From #dupes -- You could take these results and pump them back into --your original table Drop Table #dupes

Tested this on MS SQL Server 2000. Not familiar with Postgres' options but maybe this will lead you in a right direction.

Collectives™ on Stack Overflow

Remove duplicate from a table

7 Answers 7

3 Comments

Comments

Comments

1 Comment

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

3 Comments

Comments

Comments

1 Comment

Comments

Comments

Comments

Linked

Related