If most of the rows are to be deleted (mostly dupes) and the table fits into RAM, consider this route:
1. SELECT surviving rows into a temporary table.
2. Reroute FK references to survivors.
3. DELETE all rows from the base table.
4. Re-INSERT survivors.
1a. Distill surviving rows
CREATE TEMP TABLE tmp AS
SELECT DISTINCT ON (login_name, password) *
FROM  (
   SELECT DISTINCT ON (email) *
   FROM   taccounts
   ORDER  BY email, last_login DESC
   ) sub
ORDER  BY login_name, password, last_login DESC;
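About DISTINCT ON: it keeps only the first row of each group of rows that match on the listed expressions, where "first" is decided by ORDER BY.
To identify duplicates by two different criteria, use a subquery and apply the two rules one after the other. The inner step keeps the account with the latest last_login per email; the outer step then keeps the latest per (login_name, password) among those survivors. Since the newest account always survives the first step, the two rules can be applied in sequence ("serializable").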
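To see how the two passes interact, here is a minimal sketch on made-up data. The table demo_accounts and its rows are hypothetical stand-ins for taccounts, not taken from the question:

```sql
-- Hypothetical sample data; demo_accounts stands in for taccounts.
CREATE TEMP TABLE demo_accounts (
   account_id int PRIMARY KEY
 , email      text
 , login_name text
 , password   text
 , last_login timestamp
);

INSERT INTO demo_accounts VALUES
   (1, 'a@example.com', 'alice' , 'pw1', '2020-01-01')
 , (2, 'a@example.com', 'alice2', 'pw2', '2020-02-01')  -- same email as 1, newer
 , (3, 'b@example.com', 'alice2', 'pw2', '2020-03-01')  -- same login/password as 2, newer
 , (4, 'c@example.com', 'carol' , 'pw3', '2020-01-15'); -- no duplicate

-- Inner pass: one row per email, latest last_login wins         -> keeps 2, 3, 4
-- Outer pass: one row per (login_name, password) among those   -> keeps 3, 4
SELECT DISTINCT ON (login_name, password) *
FROM  (
   SELECT DISTINCT ON (email) *
   FROM   demo_accounts
   ORDER  BY email, last_login DESC
   ) sub
ORDER  BY login_name, password, last_login DESC;
```

With this data the query should return accounts 3 and 4: account 1 loses to 2 on email, and 2 then loses to 3 on (login_name, password).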
Inspect results and test for plausibility.
SELECT * FROM tmp;
Temporary tables are dropped automatically at the end of a session. In pgAdmin (which you seem to be using) the session lives as long as the editor window is open.
1b. Alternative query for updated definition of "duplicates"
SELECT *
FROM   taccounts t
WHERE  NOT EXISTS (
   SELECT FROM taccounts t1
   WHERE (NULLIF(t1.email, '') = t.email
       OR (NULLIF(t1.login_name, ''), NULLIF(t1.password, '')) = (t.login_name, t.password))
   AND   (t1.last_login, t1.account_id) > (t.last_login, t.account_id)
   );
This doesn't treat NULL or empty string ('') as identical in any of the "duplicate" columns.
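A small sketch of why that holds, using literal values only (the email address is made up): NULLIF turns an empty string into NULL, and any comparison with NULL yields NULL rather than true, so such rows never count as a match.

```sql
-- Every column in this result should be true.
SELECT NULLIF('', '') IS NULL                          AS empty_becomes_null
     , (NULLIF('', '') = '') IS NULL                   AS empty_never_matches
     , (NULLIF(NULL::text, '') = NULL::text) IS NULL   AS null_never_matches
     , NULLIF('a@example.com', '') = 'a@example.com'   AS real_value_matches;
```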
The row expression (t1.last_login, t1.account_id) takes care of the possibility that two dupes could share the same last_login. The one with the bigger account_id is chosen in this case - which is unique, since it is the PK.
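The tie-breaking behavior of the row comparison can be checked in isolation. This is only an illustration with made-up literals; the last column shows a caveat worth testing against your data: if last_login is NULL, the comparison yields NULL, so that row is never seen as "newer".

```sql
-- Row values are compared left to right; every column here should be true.
SELECT (timestamp '2020-01-01', 2) > (timestamp '2020-01-01', 1)       AS tie_broken_by_id
     , (timestamp '2020-01-02', 1) > (timestamp '2020-01-01', 2)       AS later_login_wins
     , ((NULL::timestamp, 2) > (timestamp '2020-01-01', 1)) IS NULL    AS null_login_is_unknown;
```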
2a. How to identify all incoming FKs
SELECT c.confrelid::regclass::text AS referenced_table
     , c.conname                   AS fk_name
     , pg_get_constraintdef(c.oid) AS fk_definition
FROM   pg_attribute a
JOIN   pg_constraint c ON (c.conrelid, c.conkey[1]) = (a.attrelid, a.attnum)
WHERE  c.confrelid = 'taccounts'::regclass  -- (schema-qualified) table name
AND    c.contype = 'f'
ORDER  BY 1, contype DESC;
This only builds on the first column of each foreign key (c.conkey[1]).
Or inspect the Dependents tab in the right-hand pane of the pgAdmin object browser after selecting the table taccounts.
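If you mainly need to know which tables point at the target, a simpler variant that reads pg_constraint alone may be enough. The following is a sketch on two hypothetical temp tables (demo_parent, demo_child); for the real case, replace 'demo_parent' with 'taccounts'. pg_get_constraintdef() prints all columns of a multicolumn FK.

```sql
-- Hypothetical demo tables so the query can be tried in isolation.
CREATE TEMP TABLE demo_parent (id int PRIMARY KEY);
CREATE TEMP TABLE demo_child  (id int PRIMARY KEY, parent_id int REFERENCES demo_parent);

SELECT c.conrelid::regclass        AS referencing_table  -- table that holds the FK
     , c.conname                   AS fk_name
     , pg_get_constraintdef(c.oid) AS fk_definition
FROM   pg_constraint c
WHERE  c.confrelid = 'demo_parent'::regclass  -- referenced table
AND    c.contype = 'f'
ORDER  BY 1, 2;
```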
2b. Reroute to new primary
If you have tables referencing taccounts (incoming foreign keys to taccounts), you will want to update all of those columns before you delete the dupes.
Reroute all of them to the new primary row:
UPDATE referencing_tbl r
SET    referencing_column = tmp.reference_column
FROM   tmp
JOIN   taccounts t1 USING (email)
WHERE  r.referencing_column = t1.reference_column
AND    r.referencing_column IS DISTINCT FROM tmp.reference_column;

UPDATE referencing_tbl r
SET    referencing_column = tmp.reference_column
FROM   tmp
JOIN   taccounts t2 USING (login_name, password)
WHERE  r.referencing_column = t2.reference_column
AND    r.referencing_column IS DISTINCT FROM tmp.reference_column;
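Before deleting anything, it may be worth checking that no referencing row still points at a non-survivor. A minimal sketch with hypothetical stand-ins (demo_tmp for tmp, demo_ref for a referencing table, account_id as the key column):

```sql
-- Hypothetical stand-ins: demo_tmp plays the role of tmp, demo_ref of a referencing table.
CREATE TEMP TABLE demo_tmp (account_id int PRIMARY KEY);
CREATE TEMP TABLE demo_ref (ref_id int PRIMARY KEY, account_id int);

INSERT INTO demo_tmp VALUES (3), (4);
INSERT INTO demo_ref VALUES (10, 3), (11, 4), (12, 2);  -- row 12 still points at a dupe

-- Any row returned here has not been rerouted yet.
SELECT r.*
FROM   demo_ref r
WHERE  r.account_id IS NOT NULL
AND    NOT EXISTS (SELECT FROM demo_tmp s WHERE s.account_id = r.account_id);
```

With the sample rows above, only ref_id 12 should come back. An empty result on the real tables suggests the rerouting is complete.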
3. & 4. Go in for the kill
Now, dupes are not referenced any more. Go in for the kill.
ALTER TABLE taccounts DISABLE TRIGGER ALL;
DELETE FROM taccounts;
VACUUM taccounts;
INSERT INTO taccounts SELECT * FROM tmp;
ALTER TABLE taccounts ENABLE TRIGGER ALL;
Disable all triggers for the duration of the operation. This avoids checking for referential integrity during the operation. Everything should be fine once you re-activate triggers. We took care of all incoming FKs above. Outgoing FKs are guaranteed to be sound, since you have no concurrent write access and all values have been there before.
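If you would rather have the table swap succeed or fail as a whole, one possible variation is to wrap it in a transaction with an explicit lock. This is a sketch on hypothetical temp tables (demo_base, demo_survivors), not the exact commands above: VACUUM cannot run inside a transaction block, so it moves after COMMIT, which means dead rows from the DELETE are only cleaned up afterwards.

```sql
-- Hypothetical stand-ins: demo_base for taccounts, demo_survivors for tmp.
CREATE TEMP TABLE demo_base (id int PRIMARY KEY, val text);
INSERT INTO demo_base VALUES (1, 'dupe'), (2, 'dupe'), (3, 'keep');

CREATE TEMP TABLE demo_survivors AS
SELECT DISTINCT ON (val) * FROM demo_base ORDER BY val, id DESC;

BEGIN;
LOCK TABLE demo_base IN ACCESS EXCLUSIVE MODE;  -- keep concurrent sessions out
-- Disabling internal FK triggers on a real table requires superuser privileges.
ALTER TABLE demo_base DISABLE TRIGGER ALL;
DELETE FROM demo_base;
INSERT INTO demo_base SELECT * FROM demo_survivors;
ALTER TABLE demo_base ENABLE TRIGGER ALL;
COMMIT;

VACUUM demo_base;  -- must run outside the transaction block
```

If any statement fails before COMMIT, a ROLLBACK restores the original rows and the trigger state.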
Comments:
Is account_id the primary key? What version of PostgreSQL? Can you afford to lock the table for some time (no concurrent access)?
account_id is the primary key. Postgre ver 1.16.0. Locking is actually not required as I am working on some migration thing.
Run SELECT version() in your database. 1.16 is probably the version of pgAdmin which you seem to be using. And edit your question with the additional (essential) information.
Can last_login happen to be the same for a pair of dupes? And can login_name and password also be NULL? And how to deal with that?