Add a unique constraint but ignore existing table data

Question

I want to add a unique constraint to a table using the following query:

ALTER TABLE events ADD CONSTRAINT events_timestamp_request_session_key UNIQUE (event_timestamp, request_id, session_id);

The query is valid, but on one database the existing data does not meet the constraint, so the query fails:

ERROR: could not create unique index "events_timestamp_request_session_key"
DETAIL: Key (event_timestamp, request_id, session_id)=(2017-07-05 14:53:25.475246+00, a55df6-8533e322cd-aa9d57-87e2, 132443) is duplicated.

It is expected that there would be a few duplicates, but unfortunately I cannot simply delete or alter them.

Is there any way to add the constraint as required, while ignoring the existing data in the table?

Gary - Stand with Ukraine · Accepted Answer · 2017-07-25 15:55:26Z

You can use a partial index for this. It's not a particularly nice solution, but it will work until you can correct your old data.

Something like:

CREATE UNIQUE INDEX events_timestamp_request_session_key ON events (event_timestamp, request_id, session_id) WHERE event_timestamp >= '2017-07-01'::timestamp;

where the time is the start of your clean data.

The WHERE clause limits the index to records with a newer event timestamp. The old records are excluded from the index entirely, so they are not considered in uniqueness checks.
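A minimal sketch of how the partial index behaves, using a simplified stand-in table. The table name events_demo and the column types are assumptions, not the real schema. One caveat: the value in the error message carries a +00 offset, which suggests the column may be timestamptz. If so, I believe the cutoff should be written as a timestamptz literal too, because comparing timestamptz against a plain timestamp depends on the session time zone and PostgreSQL only accepts immutable expressions in an index predicate.

```sql
-- Hypothetical stand-in for the real table; column types are assumptions.
CREATE TABLE events_demo (
    event_id        serial PRIMARY KEY,
    event_timestamp timestamptz NOT NULL,
    request_id      text NOT NULL,
    session_id      integer NOT NULL
);

-- Two legacy rows that duplicate each other, dated before the cutoff.
INSERT INTO events_demo (event_timestamp, request_id, session_id) VALUES
    ('2017-06-15 10:00:00+00', 'req-a', 1),
    ('2017-06-15 10:00:00+00', 'req-a', 1);

-- Succeeds despite the duplicates, because they fall outside the predicate.
CREATE UNIQUE INDEX events_demo_timestamp_request_session_key
    ON events_demo (event_timestamp, request_id, session_id)
    WHERE event_timestamp >= '2017-07-01 00:00:00+00'::timestamptz;

-- A new row after the cutoff is accepted once...
INSERT INTO events_demo (event_timestamp, request_id, session_id)
VALUES ('2017-07-10 09:00:00+00', 'req-b', 2);

-- ...and an identical second row should be rejected with a unique violation.
INSERT INTO events_demo (event_timestamp, request_id, session_id)
VALUES ('2017-07-10 09:00:00+00', 'req-b', 2);
```

Keep in mind that the predicate is based on the timestamp value, not on when the row was inserted: a row added later with an event_timestamp before the cutoff is not checked for uniqueness either.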

Doc: https://www.postgresql.org/docs/9.6/static/indexes-partial.html

It works and forces the combination of those columns to be unique, so thanks for that. Going forward, though, do you know what the impact on queries will be? That is, with roughly 3 million rows being added a week, will queries be slower than they would have been with the unique constraint? Thanks.
@David Gard. Inserts will be slower because they have to maintain an extra index. You'll need to benchmark to determine whether that causes you a problem. One thing to consider is that because this is a partial index, it won't help with any queries that look at old data, so you could find yourself needing two indexes: a unique one for part of the table and a non-unique one for the whole table.
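The two-index setup mentioned in the comment could look roughly like this. The index name events_timestamp_request_session_idx is made up for illustration, and the values in the query are placeholders (the request and session IDs are taken from the error message in the question). Running EXPLAIN on a representative query is one way to see whether the planner uses either index before deciding that the second one is worth its write cost.

```sql
-- Hypothetical name; a plain (non-unique) index covering every row,
-- so that lookups on older data can still use an index.
CREATE INDEX events_timestamp_request_session_idx
    ON events (event_timestamp, request_id, session_id);

-- Check which index, if any, the planner picks for a query on old data.
-- The timestamp below is an illustrative value from before the cutoff.
EXPLAIN
SELECT *
FROM events
WHERE event_timestamp = '2017-06-15 10:00:00+00'
  AND request_id = 'a55df6-8533e322cd-aa9d57-87e2'
  AND session_id = 132443;
```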

Gordon Linoff · Accepted Answer · 2017-07-20 11:10:14Z

I don't think there is a built-in method for doing this. You can, however, create a unique index using an expression. Let me assume that you have a serial unique id in each table:

create unique index unq_events_timestamp_request_session_key on events (event_timestamp, request_id, session_id, (case when event_id < 99999 then event_id else -1 end));

The expression essentially says: "if this key is already in the table, then ignore it for the unique constraint".
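To see why this works, it can help to look at the value the expression produces for each row. The query below is only an illustration: it assumes events has a serial event_id column and that 99999 marks the boundary between legacy and new rows, and the alias uniqueness_slot is a made-up name.

```sql
-- Legacy rows (event_id < 99999) each get their own event_id as the
-- fourth index column, so they can never collide with one another.
-- Newer rows all get -1, so for them the other three columns must be unique.
SELECT event_id,
       CASE WHEN event_id < 99999 THEN event_id ELSE -1 END AS uniqueness_slot
FROM events
ORDER BY event_id;
```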

You could enforce the unique constraint for all but the current duplicates by excluding only those rows:

create unique index unq_events_timestamp_request_session_key on events (event_timestamp, request_id, session_id, (case when event_id in (1, 2, 3, 5, 8) then event_id else -1 end));

This would require research into the current duplicates. Actually, you can also do either with a filter clause:

create unique index unq_events_timestamp_request_session_key on events (event_timestamp, request_id, session_id) where event_id >= 99999;

-- or, excluding only the known duplicates:
create unique index unq_events_timestamp_request_session_key on events (event_timestamp, request_id, session_id) where event_id not in (1, 2, 3, 5, 8);
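The research into the current duplicates could start with a query along these lines. It is a sketch that assumes the events table has the event_id column described above; the alias copies is just an illustrative name. The event_id values it returns are candidates for the in (...) or not in (...) lists.

```sql
-- List every row that belongs to a duplicated key, with its event_id.
SELECT event_id, event_timestamp, request_id, session_id
FROM (
    SELECT e.*,
           count(*) OVER (
               PARTITION BY event_timestamp, request_id, session_id
           ) AS copies
    FROM events e
) AS counted
WHERE copies > 1
ORDER BY event_timestamp, request_id, session_id, event_id;
```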
