8

I wish to add a unique constraint to a table using the following query -

ALTER TABLE events ADD CONSTRAINT events_timestamp_request_session_key UNIQUE (event_timestamp, request_id, session_id); 

The query is valid, however on one database the existing data does not meet the constraint, and so the query fails -

ERROR: could not create unique index "events_timestamp_request_session_key"
DETAIL: Key (event_timestamp, request_id, session_id)=(2017-07-05 14:53:25.475246+00, a55df6-8533e322cd-aa9d57-87e2, 132443) is duplicated.

It is expected that there would be a few duplicates, but unfortunately I cannot simply delete or alter them.

Is there any way to add the constraint as required, while ignoring the existing data in the table?

2 Answers

14

You can use a partial index for this. It's not a particularly nice solution, but it will work until you can correct your old data.

Something like:

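CREATE UNIQUE INDEX events_timestamp_request_session_key ON events (event_timestamp, request_id, session_id) WHERE event_timestamp >= '2017-07-01 00:00:00+00'::timestamptz;

where the time is the start of your clean data. The literal is cast to timestamptz here because the duplicate key in your error message carries a +00 offset, which suggests a timestamp with time zone column; comparing such a column against a plain timestamp is not immutable, and PostgreSQL only accepts immutable expressions in an index predicate. If your column is a plain timestamp, cast to timestamp instead.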

The WHERE clause limits the index to records with an event timestamp on or after the cutoff. Older records are excluded from the index entirely, so they are never considered in uniqueness checks.

Doc: https://www.postgresql.org/docs/9.6/static/indexes-partial.html
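
A quick way to check this behaviour is to insert a repeated key on each side of the cutoff. This is only a sketch: it assumes the partial unique index above is already in place, that the other columns of events are nullable or have defaults, and the request_id and session_id values are made up for illustration.

```sql
-- Before the cutoff: these rows are not in the partial index,
-- so the second insert is accepted even though the key repeats.
INSERT INTO events (event_timestamp, request_id, session_id)
VALUES ('2017-06-01 00:00:00+00', 'demo-request-old', '900001');
INSERT INTO events (event_timestamp, request_id, session_id)
VALUES ('2017-06-01 00:00:00+00', 'demo-request-old', '900001');

-- After the cutoff: these rows are covered by the partial index,
-- so the second insert should be rejected with a unique violation.
INSERT INTO events (event_timestamp, request_id, session_id)
VALUES ('2017-08-01 00:00:00+00', 'demo-request-new', '900002');
INSERT INTO events (event_timestamp, request_id, session_id)
VALUES ('2017-08-01 00:00:00+00', 'demo-request-new', '900002');
```

Run it inside a transaction and roll back afterwards if you do not want the demo rows to stay in the table.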

2 Comments

It works and forces the combination of those columns to be unique, so thanks for that. Going forward though, do you know what the impact on queries will be? I.e. with roughly 3 million rows being added a week, will queries be slower than they would have been with the unique constraint? Thanks.
@David Gard. Inserts will be slower because they have to maintain an extra index, although a full unique constraint would also be backed by an index that has to be maintained. You'll need to benchmark to determine whether that causes you a problem. One thing to consider is that because this is a partial index, it won't help with queries that look at old data, so you could find yourself needing 2 indexes: a partial unique one for the new rows, and a non-unique one for the whole table.
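
If queries over the old rows do turn out to need index support, the second, non-unique index mentioned in the comment above might look like the sketch below. The index name is hypothetical, and whether the extra write cost is worth it is something only a benchmark on your data can tell you.

```sql
-- Non-unique index over the whole table (index name is made up).
-- CONCURRENTLY avoids blocking writes while the index is built,
-- but it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY events_timestamp_request_session_idx
    ON events (event_timestamp, request_id, session_id);
```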
3

I don't think there is a built-in method for doing this. You can, however, create a unique index using an expression. Let me assume that you have a serial unique id in the table, called event_id here:

create unique index unq_events_timestamp_request_session_key on events (event_timestamp, request_id, session_id, (case when event_id < 99999 then event_id else -1 end) );

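The expression essentially says: "if this row is already in the table, then ignore it for the unique constraint". Rows below the cutoff (which should be above the highest existing event_id) contribute their own distinct event_id to the index key, so they can never collide. Every later row contributes -1, so the other three columns must be unique among those rows.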

You could enforce the unique constraint for all but the current duplicates by listing their ids explicitly:

create unique index unq_events_timestamp_request_session_key on events (event_timestamp, request_id, session_id, (case when event_id in (1, 2, 3, 5, 8) then event_id else -1 end) );

This would require research into the current duplicates. You can also do either with a WHERE clause on the index (a partial index):

create unique index unq_events_timestamp_request_session_key on events (event_timestamp, request_id, session_id) where event_id >= 99999;

-- or, excluding specific rows instead of using a cutoff:
create unique index unq_events_timestamp_request_session_key on events (event_timestamp, request_id, session_id) where event_id not in (1, 2, 3, 5, 8);
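
For the research into current duplicates mentioned above, a grouping query along these lines should list each repeated key together with the ids of the rows that share it. As in the rest of this answer, event_id is an assumed column name.

```sql
-- List every key that occurs more than once, with the ids of its rows.
SELECT event_timestamp,
       request_id,
       session_id,
       count(*) AS copies,
       array_agg(event_id ORDER BY event_id) AS event_ids
FROM events
GROUP BY event_timestamp, request_id, session_id
HAVING count(*) > 1;
```

The ids it returns are the candidates for the `in (...)` or `not in (...)` list. Excluding all but one id per group keeps that key protected against further duplicates, while excluding every id in the group leaves the key open for one more new row.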
