
I'm trying to make a table to store the edges of a directed graph, but I want to avoid duplicate edges in both directions. I can of course make it one direction:

CREATE TABLE edge (parent integer, child integer, UNIQUE(parent, child)); 

But how can I make sure that I also don't allow a new (parent, child) that already exists as (child, parent)?

Any ideas would be greatly appreciated!

2 Answers


Using GREATEST and LEAST

Essentially, if we sort the columns in the index, we can ensure that a duplicate entry in either direction causes a collision and a constraint violation, because sort(5,2) = sort(2,5). Since SQL has no two-column sort for this, we use GREATEST and LEAST instead.

CREATE TABLE edge ( parent int, child int );
CREATE UNIQUE INDEX ON edge ( greatest(parent,child), least(parent,child) );

INSERT INTO edge(parent,child) VALUES (42,7), (42,9), (5,7);
INSERT INTO edge(parent,child) VALUES (7,42);

ERROR:  duplicate key value violates unique constraint "edge_greatest_least_idx"
DETAIL:  Key ((GREATEST(parent, child)), (LEAST(parent, child)))=(42, 7) already exists.
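As a side note (my own sketch, not part of the original answer): a lookup that ignores edge direction can reuse the same two expressions, which also lets the planner use the expression index:

-- Sketch: check whether an edge exists in either direction.
-- Matching the indexed expressions lets PostgreSQL use edge_greatest_least_idx.
SELECT EXISTS (
  SELECT 1
  FROM edge
  WHERE greatest(parent, child) = greatest(42, 7)
    AND least(parent, child) = least(42, 7)
);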
  • Thanks! The greatest / least pair was the thing I was apparently too stupid for. Commented Oct 4, 2017 at 17:53
  • @EvanCarroll - check out my answer! What's your assessment of performance issues? Commented Oct 6, 2017 at 2:59

You can also do this (for positive INTEGERs; db-fiddle):

CREATE TABLE edge ( parent int NOT NULL, child int NOT NULL );
CREATE UNIQUE INDEX ed_uq_ix ON edge (ABS(parent - child), (parent * child));

Non-conflicting records:

INSERT INTO edge(parent,child) VALUES (42,7), (42,9), (5,7); -- succeeds 

BUT:

INSERT INTO edge VALUES (9, 42); -- fails! 
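To make the mechanism concrete (a worked example of my own, not from the answer): both orderings of a pair map to the same index key, and for positive integers the key (|p - c|, p * c) determines the unordered pair {p, c}, since (p + c)^2 = (p - c)^2 + 4pc.

-- Both orderings of (9, 42) produce the same index key:
SELECT ABS(42 - 9) AS diff, 42 * 9 AS product;  -- 33, 378
SELECT ABS(9 - 42) AS diff, 9 * 42 AS product;  -- 33, 378 -> duplicate key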
  • I'm not sure it's faster -- and I'm not sure the speed matters for this kind of operation, but it won't work on more than 100k parent/child rows without growing to bigint. For instance, run your create DDL and then run INSERT INTO edge(parent,child) VALUES (1e5, 1e5); My example works up until 2,147,483,647 (>1e9), which is the full range of ints. Commented Oct 6, 2017 at 3:23
  • Won't PostgreSQL just CAST anyway? I'm not sure any system would cope with 2^32 edges? Commented Oct 6, 2017 at 3:28
  • 1
    I think you're missing what I'm saying. In a normal serial column, you can handle 2^31-1 rows and the primary key can handle 2^32 rows. Both of those are usually far more rows than needed. In your index, you can only handle 46340 rows. Why? Because your algorithm is p*c assuming p=c, you can exhaust that range in 46,341. I find it possible to have more than 46,341 edges. So the difference between your range and my range is (2^31-1) - sqrt(2^31-1) you have to value the performance difference, if anything, that much; if you can only support 46,341 rows - does it matter? Commented Oct 6, 2017 at 4:34
  • Besides the other reasons - already mentioned by Evan and a_horse_with_no_name - this has an issue with negative ids as well. It doesn't allow both (-5,1) and (-1,5) for example. Commented Oct 6, 2017 at 7:34
  • @ypercubeᵀᴹ edited to deal with your concerns! Hadn't thought of that! Commented Oct 6, 2017 at 10:14
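Following up on the overflow discussed in the comments above: one possible workaround (a sketch of my own, not from the answer, using a hypothetical index name) is to widen the expressions to bigint, giving the product headroom up to 2^63 - 1:

-- Sketch: cast to bigint inside the expressions so parent * child
-- cannot overflow the 32-bit int range for large ids.
-- (Drop the original ed_uq_ix first, or use a new name as here.)
CREATE UNIQUE INDEX ed_uq_bigint_ix ON edge (ABS(parent::bigint - child), ((parent::bigint) * child));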
