1

This is a strategy question - but one I've been grappling with all week and could really use some wisdom-as-a-service.

Context

I am trying to piece together a clean, actionable PostgreSQL database with sanitized records from a ton of messy, noisy raw source tables.

Say we have contacts, companies, stores. Many contacts can work at a company. Multiple companies can be involved in a store (owner, maintenance, etc.) There are many-to-many (M:N) junctions between contacts_companies and companies_stores.

Table relationships:

contacts >-< contacts_companies >-< companies >-< companies_stores >-< stores unique on: contacts:email, companies:name, stores:formatted_address 

The multiple data sources have all sorts of overlaps between the above tables. Some have just store info, some have some contact and store info, etc. As I go through each data source, data will be cleansed and then upserted to the appropriate table.

Problem

The upsert operation is the problem. Inserting with an ON CONFLICT is fine for a specific table, but what about the junctions between?

To illustrate the problem, say we have a data source that has the following records:

{owner: "Alice A.", email: "A@Alice_stores.net", store_address:"1 main st"} 

This record has

  • a contact {name: Alice A, email: A@Alice_stores.net},
  • a company {name: Alice Stores}
  • a store {address: 1 main st}

The database might already have a store for 1 main st, but not a contact/company for Alice.

If we individually upsert the contact, company and store, we won't know what proper IDs to place in the junction tables:

  • If INSERT, use new id for FK in junction.
  • If Update, use existing id for fk in junction.

Question

What is the best strategy for upserting multiple tables from various sources when there are M:N relationships across the board?

One strategy that occurred to me is to query all tables/junctions, hold them in memory and perform all conditionals/updates in memory and then have a more forceful upsert. But this can quickly get out of hand...

1
  • 1
    Why can't you use returning clause in INSERT...ON CONFLICT to get value of id and then use it in link tables? Commented May 9, 2019 at 21:29

1 Answer 1

1

One way could be using returning clause with cte to make all upserts in one statement. For illustration purposes I use 1 many-to-many relationship.

--- test tables with m2m create table test1(test1_id bigserial not null primary key, uq_value1 text ,constraint unique_uq_val1 unique (uq_value1)); create table test2(test2_id bigserial not null primary key, uq_value2 text ,constraint unique_uq_val2 unique (uq_value2)); create table test1_test2 (test1_id bigint not null, test2_id bigint not null, primary key (test1_id, test2_id), constraint fk_test1 foreign key (test1_id) references test1(test1_id ), constraint fk_test2 foreign key (test2_id) references test2(test2_id ) ); ------------------------------------ -- insert value into the first table , or do a dummy update if it's already there with ins1 as (insert into test1(uq_value1) values('foo') on conflict on constraint unique_uq_val1 do update set uq_value1=test1.uq_value1 returning test1_id ) , -- insert value into the second table , or do a dummy update if it's already there ins2 as (insert into test2(uq_value2) values('bar') on conflict on constraint unique_uq_val2 do update set uq_value2=test2.uq_value2 returning test2_id ) , -- select PK of inserted records (since there is exactly one record in each insert, -- cross join is used sel_1 as ( select test1_id,test2_id from ins1 cross join ins2 ) -- finally insert into link tables if such a record doesn't exist : insert into test1_test2(test1_id, test2_id) select * from sel_1 a where not exists( select null from test1_test2 b where (b.test1_id, b.test2_id) = (a.test1_id, a.test2_id)) 
1
  • This is perfect! The CTE essentially creates a transaction with a set order of ops, and the returning gives us the latest, regardless of insert/update. Thank you @a1ex07 ! Commented May 10, 2019 at 12:40

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.