How to Compare Multiple Columns and Rows of a Table

Question

I have a big data table that looks something like this

ID Marker Value1 Value2 ================================ 1 A 10 11 1 B 12 13 1 C 14 15 2 A 10 11 2 B 13 12 2 C 3 A 10 11 3 C 12 13

I want to search this data by the following data, which is user input and not stored in a table:

Marker Value1 Value2 ========================== A 10 11 B 12 13 C 14 14

The result should be something like this:

ID Marker Value1 Value2 Match? ========================================== 1 A 10 11 true 1 B 12 13 true 1 C 14 15 false 2 A 10 11 true 2 B 13 12 true 2 C false 3 A 10 11 true 3 C 12 13 false

And ultimately this (the above table is not necessary, it should demonstrate how these values came to be):

ID Matches Percent ======================== 1 2 66% 2 2 66% 3 1 33%

I'm searching for the most promising approach to get this to work in SQL (PostgreSQL to be exact).

My ideas:

Create a temporary table, join it with the above one and group the result
Use CASE WHEN or a temporary PROCEDURE to only use a single (probably bloated) query

I'm not satisified with either approach, hence the question. How can I compare two tables like these efficiently?

@jarlh You're right, maybe that's the easiest way. Creating temporary tables just feels wrong somehow. — Stefan S.
– Stefan S., Commented Jun 25, 2019 at 9:48
You can use GROUP BY + Count for the first step... When record HAVING COUNT > 1 then match is true — Nifriz
– Nifriz, Commented Jun 25, 2019 at 9:49
Why is ('B', 13,12) for ID = 2 a match if the user input is ('B', 12, 13)? — user330315
– user330315, Commented Jun 25, 2019 at 12:14
@a_horse_with_no_name Because the columns "Value1" and "Value2" are semantically the same. I even thought about using arrays to represent these two columns as one, but I'm not sure it's worth the hassle. — Stefan S.
– Stefan S., Commented Jun 25, 2019 at 12:32

score 1 · Accepted Answer · 2019-06-25 12:36:50Z

The user input can be supplied using a VALUES clause in a common table expression and that can then be used in a left join with the actual table.

with user_input (marker, value1, value2) as ( values ('A', 10, 11), ('B', 12, 13), ('C', 14, 14) ) select d.id, count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)), 100 * count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)) / cast(count(*) as numeric) as pct from data d left join user_input u on (d.marker, d.value1, d.value2) = (u.marker, u.value1, u.value2) group by d.id order by d.id;

Returns:

id | count | pct ---+-------+------ 1 | 2 | 66.67 2 | 2 | 66.67 3 | 1 | 50.00

Online example: https://rextester.com/OBOOD9042

Edit

If the order of the values isn't relevant (so (12,13) is considered the same as (13,12) then the comparison gets a bit more complicated.

with user_input (marker, value1, value2) as ( values ('A', 10, 11), ('B', 12, 13), ('C', 14, 14) ) select d.id, count(*) filter (where (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) is not distinct from (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2))) from data d left join user_input u on (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) = (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2)) group by d.id order by d.id;

The Impaler · Accepted Answer · 2019-06-25 09:58:04Z

You can use a CTE to pre-compute the matches. Then a simple aggregation will do the trick. Assuming your parameters are:

Marker Value1 Value2 ========================== m1 x1 y1 m2 x2 y2 m3 x3 y3

You can do:

with x as ( select id, case when marker = :m1 and (value1 = :x1 and value2 = :y1 or value1 = :y1 and value2 = :x1) or marker = :m2 and (value1 = :x2 and value2 = :y2 or value1 = :y2 and value2 = :x2) or marker = :m3 and (value1 = :x3 and value2 = :y3 or value1 = :y3 and value2 = :x3) then 1 else 0 end as matches from t ) select id, sum(matches) as matches, 100.0 * sum(matches) / count(*) as percent from x group by id

Saeid Amini · Accepted Answer · 2019-06-25 10:37:15Z

Try this:

CREATE TABLE #Temp ( Marker nvarchar(50), Value1 nvarchar(50), Value2 nvarchar(50) ) INSERT INTO #Temp Values ('A', '10', '11') INSERT INTO #Temp Values ('B', '12', '13') INSERT INTO #Temp Values ('C', '14', '14') SELECT m.Id, m.Marker, m.Value1, m.Value2, (Select CASE WHEN COUNT(*) = 0 THEN 'False' WHEN COUNT(*) <> 0 THEN 'True' END FROM #Temp t WHERE t.Marker = m.Marker and t.Value1 = m.Value1 and t.Value2 = m.Value2) as Matches FROM [Test].[dbo].[Markers] m ORDER BY Matches DESC Drop TABLE #Temp

If it's exactly what you want, I try to solve the second part of it.

Collectives™ on Stack Overflow

How to Compare Multiple Columns and Rows of a Table

3 Answers 3

Comments

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Related