0

I have a big data table that looks something like this

ID Marker Value1 Value2 ================================ 1 A 10 11 1 B 12 13 1 C 14 15 2 A 10 11 2 B 13 12 2 C 3 A 10 11 3 C 12 13 

I want to search this data by the following data, which is user input and not stored in a table:

Marker Value1 Value2 ========================== A 10 11 B 12 13 C 14 14 

The result should be something like this:

ID Marker Value1 Value2 Match? ========================================== 1 A 10 11 true 1 B 12 13 true 1 C 14 15 false 2 A 10 11 true 2 B 13 12 true 2 C false 3 A 10 11 true 3 C 12 13 false 

And ultimately this (the above table is not necessary, it should demonstrate how these values came to be):

ID Matches Percent ======================== 1 2 66% 2 2 66% 3 1 33% 

I'm searching for the most promising approach to get this to work in SQL (PostgreSQL to be exact).

My ideas:

  1. Create a temporary table, join it with the above one and group the result
  2. Use CASE WHEN or a temporary PROCEDURE to only use a single (probably bloated) query

I'm not satisified with either approach, hence the question. How can I compare two tables like these efficiently?

6
  • Looks like a LEFT JOIN. Commented Jun 25, 2019 at 9:39
  • @jarlh You're right, maybe that's the easiest way. Creating temporary tables just feels wrong somehow. Commented Jun 25, 2019 at 9:48
  • You can use GROUP BY + Count for the first step... When record HAVING COUNT > 1 then match is true Commented Jun 25, 2019 at 9:49
  • Why is ('B', 13,12) for ID = 2 a match if the user input is ('B', 12, 13)? Commented Jun 25, 2019 at 12:14
  • @a_horse_with_no_name Because the columns "Value1" and "Value2" are semantically the same. I even thought about using arrays to represent these two columns as one, but I'm not sure it's worth the hassle. Commented Jun 25, 2019 at 12:32

3 Answers 3

1

The user input can be supplied using a VALUES clause in a common table expression and that can then be used in a left join with the actual table.

with user_input (marker, value1, value2) as ( values ('A', 10, 11), ('B', 12, 13), ('C', 14, 14) ) select d.id, count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)), 100 * count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)) / cast(count(*) as numeric) as pct from data d left join user_input u on (d.marker, d.value1, d.value2) = (u.marker, u.value1, u.value2) group by d.id order by d.id; 

Returns:

id | count | pct ---+-------+------ 1 | 2 | 66.67 2 | 2 | 66.67 3 | 1 | 50.00 

Online example: https://rextester.com/OBOOD9042

Edit

If the order of the values isn't relevant (so (12,13) is considered the same as (13,12) then the comparison gets a bit more complicated.

with user_input (marker, value1, value2) as ( values ('A', 10, 11), ('B', 12, 13), ('C', 14, 14) ) select d.id, count(*) filter (where (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) is not distinct from (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2))) from data d left join user_input u on (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) = (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2)) group by d.id order by d.id; 
Sign up to request clarification or add additional context in comments.

Comments

1

You can use a CTE to pre-compute the matches. Then a simple aggregation will do the trick. Assuming your parameters are:

Marker Value1 Value2 ========================== m1 x1 y1 m2 x2 y2 m3 x3 y3 

You can do:

with x as ( select id, case when marker = :m1 and (value1 = :x1 and value2 = :y1 or value1 = :y1 and value2 = :x1) or marker = :m2 and (value1 = :x2 and value2 = :y2 or value1 = :y2 and value2 = :x2) or marker = :m3 and (value1 = :x3 and value2 = :y3 or value1 = :y3 and value2 = :x3) then 1 else 0 end as matches from t ) select id, sum(matches) as matches, 100.0 * sum(matches) / count(*) as percent from x group by id 

Comments

1

Try this:

CREATE TABLE #Temp ( Marker nvarchar(50), Value1 nvarchar(50), Value2 nvarchar(50) ) INSERT INTO #Temp Values ('A', '10', '11') INSERT INTO #Temp Values ('B', '12', '13') INSERT INTO #Temp Values ('C', '14', '14') SELECT m.Id, m.Marker, m.Value1, m.Value2, (Select CASE WHEN COUNT(*) = 0 THEN 'False' WHEN COUNT(*) <> 0 THEN 'True' END FROM #Temp t WHERE t.Marker = m.Marker and t.Value1 = m.Value1 and t.Value2 = m.Value2) as Matches FROM [Test].[dbo].[Markers] m ORDER BY Matches DESC Drop TABLE #Temp 

enter image description here

If it's exactly what you want, I try to solve the second part of it.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.