29

I'm in a situation where I want to get the minimum value from of 6 columns.

I've found three ways so far to accomplish this, but I have concerns with the performance of these methods and would like to know which would be better for performance.

The first method is to use a big case statement. Here's an example with 3 columns, based on the example in the link above. My case statement would be much longer since I will be looking at 6 columns.

Select Id, Case When Col1 <= Col2 And Col1 <= Col3 Then Col1 When Col2 <= Col3 Then Col2 Else Col3 End As TheMin From MyTable 

The second option is to use the UNION operator with multiple select statements. I would put this in an UDF that accepts an Id parameter.

select Id, dbo.GetMinimumFromMyTable(Id) from MyTable 

and

select min(col) from ( select col1 [col] from MyTable where Id = @id union all select col2 from MyTable where Id = @id union all select col3 from MyTable where Id = @id ) as t 

And the 3rd option I found was to use the UNPIVOT operator, which I didn't even know existed until just now

with cte (ID, Col1, Col2, Col3) as ( select ID, Col1, Col2, Col3 from TestTable ) select cte.ID, Col1, Col2, Col3, TheMin from cte join ( select ID, min(Amount) as TheMin from cte UNPIVOT (Amount for AmountCol in (Col1, Col2, Col3)) as unpvt group by ID ) as minValues on cte.ID = minValues.ID 

Because of the table size, and frequency in which this table is queried and updated, I am concerned about the performance impact these queries would have on the database.

This query will actually be used in a join to a table with a few million records, however the records returned will be reduced to around a hundred records at a time. It will get run many times throughout the day, and the 6 columns I am querying are frequently updated (they contain daily stats). I do not think there are any indexes on the 6 columns I am querying.

Which of these methods is better for performance when trying to get the minimum of multiple columns? Or is there another better method that I don't know of?

I am using SQL Server 2005

Sample Data & Results

If my data contained records like this:

 Id Col1 Col2 Col3 Col4 Col5 Col6 1 3 4 0 2 1 5 2 2 6 10 5 7 9 3 1 1 2 3 4 5 4 9 5 4 6 8 9 

The end result should be

 Id Value 1 0 2 2 3 1 4 4 
0

7 Answers 7

22

I tested the performance of all 3 methods, and here's what I found:

  • 1 record: No noticeable difference
  • 10 records: No noticeable difference
  • 1,000 records: No noticeable difference
  • 10,000 records: UNION subquery was a little slower. The CASE WHEN query is a little faster than the UNPIVOT one.
  • 100,000 records: UNION subquery is significantly slower, but UNPIVOT query becomes a little faster than the CASE WHEN query
  • 500,000 records: UNION subquery still significantly slower, but UNPIVOT becomes much faster than the CASE WHEN query

So the end results seems to be

  • With smaller record sets there doesn't seem to be enough of a difference to matter. Use whatever is easiest to read and maintain.

  • Once you start getting into larger record sets, the UNION ALL subquery begins to perform poorly compared to the other two methods.

  • The CASE statement performs the best up until a certain point (in my case, around 100k rows), and which point the UNPIVOT query becomes the best-performing query

The actual number at which one query becomes better than another will probably change as a result of your hardware, database schema, data, and current server load, so be sure to test with your own system if you're concerned about performance.

I also ran some tests using Mikael's answer; however, it was slower than all 3 of the other methods tried here for most recordset sizes. The only exception was it did better than a the UNION ALL query for very large recordset sizes. I like the fact it shows the column name in addition to the smallest value though.

I'm not a dba, so I may not have optimized my tests and missed something. I was testing with the actual live data, so that may have affected the results. I tried to account for that by running each query a few different times, but you never know. I would definitely be interested if someone wrote up a clean test of this and shared their results.

0
6

Don't know about what is fastest but you could try something like this.

declare @T table ( Col1 int, Col2 int, Col3 int, Col4 int, Col5 int, Col6 int ) insert into @T values(1, 2, 3, 4, 5, 6) insert into @T values(2, 3, 1, 4, 5, 6) select T4.ColName, T4.ColValue from @T as T1 cross apply ( select T3.ColValue, T3.ColName from ( select row_number() over(order by T2.ColValue) as rn, T2.ColValue, T2.ColName from ( select T1.Col1, 'Col1' union all select T1.Col2, 'Col2' union all select T1.Col3, 'Col3' union all select T1.Col4, 'Col4' union all select T1.Col5, 'Col5' union all select T1.Col6, 'Col6' ) as T2(ColValue, ColName) ) as T3 where T3.rn = 1 ) as T4 

Result:

ColName ColValue ------- ----------- Col1 1 Col3 1 

If you are not interested in what column has the min value you can use this instead.

declare @T table ( Id int, Col1 int, Col2 int, Col3 int, Col4 int, Col5 int, Col6 int ) insert into @T select 1, 3, 4, 0, 2, 1, 5 union all select 2, 2, 6, 10, 5, 7, 9 union all select 3, 1, 1, 2, 3, 4, 5 union all select 4, 9, 5, 4, 6, 8, 9 select T.Id, (select min(T1.ColValue) from ( select T.Col1 union all select T.Col2 union all select T.Col3 union all select T.Col4 union all select T.Col5 union all select T.Col6 ) as T1(ColValue) ) as ColValue from @T as T 

A simplified unpivot query.

select Id, min(ColValue) as ColValue from @T unpivot (ColValue for Col in (Col1, Col2, Col3, Col4, Col5, Col6)) as U group by Id 
0
6

Add a persisted computed column that uses a CASE statement to do the logic you need.

The minimum value will then always be efficiently available when you need to do a join (or whatever else) based on that value.

The value will be recomputed every time any of the source values change (INSERT/UPDATE/MERGE). I'm not saying this is necessarily the best solution for the workload, I merely offer it as a solution, just like the other answers. Only the OP can determine which is best for the workload.

0
1

Case statement for 6 dates. To do less, copy the true branch from the first case statement. Worst case is when Date1 is the lowest value, best case is when Date6 is the lowest value, so put the most likely date in Date6. I wrote this because of the limitations of computed columns.

CASE WHEN Date1 IS NULL OR Date1 > Date2 THEN CASE WHEN Date2 IS NULL OR Date2 > Date3 THEN CASE WHEN Date3 IS NULL OR Date3 > Date4 THEN CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date4 IS NULL OR Date4 > Date6 THEN Date6 ELSE Date4 END END ELSE CASE WHEN Date3 IS NULL OR Date3 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date3 IS NULL OR Date3 > Date6 THEN Date6 ELSE Date3 END END END ELSE CASE WHEN Date2 IS NULL OR Date2 > Date4 THEN CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date4 IS NULL OR Date4 > Date6 THEN Date6 ELSE Date4 END END END ELSE CASE WHEN Date2 IS NULL OR Date2 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date2 IS NULL OR Date2 > Date6 THEN Date6 ELSE Date2 END END END END ELSE CASE WHEN Date1 IS NULL OR Date1 > Date3 THEN CASE WHEN Date3 IS NULL OR Date3 > Date4 THEN CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date4 IS NULL OR Date4 > Date6 THEN Date6 ELSE Date4 END END ELSE CASE WHEN Date3 IS NULL OR Date3 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date3 IS NULL OR Date3 > Date6 THEN Date6 ELSE Date3 END END END ELSE CASE WHEN Date1 IS NULL OR Date1 > Date4 THEN CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date4 IS NULL OR Date4 > Date6 THEN Date6 ELSE Date4 END END ELSE CASE WHEN Date1 IS NULL OR Date1 > Date5 THEN CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN Date6 ELSE Date5 END ELSE CASE WHEN Date1 IS NULL OR Date1 > Date6 THEN Date6 ELSE Date1 END END END END END 

If you came across this page simply looking to compare dates and aren't as concerned about performance or compatibility, you can use a Table Value Constructor, which can be used wherever subselects are allowed (SQL Server 2008 and up):

Lowest = ( SELECT MIN(TVC.d) FROM ( VALUES (Date1), (Date2), (Date3), (Date4), (Date5), (Date6) ) AS TVC(d) ) 
1

Your case statement is not efficient. You are doing 5 comparisons in the worst case, and 2 in best case; whereas finding the minimum of n should do at most n-1 comparisons.

For each row, on average you are doing 3.5 comparisons instead of 2. Thus it is taking more cpu time and is slow. Try your tests again using the below case statement. It is just using 2 comparisons per row and should be more efficient than unpivot and union all.

Select Id, Case When Col1 <= Col2 then case when Col1 <= Col3 Then Col1 else col3 end When Col2 <= Col3 Then Col2 Else Col3 End As TheMin From YourTableNameHere 

The union all method is wrong in your case as you are getting the minimum value not per row but for whole table. Also, it wont be efficient as you are going to scan the same table 3 times. When the table is small, the I/O won't make much difference, but for large tables it will. Do not use that method.

Unpivot is good and give a try to manual unpivot as well by using cross join your table with (select 1 union all select 2 union all select 3). It should be as efficient as the unpivot.

Best solution would be having a computed persisted column, if you do not have space issues. It will add to the size of the row by 4 bytes (I suppose you will have int type), which in turn will increase the size of the table.

However, space and memory is issue in your system and CPU is not then do not make it persisted but use simple computed column using the case statement. It will make the code simpler.

-1

I guess that the first option is fastest (although it does not look very slick from programming perspective!). This is because It deals with exactly N rows (where N is the table size) and has to do no search or sort like method 2 or 3.

A test with large sample should prove the point.

As yet another option to consider (as if you need more!), is to create a materialized view over your table. if your table size is in 100s of thousands or more. This way, the min value is calculated while the row is changed and the entire table would not have to be processed with every query. In SQL Server, materialized views are called Indexed Views

0
-1
Create table #temp ( id int identity(1,1), Name varchar(30), Year1 int, Year2 int, Year3 int, Year4 int ) Insert into #temp values ('A' ,2015,2016,2014,2010) Insert into #temp values ('B' ,2016,2013,2017,2018) Insert into #temp values ('C' ,2010,2016,2014,2017) Insert into #temp values ('D' ,2017,2016,2014,2015) Insert into #temp values ('E' ,2016,2016,2016,2016) Insert into #temp values ('F' ,2016,2017,2018,2019) Insert into #temp values ('G' ,2016,2017,2020,2019) Select *, Case when Year1 >= Year2 and Year1 >= Year3 and Year1 >= Year4 then Year1 when Year2 >= Year3 and Year2 >= Year4 and Year2 >= Year1 then Year2 when Year3 >= Year4 and Year3 >= Year1 and Year3 >= Year2 then Year3 when Year4 >= Year1 and Year4 >= Year2 and Year4 >= Year3 then Year4 else Year1 end as maxscore from #temp 
1
  • You are not accounting for NULLs – that makes your CASE expression relatively simple. However, if at least one of the columns is indeed NULL, your solution will return Year1 as the result, which may not necessarily be correct. Commented Nov 24, 2015 at 10:20

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.