I have a SQL Server temporal table with the following data:

    ID  ValidFrom       ValidTo         MyValue
    23  7/7/19 13:51    7/7/19 13:51    0
    23  7/7/19 13:51    9/9/19 11:22    0
    23  9/9/19 11:22    9/9/19 11:23    0
    23  9/9/19 11:23    5/14/20 23:02   0
    23  5/14/20 23:02   5/16/20 20:02   0
    23  5/16/20 20:02   5/16/20 23:53   0
    23  5/16/20 23:53   5/16/20 23:58   0
    23  5/16/20 23:58   5/16/20 23:58   0
    23  5/16/20 23:58   5/16/20 23:59   0
    23  5/16/20 23:59   5/17/20 0:16    0
    23  5/17/20 0:16    5/17/20 1:47    0
    23  5/17/20 1:47    5/17/20 1:48    0
    23  5/17/20 1:48    5/20/20 16:52   0
    23  5/20/20 16:52   5/20/20 16:52   0
    23  5/20/20 16:52   8/22/20 0:22    0
    23  8/22/20 0:22    9/3/20 20:22    0
    23  9/3/20 20:22    9/3/20 20:23    0
    23  9/3/20 20:23    12/31/99 0:00   6

I want to perform a query so I only get the point of change of the 'MyValue', like so:

    ID  ValidFrom       ValidTo         MyValue
    23  7/7/19 13:51    7/7/19 13:51    0
    23  9/3/20 20:23    12/31/99 0:00   6

    SELECT ID, ValidFrom, ValidTo, MyValue
    FROM MyTable FOR SYSTEM_TIME ALL
    WHERE ID = 23

This query gets me all the values, but how do I arrive at my desired two-row result?

The following proposed solution does NOT work:

    WITH data AS
    (
      SELECT ID, ValidFrom, MyValue,
             LAG(MyValue, 1) OVER (PARTITION BY ID ORDER BY ValidFrom) prevVal,
             LEAD(MyValue, 1) OVER (PARTITION BY ID ORDER BY ValidFrom) nextVal
      FROM MyTable FOR SYSTEM_TIME ALL
      WHERE ID = 23
    )
    SELECT ID, ValidFrom, MyValue
    FROM data
    WHERE (prevVal IS NOT NULL AND prevVal <> MyValue)
       OR (nextVal IS NOT NULL AND nextVal <> MyValue)
    ORDER BY ValidFrom DESC;

Resulting in the following results:

    ID  ValidFrom               MyValue
    23  2020-09-03 20:23:32.23  6
    23  2020-09-03 20:22:00.41  0

2 Comments

  • I see that you have changed your question and are now showing a result that is logically inconsistent with what you asked. In that case Aaron Bertrand's answer would be right. Commented Sep 4, 2020 at 11:22
  • I added another answer, matching your edited question. You were in fact asking for the first occurrences of MyValue, not where the change occurs as you said in your original post. In that case it is a very simple GROUP BY. Commented Sep 4, 2020 at 11:35

3 Answers

Given this sample data:

    CREATE TABLE dbo.x(ID int, ValidFrom datetime, ValidTo datetime, MyValue tinyint);

    INSERT dbo.x VALUES
    -- notice I inserted the first two rows in a different order
    (23 ,'7/7/19 13:51','9/9/19 11:22', 0),
    (23 ,'7/7/19 13:51','7/7/19 13:51', 0),
    (23 ,'9/9/19 11:22','9/9/19 11:23', 0),
    (23 ,'9/9/19 11:23','5/14/20 23:02', 0),
    (23 ,'9/3/20 20:22','9/3/20 20:23', 0),
    (23 ,'9/3/20 20:23','12/31/99 0:00', 6);

With the WHERE clause limiting the results to ID = 23, you don't need PARTITION BY, but you will need to add it back if you ever pull back more than one ID. Ordering by ID when you are only pulling back one single value for ID makes zero sense. Here are three approaches, the first two being overly illustrative of the sequence (showing how "is_anchor" is determined), and the last being the most concise:

1. With just LAG:

    ;WITH cte1 AS
    (
      SELECT ID, ValidFrom, ValidTo, MyValue,
             prev = LAG(MyValue, 1) OVER (/* PARTITION BY ID */ ORDER BY ValidFrom, ValidTo)
      FROM dbo.x WHERE ID = 23 --MyTable FOR SYSTEM_TIME ALL WHERE ID = 23
    ),
    cte2 AS
    (
      SELECT ID, ValidFrom, ValidTo, MyValue,
             is_anchor = CASE WHEN prev <> MyValue OR prev IS NULL THEN 1 ELSE 0 END
      FROM cte1
    )
    SELECT ID, ValidFrom, ValidTo, MyValue
    FROM cte2
    WHERE is_anchor = 1
    ORDER BY ID, ValidFrom, ValidTo;

2. With ROW_NUMBER:

    ;WITH cte1 AS
    (
      SELECT ID, ValidFrom, ValidTo, MyValue,
             prev = LAG(MyValue, 1) OVER (/* PARTITION BY ID */ ORDER BY ValidFrom, ValidTo),
             rn = ROW_NUMBER() OVER (/* PARTITION BY ID */ ORDER BY ValidFrom, ValidTo)
      FROM dbo.x WHERE ID = 23 --MyTable FOR SYSTEM_TIME ALL WHERE ID = 23
    ),
    cte2 AS
    (
      SELECT ID, ValidFrom, ValidTo, MyValue,
             is_anchor = CASE WHEN prev <> MyValue OR rn = 1 THEN 1 ELSE 0 END
      FROM cte1
    )
    SELECT ID, ValidFrom, ValidTo, MyValue
    FROM cte2
    WHERE is_anchor = 1
    ORDER BY ID, ValidFrom, ValidTo;

3. With one less CTE (woop!):

    ;WITH cte1 AS
    (
      SELECT ID, ValidFrom, ValidTo, MyValue,
             prev = LAG(MyValue, 1) OVER (/* PARTITION BY ID */ ORDER BY ValidFrom, ValidTo)
      FROM dbo.x WHERE ID = 23 --MyTable FOR SYSTEM_TIME ALL WHERE ID = 23
    )
    SELECT ID, ValidFrom, ValidTo, MyValue
    FROM cte1
    WHERE prev <> MyValue OR prev IS NULL
    ORDER BY ID, ValidFrom, ValidTo;

In all three cases, the results are:

    ID ValidFrom           ValidTo             MyValue
    == =================== =================== =======
    23 2019-07-07 13:51:00 2019-07-07 13:51:00 0
    23 2020-09-03 20:23:00 1999-12-31 00:00:00 6

dbfiddle
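
The note about adding PARTITION BY back for more than one ID can be sketched as follows. This is a minimal sketch, assuming the same dbo.x sample table; the second ID (24) and its rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical rows for a second ID (24), added only for illustration.
INSERT dbo.x VALUES
(24, '20200101 08:00', '20200201 08:00', 1),
(24, '20200201 08:00', '20200301 08:00', 1),
(24, '20200301 08:00', '99991231 00:00', 2);

;WITH cte1 AS
(
  SELECT ID, ValidFrom, ValidTo, MyValue,
         -- PARTITION BY ID restarts the comparison for every ID, so the first
         -- row of each ID has prev = NULL and is kept as an anchor.
         prev = LAG(MyValue, 1) OVER (PARTITION BY ID ORDER BY ValidFrom, ValidTo)
  FROM dbo.x
  WHERE ID IN (23, 24)
)
SELECT ID, ValidFrom, ValidTo, MyValue
FROM cte1
WHERE prev <> MyValue OR prev IS NULL
ORDER BY ID, ValidFrom, ValidTo;
```

Without the PARTITION BY, the first row of ID 24 would be compared against the last row of ID 23 instead of being treated as the start of its own sequence.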

4 Comments

Thanks, Aaron, this is really neat! Amazing how complex this is to pull off. I did have an overflow issue with your query since 'MyValue' is a tinyint, although converting it to int did the trick. I guess LAG and LEAD always return int? What if MyValue were a varchar() type? Not asking you to write more queries, just to comment on the data type issue.
@user1054922 I had no idea your source table had a tinyint, this is why we like to see source tables and sample data as I have shown here, create table, insert, even a fiddle, this makes it much easier to help you and not have to make assumptions about your data. The overflow is because I assumed int and used a token value of something that definitely isn't going to fit in a tinyint. You could use 255 in its place, if no value could ever be 255, or you could stick with your cast, or you could pay the price of an additional row_number like in the second answer.
@user1054922 If MyValue wasn't numeric, then you would just use some kind of string comparison to determine a difference. Instead of prev-MyValue <> 0, for example, you would say prev <> MyValue, which actually would also work for numeric values, I was just overthinking it. Redid the fiddle to show this (but still with numerics).
Got it. This has been a good learning experience. I'll also be sure to structure my question in the future as you described. Thank you again!

You can use the LAG() function, e.g.:

    with data as
    (
      SELECT ID, ValidFrom, ValidTo, MyValue,
             lag(MyValue,1) over (/*Partition by id*/ order by validFrom) prevVal,
             lead(MyValue,1) over (/*Partition by id*/ order by validFrom) nextVal
      FROM MyTable
      WHERE ID = 23
    )
    select ID, ValidFrom, ValidTo, MyValue
    from data
    where (prevVal is not null and prevVal <> myValue)
       OR (nextVal is not null and nextVal <> myValue);

PS: You would probably want to order by validFrom.

UPDATE: Edited to order by validFrom, as in my suggestion, and corrected that 'FOR SYSTEM_TIME'; I have no idea where it came from.

Here is DBFiddle Demo

UPDATE: I understood the question differently (as asking where the change occurs, which is how the original question read). But what you were really asking for was simply the first occurrence of each MyValue:

    select ID, min(validFrom) ValidFrom, min(validTo) validTo, MyValue
    from x
    where ID = 23
    group by x.ID, MyValue;

and here is DBFiddle Demo
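
One caveat with the GROUP BY form is sketched below, assuming a value can come back after it has changed. The temp table #y and its rows are hypothetical. Grouping collapses every run of the same value into one row, and MIN(ValidTo) is computed independently of MIN(ValidFrom), so the two need not come from the same source row.

```sql
-- Hypothetical data: MyValue goes 0 -> 6 -> 0.
CREATE TABLE #y (ID int, ValidFrom datetime, ValidTo datetime, MyValue int);

INSERT #y VALUES
(23, '20200101', '20200201', 0),
(23, '20200201', '20200301', 6),
(23, '20200301', '99991231', 0);

-- GROUP BY returns one row per distinct value (2 rows here):
-- the return to 0 on 2020-03-01 is not reported.
SELECT ID, MIN(ValidFrom) AS ValidFrom, MIN(ValidTo) AS ValidTo, MyValue
FROM #y
WHERE ID = 23
GROUP BY ID, MyValue;

-- The LAG comparison returns one row per change (3 rows here).
;WITH cte AS
(
  SELECT ID, ValidFrom, ValidTo, MyValue,
         prev = LAG(MyValue) OVER (ORDER BY ValidFrom, ValidTo)
  FROM #y
  WHERE ID = 23
)
SELECT ID, ValidFrom, ValidTo, MyValue
FROM cte
WHERE prev <> MyValue OR prev IS NULL;
```

For data like the question's, where each value appears in a single run, both forms return the same two starting points.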

12 Comments

To be safe and deterministic, you probably want the order by inside the window functions to be on ValidFrom, ValidTo, not on ID.
Thanks! But if I change 'order by id' inside of each partition by, I get: ID ValidFrom MyValue 23 2020-09-03 20:23:32.23 6 23 2020-09-03 20:22:00.41 0 whereas if I keep it as your original, the start date of July 7th, 2019 is in the correct place.
@AaronBertrand What I mean is that if I modify the query as you mentioned in the comment here, it gives me bad data. Cetin Basoz, Could you please modify your original answer to illustrate what you're talking about?
@AaronBertrand I'm not saying your addition is incorrect... what I'm saying is that if I change PARTITION BY ID ORDER BY ID to PARTITION BY ID ORDER BY VALIDFROM, It does not give me the results that I expect from my original question.

I got it to work with this query. To adapt to a system versioned table you would need to add FOR SYSTEM_TIME ALL to the query.

Data

    CREATE TABLE dbo.x(ID int, ValidFrom datetime, ValidTo datetime, MyValue int);

    INSERT dbo.x VALUES
    -- notice I inserted the first two rows in a different order
    (23 ,'7/7/19 13:51','7/7/19 13:51', 0),
    (23 ,'7/7/19 13:51','9/9/19 11:22', 0),
    (23 ,'9/9/19 11:22','9/9/19 11:23', 0),
    (23 ,'9/9/19 11:23','5/14/20 23:02', 0),
    (23 ,'5/14/20 23:02','5/16/20 20:02', 0),
    (23 ,'5/16/20 20:02','5/16/20 23:53', 0),
    (23 ,'5/16/20 23:53','5/16/20 23:58', 0),
    (23 ,'5/16/20 23:58','5/16/20 23:58', 0),
    (23 ,'5/16/20 23:58','5/16/20 23:59', 0),
    (23 ,'5/16/20 23:59','5/17/20 0:16', 0),
    (23 ,'5/17/20 0:16','5/17/20 1:47', 0),
    (23 ,'5/17/20 1:47','5/17/20 1:48', 0),
    (23 ,'5/17/20 1:48','5/20/20 16:52', 0),
    (23 ,'5/20/20 16:52','5/20/20 16:52', 0),
    (23 ,'5/20/20 16:52','8/22/20 0:22', 0),
    (23 ,'8/22/20 0:22','9/3/20 20:22', 0),
    (23 ,'9/3/20 20:22','9/3/20 20:23', 0),
    (23 ,'9/3/20 20:23','12/31/99 0:00', 6);

Query

    ;with data_cte as
    (
      SELECT ID, ValidFrom, ValidTo, MyValue,
             lag(MyValue, 1, -1) over (Partition by id order by ValidFrom, ValidTo) prevVal
      FROM x
      WHERE ID = 23
    )
    select ID, ValidFrom, ValidTo, MyValue
    from data_cte
    where prevVal = -1 OR prevVal <> myValue;

Results

    ID  ValidFrom                ValidTo                  MyValue
    23  2019-07-07 13:51:00.000  2019-07-07 13:51:00.000  0
    23  2020-09-03 20:23:00.000  1999-12-31 00:00:00.000  6
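
As a sketch of the system-versioned adaptation mentioned at the top of this answer, assuming the temporal table is called MyTable as in the question, FOR SYSTEM_TIME ALL goes directly after the table name. The -1 sentinel assumes MyValue can never be -1. The CAST to int is an added precaution for the case where MyValue is a tinyint, which the comments on the first answer report as causing an overflow.

```sql
;WITH data_cte AS
(
  SELECT ID, ValidFrom, ValidTo, MyValue,
         -- CAST to int so the sentinel -1 fits even if MyValue is a tinyint
         -- (assumption: -1 is never a real MyValue).
         prevVal = LAG(CAST(MyValue AS int), 1, -1)
                   OVER (PARTITION BY ID ORDER BY ValidFrom, ValidTo)
  FROM MyTable FOR SYSTEM_TIME ALL   -- current rows plus all history rows
  WHERE ID = 23
)
SELECT ID, ValidFrom, ValidTo, MyValue
FROM data_cte
WHERE prevVal = -1 OR prevVal <> MyValue
ORDER BY ValidFrom, ValidTo;
```

If no safe sentinel exists for the column, omitting the default and testing `prevVal IS NULL` instead avoids having to pick one.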

6 Comments

On mobile so can’t check but in your results the second row is the wrong row and not the row the OP expects. Actually first row is the wrong row also.
Yeah, I see. The first row in the data was not correct. I edited it in your answer too, to reflect the OP's data in the question. There's also a time part here that we're not seeing. But to get the OP's result, isn't it just the ValidTo column of the query?
I edited the order of the first two rows to demonstrate that order by id is naive and can't be relied upon. It makes the outcome nondeterministic because if all rows have the same id then there is no order. So lag/lead/prevval/nextval are nondeterministic. The answer should get the right rows regardless of what order the data was inserted in - it's about the data in the row, not when it was inserted. Did you try the dbfiddle?
This still doesn't produce the right rows because ordering by id relies on the assumption that ordering by the OP's desired ordering will magically happen based on the insert order (and the lack of new indexes or updated stats or 20 other things that can influence nondeterministic order). This is easy to reproduce with far fewer rows.
You're right. I updated it to make the ORDER BY ValidFrom, ValidTo. Now it's the same as yours but I use the optional default value argument of the LAG function and I don't use ROW_NUMBER or 2 CTE's. Anyway, nicely done Aaron :)