0

Given the following table structure:

CAT NR DATE VERSION SOME_CODE
ABC 123 2009-02-19T00:00:00Z 1 OPP
ABC 456 2009-03-18T00:00:00Z 1 ZUM
ABC 444 2009-03-18T00:00:00Z 1 ZUM
ABC 456 2009-04-18T00:00:00Z 2 XXX
ABC 456 2009-04-18T00:00:00Z 3 XXX
ABC 456 2009-04-18T00:00:00Z 4 UIO
ABC 456 2009-05-18T00:00:00Z 5 RQA
ABC 444 2009-03-18T00:00:00Z 2 MUZ
DEF 789 2018-02-17T00:00:00Z 1 WER
DEF 637 2018-02-16T00:00:00Z 1 FAW
GHI 248 2018-02-17T00:00:00Z 1 QWE
GHI 248 2019-02-17T00:00:00Z 2 PPP
GHI 357 2020-02-16T00:00:00Z 1 FFF
GHI 357 2020-02-16T00:00:00Z 2 GGG
GHI 357 2020-02-16T00:00:00Z 3 LLL
GHI 357 2020-02-16T00:00:00Z 4 LLL
GHI 357 2020-08-16T00:00:00Z 4 FFF
GHI 357 2020-10-16T00:00:00Z 5 ZZZ
GHI 420 2020-02-16T00:00:00Z 1 QDS

I need to find out how many effective changes took place in column SOME_CODE (sometimes values are the same and therefore they should not count) per month and category (CAT).

So expected SQL query result should like this:

CAT MONTH SOME_CODE_CHANGES
ABC APR 2009 1
ABC MAR 2009 1
GHI FEB 2020 2

Does anybody have a hint how to start the query?

Within a category (e.g. ABC) I want to count number of changes in column SOME_CODE. When a changed happened (e.g. for NR=456) say from VERSION 1 to 2 ("ZUM" -> "XXX") I wanna count it but with two conditions:

  • must be a change within a month
  • must be an effective change. Means the change from VERSION 3 to VERSION 4 is a valid change ("XXX" -> "UIO") while the change from VERSION 2 to VERSION 3 is nonesense ("XXX" -> "XXX") and should not be counted

My approach:

select trunc(DATE, 'MON'), CAT, sum(case when _MY_CONDITION_ then 1 else 0 end) counter from table where date between TO_TIMESTAMP('01/06/2018 00:00:00.000000000', 'DD/MM/YYYY HH24:MI:SS.FF') and TO_TIMESTAMP('31/07/2018 00:00:00.000000000', 'DD/MM/YYYY HH24:MI:SS.FF') group by trunc(DATE, 'MON'), CAT order by trunc(DATE, 'MON') 
5
  • Please explain how these numbers are calculated. Why are you choosing only two months for the results? Commented Mar 10, 2021 at 12:29
  • What should be the output if a value changes from ABC to DEF and then back to ABC? Is that one or two changes? Commented Mar 10, 2021 at 12:33
  • @MT0 then it should count as twice changes (if within a month) Commented Mar 10, 2021 at 12:47
  • @GordonLinoff I added some more hints Commented Mar 10, 2021 at 12:47
  • 1. In your data you have three rows with the same CAT, same DATE and same SOME_CODE. How do you order such rows? "By date" doesn't suffice, since they have exactly the same timestamp. And, of course, different order may result in different counts of "changes". 2. If the value changes from 'ZUM' on February 26 to 'ZAM' on March 2, is that a "change" you count for March? In almost any sane business use case the answer should be "yes" - but your explanation is confusing enough that MT0 assumed "no" for his answer. Please clarify. Commented Mar 11, 2021 at 5:18

2 Answers 2

2

From Oracle 12, you can use MATCH_RECOGNIZE:

SELECT cat, month, COUNT(*) FROM ( SELECT t.*, TRUNC( "DATE", 'MM' ) AS month FROM table_name t ) MATCH_RECOGNIZE( PARTITION BY cat, month ORDER BY "DATE", version ONE ROW PER MATCH AFTER MATCH SKIP TO LAST change_code PATTERN ( strt change_code ) DEFINE change_code AS change_code.some_code <> strt.some_code ) GROUP BY cat, month 

Which, for the sample data:

CREATE TABLE table_name ( CAT, NR, "DATE", VERSION, SOME_CODE ) AS SELECT 'ABC', 123, TIMESTAMP '2009-02-19 00:00:00 UTC', 1, 'OPP' FROM DUAL UNION ALL SELECT 'ABC', 456, TIMESTAMP '2009-03-18 00:00:00 UTC', 1, 'ZUM' FROM DUAL UNION ALL SELECT 'ABC', 444, TIMESTAMP '2009-03-18 00:00:00 UTC', 1, 'ZUM' FROM DUAL UNION ALL SELECT 'ABC', 444, TIMESTAMP '2009-03-18 00:00:00 UTC', 2, 'MUZ' FROM DUAL UNION ALL SELECT 'ABC', 456, TIMESTAMP '2009-04-18 00:00:00 UTC', 2, 'XXX' FROM DUAL UNION ALL SELECT 'ABC', 456, TIMESTAMP '2009-04-18 00:00:00 UTC', 3, 'XXX' FROM DUAL UNION ALL SELECT 'ABC', 456, TIMESTAMP '2009-04-18 00:00:00 UTC', 4, 'UIO' FROM DUAL UNION ALL SELECT 'ABC', 456, TIMESTAMP '2009-05-18 00:00:00 UTC', 5, 'RQA' FROM DUAL UNION ALL SELECT 'DEF', 637, TIMESTAMP '2018-02-16 00:00:00 UTC', 1, 'FAW' FROM DUAL UNION ALL SELECT 'DEF', 789, TIMESTAMP '2018-02-17 00:00:00 UTC', 1, 'WER' FROM DUAL UNION ALL SELECT 'GHI', 248, TIMESTAMP '2018-02-17 00:00:00 UTC', 1, 'QWE' FROM DUAL UNION ALL SELECT 'GHI', 248, TIMESTAMP '2019-02-17 00:00:00 UTC', 2, 'PPP' FROM DUAL UNION ALL SELECT 'GHI', 357, TIMESTAMP '2020-02-16 00:00:00 UTC', 1, 'FFF' FROM DUAL UNION ALL SELECT 'GHI', 420, TIMESTAMP '2020-02-16 00:00:00 UTC', 1, 'QDS' FROM DUAL UNION ALL SELECT 'GHI', 357, TIMESTAMP '2020-02-16 00:00:00 UTC', 2, 'GGG' FROM DUAL UNION ALL SELECT 'GHI', 357, TIMESTAMP '2020-02-16 00:00:00 UTC', 3, 'LLL' FROM DUAL UNION ALL SELECT 'GHI', 357, TIMESTAMP '2020-02-16 00:00:00 UTC', 4, 'LLL' FROM DUAL UNION ALL SELECT 'GHI', 357, TIMESTAMP '2020-08-16 00:00:00 UTC', 4, 'FFF' FROM DUAL UNION ALL SELECT 'GHI', 357, TIMESTAMP '2020-10-16 00:00:00 UTC', 5, 'ZZZ' FROM DUAL 

Outputs:

 CAT | MONTH | COUNT(*) :-- | :-------- | -------: ABC | 01-MAR-09 | 1 ABC | 01-APR-09 | 1 DEF | 01-FEB-18 | 1 GHI | 01-FEB-20 | 3 

If you want to see the changes then you can use:

SELECT * FROM ( SELECT t.*, TRUNC( "DATE", 'MM' ) AS month FROM table_name t ) MATCH_RECOGNIZE( PARTITION BY cat, month ORDER BY "DATE", version MEASURES MATCH_NUMBER() AS mn, FIRST( some_code ) AS change_from, LAST( some_code ) AS change_to ONE ROW PER MATCH AFTER MATCH SKIP TO LAST change_code PATTERN ( strt change_code ) DEFINE change_code AS change_code.some_code <> strt.some_code ) 

Which outputs:

 CAT | MONTH | MN | CHANGE_FROM | CHANGE_TO :-- | :-------- | -: | :---------- | :-------- ABC | 01-MAR-09 | 1 | ZUM | MUZ ABC | 01-APR-09 | 1 | XXX | UIO DEF | 01-FEB-18 | 1 | FAW | WER GHI | 01-FEB-20 | 1 | FFF | QDS GHI | 01-FEB-20 | 2 | QDS | GGG GHI | 01-FEB-20 | 3 | GGG | LLL 

db<>fiddle here


If your requirement for "within a month" is that you want changes where there is at most one month's difference between the previous row to the changed row, even if the rows are in two different calendar months, (rather than just the changes that happen in the same calendar month) then you can use:

SELECT cat, TRUNC( change_date, 'MM' ) AS month, COUNT(*) FROM table_name MATCH_RECOGNIZE( PARTITION BY cat ORDER BY "DATE", version MEASURES LAST( "DATE" ) AS change_date ONE ROW PER MATCH AFTER MATCH SKIP TO LAST change_code PATTERN ( strt change_code ) DEFINE change_code AS ( change_code.some_code <> strt.some_code AND MONTHS_BETWEEN( change_code."DATE", strt."DATE" ) <= 1 ) ) GROUP BY cat, TRUNC( change_date, 'MM' ) 

Which outputs:

 CAT | MONTH | COUNT(*) :-- | :-------- | -------: ABC | 01-MAR-09 | 2 ABC | 01-MAY-09 | 1 ABC | 01-APR-09 | 2 DEF | 01-FEB-18 | 1 GHI | 01-FEB-20 | 3 

db<>fiddle here

Sign up to request clarification or add additional context in comments.

4 Comments

This can't be right, for a couple of reasons. First, you don't ever check change across month boundary (change in March, compared to last value from February). Second, the data in the OP's sample has duplicates by (CAT, "DATE"), so the problem is not even well-defined. The OP needs to clarify that point. In any case, if you change the very first row to have ZUM as CHANGE_CODE, your query still reports 1 change in March 2009, which can't be right - there are no changes that month.
@mathguy There is a change in March 2009 (from the 3rd to the 4th row of my data, its the 3rd and 8th row in the OP's data) and the OP only asks for "changes within a month" and this looks for changes that are within each calendar month.
Oh - I was wondering why I was still getting one change for that month (I didn't inspect the data closely). In any case - I believe if there are different values on Feb. 27 and then on March 5, then the change happened "within" the month. The OP needs to clarify that too (in addition to what to do when there are duplicate timestamps for a given CAT).
@MT0 fantastic! solved my problems very well - now I can start to play around - thank you and all other phenomenal inputs!
0

I think you want one less than the number of distinct codes for each cat in each month:

select cat, trunc(date, 'MON') as yyyymm, count(distinct code) - 1 from t group by cat, trunc(date, 'MON'); 

From what I can tell, this matches the results you specify.

2 Comments

But this is not correct if the change was like @MT0 told, true? If the change was from ABC to DEF and back to ABC your query would count 1 change - whereby there were effectively 2.
Second, distinct is not sufficient because a change relates only to a NR within a CAT.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.