How to get the count of distinct IDs each month which are different from the previous month?

Question

I am trying to count the number of unique IDs in a table for each month. But the catch is that the count of IDs for each month should only include IDs which were not present in the previous month

I am trying to write an SQL query which will work in google BigQuery but so far I have only figured out how to get the count of distinct IDs for each month. I am not able to figure out how to get the condition for the IDs not being present in the previous month.

For e.g. I have a table like below tbl1:

time_stamp | ID | col3 | col4 ------------------------------- 2019-06-10 | 1 | 10 | 20 2019-06-10 | 2 | 11 | 21 2019-06-10 | 3 | 12 | 22 2019-07-10 | 2 | 11 | 21 2019-07-10 | 4 | 13 | 23 2019-08-10 | 4 | 13 | 23 2019-08-10 | 5 | 14 | 24 2019-09-10 | 5 | 14 | 24 2019-09-10 | 6 | 15 | 25

Expected Output

time_stamp | count -------------------- 2019-06-10 | 3 2019-07-10 | 1 2019-08-10 | 1 2019-09-10 | 1

stackoverflow.com/editing-help

jarlh
– jarlh

2019-10-24 20:10:40 +00:00
Commented Oct 24, 2019 at 20:10 — jarlh
– jarlh, Commented Oct 24, 2019 at 20:10

Mikhail Berlyant · Accepted Answer · 2019-10-24 20:54:08Z

Update

I realized - you asked for count of IDs for each month should only include IDs which were not present in theprevious month - not in previous months, but month

Below is solution for it

#standardSQL SELECT month, COUNT(1) users FROM ( SELECT *, IFNULL(DATE_DIFF(month, LAG(month) OVER(PARTITION BY ID ORDER BY month), MONTH), 0) != 1 qualified FROM ( SELECT DISTINCT DATE_TRUNC(time_stamp, MONTH) month, ID FROM `project.dataset.table` ) ) WHERE qualified GROUP BY month

you can test, play with it using below sample data

#standardSQL WITH `project.dataset.table` AS ( SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL SELECT '2019-06-10', 2, 11, 21 UNION ALL SELECT '2019-06-10', 3, 12, 22 UNION ALL SELECT '2019-06-11', 3, 12, 22 UNION ALL SELECT '2019-07-10', 2, 11, 21 UNION ALL SELECT '2019-07-10', 4, 13, 23 UNION ALL SELECT '2019-08-10', 1, 13, 23 UNION ALL SELECT '2019-08-10', 4, 13, 23 UNION ALL SELECT '2019-08-10', 5, 14, 24 UNION ALL SELECT '2019-09-10', 5, 14, 24 UNION ALL SELECT '2019-09-10', 6, 15, 25 ) SELECT month, COUNT(1) users FROM ( SELECT *, IFNULL(DATE_DIFF(month, LAG(month) OVER(PARTITION BY ID ORDER BY month), MONTH), 0) != 1 qualified FROM ( SELECT DISTINCT DATE_TRUNC(time_stamp, MONTH) month, ID FROM `project.dataset.table` ) ) WHERE qualified GROUP BY month -- ORDER BY month

with result

Row month users 1 2019-06-01 3 2 2019-07-01 1 3 2019-08-01 2 4 2019-09-01 1

Hope, this time it is what you asked!

Initial answer Below is for BigQuery Standard SQL and returns count of users which are not presented in prev months

#standardSQL SELECT time_stamp, COUNT(1) `count` FROM ( SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry FROM `project.dataset.table` ) WHERE first_entry GROUP BY time_stamp

if to apply to sample data from your question - output is

Row time_stamp count 1 2019-06-10 3 2 2019-07-10 1 3 2019-08-10 1 4 2019-09-10 1

You can test, play with it using below example

#standardSQL WITH `project.dataset.table` AS ( SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL SELECT '2019-06-10', 2, 11, 21 UNION ALL SELECT '2019-06-10', 3, 12, 22 UNION ALL SELECT '2019-07-10', 2, 11, 21 UNION ALL SELECT '2019-07-10', 4, 13, 23 UNION ALL SELECT '2019-08-10', 4, 13, 23 UNION ALL SELECT '2019-08-10', 5, 14, 24 UNION ALL SELECT '2019-09-10', 5, 14, 24 UNION ALL SELECT '2019-09-10', 6, 15, 25 ) SELECT time_stamp, COUNT(1) `count` FROM ( SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry FROM `project.dataset.table` ) WHERE first_entry GROUP BY time_stamp -- ORDER BY time_stamp

In case if you need to group by month vs. by date (it is not clear from your question)

#standardSQL WITH `project.dataset.table` AS ( SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL SELECT '2019-06-11', 2, 11, 21 UNION ALL SELECT '2019-06-12', 3, 12, 22 UNION ALL SELECT '2019-07-10', 2, 11, 21 UNION ALL SELECT '2019-07-11', 4, 13, 23 UNION ALL SELECT '2019-08-10', 4, 13, 23 UNION ALL SELECT '2019-08-12', 5, 14, 24 UNION ALL SELECT '2019-09-10', 5, 14, 24 UNION ALL SELECT '2019-09-13', 6, 15, 25 ) SELECT DATE_TRUNC(time_stamp, MONTH) month, COUNT(1) `count` FROM ( SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry FROM `project.dataset.table` ) WHERE first_entry GROUP BY month -- ORDER BY month

above returns monthly users excluding those which were present in previous months

Row month count 1 2019-06-01 3 2 2019-07-01 1 3 2019-08-01 1 4 2019-09-01 1

Gordon Linoff · Accepted Answer · 2019-10-24 20:22:36Z

1

You can use two levels of aggregation:

select yyyymm, count(*) from (select id, date_trunc(min(time_stamp), month) as yyyymm from tbl1 group by id ) t group by yyyymm order by yyyymm;

answered Oct 24, 2019 at 20:22

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

1 Comment

J-ho Over a year ago

works only on never version of MSSQL. Starting with SQL Server 2022 (16.x). learn.microsoft.com/en-us/sql/t-sql/functions/…

Collectives™ on Stack Overflow

How to get the count of distinct IDs each month which are different from the previous month?

2 Answers 2

Comments

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Related