
I have a PostgreSQL table that records events by date/time. The table has the columns id, event and timestamp.

My output has to be something like this:

'Day', '1st Timers', '2nd Timers', '3rd Timers', '3+ Timers' 

1st timers are all ids that have done the event for the first time, 2nd timers are all ids that have done the event for the second time, and so on.

Is this possible using a single SQL query?

edit: Sample data and output as per request

    user_id  date            event
    1        09/03/15 14:08  opened
    2        10/03/15 14:08  opened
    1        11/03/15 14:08  opened
    4        14/03/15 14:08  opened
    1        15/03/15 14:08  opened
    5        16/03/15 14:08  opened
    1        17/03/15 14:08  opened
    4        17/03/15 14:08  opened
    6        18/03/15 14:08  opened
    1        18/03/15 14:08  opened
    6        18/03/15 14:08  other

Output (for event=opened)

    date      1time  2times  3times  4times  5times
    09/03/15  1      0       0       0       0
    10/03/15  1      0       0       0       0
    11/03/15  0      1       0       0       0
    14/03/15  1      0       0       0       0
    15/03/15  0      0       1       0       0
    16/03/15  1      0       0       0       0
    17/03/15  0      1       0       1       0
    18/03/15  1      0       0       0       1
  • Can you provide sample table data and expected output? Commented Apr 9, 2015 at 11:15
  • As always, your version of Postgres, please. It's relevant for the best solution. Commented Apr 9, 2015 at 12:34
  • If a user does an event two times on his/her first day, does (s)he count as "1st-timer" and "2nd-timer"? Commented Apr 9, 2015 at 12:42

2 Answers


For each date, you seem to want to count the number of users that hit 1 time, 2 times, and so on. I see this as a row_number() followed by conditional aggregation:

    select thedate,
           sum(case when seqnum = 1 then 1 else 0 end) as time_1,
           sum(case when seqnum = 2 then 1 else 0 end) as time_2,
           sum(case when seqnum = 3 then 1 else 0 end) as time_3,
           sum(case when seqnum = 4 then 1 else 0 end) as time_4,
           sum(case when seqnum = 5 then 1 else 0 end) as time_5
    from (select t.*,
                 date_trunc('day', date) as thedate,  -- the comma after thedate was missing
                 row_number() over (partition by user_id
                                    order by date_trunc('day', date)) as seqnum
          from tbl t  -- tbl is a placeholder; "table" itself is a reserved word
          where event = 'opened'
         ) t
    group by thedate
    order by thedate;
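To check the approach against the sample data in the question, here is a minimal sketch that runs the same row_number() plus conditional aggregation in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a convenient stand-in for Postgres here and needs version 3.25 or later for window functions. The table name tbl, the column name event_time, the ISO date strings (assuming the dd/mm/yy values mean March 2015), and the use of date() in place of date_trunc('day', ...) are all assumptions made for the example.

```python
import sqlite3

# Sample rows from the question as (user_id, event_time, event).
# Assumption: the dd/mm/yy dates are March 2015, rewritten as ISO strings.
ROWS = [
    (1, "2015-03-09 14:08", "opened"),
    (2, "2015-03-10 14:08", "opened"),
    (1, "2015-03-11 14:08", "opened"),
    (4, "2015-03-14 14:08", "opened"),
    (1, "2015-03-15 14:08", "opened"),
    (5, "2015-03-16 14:08", "opened"),
    (1, "2015-03-17 14:08", "opened"),
    (4, "2015-03-17 14:08", "opened"),
    (6, "2015-03-18 14:08", "opened"),
    (1, "2015-03-18 14:08", "opened"),
    (6, "2015-03-18 14:08", "other"),
]

# Number each user's 'opened' events in time order, then count per day
# how many events were a 1st, 2nd, ... 5th occurrence.
QUERY = """
SELECT thedate,
       SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS time_1,
       SUM(CASE WHEN seqnum = 2 THEN 1 ELSE 0 END) AS time_2,
       SUM(CASE WHEN seqnum = 3 THEN 1 ELSE 0 END) AS time_3,
       SUM(CASE WHEN seqnum = 4 THEN 1 ELSE 0 END) AS time_4,
       SUM(CASE WHEN seqnum = 5 THEN 1 ELSE 0 END) AS time_5
FROM (SELECT date(event_time) AS thedate,
             ROW_NUMBER() OVER (PARTITION BY user_id
                                ORDER BY event_time) AS seqnum
      FROM tbl
      WHERE event = 'opened') AS t
GROUP BY thedate
ORDER BY thedate
"""


def nth_time_counts(rows):
    """Load rows into a throwaway table and return the per-day counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (user_id INTEGER, event_time TEXT, event TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in nth_time_counts(ROWS):
        print(row)
```

On the sample rows this should reproduce the output table from the question, for example 0 1 0 1 0 for 17/03/15, where user 4 opens for the second time and user 1 for the fourth.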

4 Comments

Clever use of case and sum with window functions
Awesome! Seems to be almost working. The output I get is this: pastebin.com/UyDYQ2pr. I believe the correct output for column2 is (column2 - column1). Some slight tweak is required, but I'm not able to pinpoint it.
@Anoop . . . I think you just want where event = 'opened'.
Correct. I figured it out. Accepting the answer.

Aggregate FILTER

Starting with Postgres 9.4 use the new aggregate FILTER clause:

    SELECT event_time::date
         , count(*) FILTER (WHERE rn = 1) AS times_1
         , count(*) FILTER (WHERE rn = 2) AS times_2
         , count(*) FILTER (WHERE rn = 3) AS times_3
         -- etc.
    FROM  (
       SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY event_time) AS rn
       FROM   tbl
       ) t
    GROUP  BY 1
    ORDER  BY 1;
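The "etc." columns can also end in an open-ended bucket such as rn > 3, which is one way to get the '3+ Timers' column the question asks for. Here is a rough sketch, again using Python's sqlite3 as a stand-in for Postgres. The helper count_where is hypothetical: it emits the FILTER form when the bundled SQLite supports it (3.30 or later) and the equivalent SUM(CASE ...) form otherwise, which is also the shape to use on engines that lack FILTER. The WHERE event = 'opened' condition, the times_3_plus column name, and the March 2015 ISO dates are assumptions added for the example.

```python
import sqlite3

# Sample rows from the question as (user_id, event_time, event).
# Assumption: the dd/mm/yy dates are March 2015, rewritten as ISO strings.
ROWS = [
    (1, "2015-03-09 14:08", "opened"), (2, "2015-03-10 14:08", "opened"),
    (1, "2015-03-11 14:08", "opened"), (4, "2015-03-14 14:08", "opened"),
    (1, "2015-03-15 14:08", "opened"), (5, "2015-03-16 14:08", "opened"),
    (1, "2015-03-17 14:08", "opened"), (4, "2015-03-17 14:08", "opened"),
    (6, "2015-03-18 14:08", "opened"), (1, "2015-03-18 14:08", "opened"),
    (6, "2015-03-18 14:08", "other"),
]

# Aggregate FILTER needs SQLite 3.30+; older builds get the CASE form.
HAS_FILTER = sqlite3.sqlite_version_info >= (3, 30, 0)


def count_where(condition):
    """Return a SQL expression that counts the rows matching `condition`."""
    if HAS_FILTER:
        return f"COUNT(*) FILTER (WHERE {condition})"
    return f"SUM(CASE WHEN {condition} THEN 1 ELSE 0 END)"


def bucket_counts(rows):
    """Per-day counts of 1st, 2nd, 3rd and later-than-3rd 'opened' events."""
    query = f"""
    SELECT date(event_time) AS day,
           {count_where('rn = 1')} AS times_1,
           {count_where('rn = 2')} AS times_2,
           {count_where('rn = 3')} AS times_3,
           {count_where('rn > 3')} AS times_3_plus
    FROM (SELECT event_time,
                 ROW_NUMBER() OVER (PARTITION BY user_id
                                    ORDER BY event_time) AS rn
          FROM tbl
          WHERE event = 'opened') AS t
    GROUP BY day
    ORDER BY day
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (user_id INTEGER, event_time TEXT, event TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in bucket_counts(ROWS):
        print(row)
```

Both forms give the same numbers, so the choice is mostly about readability and what the engine supports.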


Crosstab

Or use an actual crosstab query (faster). Available for any modern Postgres version; crosstab() is provided by the additional module tablefunc, which has to be installed first.

    SELECT * FROM crosstab(
       'SELECT event_time::date, rn, count(*)::int AS ct
        FROM  (
           SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY event_time) AS rn
           FROM   tbl
           ) t
        GROUP  BY 1, 2
        ORDER  BY 1'
      ,$$SELECT * FROM unnest ('{1,2,3}'::int[])$$
       ) AS ct (day date, times_1 int, times_2 int, times_3 int);
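As a rough illustration of the reshaping this two-parameter crosstab call performs, here is a plain-Python sketch that pivots (day, rn, ct) triples into one row per day. It is not the tablefunc implementation; the function pivot and the LONG_ROWS list are hypothetical, and the triples are what the inner query would be expected to return for the question's 'opened' rows (dates assumed to be March 2015). As I read the tablefunc documentation, categories missing for a day come back as NULL and categories outside the listed set are ignored, which the sketch mirrors with None.

```python
def pivot(long_rows, categories):
    """Pivot (row_name, category, value) triples into one wide row per row_name.

    Categories that never occur for a row_name stay None, and triples whose
    category is not listed in `categories` are dropped.
    """
    wide = {}
    for row_name, category, value in long_rows:
        slots = wide.setdefault(row_name, [None] * len(categories))
        if category in categories:
            slots[categories.index(category)] = value
    return [(row_name, *wide[row_name]) for row_name in sorted(wide)]


# (day, rn, ct) triples for the 'opened' sample rows from the question.
LONG_ROWS = [
    ("2015-03-09", 1, 1), ("2015-03-10", 1, 1), ("2015-03-11", 2, 1),
    ("2015-03-14", 1, 1), ("2015-03-15", 3, 1), ("2015-03-16", 1, 1),
    ("2015-03-17", 2, 1), ("2015-03-17", 4, 1),
    ("2015-03-18", 1, 1), ("2015-03-18", 5, 1),
]

if __name__ == "__main__":
    # Same category list as unnest('{1,2,3}'::int[]) in the query.
    for row in pivot(LONG_ROWS, [1, 2, 3]):
        print(row)
```

Note the difference from the aggregate variants: empty cells are NULL rather than 0, so wrap the output columns in COALESCE(..., 0) if zeros are wanted.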

4 Comments

Thanks will try. That was just sample data I quickly cooked up, the actual field name is 'event_time'.
I'm using Amazon RedShift. I believe the crosstab and filter are not supported. :(
@Anoop: I believe that is something you should have told us up front. Redshift is not Postgres. I did ask for the version, too ...
Sorry about that. Very new to RedShift. I believed that the underlying DB was postgres without any differences.
