0

I want to create a feature similar to sequential segmentation provided by Adobe Analytics (Ref) I am using an external BQ table created on top of clickstream data stored in GCS location in the format of parquet, partitioned on date_key and event_name.

Data schema: date_key, user_id, session_id, event_name

My use case could have been easily solved with the use of MATCH_RECOGNIZE; however, since I am limited to using BigQuery for certain reasons and MATCH_RECOGNIZE is not supported in BigQuery, it has become challenging to find a solution.

I was trying UNNEST, but it is not scaling for larger data Here is the sample query I am using for segment: A then B then C then D, with some operator based conditions in between.

SELECT session_id, ROW_NUMBER() OVER (PARTITION BY session_id) as hit_counter, FROM `table` WHERE date_key >= 20241201 and date_key <= 20241231 AND event_name IN ('A', 'B', 'C', 'D') ), event_mapping AS ( SELECT session_id, ARRAY_AGG(STRUCT(event_name, hit_counter)) AS event_sequence FROM filtered_ds GROUP BY session_id ), combinations AS ( SELECT session_id FROM event_mapping, UNNEST(event_sequence) a, UNNEST(event_sequence) b, UNNEST(event_sequence) c, UNNEST(event_sequence) d, UNNEST(event_sequence) e WHERE a.event_name = 'A' AND b.event_name = 'B' AND c.event_name = 'C' AND d.event_name = 'D' AND e.event_name = 'E' AND b.hit_counter > a.hit_counter c.hit_counter - b.hit_counter BETWEEN 0 AND 10 AND d.hit_counter - c.hit_counter > 15 AND e.hit_counter > d.hit_counter ) SELECT COUNT(DISTINCT session_id) FROM combinations;``` 

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.