edited Aug 9 at 17:51

How to handle irrelevant categorical variables in aggregated data?

I’m working with ad server data where I can’t get user-level data, only aggregated reports. The data is aggregated over multiple categorical dimensions (e.g., day × product × medium × source × campaign × format), and metrics such as conversions, impressions, and cost are sums within each combination of those dimensions. My goal is to predict conversions.

I have two main questions:

a) Should I always pull the data at the most granular aggregation level available (i.e., including every dimension, such as format, source, and campaign)?

b) If a categorical variable (e.g., format) is irrelevant to the model, is it better to remove it by re-aggregating the data at a higher level (summing the metrics over that dimension), or to keep the original aggregation level and simply exclude the variable from the model?
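To make the two options in b) concrete, here is a minimal sketch in plain Python. The rows, dates, and campaign names are made up for illustration and are not my real data, and the helper names `reaggregate` and `exclude_column` are hypothetical. Both options end up with the same columns and the same metric totals; they differ only in how many rows the model sees.

```python
from collections import defaultdict

# Hypothetical aggregated report rows (illustrative values only):
# (day, campaign, format, impressions, conversions)
rows = [
    ("2024-01-01", "brand", "banner", 1000, 10),
    ("2024-01-01", "brand", "video", 500, 8),
    ("2024-01-01", "promo", "banner", 2000, 15),
    ("2024-01-02", "brand", "banner", 1200, 12),
    ("2024-01-02", "brand", "video", 300, 3),
]


def reaggregate(rows):
    """Option 1: drop `format` by summing the metrics over it."""
    totals = defaultdict(lambda: [0, 0])
    for day, campaign, _fmt, impressions, conversions in rows:
        totals[(day, campaign)][0] += impressions
        totals[(day, campaign)][1] += conversions
    return [
        (day, campaign, impressions, conversions)
        for (day, campaign), (impressions, conversions) in sorted(totals.items())
    ]


def exclude_column(rows):
    """Option 2: keep the original granularity, just omit the `format` column."""
    return [
        (day, campaign, impressions, conversions)
        for day, campaign, _fmt, impressions, conversions in rows
    ]


coarse = reaggregate(rows)
fine = exclude_column(rows)

# Same totals either way; only the number of rows differs.
print(len(coarse), len(fine))
print(sum(r[3] for r in coarse), sum(r[3] for r in fine))
```

In option 2, several rows can share identical remaining category values (e.g., the same day and campaign) while carrying different metric values, which is the situation I am unsure how to treat.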

Thanks for any insights!

asked Aug 9 at 13:49

David