David
  • 63
  • 2

How to handle irrelevant categorical variables in aggregated data?

I’m working with ad server data where I can’t get user-level data, only aggregated reports. The data is aggregated over multiple categorical dimensions (e.g., day × product × medium × source × campaign × format), and metrics such as conversions, impressions, and cost are sums within each combination of those dimensions. My goal is to predict conversions.

I have two main questions:

a) Should I always pull the data at the most granular aggregation level available, i.e., including every dimension such as format, source, and campaign?

b) If a categorical variable (e.g., format) is irrelevant for the model, is it better to remove that variable by re-aggregating the data at a higher level (dropping that dimension), or keep the original aggregation level but simply exclude the variable from the model?
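
A minimal sketch of the two options in (b), using pandas. The column names and all numbers are hypothetical and only illustrate the shape of the data, with format standing in for the possibly irrelevant dimension:

```python
import pandas as pd

# Hypothetical toy report: one row per day x campaign x format.
report = pd.DataFrame({
    "day": ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "campaign": ["A", "A", "B", "A", "B"],
    "format": ["banner", "video", "banner", "banner", "video"],
    "impressions": [1000, 500, 800, 1200, 300],
    "cost": [10.0, 8.0, 7.5, 11.0, 4.0],
    "conversions": [12, 9, 6, 15, 2],
})

metrics = ["impressions", "cost", "conversions"]

# Option 1: re-aggregate at a coarser level, dropping the format dimension.
# Rows that differ only in format are summed into a single row.
coarse = report.groupby(["day", "campaign"], as_index=False)[metrics].sum()

# Option 2: keep the granular rows and only leave format out of the model inputs.
# The number of rows is unchanged; some rows now share identical dimension values.
granular = report.drop(columns=["format"])

print(coarse)
print(granular)
```

Both tables carry the same metric totals. The difference is that option 1 yields one row per day × campaign with summed metrics, while option 2 keeps separate rows (and separate target values) for combinations that the model can no longer tell apart by their dimensions.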

Thanks for any insights!
