I’m working with ad server data where I can’t get user-level data, only aggregated reports. The data is aggregated over multiple categorical dimensions (e.g., day × product × medium × source × campaign × format), and metrics such as conversions, impressions, and cost are sums within each combination of those dimensions. My goal is to predict conversions.
I have two main questions:
a) Should I always pull the data at the most granular aggregation level available, i.e., including every dimension such as format, source, and campaign?
b) If a categorical variable (e.g., format) is irrelevant for the model, is it better to remove that variable by re-aggregating the data at a higher level (dropping that dimension), or keep the original aggregation level but simply exclude the variable from the model?
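To make the two options in (b) concrete, here is a minimal sketch in plain Python. The rows, column order, and numbers are made up for illustration and are not taken from my actual reports; `format` stands in for whichever dimension turns out to be irrelevant.

```python
from collections import defaultdict

# Hypothetical report rows (illustrative values only):
# (day, campaign, format, impressions, cost, conversions)
rows = [
    ("2024-01-01", "brand", "banner", 1000, 12.0, 5),
    ("2024-01-01", "brand", "video", 400, 9.0, 3),
    ("2024-01-01", "promo", "banner", 800, 10.0, 2),
    ("2024-01-02", "brand", "banner", 1200, 14.0, 6),
    ("2024-01-02", "brand", "video", 500, 11.0, 4),
]

# Option 1: re-aggregate to a coarser level by summing the metrics over `format`.
totals = defaultdict(lambda: [0, 0.0, 0])
for day, campaign, _fmt, impressions, cost, conversions in rows:
    key = (day, campaign)
    totals[key][0] += impressions
    totals[key][1] += cost
    totals[key][2] += conversions
reaggregated = [(*key, *vals) for key, vals in sorted(totals.items())]

# Option 2: keep the granular rows and only drop the `format` column,
# so several rows can share the same (day, campaign) combination.
column_dropped = [(d, c, imp, cost, conv) for d, c, _f, imp, cost, conv in rows]

print(len(reaggregated), len(column_dropped))  # 3 rows vs. 5 rows
```

Both versions carry the same total conversions; they differ only in how many rows the model sees per (day, campaign) combination.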
Thanks for any insights!