Assume a randomized experiment in which the randomization is stratified on a discretized version of a continuous baseline covariate (e.g. age groups, or a cutoff on a clinical score).
We know that stratification variables should always be included in the analysis (see e.g. Adjusting for stratification variables).
This leaves me with three options:
- Include only the discretized covariate (e.g. age group) in the analysis. This follows the guidelines to the letter but throws away information due to discretization.
- Include only the original continuous covariate (e.g. age in years) in the analysis. This keeps all the information, but now there is a mismatch between the variable used to stratify the randomization and the one adjusted for in the analysis.
- Include both the discretized AND the continuous covariate. This seems like a safe option, but it might cost precision when the sample is small and the relationship between the outcome and the covariate is roughly linear (i.e. once the continuous covariate is in the model, the coefficients of the discretized covariate will be 0 on average, so those terms only use up degrees of freedom).
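To make the trade-off between these three options concrete, here is a minimal simulation sketch. It rests on several assumptions that are not part of the question: a continuous Gaussian outcome that is exactly linear in age, hypothetical age-group cutoffs at 40, 55 and 70, 1:1 permuted-block randomization (block size 4) within each age group, and plain OLS for each analysis. The function names (`stratified_block_randomize`, `design`, `treatment_estimate`, `simulate`) are illustrative only. For each option the sketch reports the empirical mean and standard deviation of the estimated treatment effect across simulated trials.

```python
import numpy as np


def stratified_block_randomize(strata, rng, block_size=4):
    """Permuted-block 1:1 randomization within each stratum (1 = treated)."""
    treat = np.empty(len(strata), dtype=int)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        n_blocks = -(-len(idx) // block_size)  # ceiling division
        arms = np.concatenate(
            [rng.permutation([0, 1] * (block_size // 2)) for _ in range(n_blocks)]
        )
        treat[idx] = arms[: len(idx)]  # last block may be incomplete
    return treat


def design(treat, age, strata, use_strata, use_age):
    """Intercept + treatment, plus optional continuous age and stratum dummies."""
    cols = [np.ones_like(age), treat.astype(float)]
    if use_age:
        cols.append(age)
    if use_strata:
        for s in np.unique(strata)[1:]:  # first stratum is the reference level
            cols.append((strata == s).astype(float))
    return np.column_stack(cols)


def treatment_estimate(y, X):
    """OLS coefficient of column 1 (the treatment indicator)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def simulate(n=60, n_sims=2000, tau=1.0, slope=0.1, seed=1):
    """Empirical mean and SD of the treatment estimate under each analysis option."""
    rng = np.random.default_rng(seed)
    cutoffs = [40, 55, 70]  # hypothetical age-group cutoffs used for stratification
    specs = {
        "strata only": (True, False),
        "age only": (False, True),
        "age + strata": (True, True),
    }
    estimates = {name: [] for name in specs}
    for _ in range(n_sims):
        age = rng.uniform(25, 85, size=n)
        strata = np.digitize(age, cutoffs)  # age group 0..3
        treat = stratified_block_randomize(strata, rng)
        # Assumed data-generating model: outcome exactly linear in age.
        y = slope * age + tau * treat + rng.normal(size=n)
        for name, (use_strata, use_age) in specs.items():
            X = design(treat, age, strata, use_strata, use_age)
            estimates[name].append(treatment_estimate(y, X))
    return {name: (np.mean(v), np.std(v, ddof=1)) for name, v in estimates.items()}


if __name__ == "__main__":
    for name, (mean, sd) in simulate().items():
        print(f"{name:>13}: mean={mean:.3f}  empirical SD={sd:.3f}")
```

Under this linear model, all three specifications should be roughly unbiased. The spread of the estimates is what differs. The "strata only" fit leaves the within-group age variation in the residual. The "age + strata" fit pays a few extra degrees of freedom. Any such comparison is specific to the assumed setting: with a non-linear age effect, a binary outcome, or a different sample size, the ranking could change.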
I tried looking into the literature on this, but I couldn't find a keyword that filters out all the papers that just say you should include the stratification variables.
A somewhat similar question for the Cox model has no answer: Cox regression: age covariate within age group strata