I am using Cox regression to estimate the hazard ratio between two treatment groups, with age (as usual) among the confounding variables. The effect of age is strong and markedly nonlinear, so a single coefficient for age is not appropriate. One solution is to use age group as a stratification variable. Note: I do not need to estimate the effect of age per se, or of the other stratification variables, and I have a lot of data (typically more than 1,000 subjects in the treatment group and more than 100k in the reference group), so the loss of estimation efficiency is negligible.
The drawback of using age group strata is that this does not fully correct for the effect of age: the average age within each stratum will differ between treatment groups whenever their age distributions differ. In practice this is a fairly small effect, but it is a real one that I would rather eliminate if possible.
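To illustrate the within-stratum imbalance, here is a minimal sketch in base R. Everything in it is hypothetical: the group sizes, the age distributions (uniform for the reference group, skewed older for the treated group), the 10-year bands, and the names `ages`, `age_group` and `mean_age` are assumptions chosen for illustration, not my real data.

```r
set.seed(42)

# Hypothetical ages: reference group uniform on 40-80,
# treated group skewed towards older ages
ages <- data.frame(
  treatment = rep(c(0, 1), times = c(100000, 1000)),
  age       = c(runif(100000, 40, 80), 40 + 40 * rbeta(1000, 2, 1))
)

# 10-year age bands, as would be used for stratification
ages$age_group <- cut(ages$age, breaks = seq(40, 80, by = 10),
                      include.lowest = TRUE)

# Mean age per stratum and treatment group; any gap between the two
# groups inside a band is age imbalance the strata do not remove
mean_age <- aggregate(age ~ age_group + treatment, data = ages, FUN = mean)
print(mean_age)
```

Comparing the two rows for each band shows how far apart the group means are inside a stratum under these assumed distributions.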
At this point I probably betray a lack of understanding of how stratification works, but it seems logical to me to use age group as a stratification variable, AND age as a continuous covariate to correct for the age effect within each stratum, on the basis that the effect of age is pretty linear within appropriately sized age bands. Example R code, in case this makes it more obvious:
    coxph(SurvObj ~ Treatment + Age + strata(AgeGroup, Gender, otherVar), data=foo.df)

This must be a common issue, and I am a bit concerned that I have not seen anyone else approach it like this. Can anyone comment on whether this approach is sound? Results look plausible.
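One way to check the idea would be a small simulation in which the true treatment effect is known. The sketch below is only that: the data-generating model (a quadratic age effect on the log hazard, a true treatment log hazard ratio of zero, administrative censoring at 10 time units), the sample sizes, and the names `sim`, `fit_strata` and `fit_both` are all assumptions made for illustration. It uses `coxph`, `Surv` and `strata` from the `survival` package.

```r
library(survival)

set.seed(1)
n_ref <- 20000
n_trt <- 1000

# Hypothetical ages: treated subjects skew older than the reference group
age       <- c(runif(n_ref, 40, 80), 40 + 40 * rbeta(n_trt, 2, 1))
treatment <- rep(c(0, 1), times = c(n_ref, n_trt))

# Assumed hazard: nonlinear in age, with NO treatment effect (log HR = 0)
rate       <- 0.01 * exp(0.002 * (age - 40)^2)
event_time <- rexp(length(age), rate)

# Administrative censoring at 10 time units
sim <- data.frame(
  time      = pmin(event_time, 10),
  status    = as.integer(event_time < 10),
  treatment = treatment,
  age       = age,
  age_group = cut(age, breaks = seq(40, 80, by = 10), include.lowest = TRUE)
)

# Age group as strata only
fit_strata <- coxph(Surv(time, status) ~ treatment + strata(age_group),
                    data = sim)

# Age group as strata plus continuous age within strata
fit_both <- coxph(Surv(time, status) ~ treatment + age + strata(age_group),
                  data = sim)

# Treatment log hazard ratios; the simulated truth is 0
print(c(strata_only     = unname(coef(fit_strata)["treatment"]),
        strata_plus_age = unname(coef(fit_both)["treatment"])))
```

Note that `fit_both` estimates a single age slope shared by all strata, so this only checks the specification as written above; whether a common slope is adequate depends on how much the age effect varies between bands.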