$\begingroup$

I am using Cox regression to estimate the hazard ratio between two treatment groups, with age (as usual) among the confounding variables. The effect of age is strong and markedly nonlinear, so a single coefficient for age is not appropriate. One solution is to use age group as a stratification variable. Note: I do not need to estimate the effect of age per se, or of the other stratification variables, and I have a lot of data (typically > 1,000 in the treatment group and > 100k in the reference group), so the loss of estimation efficiency is negligible.

The drawback of age-group strata is that they do not fully correct for the effect of age: when the treatment groups have different age distributions, the average age within each stratum will also differ between them. In practice this is a fairly small effect, but it is a real one that I would rather eliminate if possible.

At this point I probably betray a lack of understanding of how stratification works, but it seems logical to me to use age group as a stratification variable AND age as a continuous covariate, so that the covariate corrects for the age effect within each stratum, on the basis that the effect of age is approximately linear within appropriately sized age bands. Example R code, in case this makes it more obvious:

coxph(SurvObj ~ Treatment + Age + strata(AgeGroup, Gender, otherVar), data=foo.df) 
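
For concreteness, the model this formula asks for can be written as below, assuming the standard stratified Cox form. The symbols are introduced here only for illustration: $s$ indexes the strata (each combination of AgeGroup, Gender and otherVar), $h_{0s}(t)$ is the unspecified baseline hazard of stratum $s$, and $\beta_T$, $\beta_A$ are the coefficients for Treatment and Age.

```latex
h_s(t \mid \text{Treatment}, \text{Age})
  = h_{0s}(t)\,\exp\!\left(\beta_T\,\text{Treatment} + \beta_A\,\text{Age}\right)
```

Written this way, the strata absorb differences in the shape of the hazard between age bands, while $\beta_A$ is a single within-stratum slope shared by all strata, so the formula assumes the residual age effect is the same in every band.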

This must be a common issue, and I am a bit concerned that I have not seen anyone else approach it like this. Can anyone comment on whether this approach is sound? The results look plausible.
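
One way to probe the idea is to try it on simulated data where the true treatment effect is known. The following is only a rough sketch under stated assumptions, not an analysis of my real data: the data frame `sim.df`, its column names, the 10-year age bands and every parameter value are made up for illustration. It fits the model with age-group strata alone and again with age added as a continuous covariate, so the two Treatment estimates can be compared against the simulated log hazard ratio of 0.5.

```r
library(survival)

set.seed(42)
n <- 20000

# Simulated data; all names and parameter values here are hypothetical.
Age <- runif(n, 20, 90)
# Older patients are more likely to be treated, so age confounds treatment.
Treatment <- rbinom(n, 1, plogis(-4 + 0.04 * Age))
# Nonlinear (concave) age effect on the log hazard; true treatment log-HR is 0.5.
lp <- 0.5 * Treatment + 1.5 * log(Age / 20)

event_time  <- rexp(n, rate = 0.02 * exp(lp))
censor_time <- 10  # administrative censoring at a fixed follow-up time

sim.df <- data.frame(
  time      = pmin(event_time, censor_time),
  status    = as.integer(event_time <= censor_time),
  Treatment = Treatment,
  Age       = Age,
  AgeGroup  = cut(Age, breaks = seq(20, 90, by = 10), include.lowest = TRUE)
)

# Age-group strata only: residual age differences within a band are not adjusted.
fit_strata <- coxph(Surv(time, status) ~ Treatment + strata(AgeGroup),
                    data = sim.df)

# Strata plus continuous age: one within-stratum slope shared by all bands.
fit_both <- coxph(Surv(time, status) ~ Treatment + Age + strata(AgeGroup),
                  data = sim.df)

# Compare the Treatment estimates from the two fits.
print(coef(fit_strata)["Treatment"])
print(coef(fit_both)[c("Treatment", "Age")])

# Proportional-hazards check for the covariates in the second model.
print(cox.zph(fit_both))
```

Because the simulated age effect is not exactly linear within each band, the second fit is itself only an approximation; narrower bands or a different simulated age effect would change how closely either estimate tracks the true value.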

$\endgroup$
  • $\begingroup$ When you say that the "effect of age is strong and markedly nonlinear", do you mean that the effect of age changes with age, or that it changes with time? $\endgroup$ Commented May 16, 2017 at 8:25
  • $\begingroup$ The former. Outcome is strongly dependent on age at treatment for young patients, but much less so for older patients, hence a single coefficient for age is not appropriate. I should also have mentioned that there is some difference in the form of the hazard function with age, particularly for patients at the bottom end of the age range. This violation of the PH assumption is one of the main reasons for using age group as a stratification variable, irrespective of the linearity question. $\endgroup$ Commented May 16, 2017 at 13:10
  • $\begingroup$ Have you tried an interaction between age and the treatment of interest? Consider using age and age*age or ln(age) to potentially address the nonlinear effect. $\endgroup$ Commented May 17, 2017 at 3:12
  • 1
    $\begingroup$ Interaction between age and treatment is small in this case, and can be neglected. I could transform age to make the effect more linear, though a ln() transform only partly addresses this, and I want to keep age group as a stratification variable anyway because of non-PH. I will try to get my head around age*age, though again, I am not trying to remove age group as a stratification variable, because of non-PH. The original question still stands. Is it appropriate to use age as a continuous covariate in conjunction with age group strata to correct for residual age effect within strata? $\endgroup$ Commented May 18, 2017 at 10:18
