$\begingroup$

Consider a randomized experiment in which the randomization is stratified on a discretized version of a continuous baseline covariate (e.g. age groups, or a cutoff on a clinical score).

We know that stratification variables should always be included in the analysis (see e.g. "Adjusting for stratification variables").

This leaves me with three options:

  1. Include only the discretized covariate (e.g. age group) in the analysis. This follows the guidelines to the letter but throws away information through discretization.
  2. Include only the original continuous covariate (e.g. age in years) in the analysis. This keeps all the information, but now the analysis model no longer matches the stratification used in the randomization.
  3. Include both the discretized AND the continuous covariate. This seems like a safe option, but it might cost precision when the sample is small and the relationship between the outcome and the covariate is roughly linear (in that case the coefficient of the discretized covariate will be 0 on average, so the extra term only uses up degrees of freedom).
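
To make the three options concrete, here is a minimal sketch of how the covariate rows would differ. Everything in it is an illustrative assumption rather than part of any specific trial: the cutoff of 60 years, permuted blocks of size 2, and the helper names `age_group`, `stratified_assignment` and `design_row` are all hypothetical.

```python
import random

def age_group(age, cutoff=60.0):
    """Discretize age into a stratum indicator (1 if age >= cutoff, else 0)."""
    return 1 if age >= cutoff else 0

def stratified_assignment(ages, cutoff=60.0, seed=0):
    """Assign arms (0/1) using permuted blocks of size 2 within each age stratum."""
    rng = random.Random(seed)
    arms = [None] * len(ages)
    pending = {0: [], 1: []}  # assignments left in the current block, per stratum
    for i, age in enumerate(ages):
        g = age_group(age, cutoff)
        if not pending[g]:
            block = [0, 1]
            rng.shuffle(block)
            pending[g] = block
        arms[i] = pending[g].pop()
    return arms

def design_row(arm, age, option, cutoff=60.0):
    """Covariate row (intercept, treatment, adjustment terms) for option 1, 2 or 3."""
    row = [1.0, float(arm)]
    if option in (1, 3):
        row.append(float(age_group(age, cutoff)))  # discretized covariate
    if option in (2, 3):
        row.append(float(age))                     # continuous covariate
    return row

rng = random.Random(1)
ages = [rng.uniform(40, 80) for _ in range(20)]  # made-up ages for illustration
arms = stratified_assignment(ages)
for option in (1, 2, 3):
    print(option, design_row(arms[0], ages[0], option))
```

The only difference between the options is which adjustment columns sit next to the treatment indicator; the randomization itself is the same in all three.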

I tried looking into the literature on this, but I couldn't find a keyword that filters out all the papers that just say you should include the stratification variables.

A somewhat similar question for the Cox model has no answer: "Cox regression: age covariate within age group strata".

$\endgroup$

1 Answer

$\begingroup$

This seems to be well covered by two papers:

My TLDR is:

  1. The continuous predictor should always be included.
  2. Adding the discretized (categorized) version of the predictor can be thought of as a crude way to handle possible non-linearity. It might be somewhat beneficial, but it is not required. If non-linearity is a concern, a more flexible model for the covariate (e.g. splines) might be a better solution (and will also correspond to stratification).
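
A small simulation can illustrate the precision side of this. The sketch below is not taken from either paper. It assumes a linear outcome model with made-up parameter values (`n`, `slope`, unit noise, a cutoff of 60), which favors the continuous covariate by construction, and it fits ordinary least squares with a hand-written solver so that it runs on the standard library alone. Age is centered at the cutoff for numerical stability.

```python
import random
import statistics

def ols(X, y):
    """Least-squares coefficients via the normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def simulate(n=40, reps=300, effect=1.0, slope=0.15, cutoff=60.0, seed=0):
    """Return {option: (mean, sd)} of the treatment estimate over simulated trials."""
    rng = random.Random(seed)
    estimates = {1: [], 2: [], 3: []}
    for _ in range(reps):
        ages = [rng.uniform(40, 80) for _ in range(n)]
        # Stratified randomization: shuffle balanced arm labels within each stratum.
        arms = [0] * n
        for g in (False, True):
            idx = [i for i, a in enumerate(ages) if (a >= cutoff) == g]
            labels = [k % 2 for k in range(len(idx))]
            rng.shuffle(labels)
            for i, lab in zip(idx, labels):
                arms[i] = lab
        # Assumed data-generating model: linear in age, unit-variance noise.
        y = [effect * t + slope * (a - cutoff) + rng.gauss(0, 1)
             for t, a in zip(arms, ages)]
        for option in (1, 2, 3):
            X = []
            for t, a in zip(arms, ages):
                row = [1.0, float(t)]
                if option in (1, 3):
                    row.append(1.0 if a >= cutoff else 0.0)  # age group
                if option in (2, 3):
                    row.append(a - cutoff)                   # centered age
                X.append(row)
            estimates[option].append(ols(X, y)[1])  # coefficient of treatment
    return {k: (statistics.mean(v), statistics.stdev(v)) for k, v in estimates.items()}

results = simulate()
for option, (mean, sd) in results.items():
    print(f"option {option}: mean estimate {mean:.3f}, empirical SD {sd:.3f}")
```

Comparing the empirical SDs across options shows how much precision each adjustment set buys under this particular setup. With a non-linear outcome model the ranking could differ, which is where the categorized term or a spline would come in.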
$\endgroup$
