Shifting start and end time of a row, when doing survival/recurrent event analysis: Why does it affect the model estimate?

Question

I have the following dataset:

data_1 <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  start = c(0, 5, 0, 3, 0, 4),
  stop = c(5, 10, 3, 7, 4, 8),
  event = c(1, 0, 1, 1, 0, 1),
  covariate = factor(c("A", "A", "B", "B", "A", "B"))
)

using this code:

library(survival)
cox_model <- coxph(Surv(start, stop, event) ~ covariate + cluster(id), data = data_1)

My covariateB estimate (the hazard ratio, exp(coef)) is 2.732.

Then, in the second row, I add two days to both start and stop:

data_2 <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  start = c(0, 7, 0, 3, 0, 4),
  stop = c(5, 12, 3, 7, 4, 8),
  event = c(1, 0, 1, 1, 0, 1),
  covariate = factor(c("A", "A", "B", "B", "A", "B"))
)

Again, I fit the model:

cox_model_2 <- coxph(Surv(start, stop, event) ~ covariate + cluster(id), data = data_2)

and my covariateB estimate is now 2.109.

Why does this happen? I only added two days to both the start and the stop of one row, which I expected to change nothing. Why does it affect the model estimates?

And if I want the model to treat data_2 similar to data_1, is there a way to do it without transforming data_2 to data_1 (i.e., by adding something in the formula statement)?

I'm almost sure my question is answered here: stats.stackexchange.com/a/208852/349607 (in my case, I should use the gap time scale).
– Farzin Shamloo, Commented Mar 31 at 17:55

Alex J · Accepted Answer · 2025-04-01 02:35:26Z

It's because the Surv objects you've defined are different in the two coxph calls.

You can see this by comparing the Surv objects from the two data sets:

> with(data_1, Surv(start, stop, event))
[1] (0, 5] (5,10+] (0, 3] (3, 7] (0, 4+] (4, 8]
> with(data_2, Surv(start, stop, event))
[1] (0, 5] (7,12+] (0, 3] (3, 7] (0, 4+] (4, 8]

In the first, the second row is observed over (5, 10] and censored at day 10 (the + marks censoring, so any event would occur sometime after day 10); in the second, it is observed over (7, 12] and censored at day 12. The gist is that the two Surv objects are different, so you get different models.

I don't know what start and stop mean in your study, so I don't know what "adding two" means. I suggest you read what the type argument of ?Surv does, and see how that relates to your study. You may need to use a different type, or redefine start and stop (e.g. subtract start from each), depending on what the variables mean.
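As a rough illustration of the "subtract start from each" idea, the sketch below builds a gap-time response from both datasets and checks that the two become identical. The column name gap is an assumption introduced here for illustration; whether a gap-time clock suits the study is a separate modelling decision, since it resets time to zero at the start of every row.

```r
library(survival)

data_1 <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  start = c(0, 5, 0, 3, 0, 4),
  stop = c(5, 10, 3, 7, 4, 8),
  event = c(1, 0, 1, 1, 0, 1),
  covariate = factor(c("A", "A", "B", "B", "A", "B"))
)
data_2 <- data_1
data_2$start[2] <- 7   # the shifted second row from the question
data_2$stop[2] <- 12

# Hypothetical helper column: length of each interval (gap time)
data_1$gap <- data_1$stop - data_1$start
data_2$gap <- data_2$stop - data_2$start

# Two-argument Surv() gives right-censored data on the gap-time scale,
# whereas Surv(start, stop, event) gives counting-process data
y_gap_1 <- with(data_1, Surv(gap, event))
y_gap_2 <- with(data_2, Surv(gap, event))
y_cal_1 <- with(data_1, Surv(start, stop, event))

print(attr(y_gap_1, "type"))
print(attr(y_cal_1, "type"))

# The shift of row 2 disappears once only interval lengths are used
print(identical(data_1$gap, data_2$gap))
print(identical(y_gap_1, y_gap_2))
```

Because the two gap-time responses are identical, any model fitted to them gets the same input from either dataset. The sketch only compares the response objects and does not fit a model.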

David L Thiessen · Accepted Answer · 2025-04-01 04:37:36Z

The reason for the difference is that in data_1, the second row of id 1 covers the interval (5, 10]. In particular, id 1 is at risk at time 7, when id 2 experiences an event. This means that id 1 contributes to the risk set at time 7, which affects the denominator of the Cox estimating equations. In data_2, that row covers (7, 12]; the interval is open on the left, so id 1 is not at risk when id 2 experiences the event at time 7.
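The risk-set argument can be checked by hand in base R. The sketch below is a minimal illustration, not the survival package's implementation: the helpers risk_set, log_pl, and fit_hr are names made up here, and the partial likelihood is written for this specific case (one binary covariate, no tied event times). It lists who is at risk at time 7 in each dataset and then maximizes the log partial likelihood directly.

```r
data_1 <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  start = c(0, 5, 0, 3, 0, 4),
  stop = c(5, 10, 3, 7, 4, 8),
  event = c(1, 0, 1, 1, 0, 1),
  covariate = factor(c("A", "A", "B", "B", "A", "B"))
)
data_2 <- data_1
data_2$start[2] <- 7   # the shifted second row from the question
data_2$stop[2] <- 12

# Rows at risk at time t; counting-process intervals are (start, stop]
risk_set <- function(d, t) which(d$start < t & t <= d$stop)

# Log partial likelihood for covariate B vs. A (assumes no tied event times)
log_pl <- function(beta, d) {
  x <- as.numeric(d$covariate == "B")
  event_rows <- which(d$event == 1)
  sum(vapply(event_rows, function(i) {
    rs <- risk_set(d, d$stop[i])
    beta * x[i] - log(sum(exp(beta * x[rs])))
  }, numeric(1)))
}

# Maximize over beta and return the hazard ratio exp(beta)
fit_hr <- function(d) {
  opt <- optimize(log_pl, interval = c(-5, 5), d = d,
                  maximum = TRUE, tol = 1e-8)
  exp(opt$maximum)
}

print(risk_set(data_1, 7))  # rows 2, 4, 6: id 1 is in the risk set
print(risk_set(data_2, 7))  # rows 4, 6: id 1 has dropped out
print(fit_hr(data_1))       # about 2.732
print(fit_hr(data_2))       # about 2.109
```

The two hazard ratios agree with the values reported in the question, and the only term of the likelihood that differs between the datasets is the one for the event at time 7.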

Remember that the parameter estimates in the Cox model depend only on who was at risk at each event time, not on the actual times themselves. For example, multiplying all times by 100, or adding 100 to every start and stop time, gives the same parameter estimates. But changing the relative ordering of the events, or which observations were under observation at the event times, will change the estimate. You can see by experimentation that changing the second row's start time to anything in [5,7) results in the same estimate. Similarly, changing the second row's start time to anything in [7,8) results in the same estimate as each other, but a different one from the [5,7) case.
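The experiments described above could be run along these lines. This is a sketch under a few assumptions: the helpers fit, shift_all, scale_all, and with_start are names introduced here, and the cluster(id) term is left out because it changes only the robust standard errors, not the coefficient.

```r
library(survival)

data_1 <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  start = c(0, 5, 0, 3, 0, 4),
  stop = c(5, 10, 3, 7, 4, 8),
  event = c(1, 0, 1, 1, 0, 1),
  covariate = factor(c("A", "A", "B", "B", "A", "B"))
)

# Coefficient (log hazard ratio) for covariate B
fit <- function(d) {
  unname(coef(coxph(Surv(start, stop, event) ~ covariate, data = d)))
}

# Transform every row: risk sets at event times are unchanged
shift_all <- function(d, by) {
  d$start <- d$start + by
  d$stop <- d$stop + by
  d
}
scale_all <- function(d, by) {
  d$start <- d$start * by
  d$stop <- d$stop * by
  d
}

# Change only the start time of the second row
with_start <- function(s) {
  d <- data_1
  d$start[2] <- s
  d
}

b <- fit(data_1)
print(all.equal(b, fit(shift_all(data_1, 100))))        # same estimate
print(all.equal(b, fit(scale_all(data_1, 100))))        # same estimate
print(all.equal(b, fit(with_start(6.5))))               # start in [5, 7)
print(all.equal(fit(with_start(7)), fit(with_start(7.9))))  # start in [7, 8)
print(isTRUE(all.equal(b, fit(with_start(7)))))         # the two groups differ
```

Only the last comparison should come back FALSE: moving the start of row 2 to 7 or later removes id 1 from the risk set at the time-7 event, which is exactly the change made in data_2.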

I don't understand why you think that changing the data shouldn't change the parameter estimates, or why you think that the two datasets should be the same. You'll probably need to ask another question with much more detail about what you're trying to do for this to make sense.

Thanks for the response! Regarding your question, "I don't understand why you think that changing the data shouldn't change the parameter estimates": in both datasets, id=1 has 10 total units of time, experiences an event after 5 units, and then has no events for the next 5 units. I got my answer in the link I commented: I should simply code my formula as coxph(Surv(gaps, event) ~ ...) instead of coxph(Surv(start, stop, event) ~ ...). They model different things, and what I wanted is captured by the former formulation.
– Farzin Shamloo, Commented Apr 2 at 0:02
