Recurrent event analysis

Question

I want to model patient visits. My assumptions are:

Patients visit the hospital until they stop visiting altogether. I don't know whether a patient's most recent visit was their final one.
Patients visit at certain intervals. These intervals vary across patients, but are roughly the same for a patient.

I've been searching and reading all day long and my head is about to explode, so I decided to ask what are the ways to approach this.

My best guess is some sort of survival analysis and it looks like survival regression supports recurring events. The problem is that there are multiple ways to do this and I don't know which one to use.

What I'm trying to get out of the model:

The probability that the patient returns at all, given the time elapsed since their last visit.
The number of visits a patient will make over a period T.

Can I model this with one regression? Which one?

If you have multiple events per patient you could consider Poisson regression
– Giuseppe Biondi-Zoccai, Commented Apr 21, 2016 at 20:34
Interesting questions. What is the distribution of counts for the patient visits? Are you working with panel data?
– Marquis de Carabas, Commented Apr 22, 2016 at 4:03
The distribution looks Poisson, but sometimes (I want to create models for more than 1 hospital) it is overdispersed (negbin). Re: panel data, I have more than 1 observation per individual, so the answer is yes?
– datahurts, Commented Apr 22, 2016 at 6:25

Theodor · Accepted Answer · 2016-04-22 19:27:20Z

The main decision to be made is about the time scale that you plan on using. Is it time since the origin of the recurrent event process (like some diagnosis or some intervention, or birth) or is it time between two visits? These two approaches are called calendar time and gap time. As a general rule, what answers one of them does not necessarily answer the other one.

For calendar time, the canonical framework is this: each individual has a counting process $N(t)$, the number of visits up to time $t$. The intensity of this process is denoted $\lambda(t)$; it has to satisfy certain conditions, but for the most part you can treat it like the hazard function from classical survival analysis. A common choice is the proportional form $\lambda(t | x_i) = \lambda_0(t) e^{\beta' x_i}$, where $t$ is time since origin. For an individual with events at $t_1 < t_2 < \dots < t_n$ and follow-up until $\tau > t_n$, you can estimate such a model in R by taking

coxph(Surv(tstart, tstop, status) ~ x) with tstart = c(0, t_1, ..., t_n), tstop = c(t_1, ..., t_n, tau) and status = c(1, ..., 1, 0). Here status is 1 when tstop is an event time and 0 for the last interval, which ends at tau, the end of follow-up. You should also add + cluster(id) to the formula to get the correct (robust) standard errors.
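A minimal sketch of that data layout in R, assuming the survival package is installed. The data frames visits and followup, the helper to_intervals(), and all the numbers are made up for illustration; only coxph(), Surv() and cluster() come from the answer itself.

```r
library(survival)

# Hypothetical toy input: one row per visit (time since origin, in days) ...
visits <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  t  = c(30, 65, 100, 50, 120, 80)
)
# ... and one row per patient with end of follow-up (tau) and a covariate x.
followup <- data.frame(
  id  = c(1, 2, 3, 4),
  tau = c(130, 150, 200, 90),
  x   = c(0, 1, 0, 1)
)

# Turn one patient's visit times into (tstart, tstop, status) intervals.
# A patient with no visits gets a single censored interval (0, tau].
to_intervals <- function(id, times, tau, x) {
  times <- sort(times)
  data.frame(
    id     = id,
    tstart = c(0, times),
    tstop  = c(times, tau),
    status = c(rep(1, length(times)), 0),
    x      = x
  )
}

cp <- do.call(rbind, lapply(seq_len(nrow(followup)), function(i) {
  f <- followup[i, ]
  to_intervals(f$id, visits$t[visits$id == f$id], f$tau, f$x)
}))

# Calendar-time model with robust standard errors clustered on patient.
fit <- coxph(Surv(tstart, tstop, status) ~ x + cluster(id), data = cp)
fit
```

Each patient contributes one row per visit plus a final censored row, so a patient with no visits at all still enters the risk set.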

The model implies that the number of events in a given period $(t_a, t_b)$ is Poisson distributed with expectation $\int_{t_a}^{t_b} \lambda(t | x_i)dt$, the analogue of the cumulative hazard. So in some sense it is really easy to answer questions about event counts.
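Under the proportional form $\lambda(t | x_i) = \lambda_0(t) e^{\beta' x_i}$, the expected count and the count distribution can be written out explicitly. The symbols $\Lambda_0$, $\mu_i$ and $k$ are introduced here for illustration only:

$$\mu_i = E[N(t_b) - N(t_a) \mid x_i] = e^{\beta' x_i} \{\Lambda_0(t_b) - \Lambda_0(t_a)\}, \qquad \Lambda_0(t) = \int_0^t \lambda_0(u)\,du$$

$$P(N(t_b) - N(t_a) = k \mid x_i) = \frac{\mu_i^k e^{-\mu_i}}{k!}, \qquad k = 0, 1, 2, \dots$$

In practice $\Lambda_0$ would be replaced by an estimate of the cumulative baseline hazard from the fitted model, for example what basehaz() with centered = FALSE returns in the survival package.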

The cool thing is that you can easily incorporate random effects called frailty with the specification +frailty(id). This is used quite often as a variance reduction technique. You can also use it for prediction although that might require a bit more work and thinking.

The second option is to use the gap time scale. The gaps are defined as tstop - tstart from the calendar time case. The model is then something like $\phi(w | x_i) = \phi_0(w) e^{\beta'x_i}$, where $w$ is time since the previous visit. If the gaps for an individual are gaps = c(w_1, ..., w_n), you would fit this as coxph(Surv(gaps, status) ~ x), again with + cluster(id) and with the status variable as before (1 for a gap that ends in a visit, 0 for the final, censored gap). Essentially this is the exact same problem as clustered survival data.
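A minimal sketch of the gap time fit in R, assuming the survival package is installed. The data frame cp and its values are hypothetical, laid out as counting-process intervals with one row per gap.

```r
library(survival)

# Hypothetical counting-process rows: one row per interval between visits,
# with the last row per patient censored at the end of follow-up.
cp <- data.frame(
  id     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  tstart = c(0, 30, 65, 100, 0, 50, 120, 0, 80, 0),
  tstop  = c(30, 65, 100, 130, 50, 120, 150, 80, 200, 90),
  status = c(1, 1, 1, 0, 1, 1, 0, 1, 0, 0),
  x      = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 1)
)

# Gap time: the clock restarts at zero after every visit.
cp$gap <- cp$tstop - cp$tstart

# Ordinary right-censored Cox model on the gaps, clustered on patient.
fit_gap <- coxph(Surv(gap, status) ~ x + cluster(id), data = cp)
fit_gap
```

The only change from the calendar time fit is the response: Surv(gap, status) instead of Surv(tstart, tstop, status).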

The model implies that the individual gets "restarted" after every visit. Hence, it's really easy to predict "survival" probabilities (the probability of not having another visit within some time). With a little math you can also get the survival probability conditional on the individual having already "survived" up to some time since the last visit.
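The conditional survival calculation can be sketched as follows, under the proportional gap time model $\phi(w | x_i) = \phi_0(w) e^{\beta'x_i}$. The symbols $W$ (the gap length), $S$, $\Phi_0$ and $s$ are introduced here for illustration only:

$$S(w \mid x_i) = P(W > w \mid x_i) = \exp\{-\Phi_0(w)\, e^{\beta' x_i}\}, \qquad \Phi_0(w) = \int_0^w \phi_0(u)\,du$$

$$P(W > w + s \mid W > w, x_i) = \frac{S(w + s \mid x_i)}{S(w \mid x_i)} = \exp\{-[\Phi_0(w + s) - \Phi_0(w)]\, e^{\beta' x_i}\}$$

So if $w$ days have already passed since the last visit, the probability of no visit in the next $s$ days depends only on the cumulative baseline hazard accrued between $w$ and $w + s$.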

Here you can use the +frailty option to account for individual heterogeneity as well.

It is common to use gap times and include a covariate like the previous number of events (or log(previous number of events)), or to stratify on the previous number of events so that you get different $\phi_0$'s for the time between the first and the second visit, the second and the third, etc. Here, however, you should be able to really defend your choices, as you are de facto altering the time scale of the model.
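A minimal sketch of the stratified variant in R, assuming the survival package is installed and that rows are sorted by time within each patient. The data frame cp, its values, and the column name prev are hypothetical.

```r
library(survival)

# Hypothetical gap-time rows, sorted by time within patient.
cp <- data.frame(
  id     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  gap    = c(30, 35, 35, 30, 50, 70, 30, 80, 120, 90),
  status = c(1, 1, 1, 0, 1, 1, 0, 1, 0, 0),
  x      = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 1)
)

# Number of visits the patient had before the current gap started:
# running event count within patient, excluding the current row.
cp$prev <- ave(cp$status, cp$id, FUN = function(s) cumsum(s) - s)

# Separate baseline hazard for each value of the previous visit count.
# (Using prev as an ordinary covariate instead would be ~ x + prev + cluster(id).)
fit_strat <- coxph(Surv(gap, status) ~ x + strata(prev) + cluster(id), data = cp)
fit_strat
```

With real data the higher strata get sparse quickly, so it may be necessary to pool them (for example, treating three or more previous visits as one stratum).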

In general, with the calendar time approach you have quick access to the distribution of the events in a certain time window, and with the gap time scale you have easy access to the distribution of the gaps themselves, but you cannot easily get the best of both worlds, at least not in nice closed forms (you can achieve many things by simulation though). The only time when the two coincide is when $\lambda_0$ (or $\phi_0$) is constant. In that case, the number of events in any time interval is Poisson (with the same expectation) and the gaps follow an exponential distribution. This is usually seen as a very strong assumption.
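The simulation route can be sketched in base R. This is an illustration under assumed numbers, not a fitted model: the helper count_visits(), the 365-day window, the 50-day mean gap and the Weibull shape of 3 are all made up.

```r
set.seed(1)

# Count visits in (0, horizon] for one patient whose gaps are drawn by draw_gap().
count_visits <- function(draw_gap, horizon) {
  t <- 0
  n <- 0
  repeat {
    t <- t + draw_gap()
    if (t > horizon) break
    n <- n + 1
  }
  n
}

horizon  <- 365   # prediction window in days (assumed)
mean_gap <- 50    # mean days between visits (assumed)
n_sim    <- 5000  # number of simulated patients

# Constant hazard: exponential gaps, so counts should have mean close to variance.
counts_exp <- replicate(n_sim, count_visits(function() rexp(1, 1 / mean_gap), horizon))

# Increasing hazard: Weibull gaps with the same mean but more regular spacing.
shape <- 3
scale <- mean_gap / gamma(1 + 1 / shape)
counts_wei <- replicate(n_sim, count_visits(function() rweibull(1, shape, scale), horizon))

rbind(
  exponential = c(mean = mean(counts_exp), var = var(counts_exp)),
  weibull     = c(mean = mean(counts_wei), var = var(counts_wei))
)
```

With exponential gaps the simulated counts behave like a Poisson variable with mean horizon / mean_gap. With the more regular Weibull gaps the variance falls well below the mean, so a Poisson count model would no longer describe them.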

A book that is arguably the best introduction to the analysis of recurrent events is this one. The authors emphasise that the choice of time scale may depend a lot on the problem at hand. Their general point seems to be that for incident events that do not alter the process itself, calendar time is the most useful (like warranty repairs on cars, myocardial infarctions). On the other hand, gap time scales are most useful when you want to predict the time to the next event, and are most natural when at every event the unit of interest has some intervention (like a car repair, or some transplant).

Thank you for the thorough reply! I did some more reading since asking the question and discovered that I should be looking at the two models you propose, and your response cleared any doubts I had. One question that still lingers is the distribution for the counting process. From my research (and your reply confirms it), the model assumes Poisson, but my data shows overdispersion, which suggests a negative binomial distribution, which looks to me like a mixture of Poisson distributions. I did not see any way to specify this distribution in the survival package. How can I tackle this?
– datahurts, Commented Apr 23, 2016 at 9:20
Since you also need to deal with death as an absorbing state, a multi-state transition model may be appropriate. It easily handles multiple hospitalizations and is intended to estimate transition probabilities (or transition odds ratios) and mean time in hospital/expected number of hospitalizations. End of follow-up (censoring) is handled by terminating the longitudinal patient status records. More here.
– Frank Harrell, Commented Apr 24 at 14:27
