$\begingroup$

I'm building a churn prediction model to estimate how long users will remain active in an application. I plan to use survival analysis because it handles censoring and provides time-to-event probabilities. What is the correct way to define the event indicator (churned vs. censored) given a training data cutoff date?

Approach 1 (Retrospective):

  • Features (regressors): collected using data from registration up to the training cutoff date
  • Duration: time from registration to last activity before cutoff
  • Event indicator: 1 if user was inactive for 30+ days before cutoff, 0 otherwise
  • Observation window: from past up to training cutoff
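To make Approach 1 concrete, here is a minimal sketch of how the duration and event indicator could be derived for a single user. The function name `label_retrospective`, the sample dates, and the user IDs are hypothetical; the 30-day inactivity threshold and the duration definition (registration to last activity) are taken from the bullets above.

```python
from datetime import date


def label_retrospective(registered, last_active, cutoff, inactivity_days=30):
    """Return (duration_days, event) for one user under Approach 1.

    Hypothetical helper: duration follows the definition above
    (registration to last activity before the cutoff).
    """
    duration = (last_active - registered).days
    # Days of silence between the last activity and the training cutoff.
    gap = (cutoff - last_active).days
    # Event = 1 (churned) if the user was inactive for 30+ days before cutoff,
    # otherwise 0 (censored: still considered active at the cutoff).
    event = 1 if gap >= inactivity_days else 0
    return duration, event


cutoff = date(2024, 6, 30)  # assumed training cutoff, for illustration only
users = {
    "u1": (date(2024, 1, 1), date(2024, 3, 1)),   # silent since March
    "u2": (date(2024, 2, 1), date(2024, 6, 25)),  # active just before cutoff
}
labels = {
    uid: label_retrospective(reg, last, cutoff)
    for uid, (reg, last) in users.items()
}
print(labels)
```

Here `u1` comes out as an observed churn event and `u2` as censored. Nothing after the cutoff is used to build the label.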

Approach 2 (Prospective):

  • Features: collected using data from registration up to the training cutoff date
  • Duration: time from registration to last activity before cutoff
  • Event indicator: based on user status after the cutoff (e.g., 1 if churned within 1 week post-cutoff, 0 if still active). This assumes the model will predict survival time 1 week into the future.
  • Observation window: extends beyond training cutoff to label events
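For comparison, Approach 2 could be sketched as below. The function name `label_prospective` and the sample dates are hypothetical, and "churned within 1 week post-cutoff" is interpreted here, as an assumption, to mean "no activity at all in the 7 days after the cutoff"; other definitions are possible.

```python
from datetime import date, timedelta


def label_prospective(registered, last_active_before_cutoff,
                      post_cutoff_activity, cutoff, horizon_days=7):
    """Return (duration_days, event) for one user under Approach 2.

    Hypothetical helper. Assumption: a user counts as churned when no
    activity is recorded in the window (cutoff, cutoff + horizon_days].
    """
    duration = (last_active_before_cutoff - registered).days
    horizon_end = cutoff + timedelta(days=horizon_days)
    # The label looks past the cutoff, unlike the features and the duration.
    seen_after_cutoff = any(cutoff < d <= horizon_end
                            for d in post_cutoff_activity)
    event = 0 if seen_after_cutoff else 1
    return duration, event


cutoff_date = date(2024, 6, 30)  # assumed training cutoff, for illustration
quiet_user = label_prospective(
    date(2024, 1, 1), date(2024, 6, 20), [], cutoff_date)
returning_user = label_prospective(
    date(2024, 1, 1), date(2024, 6, 20), [date(2024, 7, 3)], cutoff_date)
print(quiet_user, returning_user)
```

Both users have the same features and duration at the cutoff; only the post-cutoff activity changes the event indicator, which is the key difference from Approach 1.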

Which approach is correct for survival analysis, or are both valid depending on the use case?

Thanks!

$\endgroup$
    $\begingroup$ Approach 1 is standard in survival analysis. You censor at the cutoff. Anyone who has not yet churned by then is, in survival parlance, “alive but censored.” $\endgroup$ Commented Oct 1 at 4:29
