I am working on a project using observational time-to-event data that includes no treatment or control arms. The primary outcome is a composite endpoint, defined as the occurrence of either of two event types — death or a clinically relevant non-fatal event — where death is clearly more severe.
The objective is prediction, not causal inference. Each participant has repeated measurements of several clinical variables recorded over time.
Please note that I joined this project in its later stages: I was not involved in decisions regarding sample size/power or in many of the final modelling choices, I do not have access to the data, and I am working in an advisory/consulting capacity.
The dataset consists of 450 patients with 100 events over a follow-up period of up to 4 years (median: 340 days). Given this sample size and event count, more complex frameworks such as multi-state or ordinal survival models were judged likely to be underpowered or unstable and were ruled out (also because such analyses are non-standard in this particular domain, and because of a non-negotiable deadline for the write-up).
One challenge lies in representing this composite structure. While some researchers treat both events as exchangeable failures in a single survival model (e.g., a standard Cox model), this is problematic here given the difference in clinical severity. Moreover, the events form a semi-competing-risks structure: the non-fatal event may occur before death, but not after.
To address this, we are exploring the Win Ratio methodology (Pocock et al., 2012), which prioritises more severe outcomes. However, this framework was developed for randomised trials and requires a binary "group" variable (e.g., treatment vs. control) — a structure absent from our dataset.
A potential workaround is to define high- vs. low-risk strata by dichotomising a continuous baseline predictor. This idea is supported by Wang et al. (2024), who propose a regularised Win Ratio regression for risk prediction. In their formulation, the grouping variable can be derived from baseline data rather than trial arms, and the threshold can be selected empirically — for example, via nested cross-validation.
While alternative approaches such as multi-state models, illness–death frameworks, or ordinal survival models (e.g., Markov proportional odds models) could offer more nuanced representations of event severity and temporal ordering, they are not currently feasible in this project due to the constraints mentioned above. The goal is not to model the full transition process or cause-specific hazards, but to construct a clinically useful predictive tool. We welcome suggestions for methods that preserve interpretability while remaining tractable in a real-world predictive setting.
Questions
I am well aware of the problems caused by dichotomising continuous variables, but the situation here is different. Is it justifiable to use a dichotomised baseline predictor as the grouping variable in a Win Ratio analysis in the absence of a treatment/control design, as per Wang et al. (2024)? If not, what alternative methods would reflect differential event severity while maintaining a predictive focus?
If so, how should the dichotomisation threshold be chosen in a predictive modelling context? Is selection via cross-validation acceptable, or should the threshold be anchored to clinical criteria or empirical quantiles?
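On the cross-validation option, one self-contained way to sketch it is to score each candidate quantile threshold by the held-out concordance (Harrell's C) of the resulting binary grouping with time to the first composite event. Note the simplifications: scoring on the first composite event ignores the severity hierarchy, and all names here (`cv_threshold`, `c_index_binary`, the candidate quantiles) are hypothetical choices, not recommendations from Wang et al.

```python
import numpy as np

def c_index_binary(group, time, event):
    """Harrell's C for a binary risk grouping on right-censored data:
    among comparable pairs (patient i failed first), the fraction where
    the higher-risk member is the one who failed (ties count 1/2)."""
    conc = ties = comp = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:  # pair is comparable
                comp += 1
                if group[i] > group[j]:
                    conc += 1
                elif group[i] == group[j]:
                    ties += 1
    return (conc + 0.5 * ties) / comp if comp else 0.5

def cv_threshold(x, time, event, quantiles=(0.3, 0.4, 0.5, 0.6, 0.7), k=5, seed=0):
    """Choose the dichotomisation quantile by k-fold CV: the threshold is
    computed on each training fold only, and the candidate with the best
    mean held-out C-index for the resulting high/low grouping is kept."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % k
    scores = []
    for q in quantiles:
        fold_scores = []
        for f in range(k):
            tr, te = folds != f, folds == f
            thr = np.quantile(x[tr], q)       # threshold from training data only
            grp = (x[te] >= thr).astype(int)  # dichotomise the held-out fold
            fold_scores.append(c_index_binary(grp, time[te], event[te]))
        scores.append(np.mean(fold_scores))
    best_q = quantiles[int(np.argmax(scores))]
    return np.quantile(x, best_q), best_q     # final threshold refit on all data
```

Keeping the threshold estimation inside each training fold (and inside any outer validation loop) is the point of the exercise: selecting it on the full data and then reporting apparent performance would be optimistically biased.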
References
Pocock, S. J., Ariti, C. A., Collier, T. J., & Wang, D. (2012). The win ratio: A new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal, 33(2), 176–182. https://doi.org/10.1093/eurheartj/ehr352
Wang, D., Dong, G., Huang, B., Verbeeck, J., Cui, Y., Song, J., Gamalo-Siebers, M., Hoaglin, D. C., & Seifu, Y. (2024). Regularized win ratio regression for variable selection and risk prediction. BMC Medical Research Methodology, 24(1), 54. https://doi.org/10.1186/s12874-025-02554-w