I am working on a project using observational time-to-event data that includes no treatment or control arms. The primary outcome is a composite endpoint, defined as the occurrence of either of two event types — death or a clinically relevant non-fatal event — where death is clearly more severe.
The objective is prediction, not causal inference. Each participant has repeated measurements of several clinical variables recorded over time.
Please note that I joined this project in its later stages: I was not involved in decisions regarding sample size/power or in many of the final modelling choices, I do not have access to the data, and I am working in an advisory/consulting capacity.
The dataset consists of 450 patients with 100 events over a follow-up period of up to 4 years (median: 340 days). Given this sample size and event count, more complex frameworks such as multi-state or ordinal survival models were judged likely to be underpowered or unstable and were ruled out (also because such analyses are non-standard in this particular domain, and because of a non-negotiable deadline for the write-up).
One challenge lies in representing this composite structure. While some researchers treat both events as exchangeable failures in a single survival model (e.g., a standard Cox model), this is problematic here given the difference in clinical severity. Moreover, the events form a semi-competing-risks structure: the non-fatal event may occur before death, but not after.
To address this, we are exploring the Win Ratio methodology (Pocock et al., 2012), which prioritises more severe outcomes. However, this framework was developed for randomised trials and requires a binary "group" variable (e.g., treatment vs. control) — a structure absent from our dataset.
A potential workaround is to define high- vs. low-risk strata by dichotomising a continuous baseline predictor. This idea is supported by Wang et al. (2024), who propose a regularised Win Ratio regression for risk prediction. In their formulation, the grouping variable can be derived from baseline data rather than trial arms, and the threshold can be selected empirically — for example, via nested cross-validation.
While alternative approaches such as multi-state models, illness–death frameworks, or ordinal survival models (e.g., Markov proportional odds models) could offer more nuanced representations of event severity and temporal ordering, they are not currently feasible in this project due to the constraints mentioned above. The goal is not to model the full transition process or cause-specific hazards, but to construct a clinically useful predictive tool. We welcome suggestions for methods that preserve interpretability while remaining tractable in a real-world predictive setting.
Questions
I am well aware of the problems caused by dichotomising continuous variables, but the situation here is different. Is it justifiable to use a dichotomised baseline predictor as the grouping variable in a Win Ratio analysis in the absence of a treatment/control design, as per Wang et al. (2024)? If not, what alternative methods would reflect differential event severity while maintaining a predictive focus?
If so, how should the dichotomisation threshold be chosen in a predictive modelling context? Is selection via cross-validation acceptable, or should the threshold be anchored to clinical criteria or empirical quantiles?
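On the cross-validation option, one self-contained way to sketch it is to score each candidate quantile threshold by the held-out concordance (Harrell's C) of the resulting binary grouping with time to the first composite event. Note the simplifications: scoring on the first composite event ignores the severity hierarchy, and all names here (`cv_threshold`, `c_index_binary`, the candidate quantiles) are hypothetical choices, not recommendations from Wang et al.

```python
import numpy as np

def c_index_binary(group, time, event):
    """Harrell's C for a binary risk grouping on right-censored data:
    among comparable pairs (patient i failed first), the fraction where
    the higher-risk member is the one who failed (ties count 1/2)."""
    conc = ties = comp = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:  # pair is comparable
                comp += 1
                if group[i] > group[j]:
                    conc += 1
                elif group[i] == group[j]:
                    ties += 1
    return (conc + 0.5 * ties) / comp if comp else 0.5

def cv_threshold(x, time, event, quantiles=(0.3, 0.4, 0.5, 0.6, 0.7), k=5, seed=0):
    """Choose the dichotomisation quantile by k-fold CV: the threshold is
    computed on each training fold only, and the candidate with the best
    mean held-out C-index for the resulting high/low grouping is kept."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % k
    scores = []
    for q in quantiles:
        fold_scores = []
        for f in range(k):
            tr, te = folds != f, folds == f
            thr = np.quantile(x[tr], q)       # threshold from training data only
            grp = (x[te] >= thr).astype(int)  # dichotomise the held-out fold
            fold_scores.append(c_index_binary(grp, time[te], event[te]))
        scores.append(np.mean(fold_scores))
    best_q = quantiles[int(np.argmax(scores))]
    return np.quantile(x, best_q), best_q     # final threshold refit on all data
```

Keeping the threshold estimation inside each training fold (and inside any outer validation loop) is the point of the exercise: selecting it on the full data and then reporting apparent performance would be optimistically biased.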
References
Pocock, S. J., Ariti, C. A., Collier, T. J., & Wang, D. (2012). The win ratio: A new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal, 33(2), 176–182. https://doi.org/10.1093/eurheartj/ehr352
Wang, D., Dong, G., Huang, B., Verbeeck, J., Cui, Y., Song, J., Gamalo-Siebers, M., Hoaglin, D. C., & Seifu, Y. (2024). Regularized win ratio regression for variable selection and risk prediction. BMC Medical Research Methodology, 24(1), 54. https://doi.org/10.1186/s12874-025-02554-w