Timeline for How to sample non-event data for survival analysis

Current License: CC BY-SA 4.0

9 events

when	what		by	license	comment
Jul 2, 2021 at 2:06	vote	accept	ddd
Jul 2, 2021 at 2:00	comment	added	user318288		Set the dummy variable to 1 for customers in the high-churn program, and 0 otherwise. But I would break up the data, since people often throw everything into one model and expect it to work. Use "divide and conquer", i.e., break up a complex problem into multiple simpler problems to solve. That is, fit several Cox PH models (one for the high-churn group, one for the low-churn group).
Jul 2, 2021 at 1:59	comment	added	user318288		So have 2 Cox PH regression runs, one with churn that is 20 times greater, and the other with churn that is 20 times lower. Either that, or introduce a dummy (0,1) binary variable into the model that will adjust the model results for the two groups. That is, you can control for the two scenarios. Get rid of large heterogeneities in outcome results by breaking the data up. A regression model looking at risk should only include records for objects (customers) for which everything is fair, or like-balanced.
Jul 2, 2021 at 1:32	comment	added	ddd		There are two scenarios for non-churn events: the ones that got renewed at the end of the term, and active services. For the latter, as you pointed out, their time is today minus the first contract date. But that time can be as little as 1 month. If I consider every active service no matter how far it is into its term, there will be a lot more non-churn events than churn events, probably 20:1, which makes the dataset very imbalanced. Would that be a problem?
Jul 2, 2021 at 1:12	comment	added	user318288		Reviewers who make decisions about funding medical clinical trials like those of Pfizer and Moderna also expect one time window. You are only doing this because you are data dredging (a "fishing expedition"), trying to find something among data for which there was no a priori research design with a fixed time period for each product. This happens all the time, and the data are called "administrative data," i.e., data that are just found in a database which wasn't generated for research.
Jul 2, 2021 at 1:12	comment	added	user318288		Yes, you could simply use varying window lengths of time. So the answer would be: for a 2-year study (window), we found this, and for a 5-year study (window), we found this. I was pretty rigid about the 10 years, for example, because in medical follow-up studies and clinical trials you can't be all over the map on windows, as there is only one window. For example, both the Pfizer and Moderna mRNA COVID-19 vaccine trials were 120 days long, and almost everyone was enrolled in the first month. But the point is that the study investigators were not all over the map with windows of varying size.
Jul 2, 2021 at 1:04	comment	added	ddd		The question is how to decide how long the follow-up period should be. Why 10 years? I discovered that different products indeed have different survival rates, as indicated by their K-M curves. Should I use a different window for each product then, e.g. the max term length of that particular product?
Jul 2, 2021 at 0:28	history	edited	user318288	CC BY-SA 4.0	added 341 characters in body
Jul 2, 2021 at 0:21	history	answered	user318288	CC BY-SA 4.0
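
The data layout discussed in the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the `contracts` records, the `to_survival_rows` helper, and the fixed `as_of` date (standing in for "today") are all hypothetical. Churned services get their observed duration with `event = 1`, still-active services are right-censored at `as_of` with `event = 0`, and `high_churn` is the 0/1 dummy variable.

```python
from datetime import date

# Hypothetical records: (first contract date, churn date or None if still active,
# whether the customer is in the high-churn program).
contracts = [
    (date(2019, 1, 1), date(2019, 7, 1), True),
    (date(2019, 3, 1), None, False),
    (date(2020, 6, 1), date(2021, 1, 1), True),
    (date(2021, 5, 1), None, False),
    (date(2020, 1, 1), None, True),
]


def to_survival_rows(contracts, as_of):
    """Turn contract records into (duration, event, dummy) rows.

    Active services are right-censored: their duration is as_of minus
    the first contract date, and event is 0.
    """
    rows = []
    for start, churned, high_churn in contracts:
        end = churned if churned is not None else as_of
        rows.append({
            "duration_days": (end - start).days,
            "event": 1 if churned is not None else 0,
            "high_churn": 1 if high_churn else 0,
        })
    return rows


# A fixed date is assumed here so the example is reproducible.
rows = to_survival_rows(contracts, as_of=date(2021, 7, 2))

# "Divide and conquer": split the rows by the dummy so each group
# could be passed to its own model.
high_group = [r for r in rows if r["high_churn"] == 1]
low_group = [r for r in rows if r["high_churn"] == 0]
```

Rows in this shape could either go into a single model with `high_churn` as a covariate, or be fitted separately per group, which are the two options the comments describe.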