Can a Logistic Regression model continue making predictions after removing predictions from the data set?

Question

I have a logistic regression model that predicts churn (0 vs. 1). I was asked to use the model to predict on a historical group of non-churners, remove anyone who was marked as a churner, and then increase one variable while keeping the rest constant to see how the prediction changes. Interestingly, it predicts zero churns after removing this first cohort of predicted churn. Increasing this one variable seems to have no impact after the initial cohort is removed, although this variable has the greatest feature importance in the model. Is this simply functioning as expected?

Here is the sample code being used:

    r1_pred = logisticRegr_balanced.predict(dfa)
    dfa['Churn Prediction'] = r1_pred
    print("Non-Churners: "+str(len(dfa[dfa['Churn Prediction']==0])))
    print("Churners: "+str(len(dfa[dfa['Churn Prediction']==1])))
    print("Percent Churn: " +str((len(dfa[dfa['Churn Prediction']==1]))/len(dfa['Churn Prediction'])))

Results in:

    Non-Churners: 70611
    Churners: 19609
    Percent Churn: 0.21734648636665926

Then I create a new dataframe with only the survivors, and increment the "Customer Life" variable by 30 days.

    # Keep only the predicted non-churners; .copy() avoids pandas' SettingWithCopyWarning
    dfa_30 = dfa[dfa['Churn Prediction']==0].copy()
    # Drop the old prediction column so the model sees only its input features
    dfa_30 = dfa_30.drop('Churn Prediction', axis=1)
    # Age every remaining customer by 30 days
    dfa_30['CustomerLife'] = dfa_30['CustomerLife'] + 30
    r30_pred = logisticRegr_balanced.predict(dfa_30)
    dfa_30['Churn Prediction'] = r30_pred
    print("Non-Churners: "+str(len(dfa_30[dfa_30['Churn Prediction']==0])))
    print("Churners: "+str(len(dfa_30[dfa_30['Churn Prediction']==1])))
    print("Percent Churn: " +str((len(dfa_30[dfa_30['Churn Prediction']==1]))/len(dfa_30['Churn Prediction'])))

Results in:

    Non-Churners: 70611
    Churners: 0
    Percent Churn: 0.0

Is the model no longer able to predict churn because it has categorized everything on the binary scale in the first prediction, so all that's left are "permanent" survivors?

Peter · Accepted Answer · 2019-07-31 20:51:00Z

The model is able to predict churn on any sample. The issue is that increasing "Customer Life" by 30 days does not change any of the predicted classes (based on what the model has learned).

I think you can check two things:

A) More fundamentally, try to improve the model, e.g. by applying regularization to get a better fit on your features. GLMNET is the thing to go for in this case: https://web.stanford.edu/~hastie/glmnet_python/

B) Check how "sensitive" your predictions are to changes in features. You could plot or inspect the predictions as you increase "Customer Life" by, say, 10, 20, 30, ..., 100 days. Remember that what you are really predicting is the probability of churn, so it is possible that (some) probabilities after adding 30 days of customer life are still just below 50%. If you look at gradual increases of customer life, you get a good idea of how churn changes in X. This is a kind of marginal effect (dy/dX).
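The sweep in B) could look something like the minimal sketch below. It uses only the standard library and a hand-written logistic function so it runs on its own. The coefficient `life_coef`, the `other_score` values, and the five customers are all made up for illustration and are not taken from the question's model. With a fitted scikit-learn model you would read the probabilities from `predict_proba(X)[:, 1]` instead of computing the sigmoid by hand.

```python
import math

def churn_probability(customer_life_days, other_score, life_coef=-0.004, intercept=0.0):
    """P(churn) = sigmoid(intercept + life_coef * CustomerLife + other_score).

    life_coef and intercept are hypothetical values, not fitted ones.
    other_score stands in for the combined contribution of all other features.
    """
    log_odds = intercept + life_coef * customer_life_days + other_score
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical customers: (CustomerLife in days, contribution of the other features)
customers = [(30, 0.30), (90, 0.20), (200, 0.50), (400, 1.00), (700, 2.00)]

def sweep(customers, increments, threshold=0.5):
    """For each increment, add it to CustomerLife and summarise the predictions."""
    rows = []
    for extra in increments:
        probs = [churn_probability(life + extra, other) for life, other in customers]
        churners = sum(p >= threshold for p in probs)  # count of class-1 predictions
        rows.append((extra, sum(probs) / len(probs), churners))
    return rows

results = sweep(customers, range(0, 101, 10))
for extra, mean_prob, churners in results:
    print(f"+{extra:3d} days  mean P(churn)={mean_prob:.3f}  predicted churners={churners}")
```

Looking at the mean probability next to the class counts is the point of the exercise: the probabilities move with every increment, even when the 0/1 predictions do not. In this toy data the one predicted churner drops out once 50 days are added, because the assumed coefficient on Customer Life is negative.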

Thank you for the clarification and the note about GLMNET. I'll plot the predictions over a year of 30-day increments to see if it changes at all. I noticed Customer Life had a negative correlation with Churn, so increasing this value only seemed to decrease the probability.
– flyeaglesfly, Commented Aug 1, 2019 at 17:35
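
The observation in this comment is enough to account for the zero in the question. Here is a sketch of the argument. It assumes the fitted coefficient on Customer Life is negative, which the negative correlation suggests but does not guarantee in a model with several features, and that `predict` uses the default 0.5 cutoff. The symbols $\beta_L$ (coefficient on Customer Life), $L$ (Customer Life in days), and $p'$ (probability after adding 30 days) are notation introduced here, not taken from the post.

```latex
\begin{align*}
\log\frac{p}{1-p} &= \beta_0 + \beta_L L + \sum_j \beta_j x_j \\
\text{predicted class } 0 &\iff p \le 0.5 \iff \log\frac{p}{1-p} \le 0 \\
\log\frac{p'}{1-p'} &= \log\frac{p}{1-p} + 30\,\beta_L \;<\; \log\frac{p}{1-p} \;\le\; 0
\qquad (\beta_L < 0)
\end{align*}
```

Under those assumptions, every customer already predicted as a non-churner has log-odds at or below zero, and adding 30 days only pushes the log-odds further down. None of them can cross the cutoff, so zero predicted churners in the second pass is the expected result.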
