I have a logistic regression model that predicts churn (0 vs. 1). I was asked to run the model on a historical group of non-churners, remove anyone it marked as a churner, and then increase one variable while holding the rest constant to see how the predictions change. Interestingly, once this first cohort of predicted churners is removed, the model predicts zero churners. Increasing this one variable seems to have no impact after the initial cohort is removed, even though it has the greatest feature importance in the model. Is this simply functioning as expected?
Here is the sample code being used:

    r1_pred = logisticRegr_balanced.predict(dfa)
    dfa['Churn Prediction'] = r1_pred
    print("Non-Churners: "+str(len(dfa[dfa['Churn Prediction']==0])))
    print("Churners: "+str(len(dfa[dfa['Churn Prediction']==1])))
    print("Percent Churn: " +str((len(dfa[dfa['Churn Prediction']==1]))/len(dfa['Churn Prediction'])))

Results in:

    Non-Churners: 70611
    Churners: 19609
    Percent Churn: 0.21734648636665926

Then I create a new dataframe with only the survivors and increment the "Customer Life" variable by 30 days.

    dfa_30 = dfa[dfa['Churn Prediction']==0]
    dfa_30 = dfa_30.drop('Churn Prediction', axis=1, inplace=False)
    dfa_30.CustomerLife = dfa_30.CustomerLife + 30
    r30_pred = logisticRegr_balanced.predict(dfa_30)
    dfa_30['Churn Prediction'] = r30_pred
    print("Non-Churners: "+str(len(dfa_30[dfa_30['Churn Prediction']==0])))
    print("Churners: "+str(len(dfa_30[dfa_30['Churn Prediction']==1])))
    print("Percent Churn: " +str((len(dfa_30[dfa_30['Churn Prediction']==1]))/len(dfa_30['Churn Prediction'])))

Results in:

    Non-Churners: 70611
    Churners: 0
    Percent Churn: 0.0

Is the model no longer able to predict churn because it has categorized everything on the binary scale in the first prediction, so all that's left are "permanent" survivors?
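
One way to probe this is to look at predicted probabilities and the sign of the coefficient instead of the hard 0/1 labels. The sketch below is not my real pipeline: it uses synthetic data, a hypothetical second feature (`MonthlySpend`), and assumes a scikit-learn `LogisticRegression` fitted with `class_weight="balanced"`. It repeats the same two steps (keep the predicted survivors, add 30 days to `CustomerLife`) and prints the churn probabilities before and after the shift.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): churn becomes less likely
# as CustomerLife grows, so the fitted coefficient should come out negative.
rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "CustomerLife": rng.uniform(0, 720, n),   # days as a customer
    "MonthlySpend": rng.normal(50, 10, n),    # hypothetical second feature
})
true_logit = 1.0 - 0.01 * X["CustomerLife"]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-true_logit))).astype(int)

model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
life_coef = model.coef_[0][list(X.columns).index("CustomerLife")]

# Step 1: keep only the rows predicted as non-churners (probability <= 0.5).
survivors = X[model.predict(X) == 0].copy()
p_before = model.predict_proba(survivors)[:, 1]

# Step 2: add 30 days to CustomerLife, holding everything else constant.
shifted = survivors.copy()
shifted["CustomerLife"] += 30
p_after = model.predict_proba(shifted)[:, 1]

print("CustomerLife coefficient:", life_coef)
print("Max churn probability before shift:", p_before.max())
print("Max churn probability after shift: ", p_after.max())
print("Predicted churners after shift:", int(model.predict(shifted).sum()))
```

In this sketch the `CustomerLife` coefficient is negative, so every survivor already sits at or below the 0.5 cutoff and the extra 30 days only pushes the probability lower. If the same holds for my model, zero predicted churners would be the expected result, and the effect of the variable would only be visible in `predict_proba`, not in `predict`.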