I've got some data about airline flights (in a data frame called flights) and I would like to see if the flight time has any effect on the probability of a significantly delayed arrival (meaning 10 or more minutes). I figured I'd use logistic regression, with the flight time as the predictor and whether or not each flight was significantly delayed (a bunch of Bernoullis) as the response. I used the following code...
    flights$BigDelay <- flights$ArrDelay >= 10
    delay.model <- glm(BigDelay ~ ArrDelay, data=flights, family=binomial(link="logit"))
    summary(delay.model)

...but got the following output.
    > flights$BigDelay <- flights$ArrDelay >= 10
    > delay.model <- glm(BigDelay ~ ArrDelay, data=flights, family=binomial(link="logit"))
    Warning messages:
    1: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
      algorithm did not converge
    2: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
      fitted probabilities numerically 0 or 1 occurred
    > summary(delay.model)

    Call:
    glm(formula = BigDelay ~ ArrDelay, family = binomial(link = "logit"),
        data = flights)

    Deviance Residuals:
           Min          1Q      Median          3Q         Max
    -3.843e-04  -2.107e-08  -2.107e-08   2.107e-08   3.814e-04

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -312.14     170.26  -1.833   0.0668 .
    ArrDelay       32.86      17.92   1.833   0.0668 .
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 2.8375e+06  on 2291292  degrees of freedom
    Residual deviance: 9.1675e-03  on 2291291  degrees of freedom
    AIC: 4.0092

    Number of Fisher Scoring iterations: 25

What does it mean that the algorithm did not converge? I thought it might be because the BigDelay values were TRUE and FALSE instead of 0 and 1, but I got the same error after I converted everything. Any ideas?