I am doing the 5-Day Regression Challenge on Kaggle, but I am having trouble getting a Poisson regression line to plot with ggplot. I want to look at the effect of rainfall on the number of bikes that cross the bridges in NYC.

I have tried reversing the order of geom_point and geom_smooth but that hasn't worked. If I use gaussian instead, that also doesn't work.

Here's what I have written:

    ggplot(data = bikes, mapping = aes(x = Precipitation, y = Total)) +
      geom_smooth(method = "glm", method.args = list(family = "poisson")) +
      geom_point() +
      labs(title = 'Precipitation vs. Total Bike Crossings')

1 Answer

Inspect the data before plotting it:

    unique(bikes$Precipitation)
    # [1] "0.01" "0.15" "0.09" "0.47 (S)" "0" "0.2" "T"
    # [8] "0.16" "0.24" "0.05"

The regressor Precipitation is not numeric: it is stored as character strings, and some of its values, such as "0.47 (S)" and "T", become NA when coerced to numeric.
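A minimal sketch of that coercion, using a hand-typed vector that copies a few of the values printed above (the name `precip` is only illustrative, not part of the original data frame):

```r
# A few of the values returned by unique(bikes$Precipitation), typed in by hand
precip <- c("0.01", "0.47 (S)", "T", "0")

# as.numeric() cannot parse "0.47 (S)" or "T", so it returns NA for them
# (and emits a "NAs introduced by coercion" warning, silenced here)
coerced <- suppressWarnings(as.numeric(precip))

is.na(coerced)
# [1] FALSE  TRUE  TRUE FALSE
```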

If you coerce Precipitation to numeric first and drop the resulting NA rows, the regression line shows up.

    library(dplyr)
    library(ggplot2)

    bikes %>%
      # Non-numeric entries such as "T" become NA here (with a warning)
      mutate(Precipitation = as.numeric(Precipitation)) %>%
      # Drop every row that contains an NA in any column
      na.omit() %>%
      ggplot(mapping = aes(x = Precipitation, y = Total)) +
      geom_smooth(method = "glm", formula = y ~ x, method.args = list(family = "poisson")) +
      geom_point() +
      labs(title = 'Precipitation vs. Total Bike Crossings')
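Note that the pipeline above discards the rows whose Precipitation could not be parsed. If you would rather keep them, one possible alternative is to clean the strings before coercing. This is only a sketch under two assumptions the thread does not confirm: that "T" stands for a trace amount that can reasonably be treated as 0, and that "(S)" is a flag attached to an otherwise valid number. The helper name `clean_precip` is hypothetical.

```r
# Hypothetical helper: turn the raw Precipitation strings into numbers
clean_precip <- function(x) {
  # ASSUMPTION: "(S)" is only a flag, so remove it and keep the number
  x <- trimws(sub("(S)", "", x, fixed = TRUE))
  # ASSUMPTION: "T" means a trace amount, treated here as 0
  x[x == "T"] <- "0"
  as.numeric(x)
}

cleaned <- clean_precip(c("0.01", "0.47 (S)", "T", "0"))
cleaned
# [1] 0.01 0.47 0.00 0.00
```

In the pipeline, this would replace the coercion step with `mutate(Precipitation = clean_precip(Precipitation))`. Whether treating "T" as 0 is appropriate depends on how the dataset documents that code, so check the data description first.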

1 Comment

That was the problem! Thanks so much Rui!
