I found this a very interesting question and I struggled to think of scenarios where binning a response variable would lead to better predictions.
The best I could come up with is a scenario like this one (all code is attached at the end): the red class corresponds to $y \leq 1$, the blue class to $y > 1$, and we have one (or of course more) predictor that is uncorrelated with $y$ within each class but separates the classes perfectly.
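To make the construction concrete, here is a condensed sketch of just the data-generating step (the full code at the end adds the train/test split and the plot):

```r
library(tibble)

# Condensed data-generating sketch: within each class, y and the predictor
# are drawn independently (so the predictor carries no information about y
# within a class), but the predictor separates red (y <= 1) from blue (y > 1)
# perfectly because its range differs between the classes.
n = 5000
sketch = tibble(
  y         = c(runif(n, min = 0, max = 1), runif(n, min = 1, max = 2)),
  class     = rep(c(0L, 1L), each = n),   # 0 = red, 1 = blue
  predictor = c(runif(n, min = 0, max = 1), runif(n, min = 1, max = 2)))
```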
Here, a Firth penalized logistic regression
```
          Predicted
Truth      red   blue
  red     5000      0
  blue       2   4998
```
beats a simple linear model (followed by classifying based on whether predictions are >1):
```
          Predicted
Truth      red   blue
  red     4970     30
  blue       0   5000
```
However, let's be honest: part of the problem is that a linear regression is not a great model for this problem in the first place. Replacing the linear regression and the logistic regression with a regression and a classification random forest, respectively, deals with this perfectly. Both produce the following result (code at the end):
```
          Predicted
Truth      red   blue
  red     5000      0
  blue       0   5000
```
Still, I guess this is at least an example where binning seems to do a little better, as long as we restrict ourselves to models with a linear regression equation (which of course still totally ignores the possibility of using splines etc.; a quick spline sketch follows the code below).
```r
library(tidyverse)
library(ranger)
library(ggrepel)
library(logistf)

# Set defaults for ggplot2 ----
theme_set(theme_bw(base_size=18) + theme(legend.position = "none"))
scale_colour_discrete <- function(...) {
  # Alternative: ggsci::scale_color_nejm(...)
  scale_colour_brewer(..., palette="Set1")
}
scale_fill_discrete <- function(...) {
  # Alternative: ggsci::scale_fill_nejm(...)
  scale_fill_brewer(..., palette="Set1")
}
scale_colour_continuous <- function(...) {
  scale_colour_viridis_c(..., option="turbo")
}
update_geom_defaults("point", list(size=2))
update_geom_defaults("line", list(size=1.5))
# To allow adding label to points e.g. as geom_text_repel(data=. %>% filter(1:n()==n()))
update_geom_defaults("text_repel", list(label.size = NA, fill = rgb(0,0,0,0),
                                        segment.color = "transparent", size=6))

# Start program ----
set.seed(1234)
records = 5000

# Create the example data including a train-test split
example = tibble(y = c(runif(n=records*2, min = 0, max=1),
                       runif(n=records*2, min = 1, max=2)),
                 class = rep(c(0L,1L), each=records*2),
                 test = factor(rep(c(0,1,0,1), each=records), levels=0:1,
                               labels=c("Train", "Test")),
                 predictor = c(runif(n=records*2, min = 0, max=1),
                               runif(n=records*2, min = 1, max=2)))

# Plot the dataset
example %>%
  ggplot(aes(x=predictor, y=y, col=factor(class))) +
  geom_point(alpha=0.3) +
  facet_wrap(~test)

# Linear regression
lm1 = lm(data=example %>% filter(test=="Train"), y ~ predictor)
# Performance of linear regression prediction followed by classifying by prediction>1
table(example %>% filter(test=="Test") %>% pull(class),
      predict(lm1, example %>% filter(test=="Test")) > 1)

# Firth penalized logistic regression
glm1 = logistf(data=example %>% filter(test=="Train"), class ~ predictor, pl=F)
# Performance of classifying by predicted log-odds from Firth LR being >0
table(example %>% filter(test=="Test") %>% pull(class),
      predict(glm1, example %>% filter(test=="Test")) > 0)

# Now, let's try this with RF instead:
# First, binary classification RF
rf1 = ranger(formula = class ~ predictor,
             data=example %>% filter(test=="Train"),
             classification = T)
table(example %>% filter(test=="Test") %>% pull(class),
      predict(rf1, example %>% filter(test=="Test"))$predictions)

# Now regression RF
rf2 = ranger(formula = y ~ predictor,
             data=example %>% filter(test=="Train"),
             classification = F)
table(example %>% filter(test=="Test") %>% pull(class),
      predict(rf2, example %>% filter(test=="Test"))$predictions > 1)
```
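And just to gesture at the splines remark above, here is a sketch (not part of the comparison above, with an arbitrary, untuned choice of `df = 10`) of letting the regression route approximate the step in the mean of $y$ via a natural cubic spline basis; it reuses the `example` tibble created by the code above:

```r
library(splines)

# Sketch only: a more flexible "regression-style" model using a natural
# cubic spline basis for the predictor (df = 10 is an arbitrary, untuned
# choice), followed by the same classify-if-prediction-exceeds-1 rule.
lm_spline = lm(data = example %>% filter(test == "Train"),
               y ~ ns(predictor, df = 10))

table(example %>% filter(test == "Test") %>% pull(class),
      predict(lm_spline, example %>% filter(test == "Test")) > 1)
```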