ggplot2: fit geom_smooth() as if a categorical variable were continuous

Question

I have a continuous variable on the y axis and a categorical variable on the x axis. The order of the categories is meaningful, so it would make sense to fit a regression against their index: instead of c('a', 'b', 'c'), use the indices (order(c('a', 'b', 'c')), which is c(1, 2, 3)) and fit the model against those. However, ggplot refuses to fit a geom_smooth(method = lm) if one of the variables is not numeric. OK, so I can tell it to use the order:

geom_smooth(aes(x = order(hgcc), y = rtmean), method = lm)

But then it takes the indices from the whole column of the data frame, which does not work with faceting with scales = 'free', where only a subset of the levels of the x variable appears in each panel. The indices over the whole data frame are much higher on average, so the regression line is plotted far to the right.

Here is a minimal working example:

require(ggplot2)
load(url('http://www.ebi.ac.uk/~denes/54b510889336eb2591d8beff/sample_data.RData'))
ggplot(adata12cc, aes(x = hgcc, y = rtmean, color = cls, size = log10(intensity))) +
    geom_point(stat = 'sum', alpha = 0.33) +
    geom_smooth(aes(x = order(hgcc), y = rtmean), method = 'glm') +
    facet_wrap( ~ uhgroup, scales = 'free') +
    scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
    scale_color_discrete(guide = guide_legend(title = 'Class')) +
    xlab('Carbon count unsaturation') +
    ylab('Mean RT [min]') +
    ggtitle('RT vs. carbon count & unsaturation by headgroup') +
    theme(axis.title = element_text(size = 24),
          axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
          axis.text.y = element_text(size = 11),
          plot.title = element_text(size = 21),
          strip.text = element_text(size = 18),
          panel.grid.minor.x = element_blank())

I know this is not the nicest way of doing things, but ggplot could make life so much easier if I could refer to the variables as they are subsetted by faceting anyway, and do something with them.

timat · Accepted Answer · 2016-11-22 13:57:15Z

I think I've got a solution, but I'm not sure what you want...

The main problem is that your x-value labels are already split by uhgroup. If you look at the factor levels, they are PC-O(38.7), PC(38.7), etc.

So the first thing is to create a new hgcc value for the x axis.

# Keep only the last six characters of each label, e.g. "(38.7)"
hgcc_label <- as.character(adata12cc$hgcc)
adata12cc$hgcc_value <- as.factor(
    substr(hgcc_label, nchar(hgcc_label) - 5, nchar(hgcc_label)))
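To see what this extraction does, here is a small sketch on made-up labels. The label values are hypothetical, modeled on the PC-O(38.7) pattern above and not taken from the data. The regex variant is only an assumption about what you might want if the parenthesized part is not always six characters long:

```r
# Hypothetical labels modeled on the levels of hgcc (not the real data)
lab <- c("PC-O(38.7)", "PC(38.7)", "PE(36.2)")

# Fixed width: keep the last six characters of each label
fixed <- substr(lab, nchar(lab) - 5, nchar(lab))

# Alternative (assumption): keep the trailing parenthesized group,
# whatever its length
by_regex <- sub("^.*(\\([^()]*\\))$", "\\1", lab)

print(fixed)
print(by_regex)

# The head group prefix is gone, so PC-O and PC now share one level
hgcc_value <- as.factor(fixed)
print(levels(hgcc_value))
```

With the head group stripped, the same level can appear in several facets, which is what lets one numeric x axis serve all panels.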

The other problem is that geom_point and geom_smooth use different x values: one uses hgcc, the other order(hgcc).
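The difference matters because order() does not return the index of each category. A minimal base R sketch on a toy factor (hypothetical data, not from the question) shows what each call gives:

```r
# Toy factor with an explicit level order (hypothetical data)
x <- factor(c("b", "a", "c", "a"), levels = c("a", "b", "c"))

# order() answers "which row positions would sort x?".
# The result is a permutation of 1..length(x), so on a full data
# frame column the values run up to the number of rows.
ord <- order(x)

# as.numeric() on a factor answers "which level is each element?".
# The result is the integer level code, between 1 and nlevels(x).
codes <- as.numeric(x)

print(ord)    # 2 4 1 3
print(codes)  # 2 1 3 1
```

Note that as.numeric() on a factor returns the level codes, not the labels parsed as numbers, so the result depends on the order of the levels.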

The solution is to use the same value for both. Here I use as.numeric(hgcc_value) (instead of order()) and specify the breaks and their labels in scale_x_continuous.

ggplot(adata12cc, aes(x = as.numeric(hgcc_value), y = rtmean, color = cls, size = log10(intensity))) +
    geom_point(stat = 'sum', alpha = 0.33) +
    geom_smooth(aes(x = as.numeric(hgcc_value), y = rtmean), method = 'glm') +
    facet_wrap( ~ uhgroup, scales = 'free') +
    scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
    scale_color_discrete(guide = guide_legend(title = 'Class')) +
    scale_x_continuous(name = "Carbon count unsaturation",
                       breaks = as.numeric(adata12cc$hgcc_value),
                       labels = adata12cc$hgcc_value,
                       minor_breaks = NULL) +
    ylab('Mean RT [min]') +
    ggtitle('RT vs. carbon count & unsaturation by headgroup') +
    theme(axis.title = element_text(size = 24),
          axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
          axis.text.y = element_text(size = 11),
          plot.title = element_text(size = 21),
          strip.text = element_text(size = 18),
          panel.grid.minor.x = element_blank())
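For anyone who cannot load the sample data, the same idea can be sketched on synthetic data. Everything here (the data frame d and its columns grp, lev and y) is made up for illustration. Passing one break per level through seq_along(levels(...)) is my variation, not what the code above does:

```r
library(ggplot2)

set.seed(1)

# Hypothetical stand-in for adata12cc: two groups, six ordered levels
d <- data.frame(
    grp = rep(c("PC", "PE"), each = 30),
    lev = factor(rep(sprintf("(%d.1)", 30:35), times = 10))
)
# Response rises with the level index, plus some noise
d$y <- as.numeric(d$lev) + rnorm(nrow(d), sd = 0.3)

# Points and smoother share the same numeric x (the level codes);
# the axis is relabeled with the level names, one break per level
p <- ggplot(d, aes(x = as.numeric(lev), y = y)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x) +
    facet_wrap( ~ grp, scales = "free") +
    scale_x_continuous(breaks = seq_along(levels(d$lev)),
                       labels = levels(d$lev),
                       minor_breaks = NULL)

print(p)
```

Since as.numeric(lev) indexes into levels(lev), the break positions 1, 2, ... line up with the labels by construction.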

Is this what you were looking for?
