I have a continuous variable on the y axis and a categorical variable on the x axis. The order of the categorical variable's levels is meaningful, so it would make sense to fit a regression against the level index: instead of c('a', 'b', 'c'), use the indices (order(c('a', 'b', 'c')), which is c(1, 2, 3)) and fit the model against those. However, ggplot refuses to fit geom_smooth(method = lm) if one of the variables is not numeric. OK, so I can tell it to use the order:

geom_smooth(aes(x = order(hgcc), y = rtmean), method = lm) 

But then it takes the indices from the whole column of the data frame, which does not work with faceting with scales = 'free', where only a subset of the levels of the x variable appears in each panel. The indices in the whole data frame are much higher on average, so the regression is plotted far to the right:

[Figure: regression pushed to right]
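
As a side note, order() returns the permutation that sorts a vector, not each element's level index; the two agree for c('a', 'b', 'c') only because that vector is already sorted. A minimal sketch on a made-up factor (the vector x is hypothetical, not the data from the question):

```r
# Hypothetical factor with ordered levels a < b < c, deliberately unsorted
x <- factor(c("c", "a", "b", "a"), levels = c("a", "b", "c"))

sort_perm <- order(x)       # positions that would sort x
level_idx <- as.integer(x)  # index of each element's level

print(sort_perm)
print(level_idx)
```

For a regression against the level index, as.integer() (or as.numeric()) on the factor is probably the closer match to what is described above.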

Here is a minimal working example:

    require(ggplot2)
    load(url('http://www.ebi.ac.uk/~denes/54b510889336eb2591d8beff/sample_data.RData'))

    ggplot(adata12cc, aes(x = hgcc, y = rtmean, color = cls, size = log10(intensity))) +
        geom_point(stat = 'sum', alpha = 0.33) +
        geom_smooth(aes(x = order(hgcc), y = rtmean), method = 'glm') +
        facet_wrap( ~ uhgroup, scales = 'free') +
        scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
        scale_color_discrete(guide = guide_legend(title = 'Class')) +
        xlab('Carbon count unsaturation') +
        ylab('Mean RT [min]') +
        ggtitle('RT vs. carbon count & unsaturation by headgroup') +
        theme(axis.title = element_text(size = 24),
              axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
              axis.text.y = element_text(size = 11),
              plot.title = element_text(size = 21),
              strip.text = element_text(size = 18),
              panel.grid.minor.x = element_blank())

I know this is not the nicest way of doing things, but ggplot could make life so much easier if I could refer to the variables as they are subset by faceting and do something with them.

1 Answer

I think I've got a solution, but I'm not sure what you want...

The main problem is that your x value labels are already split by uhgroup. If you look at the factor levels, they are PC-O(38.7), PC(38.7), etc.

So the first thing is to create a new hgcc value for the x axis.

    adata12cc$hgcc_value <- as.factor(
        substr(adata12cc$hgcc,
               (nchar(levels(adata12cc$hgcc)[adata12cc$hgcc]) - 5),
               nchar(levels(adata12cc$hgcc)[adata12cc$hgcc])))
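
To illustrate what this substr() call does, here is a small sketch on two made-up labels modelled on the ones mentioned above (the vector x is hypothetical, not taken from the data). It keeps the last six characters of each label, so labels from different headgroups collapse to a shared value:

```r
# Hypothetical labels in the same style as the hgcc levels
x <- c("PC-O(38.7)", "PC(38.7)")

# Keep the last six characters of each label
last6 <- substr(x, nchar(x) - 5, nchar(x))

# Both labels now map to the same factor level
shared <- as.factor(last6)
print(levels(shared))
```

Note that this assumes the carbon count part is always exactly six characters long, parentheses included.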

Then another problem is that you have different x axes for geom_point and geom_smooth: one is hgcc, the other is order(hgcc).

The solution is to use the same value for both, here as.numeric(hgcc_value) (instead of order()), and to specify the break labels in scale_x_continuous.

    ggplot(adata12cc, aes(x = as.numeric(hgcc_value), y = rtmean, color = cls, size = log10(intensity))) +
        geom_point(stat = 'sum', alpha = 0.33) +
        geom_smooth(aes(x = as.numeric(hgcc_value), y = rtmean), method = 'glm') +
        facet_wrap( ~ uhgroup, scales = 'free') +
        scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
        scale_color_discrete(guide = guide_legend(title = 'Class')) +
        scale_x_continuous(name = "Carbon count unsaturation",
                           breaks = as.numeric(adata12cc$hgcc_value),
                           labels = adata12cc$hgcc_value,
                           minor_breaks = NULL) +
        ylab('Mean RT [min]') +
        ggtitle('RT vs. carbon count & unsaturation by headgroup') +
        theme(axis.title = element_text(size = 24),
              axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
              axis.text.y = element_text(size = 11),
              plot.title = element_text(size = 21),
              strip.text = element_text(size = 18),
              panel.grid.minor.x = element_blank())
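
A stripped-down variant of the same idea on toy data may make the mechanics easier to see. The data frame toy and its columns are made up for illustration. This sketch takes the breaks and labels from the factor levels rather than from the full column, which gives exactly one break per level; that is an assumption about what is wanted, not part of the code above.

```r
library(ggplot2)

# Hypothetical toy data standing in for adata12cc
toy <- data.frame(
  group = rep(c("g1", "g2"), each = 6),
  label = factor(rep(c("30:0", "32:1", "34:2"), times = 4),
                 levels = c("30:0", "32:1", "34:2")),
  y     = c(1.0, 2.1, 2.9, 1.2, 1.9, 3.1, 2.0, 3.2, 3.9, 2.1, 2.8, 4.2)
)

# Numeric level index, shared by the points and the regression
toy$idx <- as.numeric(toy$label)

p <- ggplot(toy, aes(x = idx, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ group, scales = "free") +
  # One break per factor level, labelled with the level name
  scale_x_continuous(breaks = seq_along(levels(toy$label)),
                     labels = levels(toy$label),
                     minor_breaks = NULL)
```

Printing p draws the plot. Because both layers inherit the same numeric x, the fitted line stays within the range of the points in each panel.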

Is this what you were looking for?
