How to get a ggplot2 axis to break at variable values?

Question

This code from @camille generates a nice Pareto plot with ggplot2.

    library(tidyverse)

    d <- tribble(
      ~category, ~defect,
      "price", 80,
      "schedule", 27,
      "supplier", 66,
      "contact", 94,
      "item", 33
    ) %>%
      arrange(desc(defect)) %>%
      mutate(
        cumsum = cumsum(defect),
        freq = round(defect / sum(defect), 3),
        cum_freq = cumsum(freq)
      ) %>%
      mutate(category = as.factor(category) %>% fct_reorder(defect))

    brks <- unique(d$cumsum)

    ggplot(d, aes(x = fct_rev(category))) +
      geom_col(aes(y = defect)) +
      geom_point(aes(y = cumsum)) +
      geom_line(aes(y = cumsum, group = 1)) +
      scale_y_continuous(
        sec.axis = sec_axis(~ . / max(d$cumsum), labels = scales::percent),
        breaks = brks
      )

It's almost perfect except I'd like to see the second y-axis break at the cumulative y-values. This can be achieved in base-R with the following code. But how do I do it in ggplot?

    ## Creating the d tribble
    library(tidyverse)

    d <- tribble(
      ~category, ~defect,
      "price", 80,
      "schedule", 27,
      "supplier", 66,
      "contact", 94,
      "item", 33
    )

    ## Creating new columns
    d <- arrange(d, desc(defect)) %>%
      mutate(
        cumsum = cumsum(defect),
        freq = round(defect / sum(defect), 3),
        cum_freq = cumsum(freq)
      )

    ## Saving parameters (no.readonly = TRUE so they can be restored without warnings)
    def_par <- par(no.readonly = TRUE)

    ## New margins
    par(mar = c(5, 5, 4, 5))

    ## Bar plot; pc will hold the x values of the bars
    pc <- barplot(d$defect,
                  width = 1, space = 0.2, border = NA, axes = FALSE,
                  ylim = c(0, 1.05 * max(d$cumsum, na.rm = TRUE)),
                  ylab = "Cumulative Counts", cex.names = 0.7,
                  names.arg = d$category,
                  main = "Pareto Chart (version 1)")

    ## Cumulative counts line
    lines(pc, d$cumsum, type = "b", cex = 0.7, pch = 19, col = "cyan4")

    ## Framing plot
    box(col = "grey62")

    ## Adding axes: counts on the left, cumulative percentages on the right
    axis(side = 2, at = c(0, d$cumsum), las = 1,
         col.axis = "grey62", col = "grey62", cex.axis = 0.8)
    axis(side = 4, at = c(0, d$cumsum),
         labels = paste(c(0, round(d$cum_freq * 100)), "%", sep = ""),
         las = 1, col.axis = "cyan4", col = "cyan4", cex.axis = 0.8)

    ## Restoring default parameters
    par(def_par)

Camille had some ideas, but they remain unresolved: "The more recent versions of ggplot2 allow for a secondary axis, but it needs to be based on a transformation of the primary axis. In this case, that means it should take the primary axis's values and divide by the maximum value to get a percentage."
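To make the quoted constraint concrete, here is a small base-R sketch of the arithmetic involved. It only illustrates how positions on the primary axis map onto the secondary axis; the names `cum_count`, `to_percent`, and `sec_breaks` are made up for this illustration and do not appear in the original code.

```r
# Defect counts from the question, sorted in descending order
defect <- c(94, 80, 66, 33, 27)

# Cumulative counts: these are the values plotted against the primary y-axis
cum_count <- cumsum(defect)

# The same transformation the secondary axis uses: primary value / maximum
to_percent <- function(y) y / max(cum_count)

# Break positions for the secondary axis, expressed in secondary-axis units
# (fractions between 0 and 1), not in raw counts
sec_breaks <- to_percent(cum_count)
print(sec_breaks)
```

The point of the sketch is that any break intended for the secondary axis has to be expressed on the transformed scale, so a cumulative count of 300 corresponds to a break at 1 (100%).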

Jack Brookes · Accepted Answer · 2018-05-25 16:24:06Z

    brks <- unique(d$cumsum)
    brks2 <- unique(d$cumsum / max(d$cumsum))

    ggplot(d, aes(x = fct_rev(category))) +
      geom_col(aes(y = defect)) +
      geom_point(aes(y = cumsum)) +
      geom_line(aes(y = cumsum, group = 1)) +
      scale_y_continuous(
        sec.axis = sec_axis(~ . / max(d$cumsum), labels = scales::percent, breaks = brks2),
        breaks = brks
      )

+1 for an excellent, simple (and should have been obvious the first time!) improvement on my old code. Last night I posted a question about how to make use of ggplot's internal break calculations in these sorts of custom axes, and this is making me even more curious about that.

camille · Accepted Answer · 2018-05-26 01:07:27Z

The only improvement this makes over my previous code from the last question and over @Jack Brookes's answer is that it eliminates the need to calculate the two sets of breaks outside of the ggplot call. Instead, I take the breaks for the cumulative raw counts as unique(d$cumsum) and the breaks for the cumulative frequencies as unique(d$cum_freq). To each of these I prepended a 0, because otherwise no break is placed at 0.

    library(tidyverse)
    library(scales)

    d <- tribble(
      ~category, ~defect,
      "price", 80,
      "schedule", 27,
      "supplier", 66,
      "contact", 94,
      "item", 33
    ) %>%
      arrange(desc(defect)) %>%
      mutate(
        cumsum = cumsum(defect),
        freq = round(defect / sum(defect), 3),
        cum_freq = cumsum(freq)
      ) %>%
      mutate(category = as.factor(category) %>% fct_reorder(defect))

    ggplot(d, aes(x = fct_rev(category))) +
      geom_col(aes(y = defect)) +
      geom_point(aes(y = cumsum)) +
      geom_line(aes(y = cumsum, group = 1)) +
      scale_y_continuous(
        breaks = c(0, unique(d$cumsum)),
        sec.axis = sec_axis(~ . / max(d$cumsum),
                            labels = scales::percent,
                            breaks = c(0, unique(d$cum_freq)))
      ) +
      theme(panel.grid.minor = element_blank())

Double y-axes have been shown to be hard to read. Maybe a better solution here is to put the percentages on the plot with geom_text?
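One possible reading of that suggestion is sketched below: drop the secondary axis and print each cumulative percentage next to its point. This is only an illustration, not code from either answer. It reuses the data preparation from the answers above, and the `vjust` offset, the 8% headroom added by `expand_limits()`, and the axis label are arbitrary choices made for this sketch.

```r
library(tidyverse)

# Same data preparation as in the answers above
d <- tribble(
  ~category, ~defect,
  "price", 80,
  "schedule", 27,
  "supplier", 66,
  "contact", 94,
  "item", 33
) %>%
  arrange(desc(defect)) %>%
  mutate(
    cumsum = cumsum(defect),
    freq = round(defect / sum(defect), 3),
    cum_freq = cumsum(freq)
  ) %>%
  mutate(category = as.factor(category) %>% fct_reorder(defect))

# Single y-axis; cumulative percentages are drawn as text above each point
p <- ggplot(d, aes(x = fct_rev(category))) +
  geom_col(aes(y = defect)) +
  geom_line(aes(y = cumsum, group = 1)) +
  geom_point(aes(y = cumsum)) +
  geom_text(
    aes(y = cumsum, label = scales::percent(cum_freq, accuracy = 1)),
    vjust = -0.8  # assumption: nudge labels just above the points
  ) +
  scale_y_continuous(breaks = c(0, unique(d$cumsum))) +
  expand_limits(y = max(d$cumsum) * 1.08) +  # assumption: headroom for the top label
  labs(x = NULL, y = "Cumulative count") +
  theme(panel.grid.minor = element_blank())

p
```

The primary axis still breaks at the cumulative counts, while the percentages sit directly on the line, so the reader does not have to match points against a second scale.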
