This code from @camille generates a nice Pareto chart with ggplot2.

    library(tidyverse)

    d <- tribble(
      ~category, ~defect,
      "price", 80,
      "schedule", 27,
      "supplier", 66,
      "contact", 94,
      "item", 33
    ) %>%
      arrange(desc(defect)) %>%
      mutate(
        cumsum = cumsum(defect),
        freq = round(defect / sum(defect), 3),
        cum_freq = cumsum(freq)
      ) %>%
      mutate(category = as.factor(category) %>% fct_reorder(defect))

    brks <- unique(d$cumsum)

    ggplot(d, aes(x = fct_rev(category))) +
      geom_col(aes(y = defect)) +
      geom_point(aes(y = cumsum)) +
      geom_line(aes(y = cumsum, group = 1)) +
      scale_y_continuous(
        sec.axis = sec_axis(~ . / max(d$cumsum), labels = scales::percent),
        breaks = brks
      )

Capture3.png

It's almost perfect, except that I'd like the breaks on the second y-axis to fall at the cumulative y-values, so that each percentage label lines up with a point on the cumulative line. This can be achieved in base R with the following code. But how do I do it in ggplot2?

    ## Creating the d tribble
    library(tidyverse)

    d <- tribble(
      ~category, ~defect,
      "price", 80,
      "schedule", 27,
      "supplier", 66,
      "contact", 94,
      "item", 33
    )

    ## Creating new columns
    d <- arrange(d, desc(defect)) %>%
      mutate(
        cumsum = cumsum(defect),
        freq = round(defect / sum(defect), 3),
        cum_freq = cumsum(freq)
      )

    ## Saving parameters (no.readonly = TRUE avoids warnings when restoring)
    def_par <- par(no.readonly = TRUE)

    ## New margins
    par(mar = c(5, 5, 4, 5))

    ## Bar plot; pc will hold the x values for the bars
    pc <- barplot(d$defect,
      width = 1, space = 0.2, border = NA, axes = FALSE,
      ylim = c(0, 1.05 * max(d$cumsum, na.rm = TRUE)),
      ylab = "Cumulative Counts", cex.names = 0.7,
      names.arg = d$category,
      main = "Pareto Chart (version 1)"
    )

    ## Cumulative counts line
    lines(pc, d$cumsum, type = "b", cex = 0.7, pch = 19, col = "cyan4")

    ## Framing plot
    box(col = "grey62")

    ## Adding axes
    axis(
      side = 2, at = c(0, d$cumsum), las = 1,
      col.axis = "grey62", col = "grey62", cex.axis = 0.8
    )
    axis(
      side = 4, at = c(0, d$cumsum),
      labels = paste(c(0, round(d$cum_freq * 100)), "%", sep = ""),
      las = 1, col.axis = "cyan4", col = "cyan4", cex.axis = 0.8
    )

    ## Restoring default parameters
    par(def_par)

Capture4.png

Camille identified the constraint, but the problem remains: "The more recent versions of ggplot2 allow for a secondary axis, but it needs to be based on a transformation of the primary axis. In this case, that means it should take the primary axis's values and divide by the maximum value to get a percentage."
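To make that transformation concrete, here is a minimal base-R sketch of the arithmetic only, with no plotting. The names `to_share`, `primary_breaks`, and `secondary_breaks` are made up for illustration; the data are the defect counts from the question.

```r
# Defect counts from the question, sorted in decreasing order
defect <- sort(
  c(price = 80, schedule = 27, supplier = 66, contact = 94, item = 33),
  decreasing = TRUE
)

# Cumulative counts: these are the values on the primary y-axis
cum_counts <- cumsum(defect)

# Hypothetical helper: the same transformation that is passed to sec_axis()
to_share <- function(y) y / max(cum_counts)

# Breaks on the primary axis and where they land on the secondary axis
primary_breaks <- unname(cum_counts)
secondary_breaks <- to_share(primary_breaks)

print(primary_breaks)
print(round(secondary_breaks, 3))
```

Because the secondary axis is just a rescaling of the primary one, any break chosen on the count scale has a matching position on the percentage scale, which is what the question asks for.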

2 Answers

    brks <- unique(d$cumsum)
    brks2 <- unique(d$cumsum / max(d$cumsum))

    ggplot(d, aes(x = fct_rev(category))) +
      geom_col(aes(y = defect)) +
      geom_point(aes(y = cumsum)) +
      geom_line(aes(y = cumsum, group = 1)) +
      scale_y_continuous(
        sec.axis = sec_axis(~ . / max(d$cumsum), labels = scales::percent, breaks = brks2),
        breaks = brks
      )


1 Comment

+1 for an excellent, simple (and should have been obvious the first time!) improvement on my old code. I posted a question last night about how to make use of ggplot's internal break calculations in these sorts of custom axes, and this is making me even more curious about that.

The only improvement this makes over my code from the previous question, and over @Jack Brookes's answer, is that it eliminates the need to calculate the two sets of breaks outside of the ggplot call. Instead, I take the breaks for the cumulative raw counts as unique(d$cumsum) and the breaks for the cumulative frequencies as unique(d$cum_freq). To both of these I prepend a 0, because otherwise no break is placed at 0.


    library(tidyverse)
    library(scales)

    d <- tribble(
      ~category, ~defect,
      "price", 80,
      "schedule", 27,
      "supplier", 66,
      "contact", 94,
      "item", 33
    ) %>%
      arrange(desc(defect)) %>%
      mutate(
        cumsum = cumsum(defect),
        freq = round(defect / sum(defect), 3),
        cum_freq = cumsum(freq)
      ) %>%
      mutate(category = as.factor(category) %>% fct_reorder(defect))

    ggplot(d, aes(x = fct_rev(category))) +
      geom_col(aes(y = defect)) +
      geom_point(aes(y = cumsum)) +
      geom_line(aes(y = cumsum, group = 1)) +
      scale_y_continuous(
        breaks = c(0, unique(d$cumsum)),
        sec.axis = sec_axis(~ . / max(d$cumsum),
          labels = scales::percent,
          breaks = c(0, unique(d$cum_freq))
        )
      ) +
      theme(panel.grid.minor = element_blank())
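One thing that may be worth checking in the code above: cum_freq is accumulated from frequencies rounded to three decimals, so the secondary breaks are not guaranteed to sit exactly at cumsum / max(cumsum). The base-R sketch below measures the gap for this data set; `exact_share`, `rounded_share`, and `drift` are illustrative names, not part of the answer.

```r
# Same defect counts as above, sorted in decreasing order
defect <- sort(c(80, 27, 66, 94, 33), decreasing = TRUE)

# Exact secondary-axis positions of the primary breaks
exact_share <- cumsum(defect) / sum(defect)

# Positions as computed in the answer: round each frequency first, then accumulate
rounded_share <- cumsum(round(defect / sum(defect), 3))

# Largest gap between the two sets of break positions
drift <- max(abs(exact_share - rounded_share))
print(drift)
```

For these five categories the gap stays below 0.001, which is invisible on the plot. With many categories, or coarser rounding, using d$cumsum / max(d$cumsum) for the secondary breaks would sidestep the issue.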

1 Comment

Double y-axes have been shown to be hard to read. Maybe a better solution here is to show the percentages with geom_text?
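A rough sketch of what that suggestion might look like, assuming the cumulative percentage is printed just above each point instead of on a second axis. The column name `cum_pct`, the `vjust` value, and the 10% headroom are arbitrary choices for illustration; the data preparation uses base R rather than the tidyverse pipeline from the answers.

```r
library(ggplot2)

# Same data as in the question, prepared with base R
d <- data.frame(
  category = c("price", "schedule", "supplier", "contact", "item"),
  defect = c(80, 27, 66, 94, 33)
)
d <- d[order(-d$defect), ]

# Fix the bar order to decreasing defect count
d$category <- factor(d$category, levels = d$category)

# Cumulative counts and their percentage labels (hypothetical column cum_pct)
d$cumsum <- cumsum(d$defect)
d$cum_pct <- paste0(round(100 * d$cumsum / sum(d$defect)), "%")

p <- ggplot(d, aes(x = category)) +
  geom_col(aes(y = defect)) +
  geom_point(aes(y = cumsum)) +
  geom_line(aes(y = cumsum, group = 1)) +
  # Percentage labels placed just above each cumulative point
  geom_text(aes(y = cumsum, label = cum_pct), vjust = -0.8) +
  scale_y_continuous(breaks = c(0, d$cumsum)) +
  # Leave headroom so the top label is not clipped
  expand_limits(y = 1.1 * max(d$cumsum))

p
```

This keeps a single y-axis with breaks at the cumulative counts and moves the percentages next to the points they describe.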
