1

For a sample dataframe:

 df <- structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), letter_group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"), value = c(2L, 3L, 4L, 5L, 6L, 6L, 7L, 8L, 5L, 6L, 7L, 3L, 4L, 5L, 6L, 4L, 5L, 6L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 8L, 5L, 3L, 2L, 4L, 5L, 6L, 4L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 4L, 5L, 6L, 4L)), .Names = c("year", "letter_group", "value"), row.names = c(NA, -44L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = structure(list( year = structure(list(), class = c("collector_integer", "collector" )), letter_group = structure(list(), class = c("collector_character", "collector")), value = structure(list(), class = c("collector_integer", "collector"))), .Names = c("year", "letter_group", "value" )), default = structure(list(), class = c("collector_guess", "collector"))), .Names = c("cols", "default"), class = "col_spec")) 

I am trying to create a box plot with the years on the x-axis, and with the letter groups shown separately within each year, i.e. A, B, C for year 1, then a small space, then A, B, C for year 2, and so on.

I have the following:

library(ggplot2)
p1 <- ggplot(df, aes(year, value))
p1 + geom_boxplot(aes(group = letter_group))

But this produces only three box plots.

Could someone please help me?

  • Your grouping variables do not seem to be factors. Is ggplot(df, aes(as.factor(year), value, fill=as.factor(letter_group))) + geom_boxplot() what you are looking for? Commented May 23, 2019 at 11:24
  • Thanks @nouse - simple and effective! Perfect! In my real example, I have ten 'letter-groups' - is there a way to specify the order of my variables? (I have 1-10, yet ggplot is ordering them 1, 10, 2-9) Commented May 23, 2019 at 12:52
  • @KT_1 Do you mean you have letters running A to J or that your "letter" groups are the numbers 1-10? Try factor(letter_group, levels = LETTERS[1:10]) or factor(letter_group, levels = 1:10). Commented May 23, 2019 at 13:02
  • Thanks @Lyngbakr - my 'letter_group' is in fact called 'deciles', which runs 1-10. The second of your helpful suggestions gives the error: Error in as.factor(deciles, levels = 1:10) : unused argument (levels = 1:10). What am I doing wrong? Commented May 23, 2019 at 13:24
  • @KT_1 Note that it's factor, not as.factor. (See here for an explanation of the differences.) Commented May 23, 2019 at 13:26
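The ordering problem from the comments can be reproduced in base R. This is a minimal sketch that assumes the deciles are stored as text (for example after being read from a file); the `deciles` vector below is hypothetical and not taken from the asker's data. It shows that `factor()` sorts levels alphabetically by default and accepts a `levels` argument to fix the order, which `as.factor()` does not.

```r
# Hypothetical deciles stored as character strings (an assumption)
deciles <- c("1", "10", "2", "3", "9")

# Default factor(): levels are sorted as text, so "10" comes before "2"
default_levels <- levels(factor(deciles))

# Explicit levels: the numeric order 1, 2, ..., 10 is kept
ordered_levels <- levels(factor(deciles, levels = 1:10))

print(default_levels)
print(ordered_levels)
```

Passing the resulting factor to `fill` or `x` in `aes()` should then make ggplot2 follow the level order rather than the alphabetical one.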

3 Answers

3

An alternative to @nouse's solution (which is the best solution) is to use faceting. One benefit of faceting, however, is that you also get letter group labels on the x-axis.

Define data structure

# Load library
library(ggplot2)

# Define data frame
df <- structure(list(
  year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
           2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
           3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
           4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
  letter_group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C",
                   "A", "A", "A", "B", "B", "B", "C", "C", "C", "C",
                   "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C",
                   "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  value = c(2L, 3L, 4L, 5L, 6L, 6L, 7L, 8L, 5L, 6L,
            7L, 3L, 4L, 5L, 6L, 4L, 5L, 6L, 2L, 3L,
            4L, 4L, 5L, 6L, 7L, 8L, 5L, 3L, 2L, 4L, 5L, 6L, 4L,
            3L, 4L, 5L, 6L, 7L, 1L, 2L, 4L, 5L, 6L, 4L)),
  .Names = c("year", "letter_group", "value"),
  row.names = c(NA, -44L),
  class = c("tbl_df", "tbl", "data.frame"),
  spec = structure(list(
    cols = structure(list(
      year = structure(list(), class = c("collector_integer", "collector")),
      letter_group = structure(list(), class = c("collector_character", "collector")),
      value = structure(list(), class = c("collector_integer", "collector"))),
      .Names = c("year", "letter_group", "value")),
    default = structure(list(), class = c("collector_guess", "collector"))),
    .Names = c("cols", "default"), class = "col_spec"))

Plot results

# Plot results
g <- ggplot(df)
g <- g + geom_boxplot(aes(letter_group, value))
g <- g + facet_grid(. ~ year, switch = "x")
g <- g + theme(strip.placement = "outside",
               strip.background = element_blank(),
               panel.background = element_rect(fill = "white"),
               panel.grid.major = element_line(colour = alpha("gray50", 0.25),
                                               linetype = "dashed"))
g <- g + ylab("Value") + xlab("Year & Letter Group")
print(g)

Created on 2019-05-23 by the reprex package (v0.2.1)


1

Your question has been largely answered here.

Your data frame does not include factors, so you first need to turn your grouping variables into factors. Then there are two options, as per the link given above. Either construct a new factor by combining your two original factors (as shown in z-cool's answer), although this does not create the desired space between factor levels on the x-axis, or assign one of your factors to fill or col. In your case, the quickest way to solve your problem is:

ggplot(df, aes(as.factor(year), value, fill=as.factor(letter_group))) + geom_boxplot() 

If you do not want to colorize your plot, you can change this with scale_fill_manual or scale_color_manual, depending on your choice in aes before:

ggplot(df, aes(as.factor(year), value, fill=as.factor(letter_group))) + geom_boxplot() + scale_fill_manual(values=c("white", "white", "white")) + theme(legend.position = "none") 
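To see why the original call drew only three boxes, it may help to count the groups each specification implies. The sketch below uses a small stand-in data frame (`toy`, with hypothetical values rather than the question's data) and base R only: grouping by `letter_group` alone gives three groups, while combining year and letter gives one group per year and letter pair, which is what mapping `year` to x and `letter_group` to fill achieves.

```r
# Small stand-in for the question's data (hypothetical values)
toy <- data.frame(
  year = rep(1:4, each = 3),
  letter_group = rep(c("A", "B", "C"), times = 4),
  value = 1:12
)

# group = letter_group: one box per letter, pooled over all years
n_by_letter <- length(unique(toy$letter_group))

# year and letter combined: one box per (year, letter) pair
n_by_both <- nlevels(interaction(toy$year, toy$letter_group))

print(c(by_letter = n_by_letter, by_both = n_by_both))
```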


-1

This should work:

library(tidyverse)

df %>%
  mutate(year_group = paste(year, letter_group)) %>%
  ggplot(aes(year_group, value)) +
  geom_boxplot()
