
I want to create a heat map using ggplot, but I want to order the y-axis by the number of observations. I order the data frame by the column N and append the number of observations to the group name so that it appears in the axis label. When I plot the data, however, the axis is re-ordered by group name. Is there a way to set the factor levels based on the order in which they appear in the data frame?

Some data:

library(dplyr)
library(tidyr)
library(ggplot2)
# Example data: one row per school, N observations, three indicator variables
school <- c("School A", "School B", "School C", "School D", "School E", "School F")
N <- c(25, 28, 12, 22, 30, 25)
var1 <- c(1, 0, 1, 1, 0, 1)
var2 <- c(0, 0, 0, 1, 0, 1)
var3 <- c(0, 1, 0, 1, 1, 1)
df <- tbl_df(data.frame(school, N, var1, var2, var3))
# Sort by N, then reshape to long format
df <- arrange(df, N) %>% gather(variable, value, var1:var3)
# Append N to the school name so it shows in the axis label
df$school <- paste0(df$school, " (", df$N, ")")
df <- select(df, school, variable, value)
ggplot(df, aes(variable, school)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")

Ultimately I want the order of schools to be:

School C (12)

School D (22)

School A (25)

School F (25)

School B (28)

School E (30)

As I want to do this for multiple plots I want to find a way to do this automatically and not have to re-set factor levels each time.


3 Answers

17

One way around this is to change your ggplot call to

ggplot(df, aes(variable, factor(school, levels = unique(school)))) + ... 

To avoid typing this every time, you can create a function

f <- function(x) factor(x, levels = unique(x)) 

and then call it by ggplot(df, aes(variable, f(school))) + ...

Note that this will place the first level of the factor at the bottom of the plot. If you want it at the top, you need to change f to function(x) factor(x, levels = rev(unique(x)))
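As a quick check of the helper above, here is a minimal base R sketch that applies it to the labels from the question. The names labels_by_n and f_rev are illustrative and do not appear in the original code.

```r
# Labels in the desired order (sorted by N), as built in the question.
labels_by_n <- c("School C (12)", "School D (22)", "School A (25)",
                 "School F (25)", "School B (28)", "School E (30)")

# Helper from the answer: levels follow order of first appearance.
f <- function(x) factor(x, levels = unique(x))

# Hypothetical name for the reversed variant described in the answer.
f_rev <- function(x) factor(x, levels = rev(unique(x)))

levels(factor(labels_by_n))  # default: levels are sorted
levels(f(labels_by_n))       # levels in order of first appearance
levels(f_rev(labels_by_n))   # same levels, reversed
```

Because ggplot2 draws a discrete axis in level order, choosing between f and f_rev only decides whether the first row of the data frame ends up at the bottom or at the top of the plot.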


2 Comments

How does factor(school, levels = unique(school)) change the result compared to just factor(school)? I can see that it does, but how does it link the order to the value of N? Also, factor(df$school) == factor(df$school2, levels = unique(df$school2)) returns a vector of 18 TRUE values. How can that be, and yet the ggplot results differ?
@James By default, factor() first sorts the levels (alphabetically); see the documentation, ?factor. With factor(school, levels = unique(school)), we force the levels into the order in which they appear in the vector, i.e. without sorting them first. Consider the following example, which starts with an alphabetically unsorted vector: school_rev <- c("School C", "School B", "School A"). You'll notice that the levels are ordered differently in f1 <- factor(school_rev) and f2 <- factor(school_rev, unique(school_rev)).
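The example from this comment can be run as a small base R sketch. It also suggests why an element-wise comparison can come out all TRUE while the plots differ: the labels match, but the level order (and so the underlying integer codes) does not.

```r
school_rev <- c("School C", "School B", "School A")

f1 <- factor(school_rev)                               # levels sorted
f2 <- factor(school_rev, levels = unique(school_rev))  # levels as they appear

levels(f1)  # "School A" "School B" "School C"
levels(f2)  # "School C" "School B" "School A"

# The labels are identical element by element...
all(as.character(f1) == as.character(f2))  # TRUE

# ...but the integer codes, which follow the level order, differ.
as.integer(f1)  # 3 2 1
as.integer(f2)  # 1 2 3
```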
15

Add the following forcats pipe to the code just before the call to ggplot().

library(forcats)
df$school <- fct_inorder(df$school) %>% fct_rev()

fct_inorder() creates factor levels in data frame order and fct_rev() reverses them so the plot goes in the right direction.
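To see what these two calls do to the levels, here is a small sketch on a stand-alone vector. It assumes the forcats package is installed; the vector x is illustrative, and the calls are nested rather than piped so that nothing beyond forcats is needed.

```r
library(forcats)

# Illustrative labels, already in the desired (sorted-by-N) order.
x <- c("School C (12)", "School D (22)", "School A (25)")

# Levels in order of first appearance, then reversed.
fct <- fct_rev(fct_inorder(x))

levels(fct)

# Should match the base R equivalent rev(unique(x)).
identical(levels(fct), rev(unique(x)))
```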

1 Comment

This worked well for my use case of ordering a character based on another field with the "duplicate" levels that couldn't be applied to a factor. I ended up sorting my data using arrange followed by this code. Now I can get back to this complicated plot!
2

One way would be to make the school column an ordered factor:

df$school <- reorder(df$school, rep(6:1, length.out = nrow(df)), order = TRUE)
