what's the correct way to reference dataset variables inside ggplot scale functions?

Question

My question is similar to this one (I want to use arbitrary text for the labels on an axis), but instead of hardcoding values into the ggplot functions, I want to supply them by referencing a variable that exists in the source dataset.

The solution I've been using is to wrap all the ggplot code inside curly brackets, pipe the source dataset into it, and reference the variables with .$:

    library(tidyverse)

    tribble(
      ~description, ~y, ~x,
      "apples", 3.4, 1.1,
      "oranges", 5.6, 2.4,
      "mangos", 2.3, 4.8
    ) %>%
      {ggplot(data = ., aes(y = y, x = x)) +
        scale_x_continuous(
          breaks = .$x,
          labels = .$description
        ) +
        geom_point() +
        geom_line()}

This works but feels like a workaround. Is there a canonical/correct/cleaner/better way to do this? I've been trying to find an answer in the documentation but I'm having trouble finding the right keywords to describe this situation.

(The plot is nonsense, I know.)

Your approach is the best I know of and what I use in my day-to-day work. I'm interested to see if there are better approaches. Something else you might find useful is the magrittr tee pipe (%T>%) if you want to do something with that data after you plot.
– Ian Campbell, Commented Jun 16, 2021 at 18:15
Yeah. Usually what I do is nest() the data and then build the plot inside mutate() + pmap() (I often need to repeat similar plots across multiple subpopulations, variables, etc., when faceting isn't desirable). This has the advantage of saving the plot for later use with ggsave(), etc.
– lost, Commented Jun 25, 2021 at 2:37
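
For readers unfamiliar with the tee pipe mentioned in the first comment, here is a minimal sketch of how it might be combined with the curly-bracket approach from the question. It assumes magrittr is attached explicitly (library(tidyverse) does not attach %T>%), and the summarise() step and the name summary_tbl are purely illustrative. Because %T>% discards the value of its right-hand side, the plot has to be printed explicitly inside the braces.

```r
library(tidyverse)
library(magrittr)  # %T>% is not attached by library(tidyverse)

summary_tbl <- tribble(
  ~description, ~y, ~x,
  "apples", 3.4, 1.1,
  "oranges", 5.6, 2.4,
  "mangos", 2.3, 4.8
) %T>%
  # Side effect only: draw the plot, then pass the original data onwards.
  {print(
    ggplot(data = ., aes(y = y, x = x)) +
      scale_x_continuous(breaks = .$x, labels = .$description) +
      geom_point() +
      geom_line()
  )} %>%
  # Illustrative follow-up step on the same data (an assumption, not from the thread).
  summarise(mean_y = mean(y))
```

The data flows past the plotting step unchanged, so anything after the tee pipe sees the original tibble rather than the ggplot object.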

Limey · Accepted Answer · 2021-06-15 07:05:07Z

An interesting question. +1

I don't have a perfect answer yet, but I can offer something of a simplification if you're prepared to give up the pipe between tibble creation and plotting.

    d <- tribble(
      ~description, ~y, ~x,
      "apples", 3.4, 1.1,
      "oranges", 5.6, 2.4,
      "mangos", 2.3, 4.8
    )

    d %>%
      ggplot(aes(y = y, x = x)) +
      scale_x_continuous(breaks = d$x, labels = d$description) +
      geom_point() +
      geom_line()

If the piping from tibble creation is important, you could wrap the plot creation in a function:

    myPlot <- function(data, labels, breaks) {
      bVar <- enquo(breaks)
      lVar <- enquo(labels)
      data %>%
        ggplot(aes(y = y, x = x)) +
        scale_x_continuous(
          breaks = data %>% pull(!! bVar),
          labels = data %>% pull(!! lVar)
        ) +
        geom_point() +
        geom_line()
    }

    tribble(
      ~description, ~y, ~x,
      "apples", 3.4, 1.1,
      "oranges", 5.6, 2.4,
      "mangos", 2.3, 4.8
    ) %>%
      myPlot(description, x)

This approach does at least honour the tidyverse's use of NSE and so fits naturally into magrittr's piping framework, but it would be good to avoid having to use a custom function. I haven't figured out how to refer back to the origin of the "%>% pipe" from within the ggplot "+ pipe".

You could extend the myPlot function to handle arbitrary x and y variables in the obvious manner.
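
One way that extension might look is sketched below. The function name myPlot2 and the arguments xVar and yVar are hypothetical, and the sketch swaps enquo()/!! for the embrace operator {{ }}, which assumes rlang 0.4.0 or later and ggplot2 3.0.0 or later.

```r
library(tidyverse)

# Hypothetical generalisation of myPlot(): xVar and yVar are assumed names,
# not part of the original answer.
myPlot2 <- function(data, xVar, yVar, labels, breaks) {
  ggplot(data, aes(x = {{ xVar }}, y = {{ yVar }})) +
    scale_x_continuous(
      # pull() extracts the referenced columns as plain vectors
      breaks = pull(data, {{ breaks }}),
      labels = pull(data, {{ labels }})
    ) +
    geom_point() +
    geom_line()
}

p <- tribble(
  ~description, ~y, ~x,
  "apples", 3.4, 1.1,
  "oranges", 5.6, 2.4,
  "mangos", 2.3, 4.8
) %>%
  myPlot2(x, y, labels = description, breaks = x)
```

Naming the labels and breaks arguments at the call site avoids relying on their positional order, which is easy to get backwards.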

Thanks for the response. Yeah, I definitely want to avoid making a bespoke function. Ends up cluttering the code and the workspace.
