76

I have the following problem: I would like to visualize a discrete and a continuous variable on a boxplot in which the latter has a few extremely high values. This makes the boxplot meaningless (the points and even the "body" of the chart are too small), which is why I would like to show it on a log10 scale. I am aware that I could leave the extreme values out of the visualization, but I do not want to.

Let's see a simple example with diamonds data:

m <- ggplot(diamonds, aes(y = price, x = color)) 

alt text

The problem is not serious here, but I hope you can imagine why I would like to see the values on a log10 scale. Let's try it:

m + geom_boxplot() + coord_trans(y = "log10") 

alt text

As you can see the y axis is log10 scaled and looks fine but there is a problem with the x axis, which makes the plot very strange.

The problem does not occur with scale_y_log10, but this is not an option for me, as I cannot use a custom formatter this way. E.g.:

m + geom_boxplot() + scale_y_log10() 

alt text

My question: does anyone know a way to draw the boxplot with a log10 scale on the y axis whose labels can be freely formatted with a formatter function, like in this thread?


Editing the question to help answerers based on answers and comments:

What I am really after: one log10-transformed axis (y) with non-scientific labels. I would like to label it like dollar (formatter=dollar) or any custom format.

If I try @hadley's suggestion I get the following warnings:

> m + geom_boxplot() + scale_y_log10(formatter=dollar)
Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

With unchanged y-axis labels:

alt text

6
  • 3
    That's a bug in coord_trans - but you can specify custom labels to scale_y_log10... Commented Jan 15, 2011 at 14:34
  • Thank you @hadley, I must be missing something, but e.g. + scale_y_continuous(formatter=dollar) just does not work. I cannot see the result of any formatter given, and I also get three In max(x) : no non-missing arguments to max; returning -Inf warning messages. Commented Jan 15, 2011 at 16:33
  • @daroczig: The examples I have seen for the formatter argument have all involved quoted names, so perhaps formatter="dollar"? Commented Jan 15, 2011 at 16:56
  • @DWin: I tried with quotes also, but the result is exactly the same. Commented Jan 15, 2011 at 17:00
  • 2
    Formatter doesn't work (yet) but you can still set the labels manually... Commented Jan 15, 2011 at 17:42
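
The "set the labels manually" suggestion in the comments above could look roughly like the sketch below. The break positions and label strings are illustrative values chosen here, not taken from the thread.

```r
library(ggplot2)

# Hand-picked break positions and matching labels (illustrative values only).
brks <- c(500, 1000, 2000, 5000, 10000)
labs <- c("$500", "$1,000", "$2,000", "$5,000", "$10,000")

p <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  # The y axis is log10-scaled; ticks are drawn only at the listed breaks.
  scale_y_log10(breaks = brks, labels = labs)
p
```

The two vectors must have the same length, since each label is attached to the break at the same position.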

4 Answers 4

60

The simplest is to just give the 'trans' (formerly 'formatter') argument of either the scale_x_continuous or the scale_y_continuous the name of the desired log function:

library(ggplot2) # which formerly required pkg:plyr
m + geom_boxplot() + scale_y_continuous(trans='log10')

EDIT: Or if you don't like that, then either of these appears to give different but useful results:

m <- ggplot(diamonds, aes(y = price, x = color), log="y")
m + geom_boxplot()
m <- ggplot(diamonds, aes(y = price, x = color), log10="y")
m + geom_boxplot()

EDIT2 & 3: Further experiments (after discarding the one that attempted successfully to put "$" signs in front of logged values):

# Need a function that accepts an x argument
# wrap desired formatting around numeric result
fmtExpLg10 <- function(x) paste(plyr::round_any(10^x/1000, 0.01) , "K $", sep="")
# the label-formatting function goes in 'labels'; 'trans' expects a transformation
ggplot(diamonds, aes(color, log10(price))) +
  geom_boxplot() +
  scale_y_continuous("Price, log10-scaling", labels = fmtExpLg10)

alt text

Note added mid 2017 in comment about package syntax change:

scale_y_continuous(formatter = 'log10') is now scale_y_continuous(trans = 'log10') (ggplot2 v2.2.1)
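
For readers on a recent ggplot2, here is a minimal sketch of what the question asks for: a log10-transformed y axis with dollar-style labels. It assumes the scales package is installed and that the `labels` argument accepts a function, as a later comment in this thread suggests; `scale_y_log10()` is used as shorthand for the log10 transformation.

```r
library(ggplot2)
library(scales)

# Log10-transform the y axis, but print tick labels as dollar amounts
# on the original (untransformed) price scale.
p <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  scale_y_log10(labels = dollar)
p
```

Because the scale itself performs the transformation, the data stay in their original units and no back-transformation of the labels is needed.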


13 Comments

Thank you @DWin, but this is not what I was looking for. This way the y-axis labels are converted to log10, but the axis itself is not transformed. What I would like to get: one transformed axis (y) with non-scientific labels.
@daroczig: The "successful experiment" with "dollarizing" used fmtLg10dlr <- function(x) dollar(log10(x)); m + geom_boxplot() + scale_y_continuous(formatter='fmtLg10dlr') , but it just looks "wrong" to me.
I suspect you're trying to do something like ggplot(diamonds, aes(color, log10(price))) + geom_boxplot() + scale_y_continuous(formatter = function(x) format(10 ^ x)) - you need to transform the data and back-transform the labels.
Another similar solution, using sprintf: fmtdol<- function(x)sprintf('$%sK',x/1000)
21

I had a similar problem and this scale worked for me like a charm:

library(scales) # provides comma()
breaks = 10**(1:10)
scale_y_log10(breaks = breaks, labels = comma(breaks))

If you want the intermediate levels too (e.g. 10^3.5), you need to tweak the formatting:

breaks = 10**(1:10 * 0.5)
m <- ggplot(diamonds, aes(y = price, x = color)) + geom_boxplot()
m + scale_y_log10(breaks = breaks, labels = comma(breaks, digits = 1))

After executing:

enter image description here
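
A variation on the code above, as a sketch only: instead of pre-formatting the break vector, a function can be passed to `labels` so that ggplot2 formats whichever breaks it actually draws. The rounding wrapper and the break range are assumptions made here for illustration; `comma()` comes from the scales package.

```r
library(ggplot2)
library(scales)

# Half-decade breaks: 100, ~316, 1000, ~3162, 10000, ...
breaks <- 10^seq(2, 5, by = 0.5)

p <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  # Round first so that intermediate breaks print as whole numbers.
  scale_y_log10(breaks = breaks, labels = function(x) comma(round(x)))
p
```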

2 Comments

I just noticed this very similar problem has the same solution.
Thank you for drawing my attention to this alternative solution, which would be complete with the simple dollar formatter or a custom one: + scale_y_log10(breaks = breaks, labels = dollar(breaks))
11

Another solution using scale_y_log10 with trans_breaks, trans_format and annotation_logticks()

library(ggplot2)
m <- ggplot(diamonds, aes(y = price, x = color))
m + geom_boxplot() +
  scale_y_log10(
    breaks = scales::trans_breaks("log10", function(x) 10^x),
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  ) +
  theme_bw() +
  annotation_logticks(sides = 'lr') +
  theme(panel.grid.minor = element_blank())

2 Comments

Very elegant output
In 2020, this is the first answer that copies, pastes n' works. (Yes, I tried them all.) Thanks!
2

I think I got it at last by doing some manual transformations with the data before visualization:

d <- diamonds
# computing logarithm of prices
d$price <- log10(d$price)

And work out a formatter to later compute 'back' the logarithmic data:

formatBack <- function(x) 10^x
# or with special formatter (here: "dollar")
formatBack <- function(x) paste(round(10^x, 2), "$", sep=' ')

And draw the plot with given formatter:

m <- ggplot(d, aes(y = price, x = color))
m + geom_boxplot() + scale_y_continuous(formatter='formatBack')

alt text
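
The same idea for a current ggplot2 can be sketched as follows, assuming (as a comment on this answer suggests) that the old `formatter` argument has been replaced by `labels`, which takes the function itself rather than its name as a string:

```r
library(ggplot2)

d <- diamonds
d$price <- log10(d$price)  # plot the log10 of the prices

# Turn a log10 axis value back into a price and append a dollar sign.
formatBack <- function(x) paste(round(10^x, 2), "$", sep = " ")

p <- ggplot(d, aes(x = color, y = price)) +
  geom_boxplot() +
  scale_y_continuous(labels = formatBack)
p
```

For example, an axis position of 3 is labelled "1000 $", because 10^3 is 1000.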

Sorry to bother the community with a question I could have solved myself! The funny part is: I was working hard to make this plot work a month ago but did not succeed. After asking here, I got it.

Anyway, thanks to @DWin for motivation!

1 Comment

I think formatter now changed to labels => stackoverflow.com/questions/10146109/…
