I have a dataframe with four columns that looks like this:

    Beef-Low  Beef-High  Cereal-Low  Cereal-High
          90         73         107           98
          76        102          95           74
          90        118          97           56
          64        104          80          111
          86         81          98           95
          51        107          74           88
          72        100          74           82
          90         87          67           77
          95        117          89           86
          78        111          58           92

I want to make a histogram showing all four columns as different-colored bars so I tried:

    > hist(wt$Beef.Low, main="Weight Gain Across Four Diets", xlab="Weight Gain", col="coral", xlim=c(0,120), ylim=c(0,4))
    > hist(wt$Beef.High, col="coral3", add=T)
    > hist(wt$Cereal.Low, col="yellow", add=T)
    > hist(wt$Cereal.High, col="yellow3", add=T)

Which produced:

[Figure: Histogram]

I don't like the opaque bars because they mask the shapes of the overlapping histograms. I know that I can use the code found here to manually curate the colors of my histograms, but that seems like a tedious process and I feel sure that there must be a better way.

Instead, I tried to copy what was done in this question:

    > bl = wt$Beef.Low
    > bh = wt$Beef.High
    > cl = wt$Cereal.Low
    > ch = wt$Cereal.High
    > wts = rbind(bl,bh,cl,ch)
    > wtss = as.data.frame(wts)
    > ggplot(wtss, aes("Weight", fill="Diet")) + geom_histogram(alpha=0.5, aes(y = "Frequency"), position="identity")

But it doesn't work and I don't understand the ggplot commands well enough to even have a clue as to why. Please help.

  • for ggplot, it would be easier to melt the data first, like in this answer. Commented Nov 17, 2014 at 21:51
  • @rawr: I don't understand what melt does. It said it creates a "molten dataframe", but I don't know what that means. Commented Nov 17, 2014 at 22:03

1 Answer

I'd be inclined to do this with facets. Otherwise, with your dataset, the result is incomprehensible.

    library(reshape2)
    library(ggplot2)
    gg <- melt(wt)
    ggplot(gg, aes(x=value, fill=variable)) +
      geom_histogram(binwidth=10) +
      facet_grid(variable~.)
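If a single overlaid panel is still wanted, as in the question, semi-transparent fills are one option. The following is only a sketch in base graphics, not part of the solution above: it assumes the question's data with column names `Beef.Low`, `Beef.High`, `Cereal.Low` and `Cereal.High`, reuses the question's colours, and picks 10-unit bins and 50% transparency arbitrarily. `breaks`, `cols`, `counts` and `ymax` are illustrative names.

```r
# Data copied from the question; the column names are an assumption
# about how R imported the "Beef-Low" etc. headers.
wt <- data.frame(
  Beef.Low    = c(90, 76, 90, 64, 86, 51, 72, 90, 95, 78),
  Beef.High   = c(73, 102, 118, 104, 81, 107, 100, 87, 117, 111),
  Cereal.Low  = c(107, 95, 97, 80, 98, 74, 74, 67, 89, 58),
  Cereal.High = c(98, 74, 56, 111, 95, 88, 82, 77, 86, 92)
)

# Shared breaks so all four histograms use the same bins.
breaks <- seq(50, 120, by = 10)

# The question's colours, made 50% transparent (alpha.f chosen arbitrarily).
cols <- adjustcolor(c("coral", "coral3", "yellow", "yellow3"), alpha.f = 0.5)

# Bin counts computed without plotting, used only to size the y-axis.
counts <- lapply(wt, function(x) hist(x, breaks = breaks, plot = FALSE)$counts)
ymax <- max(unlist(counts))

# Draw the first histogram, then add the remaining ones on top.
hist(wt[[1]], breaks = breaks, col = cols[1], ylim = c(0, ymax),
     main = "Weight Gain Across Four Diets", xlab = "Weight Gain")
for (i in 2:ncol(wt)) {
  hist(wt[[i]], breaks = breaks, col = cols[i], add = TRUE)
}
legend("topleft", legend = names(wt), fill = cols)
```

With only ten observations per diet, four overlapping layers can still be hard to read, which is the reason for preferring facets.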

EDIT: Response to OP's comment.

melt(...) converts a data frame from "wide" format - data in different columns - to "long" format - all the data in one column, with a second column distinguishing between the different types of data (e.g., identifying which column the data in the row came from).

If you use melt(...) with the defaults, as above, it creates a data frame with two columns: $value contains the actual data, and $variable contains the name of the column (in the starting data frame) that each value came from. Compare wt and gg and I think you'll see what I mean.

So here we use value for the x-axis, and group the data based on variable.
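To make the wide-to-long idea concrete, here is a rough base-R sketch of the shape that melt(wt) produces. It does not call melt itself. It assumes the question's column names and uses only the first three rows of the data to keep the output short; `long` is an illustrative name.

```r
# "Wide" format: one column per diet (first three rows of the question's data).
wt <- data.frame(
  Beef.Low    = c(90, 76, 90),
  Beef.High   = c(73, 102, 118),
  Cereal.Low  = c(107, 95, 97),
  Cereal.High = c(98, 74, 56)
)

# "Long" format: one row per original cell.
# variable records which column the value came from; value holds the number.
long <- data.frame(
  variable = factor(rep(names(wt), each = nrow(wt)), levels = names(wt)),
  value    = unlist(wt, use.names = FALSE)
)

print(long)
```

The 3 x 4 wide table becomes 12 rows in two columns, so a single `value` column can go on the x-axis while `variable` drives the fill and the facets.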

2 Comments

This worked beautifully and is an elegant solution to my problem. I still don't understand what melt does or what a molten dataframe is though.
@Slavatron it looks like it makes the df into a single vector to easily hist it
