0

I am using R to create size frequency histograms for diseased and healthy individuals with fitted normal distribution lines. I have 2 issues that I'm seeking advice on.

  1. How do I create a histogram from aggregated data? The example table below has the summarized number of diseased and healthy individuals within each size.

dput(data)

'structure(list(Size = c(25L, 28L, 31L, 45L, 60L), diseased = c(0L, 22L, 10L, 5L, 2L), healthy = c(55L, 40L, 15L, 7L, 2L)), .Names = c("Size", "diseased", "healthy"), class = "data.frame", row.names = c(NA, -5L))' 

2.How do I overlay both histograms into 1 figure with fitted normal distribution lines.

I have tried the following code for aggregated data ggplot(data,aes(x=Size,y=diseased))+geom_bar(stat='identity'), which works well, but I can't figure out how to add the histogram for the healthy individuals.

I have also tried using the following text to revert the summarized data (called "data") to the original raw format: raw <- data[rep(1:data, times=data$diseased), "Size", drop=FALSE]

I get the following error message: Error in rep(1:data, times=data$diseased) : invalid 'times' argument. From previous comments, it appears that the rep function can't handle "0"

6
  • 1
    Does this post help? stackoverflow.com/questions/3541713/… or maybe better this: stackoverflow.com/questions/6957549/… Commented Feb 25, 2013 at 17:15
  • Can you make your data reproducible by showing the result of dput(data) or dput(head(data))? Also, how can your columns have different numbers of rows? Commented Feb 25, 2013 at 17:16
  • @DavidRobinson, I believe it is an exact duplicate of the post linked by buildakicker. Commented Feb 25, 2013 at 17:21
  • @Arun: It's possible, though I'm a bit unsure without more clarification from the OP. Commented Feb 25, 2013 at 17:23
  • yes, possibly. number of values in Size don't match that of disease and healthy. And the OP has given a data.frame as input in ggplot but taken the trouble to provide separate data... Commented Feb 25, 2013 at 17:25

1 Answer 1

0

So, I'm in a hurry and I kind of hacked together the normal curve, but you can use this to plot two "histogram-style" plots on top of each other.

It would be easier to get the curves if we had the full data set and not just summaries, of course. I kind of fudged them together, but I think it's enough to get the general idea here.

I'm not totally clear on why you would want to do this, but you can...

library(SDMTools) # Use this to get weighted means testdata <- structure(list(Size=c(25L, 28L, 31L, 45L, 60L), diseased=c(0L, 22L, 10L, 5L, 2L), healthy=c(55L, 40L, 15L, 7L, 2L)), .Names = c("Size", "diseased", "healthy"), class = "data.frame", row.names = c(NA, -5L)) barplot(testdata$healthy, names.arg=paste(" ",testdata$Size), col="light blue", border="blue", xlim=c(0,6), ylim=c(0,70), width=0.5, space=1) par(new=TRUE) barplot(testdata$diseased, col="pink", border="red", xlim=c(0,6), ylim=c(0,70), width=0.5, space=c(2,1,1,1,1)) healthy_mean <- wt.mean(x=testdata$healthy,wt=testdata$Size) healthy_sd <- wt.sd(x=testdata$healthy,wt=testdata$Size) diseased_mean <- wt.mean(x=testdata$diseased,wt=testdata$Size) diseased_sd <- wt.sd(x=testdata$diseased,wt=testdata$Size) yfit_healthy <- as.data.frame(dnorm(0:max(testdata$healthy), mean=healthy_mean,sd=healthy_sd)) names(yfit_healthy) <- "y" yfit_diseased <- as.data.frame(dnorm(0:max(testdata$diseased), mean=diseased_mean,sd=diseased_sd)) names(yfit_diseased) <- "y" yfit_healthy$x <- seq(0,6,length.out=length(yfit_healthy$y)) yfit_diseased$x <- seq(0,6,length.out=length(yfit_diseased$y)) lines(yfit_healthy$x, (max(testdata$healthy)*yfit_healthy$y)/max(yfit_healthy$y), col="blue",lwd=2) lines(yfit_diseased$x, (max(testdata$diseased)*yfit_diseased$y)/max(yfit_diseased$y), col="red",lwd=2) 

This code gets me:

Not Exactly My Finest Graph

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.