I recently read the advice that you should generally use the median rather than the mean, in order to reduce the influence of outliers. Example: the following item, http://www.amazon.com/Forensic-Science-Introduction-Scientific-Investigative/product-reviews/1420064932/
has 16 reviews at the moment:
    review = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 3, 2, 1, 1)
    summary(review) ## "ordinary" summary

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   3.750   5.000   4.062   5.000   5.000

Because the mean is used, it gets 4 stars, but with the median it would get 5 stars.
Isn't the median a 'fairer' judge?
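To make the outlier argument concrete, here is a small sketch using the same 16 ratings. It assumes, purely for illustration, that the two 1-star reviews are the "outliers"; the name `without.lowest` is made up for this example.

```r
review <- c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 3, 2, 1, 1)

# assumption: treat the two 1-star reviews as the outliers and drop them
without.lowest <- review[review > 1]

# the mean moves when the low ratings are removed, the median does not
comparison <- rbind(
  all.reviews    = c(mean = mean(review),         median = median(review)),
  without.lowest = c(mean = mean(without.lowest), median = median(without.lowest))
)
print(comparison)
```

The mean shifts from 4.0625 to 4.5 once the two lowest ratings are gone, while the median stays at 5. That is the sense in which the median ignores the low ratings.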
However, in the simulation below the median's mean squared error comes out larger than the mean's. Does that make the median the worse estimator?
    library(foreach)

    # the overall population of book judgments
    n <- 5
    p <- 0.5
    expected.value <- n * p
    peoplesbelieve <- rbinom(10^6, n, p)

    # 16 ratings made for 100 books
    ratings <- foreach(i = 1:100, .combine = rbind) %do% sample(peoplesbelieve, 16)
    stat <- foreach(i = 1:100, .combine = rbind) %do%
      c(mean = mean(ratings[i, ]), median = median(ratings[i, ]))

    # which mean squared error is bigger? Mean's or median's?
    meansqrterror.mean <- mean((stat[, "mean"] - expected.value)^2)
    meansqrterror.median <- mean((stat[, "median"] - expected.value)^2)
    res <- paste("mean MSE", meansqrterror.mean)
    res <- paste(res, "| median MSE", meansqrterror.median)
    print(res)
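For anyone who wants to reproduce this without the `foreach` package, the same comparison can be sketched in base R. This is a variant of the code above, not the original run: it assumes a fixed seed, draws the ratings directly with `rbinom` instead of sampling from a stored population of 10^6 values, and uses more simulated books (`n.books`, a name introduced here) so that the two error estimates are less noisy.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible

n <- 5
p <- 0.5
expected.value <- n * p

n.books   <- 10000  # assumption: more books than the 100 used above
n.ratings <- 16

# each column holds the 16 ratings of one simulated book
ratings <- matrix(rbinom(n.books * n.ratings, n, p), nrow = n.ratings)

book.mean   <- colMeans(ratings)
book.median <- apply(ratings, 2, median)

# mean squared error of each statistic around the expected value
mse <- c(mean   = mean((book.mean - expected.value)^2),
         median = mean((book.median - expected.value)^2))
print(mse)
```

For this population the sample mean has variance n*p*(1-p)/16 = 1.25/16, so its MSE should land near 0.078, which gives a quick sanity check on the simulation.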