changed tag; some formatting
gung - Reinstate Monica

I recently read the advice that you should generally use the median rather than the mean to guard against outliers. Example: the following book http://www.amazon.com/Forensic-Science-Introduction-Scientific-Investigative/product-reviews/1420064932/

has 16 reviews at the moment:

    review <- c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 3, 2, 1, 1)
    summary(review)  ## "ordinary" summary
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   3.750   5.000   4.062   5.000   5.000

Because Amazon uses the mean, the book gets 4 stars; if it used the median, the book would get 5 stars.
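The gap between the two summaries can be checked directly. The following is a minimal sketch using the `review` vector from above. Rounding to whole stars and treating the two 1-star reviews as the "outliers" are both assumptions made for illustration, and `stars.mean`, `stars.median`, and `without.lowest` are hypothetical names.

```r
review <- c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 3, 2, 1, 1)

# star rating implied by each summary (rounding to whole stars is an assumption)
stars.mean <- round(mean(review))      # mean is 65/16 = 4.0625
stars.median <- round(median(review))  # average of the 8th and 9th sorted values

print(c(mean = stars.mean, median = stars.median))

# drop the two 1-star reviews (treated as outliers here, an assumption)
without.lowest <- review[review > 1]

# the mean moves when the low ratings are removed, the median does not
print(c(mean = mean(without.lowest), median = median(without.lowest)))
```

The mean shifts when the lowest ratings are removed, while the median stays where it was, which is the sense in which the median is insensitive to a few extreme ratings.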

Isn't the median a 'fairer' judge?


A simulation experiment shows that the median's mean squared error is always bigger than the mean's. Is the median worse?

    library(foreach)

    # the overall population of book judgments
    n <- 5
    p <- 0.5
    expected.value <- n * p
    peoplesbelieve <- rbinom(10^6, n, p)

    # 16 ratings made for 100 books
    ratings <- foreach(i = 1:100, .combine = rbind) %do% sample(peoplesbelieve, 16)
    stat <- foreach(i = 1:100, .combine = rbind) %do%
      c(mean = mean(ratings[i, ]), median = median(ratings[i, ]))

    # which mean squared error is bigger? Mean's or median's?
    meansqrterror.mean <- mean((stat[, "mean"] - expected.value)^2)
    meansqrterror.median <- mean((stat[, "median"] - expected.value)^2)
    res <- paste("mean MSE", meansqrterror.mean)
    res <- paste(res, "| median MSE", meansqrterror.median)
    print(res)
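The same experiment can be sketched in base R without `foreach`, which also makes it easy to compare the simulated error of the mean with its theoretical value, Var(X)/16 = n·p·(1−p)/16 for Binomial(n, p) ratings. This is only a sketch under assumptions: it draws ratings directly from the binomial instead of sampling from a simulated population of 10^6, it uses more books than the original to reduce simulation noise, and the seed and the names `n.ratings`, `n.books`, `mse.mean`, `mse.median`, and `mse.mean.theory` are hypothetical.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n <- 5
p <- 0.5
expected.value <- n * p
n.ratings <- 16    # ratings per book, as in the experiment above
n.books <- 10000   # more books than above, to reduce simulation noise (assumption)

# one column per book, each holding 16 ratings drawn from Binomial(n, p)
ratings <- matrix(rbinom(n.ratings * n.books, n, p), nrow = n.ratings)

# mean squared error of each estimator around the true expected value
mse.mean <- mean((colMeans(ratings) - expected.value)^2)
mse.median <- mean((apply(ratings, 2, median) - expected.value)^2)

# theoretical MSE of the sample mean: Var(X) / 16 = n * p * (1 - p) / 16
mse.mean.theory <- n * p * (1 - p) / n.ratings

print(c(mean = mse.mean, median = mse.median, mean.theory = mse.mean.theory))
```

Note that this setup has no outliers at all: the ratings are symmetric around 2.5 and take only six values, which is a situation where the sample mean is expected to do well. It therefore says little about how the two estimators compare when a few extreme ratings are present.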

edited tags
Jeromy Anglim
had an error that changes the outcome
Roland Kofler

An experiment shows that there is no 'significant' difference between median and mean, as the audience argued.

    library(foreach)

    # the overall population of book judgments
    # NOTE: expected.value is set to the success probability 0.5 here, but the
    # mean of Binomial(5, 0.5) is 2.5, so the errors below are measured
    # against the wrong target
    expected.value <- 0.5
    peoplesbelieve <- rbinom(10^6, 5, expected.value)

    # 16 ratings made for 100 books
    ratings <- foreach(i = 1:100, .combine = rbind) %do% sample(peoplesbelieve, 16)
    stat <- foreach(i = 1:100, .combine = rbind) %do%
      c(mean = mean(ratings[i, ]), median = median(ratings[i, ]))

    # which mean squared error is bigger? Mean's or median's?
    meansqrterror.mean <- mean((stat[, "mean"] - expected.value)^2)
    meansqrterror.median <- mean((stat[, "median"] - expected.value)^2)
    res <- paste("mean MSE", meansqrterror.mean)
    res <- paste(res, "| median MSE", meansqrterror.median)
    print(res)

corrected code
Roland Kofler
added example code to show that both are ok
Roland Kofler
Tweeted twitter.com/#!/StackStats/status/34208782598803456
edited title
user88
Roland Kofler