Revisions to Is median fairer than mean?

changed tag; some formatting

edited Jul 25, 2020 at 11:57

I recently read the advice that you should generally use the median, not the mean, to limit the influence of outliers. Example: the following item, http://www.amazon.com/Forensic-Science-Introduction-Scientific-Investigative/product-reviews/1420064932/, has 16 reviews at the moment:

    review = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 3, 2, 1, 1)
    summary(review)  ## "ordinary" summary

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   3.750   5.000   4.062   5.000   5.000

Because they use the mean, the item gets 4 stars, but if they used the median it would get 5 stars.
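To make the 4-versus-5 comparison concrete, here is a minimal sketch in base R. It assumes the displayed star count is the statistic rounded to the nearest whole star (the site's actual rule is not stated here), and `without_low` is a hypothetical name for the ratings with the two 1-star reviews dropped.

```r
review <- c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 3, 2, 1, 1)

# Assumption: stars shown = statistic rounded to the nearest whole star
stars_mean   <- round(mean(review))    # mean is 65/16 = 4.0625
stars_median <- round(median(review))  # 8th and 9th sorted values are both 5

# Hypothetical check: drop the two 1-star reviews and see which statistic moves
without_low <- review[review > 1]
shift_mean   <- mean(without_low) - mean(review)      # 63/14 - 65/16
shift_median <- median(without_low) - median(review)  # median stays at 5

print(c(stars_mean = stars_mean, stars_median = stars_median))
print(c(shift_mean = shift_mean, shift_median = shift_median))
```

The mean shifts by almost half a star when the two lowest ratings are removed, while the median does not move at all, which is the sense in which the median ignores those ratings.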

Isn't the median a 'fairer' judge?

An experiment shows that the median's mean squared error is consistently bigger than the mean's. Is the median worse?

    library(foreach)

    # the overall population of bookjudgments
    n <- 5
    p <- 0.5
    expected.value <- n*p
    peoplesbelieve <- rbinom(10^6, n, p)

    # 16 ratings made for 100 books
    ratings <- foreach(i=1:100, .combine=rbind) %do% sample(peoplesbelieve, 16)
    stat <- foreach(i=1:100, .combine=rbind) %do% c(mean=mean(ratings[i,]), median=median(ratings[i,]))

    # which mean square error is bigger? Mean's or Median's?
    meansqrterror.mean <- mean((stat[,"mean"]-expected.value)^2)
    meansqrterror.median <- mean((stat[,"median"]-expected.value)^2)
    res <- paste("mean MSE", meansqrterror.mean)
    res <- paste(res, "| median MSE", meansqrterror.median)
    print(res)
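With only 100 simulated books the two MSE estimates are fairly noisy. The following is a rough base-R sketch of the same experiment with more replicates, not the original code: the seed, the 10,000 books, and the names `n.books`, `n.ratings`, `mse` and `theory.mean` are assumptions, and it draws the 16 ratings directly from the binomial instead of sampling from a stored population of 10^6 draws.

```r
set.seed(42)               # assumed seed, only for reproducibility

n <- 5
p <- 0.5
expected.value <- n * p    # 2.5, the true mean rating
n.books   <- 10000         # assumption: many more books than the original 100
n.ratings <- 16

# One column per book: the mean and the median of its 16 ratings
stat <- replicate(n.books, {
  ratings <- rbinom(n.ratings, n, p)
  c(mean = mean(ratings), median = median(ratings))
})

# Mean squared error of each estimator around the true expected value
mse <- rowMeans((stat - expected.value)^2)

# The sample mean is unbiased, so its MSE equals its variance n*p*(1-p)/16
theory.mean <- n * p * (1 - p) / n.ratings

print(mse)
print(theory.mean)
```

For this symmetric, light-tailed population the simulated MSE of the mean should land close to `theory.mean`, and the median's MSE should come out clearly larger. That says the median is the noisier estimator for this particular population, not that it is worse for every rating distribution.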

edited tags

edited Feb 12, 2011 at 5:48

Jeromy Anglim

mean median average

had an error that changes the outcome

edited Feb 9, 2011 at 21:37

Roland Kofler

I recently read the advice that you should generally use the median, not the mean, to limit the influence of outliers. Example: the following item, http://www.amazon.com/Forensic-Science-Introduction-Scientific-Investigative/product-reviews/1420064932/, has 16 reviews at the moment:

    review = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 3, 2, 1, 1)
    summary(review)  ## "ordinary" summary

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   3.750   5.000   4.062   5.000   5.000

Because they use the mean, the item gets 4 stars, but if they used the median it would get 5 stars.

Isn't the median a 'fairer' judge?

An experiment shows that there is no 'significant' difference between the median and the mean, as the audience argued.

    library(foreach)

    # the overall population of bookjudgments
    # NOTE: 0.5 is the success probability, but it is also used below as the
    # target of the MSE; the expected rating is 5 * 0.5 = 2.5
    expected.value <- 0.5
    peoplesbelieve <- rbinom(10^6, 5, expected.value)

    # 16 ratings made for 100 books
    ratings <- foreach(i=1:100, .combine=rbind) %do% sample(peoplesbelieve, 16)
    stat <- foreach(i=1:100, .combine=rbind) %do% c(mean=mean(ratings[i,]), median=median(ratings[i,]))

    # which mean square error is bigger? Mean's or Median's?
    meansqrterror.mean <- mean((stat[,"mean"]-expected.value)^2)
    meansqrterror.median <- mean((stat[,"median"]-expected.value)^2)
    res <- paste("mean MSE", meansqrterror.mean)
    res <- paste(res, "| median MSE", meansqrterror.median)
    print(res)
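A possible reading of why this version showed no 'significant' difference: the errors are measured against 0.5 while the ratings average 2.5, so both MSEs carry the same squared offset of (2.5 - 0.5)^2 = 4, which swamps the much smaller variance terms. The base-R sketch below illustrates that under stated assumptions; the seed, the 10,000 replicates, and the names `wrong.target`, `true.target`, `est` and `mse` are not from the original.

```r
set.seed(1)                # assumed seed, only for reproducibility

n <- 5
p <- 0.5
wrong.target <- p          # 0.5: the value this version compares against
true.target  <- n * p      # 2.5: the actual expected rating

# One column per simulated book: mean and median of 16 ratings
est <- replicate(10000, {
  x <- rbinom(16, n, p)
  c(mean = mean(x), median = median(x))
})

# MSE of each estimator (one row each) around a given target value
mse <- function(v, target) mean((v - target)^2)
mse.wrong <- apply(est, 1, mse, target = wrong.target)
mse.true  <- apply(est, 1, mse, target = true.target)

print(rbind(wrong.target = mse.wrong, true.target = mse.true))
```

Against the wrong target both values sit a little above 4 and differ by only a few percent; against 2.5 the shared offset disappears and the median's MSE is visibly larger than the mean's.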
