
Background: Point-biserial correlation is used to measure the relationship between a binary variable, x, and a continuous variable, y.

Methods: I use the cor.test() function to calculate R and p-value:

    # the two vectors
    x <- mtcars$am
    y <- mtcars$mpg

    # calculate point-biserial correlation
    cor_result <- cor.test(x, y)
    cor_result$p.value
    cor_result$estimate
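
As a cross-check on what cor.test() reports here, the point-biserial coefficient can also be computed by hand from the two group means. This is only a sketch: the names n0, n1, r_pb, p_cor and p_t are introduced for illustration. The same comparison can be run as a pooled-variance two-sample t-test, which should give the same p-value.

```r
# Sketch: point-biserial r computed by hand from the group means.
x <- mtcars$am   # binary: 0 = automatic, 1 = manual
y <- mtcars$mpg  # continuous

n1 <- sum(x == 1)
n0 <- sum(x == 0)
n  <- length(x)

# (M1 - M0) / s_y * sqrt(n1 * n0 / (n * (n - 1))), with s_y the sample SD
r_pb <- (mean(y[x == 1]) - mean(y[x == 0])) / sd(y) *
  sqrt(n1 * n0 / (n * (n - 1)))

# Should agree with the Pearson estimate returned by cor.test()
all.equal(r_pb, unname(cor.test(x, y)$estimate))

# The p-value should agree with a pooled-variance two-sample t-test
p_cor <- cor.test(x, y)$p.value
p_t   <- t.test(y ~ x, var.equal = TRUE)$p.value
all.equal(p_cor, p_t)
```

Because the two approaches are equivalent, a plot that compares the two groups directly conveys the same information as the correlation coefficient.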

Then I use ggplot2 to plot it this way; the numbers inside the points denote the number of cylinders:

    library(see)      # theme_modern()
    library(dplyr)
    library(ggplot2)

    # plot
    mtcars %>%
      mutate(am = factor(am)) %>%
      mutate(id = row_number()) %>%
      ggplot(aes(x = id, y = mpg, color = am, label = cyl)) +
      geom_point(size = 8, alpha = 0.5) +
      geom_text(color = "black", hjust = 0.5, vjust = 0.5) +
      scale_color_manual(values = c("steelblue", "purple"), labels = c("No", "Yes")) +
      scale_x_continuous(breaks = 1:32, labels = 1:32) +
      scale_y_continuous(breaks = scales::pretty_breaks()) +
      geom_text(
        aes(x = 10, y = 30,
            label = ifelse(am == 0, "R = 0.5998324, p = 0.0002850207", "")),
        color = "black", size = 4
      ) +
      facet_wrap(. ~ am, nrow = 1, strip.position = "bottom") +
      labs(y = "mpg", color = "Automatic vs Manual transmission") +
      theme_modern() +
      theme(
        aspect.ratio = 2,
        strip.background = element_blank(),
        strip.placement = "outside",
        legend.position = "bottom",
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        text = element_text(size = 16)
      )


My question: Would you consider this an appropriate figure to show the correlation between am and mpg? Could you give me a hint to improve this plot?

    Why do you use different colors for am values if you already make facets based on am? (or vice-versa) Commented Apr 25, 2023 at 7:48

1 Answer


I don't like your plot because the jitter in the x-direction brings no useful information (the points are ordered by row id).

You've added a third dimension to the plot (number of cylinders cyl in addition to transmission am and miles per gallon mpg). I'll ignore this third dimension because the question asks how to show the association between am and mpg.

Since am takes only two values (0 = automatic, 1 = manual), this boils down to visually comparing two groups. With larger sample sizes my default choice for this kind of comparison is overlapping histograms (example).
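
For reference, overlapping histograms could be sketched in base R roughly as follows. The shared breaks, the colors, and the variable names (brks, h_auto, h_manual) are my own choices for illustration.

```r
# Sketch: two semi-transparent histograms on shared breaks.
brks <- seq(10, 35, by = 5)  # assumed bin edges; they cover the range of mpg

mpg_auto   <- mtcars$mpg[mtcars$am == 0]
mpg_manual <- mtcars$mpg[mtcars$am == 1]

# Compute the bin counts first so both panels can share one y-axis range
h_auto   <- hist(mpg_auto,   breaks = brks, plot = FALSE)
h_manual <- hist(mpg_manual, breaks = brks, plot = FALSE)

col_auto   <- adjustcolor("steelblue", alpha.f = 0.5)
col_manual <- adjustcolor("purple",    alpha.f = 0.5)

plot(h_auto,
  col = col_auto,
  ylim = c(0, max(h_auto$counts, h_manual$counts)),
  xlab = "mpg",
  main = "Automatic (0) vs Manual (1) transmission"
)
plot(h_manual, col = col_manual, add = TRUE)
legend("topright",
  legend = c("Automatic (0)", "Manual (1)"),
  fill = c(col_auto, col_manual)
)
```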

Here histograms don't work well because there are only a few observations per transmission group. In this case I prefer a stacked strip chart.

    attach(mtcars)
    stripchart(mpg ~ am,
      method = "stack",
      main = "Automatic (0) vs Manual (1) transmission"
    )
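
To tie the strip chart back to the correlation, one option is to mark the group means on it, since the point-biserial coefficient depends on how far apart those means are relative to the spread. A minimal sketch, assuming the default horizontal layout of stripchart(); group_means is a name I made up for the example.

```r
# Sketch: stacked strip chart with the group means overlaid.
stripchart(mpg ~ am,
  data = mtcars,
  method = "stack",
  main = "Automatic (0) vs Manual (1) transmission"
)

# Mean mpg per transmission group, named "0" and "1"
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# In the horizontal layout the groups sit at y = 1, 2, so the means
# go on the x-axis and the group index on the y-axis
points(group_means, seq_along(group_means), pch = 18, col = "red", cex = 2)
```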

If you'd like more color in your strip charts, another option is a beeswarm plot.

This type of graph arranges the data so that each point is visible, but it doesn't jitter them randomly; the exact positions are calculated so that the points don't overlap yet are packed closely. (There are various algorithms for this purpose.)

In this case the difference between the strip chart and the beeswarm plot is hard to notice, as there are so few points to plot. For fun I've colored the points according to cyl (cylinders).

    library("beeswarm")
    beeswarm(mpg ~ am,
      pch = 15, pwcol = cyl,
      main = "Automatic (0) vs Manual (1) transmission",
      horizontal = TRUE
    )
    legend("topright",
      title = "cylinders",
      legend = c(8, 6, 4),
      col = c("#9E9E9E", "#CD0BBC", "#2297E6"),
      pch = 15
    )

Created on 2023-04-25 with reprex v2.0.2


1 Comment

Thank you for your reply. I will go through it as soon as I'm on my desktop.
