For sample size, there are usually four components, and knowing any three lets you derive the fourth:
- Effect size (such as Cohen's $d$, i.e., the mean difference divided by the pooled standard deviation).
- Sample size per group.
- Alpha (usually $\alpha = .05$ by convention, but it can technically be anything).
- Power (also $.80$ by convention, but it doesn't have to be).
So if you simply want to estimate the sample size per group for a basic group comparison, you fix the other three components in order to determine how many people you need. Normally we leave the alpha and power cutoffs where they are, and the sample size is the unknown we are solving for, so the only real toggle is the effect size, which you can vary based on your assumptions about the mean group difference and its standard deviation. Ideally this estimate should be based on what the past literature leads you to expect; otherwise it falls back on conventional cutoffs (a good recent review of what "small", "medium", and "strong" effects approximate can be found here).
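To make the three-in, one-out relationship concrete, here is a minimal sketch using statsmodels' power calculator (the specific library is my choice; any power calculator works the same way). Fixing a "medium" effect of $d = 0.5$, $\alpha = .05$, and power $= .80$, we solve for the per-group sample size:

```python
# Sketch: solve for the fourth component of a power analysis given the other three.
# Uses statsmodels; the d = 0.5 / alpha = .05 / power = .80 inputs are the
# conventional defaults discussed above, not values from any particular study.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Leave out nobs1 (sample size per group) and solve for it.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # -> 64 per group
```

Passing a known `nobs1` instead and leaving `power=None` would run the calculation in the other direction, recovering the achieved power for a given sample size.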
For a two-sample test, this is easy to calculate with a basic power calculator (see Chapter 2 of Cohen's canonical text on power, referenced below, for how t-test power is calculated). While Likert scale items are notorious for being non-normally distributed, their composites can be more normal, and t-tests tend to be robust to non-normality with large enough samples, particularly the Welch $t$-test, which is robust to both heterogeneity of variance and normality issues (Delacre et al., 2017). Some, like Brysbaert, recommend a minimum of $n > 100$ per group, but that is based on his assumption of weak effects in psychology (Brysbaert, 2019, p. 8). So the effect size you assume largely determines how many people you need.
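For reference, running the Welch test itself is a one-liner in scipy. A minimal sketch, with simulated composite scores standing in for real data (the means, SDs, and group sizes below are purely illustrative):

```python
# Sketch: Welch's t-test on two composite scores via scipy.
# All data here are simulated for illustration; note the deliberately
# unequal variances, which the Welch test handles by design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=3.5, scale=0.8, size=120)  # composite means, group A
group_b = rng.normal(loc=3.1, scale=1.1, size=120)  # different variance, group B

# equal_var=False is what requests Welch's test rather than Student's
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Since `equal_var=False` costs essentially nothing when variances happen to be equal, Delacre et al. (2017) argue for making it the default.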
Lastly, you do not need to adjust for the number of questions, since the outcome is a single composite score, nor do you need to be concerned about adjusting for specific questions. That said, having more items would be helpful, particularly for ensuring the reliability of the composite. I would keep them all on the same scale (all 1–5, all 1–7, etc.) rather than using different ranges for each item.
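One common way to check that reliability is Cronbach's alpha, which can be computed directly from its standard formula. A minimal sketch with numpy, using simulated 1-5 Likert items generated from a shared latent trait (all of the specific numbers here are illustrative assumptions):

```python
# Sketch: Cronbach's alpha for a composite of same-scale Likert items.
# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total score))
# Data are simulated: 200 respondents x 6 items driven by one latent trait.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(3, 0.7, size=(200, 1))                     # shared trait
noise = rng.normal(0, 0.8, size=(200, 6))                      # item-level noise
items = np.clip(np.rint(latent + noise), 1, 5)                 # 1-5 Likert items

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)           # variance of each item
total_var = items.sum(axis=1).var(ddof=1)       # variance of the composite
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

More items that tap the same trait push alpha upward, which is one practical reason the extra items help even though they don't enter the power calculation.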
References
- Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2(1), 1–38. https://doi.org/10.5334/joc.72
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
- Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch’s t-test instead of Student’s t-test. International Review of Social Psychology, 30(1), 92. https://doi.org/10.5334/irsp.82