For sample size, there are usually four components, and knowing any three lets you derive the fourth:
- Effect size (such as Cohen's $d$, i.e., the mean difference divided by the pooled standard deviation).
- Sample size per group.
- Alpha (usually $\alpha = .05$ by convention, but it can technically be anything).
- Power (also $.80$ by convention, but it doesn't have to be).
So if you simply want to estimate the sample size per group for a basic group comparison, you fix the other three components in order to determine how many people you need. Normally we leave the alpha and power cutoffs where they are, and the sample size is the unknown we are solving for, so the only real toggle is the effect size, which you can vary based on your assumptions about the mean group difference and its standard deviation. Ideally this estimate should be based on what the past literature leads you to expect; otherwise it falls back on conventional cutoffs (a good recent review of what "small", "medium", and "strong" effects approximate can be found here).
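To make the three-in, one-out relationship concrete, here is a minimal sketch using statsmodels' power calculator (the specific library is my choice; any power calculator works the same way). Fixing a "medium" effect of $d = 0.5$, $\alpha = .05$, and power $= .80$, we solve for the per-group sample size:

```python
# Sketch: solve for the fourth component of a power analysis given the other three.
# Uses statsmodels; the d = 0.5 / alpha = .05 / power = .80 inputs are the
# conventional defaults discussed above, not values from any particular study.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Leave out nobs1 (sample size per group) and solve for it.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # -> 64 per group
```

Passing a known `nobs1` instead and leaving `power=None` would run the calculation in the other direction, recovering the achieved power for a given sample size.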
For a two-sample test, this is easy to calculate with a basic power calculator (see Chapter 2 of Cohen's canonical text on power, referenced below, for how t-test power is calculated). While Likert scale items are notorious for being non-normally distributed, their composites can be more normal, and t-tests tend to be robust to non-normality with large enough samples, particularly the Welch $t$-test, which is robust to both heterogeneity of variance and normality issues (Delacre et al., 2017). Some, like Brysbaert, recommend a minimum of $n > 100$ per group, but that is based on his assumption of weak effects in psychology (Brysbaert, 2019, p. 8). So the effect size you assume largely determines how many people you need.
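For reference, running the Welch test itself is a one-liner in scipy. A minimal sketch, with simulated composite scores standing in for real data (the means, SDs, and group sizes below are purely illustrative):

```python
# Sketch: Welch's t-test on two composite scores via scipy.
# All data here are simulated for illustration; note the deliberately
# unequal variances, which the Welch test handles by design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=3.5, scale=0.8, size=120)  # composite means, group A
group_b = rng.normal(loc=3.1, scale=1.1, size=120)  # different variance, group B

# equal_var=False is what requests Welch's test rather than Student's
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Since `equal_var=False` costs essentially nothing when variances happen to be equal, Delacre et al. (2017) argue for making it the default.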
Lastly, you do not need to adjust for the number of questions, since the outcome is a single composite score, nor do you need to be concerned about adjusting for specific questions. That said, having more items would be helpful, particularly for ensuring the reliability of the composite. I would keep them all on the same scale (all 1–5, all 1–7, etc.) rather than using different ranges for each item.
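One common way to check that reliability is Cronbach's alpha, which can be computed directly from its standard formula. A minimal sketch with numpy, using simulated 1-5 Likert items generated from a shared latent trait (all of the specific numbers here are illustrative assumptions):

```python
# Sketch: Cronbach's alpha for a composite of same-scale Likert items.
# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total score))
# Data are simulated: 200 respondents x 6 items driven by one latent trait.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(3, 0.7, size=(200, 1))                     # shared trait
noise = rng.normal(0, 0.8, size=(200, 6))                      # item-level noise
items = np.clip(np.rint(latent + noise), 1, 5)                 # 1-5 Likert items

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)           # variance of each item
total_var = items.sum(axis=1).var(ddof=1)       # variance of the composite
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

More items that tap the same trait push alpha upward, which is one practical reason the extra items help even though they don't enter the power calculation.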
References
- Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2(1), 1–38. https://doi.org/10.5334/joc.72
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
- Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch’s t-test instead of Student’s t-test. International Review of Social Psychology, 30(1), 92. https://doi.org/10.5334/irsp.82