
A "yes/no" question was asked to two independent groups of people.

Group A: N=20, Yes=6, No=14. Group B: "58% responded yes."

I thought one could not perform a chi-squared test without knowing (here) Group B's total N, but I was told that one could. I was lost, so I went and found the original source of the information, and found that

Group B: N=12, Yes=7 (58%), No=5.

Using this information, I generated expected frequencies using the method (row total × column total)/grand total (here, 20 + 12 = 32).

I calculated chi-squared to be 2.5.
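For reference, here is a minimal Python sketch of the expected-frequency method described above (expected = row total × column total / grand total), followed by the uncorrected Pearson statistic. The function names are illustrative, and no continuity correction is applied.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson's chi-squared statistic, sum of (O - E)^2 / E, without continuity correction."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Group A (6 yes, 14 no) and Group B (7 yes, 5 no)
observed = [[6, 14], [7, 5]]
print(expected_counts(observed))         # [[8.125, 11.875], [4.875, 7.125]]
print(round(pearson_chi2(observed), 4))  # 2.4961
```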

However, the answer they gave is "(c^2 = 7.51; p=0.0058)".

I have wracked my brain trying to understand this. No, I do not think the answer is a typo. Maybe I don't know what kind of chi-squared test is being done, or what "c^2" is. Or maybe "Group B: 58% yes" requires a completely different N than 12 (even though 12 is indeed the previously undisclosed N); but when I reverse-engineer this N, the algebra is complicated, and the N seems to be very large.
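The effect of Group B's total can be checked numerically. The sketch below keeps Group B's split fixed at 7 yes : 5 no (58.3% yes) and scales it by a factor k (k = 100 gives 700 yes / 500 no). It then computes the 2×2 chi-squared statistic with and without Yates' continuity correction. The helper name `chi2_2x2` is hypothetical. The Yates version uses the textbook shortcut formula with the corrected difference clamped at zero; that clamp is an assumption, and software implementations differ in detail.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    Uses the shortcut n * (ad - bc)^2 / (product of the margins). With yates=True,
    |ad - bc| is reduced by n / 2 (clamped at zero) before squaring.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Group A stays at 6 yes / 14 no; Group B is 7k yes / 5k no (always 58.3% yes).
for k in (1, 10, 100, 1000, 10**6):
    plain = chi2_2x2(6, 14, 7 * k, 5 * k)
    corrected = chi2_2x2(6, 14, 7 * k, 5 * k, yates=True)
    print(f"N_B={12 * k:>8}  chi2={plain:.3f}  chi2_Yates={corrected:.3f}")
```

Both statistics grow with k but level off. The uncorrected one approaches about 6.61 and the corrected one about 5.49, so under this test no choice of N for Group B produces 7.51.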

Am I doing something wrong? This is maddening.

  • Is this published? Can you direct us to the source? Commented Sep 4, 2024 at 15:30
  • A typo seems like the most parsimonious explanation. A "c" in Symbol font is displayed as $\chi$, so it's likely that the font was lost or changed. If they wrote a chi-square and p-value without giving the df, that increases the probability (to me) that they made another mistake. Commented Sep 4, 2024 at 16:21
  • The chi-squared statistic (with Yates' continuity correction) for the data you give is $1.46.$ You are correct that the test is impossible to perform without information equivalent to the total of group B. One demonstration is to change the total. For instance, when the 58% value is attained with $700$ yeses and $500$ nos, the chi-squared statistic increases to $5.37.$ There are no circumstances under which the chi-squared statistic in this case could rise to the reported value of $7.51,$ because it cannot exceed $5.5$. Commented Sep 4, 2024 at 19:33
  • I thought they might have (incorrectly) treated the 58% as a fixed probability and done a $\chi^2$ goodness-of-fit test, but that gives a statistic of 6.4 with 58% and 6.6 with 7/12. Commented Sep 4, 2024 at 21:49
  • It is sometimes possible to perform a test when you just have a sample proportion, and even to get a legitimate rejection, albeit the p-value may be much, much higher than if you had full information. Some sample proportions are simply not consistent with very low denominators, especially if they're quoted to more than a couple of figures. E.g., if the 0.58 had been 0.5837 (rounded to 4 dp) rather than 0.5833, then you could rule out 7/12 (or 14/24, 21/36, etc.), and the fraction with the lowest denominator consistent with that would be 122/209; 122 Yes, 87 No would be rejected at the 5% level. Commented Sep 5, 2024 at 7:53

2 Answers

  1. It is indeed not possible to perform a $\chi^2$ test (or a Fisher exact test, or even a z-test of two proportions, whose statistic is the square root of the $\chi^2$ statistic) without knowing the counts (at a minimum, the total count in each group, or the counts for each outcome, yes/no).
  2. Using your numbers, I get exactly the same $\chi^2$ statistic ($2.4961$, to be exact). @whuber used Yates' correction; when I do too, I get his answer ($1.46$), but it seems you did not use Yates' correction.
  3. As whuber pointed out, no matter what sample size I use (e.g. 700,000 vs. 500,000), I cannot get $\chi^2=7.51$. There is an asymptotic limit. But...
  4. The previous statistics were obtained using the pooled variance, which is perfectly fine when comparing (6, 14) to (7, 5), but no longer fine when comparing (6, 14) to, say, (7000, 5000) or other such large counts. The variances become very different, and one should use the unpooled variance formula. In this case, with sample B at (Yes=777, No=555), still with $\hat p=.5833$, I get $\chi^2=7.5149$, but $p=0.00612$. The statistic is close to the reported answer, but the p-value is not: I double-checked with a $\chi^2$ table (d.f. = 1), and 7.51 does not give 0.0058...
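The points above can be checked with a short sketch of the two-proportion z-test in its pooled and unpooled forms. Squaring z gives a statistic to compare with $\chi^2_1$. For one degree of freedom, the upper-tail probability has the closed form $\operatorname{erfc}(\sqrt{x/2})$, so no table or SciPy is needed. The function names are illustrative.

```python
import math


def chi2_sf_1df(x):
    """P(X > x) for X ~ chi-squared with 1 df, using X = Z**2: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


def z2_two_proportions(y1, n1, y2, n2, pooled=True):
    """Squared two-proportion z statistic, with pooled or unpooled variance."""
    p1, p2 = y1 / n1, y2 / n2
    if pooled:
        p = (y1 + y2) / (n1 + n2)
        var = p * (1 - p) * (1 / n1 + 1 / n2)
    else:
        var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return (p1 - p2) ** 2 / var


# Pooled z^2 for 6/20 vs 7/12 equals the uncorrected Pearson statistic.
print(round(z2_two_proportions(6, 20, 7, 12), 4))  # 2.4961

# Group B scaled to 777 yes / 555 no (1332 total, still 7/12 yes), unpooled variance.
z2 = z2_two_proportions(6, 20, 777, 1332, pooled=False)
print(round(z2, 4), round(chi2_sf_1df(z2), 4))  # 7.5149 0.0061

# Which statistic corresponds to p = 0.0058?
for x in (7.51, 7.61):
    print(x, round(chi2_sf_1df(x), 4))  # 7.51 -> 0.0061, 7.61 -> 0.0058
```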

So the best that can be said is that the question is ill-posed, and the answer is no better. Sorry...

  • It is $\chi^2_1 = 7.\mathbf{6}1$ that has a probability of $0.0058$ of being exceeded. Commented Sep 5, 2024 at 0:45
  • I appreciate tremendously everyone's input. What if... they applied "58%" to Group A's N=20 to generate expected (not observed) frequencies for a Group B of N=20: yes=11.6, no=8.4? This would give c^2=6.44. How illegitimate would that be? I am trying to understand how they could have obtained c^2=7.5. Commented Sep 5, 2024 at 1:26
  • 11 out of 20 is .55, and 12 out of 20 is .6. And 11.6 is not achievable in this kind of test. Sorry, this is a bad question, and a worse "answer". It just does not add up... Commented Sep 5, 2024 at 3:21
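The goodness-of-fit reading discussed in these comments compares Group A's 20 answers with a fixed 58% rate. The sketch below reproduces it. It covers that reading only. It treats 58% as a known population value, which ignores the sampling error in Group B's 12 responses. The function name is illustrative.

```python
def pearson_gof(observed, probs):
    """Pearson goodness-of-fit statistic, sum of (O - E)^2 / E, against fixed probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


group_a = [6, 14]  # yes, no
print(round(pearson_gof(group_a, [0.58, 0.42]), 2))      # 6.44
print(round(pearson_gof(group_a, [7 / 12, 5 / 12]), 2))  # 6.61
```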

This is more an extended comment than an answer, and a stretch at that, as there is certainly missing or confusing information in the instructions and answer key. However, one possibility is that they used Neyman's modified $\chi^2$ statistic. But again, this is a stretch.

I'm going to assume a goodness-of-fit test here, but I think it would hold for a test of independence or homogeneity given a sufficiently large number of observations in group B. The Neyman-modified $\chi^2$ statistic can be computed as*:

$$\chi_{Neyman}^2 = \sum_{i=1}^k\frac{(O_i -E_i)^2}{O_i}$$

What makes this statistic different from Pearson's $\chi^2$ statistic is that the observed values, rather than the expected values, go into the denominator. This could make it possible to reach a statistic of 7.61 and a p-value of 0.0058, as shown a few paragraphs below. However, this rests on several assumptions, possibly incorrect, about the instructions and answer key you were given.
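A minimal sketch of Neyman's statistic for the goodness-of-fit reading used here: Group A's counts against a fixed expected proportion, using the two candidate proportions discussed in this answer. The function name is illustrative. If SciPy is available, `scipy.stats.power_divergence(f_obs, f_exp, lambda_="neyman")` should compute the same statistic. The plain-Python version avoids that dependency.

```python
def neyman_gof(observed, probs):
    """Neyman's modified chi-squared: sum of (O - E)^2 / O against fixed probabilities."""
    if any(o == 0 for o in observed):
        raise ValueError("Neyman's statistic divides by the observed counts; all must be > 0")
    n = sum(observed)
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / o for o, e in zip(observed, expected))


group_a = [6, 14]  # yes, no
print(round(neyman_gof(group_a, [0.58, 0.42]), 4))      # 7.4667
print(round(neyman_gof(group_a, [7 / 12, 5 / 12]), 4))  # 7.6455
```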

This is certainly a stretch, because:

  • There is nothing in the instructions saying explicitly that they use the Neyman-modified $\chi^2$ statistic. Some notations I've seen include $\chi_{Neyman}^2$ , $NM^2$ , or $\chi_{mod}^2$ , but not c^2. In my field (sociology & political science), my experience is that Pearson's $\chi^2$ statistic (or a binomial test) is much more common, and in fact I have yet to see one of my colleagues use Neyman's alternative. So I would expect some sort of notice or warning when an alternative to Pearson's $\chi^2$ statistic is used. That being said, perhaps the Neyman $\chi^2$ is more common in your field or research community.
  • A $\chi^2$ statistic equal to $7.51$ doesn't yield a p-value of $0.0058$ for 1 degree of freedom, as others noted in comments; a value of $7.61$ does. This hints at some mistake or typo in the instructions. If that is indeed the case, we can only make assumptions about the values they meant, and these assumptions could be incorrect.
  • This answer assumes that the proportion $0.58$ is rounded, but nothing in the instructions says so explicitly. If it is not rounded, then the Neyman $\chi^2$ statistic cannot reach $7.51$ or $7.61$, only about $7.46667$. Consequently, it would not explain the values you have, making this very answer completely irrelevant.

Now, assuming the proportion $0.58$ indeed comes from a rounding operation, the Neyman $\chi^2$ can reach a value of $7.61$ if the proportion of expected "yes" equals $0.5826747$:

$$\chi_{Neyman}^2 = \frac{(6 - (20 \times 0.5826747))^2}{6} + \frac{(14 - (20 \times (1-0.5826747)))^2}{14} = 7.61$$
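The "yes" and "no" cells deviate from their expected counts by equal and opposite amounts, since $14 - 20(1-p) = 20p - 6$. So the equation above reduces to a closed form in the expected proportion $p$, which makes the value $0.5826747$ easy to check (taking the root above $0.3$):

$$\chi_{Neyman}^2(p) = \frac{(6 - 20p)^2}{6} + \frac{(14 - 20(1-p))^2}{14} = (20p-6)^2\left(\frac{1}{6}+\frac{1}{14}\right) = \frac{5}{21}(20p-6)^2$$

$$\chi_{Neyman}^2(p) = 7.61 \quad\Longrightarrow\quad p = \frac{6 + \sqrt{21 \times 7.61 / 5}}{20} = \frac{6 + \sqrt{31.962}}{20} \approx 0.5826747$$

Over the whole interval of proportions that round to $0.58$ ($0.575 \le p < 0.585$), the statistic runs from $\frac{5}{21}(5.5)^2 \approx 7.20$ to just under $\frac{5}{21}(5.7)^2 \approx 7.74$, so both $7.51$ and $7.61$ are attainable. At $p = 0.58$ exactly it equals $\frac{5}{21}(5.6)^2 \approx 7.4667$.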

That said, the $\chi^2$ statistic and the p-value may themselves have been rounded, so the proportion of "yes" in group B need not be exactly 0.5826747. At first glance, values that agree with it to the fourth or fifth decimal place seem like plausible candidates.


*As a side note, Neyman's version of $\chi^2$ is a special case of the Cressie-Read power-divergence statistic, with $\lambda =-2$. You can find software implementations in the R package philentropy or in the SciPy library for Python. Other statistical software probably offers similar features.
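To illustrate the side note, the sketch below implements the Cressie-Read power-divergence statistic directly, $\frac{2}{\lambda(\lambda+1)}\sum_i O_i\left[(O_i/E_i)^\lambda - 1\right]$. It checks that $\lambda = 1$ reproduces Pearson's statistic and $\lambda = -2$ reproduces Neyman's, for Group A against a fixed 58%. The equivalence relies on the observed and expected totals being equal. $\lambda = 0$ and $\lambda = -1$ are defined as limits and are not handled here. The function name is illustrative.

```python
def cressie_read(observed, expected, lam):
    """Cressie-Read power-divergence statistic:
    2 / (lam * (lam + 1)) * sum(O * ((O / E) ** lam - 1)).

    lam = 0 and lam = -1 are defined as limits (likelihood-ratio type statistics)
    and are not handled by this direct formula.
    """
    if lam in (0, -1):
        raise ValueError("lam = 0 and lam = -1 require the limiting forms")
    total = sum(o * ((o / e) ** lam - 1) for o, e in zip(observed, expected))
    return 2 / (lam * (lam + 1)) * total


observed = [6, 14]      # Group A: yes, no
expected = [11.6, 8.4]  # 20 responses at a fixed 58% yes
print(round(cressie_read(observed, expected, 1), 4))   # 6.4368 (Pearson)
print(round(cressie_read(observed, expected, -2), 4))  # 7.4667 (Neyman)
```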

  • The fact that this produces the "correct" $\chi^2$ estimate probably reveals that the answer key is incorrect. Commented Sep 5, 2024 at 17:25
  • Look, I am going to level with you. I am sorry, but I had to alter the p and c^2 because I didn’t want people googling this. This is from a pub. study (name withheld, trust me on this). Group A was as in the OP. The Group B study was cited as “58% yes” but no N given; in fact, Group B was “7=Yes, 5=No.” Author: “We obtained responses from 20 [subjects]. Only 6 of them (30%) [said Y]. Given the reported incidence [of Y] of 58% (ref), the chi square test showed this difference as statistically significant.(c^2 =7.5;p=0.0062).“ I had not known Neyman’s, but maybe? Thank you everyone, immensely. Commented Sep 5, 2024 at 19:02
  • @LindaBarrett I don't think they used the Neyman $\chi^2$ then, in particular if they don't mention anything about it. Had they used a chi-squared test of independence, the Neyman $\chi^2$ would be about 2.6. Had they used a chi-squared goodness-of-fit test, the Neyman $\chi^2$ would be about 7.65. An incorrect value (either in the reported counts or in the reported test statistic) is much more likely, in my opinion, than some "fancy" version of the $\chi^2$ statistic. It might be worth checking whether an erratum has been published, or whether other papers have identified the problem. Commented Sep 5, 2024 at 19:58
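The test-of-independence figure in the comment above can be reproduced with a sketch that computes both the Pearson and the Neyman versions of the 2×2 statistic. Expected counts come from row total × column total / grand total. The function name is illustrative, and no continuity correction is applied.

```python
def pearson_and_neyman_2x2(table):
    """Pearson (divide by E) and Neyman (divide by O) statistics for a test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = neyman = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            pearson += (o - e) ** 2 / e
            neyman += (o - e) ** 2 / o
    return pearson, neyman


pearson, neyman = pearson_and_neyman_2x2([[6, 14], [7, 5]])  # rows: Group A, Group B
print(f"Pearson {pearson:.2f}, Neyman {neyman:.2f}")  # Pearson 2.50, Neyman 2.62
```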
