
Is there a way to test whether our sample data are similar to the population data?

Let's say that we conducted a poll about political preferences with a 2% margin of error at a 95% confidence level. Can we reliably check whether we had a proper sample?
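For context, under the usual textbook assumptions (simple random sample, normal approximation, worst case $p = 0.5$), a ±2% margin of error at 95% confidence corresponds to a poll of roughly 2,400 respondents. A minimal sketch of that calculation follows. The helper name `required_sample_size` is hypothetical, and real polls often use weighting or stratified designs that change this figure.

```python
import math
from statistics import NormalDist

def required_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin.

    Assumes a simple random sample and the normal approximation;
    p = 0.5 is the worst case (largest variance).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value, ~1.96 at 95%
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n = required_sample_size(0.02)
print(n)
```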

I know about chi-square tests. Let's say we have a party that got 36% of 10,000,000 votes (3,600,000), and the poll had said it would get 35.5% (3,550,000). The chi-square statistic is about 704, which seems far too big.
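For reference, here is a sketch of Pearson's goodness-of-fit statistic for this example. It treats the 10,000,000 votes as counts and splits them into two categories, the party and everyone else. That split is an assumption, since the question does not say how the statistic was computed. The party's cell alone contributes about 704. With both cells included, the statistic is about 1092.

```python
# Observed election counts vs. counts expected from the poll's 35.5%,
# assuming two categories: the party and all other votes.
observed = {"party": 3_600_000, "others": 6_400_000}  # 36% / 64% of 10,000,000
expected = {"party": 3_550_000, "others": 6_450_000}  # 35.5% / 64.5% of 10,000,000

# Pearson's statistic: sum over categories of (O - E)^2 / E
contributions = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}
chi2 = sum(contributions.values())

print({k: round(v, 1) for k, v in contributions.items()})
print(round(chi2, 1))
```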

I've heard that chi square shouldn't be used for large samples, so are there any other tests I can use?

  • "I've heard that chi square shouldn't be used for large samples, so are there any other tests I can use?" See "Are large data sets inappropriate for hypothesis testing?" (commented Aug 17 at 8:16)
  • Also, about "Can we check reliably whether we had a proper sample?": In your example, you're observing that the point estimate from the sample is 0.5 percentage points away from the actual value. You should explain why you consider this a problem in the first place, and why you think you should run a test at all, because that isn't clear. (commented Aug 17 at 13:12)
  • Moreover, there might be an error in the way you computed the chi-square statistic (704); I've been unable to replicate it with the little information you gave. How exactly did you calculate it? // That being said, the question is quite old now, and I suspect we won't get the clarifications needed to answer it, so I'd suggest simply closing it. (commented Aug 18 at 9:48)

1 Answer


Traditional statistical tests such as chi-square tests and the binomial test are meant to investigate point hypotheses. If you test for $x = 35.5\%$, you are testing for exactly $x = 35.50000000000\ldots\%$. With a large sample (such as your $10^7$), the estimate becomes so precise that even the slightest difference will yield a $p$-value very close to zero.
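To illustrate this, here is a sketch using a one-sample $z$-test for a proportion, which is the normal approximation to the binomial test. It keeps the 0.5-percentage-point gap fixed, testing $H_0\colon x = 35.5\%$ against an observed 36%, and changes only the sample size. The helper name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(p_hat: float, p0: float, n: int) -> float:
    """Two-sided p-value of the one-sample z-test for a proportion (normal approximation)."""
    se = math.sqrt(p0 * (1 - p0) / n)         # standard error under H0
    z = (p_hat - p0) / se
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))

# Same 0.5-point difference every time; only n changes.
for n in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    print(f"n = {n:>10,}  p = {two_sided_p(0.36, 0.355, n):.3g}")
```

The gap is identical in every row. Even so, the $p$-value goes from unremarkable at $n = 1{,}000$ to astronomically small at $n = 10^7$. For two categories, $z^2$ equals Pearson's chi-square statistic, so the chi-square test behaves in the same way.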

First, you should not use the point estimate of 35.5% from your poll. Instead, take the 95% or 99% confidence interval from your poll and check whether it contains the 36%. Note that this still tests whether the poll comes from the "same" population, not from "similar" populations. You will have to define what "similar" should actually mean.
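Here is a sketch of that comparison using the numbers in the question. It assumes the reported ±2% is a 95% normal-approximation margin of error, so the 99% margin is obtained by rescaling with the ratio of normal quantiles. `confidence_interval` is a hypothetical helper.

```python
from statistics import NormalDist

def confidence_interval(p_hat: float, margin95: float, confidence: float = 0.95) -> tuple[float, float]:
    """Interval p_hat +/- margin, rescaling a reported 95% margin to another confidence level.

    Assumes the reported margin came from the normal approximation.
    """
    z95 = NormalDist().inv_cdf(0.975)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    m = margin95 * z / z95
    return p_hat - m, p_hat + m

actual = 0.36  # election result
for level in (0.95, 0.99):
    lo, hi = confidence_interval(0.355, 0.02, level)
    print(f"{level:.0%} CI: [{lo:.1%}, {hi:.1%}]  contains 36%: {lo <= actual <= hi}")
```

Under these assumptions, both intervals contain 36%, so the poll is consistent with the election result.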
