$\begingroup$

Perhaps a trivial question, but I have been receiving some conflicting information, which has led to some confusion for me.

Firstly, there seems to be some conflicting information about the distinction between parametric and non-parametric tests. Namely,

  1. Some sources suggest parametric tests make assumptions about the parameters of the population distribution from which the sample is drawn (a, b)
  2. Other sources suggest that parametric tests are only suitable for normally distributed data (a, b, c, d, e)

I personally take the first claim to be true, not the second, but I was hoping for clarity on this.

If it is the first point that is true, this leads to my second question. I have seen multiple sources say that:

  • A Chi Squared Test is a non parametric test (a, b, c)

But I was wondering why a Chi Squared Test is non-parametric when, like parametric tests such as the Z-test, t-test, and ANOVA, it seems to make a distributional assumption. I thought a Chi Squared test assumes the test statistic is drawn from a Chi Squared distribution, and that because it makes a distributional assumption, it is parametric. I must be misunderstanding something, though. Can someone help clarify?

$\endgroup$
  • $\begingroup$ Can you elaborate on the reasons why you want to know if it's parametric or not? Are you wondering under what conditions you can use a chi-square test? $\endgroup$ Commented Feb 1 at 16:57
  • $\begingroup$ @J-J-J Honestly, I was just interested for conceptual understanding because I wanted to have alignment and clarity in my understanding of parametric and non-parametric distinctions $\endgroup$ Commented Feb 3 at 13:04
  • $\begingroup$ You need to be more careful about (i) the information sources you consult and (ii) how you read them. For instance, (a) and (e) are awful--and that's predictable, given their (blatant marketing) nature and failure to follow basic academic protocols like citing authorities. Site (c) is okay, but you misread it: it provides suitable caveats and does not assert parametric tests are "only" suitable for normal data. Ditto for (d), although it's less careful than (c). (b) is characteristic of software help files, often written by people who don't really understand what they're doing. $\endgroup$ Commented Feb 3 at 13:29
  • $\begingroup$ @whuber I apologize if the sources were inappropriate. However, referring to parametric tests as requiring an assumption of normality is something I have heard across academics, professionals and others, and in other sources. I was just trying to pull together some representative examples. However, I take your point and acknowledge there is a lot of bad information out there. I just wanted to highlight there is a conflict in how parametric tests are communicated. If you Google parametric test assumptions, many examples will incorrectly refer to it this way, which is all I was trying to highlight $\endgroup$ Commented Feb 3 at 13:31
  • $\begingroup$ Right--but we are very, very familiar with the awful information about statistics out there. If you want more examples, search our site for "SPSS" for instance ;-). My point is "are communicated" is too vague and needs to be refined. Although incorrect, a helpful high-level bit of wisdom would be "don't ever believe anything said on the internet about 'parametric' anything: consult a well-known textbook, preferably one not oriented towards a software platform." $\endgroup$ Commented Feb 3 at 13:46

6 Answers

$\begingroup$

The second claim is nonsensical. Even OLS regression does not require normally distributed data: it makes assumptions about the errors, and even those are needed only for some of the inferences we can draw from the results. And there are parametric statistics that assume some other distribution (e.g. some forms of survival analysis assume a Weibull or another distribution; avoiding this assumption was one impetus for Cox survival analysis, which is usually called "semi-parametric"!).
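A quick simulation sketch of that point (all numbers made up): below, the predictor is heavily skewed, so the data are far from normal, yet the usual t-based confidence interval for the OLS slope keeps close to its nominal 95% coverage, because only the errors are normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 200, 2000

# Heavily skewed predictor: the *data* are not remotely normal
x = rng.exponential(scale=2.0, size=n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
slope_true = 0.5
t_crit = stats.t.ppf(0.975, df=n - 2)

covered = 0
for _ in range(reps):
    y = 1.0 + slope_true * x + rng.normal(size=n)  # only the *errors* are normal
    beta = XtX_inv @ X.T @ y                       # OLS estimate
    resid = y - X @ beta
    se = np.sqrt(resid @ resid / (n - 2) * XtX_inv[1, 1])
    covered += abs(beta[1] - slope_true) <= t_crit * se

print(round(covered / reps, 3))  # close to the nominal 0.95
```

The confidence interval is based on the normality of the errors, so its coverage is unaffected by how skewed the predictor (or the response) looks marginally.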

There is a lot of debate about the meaning of "parametric" that makes more sense than this, and debating definitions can be silly. But we should note that all inferential statistics (and some descriptive statistics) make assumptions. Perhaps the most common is that the errors are independent (common, but not universal). Another common one is that a variable is continuous (again, obviously not universal, but often overlooked).

I think the first paragraph of the Wikipedia entry is actually pretty good:

Parametric statistics is a branch of statistics which leverages models based on a fixed (finite) set of parameters.[1] Conversely nonparametric statistics does not assume explicit (finite-parametric) mathematical forms for distributions when modeling data. However, it may make some assumptions about that distribution, such as continuity or symmetry, or even an explicit mathematical shape but have a model for a distributional parameter that is not itself finite-parametric.

However, as is often the case, it may not be all that intuitive what it means. What exactly is a "fixed (finite) set of parameters"? The article cites a book that I do not have: Modes of Parametric Statistical Inference by Seymour Geisser.

Does the chi-square test meet this criterion?

Pearson, at least, argued that if n is reasonably large and the null is true, then the test statistic will (approximately) follow a chi-squared distribution. That is, you don't need to assume it; it is a consequence of the null.
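That consequence is easy to check by simulation (a sketch, with made-up numbers): generate counts under a simple multinomial null and compute Pearson's statistic; its distribution matches the chi-squared distribution with k - 1 degrees of freedom, even though no chi-squared assumption was made about the data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

p0 = np.full(6, 1 / 6)  # null hypothesis: a fair six-sided die
n = 600                 # observations per simulated experiment

# Simulate many experiments *under the null* and compute Pearson's statistic
expected = n * p0
counts = rng.multinomial(n, p0, size=20_000)
pearson = ((counts - expected) ** 2 / expected).sum(axis=1)

# Fraction of statistics beyond the chi-squared(df=5) 95% point: about 0.05,
# as a consequence of the null, not as an assumption about the data
tail = (pearson > stats.chi2.ppf(0.95, df=5)).mean()
print(round(tail, 3))
```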

For practical purposes, though, I think a more useful quality of a test is not "parametric" vs. "not" but what assumptions it makes, and what the consequences of violating those assumptions are.

$\endgroup$
  • $\begingroup$ I was about to answer with a quote of the same paragraph from Wikipedia. I took the liberty of adding a link for the source. (+1) $\endgroup$ Commented Feb 1 at 17:00
  • $\begingroup$ Thanks for the response! One element of what you described I was interested in clarity on is the normality of residuals assumption of OLS reg. I have always thought this is the case and that it is a misunderstanding when people suggest you need normality of data. However, I've found that an assumption of t-test pertains to normality of test statistic or sample mean. However, OLS and t-test can be expressed as equivalent. So how is it the case that OLS normality assumption is for residuals whereas t-test assumption is for test statistic/sample mean? $\endgroup$ Commented Feb 3 at 13:26
  • $\begingroup$ @JElder I think that's because when there is only one IV and it has only two levels (as a t-test) then normality of the residuals implies normality of the data. OLS and t-test are not equivalent -- OLS is much more general. $\endgroup$ Commented Feb 3 at 15:25
  • $\begingroup$ Oh yeah, I know OLS and t-test are not equivalent, t-test is just a special case or expression of OLS. That was just a contrast between the normality assumptions I was trying to square. But your description makes sense-- It's one IV with two level factor, so normality of residuals may be equivalent to normality of test statistic/sample mean in t-test context (e.g., OLS when one IV with two level factor). Thanks! $\endgroup$ Commented Feb 3 at 15:29
$\begingroup$

The fundamental problem is that there is no good, unambiguous, agreed upon definition of parametric vs. non-parametric tests.

I will quote Wikipedia's entry for non-parametric statistics:

The term "nonparametric statistics" has been defined imprecisely in the following two ways, among others: The first meaning of nonparametric involves techniques that do not rely on data belonging to any particular parametric family of probability distributions.
...
The second meaning of non-parametric involves techniques that do not assume that the structure of a model is fixed.

The above first meaning relates to distribution-free methods, i.e. where the underlying data is not assumed to come from a specific parametric distribution (normal, Poisson, binomial, etc.).
And I have to admit that the second meaning completely escapes me.

I will also quote from Bradley's classic Distribution-Free Statistical Tests (1968, pp. 15–16):

The terms nonparametric and distribution-free are not synonymous, and neither term provides an entirely satisfactory description of the class of statistics to which they are intended to refer.…Roughly speaking, a nonparametric test is one which makes no hypothesis about the value of a parameter in a statistical density function, whereas a distribution-free test is one which makes no assumptions about the precise form of the sampled population. The definitions are not mutually exclusive, and a test can be both distribution-free and parametric.…

The next quote is from The Handbook of Nonparametric Statistics from 1962 (p. 2):

A precise and universally acceptable definition of the term ‘nonparametric’ is not presently available. The viewpoint adopted in this handbook is that a statistical procedure is of a nonparametric type if it has properties which are satisfied to a reasonable approximation when some assumptions that are at least of a moderately general nature hold.

As you can see, this definition is so vague (reasonable, moderately) as to be completely unhelpful.

You can even find more "unusual" definitions, such as this one from the Handbook of Parametric and Nonparametric Statistical Procedures (Sheskin, 2000):

The distinction employed in this book for categorizing a procedure as a parametric versus a nonparametric test is primarily based on the level of measurement represented by the data that are being analyzed. As a general rule, inferential statistical tests that evaluate categorical/ nominal data and ordinal/rank-order data are categorized as nonparametric tests, while those tests that evaluate interval data or ratio data are categorized as parametric tests. Although the appropriateness of employing level of measurement as a criterion in this context has been debated, its usage provides a reasonably simple and straightforward schema for categorization that facilitates the decision-making process for selecting an appropriate statistical test.

From the above, it should become clear that it is difficult to specifically define the term nonparametric (and therefore, the term parametric).

Let me add one final definition, which is incorrect but nevertheless often encountered, even in respected sources: namely, the definition that parametric tests are those which rely on assumptions of normality of the data. It can be found here, or here, e.g. There are plenty of parametric methods which assume the data follow a Poisson, binomial, Weibull, etc. distribution. So if you encounter this definition ("does not require/assume normality of the data"), you can just stop there and dismiss anything from that source about the topic.

As alluded to by the wiki entry, a very common definition is the distribution-free one: methods which do not make assumptions about the frequency distribution of the variables to be evaluated (the underlying data is not assumed, or known, to come from any parametric distribution). This is the one I use, knowing full well that it is not universally accepted. Note that this "assumption" can be a certainty (e.g. under a binomial assumption, the data really is binomially distributed).
Note that this definition may lead to "odd" classifications; e.g. a $\chi^2$ test for a contingency table is non-parametric (no assumption about the data, only about the distribution of the statistic of interest), while a Fisher exact test for the exact same contingency table would be parametric (it assumes, in fact we know, that the data is binomial). OLS regression would also be non-parametric (in fact, computing the OLS estimates is not even statistical; it is purely mathematical: solving a set of equations that are linear in the coefficients, obtained by setting the partial derivatives of the squared-error loss to zero). It becomes statistical when making e.g. inferences about the coefficients, which rely on the normality of the residuals, not of the data.
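That contrast between the two tests can be made concrete with scipy (the 2x2 table below is made up for illustration):

```python
import numpy as np
from scipy import stats

# A hypothetical 2x2 contingency table
table = np.array([[12, 5],
                  [6, 14]])

# Chi-squared test of independence: no distributional model for the data;
# only the test statistic is referred to a chi-squared(1) distribution
chi2_stat, chi2_p, dof, expected = stats.chi2_contingency(table, correction=False)

# Fisher's exact test: conditions on the margins, so a cell count follows a
# hypergeometric distribution -- an explicit distributional model for the data
odds_ratio, fisher_p = stats.fisher_exact(table)

print(dof, round(chi2_p, 4), round(fisher_p, 4))
```

Both tests answer the same question about the same table; they differ in whether a distribution is posited for the data themselves or only derived for the statistic.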

Now, your definition (basically, any distributional assumption, whether on the data, on the statistic, or maybe on the residuals) would be too broad: for any hypothesis test we need to assume or know, asymptotically or not, the distribution of the statistic (or else we cannot compute a p-value), so that definition would make all hypothesis tests parametric.

TL;DR: there are indeed conflicting definitions of parametric vs. non-parametric. Hence yes, many (but not all) would describe the $\chi^2$ test as non-parametric (because it makes distributional assumptions about the statistic, not the data).

$\endgroup$
$\begingroup$

In some contexts, where the $\chi^2$ is intended as the exact distribution of the test statistic under the null hypothesis, $\chi^2$ tests (plural) can be interpreted as parametric tests.

However, in other contexts, where the $\chi^2$ is intended as the asymptotic (i.e. "limiting") distribution of the test statistic under the null hypothesis, $\chi^2$ tests (again, plural) can be interpreted as "non-parametric" tests. (In the sense that a "non-parametric" family of null hypotheses will asymptotically correspond to a $\chi^2$ distribution.)

So unfortunately I'd have to say that the answer depends on the context, i.e. which "$\chi^2$ test" we are talking about. Questions to ask to clarify which "$\chi^2$ test" we are talking about include but are not limited to:

  • What is the test statistic? (I.e. function of the random observations?)
  • Is the distribution given for the test statistic "under the null hypothesis" (i.e. when the null hypothesis is assumed to be true) the exact (i.e. actual) distribution of the test statistic, or an asymptotic (i.e. limiting) distribution of the test statistic?
  • What is the exact null hypothesis?

The confusion of what the answer is for any given, individual "$\chi^2$ test" is compounded by there being many distinct statistical tests related to the $\chi^2$ distribution, e.g. Stats.SE and Wikipedia.

So basically the question appears to be founded on at least two false premises:

  1. There is "the" $\chi^2$ test. (There are actually many "$\chi^2$ tests" and which one we are talking about matters for the purposes of discussion.)
  2. The conditions for being widely considered "non-parametric" are the same for both asymptotic and exact tests. (Often it's possible to converge asymptotically to a parametric family of distributions even starting from a non-parametric family of exact distributions.)

Even the famous "the" Pearson's $\chi^2$ test in practice corresponds to multiple distinct hypothesis tests (same test statistic perhaps, but different null hypotheses). So without being much more specific about exactly which "$\chi^2$ test" we are talking about, it unfortunately seems hopeless to say whether it is parametric or non-parametric.

I am willing to bet however that the "non-parametric" instances all refer to asymptotic tests, and not exact ones, although don't quote me on that at the risk of over-generalizing.
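One way to see the exact-vs-asymptotic gap (with made-up counts): for two categories, the exact null distribution of the count is binomial, while Pearson's statistic is referred to a chi-squared distribution with 1 degree of freedom only asymptotically. The difference between the two p-values shrinks as n grows.

```python
from scipy import stats

# Two categories, H0: p = 0.5; counts chosen so the z-score is about 1.4 each time
gaps = []
for n, k in [(20, 13), (200, 110), (2000, 1032)]:
    exact_p = stats.binomtest(k, n=n, p=0.5).pvalue  # exact: Binomial null
    asym_p = stats.chisquare([k, n - k]).pvalue      # asymptotic: chi-squared(1)
    gaps.append(abs(exact_p - asym_p))
    print(n, round(exact_p, 4), round(asym_p, 4))

# The chi-squared approximation improves as n grows
print([round(g, 4) for g in gaps])
```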


$\endgroup$
$\begingroup$

it assumes the population distribution follows a particular distribution, doesn’t it?

The chi-squared statistic follows a chi-squared distribution, but that is not the same as making distributional assumptions about the data.

For example, we can compare observations of samples from two populations in a table like the one below.

             | 1  2  3  4  5  6  7  8  9
    ---------+---------------------------
    sample A | 5 16 11 20  8  7  5 19  9
    sample B | 8  5  7  3  7 11 10  3  6

A chi-squared test can compare those two samples and answer the question of whether they are different, without assuming anything about the distributions of the populations.

What would be like using a parametric test is, for instance, assuming a normal distribution for the populations; then the two samples, both with mean 5 and variance 6.3, would lead to the conclusion that the populations from which the samples are taken have the same distribution. However, already with the naked eye we can see that the distributions are very different, as they peak in different places.

So yes, the chi-squared test statistic itself follows (approximately) a chi-squared distribution, but the test is not (necessarily*) imposing a parametric distribution on the underlying data.


* The chi-squared test is a very broad concept. It can be used in conjunction with assumptions about the distribution of the population. For example some fitting or filtering could be performed before computing a chi-squared statistic. Or a goodness of fit test explicitly compares the population with a specific distribution.
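For what it's worth, the comparison can be run directly on the table above (a sketch using scipy; the counts are those from the table):

```python
import numpy as np
from scipy import stats

# Counts from the table above: nine categories, two samples
sample_a = np.array([5, 16, 11, 20, 8, 7, 5, 19, 9])
sample_b = np.array([8, 5, 7, 3, 7, 11, 10, 3, 6])

# Chi-squared test of homogeneity: no distribution assumed for the populations
chi2_stat, p_value, dof, expected = stats.chi2_contingency(
    np.vstack([sample_a, sample_b]))

# Even though both samples have mean 5 (weighting each category by its count),
# the test detects that the two distributions differ
print(dof, round(chi2_stat, 2), round(p_value, 4))
```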

$\endgroup$
$\begingroup$

Existing answers correctly note that (a) the term "nonparametric" is ambiguous and controversial and (b) there is more than one test that can be referred to as $\chi^2$-test, and the classification may depend on which one you mean.

Regarding the $\chi^2$-test for contingency tables, and also the $\chi^2$ goodness-of-fit test for a given multinomial distribution (which may actually have been derived from an underlying continuous distribution), a relevant aspect is that the probabilities of all table entries define the set of possible distributions exhaustively, assuming that the events counted in the table are i.i.d. (a standard assumption acceptable for nonparametric methods). In the following I will mean only these two tests by "$\chi^2$-tests".

For this reason I would call these tests "nonparametric", because there is no parametric model that constrains the situation.

However there are some confusing aspects about this (confusion should be expected as highlighted already in other answers).

  1. The $\chi^2$-distribution applies (asymptotically) to the test statistic assuming the null hypothesis, which is in fact restrictive (and may even be a restrictive parametric model in case of the goodness-of-fit test). This however wouldn't make the test "count" as parametric, as in testing it is generally understood that the underlying data generating process does not have to belong to the null hypothesis; also the alternative is possible, and the alternative is fully general here (assuming i.i.d.), i.e., the tests test the null against anything else that could happen (goodness-of-fit tests are generally classified as nonparametric despite them testing fit of data to a parametric distribution).

  2. The multinomial distribution can in fact be called a parametric distribution as it is a distribution and is defined by its parameters. Regarding the $\chi^2$-tests, both null hypothesis and alternative are characterised by multinomial distributions, therefore the model behind these tests could indeed be called "parametric". The distinct thing here is that the multinomial distribution is fully general, i.e., it captures all possibilities (assuming i.i.d.). So although it is a parametric distribution, it is not restrictive as pretty much all other parametric distributions are (unless you allow infinite dimensional "parameters" that are so complicated that they can fully characterise all possible distributions also in other situations, which a weird mathematician could do, but a statistician wouldn't like that). This adds a specific ambiguity regarding the classification of the $\chi^2$-tests that is not present for most other tests.

$\endgroup$
$\begingroup$

Just because you can generate a statistic from the data that has a parametric distribution doesn't mean it's a parametric test. For instance, suppose you have two populations, Population A and Population B. If you randomly select an element from A and an element from B, there is some probability $p$ that the element from A will be "larger" (for whatever ordering you're using) than the element from B.

Thus, if you have a paired test where you just look at which element of each pair is larger, and discard all other information, you are guaranteed to have a binomial distribution with parameter $p$, regardless of what the original distributions are.
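A sketch of this idea (the two populations below are arbitrary, deliberately non-normal choices): pair draws from A and B, keep only which member of each pair is larger, and test the count against a Binomial(n, 1/2) null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200

# Two deliberately non-normal populations (hypothetical choices)
a = rng.exponential(scale=1.0, size=n)
b = rng.lognormal(mean=0.0, sigma=1.0, size=n)

# Discard everything except which element of each pair is larger
n_a_larger = int((a > b).sum())

# Under H0 (p = 1/2) the count is exactly Binomial(n, 1/2),
# regardless of what the original distributions are
res = stats.binomtest(n_a_larger, n=n, p=0.5)
print(n_a_larger, round(res.pvalue, 3))
```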

$\endgroup$
