(I think this is a great question. Evidently I upvoted it at some point in the past, but didn't answer. I thought I might put down a few ideas now.)

Let me start by defining the terms of the discussion as I see them. A p-value is the probability of getting a sample statistic (say, a sample mean) as far from some reference value as your sample statistic is, or further, if the reference value were the true population parameter. For example, a p-value answers the question: what is the probability of getting a sample mean IQ that is $|\bar x-100|$ points or more away from 100, if 100 is really the mean of the population from which your sample was drawn? Now the issue is, how should that number be employed in making a statistical inference?
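
To make this concrete, here is a minimal sketch in Python with simulated data; the sample size of 25, the random seed, and the population values are assumptions for illustration only, not part of the original question:

```python
import numpy as np
from scipy import stats

# Simulate a sample of n = 25 IQ scores from a population with mean 106, sd 15.
rng = np.random.default_rng(0)
sample = rng.normal(loc=106, scale=15, size=25)

# Two-sided p-value: the probability of a sample mean at least as far from the
# reference value 100 as the observed mean, if 100 were the true population mean.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print(f"observed mean = {sample.mean():.1f}, p = {p_value:.3f}")
```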

Fisher thought that the p-value could be interpreted as a continuous measure of evidence against the null hypothesis. There is no particular fixed value at which the results become 'significant'. The way I usually try to get this across to people is to point out that, for all intents and purposes, $p=.049$ and $p=.051$ constitute an identical amount of evidence against the null hypothesis (cf. @Henrik's answer here).

On the other hand, Neyman & Pearson thought you could use the p-value as part of a formalized decision-making process. At the end of your investigation, you have to either reject the null hypothesis or fail to reject it. In addition, the null hypothesis could be either true or not true. Thus, there are four theoretical possibilities (although in any given situation, there are just two): you could make a correct decision (fail to reject a true, or reject a false, null hypothesis), or you could make a type I or type II error (by rejecting a true null, or failing to reject a false null hypothesis, respectively). (Note that the p-value is not the same thing as the type I error rate, which I discuss here.) The p-value allows the process of deciding whether or not to reject the null hypothesis to be formalized. Within the Neyman-Pearson framework, the process works like this: there is a null hypothesis that people will believe by default in the absence of sufficient evidence to the contrary, and an alternative hypothesis that you believe may be true instead. There are some long-run error rates that you are willing to live with (note that there is no reason these have to be 5% and 20%). Given these things, you design your study to differentiate between the two hypotheses while maintaining, at most, those error rates, by conducting a power analysis and running your study accordingly. (Typically, this means having sufficient data.) After your study is completed, you compare your p-value to $\alpha$ and reject the null hypothesis if $p<\alpha$; if not, you fail to reject the null hypothesis. Either way, your study is complete and you have made your decision.

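
As a hypothetical illustration of that workflow, here is a minimal sketch with simulated data; the effect size, error rates, and two-group design are assumptions chosen for the example, not values prescribed by the framework:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

alpha, power = 0.05, 0.80  # long-run error rates you agree to live with (need not be 5% and 20%)
effect_size = 0.5          # smallest standardized difference (Cohen's d) worth detecting

# Power analysis: sample size per group needed to hold both error rates.
n_per_group = int(np.ceil(TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power)))

# Run the study (simulated here), then make the binary decision.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, n_per_group)
treatment = rng.normal(0.5, 1.0, n_per_group)
t_stat, p_value = stats.ttest_ind(treatment, control)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"n per group = {n_per_group}, p = {p_value:.3f} -> {decision}")
```

Notice that the decision depends only on whether $p<\alpha$; once it is made, the exact size of the p-value plays no further role. This machinery also presupposes an alternative hypothesis, error rates, and a power analysis specified before the data are collected, which is rarely the case for many of the p-values a typical analysis produces, for example: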
  • the omnibus ANOVA for a multiple regression model (it is possible to figure out how all the hypothesized non-zero slope parameters come together to create a non-centrality parameter for the $F$ distribution, but it isn't remotely intuitive, and I doubt anyone does it)
  • the value of a Shapiro-Wilk test of the normality of your residuals in a regression analysis (what magnitude of $W$ do you care about and why? how much power do you have to reject the null when that magnitude is correct?)
  • the value of a test of homogeneity of variance (e.g., Levene's test; same comments as above; see the sketch after this list)
  • any other tests to check assumptions, etc.
  • t-tests of covariates other than the explanatory variable of primary interest in the study
  • initial / exploratory research (e.g., pilot studies)
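
As promised above, here is a minimal sketch (with simulated, arbitrary data) of the sort of assumption checks in this list; in practice their p-values are usually read informally, as rough Fisherian evidence, rather than compared to a pre-specified $\alpha$ backed by a power analysis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group1 = rng.normal(0.0, 1.0, 40)
group2 = rng.normal(0.3, 1.3, 40)

# Residuals from a simple two-group model (each score minus its group mean).
residuals = np.concatenate([group1 - group1.mean(), group2 - group2.mean()])

w_stat, p_shapiro = stats.shapiro(residuals)       # Shapiro-Wilk: normality of residuals
lev_stat, p_levene = stats.levene(group1, group2)  # Levene: homogeneity of variance

# These are typically glanced at as rough evidence about the assumptions,
# without any power analysis or pre-specified error rates behind them.
print(f"Shapiro-Wilk p = {p_shapiro:.3f}, Levene p = {p_levene:.3f}")
```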
