What is the post-hoc power in my experiment? How to calculate this?

Question

This question is a repost of question #3375492 on math.stackexchange.com. There I was advised to ask this community instead.

My motivations
I often see the claim that post-hoc power is nonsense. Editorials of this kind are mass-produced and published in many established journals. I can easily find definitions written in words, but none that are broken down into formulas or code.

However, it is unclear what exactly the post-hoc power they criticize is. Certainly they give a definition in words, but it is not broken down into formulas or calculation code. Therefore, what they want to criticize is not identified, or at least not shared with me. (Both Code 1 and Code 2 below seem to meet their common definitions, yet the two give different results.)

The strange thing is that, even though so many people have criticized it, the answer to "what is post-hoc power?" still does not seem clear. Isn't it strange to be expected to understand opinions like "it doesn't make sense because it is uniquely determined once the other variables are set" or "it is circular reasoning" about an object whose calculation method is not shown? This looks like a barren battle in the air, fought under an unclear premise.

Give the calculation procedure before criticizing it! (This probably applies to all the statutory ethics editorials that have been mass-produced recently.)

Verbal explanations are already written in the mass-produced editorials. They are not what I want.
- Please show me formulas or code instead of words.
- Please break the words down into formulas.

I require explanations in formulas and code instead of words.

I know that there is no "correct" post-hoc analysis, as the mass-produced editorials so often proclaim. By "correct post-hoc analysis" I mean "the post-hoc analysis that many people criticize."

My Question

What is the post-hoc power in the following experiment?

Experiment:
We randomly divided 20 animals into two groups, Group A and Group B. Group A was fed Food A and Group B was fed Food B. After a certain period, body weight was measured, with the following results.

Group_A: 40.2, 40.4, 40.6, 40.8, 41.0, 41.2, 41.4, 41.6, 41.8
Group_B: 30.1, 30.3, 30.5, 30.7, 30.9, 31.1, 31.3, 31.5, 31.7, 31.9, 32.1

I would like to conduct a two-sided test with a significance level of 0.05 to see if there is a significant difference between the two groups.

I think it is one of the following two. Both scripts are written in R. The R source code can be downloaded from the following link.

The difference between Method 1 and Method 2 is the significance level used when calculating power: Method 1 uses a predetermined value (α = 0.05 in the code), while Method 2 uses the p-value calculated from the data.

Method 1
Code01

    # Load data
    Group_A = c(40.2, 40.4, 40.6, 40.8, 41.0, 41.2, 41.4, 41.6, 41.8)
    Group_B = c(30.1, 30.3, 30.5, 30.7, 30.9, 31.1, 31.3, 31.5, 31.7, 31.9, 32.1)

    # Welch Two Sample t-test
    t.test(Group_A, Group_B)

    library(effsize)
    library(pwr)

    # Cohen's d estimated from the two samples
    cd = cohen.d(Group_A, Group_B)
    cd

    # Power at the predetermined significance level 0.05
    pwr.t2n.test(n1 = 9, n2 = 11, d = cd$estimate, sig.level = 0.05, power = NULL, alternative = c("two.sided"))

Method 2
Code02

    # Load data
    Group_A = c(40.2, 40.4, 40.6, 40.8, 41.0, 41.2, 41.4, 41.6, 41.8)
    Group_B = c(30.1, 30.3, 30.5, 30.7, 30.9, 31.1, 31.3, 31.5, 31.7, 31.9, 32.1)

    # Welch Two Sample t-test
    twel = t.test(Group_A, Group_B)
    twel
    pwel = twel$p.value

    library(effsize)
    library(pwr)

    # Cohen's d estimated from the two samples
    cd = cohen.d(Group_A, Group_B)
    cd

    # Power with the observed p-value used as the significance level
    pwr.t2n.test(n1 = 9, n2 = 11, d = cd$estimate, sig.level = pwel, power = NULL, alternative = c("two.sided"))

Which is the “correct” post-hoc power calculation code?

Notes:
If your R environment does not have the packages "effsize" and "pwr", you need to install them first. If the following commands are run in R while connected to the Internet, installation should start automatically.

    install.packages("effsize")
    install.packages("pwr")

【Post-Hoc Notes】 (Added after 2019/10/06 00:56(JST))

(1) Relationship between effect size and power (based on Method 1)
Fig. PHN01 shows the relationship between effect size and power when using Code01 above with the significance level (pv in the code below) set to 0.05, 0.025, and 0.01, where n1 = 9 and n2 = 11.

Fig. PHN01: Relationship between effect size and power

These values were calculated in R in the same manner as the following code.

Code PHN 01

    library(pwr)

    pv = 0.025
    pwr.t2n.test(n1 = 9, n2 = 11, d = 4, sig.level = pv, power = NULL, alternative = c("two.sided"))

(2) Relationship between effect size and power (based on Method 2)
Fig. PHN02 shows the relationship between effect size and power when using Code02, where n1 = 9 and n2 = 11.

Fig. PHN02: Relationship between effect size and power

Code PHN 02

    library(effsize)
    library(pwr)

    # Shift Group_A by an offset to vary the effect size
    offc = 1.6
    offc = 0.1 + offc
    Group_A = c(30.2+offc, 30.4+offc, 30.6+offc, 30.8+offc, 31.0+offc, 31.2+offc, 31.4+offc, 31.6+offc, 31.8+offc)
    Group_B = c(30.1, 30.3, 30.5, 30.7, 30.9, 31.1, 31.3, 31.5, 31.7, 31.9, 32.1)
    print(mean(Group_A) - mean(Group_B))

    twel = t.test(Group_A, Group_B)
    pwel = twel$p.value
    cd = cohen.d(Group_A, Group_B)

    # Power with the observed p-value used as the significance level
    pwr.t2n.test(n1 = 9, n2 = 11, d = cd$estimate, sig.level = pwel, power = NULL, alternative = c("two.sided"))

(3) Comment on Welch's correction
There was a comment that "it is better to remove the Welch correction". Indeed, as far as I can tell, R does not provide a function that calculates power under the Welch correction for the case n1 ≠ n2.

Please forget the following code.

Code PHN 03

    library(effsize)

    offc = 1.6
    offc = 0.1 + offc
    Group_A = c(30.2+offc, 30.4+offc, 30.6+offc, 30.8+offc, 31.0+offc, 31.2+offc, 31.4+offc, 31.6+offc, 31.8+offc)
    Group_B = c(30.1, 30.3, 30.5, 30.7, 30.9, 31.1, 31.3, 31.5, 31.7, 31.9, 32.1)
    print(mean(Group_A) - mean(Group_B))

    # Option 1: var.equal (TRUE must be upper case in R)
    twel = t.test(Group_A, Group_B, var.equal = TRUE)
    pwel = twel$p.value

    # Option 2: hedges.correction, Option 3: var.equal = FALSE
    # (the result is stored in cd because it is used below)
    cd = cohen.d(Group_A, Group_B, hedges.correction = FALSE, var.equal = FALSE)
    cd

    sqrt((9 + 11) / (9 * 11))
    cd$estimate / twel$statistic

(4) The "correct" post-hoc power calculation method when Welch's correction is not required

This part has been split into the following thread:
The calculation method of post-hoc power in t-test without Welch's correction

https://gpsych.bmj.com/content/32/4/e100069

It covers only the case where the Welch correction is not necessary, but I found a paper in which the "correct" post-hoc power calculation method is written in mathematical formulas. Here, "correct" means "criticized by mass-produced editorials".

Post-hoc power seems to be calculated by the following formula.

Here α is given in advance, so the method can be considered essentially the same as that of Code 1. However, the paper's setting differs from mine, which uses the Welch test.

　(PHN04-01)

Here,
　(PHN04-02)
(PHN04-03)
And, use the following d for ,
(PHN04-04)

However, I could not work out the distribution of the following statistic. (Maybe it is a noncentral t distribution, but then what is the value of the noncentrality parameter?)

(PHN04-05)

What is this $Z_{\alpha/2}$? $Z_{\alpha}$ is the upper α point of which distribution? Is it the upper α/2 point of the t-distribution?

And

How can it be extended to Welch's case?

【P.S.】 I'm not very good at English, so I apologize for any impolite or unclear expressions. I welcome any corrections and English review. (You can edit my question and description to improve them.)

Is the only difference between the two scripts: sig.level = pwel and sig.level = 0.05? — Jeremy Miles
– Jeremy Miles, Commented Oct 4, 2019 at 16:23
I can't install the packages easily, but I think the first should give you the post hoc power, which is a transformation of the p-value, and the second should give you 0.5. — Jeremy Miles
– Jeremy Miles, Commented Oct 4, 2019 at 18:29
[I know how to install a package, thanks. I can't easily install packages for complex reasons to do with security policies where I work. Not because of the edition of R that I use.] — Jeremy Miles
– Jeremy Miles, Commented Oct 4, 2019 at 19:36

EdM · Accepted Answer · 2019-10-07 16:29:25Z

Let's examine the well accepted statistical definitions of "power," "power analysis," and "post-hoc," using this site's tag information as a guide.

Power

is a property of a hypothesis testing method: the probability of rejecting the null hypothesis given that it is false, i.e. the probability of not making a type II error. The power of a test depends on sample size, effect size, and the significance (𝛼) level of the test.

Let's ignore for now the post-hoc issue. From that definition you can see that either of your approaches to power could be considered "correct": Method 1 is based on a significance (𝛼) level of 0.05, while Method 2 is based on the significance (𝛼) level that you happened to find, about 0.17.
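For the pooled-variance (non-Welch) two-sample t-test, this dependence on sample size, effect size, and 𝛼 can be written out explicitly. The block below is the standard noncentral-t expression, given as a sketch rather than as the formula from any particular paper. The symbols $d$ (standardized mean difference), $\nu$, $\lambda$, and $t_c$ are notation introduced here for illustration.

```latex
\nu = n_1 + n_2 - 2, \qquad
\lambda = d \sqrt{\frac{n_1 n_2}{n_1 + n_2}}, \qquad
t_c = t_{1-\alpha/2,\,\nu}
\\[6pt]
\text{power} = \Pr\left(T > t_c\right) + \Pr\left(T < -t_c\right),
\qquad T \sim t_{\nu}(\lambda)
```

Here $t_{\nu}(\lambda)$ is the noncentral t distribution with $\nu$ degrees of freedom and noncentrality parameter $\lambda$. In these terms, both Methods substitute the observed Cohen's d for $d$. Method 1 sets $\alpha = 0.05$ and Method 2 sets $\alpha$ equal to the observed p-value.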

For what is useful, however, consider power analysis:

An inquiry into the quality of a statistical test by calculating the power - the probability of rejecting the null hypothesis given that it is false - under certain circumstances. Power analysis is often used when planning a study to determine the sample size required to achieve a nominal level of power (e.g. 80%) for a given effect size.

In the design phase of a study, where the importance of power analysis is unquestioned, you attempt to estimate the number of cases needed to detect a "statistically significant" effect. This typically means basing the calculations on a significance (𝛼) level of 0.05. It would be hard to come up with any rationale for choosing instead a level of 0.17. So for power analysis in the a priori design-phase of a study your Method 1 would be the only one to make sense.

Now consider post-hoc:

"Post-hoc" refers to analyses that are decided upon after the data has been collected, as opposed to "a priori".

We need to distinguish 2 types of post-hoc analysis related to power calculations. One is to treat the just-completed study as a pilot study to inform the design of a more detailed study. You use the observed difference between the groups and the observed variance of the difference as estimates of the true population values. Based on those estimates, you determine the sample size needed in a subsequent study to provide adequate power (say, 80%) to detect a statistically significant difference (say, 𝛼 < 0.05). That's quite appropriate. That is "post-hoc" in the sense of being based on already obtained data, but it is used to inform the design of the next study.
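A minimal sketch of that design-phase use, using `power.t.test` from base R's stats package. The values of `delta_hat` and `sd_hat` are hypothetical pilot estimates chosen only for illustration. They are not taken from the question's data.

```r
# Planning the next study from pilot estimates.
# ASSUMPTION: delta_hat and sd_hat are made-up illustrative values.
delta_hat <- 1.0   # hypothetical estimated difference in means
sd_hat    <- 1.0   # hypothetical estimated common standard deviation

# Solve for the per-group sample size giving 80% power at alpha = 0.05
plan <- power.t.test(delta = delta_hat, sd = sd_hat,
                     sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")

# Round up to whole animals per group
n_per_group <- ceiling(plan$n)
print(plan)
print(n_per_group)
```

The point of the sketch is that the unknown being solved for is the sample size of a future study, not a "power" of the study already completed.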

In most cases, however, that is not how the phrase "post-hoc power analysis" is used or the way you are using the phrase. You (and many others) seek to plug into a formula to determine some type of "power" of the study and analysis you have already done.

This type of "post-hoc power analysis" is fundamentally flawed, as noted for example by Hoenig and Heisey in The Abuse of Power. They describe two variants of such analysis. One is the "observed power," "that is, assuming the observed treatment effects and variability are equal to the true parameter values, the probability of rejecting the null hypothesis." (Note that this null hypothesis is typically tested at 𝛼 < 0.05, your Method 1, and is based on the sample size at hand. This seems to be what you have in mind.) Yet this "observed power" calculation adds nothing:

Observed power can never fulfill the goals of its advocates because the observed significance level of a test ("p value") also determines the observed power; for any test the observed power is a 1:1 function of the p value.

That's the point that Jeremy Miles makes with his example calculations based on your two Methods. In this type of post-hoc analysis, neither Method adds any useful information. That's why you find both of us effectively saying that there is no "correct" post-hoc power calculation code. Yes, you can plug numbers correctly into a formula, but to call the analysis "correct" from a statistical perspective would be an abuse of terminology.
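The 1:1 relationship can be made concrete with a short base-R sketch. It assumes a two-sided, equal-variance (non-Welch) two-sample t-test with Cohen's d computed from the pooled standard deviation, so that the plugged-in noncentrality parameter equals the observed |t|. The function name `observed_power` is hypothetical, and the sample sizes 9 and 11 are those of the question.

```r
# Observed power computed from the p-value and sample sizes alone.
# ASSUMPTION: pooled-variance two-sample t-test, two-sided; not the Welch test.
observed_power <- function(p, n1, n2, alpha = 0.05) {
  df     <- n1 + n2 - 2
  t_obs  <- qt(1 - p / 2, df)       # |t| that yields this two-sided p-value
  t_crit <- qt(1 - alpha / 2, df)   # critical value at level alpha
  # Noncentral t with ncp set to the observed |t| (observed effect taken as true)
  pt(t_crit, df, ncp = t_obs, lower.tail = FALSE) +
    pt(-t_crit, df, ncp = t_obs)
}

p_grid <- c(0.001, 0.01, 0.05, 0.2, 0.5)
powers <- sapply(p_grid, observed_power, n1 = 9, n2 = 11)
print(data.frame(p = p_grid, observed_power = round(powers, 3)))
```

No data enter the function beyond p, n1, and n2. With the sample sizes fixed, the observed power falls as p rises and is close to one half at p = 0.05.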

There is a second (ab)use of power calculations post-hoc, which does not seem to be what you have in mind but which should be addressed for completeness: "finding the hypothetical true difference that would have resulted in a particular power, say .9." Hoenig and Heisey show that this approach can lead to nonsensical conclusions, based on what they call:

the “power approach paradox” (PAP): higher observed power does not imply stronger evidence for a null hypothesis that is not rejected.

So the statistical advice (which is what one should expect from this site) is to refrain from post-hoc power tests in the sense that you wish to use them.

Thank you for your comment. I'm sorry, but as I said in the main body, verbal explanations are not what I am asking for. They can be found in any editorial by googling. Please show me formulas or code instead of words. Please break the words down into formulas. — Blue Various
– Blue Various, Commented Oct 11, 2019 at 3:30
I know that there is no "correct" post-hoc analysis, as the mass-produced editorials so often proclaim. By "correct post-hoc analysis" I mean "the post-hoc analysis that many people criticize." I added this to the main body with emphasis because my wording was bad. I want to know why there is a one-to-one correspondence between p-value and power, even though no formula or code is given. It is said to be one-to-one, yet it seems to depend on the effect size and sample size ... — Blue Various
– Blue Various, Commented Oct 11, 2019 at 3:31
In post hoc power, you know the effect size and sample size. They don't change. Hence there is a 1:1 correspondence. — Jeremy Miles
– Jeremy Miles, Commented Oct 11, 2019 at 15:12
@BlueVarious p-values map to effect sizes at fixed sample size and test type. If sample size doesn't matter in the original test (e.g., Z-test) neither does the 1:1 relationship between p-value and "post-hoc power"; Hoenig and Heisey show a graph for 1-sided Z-tests. For tests where sample size matters (t-tests and F-tests) Russ Lenth has corresponding tables and formulas here. How to handle unequal variances or sample sizes is inherent in the tests themselves and has nothing extra to do with power calculations. — EdM
– EdM, Commented Oct 18, 2019 at 14:50

Jeremy Miles · Accepted Answer · 2019-10-05 05:16:33Z

Here's the thing. Post hoc power tells you the probability that you would have detected a significant result, based on the result that you have. That is, if the estimate that you just found is the population parameter, what is the probability that another study, which is exactly the same as the study you did, will obtain a statistically significant result.

If your p-value is 0.05, your post hoc power is 0.5.
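Under a normal (z-test) approximation this can be checked by hand. The following is a sketch under that assumption, with $z_{\text{obs}}$ (the observed test statistic) and $\Phi$ (the standard normal CDF) introduced as notation for the illustration. For a t-test the result holds only approximately.

```latex
p = \alpha \;\Longrightarrow\; |z_{\text{obs}}| = z_{\alpha/2}
\\[6pt]
\text{post hoc power}
  = \Phi\left(|z_{\text{obs}}| - z_{\alpha/2}\right)
  + \Phi\left(-|z_{\text{obs}}| - z_{\alpha/2}\right)
  = \Phi(0) + \Phi\left(-2 z_{\alpha/2}\right)
  \approx 0.5
```

With $\alpha = 0.05$, $z_{\alpha/2} \approx 1.96$, so the second term is $\Phi(-3.92)$, which is negligible next to $\Phi(0) = 0.5$.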

In your first analysis (the one with sig.level = pwel, which is Method 2 in your question), you ask "What is the power to detect an effect, if I use an alpha that is equal to the p-value that I found, and the effect size that I found?" The answer is:

 power = 0.4985284

i.e. within precision limits of 0.50.

The second analysis (sig.level = 0.05, which is Method 1 in your question) says "What's the probability I would get a significant effect, given the effect I found". You had a very low p-value, so you have lots and lots of power. Hence power is 1.00.

Let's try it again with different data:

    # Load data
    Group_A = c(40.2, 40.4, 40.6, 40.8, 41.0, 41.2, 41.4, 41.6, 41.8)
    Group_B = c(40.2, 40.4, 40.6, 40.8, 41.0, 41.2, 41.4, 41.6, 41.8, 31.9, 32.1)

The t-test is not statistically significant:

 p-value = 0.1741

Hence, the first power estimate tells me that my power is less than 50%.

    > pwr.t2n.test(n1 = 9, n2= 11, d = cd$estimate, sig.level = 0.05, power = NULL,
    +              alternative = c("two.sided"))

         t test power calculation

                 n1 = 9
                 n2 = 11
                  d = 0.5923485
          sig.level = 0.05
              power = 0.2389704

The second analysis tells me that my power, if I use the same alpha as I found, is (approximately) 50%.

    > pwr.t2n.test(n1 = 9, n2= 11, d = cd$estimate, sig.level = pwel, power = NULL,
    +              alternative = c("two.sided"))

         t test power calculation

                 n1 = 9
                 n2 = 11
                  d = 0.5923485
          sig.level = 0.1740843
              power = 0.4740473
        alternative = two.sided
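For readers who cannot install the packages, the two outputs above can be reproduced with base R only. This sketch assumes that `cohen.d` uses the pooled standard deviation by default and that `pwr.t2n.test` evaluates a noncentral t distribution with n1 + n2 - 2 degrees of freedom. That is my understanding of the two packages, not something stated in this thread. The helper name `power_t2n` is hypothetical.

```r
# Base-R reproduction of both power calculations for the example data.
# ASSUMPTION: pooled-SD Cohen's d and a pooled-df noncentral t for the power.
Group_A <- c(40.2, 40.4, 40.6, 40.8, 41.0, 41.2, 41.4, 41.6, 41.8)
Group_B <- c(40.2, 40.4, 40.6, 40.8, 41.0, 41.2, 41.4, 41.6, 41.8, 31.9, 32.1)
n1 <- length(Group_A)
n2 <- length(Group_B)

# Cohen's d from the pooled standard deviation
sd_pooled <- sqrt(((n1 - 1) * var(Group_A) + (n2 - 1) * var(Group_B)) /
                    (n1 + n2 - 2))
d <- (mean(Group_A) - mean(Group_B)) / sd_pooled

# Welch p-value (the default of t.test)
p_welch <- t.test(Group_A, Group_B)$p.value

# Two-sided power of the pooled two-sample t-test at a given significance level
power_t2n <- function(d, n1, n2, sig.level) {
  df     <- n1 + n2 - 2
  ncp    <- d * sqrt(n1 * n2 / (n1 + n2))
  t_crit <- qt(1 - sig.level / 2, df)
  pt(t_crit, df, ncp = ncp, lower.tail = FALSE) + pt(-t_crit, df, ncp = ncp)
}

power_alpha <- power_t2n(d, n1, n2, sig.level = 0.05)     # alpha fixed at 0.05
power_p     <- power_t2n(d, n1, n2, sig.level = p_welch)  # alpha set to observed p
print(c(d = d, p_welch = p_welch, power_alpha = power_alpha, power_p = power_p))
```

Note the mismatch this makes visible: the p-value comes from the Welch test, while the power formula assumes the pooled test.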

You get a little closer if you don't use the Welch correction (use var.equal = TRUE in the t-test).

Post hoc power is nonsense because it doesn't tell you anything you didn't already know.

The first analysis you did (Method 1, with alpha fixed at 0.05) is a transformation of p: the lower p, the higher power. This is what is conventionally referred to as post hoc power. The second analysis you did (Method 2, with alpha set to the observed p-value) gives a result of approximately 50%, whatever your data look like.

Thank you for your answer. I have some questions about it. First: what do you mean by "your first analysis," and how did you calculate the "power = 0.4985284" in the first box of your answer? Is what you say the same as what is written in my "post-hoc note (2)"? — Blue Various
– Blue Various, Commented Oct 6, 2019 at 14:18
Second: the value in the third box of your answer seems to have been calculated with my Code 02, with the measured data replaced by the data in the second box of your answer. Is that correct? When I calculated it on my PC, I got "sig.level = 0.1740843, power = 0.4740473". — Blue Various
– Blue Various, Commented Oct 6, 2019 at 14:33
Third: "Hence, the first power estimate tells me that my power is less than 50%." Is this a comment on Method 2? In Method 1, 0.5 does not seem to appear (see Fig. PHN01). Even if it is a comment on Method 2, shouldn't it be "more than 50%" instead of "less than 50%"? (See Fig. PHN02.) — Blue Various
– Blue Various, Commented Oct 6, 2019 at 14:42
@BlueVarious please see this page about post-hoc power calculations: "You’ve got the data, did the analysis, and did not achieve 'significance.' So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis." Just because you can calculate a value doesn't mean that it is meaningful. Use instead what you found to help design an adequately powered study in the future. — EdM
– EdM, Commented Oct 6, 2019 at 18:11
To answer your earlier question, post hoc power analysis is the probability of obtaining a significant result, with the effect you have in your data and the sample size you have in your data. If you have a significant effect, your post hoc power will be over 50%. If you don't, it will be less than 50%. If p = 0.05 power is 50%. — Jeremy Miles
– Jeremy Miles, Commented Oct 7, 2019 at 4:44
