Revisions to Can this be modeled by a two-sample t-test?

address comment

edited Sep 23, 2013 at 17:55

This is a reasonable approach. Your calculations are correct numerically.

There's one caveat in the 'draw an appropriate conclusion' step here. This is something of a perennial question on here: "how to interpret a p-value".

The short answer is "it's $P(D|H)$", meaning the probability of the data given the hypothesis (which, naturally, was stated before performing the experiment). In your case, it's the probability of drawing a sample of 22 responses at least as extreme as this one, given your prior null hypothesis that the means are the same. Of course, here it's much $<5\%$. This doesn't translate as "sole earners earn more", which is $P(H|D)$. There's a more detailed explanation here.

The null distribution is just the distribution of the statistic of interest when the null hypothesis holds. In your case it's a $t$ distribution. You have calculated a statistic based on certain inputs (means and sds). You then check the probability that the value of this number falls outside a certain range in that distribution. In calculating the statistic you have made the additional assumption that the means and sds describe normally distributed data.
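The idea of a null distribution can be checked by simulation. The sketch below is an illustration only: it repeatedly draws two groups of 11 from the same normal distribution (the common mean of 100 and sd of 16 are arbitrary assumed values), computes the pooled-variance $t$ statistic each time, and records how often it exceeds the upper 5% point of a $t$ distribution with 20 degrees of freedom. If the assumptions hold, that proportion should come out close to 0.05.

```r
set.seed(1)
n <- 11  # responses per group

# Simulate the t statistic when the null is true: both groups share one mean.
# The mean (100) and sd (16) are arbitrary illustrative values.
t_null <- replicate(10000, {
  x <- rnorm(n, mean = 100, sd = 16)
  y <- rnorm(n, mean = 100, sd = 16)
  unname(t.test(x, y, var.equal = TRUE)$statistic)
})

# Proportion of simulated statistics above the one-sided 5% critical value
tail_fraction <- mean(t_null > qt(0.95, df = 2 * n - 2))
tail_fraction
```

A histogram of `t_null` (for example with `hist(t_null)`) gives a picture of the distribution that the observed statistic is compared against.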

If using R you might like to verify things in an empirical case with

set.seed(1)
ms <- rnorm(11, mean=124, sd=18)
m2 <- rnorm(11, mean=95, sd=15)

then call debugonce(t.test) (t.test is a generic, so debugonce(stats:::t.test.default) steps through the method itself) before running

t.test(ms, m2, alternative="greater", var.equal=TRUE)

Better still, look at the source with stats:::t.test.default; you can follow the logic, substitute exact rather than empirical values, and consider some additional ways to approach the problem.
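Substituting exact values might look like the following sketch. It assumes that the means (124 and 95), sds (18 and 15) and group sizes (11 each) used in the rnorm() calls above are the summary figures from the question; the variable names are illustrative and are not taken from t.test.default.

```r
# Summary statistics (assumed to match the question; see the rnorm() calls)
n1 <- 11; mean1 <- 124; sd1 <- 18
n2 <- 11; mean2 <- 95;  sd2 <- 15

# Pooled-variance (equal variance) two-sample t statistic
df_pooled <- n1 + n2 - 2
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df_pooled)
std_error <- sd_pooled * sqrt(1 / n1 + 1 / n2)
t_stat    <- (mean1 - mean2) / std_error   # roughly 4.1

# One-sided p-value for the alternative "group 1 mean is greater"
p_value <- pt(t_stat, df = df_pooled, lower.tail = FALSE)
c(t = t_stat, df = df_pooled, p = p_value)
```

The simulated samples will give somewhat different numbers, because their sample means and sds differ from the values passed to rnorm().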

Update: Now this may seem like splitting hairs, but I think it's an important distinction which often gets overlooked in introductory stats courses. This $t$-test assumes the frequentist worldview. You have made certain assumptions (the samples really are normally distributed, the variances are equal, the means are the same) in generating a statistic. The probability of drawing a statistic this extreme from the $t$ distribution is indeed very low. Hence the probability of observing data that lead to such a statistic is very low under those assumptions. You can use such a finding to "support a claim", and this is widespread in practice. Strictly speaking, however, justifying a statement such as "sole earners earn more" requires that you approach the given data from a Bayesian perspective...

improve code

edited Sep 22, 2013 at 17:05

dardisco

Looks like a reasonable approach. If this is homework, could you please tag it as such?

There's one major caveat in the 'draw an appropriate conclusion' step here. This is something of a perennial question on here: "how to interpret a p-value".

The short answer is "it's $P(D|H)$", meaning the probability of the data given the hypothesis (which, naturally, was stated before performing the experiment). In your case, it's the probability of drawing a sample of 22 responses at least as extreme as this one, given your prior null hypothesis that the means are the same. Of course, here it's much $<5\%$. This doesn't translate as "sole earners earn more", which is $P(H|D)$. There's a more detailed explanation here.

The null distribution is just the distribution of the statistic of interest when the null hypothesis holds. In your case it's a $t$ distribution. You have calculated a statistic based on certain inputs (means and sds). You then check the probability that the value of this number falls outside a certain range in that distribution. In calculating the statistic you have made the additional assumption that the means and sds describe normally distributed data.

If using R you might like to verify things in an empirical case with

set.seed(1)
ms <- rnorm(11, mean=124, sd=18)
m2 <- rnorm(11, mean=95, sd=15)

then call debugonce(t.test) (t.test is a generic, so debugonce(stats:::t.test.default) steps through the method itself) before running

t.test(ms, m2, alternative="greater", var.equal=TRUE)

Better still, look at the source with stats:::t.test.default; you can follow the logic, substitute exact rather than empirical values, and consider some additional ways to approach the problem.

give example in code

edited Sep 22, 2013 at 5:21

dardisco

Looks like a reasonable approach. If this is homework, could you please tag it as such?

There's one major caveat in the 'draw an appropriate conclusion' step here. This is something of a perennial question on here: "how to interpret a p-value".

The short answer is "it's $P(D|H)$", meaning the probability of the data given the hypothesis (which, naturally, was stated before performing the experiment). In your case, it's the probability of drawing a sample of 22 responses at least as extreme as this one, given your prior null hypothesis that the means are the same. Of course, here it's much $<5\%$. This doesn't translate as "sole earners earn more", which is $P(H|D)$. There's a more detailed explanation here.

The null distribution is just the distribution of the statistic of interest when the null hypothesis holds. In your case it's a $t$ distribution. You have calculated a statistic based on certain inputs (means and sds). You then check the probability that the value of this number falls outside a certain range in that distribution. In calculating the statistic you have made the additional assumption that the means and sds describe normally distributed data.

If using R you might like to verify things in an empirical case with

set.seed(1)
ms <- rnorm(11, mean=124, sd=18)
m2 <- rnorm(11, mean=95, sd=15)

then call debugonce(t.test) (t.test is a generic, so debugonce(stats:::t.test.default) steps through the method itself) before running

t.test(ms, m2, alternative="greater")