
I am a secondary school student and I'm trying to wrap my head around how people actually create and apply the normal distribution in the real world.

From my limited knowledge, common sense tells me that they collect the raw data and then create a histogram from it. If this histogram is close enough to a perfect bell curve (I'm assuming they determine this with other statistical measures like skewness, etc.), they then take the standard deviation and the mean and plug them into the normal distribution function to get their actual bell curve.

Even though things like height are normally distributed, I'm assuming that in practice, when someone collects data, they might find that the heights are not normally distributed in some places (for example, a town with lots of children). So in my head, I'm assuming scientists make a histogram first to check whether it's roughly close to a bell curve.
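The procedure described above (estimate the mean and standard deviation, then plug them into the normal density) can be sketched in Python. The height data here are simulated stand-ins for real measurements, and the parameter values are made up for illustration:

```python
import numpy as np
from scipy import stats

# Simulated height sample in cm (a real study would use measured data)
rng = np.random.default_rng(0)
heights = rng.normal(loc=170, scale=8, size=500)

# Estimate the two parameters of the normal distribution from the data
mu = heights.mean()
sigma = heights.std(ddof=1)  # sample standard deviation

# Plug the estimates into the normal density to get the fitted bell curve,
# which could then be drawn over a histogram of the raw data
x = np.linspace(heights.min(), heights.max(), 200)
density = stats.norm.pdf(x, loc=mu, scale=sigma)
```

Overlaying `density` on a density-scaled histogram of `heights` is the usual visual check for whether the fitted curve matches the data.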


1 Answer


Not all data are normally distributed, but there are a few properties of normal distributions that make them convenient. I'll pull a quote from McElreath's *Statistical Rethinking*:

The take-home lesson from all of this is that, if all we are willing to assume about a collection of measurements is that they have a finite variance, then the Gaussian distribution represents the most conservative probability distribution to assign to those measurements. But very often we are comfortable assuming something more. And in those cases, provided our assumptions are good ones, the principle of maximum entropy leads to distributions other than the Gaussian. (p. 306)

That is to say, normal (aka Gaussian) distributions are likely to arise from data generated by complex interacting processes. Other distributions are possible but often require making more assumptions.

Speaking as a scientist, the main way we use normal distributions is as the underlying structure of error in statistical models. To use your example, say we want to compare heights between two groups of people, men vs. women. If we are comfortable assuming that height is a normally-distributed variable within each group, we can proceed with a parametric statistical test like a t-test.
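That comparison can be carried out in a few lines. This is a minimal sketch with simulated heights; the group means, spreads, and sample sizes are invented for illustration, and Welch's version of the t-test is used since it does not require equal variances:

```python
import numpy as np
from scipy import stats

# Simulated heights in cm for two groups (made-up parameters)
rng = np.random.default_rng(42)
men = rng.normal(loc=176, scale=7, size=60)
women = rng.normal(loc=163, scale=6, size=60)

# Welch's two-sample t-test: assumes normality within each group,
# but not equal variances between groups
t_stat, p_value = stats.ttest_ind(men, women, equal_var=False)
```

A small `p_value` is evidence against the null hypothesis that the two groups share the same mean height.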

We can examine the normality of each group with a histogram, as you described, or use something like a Shapiro-Wilk test for a formal normality hypothesis test. Additionally, we often justify assuming normality with the Central Limit Theorem if our sample size is sufficient.
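The Shapiro-Wilk check is a one-liner in scipy. A sketch, again with simulated data standing in for real measurements:

```python
import numpy as np
from scipy import stats

# Simulated sample that really is normal (made-up parameters)
rng = np.random.default_rng(1)
sample = rng.normal(loc=170, scale=8, size=100)

# Shapiro-Wilk: the null hypothesis is that the sample
# was drawn from a normal distribution
w_stat, p_value = stats.shapiro(sample)
# A large p-value means we fail to reject normality;
# a small one suggests the data are not normal
```

Note the logic is inverted relative to most tests: here a *large* p-value is the reassuring outcome, since failing to reject normality is what licenses the parametric test.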

  • +1. But I must take issue with McElreath's "most conservative," because it belies assumptions that frequently are (strongly) violated in practice. As an example, for regulating (upper) concentrations of environmental contaminants a Normality assumption is usually the least conservative possible in the sense of being protective of the environment. We cannot (and IMHO should not) divorce our theoretical assumptions from the needs of those who will be using our statistical procedures and recommendations. Commented May 30 at 13:13
