
I am a secondary school student and I'm trying to wrap my head around how people actually create and apply the normal distribution in the real world.

From my limited knowledge, common sense tells me that they collect the raw data and then create a histogram from it. If this histogram is close enough to a perfect bell curve (I'm assuming they determine this with other statistical measures like skewness, etc.), they then take the standard deviation and the mean and plug them into the normal distribution function to get their actual bell curve.

Even though things like height are normally distributed, I'm assuming that in practice, when someone collects data, they might find that the heights are not normally distributed in some places (for example, a town with lots of children). So in my head, I'm assuming scientists make a histogram first to check whether it's roughly close to a bell curve.
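The procedure described above (estimate the mean and standard deviation, then plug them into the normal density) can be sketched in Python. The height data here are simulated stand-ins for real measurements, and the parameter values are made up for illustration:

```python
import numpy as np
from scipy import stats

# Simulated height sample in cm (a real study would use measured data)
rng = np.random.default_rng(0)
heights = rng.normal(loc=170, scale=8, size=500)

# Estimate the two parameters of the normal distribution from the data
mu = heights.mean()
sigma = heights.std(ddof=1)  # sample standard deviation

# Plug the estimates into the normal density to get the fitted bell curve,
# which could then be drawn over a histogram of the raw data
x = np.linspace(heights.min(), heights.max(), 200)
density = stats.norm.pdf(x, loc=mu, scale=sigma)
```

Overlaying `density` on a density-scaled histogram of `heights` is the usual visual check for whether the fitted curve matches the data.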


1 Answer


Not all data are normally distributed, but there are a few properties of normal distributions that make them convenient. I'll pull a quote from McElreath's *Statistical Rethinking*:

The take-home lesson from all of this is that, if all we are willing to assume about a collection of measurements is that they have a finite variance, then the Gaussian distribution represents the most conservative probability distribution to assign to those measurements. But very often we are comfortable assuming something more. And in those cases, provided our assumptions are good ones, the principle of maximum entropy leads to distributions other than the Gaussian. (p. 306)

That is to say, normal (aka Gaussian) distributions are likely to arise from data generated by complex interacting processes. Other distributions are possible but often require making more assumptions.

Speaking as a scientist, the main way we use normal distributions is as the underlying structure of error in statistical models. To use your example, say we want to compare heights between two groups of people, men vs. women. If we are comfortable assuming that height is a normally-distributed variable within each group, we can proceed with a parametric statistical test like a t-test.
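That comparison can be carried out in a few lines. This is a minimal sketch with simulated heights; the group means, spreads, and sample sizes are invented for illustration, and Welch's version of the t-test is used since it does not require equal variances:

```python
import numpy as np
from scipy import stats

# Simulated heights in cm for two groups (made-up parameters)
rng = np.random.default_rng(42)
men = rng.normal(loc=176, scale=7, size=60)
women = rng.normal(loc=163, scale=6, size=60)

# Welch's two-sample t-test: assumes normality within each group,
# but not equal variances between groups
t_stat, p_value = stats.ttest_ind(men, women, equal_var=False)
```

A small `p_value` is evidence against the null hypothesis that the two groups share the same mean height.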

We can examine the normality of each group with a histogram, as you described, or use something like a Shapiro-Wilk test for a formal normality hypothesis test. Additionally, we often justify assuming normality with the Central Limit Theorem if our sample size is sufficient.
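The Shapiro-Wilk check is a one-liner in scipy. A sketch, again with simulated data standing in for real measurements:

```python
import numpy as np
from scipy import stats

# Simulated sample that really is normal (made-up parameters)
rng = np.random.default_rng(1)
sample = rng.normal(loc=170, scale=8, size=100)

# Shapiro-Wilk: the null hypothesis is that the sample
# was drawn from a normal distribution
w_stat, p_value = stats.shapiro(sample)
# A large p-value means we fail to reject normality;
# a small one suggests the data are not normal
```

Note the logic is inverted relative to most tests: here a *large* p-value is the reassuring outcome, since failing to reject normality is what licenses the parametric test.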

  • +1. But I must take issue with McElreath's "most conservative," because it belies assumptions that frequently are (strongly) violated in practice. As an example, for regulating (upper) concentrations of environmental contaminants a Normality assumption is usually the least conservative possible in the sense of being protective of the environment. We cannot (and IMHO should not) divorce our theoretical assumptions from the needs of those who will be using our statistical procedures and recommendations. Commented May 30 at 13:13
