I would propose Introduction to Mathematical Statistics and Its Applications (5th Edition) by Larsen & Marx. At 768 pages it contains a lot, including a full course in probability. The early probability chapters also use plenty of real-data examples. For instance, section 4.2 on the Poisson distribution uses real data (and real controversies) from epidemiology. While it proves the Poisson limit of the binomial, it also investigates its adequacy numerically for finite $n$ and small $p$.
This style, with theory and proofs mixed with real-data examples, is used throughout the book.
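That kind of numerical check is easy to reproduce yourself. Here is a minimal sketch comparing the exact binomial pmf with its Poisson approximation (the particular values of $n$ and $p$ are mine for illustration, not the book's):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Compare the two for a modest n and small p, matching the mean lam = n * p.
n, p = 20, 0.1   # illustrative values, not from the book
lam = n * p
for k in range(5):
    print(f"k={k}: binomial={binom_pmf(k, n, p):.4f}  "
          f"poisson={poisson_pmf(k, lam):.4f}")
```

Even at $n = 20$ the two pmfs agree to within a couple of percentage points, which is the sort of finite-$n$ adequacy the book examines.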
The first edition of this book was the text in the first stats course I ever took, and I found it quite readable at the time. I have to say that some of my fellow students disagreed, mainly because of their math background. But if your math is up to the task, you should like this book.
The reviews on Amazon are mostly positive, though not all. I suspect the negative reviews come from readers whose math background is not strong enough. I copy one review by an instructor who actually used the book:
I have just finished teaching a year of probability and statistics out of the fourth edition of this text. As I was teaching the course, it became clear just how difficult it is to write a mathematically rigorous undergraduate text in mathematical statistics. I selected this book because it seemed to be the best of the department-recommended texts. For example, it is a bit more rigorous than Wackerly, Mendenhall, and Scheaffer's Mathematical Statistics with Applications; significantly more rigorous than Devore's Probability and Statistics for Engineering and the Sciences; and just as rigorous but apparently more popular than Hogg and Tanis's Probability and Statistical Inference (8th Edition). The book is readable and well-written, and I'll probably use it again if and when I teach the sequence. The authors, as Jay I. Simon pointed out in an earlier review, have a sense of humor. For example, a random walk problem begins with the following sentence: "A somewhat inebriated conventioneer finds himself in the embarrassing predicament of being unable to predetermine whether his next step will be forward or backward." There are several other examples of humor: for instance, the authors discuss an airline known as Doomsday Airlines. The reason that I give the book only four stars is that the rigor is on occasion illusory, as Glitzer pointed out in another review. Here is a chapter-by-chapter review.
Preface: The authors claim that the first 7 chapters can easily be covered in one semester. I don't agree with this statement. We covered the first four chapters and part of the fifth, and very few of my students suggested that I was going slowly.
Chapter 1: This is an historical introduction. I don't know about the accuracy of the history (although I believe it is accurate), but the authors tell a good story. The treatment of the golden ratio is problematic: their definition inverts one of the ratios, so their value is the reciprocal of the usual golden ratio. That would not matter much in itself, but the continued fraction representation they give converges to the usual golden ratio, not to its reciprocal.
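For reference (this is the standard convention, not a quotation from the book): the usual golden ratio is

$$\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618\ldots, \qquad \frac{1}{\varphi} = \varphi - 1 \approx 0.618\ldots,$$

and the familiar continued fraction

$$\varphi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}$$

converges to $\varphi$ itself, not to $1/\varphi$, which is the mismatch with the inverted definition.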
Chapter 2: This introduces elementary probability and combinatorics. It is one of the best chapters in the text with excellent examples and a good introduction to the Kolmogorov axiomatic framework which does not get bogged down in measure theoretic details.
Chapter 3: Random variables are introduced in this chapter, the longest in the book. Much of the material here is well done, but the introduction of continuous random variables is a mess. They initially define continuous sample spaces to be those that are uncountable, blatantly disregarding the possibility of mixed distributions. They then define a continuous real-valued random variable to be a function between two subsets of the real numbers and assert without justification that it has a probability density function. The `definition' is in any case simultaneously too restrictive (the input space need not be real) and too general (the observation space of a binomial random variable is also a subset of the real numbers). In the discussion of the relationship between a cdf and a pdf, the authors misapply the fundamental theorem of calculus, since there is no reason to assume that a pdf is continuous. This disregard of basic regularity issues permeates the chapter, usually without comment from the authors. Although there were other factors (many due to me), the confused treatment of continuous random variables contributed to the fact that most of my class never had a clear idea of what a random variable was. Despite these issues, the chapter is still fairly good. The examples and exercises are well done, and not all of them are routine.
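To spell out the regularity point (my gloss, not the book's): if $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$, the fundamental theorem of calculus gives

$$F_X'(x) = f_X(x)$$

only at points $x$ where $f_X$ is continuous. Even the uniform density on $[0,1]$ is discontinuous at $0$ and $1$, and its cdf is not differentiable at those two points, so the identity cannot be asserted everywhere without comment.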
Chapter 4: This chapter is devoted to a discussion of some of the more important distributions. The material is generally of high quality. The central limit theorem is stated and the proof is deferred to an appendix. The appendix starts off by stating that the full proof is beyond the level of the text. While I agree with this, I do not understand why one would devote an appendix to `a proof of the central limit theorem' without giving a proof. This is an example of the illusory nature of the apparent rigor of the text.
Chapter 5: This is a very hard chapter on estimation. The key sections are on maximum likelihood estimators, confidence intervals, unbiasedness, and (perhaps) efficiency. Given the difficulty of the notion of sufficiency, I thought that the authors did an excellent job with it. The optional section on Bayesian estimation is also well done.
Chapter 6: Hypothesis testing is introduced here. The authors routinely state hypothesis tests as theorems starting in this chapter. This seems to be an abuse of the term, and when they `prove' the theorems, they typically show that the hypothesis test is at least approximately a generalized likelihood ratio test (GLRT), which is not the same thing at all. That said, the basic idea of what a hypothesis test actually is, and how to perform one, is explained well.
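For readers who have not met the term: the standard GLRT of $H_0 : \theta \in \Theta_0$ against $H_1 : \theta \in \Theta \setminus \Theta_0$ rejects $H_0$ when the ratio of maximized likelihoods is small,

$$\lambda(\mathbf{x}) = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid \mathbf{x})}{\sup_{\theta \in \Theta} L(\theta \mid \mathbf{x})} \le \lambda^{*},$$

with the cutoff $\lambda^{*}$ chosen to achieve the desired significance level. Showing a test to be (approximately) a GLRT is thus a derivation of the test, not a theorem about it.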
Chapter 7: The basic t and chi-square tests are introduced here. By the time the hypothesis tests are actually stated, they are pretty obvious, which makes it strange that appendices are devoted to their proofs. As noted above, the tests are shown in the appendices to be (at least approximately) GLRTs. I did like the derivation of the various sampling distributions.
Chapter 8: This chapter discusses how to classify data. Although it comes at an appropriate place in the discussion, it might be better to have it earlier so that more students have a chance to consider it in a classroom setting.
Chapter 9: This chapter discusses two-sample data. It's pretty vanilla.
Chapter 10: Here we look at goodness-of-fit tests. The discussion is nice, although I think more attention should have been paid to the categorical distribution rather than simply leaping to the general multinomial distribution.
Chapter 11: At this point, the examples and exercises become much more computationally intensive, since this chapter discusses regression, covariance, and the bivariate normal distribution. I think this is one of the better chapters in the text, although a linear-algebraic point of view for the multivariate normal distribution would have made an elegant addition.
Chapter 12: ANOVA is now introduced. Given the complexity of the setup, the authors give a very nice exposition.
We did not have time to discuss chapters 13 (randomized block design) or 14 (non-parametric statistics). My impression is that they are less rigorous but give a good overall view of the basic ideas.
All in all, I would recommend this book to other instructors, and will recommend (actually require) it for future prob/stat students. The book appears to be at about the right level and is superior to the competition. That is despite the confused treatment of continuous random variables and the insistence on stating hypothesis tests as theorems.