I would propose Introduction to Mathematical Statistics and Its Applications (5th Edition) by Larsen & Marx. At 768 pages it contains a lot, including a full course in probability. The early probability chapters also use plenty of real-data examples. For instance, section 4.2 on the Poisson distribution uses real data (and real controversies) from epidemiology. While it proves the Poisson limit of the binomial, it also investigates its adequacy numerically for finite $n$ and small $p$.
This style, with theory and proofs mixed with real-data examples, is used throughout the book.
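That kind of numerical check is easy to reproduce yourself. Here is a minimal sketch comparing the exact binomial pmf with its Poisson approximation (the particular values of $n$ and $p$ are mine for illustration, not the book's):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Compare the two for a modest n and small p, matching the mean lam = n * p.
n, p = 20, 0.1   # illustrative values, not from the book
lam = n * p
for k in range(5):
    print(f"k={k}: binomial={binom_pmf(k, n, p):.4f}  "
          f"poisson={poisson_pmf(k, lam):.4f}")
```

Even at $n = 20$ the two pmfs agree to within a couple of percentage points, which is the sort of finite-$n$ adequacy the book examines.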
The first edition of this book was the text in the first stats course I ever took, and I found it quite readable at the time. I have to say that some of my fellow students disagreed, mainly because of their math background. But if your math is up to the task, you should like this book.
The reviews on Amazon are mostly positive, though not all. I suspect the negative reviews come from readers whose math background is not strong enough. I copy one review by an instructor who actually used the book:
I have just finished teaching a year of probability and statistics out of the fourth edition of this text. As I was teaching the course, it became clear just how difficult it is to write a mathematically rigorous undergraduate text in mathematical statistics. I selected this book because it seemed to be the best of the department-recommended texts. For example, it is a bit more rigorous than Wackerly, Mendenhall, and Scheaffer's Mathematical Statistics with Applications; significantly more rigorous than Devore's Probability and Statistics for Engineering and the Sciences; and just as rigorous but apparently more popular than Hogg and Tanis's Probability and Statistical Inference (8th Edition). The book is readable and well-written, and I'll probably use it again if and when I teach the sequence. The authors, as Jay I. Simon pointed out in an earlier review, have a sense of humor. For example, a random walk problem begins with the following sentence: "A somewhat inebriated conventioneer finds himself in the embarrassing predicament of being unable to predetermine whether his next step will be forward or backward." There are several other examples of humor: for instance, the authors discuss an airline known as Doomsday Airlines. The reason that I give the book only four stars is that the rigor is on occasion illusory, as Glitzer pointed out in another review. Here is a chapter-by-chapter review.
Preface: The authors claim that the first 7 chapters can easily be covered in one semester. I don't agree with this statement. We covered the first four chapters and part of the fifth, and very few of my students suggested that I was going slowly.
Chapter 1: This is an historical introduction. I don't know about the accuracy of the history (although I believe it is accurate), but the authors tell a good story. The treatment of the golden ratio is problematic: their definition inverts one of the ratios, so their value is the reciprocal of the usual golden ratio. That would not matter much in itself, but the continued fraction representation they give converges to the usual golden ratio, not to its reciprocal.
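For reference (this is the standard convention, not a quotation from the book): the usual golden ratio is

$$\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618\ldots, \qquad \frac{1}{\varphi} = \varphi - 1 \approx 0.618\ldots,$$

and the familiar continued fraction

$$\varphi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}$$

converges to $\varphi$ itself, not to $1/\varphi$, which is the mismatch with the inverted definition.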
Chapter 2: This introduces elementary probability and combinatorics. It is one of the best chapters in the text with excellent examples and a good introduction to the Kolmogorov axiomatic framework which does not get bogged down in measure theoretic details.
Chapter 3: Random variables are introduced in this chapter, the longest in the book. Much of the material here is well done, but the introduction of continuous random variables is a mess. They initially define continuous sample spaces to be those that are uncountable, blatantly disregarding the possibility of mixed distributions. They then define a continuous real-valued random variable to be a function between two subsets of the real numbers and assert without justification that it has a probability density function. The `definition' is in any case simultaneously too restrictive (the input space need not be real) and too general (the observation space of a binomial random variable is also a subset of the real numbers). In the discussion of the relationship between a cdf and a pdf, the authors misapply the fundamental theorem of calculus, since there is no reason to assume that a pdf is continuous. This disregard of basic regularity issues permeates the chapter, usually without comment from the authors. Although there were other factors (many due to me), the confused treatment of continuous random variables contributed to the fact that most of my class never had a clear idea of what a random variable was. Despite these issues, the chapter is still fairly good. The examples and exercises are well done, and not all of them are routine.
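To spell out the regularity point (my gloss, not the book's): if $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$, the fundamental theorem of calculus gives

$$F_X'(x) = f_X(x)$$

only at points $x$ where $f_X$ is continuous. Even the uniform density on $[0,1]$ is discontinuous at $0$ and $1$, and its cdf is not differentiable at those two points, so the identity cannot be asserted everywhere without comment.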
Chapter 4: This chapter is devoted to a discussion of some of the more important distributions. The material is generally of high quality. The central limit theorem is stated and the proof is deferred to an appendix. The appendix starts off by stating that the full proof is beyond the level of the text. While I agree with this, I do not understand why one would devote an appendix to `a proof of the central limit theorem' without giving a proof. This is an example of the illusory nature of the apparent rigor of the text.
Chapter 5: This is a very hard chapter on estimation. The key sections are on maximum likelihood estimators, confidence intervals, unbiasedness, and (perhaps) efficiency. Given the difficulty of the notion of sufficiency, I thought that the authors did an excellent job with it. The optional section on Bayesian estimation is also well done.
Chapter 6: Hypothesis testing is introduced here. The authors routinely state hypothesis tests as theorems starting in this chapter. This seems to be an abuse of the term, and when they `prove' the theorems, they typically show that the hypothesis test is at least approximately a generalized likelihood ratio test (GLRT), which is not the same thing at all. That said, the basic idea of what a hypothesis test actually is, and how to perform one, is explained well.
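For readers who have not met the term: the standard GLRT of $H_0 : \theta \in \Theta_0$ against $H_1 : \theta \in \Theta \setminus \Theta_0$ rejects $H_0$ when the ratio of maximized likelihoods is small,

$$\lambda(\mathbf{x}) = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid \mathbf{x})}{\sup_{\theta \in \Theta} L(\theta \mid \mathbf{x})} \le \lambda^{*},$$

with the cutoff $\lambda^{*}$ chosen to achieve the desired significance level. Showing a test to be (approximately) a GLRT is thus a derivation of the test, not a theorem about it.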
Chapter 7: The basic t and chi-square tests are introduced here. By the time the hypothesis tests are actually stated, they are pretty obvious, which makes it strange that appendices are devoted to their proofs. As noted above, the tests are shown in the appendices to be (at least approximately) GLRTs. I did like the derivation of the various sampling distributions.
Chapter 8: This chapter discusses how to classify data. Although it comes at an appropriate place in the discussion, it might be better to have it earlier so that more students have a chance to consider it in a classroom setting.
Chapter 9: This chapter discusses two-sample data. It's pretty vanilla.
Chapter 10: Here we look at goodness-of-fit tests. The discussion is nice, although I think more attention should have been paid to the categorical distribution rather than simply leaping to the general multinomial distribution.
Chapter 11: At this point, the examples and exercises become much more computationally intensive, since this chapter discusses regression, covariance, and the bivariate normal distribution. I think this is one of the better chapters in the text, although a linear-algebraic point of view for the multivariate normal distribution would have made an elegant addition.
Chapter 12: ANOVA is now introduced. Given the complexity of the setup, the authors give a very nice exposition.
We did not have time to discuss chapters 13 (randomized block design) or 14 (non-parametric statistics). My impression is that they are less rigorous but give a good overall view of the basic ideas.
All in all, I would recommend this book to other instructors, and will recommend (actually require) it for future prob/stat students. The book appears to be at about the right level and is superior to the competition. That is despite the confused treatment of continuous random variables and the insistence on stating hypothesis tests as theorems.