2
$\begingroup$

I've been trying to learn data science for a while now, and I finished the "Data Scientist Associate" career path on DataCamp. However, as you might expect, the courses don't cover everything (I found a lot of gaps in my knowledge when I worked with real datasets), so I'm reading a couple of books to fill these gaps.

The problem is that I prefer textbooks that cover data science in general without going into too much theoretical detail on each topic (when I need more, I look it up in more specific sources), because I'm not strictly a data scientist. But none of the books I'm reading cover missing data properly. Experimental Design and Data Analysis for Biologists by Quinn and Keough has a missing data section, but it mostly explains what missing data is. And the books I found that focus on missing data are too detailed.

I can deal with the details if there is no alternative, but I'd love to hear your suggestions for books with the right amount of explanation (not too detailed, not too simple).


Here are the books I've looked at so far:

  • Experimental Design and Data Analysis for Biologists (Quinn and Keough) - too simple
  • Practical Statistics for Data Scientists (Bruce, Bruce and Gedeck) - no missing data part
  • Missing Data: Analysis and Design (Graham) - much too detailed
  • Applied Missing Data Analysis (Enders) - my favorite so far but still a bit complex
  • Multiple Imputation of Missing Data (He, Zhang and Hsu) - similar to Enders'
  • Fundamentals of Biostatistics (Rosner) - no missing data part
  • Introduction to the Practice of Statistics (Moore, McCabe and Craig) - no missing data part
$\endgroup$
6
  • 2
    $\begingroup$ These might be of interest: scikit-learn examples on missing data imputation, and the scikit-learn user guide on tools for handling missing data. Missing data handling in pandas. $\endgroup$ Commented Jan 9 at 12:36
  • 1
    $\begingroup$ Chapter 2 (End-to-End Machine Learning Project) of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd edition, by Aurélien Géron covers data cleaning and dealing with missing data. $\endgroup$ Commented Jan 9 at 16:56
  • $\begingroup$ Thank you both. These are not exactly what I was looking for, but they are really great resources. Thanks to your help, though, I was able to find what I was looking for in a couple of papers: "The prevention and handling of the missing data" by Kang and "Missing data: Issues, concepts, methods" by Pham, Pandis and White. $\endgroup$ Commented Jan 10 at 11:22
  • 1
    $\begingroup$ Probably drop the idea of books initially, and start with the common rule of thumb of filling missing values with the average or median value of a feature. After this, look at hot deck imputation (statisticseasily.com/glossario/what-is-hot-deck-imputation) and then at the use of chained equations in Markov chain Monte Carlo (MCMC) simulation. Top-tier biomedical journals (impact > 30) almost always require MCMC if there's missing data. $\endgroup$ Commented Jan 12 at 20:24
  • $\begingroup$ This is very useful; thank you! I'm using books because I have trouble learning abstract concepts, and performing these imputation methods on example datasets helps me get a feel for them in general (like what happens to the data when I do what and why, etc.). So I'm just making sure I learn them right and have the proper resources for when/if I actually have to use them. But I will definitely prioritize these methods first. $\endgroup$ Commented Jan 14 at 11:18
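
The first two methods suggested in the comments above (mean or median fill, then hot deck imputation) can be tried on a toy example to get a feel for what they do to the data. The following is only a minimal sketch using the Python standard library: the `ages` list and the helper names `impute_constant` and `impute_hot_deck` are made up for illustration, and the hot deck variant shown is the simplest one (a random donor drawn from all observed values), whereas practical hot deck procedures usually pick donors from similar records.

```python
import random
import statistics


def impute_constant(values, strategy="mean"):
    """Replace each None with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def impute_hot_deck(values, rng):
    """Replace each None with a randomly drawn observed value (a 'donor').

    Assumption: donors come from the whole column. Real hot deck schemes
    usually restrict donors to records that look similar on other variables.
    """
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical toy feature; None marks a missing entry.
ages = [23, None, 31, 45, None, 52, 38, None, 29, 61]
observed = [v for v in ages if v is not None]

mean_filled = impute_constant(ages, "mean")
median_filled = impute_constant(ages, "median")
hot_deck_filled = impute_hot_deck(ages, random.Random(0))  # seeded for repeatability

# Compare the spread before and after imputation.
print("observed stdev:   ", round(statistics.stdev(observed), 2))
print("mean-filled stdev:", round(statistics.stdev(mean_filled), 2))
print("hot-deck stdev:   ", round(statistics.stdev(hot_deck_filled), 2))
```

Mean fill places every imputed value at the center of the observed data, so the standard deviation of the filled column is smaller than that of the observed values. Hot deck only reuses values that were actually observed, so the filled column stays within the observed range. The pandas and scikit-learn tools mentioned in the first comment (for example `DataFrame.fillna` and `sklearn.impute.SimpleImputer`) provide the mean and median variants for real datasets.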

2 Answers

0
$\begingroup$

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 3rd edition, by Aurélien Géron

It's a good book and it covers how to handle missing values in practice. Dealing with missing values is part of EDA, and there is no single generic strategy for it; the right approach depends on the dataset.

$\endgroup$
0
$\begingroup$

Here is a list of books that cover missing data in data science and analysis:

  1. Applied Missing Data Analysis by Craig K. Enders
  2. Missing Data: Analysis and Design by John W. Graham
  3. Statistical Analysis with Missing Data by Roderick J. A. Little and Donald B. Rubin
  4. Flexible Imputation of Missing Data by Stef van Buuren

Hope this helps.

$\endgroup$
