Suppose I have a dataset with 100 rows, but for one of my columns, titled 'Age', there are NaN values in 14 of the rows. A common approach to dealing with this is to fill in those NaN values with the median or mean of the column, but what is the justification for this? I can agree that the median or mean age is the most 'likely' age for a random datapoint if the age histogram looks vaguely Gaussian, but why shouldn't I instead populate those NaNs with random numbers drawn from a normal distribution centered at that 'most likely' age? Wouldn't that be more realistic? It seems implausible that the 14 people missing from my dataset are *all* the same age, even if it is the most common age. It seems more likely to me that there'd be some spread around that most likely age, just like a normal distribution has.
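
To make the comparison concrete, here is a minimal sketch of both approaches, assuming a pandas DataFrame `df` with an `'Age'` column (the toy data below stands in for the 100-row dataset described above; the column names `Age_median_filled` and `Age_sampled` are just illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy data standing in for the 100-row dataset: 14 'Age' values are NaN.
ages = rng.normal(loc=35, scale=10, size=100).round()
ages[rng.choice(100, size=14, replace=False)] = np.nan
df = pd.DataFrame({"Age": ages})

# Option 1: the common approach -- fill every NaN with one summary statistic.
df["Age_median_filled"] = df["Age"].fillna(df["Age"].median())

# Option 2: the approach described in the question -- draw each replacement
# from a normal distribution centered at the observed mean, with the observed
# standard deviation, so the imputed values carry some spread.
mu, sigma = df["Age"].mean(), df["Age"].std()  # NaNs are skipped by default
n_missing = df["Age"].isna().sum()
df["Age_sampled"] = df["Age"]
df.loc[df["Age_sampled"].isna(), "Age_sampled"] = rng.normal(
    loc=mu, scale=sigma, size=n_missing
)
```

Comparing `df["Age_median_filled"].std()` with `df["Age_sampled"].std()` makes the intuition in the question visible: the single-value fill shrinks the column's variance, while the sampled fill roughly preserves it.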