An important point that has not been addressed in the previous (excellent) answers is the actual estimation step. Multinomial logit models have a PDF that is easy to integrate, leading to a closed-form expression for the choice probability. The density function of the normal distribution is not so easily integrated, so probit models typically require simulation, especially once there are more than two alternatives. So while both models are abstractions of real-world situations, logit is usually faster to use on larger problems (many alternatives or large datasets).
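To make the speed difference concrete, here is a minimal Python sketch (standard library only) under assumed inputs: three alternatives with made-up systematic utilities, i.i.d. standard Gumbel errors for logit, and i.i.d. standard normal errors for probit. The i.i.d. normal errors amount to an independent probit with identity covariance, which is a simplifying assumption. The function names are my own. The logit probabilities come from one formula, while the probit ones are approximated with a crude frequency simulator that counts how often each alternative has the highest simulated utility. Practical probit estimators use smoother simulators such as GHK, but the cost structure is similar: every probability evaluation needs many draws.

```python
import math
import random

def logit_probs(v):
    """Closed-form multinomial logit probabilities: softmax of the utilities v."""
    m = max(v)  # subtract the max before exponentiating to avoid overflow
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

def gumbel(rng):
    """Standard Gumbel (type I extreme value) draw: the logit error assumption."""
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def std_normal(rng):
    """Standard normal draw: i.i.d. probit error (identity covariance, assumed)."""
    return rng.gauss(0.0, 1.0)

def simulated_probs(v, draw_error, n_draws=50_000, seed=0):
    """Crude frequency simulator: share of draws in which each alternative
    has the highest total utility v[j] + error."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        u = [vj + draw_error(rng) for vj in v]
        counts[u.index(max(u))] += 1
    return [c / n_draws for c in counts]

# Hypothetical systematic utilities beta'x for three alternatives
v = [1.0, 0.5, 0.0]
print("logit, closed form:", logit_probs(v))
print("logit, simulated:  ", simulated_probs(v, gumbel))
print("probit, simulated: ", simulated_probs(v, std_normal))
```

The Gumbel run is only there to show that simulation and the softmax formula agree for logit. With three alternatives the simulator is cheap, but during estimation it has to be rerun at every trial value of $\beta$, which is where probit gets expensive.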
That said, there has been some interesting work by Chandra Bhat on finding fast estimators for probit models.

To see why logit is easier to work with, note that (following Train) the probability of a particular outcome being selected is a function of the predictor variables $x$ and the error term $\varepsilon$:

$$ P = \int I[\varepsilon > -\beta'x]\, f(\varepsilon)\, d\varepsilon $$

where $I[\cdot]$ is an indicator function equal to 1 when the condition in brackets holds (the outcome is selected) and 0 otherwise. Evaluating this integral depends heavily on the assumed density $f(\varepsilon)$: it is the logistic density in a logit model and the normal density in a probit model. For a logit model, this becomes
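$$ \begin{aligned} P &= \int_{\varepsilon=-\beta'x}^{\infty} f(\varepsilon)\, d\varepsilon \\ &= 1 - F(-\beta'x) = 1 - \dfrac{1}{1+\exp(\beta'x)} = \dfrac{\exp(\beta'x)}{1+\exp(\beta'x)} \end{aligned} $$

where $F(z) = 1/(1+\exp(-z))$ is the logistic CDF.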
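As a sanity check, here is a minimal Python sketch (standard library only) that integrates the logistic density numerically over $(-\beta'x, \infty)$ and compares the result with the logistic CDF $\exp(\beta'x)/(1+\exp(\beta'x))$. The midpoint rule, the truncation point `upper`, and the grid size `n` are illustrative choices of mine, not anything from Train. Swapping in the normal density gives the binary probit probability $\Phi(\beta'x)$, which has no elementary closed form; here it is checked against `math.erf`.

```python
import math

def logistic_pdf(e):
    """Standard logistic density f(e) = exp(-e) / (1 + exp(-e))**2 (symmetric in e)."""
    z = math.exp(-abs(e))  # use |e| so large negative e cannot overflow
    return z / (1.0 + z) ** 2

def normal_pdf(e):
    """Standard normal density, for the probit comparison."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)

def prob_by_quadrature(bx, pdf, upper=50.0, n=100_000):
    """Approximate P = integral of pdf from -bx to infinity with the midpoint rule,
    truncating the upper limit at `upper` (the tail beyond it is negligible here)."""
    a = -bx
    h = (upper - a) / n
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(n))

def logit_closed_form(bx):
    """P = 1 - F(-bx) = exp(bx) / (1 + exp(bx)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-bx))

def probit_binary(bx):
    """Binary probit P = Phi(bx): no elementary closed form, but available via erf."""
    return 0.5 * (1.0 + math.erf(bx / math.sqrt(2.0)))

# Hypothetical values of beta'x
for bx in (-2.0, 0.0, 1.5):
    print(bx,
          prob_by_quadrature(bx, logistic_pdf), logit_closed_form(bx),
          prob_by_quadrature(bx, normal_pdf), probit_binary(bx))
```

The binary probit case is still cheap because $\Phi$ is a one-dimensional special function; with more alternatives the probit probability becomes a multivariate normal integral, which is where simulation comes in.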
No such convenient closed form exists for general probit models.