If we take $n=1$ and $k=K$, and use the probabilities given by the softmax function, then the two definitions describe the same distribution.
As an example, consider $n=1$ and $K=2$, so the outcome is either $y=0$ or $y=1$. Let $p_0$ denote $\mathbb{P}(y=0)$ and $p_1$ denote $\mathbb{P}(y=1)$. By the Kolmogorov axioms, $p_0 + p_1 = 1$, $p_0 \ge 0$, and $p_1 \ge 0$. When the $p_i$ are fixed (and any repeated draws are independent), a single draw follows a Bernoulli distribution with success probability $p_1$.
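As a quick numeric illustration, here is a minimal sketch using `scipy.stats.bernoulli`; the value of $p_1$ is an arbitrary choice made only for the example:

```python
from scipy.stats import bernoulli

# Arbitrary, fixed probability chosen only for illustration.
p1 = 0.3            # P(y = 1)
p0 = 1.0 - p1       # P(y = 0), forced by the Kolmogorov axioms

dist = bernoulli(p1)
print(dist.pmf(0), dist.pmf(1))       # 0.7 0.3
print(dist.pmf(0) + dist.pmf(1))      # 1.0
```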
In terms of your notation, we know that the $p_i$ are not fixed, but instead vary with the features $x$ and the parameters $\theta$. So we can write $$ p_i = \frac{\exp(f_i(x,\theta))}{\exp(f_0(x,\theta))+\exp(f_1(x,\theta))} $$ where the $f_i$ are the outputs of the neural network.
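For concreteness, here is a minimal NumPy sketch of this two-class softmax; the logit values below are arbitrary placeholders standing in for the network outputs $f_i(x,\theta)$:

```python
import numpy as np

# Placeholder logits standing in for the network outputs f_0(x, theta), f_1(x, theta).
f = np.array([1.2, -0.4])

# Two-class softmax; shifting by the max logit keeps the exponentials numerically
# stable and does not change the result.
z = f - f.max()
p = np.exp(z) / np.exp(z).sum()

print(p)         # approximately [0.832, 0.168]
print(p.sum())   # 1.0
```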
It's somewhat cumbersome, but we can write this in the form of a multinomial distribution; because $n=1$, the multinomial coefficient and most of the other factors simplify immediately.
$$\begin{align} \mathbb P(y=0) &=\frac{1!}{(1-0)!0!} \left(\frac{\exp(f_0(x,\theta))}{\exp(f_0(x,\theta))+\exp(f_1(x,\theta))}\right)^1 \left(\frac{\exp(f_1(x,\theta))}{\exp(f_0(x,\theta))+\exp(f_1(x,\theta))}\right)^0 \\ &= \frac{\exp(f_0(x,\theta))}{\exp(f_0(x,\theta))+\exp(f_1(x,\theta))} \end{align}$$
and similarly for $\mathbb P(y=1)$. The same calculation generalizes directly to $K > 2$ classes, i.e. the multinomial case.
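One way to check this reduction numerically: with `scipy.stats.multinomial`, a single draw ($n=1$) over $K$ classes assigns each one-hot outcome exactly its softmax probability. The logits below are again arbitrary placeholders for the $f_i(x,\theta)$:

```python
import numpy as np
from scipy.stats import multinomial

# Arbitrary placeholder logits for K = 3 classes, standing in for f_i(x, theta).
f = np.array([0.5, -1.0, 2.0])
p = np.exp(f - f.max())
p /= p.sum()                  # softmax probabilities

# With n = 1, the multinomial pmf of each one-hot count vector is just p_i.
for i in range(len(p)):
    counts = np.eye(len(p), dtype=int)[i]            # e.g. [0, 1, 0] for i = 1
    print(multinomial.pmf(counts, n=1, p=p), p[i])   # the two numbers agree
```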
In all of these cases, the $p_i$ are the only quantities being estimated (as functions of the features and parameters); the model assumes that the number of draws $n$ and the number of classes $K$ are known.