In GDA we assume that the class-conditional density for each of the $K$ possible classes is Gaussian with the same covariance matrix $\mathbf \Sigma$ and different means $\mathbf \mu_k$, i.e. $$p(\mathbf x|C_k) \sim \mathcal N(\mathbf \mu_k,\mathbf \Sigma), \ k=1,\dots,K.$$ By Bayes' theorem we can write $$p(C_k|\mathbf x) = \frac{p(\mathbf x|C_k) \ p(C_k)}{\sum_{j=1}^K p(\mathbf x|C_j)\ p(C_j)} = \frac{\exp(a_k)}{\sum_{j=1}^K \exp(a_j)} $$ where $a_k$ is defined as $$a_k = \ln\bigl(p(\mathbf x|C_k)\, p(C_k)\bigr). $$ What confuses me is that, according to Bishop's book, we can represent $a_k$ as a linear model in the following way $$a_k=\mathbf w_k^T \,\mathbf x + w_{k0} $$ with \begin{equation} \mathbf w_k = \mathbf \Sigma^{-1} \mathbf \mu_k, \quad w_{k0}=-\frac{1}{2} \mathbf \mu_k^T \Sigma^{-1} \mathbf \mu_k + \ln p(C_k) \end{equation} because (according to Bishop)
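To make sure I am not misreading the claim, I checked it numerically: the posterior from the full $a_k = \ln\bigl(p(\mathbf x|C_k)\,p(C_k)\bigr)$ matches the posterior from the linear $a_k = \mathbf w_k^T \mathbf x + w_{k0}$ above. A minimal sketch, with all parameters ($M$, $K$, means, covariance, priors, $\mathbf x$) made up for illustration:

```python
# Compare softmax of the full a_k (with quadratic and normalisation terms)
# against softmax of Bishop's linear a_k = w_k^T x + w_k0.
import numpy as np

rng = np.random.default_rng(42)
M, K = 3, 4                                    # made-up dimensions
mus = rng.normal(size=(K, M))                  # class means mu_k
A = rng.normal(size=(M, M))
S = A @ A.T + M * np.eye(M)                    # shared SPD covariance Sigma
S_inv = np.linalg.inv(S)
priors = np.full(K, 1.0 / K)                   # priors p(C_k)
x = rng.normal(size=M)

def softmax(a):
    e = np.exp(a - a.max())                    # shift by max for numerical stability
    return e / e.sum()

# Full a_k = ln p(x|C_k) + ln p(C_k), Gaussian log-density written out:
a_full = np.array([
    -0.5 * (x - mu) @ S_inv @ (x - mu)
    - 0.5 * M * np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(S))
    + np.log(p)
    for mu, p in zip(mus, priors)
])

# Bishop's linear a_k = w_k^T x + w_k0:
a_lin = np.array([
    (S_inv @ mu) @ x - 0.5 * mu @ S_inv @ mu + np.log(p)
    for mu, p in zip(mus, priors)
])

# Both versions of a_k give the same posterior p(C_k | x):
assert np.allclose(softmax(a_full), softmax(a_lin))
```

So the claim does hold numerically; my question is about the algebra behind it.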
"We see that the $a_k$ are again linear functions of $\mathbf x$ as a consequence of the cancellation of the quadratic terms due to the shared covariances."
I don't see how this cancellation of the quadratic terms due to the shared covariances happens.
In my derivation:
\begin{align} a_k &= \ln\bigl(p(\mathbf x|C_k)\, p(C_k)\bigr)\\ &= \ln p(\mathbf x|C_k) + \ln p(C_k)\\ &=\ln\Bigl((2\pi)^{-\frac{M}{2}}|\mathbf \Sigma|^{- \frac{1}{2}}\Bigr) -\frac{1}{2} (\mathbf x - \mathbf \mu_k)^T \mathbf \Sigma^{-1} (\mathbf x - \mathbf \mu_k)+ \ln p(C_k)\\ &= -\frac{1}{2} \color{red}{\mathbf x^T \mathbf \Sigma^{-1} \mathbf x }+\mathbf x^T\mathbf \Sigma^{-1} \mathbf \mu_k \! -\!\frac{1}{2}\mathbf \mu_k^T\mathbf \Sigma^{-1} \mathbf \mu_k +\ln\Bigl((2\pi)^{-\frac{M}{2}}|\mathbf \Sigma|^{- \frac{1}{2}}\Bigr) + \ln p(C_k) \end{align} I am still left with the quadratic term $\color{red}{\mathbf x^T \mathbf \Sigma^{-1} \mathbf x} $!
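The expansion of the quadratic form itself can be checked numerically, so the algebra up to this point should not be in doubt. A minimal sketch with made-up $\mathbf x$, $\mathbf \mu_k$ and $\mathbf \Sigma$:

```python
# Check: -1/2 (x - mu)^T S^{-1} (x - mu)
#     == -1/2 x^T S^{-1} x + x^T S^{-1} mu - 1/2 mu^T S^{-1} mu
# for a symmetric positive-definite shared covariance S.
import numpy as np

rng = np.random.default_rng(0)
M = 3                              # made-up dimension
x = rng.normal(size=M)
mu = rng.normal(size=M)
A = rng.normal(size=(M, M))
S = A @ A.T + M * np.eye(M)        # symmetric positive-definite covariance
S_inv = np.linalg.inv(S)

lhs = -0.5 * (x - mu) @ S_inv @ (x - mu)
rhs = -0.5 * x @ S_inv @ x + x @ S_inv @ mu - 0.5 * mu @ S_inv @ mu
assert np.isclose(lhs, rhs)        # the expansion holds; the quadratic term is really there
```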
Also, I get the same linear term, but in the bias term $w_{k0}$ I have the additional constant $\ln\Bigl((2\pi)^{-\frac{M}{2}}|\mathbf \Sigma|^{- \frac{1}{2}}\Bigr)$.
How can I cancel the quadratic term and obtain a linear model in the multiclass case?
Additional note: for the binary case ($K=2$) I already know the answer: there we represent the posterior probability with a sigmoid function, and in that case we lose the quadratic term.
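For completeness, here is the binary case I am referring to, as a numerical sketch (all numbers made up): with a shared covariance, the log-odds $a = a_1 - a_2$ is linear in $\mathbf x$, so the sigmoid of a linear function reproduces the posterior exactly.

```python
# Binary case (K = 2): posterior from Bayes' formula directly
# vs. sigmoid of the linear log-odds a = w^T x + w0.
import numpy as np

rng = np.random.default_rng(1)
M = 2                                          # made-up dimension
mu1, mu2 = rng.normal(size=M), rng.normal(size=M)
A = rng.normal(size=(M, M))
S = A @ A.T + M * np.eye(M)                    # shared SPD covariance
S_inv = np.linalg.inv(S)
p1, p2 = 0.4, 0.6                              # priors p(C_1), p(C_2)
x = rng.normal(size=M)

def log_gauss(x, mu):
    """Log-density of N(mu, S) at x, with the shared covariance S."""
    d = x - mu
    return (-0.5 * M * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(S))
            - 0.5 * d @ S_inv @ d)

# Posterior p(C_1 | x) directly from Bayes' formula (quadratic terms still inside):
num = np.exp(log_gauss(x, mu1)) * p1
posterior_direct = num / (num + np.exp(log_gauss(x, mu2)) * p2)

# Posterior via sigmoid of the linear log-odds (quadratic/constant terms cancelled):
w = S_inv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ S_inv @ mu1 + 0.5 * mu2 @ S_inv @ mu2 + np.log(p1 / p2)
posterior_sigmoid = 1.0 / (1.0 + np.exp(-(w @ x + w0)))

assert np.isclose(posterior_direct, posterior_sigmoid)
```

My question is how the analogous cancellation works in the multiclass softmax.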