The ellipse is given by
$ (r - r_0)^T Q (r - r_0) = 1 $
where
$ Q = R D R^T $
with
$ D = \begin{bmatrix} \dfrac{1}{a^2} & 0 \\ 0 & \dfrac{1}{b^2} \end{bmatrix} $
and
$ R = \begin{bmatrix} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$
Hence,
$Q = \begin{bmatrix} \dfrac{1}{a^2} \cos^2 \theta + \dfrac{1}{b^2} \sin^2 \theta & \cos \theta \sin \theta \left(\dfrac{1}{a^2} - \dfrac{1}{b^2}\right) \\ \cos \theta \sin \theta \left(\dfrac{1}{a^2} - \dfrac{1}{b^2}\right) & \dfrac{1}{a^2} \sin^2 \theta + \dfrac{1}{b^2} \cos^2 \theta \end{bmatrix} $
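As a sanity check, $Q$ can be assembled numerically from $a$, $b$, $\theta$ and compared entry by entry with the closed form above (a NumPy sketch; the function name is my own):

```python
import numpy as np

def ellipse_Q(a, b, theta):
    """Shape matrix Q = R D R^T for semi-axes a, b rotated by theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])
    D = np.diag([1.0 / a**2, 1.0 / b**2])
    return R @ D @ R.T
```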
$r_0$ is the center of the best-fit ellipse. To find this ellipse, define the equation-error function
$ E = \displaystyle \sum_{i} \bigg( (r_i - r_0)^T Q (r_i - r_0) - 1 \bigg)^2 $
Next, find the partial derivatives of $E$ with respect to $r_0$ and $\theta$
Define $v_i = r_0 - r_i$ (the sign flip relative to $r_i - r_0$ leaves the quadratic form unchanged).
Then
$ E = \displaystyle \sum_{i} ( v_i^T Q v_i - 1)^2 $
$\nabla_{r_0} E = 4 \displaystyle \sum_i \bigg( v_i^T Q v_i - 1 \bigg) Q v_i $
and
$ \dfrac{\partial E}{\partial \theta} = 2 \displaystyle \sum_i \bigg( v_i^T Q v_i - 1 \bigg) \bigg( v_i^T \dfrac{d Q}{d \theta} v_i \bigg) $
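Both derivatives can be verified against finite differences. The sketch below builds $dQ/d\theta$ from $R$ and $D$ and returns the gradient $(\partial E/\partial r_{0x},\ \partial E/\partial r_{0y},\ \partial E/\partial\theta)$; the function name is my own:

```python
import numpy as np

def grad_E(points, r0, a, b, theta):
    """Gradient of E with respect to (r0x, r0y, theta)."""
    c, s = np.cos(theta), np.sin(theta)
    R  = np.array([[c, -s], [s,  c]])
    dR = np.array([[-s, -c], [c, -s]])            # dR/dtheta
    D  = np.diag([1.0 / a**2, 1.0 / b**2])
    Q  = R @ D @ R.T
    dQ = dR @ D @ R.T + R @ D @ dR.T              # dQ/dtheta
    V  = r0 - points                              # v_i stacked as rows
    q  = np.einsum('ij,jk,ik->i', V, Q, V) - 1.0  # v_i^T Q v_i - 1
    g_r0    = 4.0 * (q[:, None] * (V @ Q)).sum(axis=0)   # 4 sum (q_i) Q v_i
    g_theta = 2.0 * np.sum(q * np.einsum('ij,jk,ik->i', V, dQ, V))
    return np.array([g_r0[0], g_r0[1], g_theta])
```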
In addition to the first derivatives of $E$, we need the second partial derivatives, which constitute the Jacobian of the gradient (the Hessian of $E$).
We have
$ J_{r_0} = 4 \displaystyle \sum_i \bigg( 2 Q v_i v_i^T Q + \bigg( v_i^T Q v_i - 1 \bigg) Q \bigg) $
and
$ \dfrac{\partial \nabla_{r_0} E }{\partial \theta} = 4 \displaystyle \sum_i \bigg( v_i^T \dfrac{dQ}{d\theta} v_i ( Q v_i ) + (v_i^T Q v_i - 1) \dfrac{dQ}{d\theta} v_i \bigg) $
Finally,
$ \dfrac{\partial^2 E}{\partial \theta^2} = 2 \displaystyle \sum_i \bigg( (v_i^T \dfrac{dQ}{d\theta} v_i)^2 + (v_i^T Q v_i - 1) (v_i^T \dfrac{d^2 Q }{d \theta^2} v_i) \bigg) $
So that the overall Jacobian matrix is
$ J = \begin{bmatrix} J_{r_0} & \dfrac{\partial \nabla_{r_0} E }{\partial \theta} \\ \left( \dfrac{\partial \nabla_{r_0} E }{\partial \theta} \right)^T & \dfrac{\partial^2 E}{\partial \theta^2} \end{bmatrix}$
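The $3 \times 3$ Jacobian can be assembled directly from these blocks. Note that $d^2R/d\theta^2 = -R$, so $d^2Q/d\theta^2 = 2\,\dot R D \dot R^T - 2Q$ (NumPy sketch; the function name is my own):

```python
import numpy as np

def hessian_E(points, r0, a, b, theta):
    """3x3 Jacobian of the gradient of E (the Hessian of E)."""
    c, s = np.cos(theta), np.sin(theta)
    R  = np.array([[c, -s], [s,  c]])
    dR = np.array([[-s, -c], [c, -s]])        # dR/dtheta
    D  = np.diag([1.0 / a**2, 1.0 / b**2])
    Q   = R @ D @ R.T
    dQ  = dR @ D @ R.T + R @ D @ dR.T         # dQ/dtheta
    d2Q = 2.0 * dR @ D @ dR.T - 2.0 * Q       # d2Q/dtheta2, using d2R = -R
    J = np.zeros((3, 3))
    for v in (r0 - points):
        q, dq, Qv = v @ Q @ v - 1.0, v @ dQ @ v, Q @ v
        J[:2, :2] += 4.0 * (2.0 * np.outer(Qv, Qv) + q * Q)
        m = 4.0 * (dq * Qv + q * (dQ @ v))    # mixed r0/theta block
        J[:2, 2] += m
        J[2, :2] += m
        J[2, 2]  += 2.0 * (dq ** 2 + q * (v @ d2Q @ v))
    return J
```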
Define the unknowns vector as
$ \mathbf{x} = \begin{bmatrix} r_{0x} \\ r_{0y} \\ \theta \end{bmatrix} $
And the nonlinear vector function
$ \mathbf{y} = \begin{bmatrix} \dfrac{\partial E}{\partial r_{0x} } \\ \dfrac{\partial E}{\partial r_{0y} } \\\dfrac{\partial E}{\partial \theta } \end{bmatrix} $
Start with an initial guess $\mathbf{x}^{(0)}$, and iterate using
$ \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - J_k^{-1} \mathbf{y}^{(k)} $
where $J_k$ and $\mathbf{y}^{(k)}$ are evaluated at $\mathbf{x}^{(k)}$.
Provided the initial guess is close enough, this Newton iteration converges to a stationary point of $E$ within a few iterations.
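Putting everything together, the iteration can be sketched end to end (Python/NumPy; the driver name `fit_ellipse` and the step-size convergence test are my choices, not part of the derivation):

```python
import numpy as np

def fit_ellipse(points, a, b, x0, tol=1e-12, max_iter=100):
    """Newton iteration for the center r0 = (x[0], x[1]) and rotation
    theta = x[2] of an ellipse with fixed semi-axes a, b."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        c, s = np.cos(x[2]), np.sin(x[2])
        R  = np.array([[c, -s], [s,  c]])
        dR = np.array([[-s, -c], [c, -s]])    # dR/dtheta
        D  = np.diag([1.0 / a**2, 1.0 / b**2])
        Q   = R @ D @ R.T
        dQ  = dR @ D @ R.T + R @ D @ dR.T
        d2Q = 2.0 * dR @ D @ dR.T - 2.0 * Q   # uses d2R/dtheta2 = -R
        y = np.zeros(3)                       # gradient of E
        J = np.zeros((3, 3))                  # Hessian of E
        for v in (x[:2] - points):
            q, dq, Qv = v @ Q @ v - 1.0, v @ dQ @ v, Q @ v
            y[:2] += 4.0 * q * Qv
            y[2]  += 2.0 * q * dq
            J[:2, :2] += 4.0 * (2.0 * np.outer(Qv, Qv) + q * Q)
            m = 4.0 * (dq * Qv + q * (dQ @ v))
            J[:2, 2] += m
            J[2, :2] += m
            J[2, 2]  += 2.0 * (dq ** 2 + q * (v @ d2Q @ v))
        step = np.linalg.solve(J, y)          # x <- x - J^{-1} y
        x -= step
        if np.linalg.norm(step) < tol:
            break
    return x, k + 1
```

On data lying exactly on an ellipse with the assumed semi-axes, this should recover the center and angle to machine precision; when $a$ and $b$ do not match the data, the iteration still minimizes $E$, but the resulting ellipse is a least-squares compromise.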
Update:
I tested the above method with $40$ points that lie on an ellipse with semi-major axis $8$ and semi-minor axis $4$, and I set $a = 5, b = 3$. The method converged in $26$ iterations. I plotted the points and the generated ellipse. This is shown in the figure below.

I re-tested the above method, but this time, I took my $40$ points on an ellipse of semi-major axis $5$ and semi-minor axis $3$, and as before, I set $a = 5$, $b = 3$. The method converged in $10$ iterations. The resulting ellipse fits the data exactly as shown below.
