Computational issues
Most (perhaps all) of this material is in the Gaussian Processes for Machine Learning book by Rasmussen and Williams.
Computational issues are tricky for GPs. If we proceed naively we need $O(N^2)$ memory just to hold the covariance matrix and (it turns out) $O(N^3)$ operations to invert it. There are a few things we can do to make this more feasible. One option is to note that the quantity we really need is $v$, the solution to $(K + \sigma^2 I)v = Y$ where $K$ is the covariance matrix. The method of conjugate gradients solves this exactly in $O(N^3)$ computations, but if we are satisfied with an approximate solution we can terminate the conjugate gradient algorithm after $k$ steps and get one in $O(kN^2)$ computations. We also don't necessarily need to store the whole matrix $K$ at once, since conjugate gradients only touches $K$ through matrix-vector products.
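Here is a minimal sketch of the truncated conjugate gradient idea in Python. The squared-exponential kernel, the hyperparameters, and the synthetic data below are purely illustrative assumptions, not anything from the text; SciPy's `cg` does the iteration and `maxiter=k` stops it after $k$ matrix-vector products.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
N, k, sigma2 = 2000, 50, 0.1                      # illustrative sizes and noise variance
X = rng.uniform(0.0, 10.0, size=(N, 1))
Y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(N)

def kernel(A, B, lengthscale=1.0):
    # Squared-exponential covariance (an illustrative choice).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

K = kernel(X, X)   # O(N^2) memory; could instead be built block-by-block inside the matvec

# Conjugate gradients only needs matrix-vector products with K + sigma^2 I, each O(N^2).
# Running to convergence recovers the exact O(N^3) solve; maxiter=k truncates it to an
# O(k N^2) approximation of v = (K + sigma^2 I)^{-1} Y.
A = LinearOperator((N, N), matvec=lambda w: K @ w + sigma2 * w)
v, info = cg(A, Y, maxiter=k)
```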
So we've moved from $O(N^3)$ to $O(kN^2)$, but this still scales quadratically in $N$, so we might not be happy. The next option is to work with a subset of the data, say of size $m$, where inverting and storing an $m \times m$ matrix isn't so bad. Of course, we don't want to simply throw away the remaining data. The subset of regressors approach notes that the posterior mean of our GP can be written as a regression of our data $Y$ on $N$ data-dependent basis functions determined by our covariance function; we keep only $m$ of these basis functions (while still using all $N$ observations), which brings us down to $O(m^2 N)$ computations.
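A rough sketch of the subset of regressors posterior mean, under the same illustrative kernel and synthetic data as above, with the $m$-point subset chosen at random (a real implementation would pick it more carefully):

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, sigma2 = 2000, 100, 0.1                     # illustrative sizes and noise variance
X = rng.uniform(0.0, 10.0, size=(N, 1))
Y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(N)

def kernel(A, B, lengthscale=1.0):
    # Squared-exponential covariance (an illustrative choice).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

idx = rng.choice(N, size=m, replace=False)        # illustrative: a random subset of size m
Xm = X[idx]

K_mn = kernel(Xm, X)    # m x N block of the covariance: the m retained basis functions
K_mm = kernel(Xm, Xm)   # m x m

# Subset-of-regressors weights solve (sigma^2 K_mm + K_mn K_nm) alpha = K_mn Y.
# Forming K_mn K_nm costs O(m^2 N); the remaining solve is only O(m^3).
alpha = np.linalg.solve(sigma2 * K_mm + K_mn @ K_mn.T, K_mn @ Y)

# The posterior mean at test inputs is a regression on the m retained basis functions.
Xstar = np.linspace(0.0, 10.0, 200)[:, None]
mu = kernel(Xstar, Xm) @ alpha
```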
A couple of other options exist. We could construct a low-rank approximation to $K$, setting $K = QQ^T$ where $Q$ is $N \times q$ and of rank $q$; it turns out that inverting $K + \sigma^2 I$ in this case can be done, via the matrix inversion lemma, by instead inverting the $q \times q$ matrix $Q^TQ + \sigma^2 I$. Another option is to choose the covariance function itself to be sparse and use conjugate gradient methods - if the covariance matrix is very sparse then each matrix-vector product, and hence the whole solve, is substantially faster.
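A sketch of the low-rank route, using the matrix inversion lemma to reduce the $N \times N$ solve to a $q \times q$ one. The factor $Q$ below is just a random stand-in; in practice it would come from something like a Nyström or truncated eigendecomposition of $K$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, q, sigma2 = 2000, 50, 0.1                      # illustrative sizes and noise variance
Y = rng.standard_normal(N)                        # stand-in for the observations
Q = rng.standard_normal((N, q)) / np.sqrt(q)      # stand-in low-rank factor, K = Q Q^T

# Matrix inversion lemma (Woodbury identity):
# (Q Q^T + sigma^2 I)^{-1} Y = (Y - Q (sigma^2 I_q + Q^T Q)^{-1} Q^T Y) / sigma^2,
# so only a q x q system is ever solved: O(N q^2) work rather than O(N^3).
v = (Y - Q @ np.linalg.solve(sigma2 * np.eye(q) + Q.T @ Q, Q.T @ Y)) / sigma2

# Check against the direct dense solve (feasible here only because N is modest).
v_direct = np.linalg.solve(Q @ Q.T + sigma2 * np.eye(N), Y)
assert np.allclose(v, v_direct)
```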