R, 301 287 277 274 222 217 195 bytes
Nothing particularly creative, including the zero buffering of the peripheral elements of the entry matrix:
p=function(x){ w=which.max;d=cbind n=sum(x|1)^.5 s=which(!x^.5%%1) if (sum(s|1)){ y=d(0,t(d(0,x,0)),0) i=w(y*y%in%x[s]) y[!y]=max(x) n=-c(1,n+2) m=i+c(n,-n) j=m[w(-y[m])] z=y[i]=y[i]^.5;y[j]=y[j]*z x=p(y[n,][,n])} x} Nick Kennedy managed a better version of the algorithm as follows:
w=which.max;`~`=cbind;x=scan();while(any(s<-!x^.5%%1)){y=(k=Inf)~t(k~matrix(x,n<-sum(x|1)^.5)~k)~k;i=w(y*y%in%x[s]);m=i+c(-1,-(n=n+2),1,n);y[j]=y[j<-m[w(-y[m])]]*(y[i]=y[i]^.5);x=y[q<-3:n-1,q]};x avoiding the definition of a (recursive) function, plus other nice gains.