R, 301 287 277 274 222 217 195 186 178 173 bytes
Nothing particularly creative, including the zero buffering of the peripheral elements of the entry matrix, an earlier version later improved by Robin:
function(x){w=which.max if(any(s<-!x^.5%%1)){ y=cbind(NA,rbind(NA,x,NA),NA) z=y[i]=y[i<-w(y*y%in%x[s])]^.5 m=i+c(r<--c(1,nrow(y)),r) y[j]=y[j<-m[w(-y[m])]]*z x=p(y[r,r])} x} Using a sequence of numbers as its entry, and hence removing the call to a function, Nick Kennedy earlier managed a 186 bytes version of the algorithm as follows (with -10 bytes by Robin):
w=which.max;`~`=cbind;x=scan();while(any(s<-!x^.5%%1)){y=NA~t(NA~matrix(x,n<-length(x)^.5)~NA)~NA;i=w(y*y%in%x[s]);=i+c(r<--c(1,n+2),-r);y[j]=y[j<-m[w(-y[m])]]*(y[i]=y[i]^.5);x=y[r,r]};x avoiding the definition of a (recursive) function, plus other nice gains.