Okay, after seeing that Newton's method does not work well here, here is my reformulation of OP's algorithm. This implementation finds the same shifts, but it is far more efficient for large datasets.
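
Written out, the idea is as follows: for each coordinate direction $k$, the routine scans the $n+1$ candidate shifts $s \in \{a, a+h, a+2h, \dots, b\}$ with step size $h = (b-a)/n$ and keeps the one that minimizes

$$\sum_i \operatorname{dist}\bigl(y_i - s\,e_k,\ \mathrm{data}\bigr)^2,$$

where the $y_i$ are the reference points (with the shifts already found for the previous directions applied), $e_k$ is the $k$-th unit vector, and $\operatorname{dist}(\cdot,\mathrm{data})$ is the distance to the nearest data point. Shifting the reference by $-s$ is equivalent to shifting the data by $+s$, which is what allows a single `Nearest` function built from `data` to be reused for all candidate shifts.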

 ClearAll[findShift];
 findShift[data_, reference_, {a_, b_, n_Integer}] := 
 Module[{shift, d, h, Y, nf},
 
 (* n+1 = the number of shifting operations per dimension. *)
 (* a = leftmost shift in each direction. *)
 (* b = rightmost shift in each direction. *)
 
 (* The step size.*)
 h = (b - a)/n;
 
 (*Build only one nearest function for all shifting operations.*)
 
 nf = Nearest[data -> "Distance"];
 
 d = Dimensions[data][[-1]];
 (*a vector to store the shifts*)
 shift = ConstantArray[0., d];
 Y = reference;
 Do[
 Y[[All, direction]] -= (a - h);
 shift[[direction]] = MinimalBy[
 (*You can parallelize this loop if you want.*)
 Table[
 (*The trick is to shift the reference, not the data! *)
 (*This allows us to reuse the Nearest function nf.*)
 Y[[All, direction]] -= h;
 {s, Total[nf[Y]^2, 2]}
 , {s, a, b, h}
 ],
 Last
 ][[1, 1]];
 (*Apply optimal shift in current direction.*)
 
 Y[[All, direction]] += b - shift[[direction]];
 , {direction, 1, d}];
 
 (*Finally, apply the shifts to the data once.*)
 
 data + ConstantArray[shift, Length[data]]
 ]

And here is a usage example.

 reference = RandomReal[{-1, 1}, {100, 3}];
 data1 = RandomReal[{-1, 1}, {100000, 3}];
 data1shifted2 = findShift[data1, reference, {-1., 1., 20}]; // AbsoluteTiming // First

> 0.027269

OP's code requires 5.87231 seconds for the same task on my machine, so this is an improvement by a factor of roughly 200.
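
If you want to convince yourself that the shift actually improves the fit, a quick optional check is to compare the objective (sum of squared distances from the reference points to their nearest data points) before and after shifting; the names `nfBefore` and `nfAfter` below are mine and purely for illustration:

 nfBefore = Nearest[data1 -> "Distance"];
 Total[nfBefore[reference]^2, 2]
 
 nfAfter = Nearest[data1shifted2 -> "Distance"];
 Total[nfAfter[reference]^2, 2]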



**Newton's method is not very effective for this type of problem since there are many local minima. I keep the following only for reference.**


The following implementation is basically Newton's method. It assumes that towards the end of the optimization, the assignment of data points to their closest reference points does not change anymore. In that regime, the objective function is quadratic and Newton's method converges in a single step. The Newton search direction is also a good choice outside of this regime, and so we can employ Armijo line search to globalize Newton's method. (Newton's method is notorious for wild behavior outside a basin of attraction. One of the reasons (when the Hessian is positive-definite) is that the _size_ of the step is typically too large.)
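
To spell out where the search direction in the code below comes from: once the assignment of each data point $x_i$ to its closest reference point $y_i$ is fixed, the objective as a function of the translation $v$ is

$$F(v) = \tfrac{1}{2}\sum_{i=1}^{m} \lVert y_i - (x_i + v)\rVert^2,$$

with gradient $\nabla F(v) = \sum_i (x_i + v - y_i)$ and constant Hessian $m\,I$, where $m$ is the number of data points. The Newton step at $v = 0$ is therefore $u = -\frac{1}{m}\sum_i (x_i - y_i)$, which is exactly the `u = -DF/Length[X]` that appears in the code.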

This improves two parts of OP's implementation:

 1. It updates all coordinate directions at once. 

 2. It builds only one `NearestFunction` for the reference data and applies it multiple times. This is an advantage because every time we call `Nearest`, a certain space-partitioning tree has to be built. So while the actual data queries are very fast, there is a certain overhead for building the search index. By calling `Nearest` only once, we have to pay that overhead only once, of course; the timing sketch after this list illustrates the difference.
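
Here is a minimal sketch of that overhead (the sizes and the names `pts`, `queries`, and `nfDemo` are my own and purely for illustration; the exact timings will of course depend on your machine):

 pts = RandomReal[{-1, 1}, {100000, 3}];
 queries = RandomReal[{-1, 1}, {100, 3}];
 
 (*Rebuild the search index for every batch of queries.*)
 First@AbsoluteTiming[Do[Nearest[pts][queries], {10}]]
 
 (*Build the NearestFunction once and reuse it.*)
 nfDemo = Nearest[pts];
 First@AbsoluteTiming[Do[nfDemo[queries], {10}]]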

Here is the actual code:

 (*We have to generate only one NearestFunction for the reference data; then we can apply it repeatedly.*)
 nf = Nearest[reference];
 X = data1;
 
 (*Some hard-coded parameters for the algorithm.*)
 \[Sigma] = 0.5;
 \[Gamma] = 0.5;
 TOL = 1. 10^-8;
 maxbacktrackings = 100;
 maxiterations = 100;
 
 
 Y = nf[X, 1][[All, 1]];
 (*Objective: sum of squares*)
 F = 1/2 Total[(Y - X)^2, 2];
 (*This is the derivative of the objective.*)
 DF = Total[X - Y];
 (*This is the Newton search direction.*)
 u = -DF/Length[X];
 residualsquared = u.u;
 
 
 iter = 0;
 While[residualsquared > TOL^2 && iter < maxiterations,
 iter++;
 (*Armijo backtracking line search "with quadratic fit".*)
 
 (*Eventually, step size t = 1 should work because we use Newton's method.*)
 t = 1.;
 Xt = X + t ConstantArray[u, Length[X]];
 Yt = nf[Xt, 1][[All, 1]];
 Ft = 1/2 Total[(Yt - Xt)^2, 2];
 
 slope = DF.u;
 backtrackings = 0;
 While[Ft > F + t \[Sigma] slope && 
 backtrackings < maxbacktrackings,
 backtrackings++;
 tnew = -\[Sigma] t^2 slope/(Ft - F - t slope);
 t = Max[\[Gamma] t, tnew];
 Xt = X + t ConstantArray[u, Length[X]];
 Yt = nf[Xt, 1][[All, 1]];
 Ft = 1/2 Total[(Yt - Xt)^2, 2]
 ];
 
 (*Now we have found a feasible step size t. Updating data*)
 X = Xt;
 Y = Yt;
 F = Ft;
 (*This is the derivative of the objective.*)
 DF = Total[X - Y];
 (*This is the Newton search direction.*)
 u = -DF/Length[X];
 residualsquared = u.u;
 ];

The result (the optimally translated data) is now stored in `X`.

On the example data, this converges in two iterations.
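
Since only a rigid translation is ever applied to `X`, every row of `X - data1` is the same vector, so the accumulated shift can be read off directly (a small optional check, not part of the algorithm itself):

 (*The total translation that has been applied to data1.*)
 totalShift = X[[1]] - data1[[1]]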



**Edit**

The algorithm above minimizes the sum of squared distances from each data point to its closest point in the reference set. If you want to minimize the sum of squared distances from each reference point to its closest point in the dataset instead, the following should do. Note that the roles are now reversed: the dataset is what moves, so the `NearestFunction` has to be rebuilt in every iteration (`nft = Nearest[Xt]`), while the reference stays fixed:

 Z = reference;
 X = data1;
 nf = Nearest[X];
 (*Some hard-coded parameters for the algorithm.*)
 \[Sigma] = 0.5;
 \[Gamma] = 0.5;
 TOL = 1. 10^-8;
 maxbacktrackings = 20;
 maxiterations = 100;
 
 Y = nf[Z, 1][[All, 1]];
 (*Objective: sum of squares*)
 F = 1/2 Total[(Z - Y)^2, 2];
 Print["initial fit" -> F];
 (*This is the derivative of the objective.*)
 DF = Total[Y - Z];
 (*This is the Newton search direction.*)
 u = -DF/Length[Z];
 residualsquared = u.u;
 
 iter = 0;
 While[
 residualsquared > TOL^2 && iter < maxiterations,
 iter++;
 (*Armijo backtracking line search "with quadratic fit".*)
 (*Eventually, step size t = 1 should work because we use Newton's method.*)
 t = 1.;
 Xt = X + t ConstantArray[u, Length[X]];
 nft = Nearest[Xt];
 Yt = nft[Z, 1][[All, 1]];
 Ft = 1/2 Total[(Z - Yt)^2, 2];
 slope = DF.u;
 backtrackings = 0;
 While[Ft > F + t \[Sigma] slope && backtrackings < maxbacktrackings,
 backtrackings++;
 tnew = -\[Sigma] t^2 slope/(Ft - F - t slope);
 t = Max[\[Gamma] t, tnew];
 Xt = X + t ConstantArray[u, Length[X]];
 nft = Nearest[Xt];
 Yt = nft[Z, 1][[All, 1]];
 Ft = 1/2 Total[(Z - Yt)^2, 2];
 ];
 (*Now we have found a feasible step size t. Updating data*)
 
 X = Xt;
 Y = Yt;
 F = Ft;
 nf = nft;
 (*This is the derivative of the objective.*)
 DF = Total[Y - Z];
 (*This is the Newton search direction.*)
 u = -DF/Length[Z];
 residualsquared = u.u;
 ];
 Print["final fit" -> F];
 Print["residual" -> Sqrt[residualsquared]];