I often have to deal with nonlinear shape optimization problems involving curves or surfaces. A classical example is Plateau's problem.
No matter whether I use gradient-based techniques (such as mean curvature flow or $H^1$-gradient flow) or Newton's method, I find myself having to assemble a relatively large SparseArray in each iteration (in order to solve a system of linear equations afterwards). The point is, the sparsity pattern of these matrices is always the same; it is just the nonzero values that change. The same occurs if one has to solve, e.g., a nonlinear elliptic PDE or a parabolic PDE with time-dependent or state-dependent coefficients (see here for such an example).
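A tiny toy sketch of what I mean (pat0 is a hand-made pattern, unrelated to the mesh example below): two assemblies that differ only in their values yield SparseArrays with identical structural data.

    pat0 = {{1, 1}, {1, 2}, {2, 1}, {2, 2}, {2, 3}, {3, 2}, {3, 3}};
    A1 = SparseArray[pat0 -> RandomReal[1, Length[pat0]], {3, 3}, 0.];
    A2 = SparseArray[pat0 -> RandomReal[1, Length[pat0]], {3, 3}, 0.];
    (* identical row pointers and column indices; only the nonzero values differ *)
    A1["RowPointers"] === A2["RowPointers"] && A1["ColumnIndices"] === A2["ColumnIndices"]

    True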
Here is a typical example: assembling the discrete Laplace-Beltrami operator on a triangle mesh. We start with two functions, one computing the values and one computing the positions at which they have to be added into the final matrix.
    getLaplacian = Quiet@Block[{xx, x, PP, P, UU, U, f, Df, u, Du, v, Dv, g, integrant, quadraturepoints, quadratureweights},
       xx = Table[Part[x, i], {i, 1, 2}];
       PP = Table[Part[P, i, j], {i, 1, 3}, {j, 1, 3}];
       UU = Table[Part[U, i], {i, 1, 3}];
       f = x \[Function] PP[[1]] + x[[1]] (PP[[2]] - PP[[1]]) + x[[2]] (PP[[3]] - PP[[1]]);
       Df = x \[Function] Evaluate[D[f[xx], {xx}]];
       g = x \[Function] Evaluate[Df[xx]\[Transpose].Df[xx]];
       u = x \[Function] UU[[1]] + x[[1]] (UU[[2]] - UU[[1]]) + x[[2]] (UU[[3]] - UU[[1]]);
       Du = x \[Function] Evaluate[D[u[xx], {xx}]];
       integrant = x \[Function] Evaluate[D[Du[xx].Inverse[g[xx]].Du[xx] Sqrt[Abs[Det[g[xx]]]], {UU, 2}]];
       quadraturepoints = {{1/3, 1/3}};
       quadratureweights = {1/2};
       With[{code = N[quadratureweights.Map[integrant, quadraturepoints]] /. Part -> Compile`GetElement},
        Compile[{{P, _Real, 2}},
         code,
         CompilationTarget -> "C",
         RuntimeAttributes -> {Listable},
         Parallelization -> True,
         RuntimeOptions -> "Speed"
         ]
        ]
       ];

    getLaplacianCombinatorics = Block[{ff},
       With[{code = Flatten[Table[{Compile`GetElement[ff, i], Compile`GetElement[ff, j]}, {j, 1, 3}, {i, 1, 3}], 1]},
        Compile[{{ff, _Integer, 1}},
         code,
         CompilationTarget -> "C",
         RuntimeAttributes -> {Listable},
         Parallelization -> True,
         RuntimeOptions -> "Speed"
         ]
        ]
       ];
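Just to illustrate what these two compiled functions return (this check is not part of the actual pipeline): applied to a single triangle, getLaplacian produces the 3×3 local stiffness matrix, and getLaplacianCombinatorics the nine index pairs into which its entries have to be scattered (shown here for a triangle with arbitrary vertex indices 17, 5, 23). For the unit right triangle, the result should be twice the familiar cotangent weights; the factor 2 stems from taking the Hessian of the quadratic form. Both functions are Listable, so they also map over whole lists of triangles, which is how they are used below.

    getLaplacian[{{0., 0., 0.}, {1., 0., 0.}, {0., 1., 0.}}]
    (* expected: {{2., -1., -1.}, {-1., 1., 0.}, {-1., 0., 1.}} *)

    getLaplacianCombinatorics[{17, 5, 23}]
    (* {{17, 17}, {5, 17}, {23, 17}, {17, 5}, {5, 5}, {23, 5}, {17, 23}, {5, 23}, {23, 23}} *)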
Now, let's create a very fine triangle mesh on my beloved "Triceratops".

    R = ExampleData[{"Geometry3D", "Triceratops"}, "MeshRegion"];
    R = DiscretizeRegion[R, MaxCellMeasure -> {1 -> 0.01}];
    MeshCellCount[R, 0]
    MeshCellCount[R, 2]

    666191

    1332378
And this is how the matrix gets assembled:
    tuples = MeshCells[R, 2, "Multicells" -> True][[1, 1]];

    pat = Flatten[getLaplacianCombinatorics[tuples], 1]; // RepeatedTiming // First

    0.21

    vals = Flatten[getLaplacian[Partition[MeshCoordinates[R][[Flatten[tuples]]], 3]]]; // RepeatedTiming // First

    0.20

    A = With[{spopt = SystemOptions["SparseArrayOptions"]},
       Internal`WithLocalSettings[
        SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> Total}],
        SparseArray[pat -> vals, {MeshCellCount[R, 0], MeshCellCount[R, 0]}, 0.],
        SetSystemOptions[spopt]
        ]
       ]; // RepeatedTiming // First

    0.72
What catches the eye is that the mere assembly of the matrix takes several times longer than the actual number crunching in getLaplacian. Now suppose the surface has moved slightly during the optimization process, so that the MeshCoordinates have changed while all the combinatorics (the MeshCells) stay the same. That means we can reuse pat, but we have to (1) recompute vals and (2) reassemble the matrix A.
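So the step that recurs in every iteration of the optimization loop looks roughly like this (a sketch; newCoords is a stand-in for the updated vertex positions, everything else is as defined above):

    newCoords = MeshCoordinates[R]; (* in practice: the coordinates of the moved surface *)
    vals = Flatten[getLaplacian[Partition[newCoords[[Flatten[tuples]]], 3]]];
    A = With[{spopt = SystemOptions["SparseArrayOptions"]},
       Internal`WithLocalSettings[
        SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> Total}],
        SparseArray[pat -> vals, {MeshCellCount[R, 0], MeshCellCount[R, 0]}, 0.],
        SetSystemOptions[spopt]
        ]
       ];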
Since timing is often crucial in these applications and the assembly of the matrix accounts for about 80% of the total time, I wonder whether this can be improved.
Comments:

ord = Ordering[pat]; po = pat[[ord]]; vo = vals[[ord]];

How long does LinearSolve then take compared to the assembly?

SparseArray[po -> _, {MeshCellCount[R, 0], MeshCellCount[R, 0]}, 0.]; // RepeatedTiming // First

LinearSolve[A, Method -> "Pardiso"] (for a matrix A of the same size but that has no nullspace) takes about 2.2 seconds. However, when I use MKL Pardiso over LibraryLink and reuse the symbolic factorization, the numerical factorization after refreshing the nonzero values needs only 0.4 seconds, so roughly half as long as the assembly.