I have a list of integers dims and a list of SparseArrays bdrs (representing a chain complex $\mathbb{Z}^{d_0}\overset{\partial_1}{\leftarrow}\mathbb{Z}^{d_1}\overset{\partial_2}{\leftarrow}\mathbb{Z}^{d_2}\leftarrow\ldots$).
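For concreteness, the data of the first example below can be built like this (a small sketch using the names `dims` and `bdrs` from the question):

```mathematica
(* A chain complex Z^2 <- Z^3 <- Z^4 as a dimension list plus sparse boundary matrices *)
dims = {2, 3, 4};
bdrs = {
   SparseArray[{{1, 1} -> 5, {2, 2} -> 6, {2, 3} -> 7}, {2, 3}],
   SparseArray[{{1, 2} -> 8, {2, 1} -> 9, {3, 3} -> -1, {3, 4} -> -2}, {3, 4}]
   };
```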
I wish to import/export such data from/to a .txt file (one matrix entry per line, with consecutive boundary matrices separated by blank lines). For instance, the data $$\mathbb{Z}^{2}\xleftarrow{\left[\begin{smallmatrix}5&0&0\\0&6&7\end{smallmatrix}\right]} \mathbb{Z}^{3}\xleftarrow{\left[\begin{smallmatrix}0&8&0&0\\9&0&0&0\\0&0&-1&-2\end{smallmatrix}\right]}\mathbb{Z}^{4}$$ corresponds to a file
2 3 4

1 1 5
2 2 6
2 3 7

1 2 8
2 1 9
3 3 -1
3 4 -2

and $$ \mathbb{Z}^{7}\xleftarrow{0} \mathbb{Z}^{0}\xleftarrow{0} \mathbb{Z}^{5} \xleftarrow{\left[\begin{smallmatrix}0&0\\0&0\\0&0\\0&0\\0&0\\\end{smallmatrix}\right]} \mathbb{Z}^{2}\xleftarrow{\left[\begin{smallmatrix}0&0&0&15\\21&0&0&0\\\end{smallmatrix}\right]} \mathbb{Z}^{4}$$ corresponds to a file
7 0 5 2 4




1 4 15
2 1 21

(the three zero boundary maps appear as empty blocks, i.e. consecutive blank lines).

My solution is:
chcxIn[file_] := Module[{s, dims, bdrs = {}, k = 1, i = 1},
  s = Import["/home/" <> file, "List"];
  s = Map[If[# == "", {}, ImportString[#, "Table"][[1]]] &, s];
  dims = s[[1]];
  s = ParallelMap[If[# == {}, {}, #[[;; 2]] -> #[[3]]] &, s[[3 ;;]], {1}];
  Do[
   If[s[[j]] == {},
    AppendTo[bdrs, SparseArray[s[[i ;; j - 1]], dims[[k ;; k + 1]]]];
    k += 1; i = j + 1;],
   {j, Length@s}];
  Return@{bdrs, dims}];

chcxOut[bdrs_, dims_, file_] := Export["/home/" <> file,
   {StringReplace[ToString@dims, {"{" -> "", "}" -> "", "," -> ""}], ""} ~Join~
    Flatten[Table[ArrayRules[b][[;; -2]] ~Join~ {""} /.
       ({u_, v_} -> w_) :> (ToString[u] <> " " <> ToString[v] <> " " <> ToString[w]),
      {b, bdrs}], 1] ~Join~ {""}, "List"];

However, this is hopelessly inefficient, both in time and in memory. For 50 MB of data, chcxOut needs 65 seconds and 700 MB of RAM, which seems excessive. I wish to deal with files of size 10 GB. Is there an efficient way of doing this?
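One direction that might speed up the export: build the entire numeric table in memory and call Export once, instead of converting the rules to strings one by one. Below is a sketch (untested at the 10 GB scale); `chcxOutFast` is a hypothetical name, it writes the same blank-line-separated layout as above, and it assumes that Export's "Table" writer emits an empty line for a `{}` row:

```mathematica
chcxOutFast[bdrs_, dims_, fileName_] := Module[{body},
  (* one {row, col, value} triple per line; {} rows become blank separator lines *)
  body = Join @@ Table[
     If[Length[b["NonzeroValues"]] == 0,
      {{}},                         (* zero boundary map: empty block *)
      Append[Join[b["NonzeroPositions"], List /@ b["NonzeroValues"], 2], {}]],
     {b, bdrs}];
  Export[fileName, Join[{dims, {}}, body], "Table"]];
```

The point of the design is that Export sees a single rectangular-ish list of numeric rows, so all string conversion happens inside one vectorized call rather than in a Table over ArrayRules.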
Edit: With the help of @HenrikSchumacher, here is an improvement.
chcxIn[fileName_] := Module[{s = OpenRead[fileName], r (*read*), l (*line*), dims, bdrs = {}, k = 0, e = {}},
  dims = ImportString[Read[s, String], "Table"][[1]];
  r := Read[s, Record, NullRecords -> True];
  Monitor[
   If[s =!= $Failed,
    While[l =!= EndOfFile,
     l = r;
     Which[
      l == "0", ,
      l == "", k += 1; AppendTo[bdrs, SparseArray[e, dims[[k ;; k + 1]]]]; e = {},
      True, l = ImportString[l, "Table"][[1]]; AppendTo[e, l[[1 ;; 2]] -> l[[3]]]];
     ]], k];
  Close[s];
  {bdrs, dims}];

chcxOut[bdrs_, dims_, fileName_] := Module[{f = OpenWrite[fileName], w (*write*)},
  w = WriteString[f, ExportString[#, "Table"]] &;
  w@{dims};
  WriteString[f, "\n\n"];
  Monitor[
   Do[
    If[Times @@ dims[[k ;; k + 1]] == 0 || bdrs[[k]]["Density"] == 0,
     w@{0},
     w@Join[bdrs[[k]]["NonzeroPositions"], Partition[bdrs[[k]]["NonzeroValues"], 1], 2]];
    WriteString[f, "\n\n"],
    {k, Length@bdrs}], k];
  Close[f];];

For a 2 MB file, the time and memory performance is: Export 0.1 s / 4 MB, Import 0.2 s / 8 MB, chcxOut 1.1 s / 12 MB, chcxIn 265 s / 9 MB. As we can see, importing from my custom format is still much slower. Hopefully, there is a better way to do this.
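The per-line Read loop is the likely bottleneck: AppendTo makes it quadratic, and each line pays for its own ImportString call. A possible whole-file import is sketched below. `chcxInFast` is a hypothetical name; it assumes Mathematica 11.3+ (for SequenceSplit), that "Table" import turns blank lines into `{}` and the `0` marker lines into `{0}`, and that SparseArray accepts zero dimensions such as `{7, 0}`:

```mathematica
chcxInFast[fileName_] := Module[{tab, dims, blocks, bdrs},
  tab = Import[fileName, "Table"];           (* whole file at once; blank lines -> {} *)
  dims = First[tab];
  blocks = SequenceSplit[tab[[3 ;;]], {{}}]; (* entry blocks between blank lines *)
  bdrs = MapIndexed[
    With[{k = First[#2]},
      If[# === {{0}},                        (* "0" marker: zero boundary map *)
       SparseArray[{}, dims[[k ;; k + 1]]],
       SparseArray[#[[All, 1 ;; 2]] -> #[[All, 3]], dims[[k ;; k + 1]]]]] &,
    blocks];
  {bdrs, dims}];
```

This relies on the blocks appearing in order, one per boundary map, so that the MapIndexed position `k` matches the index into `dims`.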
Comments: Try exporting with "MAT", "MTX", or "HarwellBoeing". "MAT" is a binary format and should produce the smallest files, and "MAT" can handle sequences of matrices.
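Following the comments, a hedged sketch of using a built-in sparse format instead of the custom one (file names are illustrative; it assumes "MTX" round-trips a SparseArray and that "MAT" accepts a list of matrices, as the comments suggest):

```mathematica
b1 = SparseArray[{{1, 1} -> 5, {2, 2} -> 6, {2, 3} -> 7}, {2, 3}];
Export["bdr1.mtx", b1, "MTX"];   (* Matrix Market text format; stores dimensions too *)
Import["bdr1.mtx"]               (* should give back an equivalent SparseArray *)

Export["bdrs.mat", bdrs, "MAT"]; (* binary; per the comments, can hold a sequence of matrices *)
```

Since these formats record the matrix dimensions themselves, the separate `dims` list would be recoverable from the imported matrices.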