0
$\begingroup$

I have a list of integers dims and a list of SparseArrays bdrs (representing a chain complex $\mathbb{Z}^{d_0}\overset{\partial_1}{\leftarrow}\mathbb{Z}^{d_1}\overset{\partial_2}{\leftarrow}\mathbb{Z}^{d_2}\leftarrow\ldots$).

I wish to import/export such data from/to a file.txt (each line should be a matrix entry). For instance, the data $$\mathbb{Z}^{2}\xleftarrow{\left[\begin{smallmatrix}5&0&0\\0&6&7\end{smallmatrix}\right]} \mathbb{Z}^{3}\xleftarrow{\left[\begin{smallmatrix}0&8&0&0\\9&0&0&0\\0&0&-1&-2\end{smallmatrix}\right]}\mathbb{Z}^{4}$$ corresponds to a file

    2 3 4

    1 1 5
    2 2 6
    2 3 7

    1 2 8
    2 1 9
    3 3 -1
    3 4 -2

and $$ \mathbb{Z}^{7}\xleftarrow{0} \mathbb{Z}^{0}\xleftarrow{0} \mathbb{Z}^{5} \xleftarrow{\left[\begin{smallmatrix}0&0\\0&0\\0&0\\0&0\\0&0\\\end{smallmatrix}\right]} \mathbb{Z}^{2}\xleftarrow{\left[\begin{smallmatrix}0&0&0&15\\21&0&0&0\\\end{smallmatrix}\right]} \mathbb{Z}^{4}$$ corresponds to a file

    7 0 5 2 4




    1 4 14
    2 1 21
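To make the correspondence concrete, here is a small sketch of the in-memory data that the first example is meant to describe. The index and value triples are taken directly from the matrices shown above; nothing here reads or writes a file.

```mathematica
(* dims lists the ranks d_0, d_1, d_2 of the first example. *)
dims = {2, 3, 4};

(* Each file line "i j v" becomes one rule {i, j} -> v. *)
bdrs = {
   SparseArray[{{1, 1} -> 5, {2, 2} -> 6, {2, 3} -> 7}, {2, 3}],
   SparseArray[{{1, 2} -> 8, {2, 1} -> 9, {3, 3} -> -1, {3, 4} -> -2}, {3, 4}]};

(* Dense view, for checking against the matrices displayed above. *)
Normal /@ bdrs
```

The k-th matrix has dimensions `dims[[k ;; k + 1]]`, which is why the first line of the file is enough to size every block.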

My solution is:

    chcxIn[file_] := Module[{s, dims, bdrs = {}, k = 1, i = 1},
      s = Import["/home/" <> file, "List"];
      s = Map[If[# == "", {}, ImportString[#, "Table"][[1]]] &, s];
      dims = s[[1]];
      s = ParallelMap[If[# == {}, {}, #[[;; 2]] -> #[[3]]] &, s[[3 ;;]], {1}];
      Do[
        If[s[[j]] == {},
          AppendTo[bdrs, SparseArray[s[[i ;; j - 1]], dims[[k ;; k + 1]]]];
          k += 1; i = j + 1;],
        {j, Length@s}];
      Return@{bdrs, dims}];

    chcxOut[bdrs_, dims_, file_] :=
      Export["/home/" <> file,
        {StringReplace[ToString@dims, {"{" -> "", "}" -> "", "," -> ""}], ""}~Join~
          Flatten[
            Table[
              ArrayRules[b][[;; -2]]~Join~{""} /.
                ({u_, v_} -> w_) :> (ToString[u] <> " " <> ToString[v] <> " " <> ToString[w]),
              {b, bdrs}], 1]~Join~{""},
        "List"];

However, this is hopelessly inefficient, in both time and memory. For 50 MB of data, chcxOut needs 65 seconds and 700 MB of RAM, which seems excessive. I wish to handle files of size 10 GB. Is there an efficient way of doing this?


Edit: With the help of @HenrikSchumacher, here is an improvement.

    chcxIn[fileName_] := Module[
      {s = OpenRead[fileName], r(*read*), l(*line*), dims, bdrs = {}, k = 0, e = {}},
      dims = ImportString[Read[s, String], "Table"][[1]];
      r := Read[s, Record, NullRecords -> True];
      Monitor[
        If[s =!= $Failed,
          While[l =!= EndOfFile,
            l = r;
            Which[
              l == "0", ,
              l == "", k += 1; AppendTo[bdrs, SparseArray[e, dims[[k ;; k + 1]]]]; e = {},
              True, l = ImportString[l, "Table"][[1]]; AppendTo[e, l[[1 ;; 2]] -> l[[3]]]];
          ]],
        k];
      Close[s];
      {bdrs, dims}];

    chcxOut[bdrs_, dims_, fileName_] := Module[
      {f = OpenWrite[fileName], w(*write*)},
      w = WriteString[f, ExportString[#, "Table"]] &;
      w@{dims};
      WriteString[f, "\n\n"];
      Monitor[
        Do[
          If[Times @@ dims[[k ;; k + 1]] == 0 || bdrs[[k]]["Density"] == 0,
            w@{0},
            w@Join[bdrs[[k]]["NonzeroPositions"], Partition[bdrs[[k]]["NonzeroValues"], 1], 2]];
          WriteString[f, "\n\n"],
          {k, Length@bdrs}],
        k];
      Close[f];];

For a 2 MB file, the time and memory usage are: Export 0.1 s and 4 MB, Import 0.2 s and 8 MB, chcxOut 1.1 s and 12 MB, chcxIn 265 s and 9 MB. As we can see, importing from my custom format is still much slower. Hopefully, there is a better way to do this.

$\endgroup$
7
  • 1
    $\begingroup$ Use a sparse matrix format like MAT, MTX, or "HarwellBoeing". MAT is a binary format and should produce the smallest files. $\endgroup$ Commented Jul 1, 2020 at 20:34
  • $\begingroup$ @HenrikSchumacher These files will also be read in Python, C++, SageMath, GAP, so I'd like to stick to this format. Also, my data is a sequence of matrices, not just one. $\endgroup$ Commented Jul 1, 2020 at 21:35
  • 2
    $\begingroup$ All the more reason to use a standard format. Otherwise you will have to reimplement import/export in all these languages. By the way, MAT can handle sequences of matrices. $\endgroup$ Commented Jul 1, 2020 at 21:40
  • $\begingroup$ And all aforementioned programs support this format? Hmm, give me a bit of time to try import/export using .mat. $\endgroup$ Commented Jul 1, 2020 at 22:06
  • $\begingroup$ Yes, have a look. I am pretty sure that there must be APIs for Python (if only for reasons of migration from MATLAB to SciPy) and C++. SageMath has MAT support through SciPy, too. So if you call GAP from SageMath, this should also be covered. $\endgroup$ Commented Jul 2, 2020 at 6:52

1 Answer

4
$\begingroup$

Something like this should work.

    dims = Prepend[(Dimensions /@ bndrs)[[All, 2]], Dimensions[bndrs[[1]]][[1]]];
    file = OpenWrite["a.txt"];
    WriteString[file, ExportString[{dims}, "Table"]];
    Do[
      WriteString[file, "\n\n"];
      WriteString[file,
        ExportString[
          Join[A["NonzeroPositions"], Partition[A["NonzeroValues"], 1], 2],
          "Table"]],
      {A, bndrs}];
    Close[file]

The result is a human-readable file, so it is not especially compact.
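Reading such a file back might be sketched along the same lines. The idea is to call ImportString once per matrix block rather than once per line. The sketch is untested and has not been benchmarked. The name `chcxInFast` is hypothetical. It assumes that blocks are separated by exactly one blank line written as "\n\n", that every matrix has a block of its own, and that a block which is not a list of triples (for instance a lone `0` marker) stands for a matrix with no nonzero entries. Zero-sized dimensions are passed to SparseArray unchanged and are not handled specially.

```mathematica
(* chcxInFast is a hypothetical name, not part of the code above. *)
chcxInFast[fileName_String] := Module[{blocks, dims, toMatrix},
  (* Assumption: blocks are separated by exactly one blank line. *)
  blocks = StringSplit[ReadString[fileName], "\n\n"];
  (* The first block is the single line of dimensions. *)
  dims = First@ImportString[First[blocks], "Table"];
  (* Parse a whole block with one ImportString call. *)
  toMatrix[block_, k_] := Module[{rows = ImportString[block, "Table"]},
    If[MatchQ[rows, {{_, _, _} ..}],
      (* Rows are {i, j, value} triples. *)
      SparseArray[rows[[All, 1 ;; 2]] -> rows[[All, 3]], dims[[k ;; k + 1]]],
      (* Assumption: anything else marks a matrix with no nonzero entries. *)
      SparseArray[{}, dims[[k ;; k + 1]]]]];
  {MapIndexed[toMatrix[#1, First[#2]] &, Rest[blocks]], dims}]
```

A call such as `{bdrs, dims} = chcxInFast["a.txt"]` would then return the matrices together with the dimension list. The whole file is held in memory as one string, so for very large files a streaming reader may still be needed.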

$\endgroup$
2
  • $\begingroup$ Thank you, but this does not work when some of the matrices have height or width 0. $\endgroup$ Commented Jul 29, 2020 at 18:33
  • 3
    $\begingroup$ Then you have to write an exception for that case... $\endgroup$ Commented Jul 29, 2020 at 18:36
