Code to reproduce:

    import timeit

    import numpy as np
    import pandas as pd
    from scipy import sparse as sc

    np.random.seed(42)
    vals = np.random.randint(0, 10, size=(1000, 1000))
    # Zero out every value above 3, so only 1, 2 and 3 remain nonzero (about 70% zeros).
    keep = vals > 3
    vals[keep] = 0

    sparse_mtx = sc.coo_matrix(vals)
    sparse_pd = pd.DataFrame.sparse.from_spmatrix(sparse_mtx)

    num_tries = 30
    # t1: write the sparse DataFrame directly.
    t1 = timeit.timeit(lambda: sparse_pd.to_csv('sparse_pd.csv'), number=num_tries)
    # t2: convert to dense first, then write.
    t2 = timeit.timeit(lambda: sparse_pd.sparse.to_dense().to_csv('sparse_pd.csv'), number=num_tries)
    overhead = t1 / t2
    print(t1, t2, overhead)

Output:
56.591012510471046 3.7841985523700714 14.954556883657089
Versions:
- python == 3.9.2
- pandas == 1.2.4
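
Possible workaround, based on the second timing above: convert to dense before writing. This is only a minimal sketch and assumes the dense frame fits in memory. `sparse_to_csv` is a hypothetical helper name used for illustration, not part of the pandas API, and the small 50x50 frame is just for a quick comparison of the two outputs.

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse


def sparse_to_csv(df, path_or_buf=None, **kwargs):
    """Hypothetical helper: densify an all-sparse DataFrame, then write CSV.

    Assumes every column is sparse (as produced by from_spmatrix) and that
    the dense copy fits in memory.
    """
    dense = df.sparse.to_dense()
    return dense.to_csv(path_or_buf, **kwargs)


# Small frame built the same way as in the reproduction, only smaller.
rng = np.random.default_rng(42)
vals = rng.integers(0, 10, size=(50, 50))
vals[vals > 3] = 0
sparse_df = pd.DataFrame.sparse.from_spmatrix(sparse.coo_matrix(vals))

# With no path given, to_csv returns the CSV text as a string.
csv_sparse = sparse_df.to_csv()
csv_dense = sparse_to_csv(sparse_df)

# Compare the two routes and read the dense output back in.
same_output = csv_sparse == csv_dense
round_trip = pd.read_csv(io.StringIO(csv_dense), index_col=0).to_numpy()
print(same_output, np.array_equal(round_trip, vals))
```

The trade-off is memory: the dense copy is temporary, but for a very large and very sparse frame it may not be affordable, in which case writing in row chunks would be the next thing to try.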