Closed
Labels
MultiIndex, Performance (memory or execution speed), Sparse (sparse data type)
Description
Is your feature request related to a problem?
Converting a sparse Series to a scipy.sparse.coo_matrix could be much faster. I think the get_indexer function defined in _to_ijv adds unnecessary complexity.
Describe the solution you'd like
It can be much faster by accessing the codes attribute of the multiindex, as follows:
i_coord, j_coord = ss.index.codes
i_labels, j_labels = ss.index.levels
for a two-level MultiIndex. I think it should be straightforward to extend this to more levels.
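To make the idea concrete, here is a minimal sketch of the two-level case. The helper name `to_coo_via_codes` is hypothetical and not part of pandas. It assumes the missing entries are NaN, that the index has no missing labels (no `-1` codes), and that densifying the 1-D values is acceptable. It is not the actual `to_coo()` implementation.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def to_coo_via_codes(ss):
    """Hypothetical helper (not pandas API): two-level MultiIndex Series -> COO."""
    if ss.index.nlevels != 2:
        raise ValueError("this sketch only handles a two-level MultiIndex")

    # The integer codes are already positions into the corresponding levels,
    # so they can be used directly as row/column coordinates.
    # Assumption: no missing index labels, i.e. no code equals -1.
    i_coord, j_coord = ss.index.codes
    i_labels, j_labels = ss.index.levels

    # Assumption: missing entries are NaN; only non-NaN values are kept.
    values = np.asarray(ss, dtype="float64")
    keep = ~np.isnan(values)

    matrix = coo_matrix(
        (values[keep], (i_coord[keep], j_coord[keep])),
        shape=(len(i_labels), len(j_labels)),
    )
    return matrix, i_labels, j_labels


# Small usage example with a sparse Series.
index = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
ss = pd.Series([1.0, np.nan, 3.0], index=index).astype(pd.SparseDtype("float64"))
A, rows, cols = to_coo_via_codes(ss)
```

Two caveats with this sketch: the rows and columns follow the order of the levels, which may differ from the ordering `to_coo()` returns, and the levels can contain labels that no longer appear in the index (calling `ss.index.remove_unused_levels()` first would avoid empty rows or columns).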
API breaking implications
None
Describe alternatives you've considered
None
Additional context
To give an example: I started digging into this problem because I had a Series with a two-level MultiIndex and 61M rows that needed to be converted to a 1M x 1500 sparse matrix. The conversion took 10 minutes with to_coo() and half a second with the approach described above.