@lukemanley
Member

Perf improvement for pd.concat when the objects being concatenated have EA-backed (ExtensionArray-backed) indexes. The bottleneck was EA.tolist. Still relatively slow compared with non-EA indexes, but an improvement.
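For readers who want to try the affected code path, here is a minimal sketch of the kind of call this change targets: concatenating Series along axis=1 when their indexes are backed by an extension dtype. The names `n`, `idx`, and `series_list` and the sizes are illustrative assumptions, not the actual asv benchmark setup, and the sketch assumes a pandas version that supports EA-backed `Index` objects.

```python
import numpy as np
import pandas as pd

n = 1_000  # illustrative size, not the benchmark's

# An index backed by the masked-integer extension dtype "Int64"
idx = pd.Index(np.arange(n), dtype="Int64")

# Series whose indexes only partially overlap, so concat has to
# compute the union of the indexes and align each Series to it
series_list = [pd.Series(i, index=idx[:-i]) for i in range(1, 6)]

# axis=1 places each Series in its own column, aligned on the index
result = pd.concat(series_list, axis=1, sort=False)
print(result.shape)
print(result.index.dtype)
```

Wrapping the `pd.concat` call in `timeit` on both branches should show a difference in the same direction as the asv numbers, though absolute timings will depend on the machine and on `n`.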

$ asv continuous -f 1.1 upstream/main ea-tolist -b join_merge.ConcatIndexDtype

       before           after         ratio
     [90b4add7]       [77a21f8d]
     <main>           <ea-tolist>
-      19.4±0.2ms      12.0±0.07ms     0.62  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, True, False)
-      14.6±0.2ms       6.78±0.2ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, True, True)
-      14.2±0.2ms       6.57±0.1ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, False, False)
-      14.1±0.2ms      6.47±0.09ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, False, True)
-      23.9±0.1ms       10.3±0.1ms     0.43  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, True, False)
-      19.7±0.3ms      5.53±0.05ms     0.28  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, True, True)
-      19.0±0.2ms      5.16±0.03ms     0.27  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, False, True)
-      18.5±0.2ms      4.82±0.03ms     0.26  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, False, False)
-         108±1ms       24.5±0.4ms     0.23  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, True, False)
-      98.4±0.6ms       17.0±0.3ms     0.17  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, True, True)
-      97.6±0.8ms       15.9±0.3ms     0.16  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, False, True)
-      99.2±0.5ms       16.0±0.3ms     0.16  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, False, False)

$ asv continuous -f 1.1 upstream/main ea-tolist -b array.ArrowStringArray

       before           after         ratio
     [90b4add7]       [77a21f8d]
     <main>           <ea-tolist>
-      16.5±0.2ms         399±6μs     0.02  array.ArrowStringArray.time_tolist(True)
-      17.1±0.2ms         404±6μs     0.02  array.ArrowStringArray.time_tolist(False)
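As a reading aid for the asv output: the ratio column is the "after" time divided by the "before" time, so values below 1 are improvements. The short sketch below recomputes the ratio and the equivalent speedup factor for three rows copied from the results above; the helper name `ratio_and_speedup` is hypothetical and used only for illustration.

```python
# (before, after) timings in milliseconds, copied from the asv output above
timings = {
    "time_concat_series('Int64', 1, True, False)": (19.4, 12.0),
    "time_concat_series('string[pyarrow]', 1, False, False)": (99.2, 16.0),
    "ArrowStringArray.time_tolist(True)": (16.5, 0.399),  # 399 μs = 0.399 ms
}


def ratio_and_speedup(before_ms, after_ms):
    """Return (asv-style ratio rounded to 2 places, speedup factor)."""
    ratio = after_ms / before_ms
    speedup = before_ms / after_ms
    return round(ratio, 2), speedup


for name, (before, after) in timings.items():
    ratio, speedup = ratio_and_speedup(before, after)
    print(f"{name}: ratio={ratio:.2f}, about {speedup:.1f}x faster")
```

Read this way, the 0.02 ratio for `ArrowStringArray.time_tolist` corresponds to roughly a 41x speedup, while the `Int64` concat case with ratio 0.62 is about 1.6x faster.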
@lukemanley lukemanley added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 16, 2022
Member

@jbrockmendel jbrockmendel left a comment

LGTM

@phofl phofl added this to the 2.0 milestone Oct 17, 2022
@phofl phofl merged commit 4583a04 into pandas-dev:main Oct 17, 2022
@phofl
Member

phofl commented Oct 17, 2022

@lukemanley lukemanley deleted the ea-tolist branch October 26, 2022 10:18
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022

3 participants