Closed
Labels
Needs Triage: issue that has not been reviewed by a pandas team member
Performance: memory or execution speed performance
Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this issue exists on the latest version of pandas.
- I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np

start, stop, step = 0, 100_000_000, 2
df1 = pd.DataFrame({"key": np.arange(start, stop, step),
                    "val1": np.arange(start, stop, step),
                    "val2": np.arange(start, stop, step)}, copy=False)
df1.info()

start, stop, step = 50_000_000, 70_000_000, 1
df2 = pd.DataFrame({"key": np.arange(start, stop, step),
                    "val3": np.arange(start, stop, step),
                    "val4": np.arange(start, stop, step)}, copy=False)
df2.info()

# This filtering should not be necessary for inner join as it only drops unused data
df1 = df1.query("key >= @df2.key.min() and key <= @df2.key.max()")
df2 = df2.query("key in @df1.key")

df = pd.merge_ordered(df1, df2, on="key", how="inner")
df.info()

Installed Versions
INSTALLED VERSIONS
------------------
commit : 2a953cf80b77e4348bf50ed724f8abc0d814d9dd
python : 3.10.9.final.0
python-bits : 64
OS : Linux
OS-release : 6.2.0-36-generic
Version : #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 9 15:34:04 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : pl_PL.UTF-8
LOCALE : pl_PL.UTF-8
pandas : 2.1.3
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Prior Performance

Plot generated by memory_profiler:
- black line: with 2 queries before merge_ordered
- blue line: without 2 queries before merge_ordered
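For anyone who wants to try the comparison without allocating arrays of 100 million rows, here is a scaled-down sketch of the same experiment. It is not the script that produced the plot. The size `n`, the helper names `make_frames`, `prefilter` and `peak_bytes`, and the use of `tracemalloc` instead of memory_profiler are all assumptions made for illustration, and the two `query` calls are replaced by what should be equivalent boolean masks (`between` and `isin`). The peak figures are only a rough indication, since `tracemalloc` reports just the allocations it is able to trace.

```python
import tracemalloc

import numpy as np
import pandas as pd


def make_frames(n=200_000):
    """Scaled-down df1/df2 with the same key layout as the report (n is hypothetical)."""
    even = np.arange(0, n, 2)                  # every second key, like df1
    df1 = pd.DataFrame({"key": even, "val1": even, "val2": even})
    dense = np.arange(n // 2, n * 7 // 10, 1)  # contiguous key range, like df2
    df2 = pd.DataFrame({"key": dense, "val3": dense, "val4": dense})
    return df1, df2


def prefilter(df1, df2):
    """Drop rows that cannot take part in an inner join on "key"."""
    lo, hi = df2["key"].min(), df2["key"].max()
    df1 = df1[df1["key"].between(lo, hi)]      # inclusive on both ends
    df2 = df2[df2["key"].isin(df1["key"])]
    return df1, df2


def peak_bytes(func, *args, **kwargs):
    """Run func and return (result, peak traced bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


df1, df2 = make_frames()

# Without the two filtering steps.
plain, plain_peak = peak_bytes(pd.merge_ordered, df1, df2, on="key", how="inner")

# With the two filtering steps applied first.
f1, f2 = prefilter(df1, df2)
filtered, filtered_peak = peak_bytes(pd.merge_ordered, f1, f2, on="key", how="inner")

# Both paths should give the same inner-join result.
pd.testing.assert_frame_equal(
    plain.reset_index(drop=True), filtered.reset_index(drop=True)
)

print(f"peak without pre-filtering: {plain_peak:,} bytes")
print(f"peak with pre-filtering:    {filtered_peak:,} bytes")
```

The equality check is the point of the report: the filtering only removes rows that an inner join discards anyway, so any difference in peak memory comes from how much input merge_ordered has to process.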