@Komnomnomnom

This fixes the main JSON performance regression in v0.13 (closes #5765). The main bottleneck was the use of intermediate NumPy scalars (introduced in PR #4498), so the encoder now avoids creating them. Also:

  • checks the current locale so locale fudging is only done when necessary
  • adds NumPy 1.6 preprocessor switches, as 1.6 still requires the use of scalars due to its weird datetime handling
  • reorganises some of the if/else checks in objToJSON.c based on likelihood
  • adds some JSON benchmarks to vbench to try to avoid future regressions
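
The locale check in the first bullet can be illustrated with a minimal Python sketch. The real change lives in C inside objToJSON.c; the function names `decimal_point_is_dot` and `encode_double` and the `%.10g` format are assumptions made up for this illustration, not the PR's code. The idea is that the locale is only switched (and restored) when the active numeric locale would not print `.` as the decimal point.

```python
import locale


def decimal_point_is_dot() -> bool:
    """True when the active LC_NUMERIC locale already prints '.' as the decimal point."""
    return locale.localeconv()["decimal_point"] == "."


def encode_double(value: float) -> str:
    """Format a float for JSON, switching to the "C" locale only if required.

    Hypothetical helper: locale.format_string honours LC_NUMERIC, which
    stands in here for the locale-sensitive formatting done in C.
    """
    if decimal_point_is_dot():
        # Fast path: the locale is already JSON-safe, so skip the switch.
        return locale.format_string("%.10g", value)

    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, "C")
        return locale.format_string("%.10g", value)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # always restore the caller's locale


print(encode_double(0.5))
```

Skipping the save/switch/restore sequence on the common path is the point of the check: the switch is only paid for when the process runs under a locale with a different decimal separator.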

0.12

In [1]: import pandas as pd, numpy as np

In [2]: df = pd.DataFrame(np.random.rand(100000,10))

In [3]: %timeit df.to_json(orient='split')
10 loops, best of 3: 119 ms per loop

In [4]: pd.__version__, np.__version__
Out[4]: ('0.12.0', '1.8.0')

This PR

In [1]: import pandas as pd, numpy as np

In [2]: df = pd.DataFrame(np.random.rand(100000,10))

In [3]: %timeit df.to_json(orient='split')
10 loops, best of 3: 119 ms per loop

In [4]: pd.__version__, np.__version__
Out[4]: ('0.13.0-406-ga18e5e6', '1.8.0')

While this solves the main performance issue, vbench still shows this PR to be slightly slower than v0.12 (roughly 2-5%), and it's unclear where that slowdown is coming from.

@jreback do you still have a Windows box / VM that you could test on?

@jreback

jreback commented Jan 28, 2014

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
packers_write_json                           |  23.3610 |  42.2543 |   0.5529 |
packers_write_json_date_index                |  31.3267 |  45.5254 |   0.6881 |
packers_read_json                            |  39.0206 |  39.3570 |   0.9915 |
packers_read_json_date_index                 |  39.2287 |  39.2920 |   0.9984 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [0975572] : Merge branch 'json-0.13-slowdown' of https://github.com/Komnomnomnom/pandas into Komnomnomnom-json-0.13-slowdown
Base   [464c1f9] : Add Scatter-CI link to README.md

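
For reference, the ratio column is just head time divided by base time. The short sketch below recomputes it from the timings reported above; `speed_ratio` and the `timings` dict are names made up for this illustration and are not part of vbench.

```python
def speed_ratio(head_ms: float, base_ms: float) -> float:
    """Return head/base rounded to four decimals; below 1.0 means head is faster."""
    return round(head_ms / base_ms, 4)


# (head[ms], base[ms]) pairs copied from the benchmark output above.
timings = {
    "packers_write_json": (23.3610, 42.2543),
    "packers_write_json_date_index": (31.3267, 45.5254),
    "packers_read_json": (39.0206, 39.3570),
    "packers_read_json_date_index": (39.2287, 39.2920),
}

for name, (head, base) in timings.items():
    ratio = speed_ratio(head, base)
    verdict = "faster" if ratio < 1.0 else "not faster"
    print(f"{name:<32} {ratio:.4f}  head is {verdict} than base")
```

Read this way, the two write benchmarks are where the change shows up, while the read benchmarks are essentially unchanged.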
jreback added a commit that referenced this pull request Jan 28, 2014
PERF: fix JSON performance regression from 0.12 (GH5765)
@jreback jreback merged commit 54945de into pandas-dev:master Jan 28, 2014
@jreback

jreback commented Jan 28, 2014

Looks good, tested fine on Windows. FYI, you can watch this for all builds (Linux, Windows, and SPARC builds too): http://scatterci.github.io/ScatterCI-Pandas/

@Komnomnomnom

Awesome! That looks pretty sweet, thanks.

@Komnomnomnom Komnomnomnom deleted the json-0.13-slowdown branch January 28, 2014 12:45