Skip to content

Conversation

@jbrockmendel
Copy link
Member

This is mostly cut/paste, avoiding moving any of the classes for the time being. There is a small change in _to_dt64 that will be described in a comment below.

Defined dtstrings for flake8 cleanup in setup.

@jreback jreback added Frequency DateOffsets Performance Memory or execution speed performance labels Oct 10, 2017
@jbrockmendel
Copy link
Member Author

CircleCI test errors look unrelated. Are they showing up elsewhere?

@codecov
Copy link

codecov bot commented Oct 17, 2017

Codecov Report

Merging #17830 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #17830 +/- ## ========================================== - Coverage 91.23% 91.21% -0.03%  ========================================== Files 163 163 Lines 50105 49909 -196 ========================================== - Hits 45715 45523 -192  + Misses 4390 4386 -4
Flag Coverage Δ
#multiple 89.01% <100%> (-0.02%) ⬇️
#single 40.2% <60%> (-0.18%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 97.15% <100%> (-0.01%) ⬇️
pandas/tseries/frequencies.py 96% <100%> (-0.11%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/common.py 91.42% <0%> (-1.39%) ⬇️
pandas/core/dtypes/concat.py 98.26% <0%> (-0.87%) ⬇️
pandas/core/dtypes/missing.py 90.44% <0%> (-0.18%) ⬇️
pandas/core/generic.py 92.03% <0%> (-0.17%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.11%) ⬇️
pandas/core/series.py 94.89% <0%> (-0.1%) ⬇️
pandas/core/internals.py 94.38% <0%> (-0.07%) ⬇️
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bf7f9a...6118488. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 17, 2017

Codecov Report

Merging #17830 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #17830 +/- ## ========================================== - Coverage 91.24% 91.22% -0.03%  ========================================== Files 163 163 Lines 50168 50083 -85 ========================================== - Hits 45777 45686 -91  - Misses 4391 4397 +6
Flag Coverage Δ
#multiple 89.03% <100%> (-0.02%) ⬇️
#single 40.23% <60%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/frequencies.py 96% <100%> (-0.11%) ⬇️
pandas/tseries/offsets.py 97.15% <100%> (-0.01%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.5% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4489389...ced2c8d. Read the comment docs.

@jbrockmendel
Copy link
Member Author

This can be cut down if the diff is too large for easy review.

Looking forward to getting the dtstrings alias in setup.py --> fix a bunch of flake8 warnings.

# Offset names ("time rules") and related functions


from pandas._libs.tslibs.offsets import _offset_to_period_map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can de-privatize later


from pandas._libs.tslib import _delta_to_nanoseconds
from pandas._libs.tslibs.offsets import ( # noqa
ApplyTypeError, CacheableOffset,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't need the noqa, that means you are importing extra things that are not needed. IOW some of these are just used internally (now in offsets.pyx)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two unused imports are WeekDay and CacheableOffset. They're both used in the tests (though CacheableOffset not really). I'll update the imports and get rid of the noqa.

# ---------------------------------------------------------------------
# DateOffset


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a note in other api changes, that CacheableOffset & Weekday are no longer public (prob not ever used publicly but who knowns).

cimport numpy as np
np.import_array()


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok with de-privatizing things internally here (can be later as well)

pass


# TODO: unused. remove?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This came up recently in #17914.

@jreback
Copy link
Contributor

jreback commented Oct 27, 2017

also pls run the perf for timeseries & offsets and report any issues. should slightly speed things up. be careful with the cacheable stuff though.

@pep8speaks
Copy link

pep8speaks commented Oct 28, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 28, 2017 at 06:04 Hours UTC
@jbrockmendel
Copy link
Member Author

asv continuous -E virtualenv -f 1.1 master HEAD -b timeseries [...] before after ratio [3ba2cfff] [92d7e290] + 544±30ms 665±50ms 1.22 timeseries.ToDatetime.time_format_no_exact + 16.6±0.02μs 19.6±0.09μs 1.18 timeseries.AsOf.time_asof_single_early - 3.75±0.07ms 3.38±0ms 0.90 timeseries.DatetimeIndex.time_normalize - 449±20ms 400±0.6ms 0.89 timeseries.SemiMonthOffset.time_end_decr_rng - 20.7±3μs 18.5±0.07μs 0.89 timeseries.Offsets.time_timeseries_day_apply - 17.9±3μs 15.7±0.1μs 0.88 timeseries.Offsets.time_custom_bday_apply - 3.70±0.2ms 3.05±0.1ms 0.82 timeseries.AsOf.time_asof_nan - 29.1±7μs 23.2±0.05μs 0.80 timeseries.Offsets.time_custom_bday_cal_incr - 141±2μs 112±0.02μs 0.80 timeseries.DatetimeIndex.time_unique - 604±10μs 473±0.6μs 0.78 timeseries.DatetimeIndex.time_reset_index_tz - 469±5μs 362±0.2μs 0.77 timeseries.DatetimeIndex.time_reset_index - 429±0.8μs 331±0.8μs 0.77 timeseries.DatetimeIndex.time_timeseries_is_month_start - 12.2±0.7ms 9.18±0.02ms 0.75 timeseries.ResampleSeries.time_period_downsample_mean - 199±0.9μs 149±2μs 0.75 timeseries.Offsets.time_custom_bmonthend_incr - 224±10ms 164±10ms 0.73 timeseries.ToDatetime.time_iso8601_tz_spaceformat - 3.12±0.2ms 2.20±0ms 0.71 timeseries.ResampleDataFrame.time_min_string - 2.86±0ms 2.01±0.05ms 0.70 timeseries.ResampleDataFrame.time_mean_numpy - 8.48±0.5μs 5.88±0.01μs 0.69 timeseries.DatetimeIndex.time_timestamp_tzinfo_cons - 44.1±3μs 30.3±0.2μs 0.69 timeseries.Offsets.time_custom_bday_cal_incr_neg_n - 41.0±1μs 28.2±0.1μs 0.69 timeseries.Offsets.time_custom_bday_cal_decr - 6.12±0.05ms 4.19±0.05ms 0.68 timeseries.ResampleSeries.time_timestamp_downsample_mean - 2.37s 1.57s 0.66 timeseries.Iteration.time_iter_periodindex - 237±0.4μs 154±0.2μs 0.65 timeseries.Offsets.time_custom_bmonthbegin_incr_n - 8.34±0.01ms 5.40±0.08ms 0.65 timeseries.DatetimeIndex.time_infer_freq_daily - 608±20ms 388±0.4ms 0.64 timeseries.SeriesArithmetic.time_add_offset_slow - 43.5±0.8μs 27.6±0.1μs 0.63 timeseries.Offsets.time_custom_bday_decr - 23.3±3ms 14.6±0.09ms 0.63 timeseries.Iteration.time_iter_periodindex_preexit - 717±50ms 446±0.7ms 0.62 timeseries.Iteration.time_iter_datetimeindex - 85.5±10μs 52.5±0.4μs 0.61 timeseries.SemiMonthOffset.time_begin_decr - 87.3±3μs 53.2±0.05μs 0.61 timeseries.SemiMonthOffset.time_begin_decr_n - 753±50ms 419±0.6ms 0.56 timeseries.SemiMonthOffset.time_begin_apply_index - 769±0.6ms 398±0.9ms 0.52 timeseries.SemiMonthOffset.time_begin_incr_rng - 16.9±0.1ms 8.40±0.01ms 0.50 timeseries.Iteration.time_iter_datetimeindex_preexit - 5.31±0.02ms 2.51±0ms 0.47 timeseries.AsOf.time_asof_nan_single - 248±60ms 8.65±0.2ms 0.03 timeseries.SeriesArithmetic.time_add_offset_fast - 307±40ms 3.81±0.01ms 0.01 timeseries.SeriesArithmetic.time_add_offset_delta 

About to re-run with CPU affinity to be on the safe side.

@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

what were these doing to be so before?

  •  248±60ms 8.65±0.2ms 0.03 timeseries.SeriesArithmetic.time_add_offset_fast 
  •  307±40ms 3.81±0.01ms 0.01 timeseries.SeriesArithmetic.time_add_offset_delta 
@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

@jbrockmendel pls always always rebase on master before you run a benchmark. this is against a commit from 10/9. Pretty much anytime you do anything you should rebase.

@jbrockmendel
Copy link
Member Author

Woops, good catch. Re-running now.

@jbrockmendel
Copy link
Member Author

taskset 8 asv continuous -E virtualenv -f 1.1 master HEAD -b timeseries [...] before after ratio [1977362f] [92d7e290] + 27.6±0.1μs 32.0±0.2μs 1.16 timeseries.Offsets.time_custom_bday_cal_decr - 8.95±0.03μs 8.07±0.05μs 0.90 timeseries.Offsets.time_timeseries_year_apply - 11.4±0.06μs 10.2±0.06μs 0.90 timeseries.Offsets.time_timeseries_year_incr - 175±1μs 153±0.7μs 0.87 timeseries.Offsets.time_custom_bmonthbegin_incr_n - 6.80±0.03μs 5.84±0μs 0.86 timeseries.DatetimeIndex.time_timestamp_tzinfo_cons - 109±20ms 40.9ms 0.37 timeseries.DatetimeIndex.time_dti_factorize - 105±0.04ms 36.0±3ms 0.34 timeseries.DatetimeIndex.time_dti_tz_factorize - 1.31s 299±3ms 0.23 timeseries.AsOfDataFrame.time_asof_nan - 1.99s 290±3ms 0.15 timeseries.AsOfDataFrame.time_asof 

(re-running)

The most commonly-called function that got moved to cython is _is_normalized, which gets called in onOffset, which in turn gets called in FooOffset.apply and offsets.generate_range. I'm not sure about that factorize and asofs, but that would explain (all but one of) the Offsets timings here.

@jbrockmendel
Copy link
Member Author

Same results again. I'll dig into the custom_bday_apply to see what the deal is.

@jreback jreback added this to the 0.22.0 milestone Oct 28, 2017
@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

ok this looks fine. ping after perf review (though that looks like a minor issue).

@jbrockmendel
Copy link
Member Author

Using cProfile I get a tiny speedup with the new version. With %timeit I get results similar to asv but a narrower difference. This is a mystery I'm OK with, given that follow-ons are going to optimize the tar out of these.

@jreback jreback merged commit b2d0d1b into pandas-dev:master Oct 28, 2017
@jbrockmendel jbrockmendel deleted the tslibs-offsets2 branch October 28, 2017 18:53
@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

thanks @jbrockmendel

Using cProfile I get a tiny speedup with the new version. With %timeit I get results similar to asv but a narrower difference. This is a mystery I'm OK with, given that follow-ons are going to optimize the tar out of these.

I wouldn't have expected much as these are mostly used in scalar type things which are 1-offs; and these are not typed much either (so that may help)

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
@jorisvandenbossche
Copy link
Member

@jbrockmendel Is there a reason that CacheableOffset and WeekDay are no longer public in tseries.offsets ?
They are not useful? For those using it, what is the alternative?

@jreback
Copy link
Contributor

jreback commented Nov 10, 2017

WeekDay is just an enum map and CacheableOffest is not usable on its own
both of these are private classes

@jorisvandenbossche
Copy link
Member

OK (note they are not private in the sense they are non-underscored classes in a module that is publicly exposed, but indeed seems like deprecation is not really needed in this case)

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Frequency DateOffsets Performance Memory or execution speed performance

4 participants