@jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented May 27, 2020

Reverts #34389

Closes #36495

@jbrockmendel
Member

do the asvs show any of the perf issues mentioned in #34389?

@jorisvandenbossche
Member Author

The benchmark server isn't running at the moment, see #34389 (comment)

@jbrockmendel
Member

I was hoping you'd be able to run them locally. (As mentioned in the other thread, I'm wrestling with hardware issues ATM.)

@jorisvandenbossche
Member Author

Sorry, I currently don't have time to run the full ASV suite.

@jreback jreback added Clean Internals Related to non-user accessible pandas implementation labels May 27, 2020
@jreback jreback added this to the 1.1 milestone May 27, 2020
@jbrockmendel
Member

I've gotten asv working locally, will do a run on #34389 if we're OK with holding off on merging this.

@jorisvandenbossche
Member Author

jorisvandenbossche commented May 27, 2020

As I also said in #34389, running asv is not necessarily sufficient, as it is not certain that the affected functions are actually covered (our benchmark suite does not cover 100% of cases, and even where it does, the benchmarks don't necessarily include cases that start with non-consolidated data).
I think you need to specifically check the functions you changed and compare them with and without consolidation.
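
A per-function comparison like this could be scripted with a small timing helper. The sketch below is only an illustration: `compare_timings` and its parameters are hypothetical names, it relies on the standard-library `timeit` module, and it uses stand-in list data rather than real frames.

```python
import timeit


def compare_timings(func, variants, number=1000, repeat=5):
    """Time func(obj) for each labelled object in variants.

    Returns {label: best observed seconds per call}. Hypothetical helper,
    not part of pandas or asv.
    """
    results = {}
    for label, obj in variants.items():
        # Bind obj as a default argument so each timer keeps its own input.
        timer = timeit.Timer(lambda obj=obj: func(obj))
        # Take the minimum over repeats to reduce noise from other processes.
        results[label] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results


# Stand-in inputs; in the case discussed here the two variants would be the
# same data with and without consolidation.
timings = compare_timings(
    sum,
    {"input A": [1] * 1000, "input B": [1] * 10},
    number=200,
    repeat=3,
)
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1e6:.2f} µs per call")
```

Passing the changed method as `func` and the two layouts of the same data as `variants` would give one number per layout, which is the comparison the asv suite may not provide.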

@jorisvandenbossche
Member Author

Or, for some of them it might be sufficient to simply check in the code if it would matter or not.
For example, I was looking at the take example: NDFrame.take calls BlockManager.take, and the first thing that does is also consolidate inplace.
So in such a case, of course the consolidate_inplace in NDFrame.take can be removed as a code clean-up, and it won't make any difference performance-wise.
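
The reasoning can be illustrated with a toy model. This is not pandas code: `ToyManager`, `ToyFrame`, and the block layout are hypothetical stand-ins, assuming only what is described above, namely that the inner take consolidates in place before doing anything else.

```python
class ToyManager:
    """Toy stand-in for a block manager: a list of (dtype, columns) blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.merges = 0  # counts how often blocks were actually merged

    def is_consolidated(self):
        dtypes = [dtype for dtype, _ in self.blocks]
        return len(dtypes) == len(set(dtypes))

    def consolidate_inplace(self):
        if self.is_consolidated():
            return  # already one block per dtype: nothing to do
        merged = {}
        for dtype, columns in self.blocks:
            merged.setdefault(dtype, []).extend(columns)
        self.blocks = list(merged.items())
        self.merges += 1

    def take(self, indices):
        # The inner method consolidates first, as described for BlockManager.take.
        self.consolidate_inplace()
        return [
            (dtype, [[col[i] for i in indices] for col in columns])
            for dtype, columns in self.blocks
        ]


class ToyFrame:
    def __init__(self, manager):
        self.manager = manager

    def take_with_outer_consolidate(self, indices):
        self.manager.consolidate_inplace()  # redundant: take() does this anyway
        return self.manager.take(indices)

    def take(self, indices):
        return self.manager.take(indices)


def make_blocks():
    # Two float blocks and one int block: a non-consolidated layout.
    return [
        ("float", [[1.0, 2.0, 3.0]]),
        ("int", [[1, 2, 3]]),
        ("float", [[4.0, 5.0, 6.0]]),
    ]


with_outer = ToyFrame(ToyManager(make_blocks()))
without_outer = ToyFrame(ToyManager(make_blocks()))

result_a = with_outer.take_with_outer_consolidate([0, 2])
result_b = without_outer.take([0, 2])

print(result_a == result_b)
print(with_outer.manager.merges, without_outer.manager.merges)
```

In this toy, both paths merge the blocks exactly once and return the same result, which is why dropping the outer call is a pure clean-up here. The argument only holds for methods whose inner implementation consolidates on its own.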

@jbrockmendel
Member

So in such a case, of course the consolidate_inplace in NDFrame.take can be removed as a code clean-up, and it won't make any difference performance-wise.

Let's un-revert that here and whittle down the places that merit double-checking

@jbrockmendel
Member

I'm not seeing any consistent pattern in asv results

@jorisvandenbossche
Member Author

Let's un-revert that here and whittle down the places that merit double-checking

Done

@jorisvandenbossche
Member Author

@jbrockmendel what's the status here? (I re-reverted the ones discussed above)

Do we continue with this PR reverting those, or do you have time to check those cases?

@jbrockmendel
Member

Do we continue with this PR reverting those, or do you have time to check those cases?

Thanks for following up. I've gotten my hardware issues resolved, will run a round of asvs on this.

@jorisvandenbossche
Member Author

As mentioned before (#34407 (comment)) and discussed elsewhere, I am not sure that asv will give any useful information for those changes.

@jbrockmendel
Member

As mentioned before (#34407 (comment)) and discussed elsewhere, I am not sure that asv will give any useful information for those changes.

Neither am I. But if they do show something, that will help us whittle down the places that need more manual attention.

@jorisvandenbossche
Member Author

But what I want to say is: even if they don't show anything, each case still needs manual attention, as the asvs (AFAIK) don't include non-consolidated data.

For example, a small timing for xs (on master, so with the automatic consolidation removed):

In [2]: df = pd.DataFrame(index=list(range(10000)))

In [3]: for i in range(10):
   ...:     df[i] = np.random.randn(10000)
   ...:

In [6]: df2 = df._consolidate()

In [11]: %timeit df.xs(0)
72 µs ± 2.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit df2.xs(0)
49.8 µs ± 547 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Whether we care about this difference, I am not fully sure (that's to be discussed). But it clearly has an impact.

@jbrockmendel
Member

But what I want to say is: I haven't had my caffeine yet, and running asvs now doesn't preclude doing %timeits later

@jbrockmendel
Member

Full asv run results below. A lot of these look like they may be the result of not being rebased on master, so I'm going to do that and re-run.

before  after  ratio
[bbb89cad]  [1228510e]
<master>  <revert-34389-consolidate-less-1>
+  773±2μs  2.66±0.02ms  3.44  timeseries.DatetimeIndex.time_normalize('repeated')
+  77.2±0.9μs  256±0.7μs  3.31  timeseries.SortIndex.time_sort_index(True)
+  983±5μs  3.15±0.01ms  3.21  timeseries.DatetimeAccessor.time_dt_accessor_normalize('UTC')
+  984±4μs  3.15±0.01ms  3.20  timeseries.DatetimeAccessor.time_dt_accessor_normalize(tzutc())
+  988±10μs  3.12±0.02ms  3.16  timeseries.DatetimeAccessor.time_dt_accessor_normalize(None)
+  776±10μs  2.28±0.01ms  2.94  indexing.DataFrameNumericIndexing.time_bool_indexer
+  778±1μs  2.08±0ms  2.67  timeseries.DatetimeIndex.time_normalize('tz_naive')
+  332±9ns  750±4ns  2.26  tslibs.period.PeriodProperties.time_property('min', 'quarter')
+  348±5ns  761±2ns  2.19  tslibs.period.PeriodProperties.time_property('min', 'qyear')
+  355±6ns  767±4ns  2.16  tslibs.period.PeriodProperties.time_property('M', 'quarter')
+  235±0.3μs  498±0.4μs  2.12  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int16Engine'>, <class 'numpy.int16'>), 'monotonic_incr')
+  381±4ns  806±6ns  2.12  tslibs.period.PeriodProperties.time_property('min', 'month')
+  369±10ns  778±7ns  2.11  tslibs.period.PeriodProperties.time_property('M', 'qyear')
+  383±1ns  808±2ns  2.11  tslibs.period.PeriodProperties.time_property('min', 'day')
+  383±5ns  807±2ns  2.11  tslibs.period.PeriodProperties.time_property('min', 'minute')
+  383±7ns  805±4ns  2.10  tslibs.period.PeriodProperties.time_property('min', 'hour')
+  384±5ns  806±2ns  2.10  tslibs.period.PeriodProperties.time_property('min', 'second')
+  396±1ns  816±5ns  2.06  tslibs.period.PeriodProperties.time_property('min', 'dayofyear')
+  398±5ns  816±4ns  2.05  tslibs.period.PeriodProperties.time_property('min', 'year')
+  435±10ns  884±6ns  2.03  tslibs.period.PeriodProperties.time_property('min', 'daysinmonth')
+  400±1ns  812±9ns  2.03  tslibs.period.PeriodProperties.time_property('M', 'month')
+  401±5ns  811±3ns  2.02  tslibs.period.PeriodProperties.time_property('M', 'minute')
+  404±8ns  817±3ns  2.02  tslibs.period.PeriodProperties.time_property('M', 'day')
+  402±3ns  810±3ns  2.01  tslibs.period.PeriodProperties.time_property('min', 'dayofweek')
+  405±9ns  811±4ns  2.00  tslibs.period.PeriodProperties.time_property('M', 'hour')
+  419±4ns  836±4ns  1.99  tslibs.period.PeriodProperties.time_property('M', 'dayofyear')
+  457±6ns  909±3ns  1.99  tslibs.period.PeriodProperties.time_property('M', 'daysinmonth')
+  406±7ns  806±7ns  1.99  tslibs.period.PeriodProperties.time_property('M', 'second')
+  416±4ns  825±7ns  1.99  tslibs.period.PeriodProperties.time_property('M', 'year')
+  418±6ns  830±1ns  1.99  tslibs.period.PeriodProperties.time_property('M', 'dayofweek')
+  440±10ns  866±4ns  1.97  tslibs.period.PeriodProperties.time_property('min', 'is_leap_year')
+  449±20ns  877±5ns  1.95  tslibs.period.PeriodProperties.time_property('min', 'week')
+  469±20ns  892±3ns  1.90  tslibs.period.PeriodProperties.time_property('M', 'week')
+  465±6ns  873±10ns  1.88  tslibs.period.PeriodProperties.time_property('M', 'is_leap_year')
+  1.19±0.01μs  1.84±0.03μs  1.55  tslibs.offsets.OnOffset.time_on_offset(<DateOffset: days=2, months=2>)
+  325±1μs  493±2μs  1.52  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
+  25.3±2ms  37.6±1ms  1.49  gil.ParallelDatetimeFields.time_datetime_field_normalize
+  11.0±1μs  16.3±0.2μs  1.49  tslibs.period.PeriodUnaryMethods.time_asfreq('min')
+  27.6±0.2ms  40.8±0.5ms  1.48  timeseries.ToDatetimeFormat.time_different_offset
+  7.03±0.01ms  10.3±0.1ms  1.46  groupby.CountMultiInt.time_multi_int_nunique
+  129±0.4μs  188±0.6μs  1.46  timeseries.DatetimeIndex.time_normalize('dst')
+  10.4±0.9μs  15.0±0.2μs  1.45  tslibs.period.PeriodUnaryMethods.time_now('M')
+  5.91±0.02ms  8.42±0.04ms  1.42  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<SemiMonthEnd: day_of_month=15>)
+  5.94±0.04ms  8.45±0.01ms  1.42  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<SemiMonthEnd: day_of_month=15>)
+  5.76±0.02ms  8.16±0.05ms  1.42  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<SemiMonthBegin: day_of_month=15>)
+  924±3μs  1.29±0ms  1.40  series_methods.NanOps.time_func('sum', 1000000, 'int8')
+  5.81±0.04ms  8.11±0.01ms  1.40  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<SemiMonthBegin: day_of_month=15>)
+  5.44±0.04ms  7.58±0.03ms  1.39  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<BusinessDay>)
+  5.42±0.02ms  7.53±0.01ms  1.39  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<BusinessDay>)
+  11.9±1μs  16.2±0.1μs  1.36  tslibs.period.PeriodUnaryMethods.time_asfreq('M')
+  1.30±0.01s  1.75±0.02s  1.35  groupby.Apply.time_copy_function_multi_col
+  26.2±0.1ms  34.4±0.07ms  1.31  timeseries.DatetimeIndex.time_normalize('tz_aware')
+  3.02±0.03μs  3.90±0.06μs  1.29  period.Indexing.time_get_loc
+  2.81±0.03ms  3.62±0.01ms  1.29  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'int', 'min')
+  2.83±0.02ms  3.64±0.03ms  1.29  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'int', 'max')
+  1.28±0.01ms  1.62±0.02ms  1.27  series_methods.NanOps.time_func('prod', 1000000, 'int8')
+  14.9±0.4μs  18.3±0.2μs  1.23  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<DateOffset: days=2, months=2>)
+  38.3±0.09μs  47.2±0.5μs  1.23  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<CustomBusinessDay>)
+  2.05±0.2μs  2.52±0.5μs  1.23  index_cached_properties.IndexCache.time_values('IntervalIndex')
+  15.1±0.4μs  18.4±0.1μs  1.22  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<DateOffset: days=2, months=2>)
+  35.5±0.2μs  43.5±0.1μs  1.22  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<CustomBusinessDay>)
+  15.4±0.5μs  18.8±0.3μs  1.22  tslibs.offsets.OffestDatetimeArithmetic.time_add(<DateOffset: days=2, months=2>)
+  13.4±0.4μs  16.4±0.3μs  1.22  period.Indexing.time_series_loc
+  19.9±0.1μs  24.0±0.4μs  1.21  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessYearBegin: month=1>)
+  19.6±0.1μs  23.6±0.8μs  1.21  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<YearBegin: month=1>)
+  21.1±0.5μs  25.4±0.1μs  1.20  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessYearEnd: month=12>)
+  19.8±0.1μs  23.8±0.1μs  1.20  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessYearEnd: month=12>)
+  144±0.2μs  173±2μs  1.20  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<CustomBusinessMonthBegin>)
+  1.44±0.05ms  1.72±0.09ms  1.20  period.PeriodIndexConstructor.time_from_pydatetime('D', True)
+  1.45±0.04ms  1.73±0.1ms  1.20  period.PeriodIndexConstructor.time_from_pydatetime('D', False)
+  521±0.9ms  624±2ms  1.20  groupby.Apply.time_copy_overhead_single_col
+  20.3±0.09μs  24.3±0.1μs  1.20  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<SemiMonthBegin: day_of_month=15>)
+  175±0.7μs  209±4μs  1.20  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<CustomBusinessMonthBegin>)
+  3.53±0.02ms  4.22±0.5ms  1.20  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'max')
+  56.3±0.7ms  67.3±0.8ms  1.19  series_methods.SeriesConstructor.time_constructor('dict')
+  21.4±0.08μs  25.5±0.3μs  1.19  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<YearEnd: month=12>)
+  128±1μs  152±2μs  1.19  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<CustomBusinessMonthEnd>)
+  20.2±0.2μs  24.1±0.3μs  1.19  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessYearBegin: month=1>)
+  19.7±0.1μs  23.5±0.07μs  1.19  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessYearEnd: month=12>)
+  124±1μs  148±4μs  1.19  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<CustomBusinessMonthEnd>)
+  160±1μs  190±2μs  1.19  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<CustomBusinessMonthBegin>)
+  18.6±0.1μs  22.0±0.3μs  1.19  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<MonthBegin>)
+  19.7±0.09μs  23.4±0.5μs  1.19  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<YearEnd: month=12>)
+  19.1±0.1μs  22.6±0.1μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessMonthEnd>)
+  21.2±0.09μs  25.1±0.3μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessYearBegin: month=1>)
+  19.1±0.2μs  22.5±0.3μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessMonthEnd>)
+  20.1±0.2μs  23.8±0.2μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<SemiMonthEnd: day_of_month=15>)
+  167±0.7μs  198±1μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<CustomBusinessMonthEnd>)
+  16.9±0.1μs  20.0±0.2μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<SemiMonthBegin: day_of_month=15>)
+  19.6±0.3μs  23.1±0.07μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<QuarterBegin: startingMonth=3>)
+  21.4±0.1μs  25.3±0.07μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<SemiMonthBegin: day_of_month=15>)
+  142±0.1μs  167±2μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<CustomBusinessMonthBegin>)
+  18.8±0.1μs  22.2±0.2μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<MonthEnd>)
+  18.5±0.09μs  21.7±0.08μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<MonthBegin>)
+  16.7±0.1μs  19.7±0.2μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<YearBegin: month=1>)
+  37.1±1μs  43.6±0.1μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<DateOffset: days=2, months=2>)
+  17.7±0.2μs  20.8±0.4μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessYearBegin: month=1>)
+  20.3±0.08μs  23.9±0.1μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<SemiMonthBegin: day_of_month=15>)
+  16.9±0.3μs  19.9±0.1μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessYearEnd: month=12>)
+  16.7±0.1μs  19.7±0.2μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessMonthBegin>)
+  19.0±0.09μs  22.3±0.09μs  1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessMonthBegin>)
+  21.1±0.03μs  24.8±0.6μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<YearBegin: month=1>)
+  162±0.5μs  190±3μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<CustomBusinessMonthBegin>)
+  16.7±0.1μs  19.6±0.2μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<YearEnd: month=12>)
+  20.1±0.1μs  23.6±0.3μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<YearEnd: month=12>)
+  17.2±0.1μs  20.1±0.05μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<YearEnd: month=12>)
+  16.7±0.09μs  19.6±0.09μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<YearBegin: month=1>)
+  17.4±0.08μs  20.3±0.1μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<YearBegin: month=1>)
+  28.2±0.09μs  32.9±0.3μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<CustomBusinessDay>)
+  20.4±0.2μs  23.8±0.2μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<SemiMonthEnd: day_of_month=15>)
+  17.1±0.3μs  20.0±0.09μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessYearBegin: month=1>)
+  109±1μs  127±2μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<CustomBusinessMonthEnd>)
+  20.1±0.1μs  23.5±0.3μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessQuarterBegin: startingMonth=3>)
+  17.5±0.1μs  20.5±0.3μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessYearEnd: month=12>)
+  19.8±0.1μs  23.1±0.2μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<YearBegin: month=1>)
+  162±0.6μs  189±2μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<CustomBusinessMonthBegin>)
+  4.19±0.04μs  4.89±0.1μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<Day>)
+  16.9±0.1μs  19.7±0.1μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessMonthEnd>)
+  17.6±0.1μs  20.6±0.06μs  1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<SemiMonthEnd: day_of_month=15>)
+  19.8±0.2μs  23.1±0.2μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessMonthBegin>)
+  5.28±0.02μs  6.14±0.03μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<Day>)
+  5.22±0.04ms  6.08±0.08ms  1.16  tslibs.offsets.OnOffset.time_on_offset(<CustomBusinessMonthEnd>)
+  16.7±0.05μs  19.4±0.1μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<MonthBegin>)
+  4.21±0.3μs  4.89±0.5μs  1.16  index_cached_properties.IndexCache.time_shape('IntervalIndex')
+  28.4±0.08μs  33.1±0.6μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<CustomBusinessDay>)
+  5.56±0.03ms  6.45±0.1ms  1.16  tslibs.offsets.OnOffset.time_on_offset(<CustomBusinessMonthBegin>)
+  16.6±0.06μs  19.3±0.4μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<YearEnd: month=12>)
+  21.4±0.2μs  24.9±0.3μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessQuarterEnd: startingMonth=3>)
+  5.39±0.02μs  6.25±0.03μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<Day>)
+  21.4±0.07μs  24.8±0.1μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<SemiMonthEnd: day_of_month=15>)
+  19.9±0.1μs  23.1±0.1μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessQuarterBegin: startingMonth=3>)
+  19.7±0.1μs  22.9±0.06μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<MonthEnd>)
+  17.2±0.2μs  19.9±0.3μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<SemiMonthEnd: day_of_month=15>)
+  1.57±0.02s  1.81±0.03s  1.16  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<CustomBusinessMonthBegin>)
+  1.58±0.01s  1.83±0.02s  1.16  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<CustomBusinessMonthBegin>)
+  109±1μs  126±1μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_add(<CustomBusinessMonthEnd>)
+  17.0±0.2μs  19.6±0.3μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<SemiMonthBegin: day_of_month=15>)
+  19.4±0.08μs  22.4±0.2μs  1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessDay>)
+  18.2±0.2μs  21.0±0.1μs  1.15  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+  20.1±0.2μs  23.3±0.2μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessQuarterEnd: startingMonth=3>)
+  17.2±0.1μs  19.9±0.3μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessYearBegin: month=1>)
+  17.2±0.1μs  19.8±0.06μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessQuarterEnd: startingMonth=3>)
+  20.2±0.2μs  23.2±0.1μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessQuarterEnd: startingMonth=3>)
+  1.07±0s  1.24±0.02s  1.15  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<CustomBusinessMonthEnd>)
+  16.6±0.1μs  19.2±0.05μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<MonthEnd>)
+  17.0±0.05μs  19.6±0.5μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessDay>)
+  20.4±0.08μs  23.5±0.1μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<QuarterEnd: startingMonth=3>)
+  17.5±0.08μs  20.1±0.09μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add(<QuarterEnd: startingMonth=3>)
+  4.53±0.1μs  5.21±0.07μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add(<Day>)
+  17.6±0.1μs  20.3±0.4μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessQuarterBegin: startingMonth=3>)
+  1.09±0s  1.26±0.03s  1.15  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<CustomBusinessMonthEnd>)
+  17.0±0.04μs  19.6±0.2μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessYearEnd: month=12>)
+  16.9±0.1μs  19.4±0.3μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessMonthBegin>)
+  17.1±0.1μs  19.6±0.1μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessQuarterBegin: startingMonth=3>)
+  16.6±0.2μs  19.0±0.1μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessDay>)
+  19.8±0.1μs  22.7±0.2μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessMonthEnd>)
+  16.9±0.1μs  19.3±0.1μs  1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<QuarterBegin: startingMonth=3>)
+  19.2±0.07μs  21.9±0.2μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessMonthBegin>)
+  20.7±0.1μs  23.7±0.1μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessDay>)
+  17.6±0.09μs  20.1±0.2μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessMonthEnd>)
+  17.1±0.05μs  19.6±0.1μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<QuarterEnd: startingMonth=3>)
+  108±0.6μs  124±3μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<CustomBusinessMonthEnd>)
+  17.1±0.07μs  19.5±0.09μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<QuarterEnd: startingMonth=3>)
+  19.9±0.1μs  22.8±0.2μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<QuarterBegin: startingMonth=3>)
+  4.33±0.05μs  4.94±0.07μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<Day>)
+  7.05±0.05ms  8.03±0.05ms  1.14  io.hdf.HDFStoreDataFrame.time_query_store_table
+  16.9±0.1μs  19.2±0.2μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<QuarterBegin: startingMonth=3>)
+  16.5±0.3μs  18.7±0.4μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessDay>)
+  19.5±0.08μs  22.2±0.2μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessDay>)
+  17.3±0.09μs  19.7±0.4μs  1.14  tslibs.offsets.OffestDatetimeArithmetic.time_add(<MonthEnd>)
+  10.7±0.03ms  12.2±0.08ms  1.14  groupby.MultiColumn.time_col_select_numpy_sum
+  21.5±0.1μs  24.4±0.2μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<QuarterEnd: startingMonth=3>)
+  17.1±0.08μs  19.4±0.1μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessMonthEnd>)
+  17.2±0.1μs  19.5±0.5μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<SemiMonthEnd: day_of_month=15>)
+  17.9±0.2μs  20.2±0.1μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add(<SemiMonthBegin: day_of_month=15>)
+  20.4±0.09μs  23.1±0.2μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<QuarterEnd: startingMonth=3>)
+  16.6±0.1μs  18.7±0.2μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<MonthBegin>)
+  19.8±0.3μs  22.4±0.09μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<MonthBegin>)
+  17.2±0.2μs  19.4±0.2μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessQuarterEnd: startingMonth=3>)
+  17.2±0.06μs  19.4±0.1μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessQuarterBegin: startingMonth=3>)
+  301±6ms  341±2ms  1.13  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<CustomBusinessDay>)
+  21.4±0.1μs  24.2±0.1μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<QuarterBegin: startingMonth=3>)
+  17.3±0.08μs  19.6±0.09μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add(<QuarterBegin: startingMonth=3>)
+  965±5μs  1.09±0ms  1.13  arithmetic.ApplyIndex.time_apply_index(<DateOffset: days=2, months=2>)
+  21.5±0.06μs  24.3±0.1μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessQuarterBegin: startingMonth=3>)
+  19.1±0.1μs  21.6±0.2μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<MonthEnd>)
+  17.4±0.2μs  19.7±0.3μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessMonthBegin>)
+  17.0±0.1μs  19.2±0.1μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<MonthEnd>)
+  17.9±0.05μs  20.2±0.1μs  1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessQuarterEnd: startingMonth=3>)
+  6.13±0.02μs  6.87±0.05μs  1.12  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<Day>)
+  5.90±0.2μs  6.59±0.02μs  1.12  indexing.NonNumericSeriesIndexing.time_getitem_scalar('period', 'non_monotonic')
+  10.7±0.05ms  12.0±0.1ms  1.12  stat_ops.Rank.time_rank('Series', False)
+  20.5±0.2μs  22.9±0.2μs  1.12  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+  305±6ms  339±9ms  1.11  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<CustomBusinessDay>)
+  17.4±0.08μs  19.4±0.09μs  1.11  tslibs.offsets.OffestDatetimeArithmetic.time_add(<MonthBegin>)
+  10.8±0.03ms  12.0±0.06ms  1.11  stat_ops.Rank.time_rank('Series', True)
+  2.26±0ms  2.51±0.4ms  1.11  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'sum')
+  29.6±0.3μs  32.9±0.06μs  1.11  tslibs.offsets.OffestDatetimeArithmetic.time_add(<CustomBusinessDay>)
+  1.09±0ms  1.21±0ms  1.11  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<DateOffset: days=2, months=2>)
+  2.26±0.01ms  2.49±0.4ms  1.10  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'sum')
+  3.31±0.02ms  3.65±0.03ms  1.10  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'float', 'max')
+  3.39±0.01ms  3.74±0.5ms  1.10  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'std')
-  278±2ms  253±2ms  0.91  io.json.ReadJSONLines.time_read_json_lines_concat('datetime')
-  873±40ns  793±20ns  0.91  index_cached_properties.IndexCache.time_is_monotonic('RangeIndex')
-  6.20±0.2μs  5.62±0.3μs  0.91  index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
-  3.47±0.2ms  3.13±0.04ms  0.90  rolling.Apply.time_rolling('Series', 300, 'int', <function Apply.<lambda> at 0x7f62bd249d08>, True)
-  6.88±0.03ms  6.21±0.06ms  0.90  frame_methods.Apply.time_apply_pass_thru
-  1.41±0.05μs  1.26±0.04μs  0.90  index_cached_properties.IndexCache.time_inferred_type('DatetimeIndex')
-  1.07±0.08μs  950±30ns  0.89  index_cached_properties.IndexCache.time_is_monotonic_decreasing('Int64Index')
-  515±10ns  459±20ns  0.89  index_cached_properties.IndexCache.time_is_all_dates('Int64Index')
-  1.06±0.01ms  937±4μs  0.89  frame_methods.Quantile.time_frame_quantile(1)
-  81.4±1μs  72.1±0.3μs  0.89  tslibs.timestamp.TimestampOps.time_ceil(None)
-  83.1±3μs  73.5±0.6μs  0.88  tslibs.period.PeriodUnaryMethods.time_to_timestamp('M')
-  80.6±2μs  71.3±0.6μs  0.88  tslibs.timestamp.TimestampOps.time_floor(None)
-  1.13±0.08μs  1.00±0.06μs  0.88  index_cached_properties.IndexCache.time_is_all_dates('Float64Index')
-  83.9±3μs  74.1±0.3μs  0.88  tslibs.period.PeriodProperties.time_property('M', 'start_time')
-  83.8±4μs  73.6±0.2μs  0.88  tslibs.period.PeriodUnaryMethods.time_to_timestamp('min')
-  85.5±3μs  75.0±1μs  0.88  tslibs.period.PeriodProperties.time_property('min', 'start_time')
-  71.3±2μs  62.3±0.6μs  0.87  dtypes.Dtypes.time_pandas_dtype('period[D]')
-  1.68±0.02s  1.47±0s  0.87  groupby.GroupByMethods.time_dtype_as_field('float', 'describe', 'direct')
-  722±40ns  630±20ns  0.87  index_cached_properties.IndexCache.time_is_monotonic_increasing('Int64Index')
-  1.69±0.01s  1.47±0s  0.87  groupby.GroupByMethods.time_dtype_as_field('float', 'describe', 'transformation')
-  2.39±0.02s  2.08±0.01s  0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'describe', 'direct')
-  3.70±0.03s  3.21±0s  0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'describe', 'transformation')
-  2.38±0.02s  2.07±0s  0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'describe', 'transformation')
-  3.74±0.2μs  3.24±0.06μs  0.87  index_cached_properties.IndexCache.time_shape('DatetimeIndex')
-  198M  171M  0.87  io.json.ReadJSONLines.peakmem_read_json_lines_concat('datetime')
-  198M  171M  0.87  io.json.ReadJSONLines.peakmem_read_json_lines_concat('int')
-  1.62±0.01s  1.41±0s  0.87  groupby.GroupByMethods.time_dtype_as_field('int', 'describe', 'transformation')
-  4.03±0.2μs  3.49±0.1μs  0.87  index_cached_properties.IndexCache.time_engine('UInt64Index')
-  3.71±0.03s  3.21±0.01s  0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'describe', 'direct')
-  517±20ns  447±10ns  0.86  index_cached_properties.IndexCache.time_inferred_type('Int64Index')
-  1.62±0.01s  1.40±0s  0.86  groupby.GroupByMethods.time_dtype_as_field('int', 'describe', 'direct')
-  1.58±0.06μs  1.36±0.03μs  0.86  index_cached_properties.IndexCache.time_is_all_dates('MultiIndex')
-  2.13±0.08μs  1.80±0.1μs  0.85  index_cached_properties.IndexCache.time_shape('UInt64Index')
-  1.22±0.05ms  974±20μs  0.80  arithmetic.MixedFrameWithSeriesAxis.time_frame_op_with_series_axis0('le')
-  540±4μs  414±2μs  0.77  frame_methods.Quantile.time_frame_quantile(0)
-  1.57±0.2μs  1.18±0.05μs  0.76  index_cached_properties.IndexCache.time_values('UInt64Index')
-  1.80±0.3μs  1.29±0.04μs  0.72  index_cached_properties.IndexCache.time_values('DatetimeIndex')
-  146±4μs  92.5±0.3μs  0.63  tslibs.period.PeriodProperties.time_property('M', 'end_time')
-  13.2±1μs  8.00±0.04μs  0.60  tslibs.timestamp.TimestampOps.time_normalize(<UTC>)
-  13.1±0.6μs  7.88±0.1μs  0.60  tslibs.timestamp.TimestampOps.time_normalize(None)
-  13.5±0.8μs  8.02±0.04μs  0.59  tslibs.timestamp.TimestampOps.time_normalize(tzutc())
-  161±4μs  81.4±0.7μs  0.51  tslibs.period.PeriodProperties.time_property('min', 'end_time')

@jreback jreback removed this from the 1.1 milestone Jul 9, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.1 milestone Jul 31, 2020
@jbrockmendel
Member

@jorisvandenbossche is this still active?

@simonjayhawkins
Member

I milestoned this 1.1.1 since it reverted a PR that caused a regression (#35488). #35578 has been merged (and backported) to fix #35488, so I am removing the milestone.

@simonjayhawkins simonjayhawkins removed this from the 1.1.1 milestone Aug 11, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.4 milestone Sep 28, 2020
@jreback
Contributor

jreback commented Oct 14, 2020

I am not sure it makes sense to push this w/o a clear patch backed by tests, as I am not sure what exactly this is fixing.

@jreback jreback modified the milestones: 1.1.4, 1.2 Oct 26, 2020
@jreback
Contributor

jreback commented Oct 26, 2020

Moving this off 1.1.4, as there are no really clear metrics for how to evaluate this.

@jreback
Contributor

jreback commented Nov 18, 2020

not sure what the point of the PR is any longer.

@jreback jreback removed this from the 1.2 milestone Nov 18, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1.5 milestone Nov 25, 2020
@jorisvandenbossche
Member Author

not sure what the point of the PR is any longer.

There are still open regressions related to this (#36495).
(will take a look again in a few days)

@jreback
Contributor

jreback commented Nov 25, 2020

not sure what the point of the PR is any longer.

There are still open regressions related to this (#36495).
(will take a look again in a few days)

maybe so but we still don't have any testing. so -1 on including this on 1.1.5 at this point.

@jorisvandenbossche jorisvandenbossche changed the title Revert "CLN: _consolidate_inplace less" REGR: revert "CLN: _consolidate_inplace less" / fix regression in fillna() Nov 26, 2020
@jorisvandenbossche
Member Author

but we still don't have any testing

I added a test for the regression reported in #36495, which is fixed by this

)
df_nonconsol = df.pivot("i1", "i2")
result = df_nonconsol.fillna(0)
assert result.isna().sum().sum() == 0
Member

there's a test in #36668 if you wanted a more explicit assertion

@jbrockmendel
Member

I added a test for the regression reported in #36495, which is fixed by this

great. there are four consolidations this adds. is any one in particular responsible for fixing this bug?

@jorisvandenbossche
Member Author

I suppose it is the consolidation in fillna that specifically fixes the reported regression.

I could do a PR with only that change, but as mentioned above (#34407 (comment)), I think it can't hurt to change the other cases as well. Since the original PR #34389, we have already reverted parts of that change in 3 other PRs that were fixing regressions, and this is a 4th.

@jreback jreback merged commit 27989a6 into master Nov 26, 2020
@jreback
Contributor

jreback commented Nov 26, 2020

@jorisvandenbossche jorisvandenbossche deleted the revert-34389-consolidate-less-1 branch November 26, 2020 19:08
@simonjayhawkins
Member

@meeseeksdev backport 1.1.x

@lumberbot-app

This comment has been minimized.

simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request Nov 27, 2020
…solidate_inplace less" / fix regression in fillna()
simonjayhawkins added a commit that referenced this pull request Nov 27, 2020
…nplace less" / fix regression in fillna() (#38115) Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jbrockmendel
Member

AFAICT the consolidation in quantile is never reached. Am I missing something?

@jorisvandenbossche
Member Author

The default of the keyword is True, so it should be reached? (since we never specify this keyword anywhere at the moment)

Labels

Clean Internals Related to non-user accessible pandas implementation

5 participants