-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
ENH: add fill_value to resample #14591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| If I'm understanding this correctly, this is equivalent to If that's right, is this worth a kwarg? Why not just leave it to the user to |
| I think that's a fair question. This is how i understood 3755. An argument for this feature is that you have a fill_value kwarg in reindex, whereas that's equivalent to just running reindex(...).fillna(), right? |
| That kwarg In [3]: series = pd.Series(range(4)) In [4]: series[2] = pd.np.nan In [5]: series Out[5]: 0 0.0 1 1.0 2 NaN 3 3.0 dtype: float64 In [6]: series.reindex(range(6), fill_value=9) Out[6]: 0 0.0 1 1.0 2 NaN 3 3.0 4 9.0 5 9.0 dtype: float64 |
| i see. got it, thanks. |
| @nchmura4 As Maximilian said, I also think that a possible |
| no problem, thanks for the feedback. Would this example illustrate the desired behavior? upsampling with fill_value doesn't fill any NaN's already in the dataframe, but does fill new rows during resampling: |
| @nchmura4 yes, I think that is a correct example. Question is here maybe: where does this keyword belong? In |
| I think this only makes sense on unsampling. I can't think of a use case for downsampling. So then I think it belongs in asfreq(). I'm working on that. |
58d72bc to 649a56b Compare Current coverage is 85.55% (diff: 100%)@@ master #14591 diff @@ ========================================== Files 145 145 Lines 51352 51352 Methods 0 0 Messages 0 0 Branches 0 0 ========================================== Hits 43932 43932 Misses 7420 7420 Partials 0 0
|
doc/source/whatsnew/v0.20.0.txt Outdated
| | ||
| - ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`) | ||
| | ||
| - ``DataFrame.asfreq()`` now accepts a fill_value option to fill missing values during resampling (:issue:`3715`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double-back ticks around fill_value
| value to use for missing values, applied during upsampling | ||
| .. version added:: 0.20.0 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some Examples here
| r.fillna(0) | ||
| | ||
| def test_fill_value(self): | ||
| # setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number as a comment, 1-liner expl
| # setup | ||
| rng = pd.date_range('1/1/2016', periods=10, freq='2S') | ||
| ts = pd.Series(np.arange(len(rng)), index=rng) | ||
| ts2 = pd.Series(np.random.rand(len(rng)), index=rng) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test also to where DataFarme.asfreq is tested (prob not very much :<)
{lamb} grin ".asfreq" pandas\tests\ pandas\tests\frame\test_timeseries.py: 299 : def test_asfreq(self): 300 : offset_monthly = self.tsframe.asfreq(offsets.BMonthEnd()) 301 : rule_monthly = self.tsframe.asfreq('BM') 305 : filled = rule_monthly.asfreq('B', method='pad') # noqa 309 : filled_dep = rule_monthly.asfreq('B', method='pad') # noqa 313 : result = zero_length.asfreq('BM') 316 : def test_asfreq_datetimeindex(self): 320 : df = df.asfreq('B') 323 : ts = df['A'].asfreq('B') pandas\tests\indexes\test_datetimelike.py: 830 : idx.get_loc(idx[1].asfreq('H', how='start'), method), 1) 888 : idx = pd.period_range('2000-01-01', periods=3).asfreq('H', how='start') pandas\tests\plotting\test_datetimelike.py: 319 : bts = tm.makeTimeSeries().asfreq('BM') 740 : ts2 = ts.asfreq('T').interpolate() pandas\tests\series\test_datetime_values.py: 37 : ok_for_period_methods = ['strftime', 'to_timestamp', 'asfreq'] pandas\tests\series\test_timeseries.py: 407 : def test_asfreq(self): 411 : daily_ts = ts.asfreq('B') 412 : monthly_ts = daily_ts.asfreq('BM') 415 : daily_ts = ts.asfreq('B', method='pad') 416 : monthly_ts = daily_ts.asfreq('BM') 419 : daily_ts = ts.asfreq(BDay()) 420 : monthly_ts = daily_ts.asfreq(BMonthEnd()) 423 : result = ts[:0].asfreq('M') and pls add a tests for series as well
| expected_df = df.resample('1S').asfreq().fillna(9.0) | ||
| expected_df.loc['2016-01-01 00:00:08', 'two'] = None | ||
| | ||
| assert_frame_equal(expected_df, actual_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add for series as well
| @nchmura4 looks pretty good. |
| thanks - I added documentation and tests you suggested |
| ------- | ||
| converted : type of caller | ||
| Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
| how : {'start', 'end'}, default end | ||
| For PeriodIndex only, see PeriodIndex.asfreq | ||
| normalize : bool, default False | ||
| Whether to reset output index to midnight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a sentence or 2 on the use / diffs of .asfreq() / resample.
| 2000-01-01 00:01:30 9.0 | ||
| 2000-01-01 00:02:00 2.0 | ||
| 2000-01-01 00:02:30 9.0 | ||
| 2000-01-01 00:03:00 3.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice example, thanks!
Can you add the output using the existing method keyword (also for filling NaNs) with this example as well? (pick one of the possible values for the keyword)
pandas/core/generic.py Outdated
| normalize : bool, default False | ||
| Whether to reset output index to midnight | ||
| fill_value: scalar, optional | ||
| value to use for missing values, applied during upsampling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a similar sentence as for method: "this does not fill NaNs that were already present"?
(can you also capitalize the first word of the sentence?
| thanks for the feedback |
| sorry, i think i accidentally messed this pull request up :/ was trying to pull new changes locally and accidentally pushed them into this PR. Can someone help me? sorry! |
| @nchmura4 If you follow those steps, it should be solved: |
c8cd7f2 to aab4497 Compare aab4497 to 8f9097e Compare | @jorisvandenbossche thank you! i see what i did wrong. PR all up to date now :) |
| 2000-01-01 00:02:30 3.0 | ||
| 2000-01-01 00:03:00 3.0 | ||
| Notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a See Also to .reindex (which has fill_value)
| this does not fill NaNs that already were present). | ||
| .. versionadded:: 0.20.0 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a See Also to Series.asfreq/DataFrame.asfreq
| with self.assertRaises(ValueError): | ||
| r.fillna(0) | ||
| | ||
| def test_fill_value(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests should be in class Base. Which will tests with all index types (you need to tests both Series/DataFrame here)
use self.create_index() to create your index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I'm a bit stuck on PeriodIndex. With my current commits you can add a fill_value for PeriodIndexResampler._upsample with kind='timestamp' but it doesn't work for kind='period'. I think I have to build that out. Otherwise, this won't work for PeriodIndex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upsampling on a PeriodIndex with kind='period' doesn't work through reindex. I think it would involve adding fill_value to Index.get_indexer (line 901 of tseries/resample.py) Before I start going down that path, I just want to confirm that's its the right thing to do.
8f9097e to 133175d Compare | can you rebase |
133175d to 36095c6 Compare 36095c6 to 45a9e8e Compare 45a9e8e to 08b6b28 Compare | thanks @nchmura4 ! |
| closed by 2540d5a |
git diff upstream/master | flake8 --diff