Skip to content

Conversation

@carlosdanielcsantos
Copy link
Contributor

@carlosdanielcsantos carlosdanielcsantos commented Mar 23, 2017


cdef _roll_min_max(ndarray[numeric] input, int64_t win, int64_t minp,
object index, bint is_max):
object index, str closed, bint is_max):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the behavior of str here (cdef)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback jreback added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Mar 23, 2017
@carlosdanielcsantos
Copy link
Contributor Author

carlosdanielcsantos commented Mar 23, 2017

Here is a quick example of what this feature looks like

df = pd.DataFrame({'x': [1]*5}, ....: index = [pd.Timestamp('20130101 09:00:01'), ....: pd.Timestamp('20130101 09:00:02'), ....: pd.Timestamp('20130101 09:00:03'), ....: pd.Timestamp('20130101 09:00:04'), ....: pd.Timestamp('20130101 09:00:06')]) df["right"] = df.rolling('2s', closed='right').x.sum() # current behavior df["both"] = df.rolling('2s', closed='both').x.sum() df["left"] = df.rolling('2s', closed='left').x.sum() df["open"] = df.rolling('2s', closed='neither').x.sum() print(df) x right both left open 2013-01-01 09:00:01 1 1.0 1.0 NaN NaN 2013-01-01 09:00:02 1 2.0 2.0 1.0 1.0 2013-01-01 09:00:03 1 2.0 3.0 2.0 1.0 2013-01-01 09:00:04 1 2.0 3.0 2.0 1.0 2013-01-01 09:00:06 1 1.0 2.0 1.0 NaN 
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly minor points

  • add a whatsnew entry (Other Enhancements) in whatsnew (0.20.0), with a link to the computation.rst docs (where you have a new section showing the use of closed; you can use your test examples, e.g. showing all 4 cases).
if self.min_periods is not None and not \
is_integer(self.min_periods):
raise ValueError("min_periods must be an integer")
if self.closed not in ['right', 'both', 'left', 'neither']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing this?

def __init__(self, ndarray input, int64_t win, int64_t minp,
object index=None, object floor=None):
object index=None, object floor=None,
bint l_closed=False, bint r_closed=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have a chance, can you add a mini-doc strings to things.

if we are a fixed indexer, return a mock indexer
instead of the FixedWindow Indexer. This is a type
compat Indexer that allows us to use a standard
code path with all of the indexers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you augment the doc-string here (with closed)

bint l_closed = False
bint r_closed = False

if closed not in ['right', 'left', 'both', 'neither']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do an assert here (this is an internal routine that has certain guarantees)


def roll_count(ndarray[double_t] input, int64_t win, int64_t minp,
object index):
object index, closed='right'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make this required

object closed


def roll_max(ndarray[numeric] input, int64_t win, int64_t minp,
object index):
object index, str closed):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the rest of these: str-> object


cdef _roll_min_max(ndarray[numeric] input, int64_t win, int64_t minp,
object index, bint is_max):
object index, str closed, bint is_max):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

tm.assert_frame_equal(result, expected)

def test_closed(self):
df = DataFrame({'A': [1]*5},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this issue number as a comment

df = DataFrame({'A': [1]*5},
index = [pd.Timestamp('20130101 09:00:01'),
pd.Timestamp('20130101 09:00:02'),
pd.Timestamp('20130101 09:00:03'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test passing invalid closed

expected["A"] = [np.nan, 1.0, 2, 2, 1]
result = df.rolling('2s', closed='left').sum()
tm.assert_frame_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisaycock how do these tests look? what corners do we need to test here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the DataFrame is empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the DataFrame is empty, the behavior is exactly the same as in the current implementation. For time-based windows, it throws "ValueError: window must be an integer". Note that this feature only affects the window indexer's start and end vectors. In fixed windows, empty DataFrames always return empty (the result is constructed as np.empty(N) where N is the length of the input)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separately I think [4] should just return an empty frame for consistency.

In [4]: DataFrame().rolling('1s').sum() ValueError: window must be an integer In [5]: DataFrame().rolling(1).sum() Out[5]: Empty DataFrame Columns: [] Index: [] 

@chrisaycock ?

@jreback
Copy link
Contributor

jreback commented Mar 23, 2017

@carlosdanielcsantos #15795 (comment) are basically the docs (with a bit of prose) and nice formatting. Laying out the possibilities (in a sub-section of computation.rst). Note you also might want to have a 'real-world' uses case sentence for some of these.

@jreback
Copy link
Contributor

jreback commented Mar 23, 2017

you can see pep errors: http://pandas-docs.github.io/pandas-docs-travis/contributing.html#python-pep8

use git diff master | flake8 --diff

@codecov
Copy link

codecov bot commented Mar 27, 2017

Codecov Report

Merging #15795 into master will decrease coverage by 0.02%.
The diff coverage is 92.85%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #15795 +/- ## ========================================== - Coverage 91.01% 90.99% -0.03%  ========================================== Files 143 143 Lines 49400 49404 +4 ========================================== - Hits 44963 44956 -7  - Misses 4437 4448 +11
Impacted Files Coverage Δ
pandas/core/generic.py 96.24% <100%> (ø) ⬆️
pandas/core/window.py 96.08% <92.3%> (-0.11%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/common.py 90.96% <0%> (-0.34%) ⬇️
pandas/core/frame.py 97.86% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d3554c...8533421. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 27, 2017

Codecov Report

Merging #15795 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #15795 +/- ## ========================================== + Coverage 91% 91.03% +0.02%  ========================================== Files 145 145 Lines 49581 49592 +11 ========================================== + Hits 45122 45146 +24  + Misses 4459 4446 -13
Flag Coverage Δ
#multiple 88.8% <100%> (+0.02%) ⬆️
#single 40.56% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 96.26% <ø> (ø) ⬆️
pandas/core/window.py 96.24% <100%> (+0.02%) ⬆️
pandas/core/common.py 90.68% <0%> (-0.35%) ⬇️
pandas/util/testing.py 81.13% <0%> (+0.18%) ⬆️
pandas/tools/plotting.py 72.47% <0%> (+0.71%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 838e09c...aad97dc. Read the comment docs.

@carlosdanielcsantos
Copy link
Contributor Author

@jreback How should I add this to computations.rst? A new section inside Window Functions or just add the examples inside the existing Window Functions -> Rolling Windows?

Other enhancements
^^^^^^^^^^^^^^^^^^

- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window endpoint closedness. (:issue:`13965`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that a word?

Copy link
Contributor Author

@carlosdanielcsantos carlosdanielcsantos Mar 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some discussion going on about that online, but its a useful word to refer the property of being closed. But I am open to suggestions, no pun intended :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahah yep that is fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe closed-ness? (though I don't think that is actually a word)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add a :ref: to the new sub-section that you are creating in computation.rst

on : string, optional
For a DataFrame, column on which to calculate
the rolling window, rather than the index
closed : 'right', 'left', 'both', 'neither'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the default setting (right)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do this with a bulleted list

floor: optional
unit for flooring
l_closed: bint
left endpoint closedness
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write these out l_closed -> left_closed easier to read

index: ndarray
index of the input
l_closed: bint
left endpoint closedness
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can add a bit of explication here (1 line), just for future readers


start[0] = 0
end[0] = 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the comments above the code

if r_closed: # ..... end[0] = 1 else: # .... end[0] 0 
end_bound = index[i]
start_bound = index[i] - win

if l_closed: # left endpoint is closed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of can do (I use this if only an if statement)

# left endpoint .... if left_closed: .... 
else:
end[i] = end[i - 1]

# right endpoint is open
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep excatly like this one

minp: integer, minimum periods
index: 1d ndarray, optional
index to the input array
closed: 'right', 'left', 'both', 'neither'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closed: string, default 'right' {'right',......} 
@jreback
Copy link
Contributor

jreback commented Mar 27, 2017

@carlosdanielcsantos lgtm. some more doc-string nitpicks.

@jreback jreback added this to the 0.20.0 milestone Mar 27, 2017
@jreback
Copy link
Contributor

jreback commented Mar 27, 2017

ping when all green.

@carlosdanielcsantos
Copy link
Contributor Author

@jreback I was going to start writing tests for the FixedWindowIndexer but there are no explicit rolling computing test going on in TestRolling (they are done in TestMoments, I think). Where should I add these new tests?

@jreback
Copy link
Contributor

jreback commented Mar 27, 2017

@jreback How should I add this to computations.rst? A new section inside Window Functions or just add the examples inside the existing Window Functions -> Rolling Windows?

exactly, a new sub-section that pretty much shows the example with all 4 closed. maybe right before Centering Windows?

@jreback
Copy link
Contributor

jreback commented Mar 27, 2017

@jreback I was going to start writing tests for the FixedWindowIndexer but there are no explicit rolling computing test going on in TestRolling (they are done in TestMoments, I think). Where should I add these new tests?

We don't have explicit tests for the cython routines, these are covered by the integration tests (top-level routines). Though certainly could add them if you want / can.

You could add a new class in test_windows that explicity exercises these.

@carlosdanielcsantos
Copy link
Contributor Author

@jreback @chrisaycock
Thinking deeper about the fixed window case, it seems that, intuitively, the default behavior right now better resembles the closed='both' case. Notice that when you use rolling(3).apply(f), the result for index i will be the result of f applied to [i-2, i-1, i], not ]i-2, i-1, i] (ISO 31-11 math set notation). One could argue that this is equivalent to having closed='right' with f applied to ]i-3, i-2, i-1, i]. Indeed, but this probably adulterates the meaning of the window size.

@jreback
Copy link
Contributor

jreback commented Mar 28, 2017

@jreback @chrisaycock
Thinking deeper about the fixed window case, it seems that, intuitively, the default behavior right now better resembles the closed='both' case. Notice that when you use rolling(3).apply(f), the result for index i will be the result of f applied to [i-2, i-1, i], not ]i-2, i-1, i] (ISO 31-11 math set notation). One could argue that this is equivalent to having closed='right' with f applied to ]i-3, i-2, i-1, i]. Indeed, but this probably adulterates the meaning of the window size.

@carlosdanielcsantos can you show a comparison of these 2, to show more what you mean?

@carlosdanielcsantos
Copy link
Contributor Author

carlosdanielcsantos commented Mar 28, 2017

@jreback

pd.DataFrame([1] * 5).rolling(3, min_periods=1).sum() 0 0 1.0 1 2.0 2 3.0 3 3.0 4 3.0 

Look how these results come from the application of sum() to every element inside the valid window. This resembles closed='both'. closed=right would look like [1.0, 1.0, 2.0, 2.0, 2.0]. Maybe the confusion arises from the fact that in fixed windows we are using the size of the window and not an interval as in the variable window, so maybe the closed parameter doesn't quite make sense here.

Nonetheless, I think that the possibility of ignoring the current index in computations is valuable (that was my initial problem, after all). Some solutions for this are:

  1. Use closed='both' as default in Fixed windows and allow closed = 'right', 'left', 'neither' as well.
  2. Drop the closed argument in Fixed windows but allow an argument to ignore the current index (same behavior as closed='left' in variable windows). The other cases equivalent to closed='right', 'neither' in variable windows can be obtained by varying window size: for example: rolling(3, closed='right') is the same as rolling(2) and rolling(3, closed='neither') is the same as rolling(2, ignore_own_index=True)
@jreback
Copy link
Contributor

jreback commented Mar 28, 2017

Nonetheless, I think that the possibility of ignoring the current index in computations is valuable (that was my initial problem, after all). Some solutions for this are:

This a bit orthogonal though. We always want to actually use the index. Since using a fixed window size, it looks like we are ignoring the index :>.

So a possiblity is to actually check the index whether its equiv to a RangeIndex. e.g.

In [5]: df1 = pd.DataFrame([1] * 5) In [6]: df1.rolling(3, min_periods=1).sum() Out[6]: 0 0 1.0 1 2.0 2 3.0 3 3.0 4 3.0 In [8]: df2 = pd.DataFrame([1] * 5, index=[0, 1, 3, 4, 5]) In [9]: df2.rolling(3, min_periods=1).sum() Out[9]: 0 0 1.0 1 2.0 3 3.0 4 3.0 5 3.0 
In [11]: (np.diff(df1.index)==1).all() Out[11]: True In [12]: (np.diff(df2.index)==1).all() Out[12]: False 

Let's deal with this in a separate issue.

A way to disambiguate on this ATM is to allow closed=None to be the default parameter. Then it cannot be anything but that for a FixedWindow. For Variable, if we see closed=None -> closed='right' IOW the default is the same. And then allow the rest of the closed intervals.

I think we should fix this for Fixed intervals, but I agree this is a problem because we essentially have closed='both' as the default now (its implicit though) (as you stated above).

@jreback
Copy link
Contributor

jreback commented Mar 29, 2017

@carlosdanielcsantos if you can update would be great (docs and such), ignoring the issue about fixed windows for the moment.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

@carlosdanielcsantos can you update

@carlosdanielcsantos
Copy link
Contributor Author

carlosdanielcsantos commented Apr 3, 2017

@jreback I'll do that in the next hours. Should I just leave the fixed-window as it is?

@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

I think make the closed=None the default, and document the possibilities (and default) for the variable sized window (and validate left,right,both,neither). For the fixed window document as well (you can say its closed='both', but ATM only allow None). and then open a new issue where we can deal with this.

We have to fix this, but this PR merely opens up the possibilities of change here. So I think ok to merge this.

@jreback
Copy link
Contributor

jreback commented Apr 9, 2017

@carlosdanielcsantos can you update

@carlosdanielcsantos
Copy link
Contributor Author

@jreback made some corrections but circleci is not passing in test_append_index - pandas.tests.test_multilevel.TestMultiLevel, but passing locally. Any thoughts?

@jreback
Copy link
Contributor

jreback commented Apr 10, 2017

try pushing again (make a modification)
looks like a fluke

@carlosdanielcsantos
Copy link
Contributor Author

@jreback Still not passing. Could I have broken this with the changes in window?

carlosdanielcsantos and others added 12 commits April 11, 2017 06:14
‘Closed’ feature is working as expected in time-based windows. Sum tests are passing.
Adding test of assert for closed parameter Adding assert for closed parameter in get_window_indexer
Also, adding whatsnew entry
Adding computation.rst section (still not written)
Not allowing closed parameter for fixed windows Passing tests again Updating docs
@jreback jreback force-pushed the rwindow-endpoints-inclusion branch from 6ca4949 to 568c12f Compare April 11, 2017 10:19
@jreback
Copy link
Contributor

jreback commented Apr 11, 2017

@carlosdanielcsantos I rebased you against current master. This should pass cleanly. If you'd make those other changes and ping.

@carlosdanielcsantos
Copy link
Contributor Author

@jreback what changes? I believe I updated everything that you requested. Can you take a look at AppVeyor build that failed?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments.

.. _stats.rolling_window.endpoints:

Rolling window endpoint inclusion
~~~~~~~~~~~~~~~~~~
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this underline needs to be the same length as the text

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this Rolling Window Endpoints

The inclusion of the interval endpoints in rolling window calculations can be specified with the ``closed``
parameter:

- ``right`` : close right endpoint (default for time-based windows)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a csv-table (3 columns: closed, description, default) or something like that

df
Currently, this feature is only implemented for time-based windows.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expand this a touch to say that only closed='both' is accepted for fixed windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am only accepting closed=None for fixed windows, as previously requested by you

For the fixed window document as well (you can say its closed='both', but ATM only allow None)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what i mean just make sure the docs are clear on that ; e.g. None -> 'both' for fixed (and assert that as well )

Other Enhancements
^^^^^^^^^^^^^^^^^^

- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window endpoint closedness. See :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See :ref:`the documentation ....

raise ValueError("min_periods must be an integer")
if self.closed is not None and self.closed not in \
['right', 'both', 'left', 'neither']:
raise ValueError("closed must be 'right', 'left', 'both' or "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there validation for a fixed window when closed is NOT both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, there is a validation for when closed is not None in Rolling.validate()

tm.assertRaisesRegexp(UnsupportedFunctionCall, msg,
getattr(r, func), dtype=np.float64)

def test_closed(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback here is the piece that tests that closed must be None for fixed-window. Well, actually it only tests that it should not be 'neither', but I do not think replication is necessary.

elif self.window < 0:
raise ValueError("window must be non-negative")

if not self.is_datetimelike and self.closed is not None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback here we enforce that closed must be None for windows that are not datetimelike

@jreback jreback closed this in 7322239 Apr 13, 2017
@jreback
Copy link
Contributor

jreback commented Apr 13, 2017

thanks @carlosdanielcsantos nice PR. as you can see, the code changes are the 'easy' part, its the docs & testing that are the details.!

keem em coming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

API Design Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode

3 participants