Skip to content

Conversation

@bourbaki
Copy link
Contributor

@bourbaki bourbaki commented Dec 5, 2018

@pep8speaks
Copy link

pep8speaks commented Dec 5, 2018

Hello @Ma3aXaKa! Thanks for updating the PR.

Line 656:60: E261 at least two spaces before inline comment
Line 656:80: E501 line too long (88 > 79 characters)

Comment last updated on December 07, 2018 at 17:52 Hours UTC
@bourbaki
Copy link
Contributor Author

bourbaki commented Dec 5, 2018

The only question I have is how to properly document an alias like that?

@codecov
Copy link

codecov bot commented Dec 5, 2018

Codecov Report

Merging #24111 into master will decrease coverage by <.01%.
The diff coverage is 75%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #24111 +/- ## ========================================== - Coverage 92.21% 92.2% -0.01%  ========================================== Files 161 161 Lines 51684 51688 +4 ========================================== + Hits 47658 47660 +2  - Misses 4026 4028 +2
Flag Coverage Δ
#multiple 90.6% <75%> (-0.01%) ⬇️
#single 43% <75%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/window.py 96.29% <75%> (-0.11%) ⬇️
pandas/io/json/json.py 92.61% <0%> (-0.48%) ⬇️
pandas/util/testing.py 87.48% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b5f4d1...45f2bf8. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 5, 2018

Codecov Report

Merging #24111 into master will decrease coverage by 0.01%.
The diff coverage is 36.36%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #24111 +/- ## ========================================== - Coverage 92.2% 92.19% -0.02%  ========================================== Files 162 162 Lines 51700 51711 +11 ========================================== + Hits 47670 47674 +4  - Misses 4030 4037 +7
Flag Coverage Δ
#multiple 90.59% <36.36%> (-0.02%) ⬇️
#single 43.01% <18.18%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/window.py 95.61% <36.36%> (-0.79%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f492be6...9a28452. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the right way - this should be setup in base class (where groupby and resample also inherit)

.size is not the same as .count either - it’s the size of the window where
count is the size - min_count

@bourbaki
Copy link
Contributor Author

bourbaki commented Dec 5, 2018

Thanks for the feedback. Maybe this is stupid question but could you give an example where size and count should give different results?
And what base class do you mean exactly? Looking at the class diagram the most "promising" one is pandas.core.base.SelectionMixin

@bourbaki
Copy link
Contributor Author

bourbaki commented Dec 5, 2018

@TomAugspurger Do you think I need to implement roll_size function? I am looking into _libs/window.pyx and there are Cython implementations of basic aggregate functions for rolling windows.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Dec 5, 2018 via email

@WillAyd WillAyd added the Window rolling, ewma, expanding label Dec 5, 2018
@WillAyd
Copy link
Member

WillAyd commented Dec 5, 2018

roll_size is only used when dealing with frequency windows otherwise there's a count method in one of the base classes you could use for reference. Your test case should also take into account various combinations of arguments to the Window

@jreback
Copy link
Contributor

jreback commented Dec 6, 2018

so looking at this again, I think that this is almost exactly count, but counts all values rather than notna ones. So prob easiest to move count to _count.() and parameterize this to count na's via a keyword (skipna=True), then both size and count can call this.

@jorisvandenbossche
Copy link
Member

Maybe this is stupid question but could you give an example where size and count should give different results?

size gives the full size of the group/window (the number of rows), while count is a count of the non-missing values. So once you have missing values, there will be different results.

To give a concrete example with groupby, as there both are implemented:

In [12]: df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2], 'b':[1, 2, 3, np.nan, 5, 6]}) In [13]: df.groupby('a')['b'].count() Out[13]: a 1 3 2 2 Name: b, dtype: int64 In [14]: df.groupby('a')['b'].size() Out[14]: a 1 3 2 3 Name: b, dtype: int64 
@bourbaki
Copy link
Contributor Author

bourbaki commented Dec 7, 2018

I have pushed the solution I came up with. Could roll_size be redundant?

@bourbaki
Copy link
Contributor Author

@jreback Is this approach ok?

@WillAyd
Copy link
Member

WillAyd commented Dec 10, 2018

Would rather see something that integrates into the existing count function as noted above by myself and Jeff. roll_count is only used with certain arguments so emulating via roll_size isn't the right way to go about this

@jreback
Copy link
Contributor

jreback commented Jan 5, 2019

if you want to update according to comment, pls ping.

@jreback jreback closed this Jan 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Enhancement Window rolling, ewma, expanding

6 participants