Skip to content

Conversation

@HagaiHargil
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented Feb 13, 2018

Hello @HagaiHargil! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 21, 2018 at 06:55 Hours UTC
missing, the result will be missing
Fill existing missing (NaN) values, and any new element needed for
successful array alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for Series, so change DataFrame to Series.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe rephrase it as

Fill missing values with 'fill_value'. There are two sources of missing values * Missing values present either Series before the operation * Newly created missing values as a result of an alignment In either case, missing values are not filled if both Series are missing after alignment. At least one value from either Series must be not NA for 'fill_value' to be used. 

Though I'm sure if this is any clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a problem with the first line - "Fill missing values with 'fill_value'". This is basically what bothered me in the first place, it doesn't exactly fill missing value, in the naive way you would expect it to.

If you insist I'll gladly rephrase my current wording. In the mean time I'll fix the rest of your remarks.

Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a DataFrame for this example? It can just be a single-column DataFrame with these same values and index.

@gfyoung gfyoung added Docs Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Numeric Operations Arithmetic, Comparison, and Logical operations labels Feb 14, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to make a common shared doc-string would be nice as well (could be followon PR)

fill_value : None or float value, default None (NaN)
Fill missing (NaN) values with this value. If both Series are
missing, the result will be missing
Fill existing missing (NaN) values, and any new element needed for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the 'and any new elemnt needed for successful array alignment', this is redundant with your last sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which sentence are you referring to. The "and any new element needed" sentence refers to the alignment process. The last sentence ("If data in both corresponding...") only deals with a "corner case" of two NaNs, without any direct reference to the data alignment process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use the word 'array' as these are not arrays. this is still not very clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this phrasing much better. Could you change "new element" to "new missing values"

You can remove the commas around the "and any new element... " clause.

Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show a and b as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to have a line with just >>> a on it before showing the Series. Likewise for b.

fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are
missing, the result will be missing
Fill existing missing (NaN) values, and any new element needed for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Examples
--------
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show a and b.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob want to show say a with 1 column and b with 2

Examples
--------
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Examples
--------
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob want to show say a with 1 column and b with 2

@jreback jreback changed the title Clarify and add fill_value example in arithmetic ops DOC: Clarify and add fill_value example in arithmetic ops Feb 15, 2018
@codecov
Copy link

codecov bot commented Feb 18, 2018

Codecov Report

Merging #19675 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #19675 +/- ## ========================================== + Coverage 91.57% 91.61% +0.03%  ========================================== Files 150 150 Lines 48817 48882 +65 ========================================== + Hits 44704 44782 +78  + Misses 4113 4100 -13
Flag Coverage Δ
#multiple 89.98% <100%> (+0.03%) ⬆️
#single 41.79% <100%> (+0.06%) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 96.86% <100%> (+0.03%) ⬆️
pandas/core/arrays/base.py 60% <0%> (-0.61%) ⬇️
pandas/core/series.py 94.46% <0%> (-0.11%) ⬇️
pandas/core/sparse/series.py 95.25% <0%> (-0.02%) ⬇️
pandas/core/indexes/multi.py 95.06% <0%> (-0.01%) ⬇️
pandas/core/sparse/frame.py 94.81% <0%> (ø) ⬆️
pandas/core/groupby.py 92.2% <0%> (ø) ⬆️
pandas/core/indexes/api.py 98.78% <0%> (ø) ⬆️
pandas/io/parquet.py 71.79% <0%> (ø) ⬆️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9551c8...3192301. Read the comment docs.

@HagaiHargil
Copy link
Contributor Author

I'm also not sure what a shared docstring really is in this context :)

b 1.0
c 1.0
d NaN
>>> b = pd.DataFrame([[1, np.nan], [np.nan, 2], [1, np.nan], [np.nan, 2]], index=['a', 'b', 'c_', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not pass the linter, need to format this, prob easier to use a dict to construct as well.

fill_value : None or float value, default None (NaN)
Fill missing (NaN) values with this value. If both Series are
missing, the result will be missing
Fill existing missing (NaN) values, and any new element needed for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use the word 'array' as these are not arrays. this is still not very clear.

@HagaiHargil
Copy link
Contributor Author

I changed the confusing "array" into series\dataframe.

Regarding the "this is still not very clear" comment -

First, I guess you agree with me that the previous text was plain wrong. Now I'm trying to find a clear enough sentence that fits the actual behavior of this keyword, but inconsistent behavior leads to badly-phrased docs. I'm really not sure what to say anymore.

@TomAugspurger
Copy link
Contributor

Looks like there are some linting errors failing the build: #19675 (comment)

This also seems to cause issues with our _make_flex_doc helper. Are you able to import pandas after making your changes locally?

@HagaiHargil
Copy link
Contributor Author

It was an interesting bug with the {} dict constructor.

c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'c_', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer just using abde or something, the c_ is confusing

@jreback jreback added this to the 0.23.0 milestone Feb 22, 2018
@jreback jreback merged commit 4ed8313 into pandas-dev:master Feb 22, 2018
@jreback
Copy link
Contributor

jreback commented Feb 22, 2018

thanks @HagaiHargil

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Docs Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Numeric Operations Arithmetic, Comparison, and Logical operations

5 participants