@joders
Contributor

@joders joders commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `NaN` and
`None`)

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If equal 0 or 'index' counts are generated for each column.
    If equal 1 or 'columns' counts are generated for each row.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular level, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If level is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isnull: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
>>> df=pd.DataFrame({ "Person":["John","Myla",None],
...                   "Age":[24.,np.nan,21.],
...                   "Single":[False,True,True] })
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2   None  21.0    True
>>> df.count()
Person    2
Age       2
Single    3
dtype: int64
>>> df.count(axis=1)
0    3
1    2
2    2
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `NaN` and
`None`)
Contributor

Could you also add NaT here? (That's our missing value for datetime data.)
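
For reference, a minimal sketch of what the NaT request is about, assuming a reasonably recent pandas. The `events` frame and its column names are made up for illustration and are not part of the PR. `count` skips `NaT` in a datetime column the same way it skips `NaN` and `None`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame (not from the PR): each column has exactly one missing entry.
events = pd.DataFrame({
    "when": pd.to_datetime(["2018-03-10", None, "2018-03-12"]),  # None becomes NaT
    "value": [1.0, np.nan, 3.0],                                 # NaN
    "label": ["a", None, "c"],                                   # None
})

# Count the non-NA cells per column.
counts = events.count()
print(counts)
```

Each of the three columns should report two non-NA cells, whichever marker was used for the missing one.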

level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a DataFrame
If equal 0 or 'index' counts are generated for each column.
Contributor

I think you can remove "equal" from this line and the next.

Examples
--------
>>> df=pd.DataFrame({ "Person":["John","Myla",None],
Contributor

PEP8 on this example: spaces around `=`, no space after `{`, a space after `:`, and a space after `,`.

Contributor

Could you also add an example with level=? I think you could

  1. Make the dataframe 2 items longer and repeat John and Myla.
  2. Update the df output and df.count examples.
  3. Show df.set_index(['Person', 'Single']).count(level='Person')
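
The three steps above can be sketched as follows. One assumption to flag: as far as I know, the `level` keyword of `count` was deprecated and then removed in later pandas releases, so this sketch uses `groupby(level=...)`, which should give the same per-person counts on both older and newer versions. The data mirrors the five-row frame suggested in the review.

```python
import numpy as np
import pandas as pd

# Five-row frame with John and Myla repeated, as suggested in the review.
df = pd.DataFrame({
    "Person": ["John", "Myla", None, "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, True, False],
})

# Move two columns into a MultiIndex, then count non-NA cells per person.
# groupby(level=...) stands in here for count(level=...); rows whose
# "Person" key is missing are dropped from the groups by default.
indexed = df.set_index(["Person", "Single"])
per_person = indexed.groupby(level="Person").count()
print(per_person)
```

Only `Age` remains as a data column after `set_index`, so the result is a one-column DataFrame indexed by `Person`.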
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
elements)
DataFrame.isnull: boolean same-sized DataFrame showing places of NA
Contributor

Refer to `isna` instead.
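
A small sketch of how `isna` relates to `count`, assuming the five-row example frame used elsewhere in this PR: for every column, the non-NA count plus the number of `isna` hits should equal the number of rows. `isnull` is, to my knowledge, just an alias of `isna`, which is why the See Also entry can point at either.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": ["John", "Myla", None, "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, True, False],
})

non_na = df.count()        # non-NA cells per column
na = df.isna().sum()       # NA cells per column

# The two tallies partition every column.
total = non_na + na
print(total)

# isnull and isna should produce the same boolean mask.
same_mask = df.isnull().equals(df.isna())
print(same_mask)
```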

2 None 21.0 True
3 John 33.0 True
4 Myla 26.0 False
>>> df.count()
Contributor

blank line between cases

Contributor

@TomAugspurger left a comment

Can you paste the output of the doc validation script again?

Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `None`,
`NaN` and `NaT`)
Contributor

End with a .

If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
If the axis is a `MultiIndex` (hierarchical), count along a
particular level, collapsing into a `DataFrame`.
Contributor

backticks around the `level` parameter.

@joders
Contributor Author

joders commented Mar 11, 2018

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `None`,
`NaN` and `NaT`).

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If 0 or 'index' counts are generated for each column.
    If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular `level`, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If `level` is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isna: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", None, "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2   None  21.0    True
3   John  33.0    True
4   Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    4
Age       4
Single    5
dtype: int64

Counts for each **row**:

>>> df.count(axis='columns')
0    3
1    2
2    2
3    3
4    3
dtype: int64

Counts for one level of a `MultiIndex`:

>>> df.set_index(["Person", "Single"]).count(level="Person")
        Age
Person
John      2
Myla      1

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)
axis. Works with non-floating point data as well (detects NaN and None)
Count non-NA cells for each column or row.
Return Series with number of non-NA observations over requested
Contributor

One last change: maybe remove the first sentence, since this can return a DataFrame when `level` is given.

I think just use the extended summary to say what counts as non-null data.

The values None, NaN, NaT, and optionally np.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.
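
To illustrate the "optionally np.inf" part, here is a hedged sketch of the default behaviour only, with a made-up `readings` frame. It assumes `use_inf_as_na` is left at its default (off), in which case infinities are counted as ordinary values. Because that option may not be available in every pandas version, the sketch converts infinities to `NaN` explicitly rather than toggling it.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a finite value, both infinities, and NaN.
readings = pd.DataFrame({"x": [1.0, np.inf, np.nan, -np.inf]})

# Default behaviour: only NaN is treated as NA, so both infinities are counted.
default_count = readings.count()["x"]

# Explicitly turn infinities into NaN to count finite values only.
finite_only = readings.replace([np.inf, -np.inf], np.nan).count()["x"]

print(default_count, finite_only)
```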

Contributor Author

Do you mean the first sentence in the extended summary, i.e.:
"Return Series with number of non-NA observations over requested axis."

If I understand you right, I would change the entire summary (i.e. both the short and the extended summary) to look like the following:

Count non-NA cells for each column or row.

The values None, NaN, NaT, and optionally np.inf (depending on
pandas.options.mode.use_inf_as_na) are considered NA.
Contributor

Yeah. Change np.inf to `numpy.inf` and put single backticks around pandas.options.mode.use_inf_as_na.

>>> df = pd.DataFrame({"Person":
... ["John", "Myla", None, "John", "Myla"],
... "Age": [24., np.nan, 21., 33, 26],
Contributor

PEP8: indent one more space. Same with the line below.

Contributor Author

@joders Mar 11, 2018

For me flake8 complains if I change that. On my system flake8 doesn't check the examples, so I copied the example into the code:

    df = pd.DataFrame({"Person":
                       ["John", "Myla", None, "John", "Myla"],
                       "Age": [24., np.nan, 21., 33, 26],
                       "Single": [False, True, True, True, False]})
    df

If I have it like this, flake8 only complains about pd not being defined:
pandas/core/frame.py:5672:14: F821 undefined name 'pd'

Contributor

Sorry I misread.

@joders
Contributor Author

joders commented Mar 12, 2018

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

The values `None`, `NaN`, `NaT`, and optionally `numpy.inf` (depending
on `pandas.options.mode.use_inf_as_na`) are considered NA.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If 0 or 'index' counts are generated for each column.
    If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular `level`, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If `level` is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isna: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", None, "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2   None  21.0    True
3   John  33.0    True
4   Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    4
Age       4
Single    5
dtype: int64

Counts for each **row**:

>>> df.count(axis='columns')
0    3
1    2
2    2
3    3
4    3
dtype: int64

Counts for one level of a `MultiIndex`:

>>> df.set_index(["Person", "Single"]).count(level="Person")
        Age
Person
John      2
Myla      1

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)
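
The docstring documents `numeric_only` but has no example for it. A minimal sketch, assuming the parameter still behaves as described above (keeping only `float`, `int` and `boolean` columns), using the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": ["John", "Myla", None, "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, True, False],
})

# Restrict the count to numeric and boolean columns; the object-dtype
# "Person" column is left out of the result.
numeric_counts = df.count(numeric_only=True)
print(numeric_counts)
```

Here `Person` should be absent from the result, while `Age` and `Single` keep the counts shown in the `df.count()` example.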
@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 12, 2018
@TomAugspurger TomAugspurger merged commit 0596cb1 into pandas-dev:master Mar 12, 2018
@TomAugspurger
Contributor

Thanks @joders!

@joders
Contributor Author

joders commented Mar 13, 2018

thanks for providing pandas
