-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: Patch handling of keep_default_na=False #19260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| the default NaN values are used for parsing. | ||
| * If `keep_default_na` is False, and `na_values` are not specified, only | ||
| the NaN values specified `na_values` are used for parsing. | ||
| * If `keep_default_na` is False, and `na_values` are not specified, no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the same as the 3rd bullet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that first line is a just copy-paste fail.
doc/source/io.rst Outdated
| If na_values are specified and keep_default_na is ``False`` the default NaN | ||
| values are overridden, otherwise they're appended to. | ||
| Whether or not to include the default NaN values when parsing the data. | ||
| Provided that `na_filter` is True, depending on whether `na_values` is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 2nd sentence is slightly confusing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
pandas/tests/io/parser/na_values.py Outdated
| 'seven']}) | ||
| tm.assert_frame_equal(xp.reindex(columns=df.columns), df) | ||
| | ||
| # see gh-19227: keep_default_na=False should be enforced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you split this test up, getting a bit long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
8ef3ae5 to 9cc889c Compare | If na_values are specified and keep_default_na is ``False`` the default NaN | ||
| values are overridden, otherwise they're appended to. | ||
| Whether or not to include the default NaN values when parsing the data. | ||
| Depending on whether `na_values` is passed in, the behavior is as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
| elif isinstance(na_values, dict): | ||
| na_values = na_values.copy() # Prevent aliasing. | ||
| if keep_default_na: | ||
| for k, v in compat.iteritems(na_values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nice to have a comment or 2 to explain the logic here for future readers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done.
9cc889c to dd5880f Compare Codecov Report
@@ Coverage Diff @@ ## master #19260 +/- ## ========================================== + Coverage 91.56% 91.56% +<.01% ========================================== Files 148 148 Lines 48874 48878 +4 ========================================== + Hits 44751 44755 +4 Misses 4123 4123
Continue to review full report at Codecov.
|
| (k, _floatify_na_values(v)) for k, v in na_values.items() # noqa | ||
| ) | ||
| | ||
| na_values[k] = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an anti-pattern to modify the dict that you are iterating. can you create you create a new one here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback : I realize that, but I think that's why up above, someone wrote na_values = na_values.copy(). The reference to the original input is destroyed and created a "new" dictionary. Do you just want me to change the assigned variable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah you should create a new empty dict and then assign to it (it doesn't have to be a copy). iterating and modifying is a no-no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to create a copy because at the bottom, we return na_values and na_fvalues regardless of the logic branch. However, I'll iterate over the old_na_values instead.
dd5880f to ecbe462 Compare Patches very buggy behavior of keep_default_na=False whenever na_values is a dict * Respect keep_default_na for column that doesn't exist in na_values dictionary * Don't crash / break when na_value is a scalar in the na_values dictionary. In addition, clarifies documentation on behavior of keep_default_na with respect to na_filter and na_values. Closes pandas-devgh-19227.
ecbe462 to 63faf8d Compare | @jreback : The OSX builds on Travis seemed to have stalled completely (all of the other tests have been done for a LONG time now). |
| thanks @gfyoung |
Patches very buggy behavior of
keep_default_na=Falsewheneverna_valuesis a dictkeep_default_nafor column that doesn't exist inna_valuesdictionaryna_valueis a scalar in thena_valuesdictionary.In addition, clarifies documentation on the handling of the keep
keep_default_naparameter with respect tona_filterandna_values.Closes #19227.
cc @neilser