Skip to content

Conversation

@pykler
Copy link
Contributor

@pykler pykler commented Apr 13, 2015

closes #9785

@cpcloud cpcloud added this to the 0.16.1 milestone Apr 13, 2015
@cpcloud cpcloud added the Bug label Apr 13, 2015
@jreback jreback added Error Reporting Incorrect or improved errors from pandas Compat pandas objects compatability with Numpy or Python functions Bug and removed Bug labels Apr 13, 2015
@pykler pykler changed the title Fixing == __eq__ operator for MultiIndex #9875 Fixing == __eq__ operator for MultiIndex ... closes #9785 Apr 13, 2015
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems strange to me. Why would we ever want to use NotImplementedType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function just returns the type of an object ... NotImplemented is an object. So not sure what the question is?

if the question is why is this feature important? Pandas has code that checks the type of return values and acts accordingly. Occasionally Numpy will return this type and this code breaks with an attribute error because NotImplementedType has no attribute dtype.

PS. Without this the tests below fail. This may not be clear from the small diff but the bug in this case is visible in _indexOp pandas/core/index.py:58

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for clarifying. My sense is that this method probably should raise if it gets NotImplemented, because this is not really a valid numpy dtype.

Instead, I would make the wrapper functions in pandas.core.index handle NotImplemented explicitly, e.g., add something like

if result is NotImplemented: return result 

Operations that wrap arithmetic should know how to handle NotImplemented -- that's a saner place to put it than here (where it could eat some strange errors)..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you are saying about numpy types, but it shouldn't eat up any errors, since this function just says that the type of the input is so and so ... right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based purely on the name, function expects a numpy/pandas array or dtype object, and NotImplemented is certainly not one of those :). Returning arbitrary output instead of raising an error feels like an anti-pattern to me.

In contrast, for something like is_bool_dtype it does make sense to return False when given NotImplemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll make this change in is_bool_dtype instead ... ccing @jreback for reference

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, this seems more reasonable, though from my perspective I would probably still be happier explicitly handling NotImplemented inside the wrapper functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So conceptually though, the code in the wrapper got a result, its trying to check if this result is a bool type ... the check is crashing on this NULL type of exception. So this is where its gets tricky conceptually, is NULL part of this function's domain or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed... might be better to use try/except logic, e.g.,

try: tipo = _get_dtype_type(arr_or_dtype) except Exception: return False 
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, one last thing. I tested the following and it also crapped out.

_get_dtype_type(False) 

so is this also supposed to crap out? or is it supposed to return False?

is_bool_dtype(False) 
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is an internal function, so it's not extensively tested (and not worth worrying about testing too much). It expects the argument to already be a numpy dtype or numpy array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I caught ValueError and made sure that's raised ... does this look better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes :)

@jreback
Copy link
Contributor

jreback commented Apr 14, 2015

merged via 07257a0

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Compat pandas objects compatability with Numpy or Python functions Error Reporting Incorrect or improved errors from pandas

4 participants