Skip to content

Conversation

@toobaz
Copy link
Member

@toobaz toobaz commented Jun 14, 2016

@jreback take this just as a wild guess of how to fix #13365 . If the approach makes sense, I will add tests, whatsnew and some more docs.

@codecov-io
Copy link

codecov-io commented Jun 14, 2016

Current coverage is 84.34%

Merging #13440 into master will increase coverage by <.01%

@@ master #13440 diff @@ ========================================== Files 138 138 Lines 51107 51124 +17 Methods 0 0 Messages 0 0 Branches 0 0 ========================================== + Hits 43103 43119 +16  - Misses 8004 8005 +1  Partials 0 0 

Powered by Codecov. Last updated by bf4786a...7f4a72a

@jreback
Copy link
Contributor

jreback commented Jun 14, 2016

you are adding a crazy amount of code here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.get_indexer

@toobaz
Copy link
Member Author

toobaz commented Jun 14, 2016

Thanks for the .get_indexer() suggestion, fixed. Apart from that, do you have in mind a simpler approach? What I could do would be to override the entire _join_non_unique(), duplicating some lines of code but removing the need for the two new methods currently used in the CategoricalIndex- specific parts.

@sinhrks sinhrks added Indexing Related to indexing on series/frames, not to indexes themselves Categorical Categorical Data Type labels Jun 14, 2016
@toobaz
Copy link
Member Author

toobaz commented Jun 21, 2016

@jreback : what do you want to do?

  1. I complete the current approach with tests and docs
  2. I override the entire _join_non_unique() instead that adding two new methods
  3. you suggest another approach
  4. we close this
@jreback
Copy link
Contributor

jreback commented Jun 23, 2016

The approach is not bad, but the actual impl seems not very elegant. Further your impl of _array_putmask is not good.

@toobaz
Copy link
Member Author

toobaz commented Jun 23, 2016

Any constructive criticism? :-) My mood doesn't depend on them, but maybe this PR/issue does.

@jreback
Copy link
Contributor

jreback commented Jun 23, 2016

maybe @sinhrks has some

@sinhrks
Copy link
Member

sinhrks commented Jun 24, 2016

I may misunderstand, but the point is how pandas handle extended dtype? We should decide whether to make them more numpy compat, or use separated logic using if-else branch.

I think most of other impl takes latter (if-else) option currently.

@toobaz
Copy link
Member Author

toobaz commented Jun 24, 2016

@sinhrks : yes, the bug is due to a numpy method (putmask()) not working (and it's not a mere problem of type checks, there's really missing logic, which this PR introduces) on Categorical.

Certainly the cleanest option (for code organization, not that it would spare us from including the missing logic) would be that Categorical inherits from np.ndarray. But that's obviously a huge work... and might (I don't know enough to judge) even be plain impossible due to the in-memory structure of Categorical being different from a typical ndarray.

Then sure, we can solve this also with "if-else"s without overriding... I have hard time considering such approach more "elegant" than mine, but if it is the path of least resistance to closing the bug I'll update the PR.

@jreback
Copy link
Contributor

jreback commented Jun 24, 2016

@toobaz I wasn't looking for a general 'why are we doing things this way', that is obvious. The issue is you are jumping thru hoops and not using the existing Categorical machinery

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really really odd way to do this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... as already said, if you want me I override the entire _join_non_unique I will, I don't see many other alternatives currently.

@toobaz
Copy link
Member Author

toobaz commented Jun 25, 2016

OK, now I'm (extending and) using union_categorical() (thanks for the tip - although I don't have clear what you mean by "violates the guarantee"), and overriding directly _join_non_unique() so that the change is much less invasive.

@toobaz
Copy link
Member Author

toobaz commented Jul 9, 2016

@jreback ping

Feel free to close, I won't be offended.

@jreback
Copy link
Contributor

jreback commented Jul 9, 2016

yep, don't really have time to look, but needs some refactoring. if you wan to submit again pls feel free.

@jreback jreback closed this Jul 9, 2016
@toobaz
Copy link
Member Author

toobaz commented Jul 9, 2016

Submit again?! Well... what?!

I guess the bug(s) will be fixed by somebody who understands how contributing to this project works. I tried in vain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Categorical Categorical Data Type Indexing Related to indexing on series/frames, not to indexes themselves

4 participants