@nbonnotte
Contributor

closes #7791
xref #6797

Contributor

also add names=self.names

@jreback
Contributor

jreback commented Nov 8, 2015

couple of comments. pls also add a whatsnew note (bug fix section)

@jreback added the Bug, Output-Formatting (__repr__ of pandas objects, to_string), and MultiIndex labels on Nov 8, 2015
@jreback added this to the 0.17.1 milestone on Nov 8, 2015
@nbonnotte
Contributor Author

Hum, na_rep has its own difficulty:

For a plain index:

    In [79]: pd.DataFrame({'a': np.NaN, 'b':[2,3]}).set_index('a').index
    Out[79]: Float64Index([nan, nan], dtype='float64', name=u'a')

and that's cool, but for a multi-index:

    In [80]: pd.DataFrame({'a': np.NaN, 'b':[2,3]}).set_index(['a','b']).index
    Out[80]:
    MultiIndex(levels=[[], [2, 3]],
               labels=[[-1, -1], [0, 1]],
               names=[u'a', u'b'])

so there are no nan values stored, and thus:

    In [81]: pd.DataFrame({'a': np.NaN, 'b':[2,3]}).set_index(['a','b']).index.levels[0]
    Out[81]: Float64Index([], dtype='float64', name=u'a')

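
To make the point above easy to reproduce, here is a minimal sketch, assuming a recent pandas where the MultiIndex attribute `labels` has been renamed to `codes` (the sketch falls back to `labels` for older releases). The variable names are illustrative only. It checks that the plain index keeps its NaN values, while the MultiIndex stores an empty first level and marks the missing entries with the code -1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.nan, "b": [2, 3]})

# Plain index: the NaN values are stored directly in the index.
plain = df.set_index("a").index

# MultiIndex: values are stored as levels plus integer codes per level.
multi = df.set_index(["a", "b"]).index

# `labels` was renamed to `codes` in later pandas releases; support both.
codes = multi.codes if hasattr(multi, "codes") else multi.labels

first_level = multi.levels[0]
first_codes = [int(c) for c in codes[0]]

print("plain index is all NaN:", bool(plain.isna().all()))
print("size of first MultiIndex level:", len(first_level))
print("codes for first MultiIndex level:", first_codes)
```

Because the level itself is empty, anything that formats only `levels[0]` never sees a NaN, which is why `na_rep` needs separate handling for multi-indexes.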
@jreback
Contributor

jreback commented Nov 8, 2015

You are setting everything to the index, which collapses the frame.

    In [10]: pd.DataFrame({'a': np.NaN, 'b':[2,3], 'c' : [5,6]}).set_index(['a','b'])
    Out[10]:
           c
    a   b
    NaN 2  5
        3  6

having nans in an index is quite odd and only semi-supported, meaning some edge cases exist.

@nbonnotte
Contributor Author

Even when there is a third column 'c' that stays outside the index, we still get an empty Float64Index:

    In [5]: pd.DataFrame({'a': np.NaN, 'b':[2,3], 'c' : [5,6]}).set_index(['a','b']).index
    Out[5]:
    MultiIndex(levels=[[], [2, 3]],
               labels=[[-1, -1], [0, 1]],
               names=[u'a', u'b'])

    In [6]: pd.DataFrame({'a': np.NaN, 'b':[2,3], 'c' : [5,6]}).set_index('a').index
    Out[6]: Float64Index([nan, nan], dtype='float64', name=u'a')

If this is an "edge case", I'm pushing what I have done so far:

  • what's new
  • a short explanation for the test for date_format
  • add names=self.names where it should be
  • test if quoting is also passed to the multi-indexes

What I have NOT done:

@jreback
Contributor

jreback commented Nov 10, 2015

ok, this looks good. pls squash and ping when green.

Contributor

say something like: Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format
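
For readers following along, a minimal sketch of the behaviour this note describes, assuming a pandas release that includes the fix. The data, column names, and the date format string are made up for illustration. It writes a frame whose MultiIndex has a datetime level and checks that `date_format` is applied to that level in the CSV text.

```python
import pandas as pd

# Hypothetical data: a MultiIndex with one datetime level and one string level.
index = pd.MultiIndex.from_arrays(
    [pd.to_datetime(["2015-11-08", "2015-11-09"]), ["x", "y"]],
    names=["when", "key"],
)
df = pd.DataFrame({"value": [1, 2]}, index=index)

# With no path given, to_csv returns the CSV as a string.
# date_format should be passed through to the datetime level of the index.
csv_text = df.to_csv(date_format="%Y/%m/%d")
csv_lines = csv_text.splitlines()

print(csv_text)
```

Using a format with slashes makes it easy to tell the custom format apart from the default ISO-style output.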

@nbonnotte
Contributor Author

All done

@jreback
Contributor

jreback commented Nov 11, 2015

merged via 9cbe8b9

thanks!

@jreback closed this on Nov 11, 2015
@nbonnotte deleted the groupby-then-to_csv branch on November 16, 2015 at 09:31