pandas unexpected set_index behaviour

Question

    import pandas

    data = [['g1','a',1],['g1','b',2],['g2','b',3],['g2','a',4]]
    df = pandas.DataFrame(data=data, columns=['group','name','count'])
    print(df.set_index(['group','name']))
    print(df.set_index(['name','group']))

                count
    group name
    g1    a         1
          b         2
    g2    b         3
          a         4
                count
    name group
    a    g1         1
    b    g1         2
         g2         3
    a    g2         4

This behavior surprised me: I expected the second output to group the rows by name, like this:

                count
    name group
    a    g1         1
         g2         4
    b    g1         2
         g2         3

chrisaycock · Accepted Answer · 2016-02-01 20:58:09Z


set_index keeps the rows in their existing order, so the DataFrame needs to be sorted first to get your desired output:

    In [12]: df.sort_values('name').set_index(['name','group'])
    Out[12]:
                count
    name group
    a    g1         1
         g2         4
    b    g1         2
         g2         3
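A possible alternative, not part of the original answer, is to set the index first and sort on it afterwards with `sort_index`. The sketch below assumes the question's data; the variable names `unsorted` and `by_index` are made up for illustration.

```python
import pandas as pd

data = [['g1', 'a', 1], ['g1', 'b', 2], ['g2', 'b', 3], ['g2', 'a', 4]]
df = pd.DataFrame(data=data, columns=['group', 'name', 'count'])

# set_index only moves the columns into the index; the row order is unchanged.
unsorted = df.set_index(['name', 'group'])

# sort_index orders the rows by the index levels: name first, then group.
by_index = unsorted.sort_index()

print(by_index)
```

Sorting on the index orders by both levels at once, so the result does not depend on how ties within the same name were arranged beforehand.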


2 Comments

EdChum Over a year ago

Indeed, the docs discuss why this is important.

Jason Strimpel Over a year ago

Link in the docs is now here pandas.pydata.org/pandas-docs/stable/…
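To make the point about sortedness concrete, here is a small sketch of one way to check whether a MultiIndex is in sorted order. It assumes the question's data and uses the `is_monotonic_increasing` property of a pandas index; the variable names are hypothetical, and this is only an illustration, not a summary of what the linked docs say.

```python
import pandas as pd

data = [['g1', 'a', 1], ['g1', 'b', 2], ['g2', 'b', 3], ['g2', 'a', 4]]
df = pd.DataFrame(data=data, columns=['group', 'name', 'count'])

# Index built straight from the unsorted rows: a, b, b, a on the first level.
raw_index = df.set_index(['name', 'group']).index

# The same index after sorting the frame on its index levels.
sorted_index = df.set_index(['name', 'group']).sort_index().index

print(raw_index.is_monotonic_increasing)
print(sorted_index.is_monotonic_increasing)
```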
