69

I have a DataFrame that looks like

    Emp1  Empl2        date Company
0      0      0  2012-05-01   apple
1      0      1  2012-05-29   apple
2      0      1  2013-05-02   apple
3      0      1  2013-11-22   apple
18     1      0  2011-09-09  google
19     1      0  2012-02-02  google
20     1      0  2012-11-26  google
21     1      0  2013-05-11  google

I want to set a MultiIndex on this DataFrame using the Company and date columns. It currently has the default index. I am using

df.set_index(['Company', 'date'], inplace=True) 

But when I print the result, it prints None. Is this not the correct way of doing it? I also want to control the order of the two columns so that Company becomes the first level of the index and date the second level in the hierarchy. Any ideas on this?

3
  • Works for me, no problem. What versions of pandas and numpy are you using? Can you post code and data to reproduce the issue? Commented Jun 4, 2014 at 15:24
  • @Andy Hayden: Before setting the index, the df was printed the way I posted it in my original question. Commented Jun 4, 2014 at 15:52
  • How is that a multi index? It looks like you set a normal 1 dimensional index. Commented Aug 31, 2018 at 22:37

2 Answers

99

When you pass inplace=True, set_index() makes the changes on the original variable and returns None; it does not return the modified dataframe.

is_none = df.set_index(['Company', 'date'], inplace=True)
df       # the dataframe you want
is_none  # has the value None

so when you have a line like:

df = df.set_index(['Company', 'date'], inplace=True) 

it first modifies df... but then it sets df to None!

That is, you should just use the line:

df.set_index(['Company', 'date'], inplace=True) 
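A minimal sketch of the behaviour described above, assuming a small hypothetical frame modelled on the one in the question (the values are made up for illustration). It shows that the in-place call returns None while the original frame picks up the two-level index, with Company first and date second because that is the order in the list.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's frame.
df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "Empl2": [0, 1, 0, 0],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
    "Company": ["apple", "apple", "google", "google"],
})

# With inplace=True the call mutates df and returns None.
result = df.set_index(["Company", "date"], inplace=True)

print(result)          # the return value of the in-place call
print(df.index.names)  # level order follows the order of the list
print(df)
```

Printing `result` is what produces the None seen in the question; printing `df` itself shows the re-indexed frame.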

6

By default, set_index() returns a new DataFrame rather than modifying the original, so you can assign the result back to df (instead of using the inplace= parameter).

df = df.set_index(['Company', 'date']) 


Note how set_index() overwrites the old index by default. You can keep the old index by appending the new indices via the append= parameter.

df = df.set_index(['Company', 'date'], append=True) 
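To make the difference concrete, here is a small sketch comparing the default behaviour with append=True. The frame is hypothetical sample data, not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Emp1": [0, 0, 1],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09"],
    "Company": ["apple", "apple", "google"],
})

# Default: the old integer index is discarded.
replaced = df.set_index(["Company", "date"])

# append=True: the old index is kept as the outermost level.
appended = df.set_index(["Company", "date"], append=True)

print(replaced.index.nlevels, list(replaced.index.names))
print(appended.index.nlevels, list(appended.index.names))
```

The appended version has three levels; the first is the unnamed original index.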


The new index doesn't need to come from the columns. You can pass a pandas Series or a numpy array of the same length as the dataframe to set_index().

new_idx = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df = df.set_index([new_idx, 'date'])
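A self-contained sketch of the same idea on a hypothetical four-row frame, mixing an external Series with an existing column. The names `labels` and `out` are only for illustration.

```python
import pandas as pd

# Hypothetical four-row frame with a default integer index.
df = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
    "Emp1": [0, 0, 1, 1],
})

# External labels; the length must equal the number of rows in df.
labels = pd.Series(["a", "b", "c", "d"])

# First level comes from the Series, second from the 'date' column.
out = df.set_index([labels, "date"])

print(out)
```

The 'date' column moves into the index, while the level built from the unnamed Series has no name.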


To set a brand new MultiIndex, you can build a pd.MultiIndex object and assign it. Depending on what you use to build the index, there are convenient constructors: from_arrays(), from_tuples() and from_product().

For example, if you want to create a MultiIndex from the Cartesian product of lst1 and lst2, you can do so by calling from_product(). Note that the length of the MultiIndex must match the length of the dataframe for this to work.

lst1 = ['a', 'b', 'c', 'd']
lst2 = [100, 200]
df.index = pd.MultiIndex.from_product([lst1, lst2])
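For comparison, the three constructors mentioned above can build the same index from differently shaped inputs. This is a sketch with shortened, made-up lists and a hypothetical `value` column, so the frame has 2 x 2 = 4 rows.

```python
import pandas as pd

lst1 = ["a", "b"]
lst2 = [100, 200]

# Cartesian product of the two lists.
by_product = pd.MultiIndex.from_product([lst1, lst2])

# One tuple per row.
by_tuples = pd.MultiIndex.from_tuples(
    [("a", 100), ("a", 200), ("b", 100), ("b", 200)]
)

# One array per level.
by_arrays = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [100, 200, 100, 200]]
)

# Hypothetical frame whose length matches the index length.
df = pd.DataFrame({"value": [1, 2, 3, 4]})
df.index = by_product

print(by_product.equals(by_tuples), by_product.equals(by_arrays))
print(df)
```

from_product() is the shortest when every combination is needed; from_tuples() and from_arrays() fit better when the rows are not a full Cartesian product.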

2 Comments

Can I use set_index to create a MultiIndex using tuples stored in one of the columns of the dataframe? Or do I have to (1) construct and assign a MultiIndex using pd.MultiIndex.from_tuples(); and then (2) drop the column from the dataframe?
@Confounded The column must be converted into a MultiIndex if you want to use it as a MultiIndex later, so your method is the correct way. You can also remove the column beforehand using pop. So something like df.set_index(pd.MultiIndex.from_tuples(df.pop('tup_col'))).
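A runnable sketch of the pop-based approach from the comment above, assuming a hypothetical column named tup_col that holds (company, date) tuples. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical frame whose 'tup_col' column stores (company, date) tuples.
df = pd.DataFrame({
    "tup_col": [
        ("apple", "2012-05-01"),
        ("apple", "2012-05-29"),
        ("google", "2011-09-09"),
    ],
    "Emp1": [0, 0, 1],
})

# pop() removes the column from df and returns it as a Series;
# from_tuples() turns those tuples into a two-level MultiIndex.
df = df.set_index(pd.MultiIndex.from_tuples(df.pop("tup_col")))

print(df)
```

Because pop() mutates df before set_index() runs, the tuple column is gone from the result without a separate drop step.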
