Set MultiIndex of an existing DataFrame in pandas

Question

I have a DataFrame that looks like

    Emp1  Empl2        date Company
0      0      0  2012-05-01   apple
1      0      1  2012-05-29   apple
2      0      1  2013-05-02   apple
3      0      1  2013-11-22   apple
18     1      0  2011-09-09  google
19     1      0  2012-02-02  google
20     1      0  2012-11-26  google
21     1      0  2013-05-11  google

I want to pass the company and date for setting a MultiIndex for this DataFrame. Currently it has a default index. I am using

df.set_index(['Company', 'date'], inplace=True)

But when I print the result, it prints None. Is this not the correct way of doing it? Also, I want to arrange the Company and date columns so that Company becomes the first index level and date becomes the second in the hierarchy. Any ideas on this?

Works for me, no problem. What versions of pandas and numpy are you using? Can you post code and data to reproduce the issue?
– EdChum, Commented Jun 4, 2014 at 15:24
@Andy Hayden: Before setting the index, the df was printed the way I posted it in my original question.
– user3527975, Commented Jun 4, 2014 at 15:52
How is that a multi index? It looks like you set a normal 1 dimensional index.
– Soerendip, Commented Aug 31, 2018 at 22:37

Andy Hayden · Accepted Answer · 2014-06-04 15:37:57Z

When you pass inplace=True, set_index makes the changes on the original variable. The function does not return the modified dataframe; it returns None.

is_none = df.set_index(['Company', 'date'], inplace=True)
df  # the dataframe you want
is_none  # has the value None

so when you have a line like:

df = df.set_index(['Company', 'date'], inplace=True)

it first modifies df... but then it sets df to None!

That is, you should just use the line:

df.set_index(['Company', 'date'], inplace=True)
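As a minimal, self-contained sketch of the behaviour described above (the data is a shortened, illustrative version of the question's frame, and the name `result` is only for demonstration):

```python
import pandas as pd

# Illustrative data modelled on a few rows of the question's frame.
df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "Empl2": [0, 1, 0, 0],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
    "Company": ["apple", "apple", "google", "google"],
})

# With inplace=True the frame itself is modified and the call returns None.
result = df.set_index(["Company", "date"], inplace=True)

print(result)                # None
print(list(df.index.names))  # ['Company', 'date']
print(list(df.columns))      # ['Emp1', 'Empl2']
```

Printing the return value of the call is therefore what shows None; printing df itself shows the re-indexed frame.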

cottontail · Answer · 2023-01-26 22:51:00Z

By default, set_index() returns a new dataframe rather than modifying df, so you can assign the result back to df (instead of using the inplace= parameter).

df = df.set_index(['Company', 'date'])

Note how set_index() overwrites the old index by default. You can keep the old index by appending the new indices via the append= parameter.

df = df.set_index(['Company', 'date'], append=True)
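The difference between replacing and appending can be sketched as follows. The frame is an illustrative stand-in for the question's data, and the names `replaced`, `appended` and `swapped` are hypothetical. The sketch also shows that the level order follows the order of the list passed to set_index(), and that swaplevel() can flip two levels afterwards if needed.

```python
import pandas as pd

# Illustrative data modelled on a few rows of the question's frame.
df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
    "Company": ["apple", "apple", "google", "google"],
})

# Default: the old integer index is discarded.
replaced = df.set_index(["Company", "date"])

# append=True: the old index is kept as the outermost level.
appended = df.set_index(["Company", "date"], append=True)

print(replaced.index.nlevels)      # 2
print(appended.index.nlevels)      # 3
print(list(appended.index.names))  # [None, 'Company', 'date']

# Levels follow the order of the list; swaplevel() exchanges two of them.
swapped = replaced.swaplevel("Company", "date")
print(list(swapped.index.names))   # ['date', 'Company']
```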

The new index doesn't need to come from the columns. You can pass a pandas Series or a numpy array of the same length as the dataframe to set_index().

new_idx = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df = df.set_index([new_idx, 'date'])
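A small runnable sketch of mixing an external array with a column name, assuming a three-row illustrative frame. The `labels` array and the name `out` are hypothetical, and a numpy array is used here in place of the Series above.

```python
import numpy as np
import pandas as pd

# Illustrative three-row frame.
df = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-29", "2013-05-02"],
    "Emp1": [0, 0, 1],
})

# Hypothetical labels that are not a column of df; the length must equal len(df).
labels = np.array(["a", "b", "c"])

# The array becomes the first level, the 'date' column the second.
out = df.set_index([labels, "date"])

print(out.index.nlevels)      # 2
print(list(out.index.names))  # [None, 'date']
print(list(out.columns))      # ['Emp1']
```

The level built from the array has no name because a plain numpy array carries none. Only the 'date' column is moved out of the columns.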

To set a brand new MultiIndex, you can build a pd.MultiIndex object directly. Depending on what you use to build the index, there are convenient constructors such as from_arrays(), from_tuples() and from_product().

For example, if you want to create a MultiIndex from the Cartesian product of lst1 and lst2, you can do so by calling from_product(). Note that the length of the MultiIndex must match the length of the dataframe for this to work.

lst1 = ['a', 'b', 'c', 'd']
lst2 = [100, 200]
df.index = pd.MultiIndex.from_product([lst1, lst2])
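To make the length requirement concrete, here is a sketch with an eight-row illustrative frame (4 x 2 = 8 combinations). The column name `value` and the level names `letter` and `number` are assumptions added for readability.

```python
import pandas as pd

# Eight rows, matching the 4 * 2 combinations produced below.
df = pd.DataFrame({"value": range(8)})

lst1 = ["a", "b", "c", "d"]
lst2 = [100, 200]

# The Cartesian product yields ('a', 100), ('a', 200), ('b', 100), ...
idx = pd.MultiIndex.from_product([lst1, lst2], names=["letter", "number"])

# Assigning an index of a different length would raise an error.
assert len(idx) == len(df)
df.index = idx

print(df.loc[("b", 200), "value"])  # 3
```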

Can I use set_index to create a MultiIndex using tuples stored in one of the columns of the dataframe? Or do I have to (1) construct and assign a MultiIndex using pd.MultiIndex.from_tuples(); and then (2) drop the column from the dataframe?
@Confounded The column must be converted into a MultiIndex if you want to use it as a MultiIndex later, so your method is the correct way. You can also remove the column beforehand using pop. So something like df.set_index(pd.MultiIndex.from_tuples(df.pop('tup_col'))).
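The pop-based approach from the comment above can be sketched like this. The column name `tup_col` comes from the comment; the data and the level names `Company` and `year` are illustrative assumptions.

```python
import pandas as pd

# Illustrative frame with one column holding (company, year) tuples.
df = pd.DataFrame({
    "tup_col": [("apple", 2012), ("apple", 2013), ("google", 2011)],
    "Emp1": [0, 0, 1],
})

# pop() removes the column from df and returns it, so no separate drop is needed.
idx = pd.MultiIndex.from_tuples(df.pop("tup_col"), names=["Company", "year"])
df = df.set_index(idx)

print(list(df.columns))       # ['Emp1']
print(list(df.index.names))   # ['Company', 'year']
print(df.loc[("google", 2011), "Emp1"])  # 1
```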
