Pandas group by column and add numbered rows to each group

Question

I have the following data stored in a pandas.DataFrame object named df. The column id is a unique identifier and the remaining columns are irrelevant for the purpose of this question.

	id	x1	x2	x3
0	01001	523.41	639673	1222.13
1	01002	54.832	33746	615.443
2	01003	48.3824	45196	934.142

I want to know if there's a way to group by id and use assign to add multiple numbered rows to each group.

In other words, I want to use range to add an arbitrary number of rows to each id. The desired result looks as follows:

	id	x1	x2	x3	new_col
0	01001	523.41	639673	1222.13	2020
0	01001	523.41	639673	1222.13	2021
1	01002	54.832	33746	615.443	2020
1	01002	54.832	33746	615.443	2021
2	01003	48.3824	45196	934.142	2020
2	01003	48.3824	45196	934.142	2021

I was hoping something along these lines would work.

df = df.groupby('id').assign(new_col=range(2020, 2022))

Sunderam Dubey · Accepted Answer · 2022-05-30 15:38:39Z

You can repeat the index and use numpy.tile to tile the new data:

import numpy as np new = ['2020', '2021'] df2 = df.loc[df.index.repeat(len(new))] df2['new_col'] = np.tile(new, len(df))

Output:

 id x1 x2 x3 new_col 0 1001 523.4100 639673 1222.130 2020 0 1001 523.4100 639673 1222.130 2021 1 1002 54.8320 33746 615.443 2020 1 1002 54.8320 33746 615.443 2021 2 1003 48.3824 45196 934.142 2020 2 1003 48.3824 45196 934.142 2021

Alternatively:

new = range(2020, 2022) df2 = df.loc[df.index.repeat(len(new))] df2['new_col'] = new.start + df2.groupby('id').cumcount()

I added an alternative solution based on df.index.repeat. It's very similar to your solution, but it is slightly closer to what I originally had in mind.

Sunderam Dubey · Accepted Answer · 2022-05-30 15:39:11Z

We can also create a MultiIndex.from_product and reindex relative to level=0, to scale up the DataFrame, and reset_index to convert the inner level of the MultiIndex into a column:

df = df.reindex( index=pd.MultiIndex.from_product( [df.index, range(2020, 2022)], names=[None, 'new_col'] ), level=0 ).reset_index(level='new_col')

df:

 new_col id x1 x2 x3 0 2020 1001 523.4100 639673 1222.130 0 2021 1001 523.4100 639673 1222.130 1 2020 1002 54.8320 33746 615.443 1 2021 1002 54.8320 33746 615.443 2 2020 1003 48.3824 45196 934.142 2 2021 1003 48.3824 45196 934.142

We can reorder new_col to the end if needed:

df = df.reindex( index=pd.MultiIndex.from_product( [df.index, range(2020, 2022)], names=[None, 'new_col'] ), level=0 ).reset_index(level='new_col') # Re-order columns df = df.reindex( columns=df.columns[df.columns != 'new_col'].union(['new_col'], sort=False) )

df:

 id x1 x2 x3 new_col 0 1001 523.4100 639673 1222.130 2020 0 1001 523.4100 639673 1222.130 2021 1 1002 54.8320 33746 615.443 2020 1 1002 54.8320 33746 615.443 2021 2 1003 48.3824 45196 934.142 2020 2 1003 48.3824 45196 934.142 2021

Setup:

import pandas as pd df = pd.DataFrame({ 'id': [1001, 1002, 1003], 'x1': [523.41, 54.832, 48.3824], 'x2': [639673, 33746, 45196], 'x3': [1222.13, 615.443, 934.142] })

G.G · Accepted Answer · 2022-11-08 08:23:20Z

df1.merge(pd.Series(range(2020, 2022),name='col1'),how='cross') id x1 x2 x3 col1 0 1001 523.4100 639673 1222.130 2020 1 1001 523.4100 639673 1222.130 2021 2 1002 54.8320 33746 615.443 2020 3 1002 54.8320 33746 615.443 2021 4 1003 48.3824 45196 934.142 2020 5 1003 48.3824 45196 934.142 2021

Collectives™ on Stack Overflow

Pandas group by column and add numbered rows to each group

3 Answers 3

1 Comment

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

Comments

Comments

Related