0

I have the following data stored in a pandas.DataFrame object named df. The column id is a unique identifier and the remaining columns are irrelevant for the purpose of this question.

id x1 x2 x3
0 01001 523.41 639673 1222.13
1 01002 54.832 33746 615.443
2 01003 48.3824 45196 934.142

I want to know if there's a way to group by id and use assign to add multiple numbered rows to each group.

In other words, I want to use range to add an arbitrary number of rows to each id. The desired result looks as follows:

id x1 x2 x3 new_col
0 01001 523.41 639673 1222.13 2020
0 01001 523.41 639673 1222.13 2021
1 01002 54.832 33746 615.443 2020
1 01002 54.832 33746 615.443 2021
2 01003 48.3824 45196 934.142 2020
2 01003 48.3824 45196 934.142 2021

I was hoping something along these lines would work.

df = df.groupby('id').assign(new_col=range(2020, 2022)) 

3 Answers 3

4

You can repeat the index and use numpy.tile to tile the new data:

import numpy as np new = ['2020', '2021'] df2 = df.loc[df.index.repeat(len(new))] df2['new_col'] = np.tile(new, len(df)) 

Output:

 id x1 x2 x3 new_col 0 1001 523.4100 639673 1222.130 2020 0 1001 523.4100 639673 1222.130 2021 1 1002 54.8320 33746 615.443 2020 1 1002 54.8320 33746 615.443 2021 2 1003 48.3824 45196 934.142 2020 2 1003 48.3824 45196 934.142 2021 

Alternatively:

new = range(2020, 2022) df2 = df.loc[df.index.repeat(len(new))] df2['new_col'] = new.start + df2.groupby('id').cumcount() 
Sign up to request clarification or add additional context in comments.

1 Comment

I added an alternative solution based on df.index.repeat. It's very similar to your solution, but it is slightly closer to what I originally had in mind.
2

We can also create a MultiIndex.from_product and reindex relative to level=0, to scale up the DataFrame, and reset_index to convert the inner level of the MultiIndex into a column:

df = df.reindex( index=pd.MultiIndex.from_product( [df.index, range(2020, 2022)], names=[None, 'new_col'] ), level=0 ).reset_index(level='new_col') 

df:

 new_col id x1 x2 x3 0 2020 1001 523.4100 639673 1222.130 0 2021 1001 523.4100 639673 1222.130 1 2020 1002 54.8320 33746 615.443 1 2021 1002 54.8320 33746 615.443 2 2020 1003 48.3824 45196 934.142 2 2021 1003 48.3824 45196 934.142 

We can reorder new_col to the end if needed:

df = df.reindex( index=pd.MultiIndex.from_product( [df.index, range(2020, 2022)], names=[None, 'new_col'] ), level=0 ).reset_index(level='new_col') # Re-order columns df = df.reindex( columns=df.columns[df.columns != 'new_col'].union(['new_col'], sort=False) ) 

df:

 id x1 x2 x3 new_col 0 1001 523.4100 639673 1222.130 2020 0 1001 523.4100 639673 1222.130 2021 1 1002 54.8320 33746 615.443 2020 1 1002 54.8320 33746 615.443 2021 2 1003 48.3824 45196 934.142 2020 2 1003 48.3824 45196 934.142 2021 

Setup:

import pandas as pd df = pd.DataFrame({ 'id': [1001, 1002, 1003], 'x1': [523.41, 54.832, 48.3824], 'x2': [639673, 33746, 45196], 'x3': [1222.13, 615.443, 934.142] }) 

Comments

0
df1.merge(pd.Series(range(2020, 2022),name='col1'),how='cross') id x1 x2 x3 col1 0 1001 523.4100 639673 1222.130 2020 1 1001 523.4100 639673 1222.130 2021 2 1002 54.8320 33746 615.443 2020 3 1002 54.8320 33746 615.443 2021 4 1003 48.3824 45196 934.142 2020 5 1003 48.3824 45196 934.142 2021 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.