Currently, I have a dataframe that looks as such:
| abc | def | ghi | abc | def | ghi |
|---|---|---|---|---|---|
| 2 | 4 | 78 | 56 | 7 | 45 |
Is there a way to combine the columns that have the same name and create a new row for each set of values? Example:
| abc | def | ghi |
|---|---|---|
| 2 | 4 | 78 |
| 56 | 7 | 45 |
You can use .groupby(level=0, axis='columns') to assign a cumulative count and then perform a transformation based on that.
import pandas as pd new_cols = pd.MultiIndex.from_arrays([df.columns, df.groupby(level=0, axis=1).cumcount()]) out = df.set_axis(new_cols, axis=1).stack().reset_index(level=0, drop=True) print(out) abc def ghi 0 2 4 78 1 56 7 45 You can set up a MultiIndex with help of groupby.cumcount and stack:
(df .set_axis(pd.MultiIndex .from_arrays([df.columns, df.groupby(level=0, axis=1) .cumcount()]), axis=1) .stack() .droplevel(0) ) Output:
abc def ghi 0 2 4 78 1 56 7 45 One option is with pivot_longer from pyjanitor:
# pip install pyjanitor import pandas as pd import janitor df.pivot_longer(names_to = '.value', names_pattern = '(.+)') abc def ghi 0 2 4 78 1 56 7 45 In the above solution, the .value determines which parts of the column labels remain as headers - the labels are determined by the groups in the regular expression in names_pattern.
Another option would be to pass the names of the new columns to names_to, while passing a list of matching regular expressions to names_pattern :
df.pivot_longer(names_to = ['abc', 'def', 'ghi'], names_pattern = ['abc', 'def', 'ghi']) abc def ghi 0 2 4 78 1 56 7 45