I have the dataframe shown below:
                       ETHNIC            SEX  USUBJID
    0      HISPANIC OR LATINO              F       16
    1      HISPANIC OR LATINO              M        8
    2      HISPANIC OR LATINO  Total__##!!??       24
    3  NOT HISPANIC OR LATINO              F       25
    4  NOT HISPANIC OR LATINO              M       18
    5  NOT HISPANIC OR LATINO  Total__##!!??       43
    6           Total__##!!??              F       41
    7           Total__##!!??              M       26
    8           Total__##!!??  Total__##!!??       67

Just copy the dataframe above to the clipboard and execute df = pd.read_clipboard('\s\s+') to load it.
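In case the clipboard route does not work, the same frame can also be built directly. This is a minimal sketch using only the values from the table above; the TOTAL variable is just a name I introduced for the repeated placeholder label.

```python
import pandas as pd

# Placeholder label copied verbatim from the table
TOTAL = "Total__##!!??"

df = pd.DataFrame({
    # Each ETHNIC value repeats for its three SEX rows
    "ETHNIC": ["HISPANIC OR LATINO"] * 3
              + ["NOT HISPANIC OR LATINO"] * 3
              + [TOTAL] * 3,
    "SEX": ["F", "M", TOTAL] * 3,
    "USUBJID": [16, 8, 24, 25, 18, 43, 41, 26, 67],
})

print(df)
```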
I'm trying to transform it into the following dataframe:
                      stacked  USUBJID
    0      HISPANIC OR LATINO      NaN   <-----
    0                       F       16
    1                       M        8
    2           Total__##!!??       24
    0  NOT HISPANIC OR LATINO      NaN   <-----
    3                       F       25
    4                       M       18
    5           Total__##!!??       43
    0           Total__##!!??      NaN   <-----
    6                       F       41
    7                       M       26
    8           Total__##!!??       67

I want to stack the ETHNIC and SEX columns into a single column, so that each unique value of the ETHNIC column appears as a header row above the SEX values that belong to it.
I tried something like the code below, which works, but I don't think it is a robust solution. It splits the dataframe into n slices (where n is the number of unique values in the ETHNIC column), collects them in a list with an empty row placed before each slice, then concatenates the list and builds the stacked column from the result.
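    import pandas as pd

    cols = ['ETHNIC', 'SEX']
    results = []
    for v in df[cols[0]].unique():
        # empty row that becomes the header row for this ETHNIC value
        results.append(pd.DataFrame([[None]*df.shape[1]], columns=df.columns))
        results.append(df[df[cols[0]].eq(v)])
    results = pd.concat(results)
    # fill each empty row's ETHNIC from the slice that follows it
    results[cols[0]] = results[cols[0]].bfill()
    # SEX is empty only on the header rows, so fall back to ETHNIC there
    results['stacked'] = results.apply(lambda x: x['SEX'] if x['SEX'] else x['ETHNIC'], axis=1)
    results = results.drop(columns=cols)[['stacked', 'USUBJID']]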
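For comparison, one possible alternative to the loop above is to iterate over a groupby and prepend a one-row header to each group, which avoids the empty rows and the bfill step. This is only a sketch under assumptions, not a definitive solution: stack_with_headers is a hypothetical helper name, the frame is rebuilt inline so the snippet runs on its own, and USUBJID ends up as a float column because the header rows hold NaN.

```python
import pandas as pd

TOTAL = "Total__##!!??"  # placeholder label copied from the question's table
df = pd.DataFrame({
    "ETHNIC": ["HISPANIC OR LATINO"] * 3
              + ["NOT HISPANIC OR LATINO"] * 3
              + [TOTAL] * 3,
    "SEX": ["F", "M", TOTAL] * 3,
    "USUBJID": [16, 8, 24, 25, 18, 43, 41, 26, 67],
})


def stack_with_headers(frame, group_col, value_col, out_col="stacked"):
    """Hypothetical helper: put each group_col value on a header row
    above the value_col rows that belong to it."""
    pieces = []
    # sort=False keeps the groups in order of first appearance
    for key, grp in frame.groupby(group_col, sort=False):
        # One-row header; columns it lacks are filled with NaN by concat
        header = pd.DataFrame({out_col: [key]}, index=[0])
        body = grp.rename(columns={value_col: out_col}).drop(columns=group_col)
        pieces.append(pd.concat([header, body]))
    return pd.concat(pieces)


result = stack_with_headers(df, "ETHNIC", "SEX")
print(result)
```

The original row index is kept for the data rows and each header row gets index 0, matching the layout I am after.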
df.to_dict('records'). I am having difficulty copying the shared data.
df = pd.read_clipboard('\s\s+') I tested it and it works. If it still doesn't work for you, let me know and I'll add it as a dict.