I have the following df from a vendor:

    Unnamed: 0  Unnamed: 1     Unnamed: 2  agg metrics  10/20/22  10/20/22  10/21/22  10/21/22
    title       content title  season      episode      start     hours     start     hours
    book        blue           1           3            2         2         5         2
    movie       orange         2           4            11        4         7         4

I need the output like this:

    title  content title  season  episode  date      start  hours
    book   blue           1       3        10/20/22  2      2
    book   blue           1       3        10/21/22  5      2
    movie  orange         2       4        10/20/22  11     4
    movie  orange         2       4        10/21/22  7      4

    df = pd.read_csv('file')
    df = df.drop(labels=0, axis=0)
    df1 = df.melt(['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2', 'agg metrics'],
                  var_name='Date', value_name='Value')

But this doesn't return the proper output. Apologies for not knowing how to represent this properly; hopefully my input/output examples help.

Essentially, I'm having trouble transposing multiple headers.

Thanks for your help!

  • Please provide the original CSV in a text format. Commented Jan 11, 2023 at 16:17
  • Where are you getting dates from? Is there another dataframe? Commented Jan 11, 2023 at 16:27
  • @ScottBoston you're too fast! i was editing it when you commented. Commented Jan 11, 2023 at 16:29
  • @SomeDude you're too fast! i was editing it when you commented. Commented Jan 11, 2023 at 16:29
  • I'm 2 secs earlier :) JK. Are the columns you have a MultiIndex, or is the first row just the column header? Commented Jan 11, 2023 at 16:30

2 Answers

You could do this, which I believe is what QuangHoang had in mind too:

    from io import StringIO

    import pandas as pd

    # Read the CSV with the top two rows as headers, which gives MultiIndex columns.
    # From your code, I figure you are not doing that.
    df = pd.read_csv(
        StringIO(
            """
    Unnamed: 0,Unnamed: 1,Unnamed: 2,agg metrics,10/20/22,10/20/22,10/21/22,10/21/22
    title,content title,season,episode,start,hours,start,hours
    book,blue,1,3,2,2,5,2
    movie,orange,2,4,11,4,7,4
    """
        ),
        header=[0, 1],
    )

    # Then filter the columns that look like dates, stack at level 0 and reset_index.
    # (Raw string so that \d is not treated as a string escape.)
    t = df.filter(regex=r"\d+/\d+/\d+")
    t1 = t.stack(0).rename_axis(["", "date"]).reset_index(1)

    # Then get the other columns and reindex them to the index of the intermediate output above.
    t2 = df[df.columns.difference(t.columns)].droplevel(0, axis=1).reindex(t1.index)

    # Then concat both along axis 1.
    out = pd.concat([t2, t1], axis=1)
    print(out)

Output:

       title content title  season  episode      date  hours  start
    0   book          blue       1        3  10/20/22      2      2
    0   book          blue       1        3  10/21/22      2      5
    1  movie        orange       2        4  10/20/22      4     11
    1  movie        orange       2        4  10/21/22      4      7
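To see what the `stack(0)` step does in isolation, here is a minimal sketch on a hand-built frame. The frame `wide` and the name `stacked` are made up for illustration and only mirror the date columns from the question; nothing here is read from the vendor file. Stacking level 0 of the column MultiIndex moves the dates into the row index and leaves `start` and `hours` as ordinary columns:

```python
import pandas as pd

# Hypothetical frame mirroring only the date columns from the question.
cols = pd.MultiIndex.from_product([["10/20/22", "10/21/22"], ["start", "hours"]])
wide = pd.DataFrame([[2, 2, 5, 2], [11, 4, 7, 4]], columns=cols)

# Move the outer column level (the dates) into the row index.
stacked = wide.stack(0)

# There is now one row per (original row, date) pair. Values are looked up
# by label so the result does not depend on the column order pandas picks.
print(stacked.shape)
print(stacked.loc[(0, "10/21/22"), "start"])
print(stacked.loc[(1, "10/20/22"), "hours"])
```

Newer pandas versions may print a `FutureWarning` about the stacking implementation here; the label-based lookups should behave the same either way.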
6 Comments

I am getting this error when running t2: AttributeError: 'DataFrame' object has no attribute 'droplevel'
What version of pandas is yours? pd.__version__
'0.23.0'. It did appear to work without droplevel: t2 = df[df.columns.difference(t.columns)].reindex(t1.index)
Not sure why or how it worked. But I highly recommend upgrading your pandas. That's a very old version, and not just for this task: I'm sure there have been several performance enhancements since then.
Ok so you could get what you wanted then?
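
On the `droplevel` error: as far as I know, `DataFrame.droplevel` was only added in pandas 0.24, so 0.23.0 would not have it. If upgrading isn't possible right away, one possible workaround is to drop the level on the column index itself, since `MultiIndex.droplevel` is much older. This is only a sketch: `id_cols` is a name I made up, the CSV text is copied from the answer above, and picking the first four columns by position is an assumption about the file layout.

```python
from io import StringIO

import pandas as pd

csv_text = """Unnamed: 0,Unnamed: 1,Unnamed: 2,agg metrics,10/20/22,10/20/22,10/21/22,10/21/22
title,content title,season,episode,start,hours,start,hours
book,blue,1,3,2,2,5,2
movie,orange,2,4,11,4,7,4
"""

df = pd.read_csv(StringIO(csv_text), header=[0, 1])

# Assumption: the four identifier columns always come first in the file.
id_cols = df.iloc[:, :4].copy()

# Drop the outer header level on the column index instead of calling
# DataFrame.droplevel, which older pandas versions do not provide.
id_cols.columns = id_cols.columns.droplevel(0)

print(list(id_cols.columns))
```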

Here's an example of what I mean:

    from io import StringIO

    import pandas as pd

    # mock csv file with StringIO
    s = StringIO('''
    Unnamed: 0  Unnamed: 1     Unnamed: 2  agg metrics  10/20/22  10/20/22  10/21/22  10/21/22
    title       content title  season      episode      start     hours     start     hours
    book        blue           1           3            2         2         5         2
    movie       orange         2           4            11        4         7         4
    ''')

    # forget the `sep` argument if your file is comma-separated;
    # a regex separator needs the python parsing engine
    df = pd.read_csv(s, sep=r'\s\s+', header=[0,1], index_col=[0,1,2,3], engine='python')
    df.stack(level=0).reset_index()

Output (rename your columns accordingly):

    title level_0 level_1  level_2  level_3 Unnamed: 0  hours  start
    0        book    blue        1        3   10/20/22      2      2
    1        book    blue        1        3   10/21/22      2      5
    2       movie  orange        2        4   10/20/22      4     11
    3       movie  orange        2        4   10/21/22      4      7
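
For the renaming step, something along these lines might do. It is a sketch that rebuilds the printed output by hand so it runs on its own; the names `new_names` and `tidy` are made up, and the mapping assumes the columns come out labelled exactly as shown above (`level_0` to `level_3` and `Unnamed: 0`), which may differ between pandas versions.

```python
import pandas as pd

# Rebuild the frame printed above by hand so this snippet is self-contained.
out = pd.DataFrame({
    "level_0": ["book", "book", "movie", "movie"],
    "level_1": ["blue", "blue", "orange", "orange"],
    "level_2": [1, 1, 2, 2],
    "level_3": [3, 3, 4, 4],
    "Unnamed: 0": ["10/20/22", "10/21/22", "10/20/22", "10/21/22"],
    "hours": [2, 2, 4, 4],
    "start": [2, 5, 11, 7],
})
out.columns.name = "title"  # the leftover label on the column axis

# Hypothetical mapping from the auto-generated names to the requested ones.
new_names = {
    "level_0": "title",
    "level_1": "content title",
    "level_2": "season",
    "level_3": "episode",
    "Unnamed: 0": "date",
}

wanted_order = ["title", "content title", "season", "episode", "date", "start", "hours"]

tidy = (
    out.rename(columns=new_names)
       .rename_axis(None, axis="columns")  # clear the stray "title" axis label
       [wanted_order]                      # put start before hours, as requested
)
print(tidy)
```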
