
I am trying to generate a heatmap using seaborn, however I am having a small problem with the formatting of my data.

Currently, my data is in the form:

    Name  Diag  Date
    A     1     2006-12-01
    A     1     1994-02-12
    A     2     2001-07-23
    B     2     1999-09-12
    B     1     2016-10-12
    C     3     2010-01-20
    C     2     1998-08-20

I would like to create a heatmap (preferably in Python) showing Name on one axis against Diag on the other, marking whether each Diag occurred. I have tried to pivot the table using pd.pivot, but I was given the error:

ValueError: Index contains duplicate entries, cannot reshape

This came from:

    piv = df.pivot_table(index='Name',columns='Diag')

Time is irrelevant, but I would like to show which Names have had which Diag, and which Diag combinations cluster together. Do I need to create a new table for this, or is it possible with the one I have? In some cases a Name is not associated with every Diag.

EDIT: I have since tried:

    piv = df.pivot_table(index='Name',columns='Diag', values='Time', aggfunc='mean')

However, as Time is in datetime format, I end up with:

    pandas.core.base.DataError: No numeric types to aggregate

    This question could greatly benefit from some code showing what you actually tried for your pivot syntax. Showing just the error leaves anyone reading it guessing quite a lot. Commented Apr 6, 2017 at 12:52

1 Answer


You need pivot_table with an aggregate function, because the same index and column pair has multiple values, and pivot requires unique values:

    print (df)
      Name  Diag  Time
    0    A     1    12  <-duplicates for same A, 1 different value
    1    A     1    13  <-duplicates for same A, 1 different value
    2    A     2    14
    3    B     2    18
    4    B     1     1
    5    C     3     9
    6    C     2     8

    df = df.pivot_table(index='Name',columns='Diag', values='Time', aggfunc='mean')
    print (df)
    Diag     1     2    3
    Name
    A     12.5  14.0  NaN
    B      1.0  18.0  NaN
    C      NaN   8.0  9.0
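To see the difference between the two methods side by side, here is a minimal sketch using the same sample data as above. The flag name `pivot_failed` is only illustrative, and the exact error text may vary between pandas versions:

```python
import pandas as pd

# Same sample data as in the answer: (A, 1) appears twice.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "C", "C"],
    "Diag": [1, 1, 2, 2, 1, 3, 2],
    "Time": [12, 13, 14, 18, 1, 9, 8],
})

# pivot only reshapes, so a duplicated (Name, Diag) pair raises ValueError.
try:
    df.pivot(index="Name", columns="Diag", values="Time")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table aggregates the duplicates instead (here with the mean).
piv = df.pivot_table(index="Name", columns="Diag", values="Time", aggfunc="mean")
print(piv)
```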

Alternative solution:

    df = df.groupby(['Name','Diag'])['Time'].mean().unstack()
    print (df)
    Diag     1     2    3
    Name
    A     12.5  14.0  NaN
    B      1.0  18.0  NaN
    C      NaN   8.0  9.0

EDIT:

You can also check all duplicates by duplicated:

    df = df.loc[df.duplicated(['Name','Diag'], keep=False), ['Name','Diag']]
    print (df)
      Name  Diag
    0    A     1
    1    A     1

EDIT:

Taking the mean of datetimes is not straightforward: you need to convert the dates to nanoseconds, take the mean, and then convert the result back to datetimes. There is also another problem: NaN has to be replaced by some scalar, e.g. 0, which is then converted to the zero datetime, 1970-01-01.

    df.Date = pd.to_datetime(df.Date)
    df['dates_in_ns'] = pd.Series(df.Date.values.astype(np.int64), index=df.index)
    df = df.pivot_table(index='Name', columns='Diag', values='dates_in_ns', aggfunc='mean', fill_value=0)
    df = df.apply(pd.to_datetime)
    print (df)
    Diag                   1          2          3
    Name
    A    2000-07-07 12:00:00 2001-07-23 1970-01-01
    B    2016-10-12 00:00:00 1999-09-12 1970-01-01
    C    1970-01-01 00:00:00 1998-08-20 2010-01-20
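Since the question says the time is irrelevant, a presence/absence matrix may be closer to what the heatmap needs than a mean of dates. Below is a minimal sketch, assuming the column names from the question. Note that `pd.crosstab` is swapped in here in place of `pivot_table`, and the names `counts` and `presence` are only illustrative:

```python
import pandas as pd

# Sample data from the question; the Date column is not used.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "C", "C"],
    "Diag": [1, 1, 2, 2, 1, 3, 2],
    "Date": ["2006-12-01", "1994-02-12", "2001-07-23", "1999-09-12",
             "2016-10-12", "2010-01-20", "1998-08-20"],
})

# Count how often each (Name, Diag) pair occurs; missing pairs become 0.
counts = pd.crosstab(df["Name"], df["Diag"])

# Collapse the counts to 1 (occurred at least once) or 0 (never occurred).
presence = (counts > 0).astype(int)
print(presence)
```

Passing `presence` to `seaborn.heatmap` should then draw Name against Diag, and `seaborn.clustermap` may be worth trying for seeing which Diag combinations group together.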

9 Comments

Thanks! This is helpful. The problem now may be that time is actually in datetime format, so it is not numeric. Perhaps I should just add a column of ones as a workaround?
I think it is the same problem. But I have an idea how to check these values, give me a sec.
pandas.core.base.DataError: No numeric types to aggregate is my current error. Unfortunately I didn't put the correct data types in the example; for Time it's in the form '2016-12-12'.
Ok, no problem. Main question is - do you need mean?
Or only first value?