11

I have got a dataframe with a column of datetime64 type. In this column there are several rows with dates as 1999-09-09 23:59:59 where as they should have actually been represented as missing dates NaT. Somebody just decided to use this particular date to represent the missing data. Now I want these dates to be replaced as NaT (the missing date type for Pandas).

Also if I perform operation on this column with NaTs, like

df['date'] - df['column with missing date'] 

Does Pandas ignore the missing dates and maintain NaT for those rows or will it throw an error some thing like Null pointer exception in Java.

1
  • Does df.loc[df['date'] == '1999-09-09 23:59:59 ', 'date'] = pd.NaT work? Commented Jul 17, 2014 at 12:43

2 Answers 2

14
In [6]: import pandas as pd df = pd.DataFrame({'date':[pd.datetime(1999,9,9,23,59,59), pd.datetime(2014,1,1)]* 10}) df Out[6]: date 0 1999-09-09 23:59:59 1 2014-01-01 00:00:00 2 1999-09-09 23:59:59 3 2014-01-01 00:00:00 4 1999-09-09 23:59:59 5 2014-01-01 00:00:00 6 1999-09-09 23:59:59 7 2014-01-01 00:00:00 8 1999-09-09 23:59:59 9 2014-01-01 00:00:00 10 1999-09-09 23:59:59 11 2014-01-01 00:00:00 12 1999-09-09 23:59:59 13 2014-01-01 00:00:00 14 1999-09-09 23:59:59 15 2014-01-01 00:00:00 16 1999-09-09 23:59:59 17 2014-01-01 00:00:00 18 1999-09-09 23:59:59 19 2014-01-01 00:00:00 In [9]: import numpy as np df.loc[df['date'] == '1999-09-09 23:59:59 ', 'date'] = pd.NaT df Out[9]: date 0 NaT 1 2014-01-01 2 NaT 3 2014-01-01 4 NaT 5 2014-01-01 6 NaT 7 2014-01-01 8 NaT 9 2014-01-01 10 NaT 11 2014-01-01 12 NaT 13 2014-01-01 14 NaT 15 2014-01-01 16 NaT 17 2014-01-01 18 NaT 19 2014-01-01 

To answer your second question most pandas functions handle NaN's appropriately, you can always just drop them:

In [10]: df.dropna() Out[10]: date 1 2014-01-01 3 2014-01-01 5 2014-01-01 7 2014-01-01 9 2014-01-01 11 2014-01-01 13 2014-01-01 15 2014-01-01 17 2014-01-01 19 2014-01-01 

and perform the operation just on these rows

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you for the solution. The thing is that I didn't want to drop these NaT rows as I wanted to subtract this date column from another date column. It does the job and gives NaT values in the resulting columns wherever there was a NaT in the original column.
@user3527975 the point here is that dropna does not affect the original dataframe, this will only happen if you assign back to the original df like: df = df.dropna() or df.dropna(inplace=True)
1

There are some operations, especially between columns, that do not disconsider NaNs or NaTs. That is why you are getting NaTs as a result. If you want to disconsider the 1999-09-09 23:59:59 and also have a subtractable column, try to convert to NaTs and then swap the NaTs with zeros (.fillna(0)), so that, when subtracted, it will keep the value from the other column.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.