107

I have a pandas dataframe:

 lat lng alt days date time 0 40.003834 116.321462 211 39745.175405 2008-10-24 04:12:35 1 40.003783 116.321431 201 39745.175463 2008-10-24 04:12:40 2 40.003690 116.321429 203 39745.175521 2008-10-24 04:12:45 3 40.003589 116.321427 194 39745.175579 2008-10-24 04:12:50 4 40.003522 116.321412 190 39745.175637 2008-10-24 04:12:55 5 40.003509 116.321484 188 39745.175694 2008-10-24 04:13:00 

For which I am trying to convert the df['date'] and df['time'] columns into a datetime. I can do:

df['Datetime'] = pd.to_datetime(df['date']+df['time']) df = df.set_index(['Datetime']) del df['date'] del df['time'] 

And I get:

 lat lng alt days Datetime 2008-10-2404:12:35 40.003834 116.321462 211 39745.175405 2008-10-2404:12:40 40.003783 116.321431 201 39745.175463 2008-10-2404:12:45 40.003690 116.321429 203 39745.175521 2008-10-2404:12:50 40.003589 116.321427 194 39745.175579 2008-10-2404:12:55 40.003522 116.321412 190 39745.175637 

But then if I try:

df.between_time(time(1),time(22,59,59))['lng'].std() 

I get an error - 'TypeError: Index must be DatetimeIndex'

So, I've also tried setting the DatetimeIndex:

df['Datetime'] = pd.to_datetime(df['date']+df['time']) #df = df.set_index(['Datetime']) df = df.set_index(pd.DatetimeIndex(df['Datetime'])) del df['date'] del df['time'] 

And this throws an error also - 'DateParseError: unknown string format'

How do I create the datetime column and DatetimeIndex correctly so that df.between_time() works right?

1
  • The 'DateParseError: unknown string format' is that it cannot figure out the "2008-10-2404:12:35" format since the 'DD' and 'HH' are adjacent. Commented Oct 9, 2019 at 19:57

5 Answers 5

138

To simplify Kirubaharan's answer a bit:

df['Datetime'] = pd.to_datetime(df['date'] + ' ' + df['time']) df = df.set_index('Datetime') 

And to get rid of unwanted columns (as OP did but did not specify per se in the question):

df = df.drop(['date','time'], axis=1) 
Sign up to request clarification or add additional context in comments.

2 Comments

So the trick here is adding a space between the date and time and then the pd.to_datetime() Does The Right Thing with the resultant strings?
Exactly, there are required formats
88

You are not creating datetime index properly,

format = '%Y-%m-%d %H:%M:%S' df['Datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'], format=format) df = df.set_index(pd.DatetimeIndex(df['Datetime'])) 

Comments

4

You may also want to set inplace=True. This way it returns the same df

df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"], format = "%Y-%m-%d %H:%M:%S") df.set_index(["datetime"], inplace=True) 

2 Comments

if inplace=True, does it really return anything? can we simply remove the assignment operator and just use the right-hand-side?
@MJK In place prevents you from creating df object by performing the operation on the same df. To use the assignment on the right, you'd have to type in the operation as the first argument to df.set_index, It is cleaner to use the assignment operator first.
1

This worked best for me:

format = '%Y-%m-%d%H:%M:%S' df['Datetime'] = pd.to_datetime(df['date'] + df['time'].astype("string"), format=format) 

In some cases Python treats df['date'] as column of integers.

Comments

1

I had trouble with setting a column formatted as YYYY-MM-DD as a date time index column in a data frame I needed for time series forecasting. This is how I solved it for a dateframe where I wanted "dateCol" to be the datetime index:

idx = pd.DatetimeIndex(self.df[dateCol]) self.df = self.df.set_index(idx) 

Then to drop the column so it's not duplicated in the dataframe

self.df = self.df.drop(dateCol, axis=1) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.