
I am trying to calculate the time duration within each sliding window for this data:

                         ID
    DATE
    2017-05-17 15:49:51   2
    2017-05-17 15:49:52   5
    2017-05-17 15:49:55   2
    2017-05-17 15:49:56   3
    2017-05-17 15:49:58   5
    2017-05-17 15:49:59   5

In this example, DATE is the index, and I am trying to get the duration spanned by each rolling window of 3 rows, where the windows overlap each other. The result should look like this:

                         ID  duration
    DATE
    2017-05-17 15:49:51   2         4
    2017-05-17 15:49:52   5         4
    2017-05-17 15:49:55   2         3
    2017-05-17 15:49:56   3         3
    2017-05-17 15:49:58   5       NaN
    2017-05-17 15:49:59   5       NaN

I tried:

df['duration'] = df.rolling(window=3).apply(df.index.max()-df.index.min()) 

But I got this error:

TypeError: 'DatetimeIndex' object is not callable 
  • try df['duration'] = df.rolling(window=3).apply(lambda x: x.index.max()-x.index.min()) Commented Sep 11, 2017 at 8:46
  • I tried that before and got this error: AttributeError: 'numpy.ndarray' object has no attribute 'index' Commented Sep 11, 2017 at 8:49
  • Related: stackoverflow.com/questions/37486502/… Commented Sep 11, 2017 at 8:53
  • I also tried df['duration'] = df.rolling(5).apply(lambda x: pd.to_datetime(x.index.max()) - pd.to_datetime(x.index.min())) and got the same error: AttributeError: 'numpy.ndarray' object has no attribute 'index' Commented Sep 11, 2017 at 8:53
  • As the linked question explains, rolling().apply() passes each window to your function as a NumPy array, not a DataFrame, so the pandas functionality (including the index) is not available inside it. You have to find a workaround based on array indexing. Commented Sep 11, 2017 at 8:57
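
One possible shape of the array-indexing workaround mentioned in the last comment, as a minimal sketch rather than a definitive solution. It rebuilds the question's sample data, assumes the frame has at least `window` rows, and uses the helper names `n_full`, `stamps`, and `seconds`, which are illustrative only:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with DATE as the index.
df = pd.DataFrame(
    {"ID": [2, 5, 2, 3, 5, 5]},
    index=pd.DatetimeIndex(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ],
        name="DATE",
    ),
)

window = 3                      # rows per window
n_full = len(df) - window + 1   # number of complete windows (assumes len(df) >= window)

stamps = df.index.values        # plain NumPy datetime64 array
seconds = np.full(len(df), np.nan)
# Last timestamp of each window minus its first timestamp, converted to seconds.
seconds[:n_full] = (stamps[window - 1:] - stamps[:n_full]) / np.timedelta64(1, "s")
df["duration"] = seconds
# duration: 4.0, 4.0, 3.0, 3.0, NaN, NaN
```

Rows that do not start a complete window keep NaN, which matches the expected output in the question.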

2 Answers

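    import numpy as np

    # Move DATE from the index into a column so it can be shifted.
    df.reset_index(inplace=True)
    # Timestamp two rows ahead, i.e. the last row of each 3-row window.
    df['PREVIOUS_TIME'] = df.DATE.shift(-2)
    df['duration'] = (df.PREVIOUS_TIME - df.DATE) / np.timedelta64(1, 's')  # seconds
    df.drop('PREVIOUS_TIME', axis=1, inplace=True)
    df.set_index('DATE', inplace=True)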

Assuming that 'DATE' is a datetime.
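
If resetting the index is undesirable, the same shift idea can be applied to the index directly. This is a sketch under the assumptions that DATE is a DatetimeIndex with unique values and that the window is 3 rows; the variable name `times` is illustrative:

```python
import pandas as pd

# Sample data from the question, with DATE as the index.
df = pd.DataFrame(
    {"ID": [2, 5, 2, 3, 5, 5]},
    index=pd.DatetimeIndex(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ],
        name="DATE",
    ),
)

# A Series whose values are the timestamps, indexed by those same timestamps.
times = df.index.to_series()
# shift(-2) pulls in the timestamp two rows ahead (the end of each 3-row window).
df["duration"] = (times.shift(-2) - times).dt.total_seconds()
# duration: 4.0, 4.0, 3.0, 3.0, NaN, NaN
```

The last two rows have no timestamp two rows ahead, so the subtraction yields NaT and the duration becomes NaN.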


5 Comments

DATE is the index so I can't call df.DATE.shift(-3)
df.reset_index(inplace=True); afterwards, df.set_index('DATE', inplace=True)
It doesn't give me the answer I'm looking for. The sliding windows overlap each other, as in my example: window 1: 15:49:55 - 15:49:51 = 4; window 2: 15:49:56 - 15:49:52 = 4; window 3: 15:49:58 - 15:49:55 = 3; and so on.
Ah, ok, then you should use shift(-2) instead of shift(-3).
Thanks so much, I spent too much time trying to figure this out.
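    import pandas as pd

    def timediff(time_window: pd.Series) -> float:
        duration = time_window.index.max() - time_window.index.min()
        return duration.total_seconds()

    # Roll over a column without NaNs: windows with fewer than min_periods
    # non-NaN values are skipped, so an all-NaN column would stay all NaN.
    # shift(-2) moves each result from the last row of its window to the first.
    df['duration'] = df['ID'].rolling(window=3).apply(func=timediff, raw=False).shift(-2)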

I've just stumbled across this question and wanted to provide a solution using the rolling window approach.
With raw=False (the default in recent pandas versions), apply passes a Series to the function, so you can use index.max() - index.min() or index[-1] - index[0].
The only catch is that the function must return a number, not a timedelta object.
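
For reference, a self-contained sketch of the index[-1] - index[0] variant on the question's sample data. Rolling over the ID column and the final shift(-2) are choices made here, not part of the original question: rolling results are labelled by the last row of each window, while the expected output labels them by the first row. The function name `window_seconds` is hypothetical:

```python
import pandas as pd

# Sample data from the question, with DATE as the index.
df = pd.DataFrame(
    {"ID": [2, 5, 2, 3, 5, 5]},
    index=pd.DatetimeIndex(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ],
        name="DATE",
    ),
)


def window_seconds(time_window: pd.Series) -> float:
    """Return the time spanned by one window, in seconds."""
    # With raw=False the window is a Series, so its DatetimeIndex is available.
    span = time_window.index[-1] - time_window.index[0]
    return span.total_seconds()  # apply needs a number, not a Timedelta


# Results are aligned to the last row of each window: NaN, NaN, 4, 4, 3, 3.
end_aligned = df["ID"].rolling(window=3).apply(window_seconds, raw=False)
# Shift them back two rows so each duration sits on the first row of its window.
df["duration"] = end_aligned.shift(-2)
# duration: 4.0, 4.0, 3.0, 3.0, NaN, NaN
```

Using index[-1] - index[0] assumes the index is sorted in ascending order; index.max() - index.min() does not.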
