
BUG: df.apply handles np.timedelta64 as timestamp, should be timedelta #7778

@stharrold

Description

I think there may be a bug in the row-wise handling of numpy.timedelta64 values when using DataFrame.apply with axis=1: the timedeltas come back as timestamps. As a check, the problem does not appear when using DataFrame.applymap. The problem may be related to #4532, but I'm unsure. I've included an example below.

This is only a minor problem for my use-case, which is cross-checking timestamps from a counter/timer card. I can easily work around the issue with DataFrame.itertuples etc.

Thank you for your time and for making such a useful package!

Example

Version

Import and check versions.

    $ date
    Thu Jul 17 16:28:38 CDT 2014
    $ conda update pandas
    Fetching package metadata: ..
    # All requested packages already installed.
    # packages in environment at /Users/harrold/anaconda:
    #
    pandas                    0.14.1               np18py27_0
    $ ipython
    Python 2.7.8 |Anaconda 2.0.1 (x86_64)| (default, Jul  2 2014, 15:36:00)
    Type "copyright", "credits" or "license" for more information.

    IPython 2.1.0 -- An enhanced Interactive Python.
    Anaconda is brought to you by Continuum Analytics.
    Please check out: http://continuum.io/thanks and https://binstar.org
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.

    In [1]: from __future__ import print_function
    In [2]: import numpy as np
    In [3]: import pandas as pd
    In [4]: pd.util.print_versions.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 2.7.8.final.0
    python-bits: 64
    OS: Darwin
    OS-release: 11.4.2
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8

    pandas: 0.14.1
    nose: 1.3.3
    Cython: 0.20.1
    numpy: 1.8.1
    scipy: 0.14.0
    statsmodels: 0.5.0
    IPython: 2.1.0
    sphinx: 1.2.2
    patsy: 0.2.1
    scikits.timeseries: None
    dateutil: 1.5
    pytz: 2014.4
    bottleneck: None
    tables: 3.1.1
    numexpr: 2.3.1
    matplotlib: 1.3.1
    openpyxl: 1.8.5
    xlrd: 0.9.3
    xlwt: 0.7.5
    xlsxwriter: 0.5.5
    lxml: 3.3.5
    bs4: 4.3.1
    html5lib: 0.999
    httplib2: 0.8
    apiclient: 1.2
    rpy2: None
    sqlalchemy: 0.9.4
    pymysql: None
    psycopg2: None

Create test data

Using a subset of the original raw data as an example.

    In [5]: datetime_start = np.datetime64(u'2014-05-31T01:23:19.9600345Z')
    In [6]: timedeltas_elapsed = [30053400, 40053249, 50053098]

Compute datetimes from elapsed timedeltas, then create differential timedeltas from datetimes. All elements are either type numpy.datetime64 or numpy.timedelta64.

    In [7]: df = pd.DataFrame(dict(datetimes = timedeltas_elapsed))
    In [8]: df = df.applymap(lambda elt: np.timedelta64(elt, 'us'))
    In [9]: df = df.applymap(lambda elt: np.datetime64(datetime_start + elt))
    In [10]: df['differential_timedeltas'] = df['datetimes'] - df['datetimes'].shift()
    In [11]: print(df)
                          datetimes  differential_timedeltas
    0 2014-05-31 01:23:50.013434500                      NaT
    1 2014-05-31 01:24:00.013283500          00:00:09.999849
    2 2014-05-31 01:24:10.013132500          00:00:09.999849

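For readers on current library versions, the same frame can be built without applymap. The following is a minimal sketch and not part of the original report: it assumes a recent NumPy and pandas, drops the trailing Z from the timestamp string (an assumption made here because recent NumPy releases treat datetime64 as timezone-naive), and uses Series.diff in place of the subtraction with shift().

```python
import numpy as np
import pandas as pd

# Same start time and elapsed microsecond counts as in the report above.
# The trailing 'Z' is dropped here (assumption: timezone-naive parsing).
datetime_start = np.datetime64('2014-05-31T01:23:19.9600345')
timedeltas_elapsed = [30053400, 40053249, 50053098]

# Vectorized construction: one timedelta64[us] array instead of applymap.
elapsed = np.array(timedeltas_elapsed, dtype='timedelta64[us]')
df = pd.DataFrame({'datetimes': datetime_start + elapsed})

# Difference between consecutive datetimes; the first row has no predecessor.
df['differential_timedeltas'] = df['datetimes'].diff()

print(df.dtypes)
```

Checking df.dtypes before any row-wise operation is a quick way to confirm that each column has the datetime or timedelta dtype you expect.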
Expected behavior

With element-wise handling using DataFrame.applymap, all elements are correctly identified as datetimes (timestamps) or timedeltas.

    In [12]: print(df.applymap(lambda elt: type(elt)))
                              datetimes     differential_timedeltas
    0  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>
    1  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>
    2  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>

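On recent pandas releases DataFrame.applymap is deprecated in favor of DataFrame.map, so the element-wise check can also be sketched by iterating over each column directly. This is an illustrative variant rather than the code from the report: the test frame is rebuilt with pandas helpers so that the block is self-contained, and `element_types` is a hypothetical name. The type names it prints depend on the installed pandas version.

```python
import pandas as pd

# Rebuild the test frame from the report (assumption: recent pandas).
start = pd.Timestamp('2014-05-31 01:23:19.9600345')
elapsed = pd.to_timedelta([30053400, 40053249, 50053098], unit='us')
df = pd.DataFrame({'datetimes': start + elapsed})
df['differential_timedeltas'] = df['datetimes'] - df['datetimes'].shift()

# Element-wise type check, column by column, without applymap.
element_types = {
    col: [type(value).__name__ for value in df[col]]
    for col in df.columns
}

for col, names in element_types.items():
    print(col, names)
```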
Bug

With row-wise handling using DataFrame.apply, all elements are type pandas.tslib.Timestamp. I expected 'differential_timedeltas' to be type numpy.timedelta64 or another type of timedelta, not a type of datetime (timestamp).

    In [13]: # For 'datetimes':
    In [14]: print(df.apply(lambda row: type(row['datetimes']), axis=1))
    0    <class 'pandas.tslib.Timestamp'>
    1    <class 'pandas.tslib.Timestamp'>
    2    <class 'pandas.tslib.Timestamp'>
    dtype: object
    In [15]: # For 'differential_timedeltas':
    In [16]: print(df.apply(lambda row: type(row['differential_timedeltas']), axis=1))
    0      <class 'pandas.tslib.NaTType'>
    1    <class 'pandas.tslib.Timestamp'>
    2    <class 'pandas.tslib.Timestamp'>
    dtype: object
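The DataFrame.itertuples workaround mentioned in the description could look roughly like the sketch below. It is an illustration under assumptions, not the code actually used for the counter/timer cross-check: it assumes a recent pandas (where pd.isna and Timedelta.total_seconds are available), rebuilds the test frame so the block is self-contained, and `seconds_between_samples` is a hypothetical name.

```python
import pandas as pd

# Rebuild the test frame from the report (assumption: recent pandas).
start = pd.Timestamp('2014-05-31 01:23:19.9600345')
elapsed = pd.to_timedelta([30053400, 40053249, 50053098], unit='us')
df = pd.DataFrame({'datetimes': start + elapsed})
df['differential_timedeltas'] = df['datetimes'] - df['datetimes'].shift()

# Walk the rows with itertuples instead of apply(axis=1), so each field
# is read from its own column and keeps that column's dtype.
seconds_between_samples = []
for row in df.itertuples(index=False):
    delta = row.differential_timedeltas
    # The first row has no predecessor, so its difference is NaT.
    seconds_between_samples.append(
        None if pd.isna(delta) else delta.total_seconds()
    )

print(seconds_between_samples)
```

Because itertuples reads each field from its own column, the values are not first combined into a single mixed-type row, which is the step where row-wise apply appears to lose the timedelta type in the report.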

Metadata

Labels: Bug; Datetime (Datetime data dtype); Dtype Conversions (Unexpected or buggy dtype conversions); Timedelta (Timedelta data type)
