FutureWarning: Support for nested sequences for 'parse_dates' in pd.read_csv is deprecated. How to combine date and time columns with pd.to_datetime?

Question

Here is an example of my .csv file:

date, time, value
20240112,085917,11
20240112,085917,22

I used to import it into a DataFrame as follows:

df = pd.read_csv(csv_file, parse_dates=[['date', 'time']]).set_index('date_time')

And I was getting the following structure:

date_time            value
2023-10-02 10:00:00     11
2023-10-02 10:01:00     22

After updating to pandas 2.2.0, I started getting this warning:

FutureWarning: Support for nested sequences for 'parse_dates' in pd.read_csv is deprecated. Combine the desired columns with pd.to_datetime after parsing instead.

To achieve the same result, I now have to do:

df['datetime'] = df.date.astype(str) + ' ' + df.time.astype(str)
df['datetime'] = pd.to_datetime(df.datetime, format="%Y%m%d %H%M%S")
df = df.drop(['date', 'time'], axis=1).set_index('datetime')

Is there a way to do this in recent versions of pandas without string concatenation, which is usually very slow?

Not that I'm aware of, but you can accomplish it all in one line: df["datetime"] = pd.to_datetime(arg=df.pop("date").str.cat(df.pop("time"), sep=" "))
– Jason Baker, Commented Feb 12, 2024 at 18:44
Why, oh why, does this new to_datetime solution read so much worse than the now-deprecated feature of read_csv?
– tiagoams, Commented May 22, 2024 at 14:54

mozway · Accepted Answer · 2024-02-12 18:56:02Z

Since parsing the date will involve strings anyway and given your time format without separator, this seems like the most reasonable option.

You could simplify your code by reading the columns as strings directly and popping them:

df = pd.read_csv(csv_file, sep=', *', engine='python', dtype={'date': str, 'time': str})
df['datetime'] = pd.to_datetime(df.pop('date')+' '+df.pop('time'), format="%Y%m%d %H%M%S")
df = df.set_index('datetime')

NB. if your days and hours/minutes/seconds are reliably padded with zeros, you can use df.pop('date')+df.pop('time') and format="%Y%m%d%H%M%S".

Output:

                     value
datetime
2024-01-12 08:59:17     11
2024-01-12 08:59:17     22
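The zero-padded variant from the note above can be sketched as a self-contained script. This is only an illustration: `csv_text` and the use of `io.StringIO` are stand-ins for the real file and are not part of the original answer, and it assumes every date has 8 digits and every time has 6 digits.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file shown in the question.
csv_text = "date, time, value\n20240112,085917,11\n20240112,085917,22\n"

# Read both columns as strings so the leading zero in "085917" is preserved.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=", *",
    engine="python",
    dtype={"date": str, "time": str},
)

# Because both parts are reliably zero-padded, no separator is needed.
df["datetime"] = pd.to_datetime(
    df.pop("date") + df.pop("time"), format="%Y%m%d%H%M%S"
)
df = df.set_index("datetime")
print(df)
```

If the padding cannot be guaranteed, the version with the explicit space separator is the safer choice, since a short time string would otherwise shift digits between fields.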

A variant with numeric operations and a timedelta:

import numpy as np

df = pd.read_csv(csv_file, sep=', *', engine='python', dtype={'date': str})
a = df.pop('time').to_numpy()
a, s = np.divmod(a, 100)  # split off the seconds (last two digits)
h, m = np.divmod(a, 100)  # split the remainder into hours and minutes
df['datetime'] = (pd.to_datetime(df.pop('date'))
                  + pd.to_timedelta(h*3600 + m*60 + s, unit='s'))

which is actually much slower (27.7 ms ± 4.11 ms per loop vs 350 µs ± 44.5 µs per loop for the string approach)
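To check that the two approaches agree before comparing their speed, a minimal sketch along these lines may help. The function names `string_approach` and `numeric_approach` and the in-memory `csv_text` are hypothetical helpers for illustration only, and the timings quoted above are not reproduced here.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for the CSV file shown in the question.
csv_text = "date, time, value\n20240112,085917,11\n20240112,085917,22\n"


def string_approach(text):
    """Concatenate date and time strings, then parse with an explicit format."""
    df = pd.read_csv(io.StringIO(text), sep=", *", engine="python",
                     dtype={"date": str, "time": str})
    return pd.to_datetime(df["date"] + " " + df["time"],
                          format="%Y%m%d %H%M%S")


def numeric_approach(text):
    """Parse the date only, then add the time as a timedelta in seconds."""
    df = pd.read_csv(io.StringIO(text), sep=", *", engine="python",
                     dtype={"date": str})
    # "time" is inferred as an integer here, e.g. 085917 becomes 85917.
    a, s = np.divmod(df["time"].to_numpy(), 100)
    h, m = np.divmod(a, 100)
    return (pd.to_datetime(df["date"], format="%Y%m%d")
            + pd.to_timedelta(h * 3600 + m * 60 + s, unit="s"))


# Compare the underlying values so that Series names do not matter.
same = bool((string_approach(csv_text).to_numpy()
             == numeric_approach(csv_text).to_numpy()).all())
print(same)
```

Each function can then be passed to `timeit` on a larger input to measure the difference on your own data.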
