
I have pandas code that processes a lot of data files. I use the following code to convert a column of date-time strings to a datetime index.

df['date_time'] = ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00",
                   "2016-05-24 12:50:00", "2016-05-25 23:00:00", "2016-05-26 19:45:00"]
df['date_time'] = pd.DatetimeIndex(df['date_time'])

But one particular data file gives me the error:

    raise e
ValueError: Unknown string format

What could be the reason for this error? If it is due to invalid data in the data file, how can I remove it?

  • Can you add some sample data? Commented Jul 10, 2017 at 13:51
  • The code actually works fine with most of the input data, but a few input files show this error. So I would like to know whether it is due to invalid data and, if so, how to remove it. Commented Jul 10, 2017 at 13:54
  • Not uncommon in my experience, especially if the data was loaded with an unknown encoding scheme. What happens if you first run it through pd.to_datetime(df['date_time'])? In my experience, if you can isolate the offending string, you'll have your answer. Commented Jul 10, 2017 at 13:58
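
One way to isolate the offending strings, as the last comment suggests, might look like the sketch below. The helper name find_unparseable and the sample values are made up for illustration; the idea is simply to parse with errors='coerce' and keep the entries that were present in the input but came back as NaT.

```python
import pandas as pd


def find_unparseable(values):
    """Return the entries of `values` that pandas cannot parse as datetimes.

    Hypothetical helper, not part of pandas.
    """
    s = pd.Series(values, dtype="object")
    # Unparseable entries become NaT instead of raising ValueError.
    parsed = pd.to_datetime(s, errors="coerce")
    # Keep only entries that were present in the input but failed to parse.
    bad_mask = parsed.isna() & s.notna()
    return s[bad_mask]


# Made-up sample: two valid strings, one invalid string, one missing value.
sample = ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "aaa", None]
bad = find_unparseable(sample)
print(bad)
```

Printing the result for each problem file should show the exact values, with their row labels, that trigger the error.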

1 Answer


I think you need the parameter errors='coerce' in to_datetime, which converts values that cannot be parsed as datetimes to NaT:

df['date_time'] = pd.to_datetime(df['date_time'], errors='coerce') 

Then, if you need to remove all rows with NaT, use dropna:

df = df.dropna(subset=['date_time']) 

Sample:

a = ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00",
     "2016-05-24 12:50:00", "2016-05-25 23:00:00", "aaa"]
df = pd.DataFrame({'date_time': a})
print(df)
             date_time
0  2016-05-19 08:25:00
1  2016-05-19 16:00:00
2  2016-05-20 07:45:00
3  2016-05-24 12:50:00
4  2016-05-25 23:00:00
5                  aaa

df['date_time'] = pd.to_datetime(df['date_time'], errors='coerce')
print(df)
            date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
2 2016-05-20 07:45:00
3 2016-05-24 12:50:00
4 2016-05-25 23:00:00
5                 NaT

df = df.dropna(subset=['date_time'])
print(df)
            date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
2 2016-05-20 07:45:00
3 2016-05-24 12:50:00
4 2016-05-25 23:00:00

9 Comments

Thank you for your help! But errors='coerce' didn't fix the problem :( and when I tried dropna, I got the following error: ValueError: No axis named date_time for object type <class 'pandas.core.frame.DataFrame'>
Why didn't it help? I added the subset parameter to dropna; I hope it works now.
subset worked :) but I still get the same error: raise e ValueError: Unknown string format. Looks like I have to check the data file manually!
Hmmm, so you get Unknown string format even with errors='coerce'?
Yes, I do get the error, in spite of errors='coerce' and dropna! :/
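
The thread does not establish why the error persists. As a rough sketch for checking a file without reading it by hand, the helper below (parse_strict is a made-up name, and the format string is an assumption based on the sample values in the question) strips whitespace, parses against one explicit format, and returns the rejected raw values separately from the cleaned rows.

```python
import pandas as pd


def parse_strict(df, column="date_time", fmt="%Y-%m-%d %H:%M:%S"):
    """Split `df` into rows whose `column` matches `fmt` and the rejected raw values.

    Hypothetical helper; `fmt` is assumed from the question's sample data.
    """
    # Normalise to stripped strings so stray whitespace does not cause failures.
    cleaned = df[column].astype(str).str.strip()
    # Anything that does not match the explicit format becomes NaT.
    parsed = pd.to_datetime(cleaned, format=fmt, errors="coerce")
    rejected = df.loc[parsed.isna(), column]
    result = df.loc[parsed.notna()].copy()
    result[column] = parsed[parsed.notna()]
    return result, rejected


# Made-up sample: padded valid value, a value in another format, a valid value.
raw = pd.DataFrame({"date_time": [" 2016-05-19 08:25:00 ", "19/05/2016", "2016-05-20 07:45:00"]})
clean, rejected = parse_strict(raw)
print(rejected)
```

If rejected comes back empty and the error still appears, another line is probably raising it, for example a leftover pd.DatetimeIndex call on the unconverted column; that is a guess, not something the thread confirms.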