What I am trying to do

I am using PyArrow to parse data from a CSV file (originally exported from a Postgres database). I am having trouble parsing a timestamp with a time zone that looks like 2017-08-19 14:22:11.802755+00.

I receive an error that looks like this:

pyarrow.lib.ArrowInvalid: In CSV column #11: CSV conversion error to timestamp[ns]: invalid value '2017-08-19 12:22:11.802755+00'

What I have tried to do

I tried specifying a parser for the data, so this is how I read the CSV (snippet for brevity):

    arrow_table = arrow_csv.read_csv(
        input_file=input_buffer,
        convert_options=arrow_csv.ConvertOptions(
            # I have also tried omitting timestamp_parsers
            timestamp_parsers=[ISO8601, "%Y-%m-%d %H:%M:%S.%6N %z"],
            column_types=arrow_schema,
            strings_can_be_null=True,
            true_values=['t'],
            false_values=['f'],
        ),
    )

Note that in column_types I map the column that I want to parse as follows (I am mapping Postgres types to Arrow types, which works for every type except this one):

    'timestamp with time zone': pa.timestamp('ns', tz="+00:00")

But none of that seems to work. I'm happy to provide further information if needed.

1 Answer

Unfortunately, Arrow's ISO8601 parser does not support offset strings. The strptime parser is based on the 2008 POSIX definition of strptime (via vendored musl), which does not support %z. Some implementations of strptime do include support for it (e.g. the libc implementation).

This seems like a valid feature request for either parser. I've filed ARROW-13348 to track this.

As a workaround, your best bet may be to read the column as strings and use some other library (pandas?) to convert the values to timestamps.
