I'm having trouble with datatype inference, so I decided to map the datatypes manually. However, I have discovered that I can't look up Pandas datatypes in my mapping as expected.

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

pd_to_pa_dtypes = {
    "object": pa.string(),
    "string": pa.string(),
    "double": pa.float64(),
    pd.Int32Dtype(): pa.int32(),
    "int64": pa.int64(),
    pd.Int64Dtype(): pa.int64(),
    "datetime64[ns]": pa.timestamp("ns", tz="UTC"),
    pd.StringDtype(): pa.string(),
    '<M8[ns]': pa.timestamp("ns", tz="UTC"),
}

date = pd.to_datetime(["30/04/2021", "28/04/2021"], format="%d/%m/%Y")
df = pd.DataFrame(date)
print(df[0].dtype)  # prints datetime64[ns]
pd_to_pa_dtypes[df[0].dtype]  # KeyError: dtype('<M8[ns]')

# However, my dictionary contains both "datetime64[ns]" and "<M8[ns]".
# Also, if I check the datatypes:
df[0].dtype == "datetime64[ns]"  # True
df[0].dtype == "<M8[ns]"  # True

This also happens with some other datatypes (for example, I had problems with int64), while others are mapped as expected.

Try pd_to_pa_dtypes[df[0].dtype.str]. Commented Apr 30, 2021 at 12:27

2 Answers

The problem is:

type(df[0].dtype) # is numpy.dtype and not str 

Therefore, when you look up df[0].dtype in the pd_to_pa_dtypes dictionary you get a KeyError. A dictionary matches keys by hash before it checks equality, and the dtype object does not hash to the same value as the str keys you have in pd_to_pa_dtypes.

As for your likely next question: the following comparisons return True because of how the dtype class implements __eq__, which accepts a string that describes the same dtype.

df[0].dtype == "datetime64[ns]"  # True
df[0].dtype == "<M8[ns]"  # True

To conclude, use the str representation of your dtype object as follows:

pd_to_pa_dtypes[df[0].dtype.str] 

What happens if you try convert_dtypes() in Pandas? It converts columns to the best possible dtypes using dtypes that support pd.NA. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.convert_dtypes.html
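
For reference, a small sketch of what convert_dtypes() does to a frame like this one. The column names n and when are made up for illustration. As far as I can tell it changes which dtype a column has (for example int64 becomes the nullable Int64) and leaves datetime64[ns] alone, but the column dtype is still a dtype object rather than a str, so it would not by itself change how a dictionary lookup behaves.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame: one plain integer column, one datetime column.
df = pd.DataFrame({
    "n": pd.Series([1, 2], dtype="int64"),
    "when": np.array(["2021-04-30", "2021-04-28"], dtype="datetime64[ns]"),
})

converted = df.convert_dtypes()

# Collect the dtype names before and after the conversion for comparison.
before = {col: dtype.name for col, dtype in df.dtypes.items()}
after = {col: dtype.name for col, dtype in converted.dtypes.items()}

# The dtype of a converted column is still an object, not a plain string.
still_not_a_str = not isinstance(converted["when"].dtype, str)
```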

1 Comment

df = pd.DataFrame(date, dtype="datetime64[ns]")
pd_to_pa_dtypes[df[0].dtype]

This still gives me an error; I'm not mapping the datatypes correctly into my dictionary.
