
I have a dataframe which I read in using pyspark with:

df1 = spark.read.csv("/user/me/data/*").toPandas() 

Unfortunately, toPandas() leaves every column with dtype object, even numerical values. I need to merge this with another dataframe that I read in with df2 = pd.read_csv("file.csv"), so I need the types in df1 to be inferred exactly as pandas would have inferred them.

How can you infer types of an existing pandas dataframe?
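To illustrate the mismatch with toy data (the column name and values are made up, simulating what toPandas() can produce):

```python
import pandas as pd

# Simulate spark.read.csv(...).toPandas(): everything comes back as object
df1 = pd.DataFrame({"id": ["1", "2"]}, dtype=object)
# pd.read_csv would have inferred int64 for the same data
df2 = pd.DataFrame({"id": [1, 2]})

# Merging on columns with incompatible dtypes fails
try:
    df1.merge(df2, on="id")
except ValueError as e:
    print(e)  # pandas refuses to merge object keys with int64 keys
```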

1 Answer


If the two dataframes have the same column names, you can use pd.DataFrame.astype:

df1 = df1.astype(df2.dtypes) 
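A minimal sketch of this with made-up data (column names and values are illustrative):

```python
import pandas as pd

# df1 arrives with everything as object, as toPandas() leaves it
df1 = pd.DataFrame({"id": ["1", "2"], "price": ["3.5", "4.0"]}, dtype=object)
# df2 came from pd.read_csv, so its dtypes were inferred
df2 = pd.DataFrame({"id": [3, 4], "price": [1.5, 2.0]})

# astype accepts a column -> dtype mapping, and df2.dtypes is one
df1 = df1.astype(df2.dtypes)
print(df1.dtypes)  # id: int64, price: float64
```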

Otherwise, you need to construct a dictionary where the keys are the column names in df1 and the values are the corresponding dtypes. You can start with d = df2.dtypes.to_dict() to see what it should look like, then build a new dictionary altering the keys where needed.
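For example, if df1 came from a headerless CSV, Spark names the columns _c0, _c1, … (the target names here are illustrative):

```python
import pandas as pd

# df1 with Spark's default column names, all object dtype
df1 = pd.DataFrame({"_c0": ["1", "2"], "_c1": ["3.5", "4.0"]}, dtype=object)
df2 = pd.DataFrame({"id": [3], "price": [1.5]})

# Start from df2's dtypes, then re-key to df1's column names
d = df2.dtypes.to_dict()
d = {"_c0": d["id"], "_c1": d["price"]}

df1 = df1.astype(d)
print(df1.dtypes)
```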

Once you've constructed the dictionary d, use:

df1 = df1.astype(d) 

2 Comments

Thank you for this. Will this also convert '' into NaN if the type is float? This is what pd.read_csv does.
I don't think it will. You need to manually convert that column with pd.to_numeric(df['mycol'], errors='coerce')
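A quick sketch of that workaround (the column name and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"mycol": ["1.5", "", "2.0"]}, dtype=object)

# astype(float) would raise on the empty string;
# to_numeric with errors='coerce' turns it into NaN instead
df["mycol"] = pd.to_numeric(df["mycol"], errors="coerce")
print(df["mycol"])  # float64 column with NaN in the middle
```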
