
I often load CSV files using pd.read_csv(), and more often than not they have columns with differing data types.

This is fine, as I can pass a dictionary to the dtype argument mapping each column to its data type. The problem I'm finding is that, occasionally, these CSV files have a lot of columns, and the resulting dictionary gets extremely long.

Often, the dictionary looks like this:

 df_dtype = { 'A' : str, 'B' : str, 'C' : int } 
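
which I then pass straight to read_csv (the filename here is just a placeholder):

import pandas as pd

df = pd.read_csv('my_data.csv', dtype=df_dtype)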

But when the DataFrame is wide (many columns), the dictionary starts to look like this:

 df_dtype = { 'A' : str, 'B' : str, 'C' : int, 'D' : str, 'E' : str, 'F' : int, 'G' : str, 'H' : str, 'I' : int, 'J' : str, 'K' : str, 'L' : int, 'M' : str, 'N' : str, 'O' : int, 'P' : str, 'Q' : str, 'R' : int, 'S' : str, 'T' : str, 'U' : int, 'V' : str, 'W' : str, 'X' : int, 'Y' : str, 'Z' : str } 

Which is ugly and makes the code less readable.

What is the best practice for doing this? Should I put the dictionary in a separate file in the directory (sketched below)? Is there a prettier way to format it?
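
For what it's worth, the "separate file" option I have in mind would be a small module holding only the mapping, imported wherever it is needed (dtypes.py is just a hypothetical name):

# dtypes.py - holds only the column-to-dtype mapping
df_dtype = {
    'A': str, 'B': str, 'C': int,
    'D': str, 'E': str, 'F': int,
    # ... and so on for the remaining columns
}

# main script
import pandas as pd
from dtypes import df_dtype

df = pd.read_csv('my_data.csv', dtype=df_dtype)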

1 Answer


One idea is to change the format: specify the type as the key of the dict and the column names in lists:

d_types = {str: ['A', 'B', 'D'], int: ['C','F'], float: ['G']}

# swap keys and values in the dict
# http://stackoverflow.com/a/31674731/2901002
d = {k: oldk for oldk, oldv in d_types.items() for k in oldv}
print(d)
{'A': <class 'str'>, 'B': <class 'str'>, 'D': <class 'str'>, 'C': <class 'int'>, 'F': <class 'int'>, 'G': <class 'float'>}
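
Once inverted, the resulting dict can be passed directly to read_csv. A minimal sketch (the filename is hypothetical):

import pandas as pd

# type -> list of columns, kept short and readable
d_types = {str: ['A', 'B', 'D'], int: ['C', 'F'], float: ['G']}

# invert to the column -> type mapping that read_csv expects
d = {col: dtype for dtype, cols in d_types.items() for col in cols}

df = pd.read_csv('my_data.csv', dtype=d)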
