
I often load CSV files using pd.read_csv(), and more often than not they have columns with differing data types.

This is fine, as I can pass a dictionary to the dtype argument mapping each column to its data type. The problem I'm finding is that, occasionally, these CSV files have a lot of columns, and the resulting dictionary gets extremely long.

Often, the dictionary looks like this:

 df_dtype = { 'A' : str, 'B' : str, 'C' : int } 
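
which I then pass straight to read_csv (the filename here is just a placeholder):

import pandas as pd

df = pd.read_csv('my_data.csv', dtype=df_dtype)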

But when the DataFrame is wide (many columns), the dictionary starts to look like this:

 df_dtype = { 'A' : str, 'B' : str, 'C' : int, 'D' : str, 'E' : str, 'F' : int, 'G' : str, 'H' : str, 'I' : int, 'J' : str, 'K' : str, 'L' : int, 'M' : str, 'N' : str, 'O' : int, 'P' : str, 'Q' : str, 'R' : int, 'S' : str, 'T' : str, 'U' : int, 'V' : str, 'W' : str, 'X' : int, 'Y' : str, 'Z' : str } 

Which is ugly and makes the code less readable.

What is the best practice for doing this? Should I put the dictionary in a separate file in the directory (sketched below)? Is there a prettier way to format it?
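
For what it's worth, the "separate file" option I have in mind would be a small module holding only the mapping, imported wherever it is needed (dtypes.py is just a hypothetical name):

# dtypes.py - holds only the column-to-dtype mapping
df_dtype = {
    'A': str, 'B': str, 'C': int,
    'D': str, 'E': str, 'F': int,
    # ... and so on for the remaining columns
}

# main script
import pandas as pd
from dtypes import df_dtype

df = pd.read_csv('my_data.csv', dtype=df_dtype)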

1 Answer


One idea is to change the format: specify the type as the key of the dict and the column names in lists:

d_types = {str: ['A', 'B', 'D'], int: ['C','F'], float: ['G']}

# swap keys and values in the dict
# http://stackoverflow.com/a/31674731/2901002
d = {k: oldk for oldk, oldv in d_types.items() for k in oldv}
print(d)
{'A': <class 'str'>, 'B': <class 'str'>, 'D': <class 'str'>, 'C': <class 'int'>, 'F': <class 'int'>, 'G': <class 'float'>}
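
Once inverted, the resulting dict can be passed directly to read_csv. A minimal sketch (the filename is hypothetical):

import pandas as pd

# type -> list of columns, kept short and readable
d_types = {str: ['A', 'B', 'D'], int: ['C', 'F'], float: ['G']}

# invert to the column -> type mapping that read_csv expects
d = {col: dtype for dtype, cols in d_types.items() for col in cols}

df = pd.read_csv('my_data.csv', dtype=d)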
