I often load CSV files using pd.read_csv(), and more often than not they have columns with differing data types.
This is fine, as I can pass a dictionary to the dtype argument mapping each column to its respective data type. The problem I'm finding is that, occasionally, these CSV files have a lot of columns, and the resulting dictionary is extremely long.
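To make the pattern concrete, here is a minimal sketch of what I mean (the CSV contents and column names are just placeholders; I use io.StringIO to stand in for a file on disk):

```python
import io
import pandas as pd

# Placeholder CSV contents standing in for a real file on disk
csv_data = io.StringIO("A,B,C\n001,foo,1\n002,bar,2\n")

# Map each column to its intended data type
df_dtype = {'A': str, 'B': str, 'C': int}

df = pd.read_csv(csv_data, dtype=df_dtype)
# Column 'A' is read as str, so values like '001' keep their leading zeros
```

This works fine for a handful of columns; the issue below is what happens when there are dozens of them.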
Often, the dictionary will look like this:

df_dtype = {'A': str, 'B': str, 'C': int}

But when the file has many columns, the dictionary starts to look like this:
df_dtype = { 'A' : str, 'B' : str, 'C' : int, 'D' : str, 'E' : str, 'F' : int, 'G' : str, 'H' : str, 'I' : int, 'J' : str, 'K' : str, 'L' : int, 'M' : str, 'N' : str, 'O' : int, 'P' : str, 'Q' : str, 'R' : int, 'S' : str, 'T' : str, 'U' : int, 'V' : str, 'W' : str, 'X' : int, 'Y' : str, 'Z' : str }

This is ugly and makes the code less readable.
What is the best practice here? Should I put the dictionary in a separate file in the same directory? Is there a prettier way to format it?