So I have a CSV dataset that, by my reckoning, is well formed, and I'm trying to get pandas to load it correctly. The header consists of 5 column names, but the final column contains JSON objects with unescaped commas, e.g.:

    A,B,C,D,E
    1,2,3,4,{K1:V1,K2:V2}

I'm loading my data with a simple:

    training = pd.read_csv('data/training.dat')
However, pandas is clearly treating the commas inside the JSON as delimiters for new, unlabeled columns, and I get an error like this:

    CParserError: Error tokenizing data. C error: Expected 75 fields in line 3, saw 84

I've tried navigating the docs but keep failing. Does anyone know how to configure pd.read_csv to parse this file correctly?
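One option that keeps `pd.read_csv` is a regular-expression separator that matches only the commas outside the braces. pandas treats a multi-character `sep` as a regex, which requires the Python engine, and its docs warn that regex delimiters tend to ignore quoted data. So this is a minimal sketch that assumes the JSON objects are never nested and no other field relies on quoting. The sample rows are hypothetical stand-ins for `data/training.dat`.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/training.dat: the last field holds a JSON
# object whose commas are neither escaped nor quoted.
SAMPLE = (
    "A,B,C,D,E\n"
    '1,2,3,4,{"K1":"V1","K2":"V2"}\n'
    '5,6,7,8,{"K1":"V3"}\n'
)

# Match a comma only if it is NOT followed by "...}" with no "{" in between,
# i.e. only commas outside a (non-nested) {...} object.
OUTSIDE_BRACES = r",(?![^{]*\})"

training = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=OUTSIDE_BRACES,
    engine="python",  # regex separators need the Python parsing engine
)
print(training)
```

Column E comes back as the raw object text, so `json.loads` can parse each value afterwards if needed.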
I suppose the alternative is to hack together a script that flattens the JSON objects, using the union of their keys as columns.
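If the flattening route is taken, it can stay fairly small. This is a minimal sketch assuming the JSON object is always the last of exactly five fields, the objects are valid JSON (quoted keys, unlike the `K1:V1` shorthand in the example), and the other fields are never quoted. Each line is split on at most four commas so the object survives intact, then parsed with `json.loads`, and `pd.json_normalize` turns the keys into columns. `load_training`, `N_COLS` and the sample rows are hypothetical.

```python
import io
import json

import pandas as pd

N_COLS = 5  # assumption: the JSON object is always the last of 5 fields


def load_training(handle):
    """Hypothetical loader: split each line at most N_COLS - 1 times so the
    commas inside the trailing JSON object are left alone."""
    header = handle.readline().rstrip("\r\n").split(",")
    rows = [
        line.rstrip("\r\n").split(",", N_COLS - 1)
        for line in handle
        if line.strip()  # skip blank lines
    ]
    df = pd.DataFrame(rows, columns=header)
    json_col = header[-1]
    # Parse each object and expand its keys into columns; the new columns are
    # the union of all keys, with NaN where a row lacks a key.
    expanded = pd.json_normalize([json.loads(s) for s in df[json_col]])
    return pd.concat([df.drop(columns=json_col), expanded], axis=1)


SAMPLE = (
    "A,B,C,D,E\n"
    '1,2,3,4,{"K1":"V1","K2":"V2"}\n'
    '5,6,7,8,{"K1":"V3"}\n'
)
training = load_training(io.StringIO(SAMPLE))
print(training)
```

The first four columns stay as strings in this sketch; `pd.to_numeric` can convert them if they hold numbers.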