I have a CSV file in which each line is wrapped in double quotes. Within each line, the first field is not quoted, but all the other fields are, like this -

"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7"""
"1,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""
"2,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""

and so forth, and I'm trying to read it into a pandas dataframe. The desired output would be something like -

   Col1 "Col2" "Col3" "Col4" "Col5" "Col6" "Col7"
0     1  entry  entry  entry  entry  entry  entry
1     2  entry  entry  entry  entry  entry  entry

I ran the following command in the terminal - file 'filename.csv' - and the output was the following - ISO-8859 text, with very long lines, with CRLF line terminators
I tried various ways of changing the read_csv parameters, such as
input_data = pd.read_csv('filename.csv', sep=',', encoding='iso-8859-1', engine='python')
The output of that is a dataframe with 2 columns and 100+ rows, where the first column is empty and the second column contains all the data I actually want. What I want instead is a dataframe with 7 columns and 100+ rows, as in the desired output above.
Unfortunately, I cannot post the real data for confidentiality reasons.
Could someone help me out here? The fix intuitively feels like a simple one, but I'm not sure what I'm missing.
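One possible approach, assuming the file really does contain the quotes exactly as shown (printing `repr()` of the first few raw lines is a quick way to confirm this): each physical line is then a single quoted CSV field whose content is itself a CSV row, so it can be parsed in two passes. The helper name `read_double_wrapped_csv` and the three-column sample below are made up for illustration; this is a minimal sketch, not a tested solution for the real data.

```python
import csv
import io

import pandas as pd


def read_double_wrapped_csv(handle):
    """Parse a CSV whose every line is itself wrapped as one quoted CSV field."""
    # Pass 1: csv.reader removes the outer quotes and turns "" back into ".
    # Assumption: each physical line unwraps to exactly one field.
    inner_lines = [row[0] for row in csv.reader(handle) if row]
    # Pass 2: the unwrapped lines are ordinary CSV that pandas can read.
    return pd.read_csv(io.StringIO("\n".join(inner_lines)))


# Made-up sample in the same layout as the question, with CRLF line endings.
sample = (
    '"Col1,""Col2"",""Col3"""\r\n'
    '"1,""entry "",""entry"""\r\n'
    '"2,""entry "",""entry"""\r\n'
)
df = read_double_wrapped_csv(io.StringIO(sample, newline=""))
print(df)

# For the real file, something along these lines should work:
# with open('filename.csv', newline='', encoding='iso-8859-1') as f:
#     input_data = read_double_wrapped_csv(f)
```

Note that the trailing space in values such as "entry " is preserved by this sketch; if that is unwanted, the string columns can be cleaned afterwards with `.str.strip()`.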
"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7""". I suspect that's not actually your CSV's contents, but the result of aprintstatement?