
I am trying to parse a CSV file, but pandas does not seem to recognize the separator/delimiter. I have looked at similar questions and their answers, but I still have not managed to parse my file correctly (only the header is parsed properly).

Each line of the file looks like this:

https://drive.google.com/a/company.com/uc?export=download&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"

The code I have tried is the following:

In [0]: import pandas as pd

In [1]: data = pd.read_csv('file.csv', sep=',')
        data.head()
Out [1]:
           filename  file_size  file_attributes  region_count  region_id  region_shape_attributes  region_attributes
0  https://drive...        NaN              NaN           NaN        NaN                      NaN                NaN
1  https://drive...        NaN              NaN           NaN        NaN                      NaN                NaN
2  https://drive...        NaN              NaN           NaN        NaN                      NaN                NaN
3  https://drive...        NaN              NaN           NaN        NaN                      NaN                NaN
4  https://drive...        NaN              NaN           NaN        NaN                      NaN                NaN

In [2]: data['filename'][0]
Out [2]: 'https://drive.google.com/a/company.com/uc?export=download&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"'
  • Are the quotes part of that line? If so, the result is logical, because it means the entire row is a single string. Commented Aug 22, 2019 at 8:50
  • No, they are not. They are added by pandas when trying to parse. Commented Aug 22, 2019 at 9:24
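
The question raised in the comments can be checked against the raw text of the file rather than against what pandas returns. The following is a minimal sketch using only the standard library: the helper name looks_fully_quoted and the example.com URL are made up for illustration, and the "wrapped" line is an assumption about how such a file might look, not the asker's actual data.

```python
import csv
import io


def looks_fully_quoted(raw_line: str) -> bool:
    """Return True if a CSV parser sees the whole line as one quoted field."""
    stripped = raw_line.rstrip("\r\n")
    if not stripped.startswith('"'):
        return False
    fields = next(csv.reader(io.StringIO(stripped)))
    return len(fields) == 1


# Hypothetical row in the shape shown in the question.
plain = 'https://example.com/a,-1,"{""type"":""F03""}",0,0,"{}","{}"'

# The same row wrapped in one outer pair of quotes, with the inner quotes
# doubled, as a CSV writer would do when exporting the row as a single field.
wrapped = '"' + plain.replace('"', '""') + '"'

print(looks_fully_quoted(plain))    # seven separate fields
print(looks_fully_quoted(wrapped))  # one field holding the entire row
```

If the lines read from file.csv with a plain open() behave like wrapped here, a CSV parser has to put the entire row into the first column, which would match the output in the question.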

1 Answer


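Sorry, I did not manage to reproduce your issue. However, you can extract the columns from the data DataFrame with the following code, which splits the string in the filename column on commas. Note that a plain split on commas only works as long as no field contains a comma itself.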

cols_to_extract = [
    'filename', 'file_size', 'file_attributes', 'region_count',
    'region_id', 'region_shape_attributes', 'region_attributes']

# Split the single parsed string on commas and build a new data frame
# with one column per field.
df = pd.DataFrame(data['filename'].str.split(',').tolist(), columns=cols_to_extract)
df.head()

Output should look like this:

           filename  file_size      file_attributes  region_count  region_id  region_shape_attributes  region_attributes
0  https://drive...         -1  "{""type"":""F03""}"             0          0                     "{}"               "{}"
1  https://drive...         -1  "{""type"":""F03""}"             0          0                     "{}"               "{}"
2  https://drive...         -1  "{""type"":""F03""}"             0          0                     "{}"               "{}"
3  https://drive...         -1  "{""type"":""F03""}"             0          0                     "{}"               "{}"
4  https://drive...         -1  "{""type"":""F03""}"             0          0                     "{}"               "{}"
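
A comma-based split leaves the CSV quoting in place and breaks as soon as one of the JSON fields contains a comma. As a hedged alternative, each string can be parsed as a CSV record of its own with the standard csv module. This is only a sketch: the helper split_csv_strings and the example.com URLs are invented for illustration, and the second row (with two JSON keys) is an assumed case that does not appear in the question.

```python
import csv
import json

COLUMNS = [
    "filename", "file_size", "file_attributes", "region_count",
    "region_id", "region_shape_attributes", "region_attributes",
]


def split_csv_strings(values):
    """Parse each string as one CSV record, honouring quoted fields."""
    return [dict(zip(COLUMNS, fields)) for fields in csv.reader(values)]


# Hypothetical values standing in for the strings in data['filename'].
values = [
    'https://example.com/a,-1,"{""type"":""F03""}",0,0,"{}","{}"',
    'https://example.com/b,-1,"{""type"":""F03"",""size"":2}",0,0,"{}","{}"',
]

rows = split_csv_strings(values)

# The quoting is resolved, so the attribute fields are valid JSON text.
attrs = json.loads(rows[1]["file_attributes"])
print(rows[0]["file_attributes"])
print(attrs)

# For comparison: a naive split cuts the second row into too many pieces.
print(len(values[1].split(",")))
```

The list of dictionaries can be passed to pd.DataFrame(rows) to get one column per field. All values are still strings at that point, so numeric columns such as file_size would need to be converted separately.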

I hope it will be helpful.
