
I have a dataset like this:

ISIN,"MIC","Datum","Open","Hoog","Laag","Close","Number of Shares","Number of Trades","Turnover","Valuta"
NL0011821202,"Euronext Amsterdam Brussels","04/09/2017","14.82","14.95","14.785","14.855","7482805","6970","111345512.83","EUR"
NL0011821202,"Euronext Amsterdam Brussels","05/09/2017","14.91","14.92","14.585","14.655","15240971","12549","224265257.14","EUR"
NL0011821202,"Euronext Amsterdam Brussels","07/09/2017","14.69","14.74","14.535","14.595","15544695","15817","227478163.74","EUR"

However, I can't read the file correctly with pd.read_csv('filename.csv'). I've tried all kinds of combinations, like:

 sep='"', delimiter="," 

But with no luck at all! I want the first line to become the column names, and the quote characters and commas to be removed.

How do I go about this efficiently?

  • It works fine for me with pd.read_csv('filename.csv'). Could you share a real sample file with 3 rows, e.g. via Dropbox, Google Docs or similar? Commented Sep 4, 2018 at 12:26
  • Could you try pd.read_csv('filename.csv', sep=',', quotechar='"')? Commented Sep 4, 2018 at 12:30
  • Btw, tested in pandas 0.23.4 Commented Sep 4, 2018 at 12:30
  • @ChrisA Doesn't work. Commented Sep 4, 2018 at 12:36

1 Answer


The problem is that the file sometimes contains doubled " characters. One solution is to change the separator to a regular expression that matches zero or more " before and after each comma:

df = pd.read_csv('ING_DAILY - ING_DAILY.csv', sep='["]*,["]*', engine='python') 

Then it is necessary to remove " from the column names and from the values of the first and last columns:

df.columns = df.columns.str.strip('"')
df.iloc[:, [0,-1]] = df.iloc[:, [0,-1]].apply(lambda x: x.str.strip('"'))
print (df.head(3))

           ISIN                          MIC       Datum   Open    Hoog  \
0  NL0011821202  Euronext Amsterdam Brussels  04/09/2017  14.82  14.950
1  NL0011821202  Euronext Amsterdam Brussels  05/09/2017  14.91  14.920
2  NL0011821202  Euronext Amsterdam Brussels  06/09/2017  14.69  14.725

     Laag   Close  Number of Shares  Number of Trades      Turnover Valuta
0  14.785  14.855           7482805              6970  1.113455e+08    EUR
1  14.585  14.655          15240971             12549  2.242653e+08    EUR
2  14.570  14.615          14851426             15303  2.175316e+08    EUR
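For anyone who wants to try the approach without the original file, here is a minimal self-contained sketch. It feeds two of the rows shown in the question through io.StringIO; whether the real file is quoted exactly like this is an assumption. The loop over the first and last column names is a small variation on the iloc assignment above, not the answer's exact code.

```python
import io

import pandas as pd

# Two sample rows copied from the question. The exact quoting of the
# real file is an assumption; only the last field keeps a stray quote here.
raw = (
    'ISIN,"MIC","Datum","Open","Hoog","Laag","Close",'
    '"Number of Shares","Number of Trades","Turnover","Valuta"\n'
    'NL0011821202,"Euronext Amsterdam Brussels","04/09/2017","14.82","14.95",'
    '"14.785","14.855","7482805","6970","111345512.83","EUR"\n'
    'NL0011821202,"Euronext Amsterdam Brussels","05/09/2017","14.91","14.92",'
    '"14.585","14.655","15240971","12549","224265257.14","EUR"\n'
)

# A regex separator needs the Python engine: split on a comma together
# with any double quotes directly around it.
df = pd.read_csv(io.StringIO(raw), sep='["]*,["]*', engine='python')

# Quotes at the very start or end of a line are not next to a comma,
# so they survive the split and have to be stripped separately.
df.columns = df.columns.str.strip('"')
for col in (df.columns[0], df.columns[-1]):
    df[col] = df[col].str.strip('"')

print(df.dtypes)
```

Because the quotes are consumed by the separator, the numeric fields arrive unquoted and pandas can infer numeric dtypes for them.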

1 Comment

Unfortunately, this doesn't work with low_memory=True, because that parameter is only supported by the C engine, while a regex separator such as sep='["]*,["]*' requires the Python engine.
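One possible workaround for that limitation, which is not from the answer itself, is to remove the quote characters from the text first and then parse with the default C engine. The sketch below assumes that no field contains a comma or a quote that actually matters; the sample text, with whole lines wrapped in quotes and inner quotes doubled, is a hypothetical, shortened illustration and not the asker's real file.

```python
import io

import pandas as pd

# Hypothetical sample (fewer columns than the real data): every line is
# wrapped in quotes and the inner quotes are doubled.
raw = (
    '"ISIN,""MIC"",""Datum"",""Open"",""Valuta"""\n'
    '"NL0011821202,""Euronext Amsterdam Brussels"",""04/09/2017"",""14.82"",""EUR"""\n'
    '"NL0011821202,""Euronext Amsterdam Brussels"",""05/09/2017"",""14.91"",""EUR"""\n'
)

# Assumption: quotes carry no information here, so dropping all of them
# leaves a plain comma-separated text.
cleaned = raw.replace('"', '')

# A single-character separator lets the C engine (and low_memory) be used.
df_c = pd.read_csv(io.StringIO(cleaned), sep=',', low_memory=True)

print(df_c.head())
```

For a large file on disk, the same replacement could be applied line by line while writing a cleaned copy, rather than holding the whole text in memory.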
