2

I have a file 'data.csv' that looks something like

ColA, ColB, ColC 1,2,3 4,5,6 7,8,9 

I want to open and read the file columns into lists, with the 1st entry of that list omitted, e.g.

dataA = [1,4,7] dataB = [2,5,8] dataC = [3,6,9] 

In reality there are more than 3 columns and the lists are very long, this is just an example of the format. I've tried:

csv_file = open('data.csv','rb') csv_array = [] for row in csv.reader(csv_file, delimiter=','): csv_array.append(row) 

Where I would then allocate each index of csv_array to a list, e.g.

dataA = [int(i) for i in csv_array[0]] 

But I'm getting errors:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? 

Also it feels like a very long winded way of just saving data to a few lists...

Thanks!

edit:

Here is how I solved it:

import pandas as pd df = pd.read_csv('data.csv', names = ['ColA','ColB','ColC'] dataA = map(int,(df.ColA.tolist())[1:3]) 

and repeat for the rest of the columns.

1
  • If you don't mind using imports, I would suggest using tablib which you can find here : docs.python-tablib.org/en/master Commented Oct 13, 2017 at 14:50

6 Answers 6

4

Just to spell this out for people trying to solve a similar problem, perhaps without Pandas, here's a simple refactoring with comments.

import csv # Open the file in 'r' mode, not 'rb' csv_file = open('data.csv','r') dataA = [] dataB = [] dataC = [] # Read off and discard first line, to skip headers csv_file.readline() # Split columns while reading for a, b, c in csv.reader(csv_file, delimiter=','): # Append each variable to a separate list dataA.append(a) dataB.append(b) dataC.append(c) 

This does nothing to convert the individual fields to numbers (use append(int(a)) etc if you want that) but should hopefully be explicit and flexible enough to show you how to adapt this to new requirements.

Sign up to request clarification or add additional context in comments.

Comments

3

Use Pandas:

import pandas as pd df = pd.DataFrame.from_csv(path) rows = df.apply(lambda x: x.tolist(), axis=1) 

2 Comments

Why use pandas when csv comes as batteries included ?
Deprecated since version 0.21.0: Use pandas.read_csv() instead. source For example: pd.read_csv(path)
3

To skip the header, create your reader on a seperate line. Then to convert from a list of rows to a list of columns, use zip():

import csv with open('data.csv', 'rb') as f_input: csv_input = csv.reader(f_input) header = next(csv_input) data = zip(*[map(int, row) for row in csv_input]) print data 

Giving you:

[(1, 4, 7), (2, 5, 8), (3, 6, 9)] 

So if needed:

dataA = data[0] 

Comments

0

Seems like you have OSX line endings in your csv file. Try saving the csv file as "Windows Comma Separated (.csv)" format.

There are also easier ways to do what you're doing with the csv reader:

csv_array = [] with open('data.csv', 'r') as csv_file: reader = csv.reader(csv_file) # remove headers reader.next() # loop over rows in the file, append them to your array. each row is already formatted as a list. for row in reader: csv_array.append(row) 

You can then set dataA = csv_array[0]

3 Comments

the file was exported from excel, it's just a plain .csv file as far as I can tell...
Ok this is almost working, but I'm getting the rows instead of the columns, i.e. dataA = [1,2,3] and I want dataA=[1,4,7]
Sorry for missing that part of the question. You can just use zip to transpose it: csv_array = zip(csv_array[0], csv_array[1], csv_array[2]. There might be easier ways, but this I think is the most verbose and easy to get.
0

First if you read the csv file with csv.reader(csv_file, delimiter=','), you will still read the header.

csv_array[0] will be the header row -> ['ColA', ' ColB', ' ColC']

Also if you're using mac, this issues is already referenced here: CSV new-line character seen in unquoted field error

And I would recommend using pandas&numpy instead if you will do more analysis using the data. It read the csv file to pandas dataframe. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Comments

0

use csv.DictReader() to select specific columns

dataA = [] dataB = [] with open('data.csv', 'r') as csv_file: csv_reader = csv.DictReader(csv_file, delimiter=',') for row in csv_reader: dataA.append(row['ColA']) dataB.append(row['ColB']) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.