Python read csv file columns into lists, ignoring headers

Question

I have a file 'data.csv' that looks something like this:

    ColA, ColB, ColC
    1,2,3
    4,5,6
    7,8,9

I want to open the file and read its columns into lists, with the first (header) entry of each column omitted, e.g.

    dataA = [1,4,7]
    dataB = [2,5,8]
    dataC = [3,6,9]

In reality there are more than 3 columns and the lists are very long; this is just an example of the format. I've tried:

    csv_file = open('data.csv','rb')
    csv_array = []
    for row in csv.reader(csv_file, delimiter=','):
        csv_array.append(row)

I would then assign each index of csv_array to a list, e.g.

dataA = [int(i) for i in csv_array[0]]

But I'm getting errors:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

Also, it feels like a very long-winded way of just saving data to a few lists...

Thanks!

edit:

Here is how I solved it:

    import pandas as pd
    df = pd.read_csv('data.csv', names=['ColA', 'ColB', 'ColC'])
    # Row 0 holds the original header text, so slice it off with [1:]
    dataA = [int(x) for x in df.ColA.tolist()[1:]]

and repeat for the rest of the columns.

If you don't mind using imports, I would suggest using tablib, which you can find here: docs.python-tablib.org/en/master – rak007, Commented Oct 13, 2017 at 14:50

tripleee · Accepted Answer · 2020-01-12 12:18:50Z

Just to spell this out for people trying to solve a similar problem, perhaps without Pandas, here's a simple refactoring with comments.

    import csv

    dataA = []
    dataB = []
    dataC = []
    # Open the file in 'r' mode, not 'rb' (newline='' is what the csv docs recommend)
    with open('data.csv', 'r', newline='') as csv_file:
        # Read off and discard first line, to skip headers
        csv_file.readline()
        # Split columns while reading
        for a, b, c in csv.reader(csv_file, delimiter=','):
            # Append each variable to a separate list
            dataA.append(a)
            dataB.append(b)
            dataC.append(c)

This does nothing to convert the individual fields to numbers (use append(int(a)) etc. if you want that), but it should hopefully be explicit and flexible enough to show you how to adapt it to new requirements.
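Since the real file has more than three columns, here is a minimal sketch that generalizes the same loop to any number of columns and converts each field with int(). It assumes every data row has as many fields as the header and that all values are integers; the function name read_columns is hypothetical, and it accepts an open file or any iterable of CSV lines.

```python
import csv


def read_columns(csv_file):
    """Return one list of ints per column, skipping the header row.

    csv_file can be an open file or any iterable of CSV lines.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # read off and discard the header row
    columns = [[] for _ in header]  # one empty list per column
    for row in reader:
        # Append each field to the list for its column position
        for column, field in zip(columns, row):
            column.append(int(field))
    return columns


# Usage with the file from the question:
# with open('data.csv', 'r', newline='') as f:
#     dataA, dataB, dataC = read_columns(f)
```

Building the lists by column position means the loop body never changes when columns are added; only the unpacking on the last line depends on the column count.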

Jurgy · Accepted Answer · 2017-10-13 14:53:27Z


Use Pandas:

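    import pandas as pd

    # read_csv uses the first line as the header by default
    # (DataFrame.from_csv was deprecated in 0.21.0)
    df = pd.read_csv(path, skipinitialspace=True)
    # One list per column, e.g. df['ColA'].tolist() -> [1, 4, 7]
    columns = [df[name].tolist() for name in df.columns]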


Why use pandas when csv comes as batteries included? – Thomas Junk

Deprecated since version 0.21.0: use pandas.read_csv() instead, for example pd.read_csv(path). – Andy

Martin Evans · Accepted Answer · 2017-10-13 14:52:58Z

To skip the header, create your reader on a separate line so that you can call next() on it. Then, to convert from a list of rows to a list of columns, use zip():

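    import csv

    with open('data.csv', newline='') as f_input:
        csv_input = csv.reader(f_input)
        header = next(csv_input)  # skip the header row
        # zip(*rows) transposes rows into columns; list() is needed in Python 3
        data = list(zip(*[map(int, row) for row in csv_input]))

    print(data)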

Giving you:

[(1, 4, 7), (2, 5, 8), (3, 6, 9)]

So if needed:

dataA = data[0]
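If you'd rather look columns up by header name than by position, a minimal sketch pairs the header with the transposed columns. It uses an in-memory copy of the example via io.StringIO in place of data.csv, and skipinitialspace=True to drop the spaces after the commas in the example header.

```python
import csv
import io

# Stand-in for data.csv; note the spaces after the commas in the header
sample = io.StringIO('ColA, ColB, ColC\n1,2,3\n4,5,6\n7,8,9\n')

csv_input = csv.reader(sample, skipinitialspace=True)
header = next(csv_input)  # ['ColA', 'ColB', 'ColC']
rows = [[int(field) for field in row] for row in csv_input]

# zip(*rows) turns rows into columns; dict(zip(...)) keys them by header
columns = dict(zip(header, zip(*rows)))
print(columns['ColA'])  # (1, 4, 7)
```

Each column comes back as a tuple; wrap it in list() if you need to modify it.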

pythomatic · Accepted Answer · 2017-10-13 14:53:58Z

It seems like you have Mac-style line endings (a bare carriage return, \r) in your CSV file, which Excel for Mac can produce. Try saving the CSV file in the "Windows Comma Separated (.csv)" format.
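In Python 3, opening the file in text mode with newline='' (as the csv module docs recommend) should also let the reader handle carriage-return-only line endings without re-saving the file. A minimal sketch, simulating such a file with a hypothetical sample string and io.StringIO, which follows the same newline rules as open():

```python
import csv
import io

# Hypothetical sample with classic-Mac line endings ('\r' only)
raw = 'ColA, ColB, ColC\r1,2,3\r4,5,6\r7,8,9\r'

# newline='' behaves like open('data.csv', newline=''): lines are split on
# '\r', '\n' or '\r\n' and passed to the csv reader untranslated
with io.StringIO(raw, newline='') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    rows = [row for row in reader]

print(rows)  # [['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
```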

There are also easier ways to do what you're doing with the csv reader:

    import csv

    csv_array = []
    with open('data.csv', 'r', newline='') as csv_file:
        reader = csv.reader(csv_file)
        # remove headers (reader.next() only works in Python 2)
        next(reader)
        # loop over rows in the file, append them to your array.
        # each row is already formatted as a list.
        for row in reader:
            csv_array.append(row)

You can then set dataA = csv_array[0]

The file was exported from Excel; it's just a plain .csv file as far as I can tell...
OK, this is almost working, but I'm getting the rows instead of the columns, i.e. dataA = [1,2,3] when I want dataA = [1,4,7].
Sorry for missing that part of the question. You can just use zip to transpose it: csv_array = list(zip(csv_array[0], csv_array[1], csv_array[2])). There might be easier ways, but I think this is the most explicit and easiest to follow. (zip(*csv_array) does the same for any number of rows.)

Clmelissa · Accepted Answer · 2017-10-13 15:06:17Z

First, if you read the csv file with csv.reader(csv_file, delimiter=','), you will still read the header: it comes back as the first row.

csv_array[0] will be the header row -> ['ColA', ' ColB', ' ColC'], so converting it with int() fails.
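A minimal sketch of that behaviour, using an in-memory copy of the first two lines of the example instead of data.csv. It also shows skipinitialspace=True, a csv formatting parameter that strips the spaces after the commas.

```python
import csv
import io

sample = 'ColA, ColB, ColC\n1,2,3\n'

# Default reader: the header is just the first row, spaces included
raw_header = next(csv.reader(io.StringIO(sample)))
print(raw_header)  # ['ColA', ' ColB', ' ColC']

# skipinitialspace=True drops whitespace that follows each delimiter
clean_header = next(csv.reader(io.StringIO(sample), skipinitialspace=True))
print(clean_header)  # ['ColA', 'ColB', 'ColC']

# Converting the header row to int is what fails
try:
    [int(i) for i in raw_header]
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: 'ColA'
```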

Also, if you're using a Mac, this issue is already covered here: CSV new-line character seen in unquoted field error

And I would recommend using pandas and NumPy instead if you will do more analysis with the data. pandas.read_csv() reads the CSV file into a pandas DataFrame: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Ajay · Accepted Answer · 2020-06-02 18:34:23Z

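Use csv.DictReader() to select specific columns:

    import csv

    dataA = []
    dataB = []
    with open('data.csv', 'r', newline='') as csv_file:
        # skipinitialspace=True strips the spaces after the commas in the
        # header, so the keys are 'ColB' rather than ' ColB'
        csv_reader = csv.DictReader(csv_file, delimiter=',', skipinitialspace=True)
        for row in csv_reader:
            dataA.append(row['ColA'])
            dataB.append(row['ColB'])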
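To collect every column rather than a fixed few, a minimal sketch builds a dict of lists keyed by reader.fieldnames. It assumes all data values are integers and uses skipinitialspace=True because the example header has spaces after the commas; the function name read_columns_by_name is hypothetical, and it accepts an open file or any iterable of CSV lines.

```python
import csv


def read_columns_by_name(csv_file):
    """Return {column name: list of ints} for every column in the file."""
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    # fieldnames comes from the header row, which is not yielded as data
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(int(row[name]))
    return columns


# Usage with the file from the question:
# with open('data.csv', newline='') as f:
#     columns = read_columns_by_name(f)
# dataA = columns['ColA']
```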
