
So I have a text file that looks like this:

    1,989785345,"something 1",,234.34,254.123
    2,234823423,"something 2",,224.4,254.123
    3,732847233,"something 3",,266.2,254.123
    4,876234234,"something 4",,34.4,254.123
    ...

I'm running this code right here:

    file = open("file.txt", 'r')
    readFile = file.readline()
    lineID = readFile.split(",")
    print(lineID[1])

This lets me split the content of my text file on ",", but what I want to do is separate it into columns, because I have a massive number of IDs and other values in each line. How would I go about splitting the text file into columns and then accessing each individual row in a column one by one?

Answer


You have a CSV file; use the csv module to read it:

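    import csv

    # In Python 3, open CSV files in text mode with newline=''
    with open('file.txt', newline='') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            # row is a list of strings, one per column
            print(row)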
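To get at individual fields, index each row as you read it. A minimal sketch, assuming the second field is the ID you want; it feeds the question's sample rows through io.StringIO so it runs without a file. `ids_by_row` and `SAMPLE` are illustrative names, and with a real file you would pass the object returned by `open('file.txt', newline='')` instead:

```python
import csv
import io

# Sample rows copied from the question; a real script would open 'file.txt' instead.
SAMPLE = (
    '1,989785345,"something 1",,234.34,254.123\n'
    '2,234823423,"something 2",,224.4,254.123\n'
    '3,732847233,"something 3",,266.2,254.123\n'
    '4,876234234,"something 4",,34.4,254.123\n'
)


def ids_by_row(csvfile):
    """Yield the second field (the ID) of each row, one row at a time."""
    reader = csv.reader(csvfile)
    for row in reader:
        # The quoted field is already unquoted, and the empty field
        # between the double commas comes back as ''.
        yield row[1]


for row_id in ids_by_row(io.StringIO(SAMPLE)):
    print(row_id)
```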

This still gives you data by row, but with the zip() function you can transpose this to columns instead:

    import csv

    with open('file.txt', newline='') as csvfile:
        reader = csv.reader(csvfile)
        for column in zip(*reader):
            # column is a tuple holding every value in that column, top to bottom
            print(column)

Do be careful with the latter; the whole file will be read into memory in one go, and a large CSV file could eat up all your available memory in the process.
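If the file does fit in memory, wrapping the transposed result in list() lets you index whole columns directly. A sketch using the question's sample rows through io.StringIO (the `SAMPLE` name is just for illustration). Note that zip() stops at the shortest row, so rows with fewer fields would truncate every column:

```python
import csv
import io

# Sample rows copied from the question.
SAMPLE = (
    '1,989785345,"something 1",,234.34,254.123\n'
    '2,234823423,"something 2",,224.4,254.123\n'
    '3,732847233,"something 3",,266.2,254.123\n'
    '4,876234234,"something 4",,34.4,254.123\n'
)

# zip(*reader) unpacks every row at once, so the whole file is held in memory.
columns = list(zip(*csv.reader(io.StringIO(SAMPLE))))

ids = columns[1]  # ('989785345', '234823423', '732847233', '876234234')
for value in ids:
    print(value)
```

Here `columns[1]` is the ID column and `columns[3]` is the all-empty column produced by the double commas.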


Comments

So what would be an efficient way to read certain columns (say, selected by a list of header names) from a CSV file without loading it all into memory? I have a huge file with millions of columns, and I want to read and use only several hundred at a time. Thank you.
Just process the data as you read it. The for row in reader: loop will only need memory for the current row plus a buffer for the file.
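For anything that can be computed incrementally (a sum, count, minimum or maximum), that streaming approach is enough. A sketch that totals the fifth field of the question's sample rows while holding only one row at a time; `column_total` is a hypothetical helper name and index 4 is chosen purely for illustration:

```python
import csv
import io

# Sample rows copied from the question.
SAMPLE = (
    '1,989785345,"something 1",,234.34,254.123\n'
    '2,234823423,"something 2",,224.4,254.123\n'
    '3,732847233,"something 3",,266.2,254.123\n'
    '4,876234234,"something 4",,34.4,254.123\n'
)


def column_total(csvfile, index):
    """Sum one numeric column while reading the file row by row."""
    total = 0.0
    for row in csv.reader(csvfile):
        # Only the current row is held in memory at this point.
        total += float(row[index])
    return total


print(column_total(io.StringIO(SAMPLE), 4))
```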
Thank you for your reply, but I'm not sure I follow... I need to process data by columns, not rows.
@Confounded: the row contains all columns, so you index the row to get the data for the columns you are interested in. Process that data as you read the rows.
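If the file has a header row, one way to do this is to read the header first, map the wanted names to positions once, and then pick those positions out of every row. A sketch under that assumption: the question's data has no header, so the header names below (`num`, `id`, `label`, `blank`, `value`, `other`) are made up, as is the `select_columns` helper. A plain csv.reader is used rather than csv.DictReader, so no dict of every column is built per row, which matters with millions of columns:

```python
import csv
import io

# Hypothetical header row: the question's file has no header, so these names are invented.
SAMPLE_WITH_HEADER = (
    'num,id,label,blank,value,other\n'
    '1,989785345,"something 1",,234.34,254.123\n'
    '2,234823423,"something 2",,224.4,254.123\n'
)


def select_columns(csvfile, wanted):
    """Yield, for each data row, only the values under the wanted header names."""
    reader = csv.reader(csvfile)
    header = next(reader)
    # Build the name -> position lookup once instead of searching the header per row.
    positions = {name: i for i, name in enumerate(header)}
    indexes = [positions[name] for name in wanted]
    for row in reader:
        yield [row[i] for i in indexes]


for values in select_columns(io.StringIO(SAMPLE_WITH_HEADER), ['id', 'value']):
    print(values)
```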
When I say I need to process data by columns, I mean that I need to have the whole column before I can start processing. If I understood you correctly, I would need to read the whole file using the for row approach to do that.
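Every row does still have to be read once, but only the columns you keep need to stay in memory. A sketch that reads the file in a single pass and accumulates complete lists for a chosen set of column indexes, discarding the rest of each row as it goes; `collect_columns` is a hypothetical helper and indexes 1 and 4 are arbitrary picks from the question's sample:

```python
import csv
import io

# Sample rows copied from the question.
SAMPLE = (
    '1,989785345,"something 1",,234.34,254.123\n'
    '2,234823423,"something 2",,224.4,254.123\n'
    '3,732847233,"something 3",,266.2,254.123\n'
    '4,876234234,"something 4",,34.4,254.123\n'
)


def collect_columns(csvfile, indexes):
    """Read the file once, keeping full columns only for the given indexes."""
    columns = {i: [] for i in indexes}
    for row in csv.reader(csvfile):
        # Keep the selected fields; the rest of the row is dropped with it.
        for i in indexes:
            columns[i].append(row[i])
    return columns


cols = collect_columns(io.StringIO(SAMPLE), [1, 4])
# Each list now holds an entire column and can be processed as a whole.
print(cols[1])
print(max(float(v) for v in cols[4]))
```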
