Python read csv line from gzipped file

Question

I am trying to parse a gzipped csv file (where the fields are separated by | characters), to test if reading the file directly in Python will be faster than zcat file.gz | python in parsing the contents.

I have the following code:

    #!/usr/bin/python3
    import gzip

    if __name__ == "__main__":
        total=0
        count=0
        f=gzip.open('SmallData.DAT.gz', 'r')
        for line in f.readlines():
            split_line = line.split('|')
            total += int(split_line[52])
            count += 1
        print(count, " :: ", total)

But I get the following error:

    $ ./PyZip.py
    Traceback (most recent call last):
      File "./PyZip.py", line 11, in <module>
        split_line = line.split('|')
    TypeError: a bytes-like object is required, not 'str'

How can I modify this to read the line and split it properly?

I'm interested mainly in just the 52nd field as delimited by |. The lines in my input file are like the following:

field1|field2|field3|...field52|field53

Is there a faster way than what I have in summing all the values in the 52nd field?

Thanks!

Does this answer your question? Cannot split, a bytes-like object is required, not 'str'
– mkrieger1, commented Dec 21, 2022 at 10:03

blhsing · Accepted Answer · 2018-08-01 03:05:57Z

You should decode each line before splitting it. gzip.open with mode 'r' opens the file in binary mode, so every line is read as a bytes object rather than a str:

split_line = line.decode('utf-8').split('|')
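An alternative to decoding every line by hand is to open the file in text mode, so that gzip.open does the decoding. The following is a minimal sketch of that idea, not the asker's final code: the function name sum_field and its parameters are made up for illustration, and UTF-8 is assumed as the encoding. The default index of 52 is copied from the question's code; with zero-based indexing that selects the 53rd field, so 51 would be the value to pass if the 52nd field is what is wanted.

```python
import gzip


def sum_field(path, index=52, sep="|"):
    """Return (line_count, total) for the integer field at `index`.

    Hypothetical helper for illustration; `index` is zero-based.
    """
    total = 0
    count = 0
    # Mode "rt" opens the gzip stream in text mode, so each line is a str
    # and str.split works without an explicit decode step.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # Iterating over the file yields one line at a time instead of
        # loading every line into a list the way readlines() does.
        for line in f:
            total += int(line.split(sep)[index])
            count += 1
    return count, total
```

It could be called as count, total = sum_field('SmallData.DAT.gz') with the file name from the question. The with block also closes the file, which the original script leaves open.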

The code you have for summing all the values in the 52nd field is fine. There's no way to avoid the work it does, because every line has to be read and split in order to identify the 52nd field of that line.
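The reading and splitting cannot be skipped, but the per-line overhead can possibly be trimmed a little. The sketch below keeps the lines as bytes (no decode), limits the number of splits with the maxsplit argument, and iterates over the file lazily. It is untested against the asker's data and has not been benchmarked, so any gain is an assumption; decompression may well dominate the running time. The function name sum_field_bytes is hypothetical, and index 52 is again taken from the question's code (zero-based).

```python
import gzip


def sum_field_bytes(path, index=52):
    """Return (line_count, total), working on bytes throughout.

    Hypothetical helper for illustration; `index` is zero-based.
    """
    total = 0
    count = 0
    with gzip.open(path, "rb") as f:
        for line in f:
            # maxsplit = index + 1 stops splitting once the wanted field
            # has been isolated; the rest of the line stays in one piece.
            fields = line.split(b"|", index + 1)
            # int() accepts a bytes object holding an integer literal.
            total += int(fields[index])
            count += 1
    return count, total
```

The separator must be a bytes literal (b"|") here, which is the same type mismatch that caused the TypeError in the question, resolved from the other side.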

Avinash Kancharla · Accepted Answer · 2018-08-01 03:26:25Z

Try decoding the bytes object to a string, i.e.:

line.decode('utf-8')

Updated script:

    #!/usr/bin/python3
    import gzip

    if __name__ == "__main__":
        total=0
        count=0
        f=gzip.open('SmallData.DAT.gz', 'r')
        for line in f.readlines():
            split_line = line.decode("utf-8").split('|')
            total += int(split_line[52])
            count += 1
        print(count, " :: ", total)
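Since the file is described as a csv with | as the separator, the standard csv module is another option. This is a sketch under assumptions rather than a drop-in replacement: it assumes UTF-8 text and that any quoting in the data follows the csv module's default conventions (if the fields contain raw double quotes, passing quoting=csv.QUOTE_NONE may be needed). The function name sum_field_csv is hypothetical.

```python
import csv
import gzip


def sum_field_csv(path, index=52):
    """Return (row_count, total) using csv.reader with a '|' delimiter.

    Hypothetical helper for illustration; `index` is zero-based.
    """
    total = 0
    count = 0
    # newline="" is what the csv module recommends for files it reads.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="|")
        for row in reader:
            total += int(row[index])
            count += 1
    return count, total
```

For input as simple as the sample line in the question, this does the same work as splitting on | by hand; it mainly helps if fields can be quoted or contain the delimiter.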
