I am trying to parse a gzipped CSV file (where the fields are separated by | characters) to test whether reading the file directly in Python is faster than piping it through zcat file.gz | python.
I have the following code:

    #!/usr/bin/python3

    import gzip

    if __name__ == "__main__":
        total=0
        count=0

        f=gzip.open('SmallData.DAT.gz', 'r')
        for line in f.readlines():
            split_line = line.split('|')
            total += int(split_line[52])
            count += 1

        print(count, " :: ", total)

But I get the following error:

    $ ./PyZip.py
    Traceback (most recent call last):
      File "./PyZip.py", line 11, in <module>
        split_line = line.split('|')
    TypeError: a bytes-like object is required, not 'str'

How can I modify this to read the line and split it properly?
I'm interested mainly in just the 52nd field as delimited by |. The lines in my input file are like the following:
field1|field2|field3|...field52|field53
Is there a faster way to sum all the values in the 52nd field than what I have?
Thanks!
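
For reference, here is a minimal sketch of one way the loop could be changed so that the split works. It relies on gzip.open accepting a text mode ('rt'), which yields str lines instead of bytes. The function name sum_field, its parameters, and the UTF-8 encoding are assumptions made for illustration and are not part of the original script. Note that index 52 is zero-based, so it selects the 53rd field; if the 52nd field is the one wanted, the index would be 51.

```python
import gzip


def sum_field(path, index=52, delimiter="|"):
    """Return (line_count, total) for one integer column of a gzipped file.

    Hypothetical helper: the name, defaults, and encoding are assumptions.
    """
    total = 0
    count = 0
    # "rt" opens the decompressed stream in text mode, so each line is a str.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # Iterating over the file object reads one line at a time instead of
        # loading the whole file into a list the way readlines() does.
        for line in f:
            fields = line.rstrip("\n").split(delimiter)
            total += int(fields[index])
            count += 1
    return count, total
```

Calling sum_field('SmallData.DAT.gz') and printing the two returned values would reproduce the output of the original script. Staying in binary mode and splitting on b'|' should also avoid the TypeError, since int() accepts bytes containing ASCII digits. Whether either variant beats the zcat pipeline is something I have not measured.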