First time poster, long-time lurker. Have searched high and low for an answer to this but it's got to that stage...!
I am having some trouble implementing the answer given by John Machin to this past question:
How to efficiently parse fixed width files?
At a very high level, I am using this code to split up fixed-width text files and import them into a PostgreSQL database. I have successfully used it for one text file, but now that I am trying to expand the program to handle other text files with different fixed formats, I keep running into the same error:
    struct.error: unpack_from requires a buffer of at least [x] bytes

Of course, the value of x differs depending on the format string I feed to the function. My problem is that the code works for one and only one format, and not any others, even though the only things I change are the spec variable used to calculate the format string and the variable names in the script that relate to it.
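For context, I can reproduce the same error with a stripped-down snippet (the format string and data here are made up for illustration): struct raises it whenever the buffer is shorter than the total byte count the format string describes. Bytes literals are used so it behaves the same on Python 2 and 3.

```python
import struct

# "6s4x10s" describes 6 + 4 + 10 = 20 bytes in total
s = struct.Struct("6s4x10s")

# a 20-byte line unpacks fine
fields = s.unpack_from(b"ABCDEF    0123456789")

# anything shorter raises the error from the question
try:
    s.unpack_from(b"ABCDEF")
except struct.error as e:
    print(e)   # unpack_from requires a buffer of at least 20 bytes ...
```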
So for example this works fine:
    import struct
    import datetime
    import psycopg2

    # conversion helpers: strip text; only convert non-blank ints/dates
    cnv_text = lambda s: str(s.strip())
    cnv_int = lambda s: int(s) if not s.isspace() else s.strip()
    cnv_date_ymd = lambda s: datetime.datetime.strptime(s, '%Y%m%d') if not s.isspace() else s.strip()  # YYYY-MM-DD

    unpack_len = 0
    unpack_fmt = ""
    splitData = []

    conn = psycopg2.connect("[connection info]")
    cur = conn.cursor()

    # (name, 1-based start column, width, converter)
    Table1specs = [
        ('A',   6,  14, cnv_text),
        ('B',  20, 255, cnv_text),
        ('C', 275,   1, cnv_text),
        ('D', 276,   1, cnv_text),
        ('E', 277,   1, cnv_text),
        ('F', 278,   1, cnv_text),
        ('G', 279,   1, cnv_text),
        ('H', 280,   1, cnv_text),
        ('I', 281,   8, cnv_date_ymd),
        ('J', 289,   8, cnv_date_ymd),
        ('K', 297,   8, cnv_date_ymd),
        ('L', 305,   8, cnv_date_ymd),
        ('M', 313,   8, cnv_date_ymd),
        ('N', 321,   1, cnv_text),
        ('O', 335,   2, cnv_text),
        ('P', 337,   2, cnv_int),
        ('Q', 339,   5, cnv_int),
        ('R', 344, 255, cnv_text),
        ('S', 599,   1, cnv_int),
        ('T', 600,   1, cnv_int),
        ('U', 601,   5, cnv_int),
        ('V', 606,  10, cnv_text)
    ]

    # for each column in the spec variable, build up the format string
    for column in Table1specs:
        start = column[1] - 1
        end = start + column[2]
        if start > unpack_len:
            unpack_fmt += str(start - unpack_len) + "x"   # pad over the gap
        unpack_fmt += str(end - start) + "s"
        unpack_len = end

    field_indices = range(len(Table1specs))
    print unpack_len, unpack_fmt

    # set unpacker
    unpacker = struct.Struct(unpack_fmt).unpack_from

    class Record(object):
        pass

    filename = "Table1Data.txt"
    f = open(filename, 'r')
    for line in f:
        raw_fields = unpacker(line)
        r = Record()
        for x in field_indices:
            setattr(r, Table1specs[x][0], Table1specs[x][3](raw_fields[x]))
        splitData.append(r.__dict__)

All the data is appended to splitData, which I then loop through and work into SQL statements for insertion into the database via psycopg2. As soon as I change the specs to describe another file (and rename the related variables to match), I get the error. It is thrown from the 'raw_fields = unpacker(line)' line.
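In case it helps anyone spot the problem, here is the format-building part of my loop pulled out into a minimal, self-contained sketch. The two-field spec and the `build_unpacker` name are made up for illustration, and I've added a length guard before unpacking; bytes literals so it behaves the same on Python 2 and 3.

```python
import struct

def build_unpacker(specs):
    """Build a struct unpacker from (name, start, width, conv) specs.

    `start` is 1-based, matching the spec tables in the question.
    Returns the Struct's unpack_from plus the minimum line length.
    """
    fmt, covered = "", 0
    for _name, start, width, _conv in specs:
        begin = start - 1                  # convert to 0-based offset
        if begin > covered:                # pad over any gap between fields
            fmt += "%dx" % (begin - covered)
        fmt += "%ds" % width
        covered = begin + width
    return struct.Struct(fmt), covered

# hypothetical two-field spec: A in columns 1-3, B in columns 5-6
specs = [("A", 1, 3, bytes.strip), ("B", 5, 2, bytes.strip)]
unpacker, needed = build_unpacker(specs)

line = b"abc de"
if len(line) < needed:                     # guard against short lines
    raise ValueError("line is %d bytes, need %d" % (len(line), needed))
print(unpacker.unpack_from(line))
```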
I have exhausted all resources and am at a loose end... any thoughts or ideas welcomed.
(Could it be something to do with the text file I am importing from?)
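If it is the file, one check I could run is to scan it for lines shorter than the computed unpack_len. This helper is hypothetical, just a sketch of the check; it strips the trailing line terminator before measuring.

```python
def find_short_lines(path, needed):
    """Return (line_number, length) for each line shorter than `needed` bytes."""
    short = []
    with open(path, "rb") as f:
        for num, raw in enumerate(f, 1):
            line = raw.rstrip(b"\r\n")     # ignore the line terminator
            if len(line) < needed:
                short.append((num, len(line)))
    return short

# e.g. find_short_lines("Table1Data.txt", unpack_len) -> [] if all lines fit
```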
Best regards.