Python: Split file by line match

Question

I have a text file with different sections that I would like to split into separate files. In the example below, the split points would be the "Step" lines.

```
Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot
x(mm),y(mm),z(mm),Bx(T),By(T),Bz(T),Bm(T)
5.505E+01,-1.124E-02,-2.000E+00, 3.443E-04,-1.523E-05, 3.913E-04
5.511E+01,-1.124E-02,-2.000E+00, 3.417E-04,-1.511E-05, 3.912E-04
5.516E+01,-1.124E-02,-2.000E+00, 3.390E-04,-1.499E-05, 3.910E-04
...
Step Number: 2; Plot Name: deg0_R58; Type: Arrow Plot
...
```

The reason is that pandas.read_csv() will not work on the entire file because of the "Step" lines.

I only need the files temporarily for pandas.read_csv(), so I don't actually want to write them to disk. I've tried slicing the file with itertools.islice, but then I can't process the output with pandas.read_csv because it needs a file-like object.

Here is what I've got so far:

```python
buf = []
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' in line:
            buf.append([])         # a "Step" line starts a new section
        else:
            buf[-1].append(line)   # other lines belong to the current section
```

Is there a way to get the buf list of lines into a file-like object?

Edit:

Thanks for the input; StringIO works great! Here's what I made of it, in case anyone is facing a similar problem:

```python
from io import StringIO  # Python 2: import StringIO; StringIO.StringIO()

import pandas as pd

steps_Dict = {}
fsection = None
step_nr = 0
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' in line:
            if fsection is not None:
                step_nr += 1  # steps start with 1
                fsection.seek(0)
                steps_Dict[step_nr] = pd.read_csv(fsection, sep=',', header=0)
            fsection = StringIO()  # start a new section
        elif fsection is not None and line.strip():
            # append to the current section, skipping blank lines
            # (read_csv also skips them by default: skip_blank_lines=True)
            fsection.write(line)

if fsection is not None:  # capture the last section
    fsection.seek(0)
    steps_Dict[step_nr + 1] = pd.read_csv(fsection, sep=',', header=0)

# pd.Panel was removed in pandas 1.0; concatenating the dict instead
# gives one DataFrame with a (step, row) MultiIndex
steps = pd.concat(steps_Dict, names=['step', 'row'])
```
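The same splitting can also be written with `itertools.groupby`, which groups consecutive lines by whether they contain the marker. This is a swapped-in technique, not the code above, offered as a sketch: `read_steps` is a hypothetical name, and it assumes each section's first non-marker line is its CSV header. Unlike the loop, it skips sections that contain no data instead of passing an empty buffer to read_csv.

```python
from io import StringIO
from itertools import groupby

import pandas as pd


def read_steps(lines, marker="Step"):
    """Return {step_number: DataFrame}, numbering sections from 1.

    Assumes every section is introduced by a line containing `marker`
    and that its first non-marker line is the CSV header.
    """
    steps = {}
    step_nr = 0
    # groupby splits the line stream into consecutive runs of
    # marker lines (key True) and section lines (key False)
    for is_marker, group in groupby(lines, key=lambda line: marker in line):
        if is_marker:
            continue
        # drop blank lines and hand the section to read_csv as a file-like object
        text = "".join(line for line in group if line.strip())
        if not text:
            continue  # nothing to parse in this run
        step_nr += 1
        steps[step_nr] = pd.read_csv(StringIO(text), sep=",", header=0)
    return steps
```

Because a file object yields its lines one at a time, `read_steps(f)` can be called directly inside `with open(filepath) as f:`, and `pd.concat(result, names=['step', 'row'])` combines the sections as before.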

You could save it as a temporary file, load it, and then delete it. It's a little much, but it's a straightforward approach. – Celeo, commented Apr 9, 2015 at 16:50
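A minimal sketch of the temporary-file suggestion, assuming `section_lines` is one of the per-section lists built in the question; `read_section_via_tempfile` is a hypothetical helper. The file is created with `delete=False` so it can be reopened by name after closing (Windows does not allow reopening an open NamedTemporaryFile), and it is removed in a `finally` block even if parsing fails.

```python
import os
import tempfile

import pandas as pd


def read_section_via_tempfile(section_lines):
    """Write one section to a temporary file, parse it, then delete the file."""
    # delete=False keeps the file on disk after the with-block closes it,
    # so read_csv can open it by name
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
        tmp.writelines(section_lines)
        path = tmp.name
    try:
        return pd.read_csv(path)
    finally:
        os.remove(path)  # clean up whether or not parsing succeeded
```

Since read_csv accepts file-like objects, an in-memory buffer avoids this disk round trip; the temporary file mainly helps when a tool insists on a real path.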

Gnijuohz · Accepted Answer · 2015-04-09 17:23:43Z

You can use StringIO to hold the text in memory if you don't need to write it to a file.

```python
from io import StringIO  # Python 2: import StringIO; StringIO.StringIO()

output = StringIO()
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' not in line:  # drop the section marker lines
            output.write(line)
```

Then you can pass output to pandas' read_csv function.

As @Julien pointed out in the comment below, you also need to call output.seek(0) before reading it with pandas:

```python
import pandas as pd

output.seek(0)  # rewind to the start of the buffer
pd.read_csv(output)
```

Make sure you rewind your output buffer with output.seek(0) before calling read_csv. Otherwise, pandas starts reading at the end of the buffer and treats it as empty.
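One caveat with writing every non-"Step" line into a single buffer: each section keeps its own header line (x(mm),y(mm),...), so the headers of the second and later sections are parsed as data rows. A minimal sketch that also drops those repeats, assuming every section repeats exactly the same header line; `read_all_steps` is a hypothetical name. The step number is lost in the combined frame, so this fits only when the sections can be treated as one table.

```python
from io import StringIO

import pandas as pd


def read_all_steps(lines, marker="Step"):
    """Concatenate every section into one DataFrame.

    Assumes all sections share an identical header line; later copies
    of it are dropped so they are not parsed as data rows.
    """
    output = StringIO()
    header = None
    for line in lines:
        if marker in line or not line.strip():
            continue  # skip section markers and blank lines
        if header is None:
            header = line  # keep the first header line
        elif line == header:
            continue  # drop the repeated header of later sections
        output.write(line)
    output.seek(0)  # rewind, otherwise read_csv sees an empty buffer
    return pd.read_csv(output)
```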

Julien Spronck · Accepted Answer · 2015-04-09 17:00:19Z

You could use the StringIO module to create a file-like object that can be used by pd.read_csv():

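```python
from io import StringIO  # Python 2: import StringIO; StringIO.StringIO()

import pandas as pd

astr = StringIO()
astr.write('This,is,a,test\n')
astr.write('This,is,another,test\n')
astr.seek(0)  # rewind before handing the buffer to pandas
df = pd.read_csv(astr)
```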
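Applied to the `buf` list from the question, each inner list can be joined into one string and wrapped in a StringIO. A StringIO created from an initial string is positioned at the start of the buffer, so no seek(0) is needed. `sections_to_frames` is a hypothetical name; it assumes each inner list holds a header line followed by data lines, as the question's loop produces.

```python
from io import StringIO

import pandas as pd


def sections_to_frames(buf):
    """Turn a list of per-section line lists into a list of DataFrames."""
    frames = []
    for section in buf:
        text = "".join(section)
        if not text.strip():
            continue  # nothing to parse, e.g. two consecutive Step lines
        # StringIO(initial_value) starts at position 0, so read_csv
        # can read it immediately without a seek(0)
        frames.append(pd.read_csv(StringIO(text)))
    return frames
```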

Daniel · Accepted Answer · 2015-04-09 17:05:03Z

You can use pandas.io.parsers.read_csv (the same function as pandas.read_csv), skip the lines you don't need, and read the file directly into a DataFrame.

```python
import pandas

z = pandas.io.parsers.read_csv("C:/path/a.txt", skiprows=0, header=1, sep=",")
z
```

```
   x(mm)    y(mm)  z(mm)     Bx(T)     By(T)     Bz(T)  Bm(T)
0  55.05 -0.01124     -2  0.000344 -0.000015  0.000391    NaN
1  55.11 -0.01124     -2  0.000342 -0.000015  0.000391    NaN
2  55.16 -0.01124     -2  0.000339 -0.000015  0.000391    NaN
```
