13

I want to do some basic filtering on a file. Read it, do processing, write it back.

I'm not looking for "golfing", but want the simplest and most elegant method to achieve this. I came up with:

from __future__ import with_statement filename = "..." # or sys.argv... with open(filename) as f: new_txt = # ...some translation of f.read() open(filename, 'w').write(new_txt) 

The with statement makes things shorter since I don't have to explicitly open and close the file.

Any other ideas ?

6 Answers 6

26

Actually an easier way using fileinput is to use the inplace parameter:

import fileinput for line in fileinput.input (filenameToProcess, inplace=1): process (line) 

If you use the inplace parameter it will redirect stdout to your file, so that if you do a print it will write back to your file.

This example adds line numbers to your file:

import fileinput for line in fileinput.input ("b.txt",inplace=1): print "%d: %s" % (fileinput.lineno(),line), 
Sign up to request clarification or add additional context in comments.

3 Comments

Very nice, thanks for pointing this option out. You can also use the filelineno() function from fileinput to automatically have the line number, without counting it yourself.
Oh, and you forgot the comma after the print - the code adds extra newlines :-)
Thanks for catching that -- I have changed the example.
4

I would go for elegance a different way: implement your file-reading and filtering operations as generators, You'll write more lines of code, but it will be more flexible, maintainable, and performant code.

See David M. Beazley's Generator Tricks for Systems Programmers, which is a really important thing for anyone who's writing this kind of code to read.

2 Comments

Excellent link -- thank you! I am a little concerned with the increased difficulty in debugging pipelines, but the power is undeniable.
Test-driven development is your friend.
3

This seems to work:

with open(filename, "r+") as f: new_txt = process(f.read()) f.truncate(0) f.write(new_txt) 

1 Comment

Works here only when calling f.seek(0) after f.truncate(0), otherwise the new file starts with 11 zero bytes (Python 2.7.3 on Linux).
2

If you're looking for the python equivalent of "perl -pi", here's a pretty good one:

 import fileinput for line in fileinput.input(): # process line 

See http://www.python.org/doc/2.5.2/lib/module-fileinput.html for more.

Done this way, you would use your python script in a pipe to create the new file:

 $ myscript.py infile.txt > outfile.txt 

1 Comment

It doesn't really help me though, since I want to write back to the same file. And redirection won't work this way for the same file
1

To do it in a way which won't eat your data if you crash in the middle:

from twisted.python.filepath import FilePath p = FilePath(filename) p.setContent(process(p.getContent())) 

Comments

0

My ugly (but short as stated in the question) solution with generator expressions;

# Some setup first file('test.txt', 'w').write('\n'.join('%05d' % i for i in range(100))) # This is the filter function def f(i): return i % 3 # This is the main part file('test2.txt', 'w').write('\n'.join(str(f(int(l))) for l in file('test.txt', 'r').readlines())) # And a wrapper for sanity def filter_file(infile, outfile, filter_function) outfile.write('\n'.join(filter_function(l) for l in infile.readlines())) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.