I'm attempting to remove all lines where my regex matches (the regex simply looks for any line that has yahoo in it). Each match is on its own line, so there's no need for the multiline option.

This is what I have so far...

    import re

    inputfile = open('C:\\temp\\Scripts\\remove.txt','w',encoding="utf8")
    inputfile.write(re.sub("\[(.*?)yahoo(.*?)\n","",inputfile))
    inputfile.close()

I'm receiving the following error:

    Traceback (most recent call last):
      line 170, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    TypeError: expected string or buffer

  • So what's the problem? Commented Jun 20, 2013 at 18:43
  • you are not reading the file. You need something like inputfile.readlines() Commented Jun 20, 2013 at 18:44
  • You're trying to close 2 files you never opened, and naming a file opened for writing inputfile is confusing at best. Commented Jun 20, 2013 at 18:44
  • ... and re.sub is about replacing the matching content of a string, not testing whether a string matches. Commented Jun 20, 2013 at 18:45
  • I'm trying to replace the matched content with nothing, hence "". Commented Jun 20, 2013 at 18:47
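
To illustrate the comments above: the TypeError comes from handing re.sub the file object itself, while it only accepts a string (or bytes). A minimal sketch, assuming io.StringIO as a stand-in for the real file and made-up sample text:

```python
import io
import re

# io.StringIO stands in for an open text file; the sample lines are made up.
fake_file = io.StringIO("keep me\n[mail] user@yahoo.com\nkeep me too\n")

pattern = r"\[(.*?)yahoo(.*?)\n"  # the pattern from the question, written as a raw string

try:
    re.sub(pattern, "", fake_file)  # a file object, not a string
except TypeError as exc:
    error = exc  # same kind of error as in the traceback

# Reading the contents first gives re.sub the string it expects.
cleaned = re.sub(pattern, "", fake_file.read())
```

Note that the real file in the question is opened with 'w', which also truncates it before anything can be read.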

3 Answers

Use the fileinput module if you want to modify the original file:

    import re
    import fileinput

    for line in fileinput.input(r'C:\temp\Scripts\remove.txt', inplace = True):
        if not re.search(r'\byahoo\b', line):
            print(line, end="")

7 Comments

It's adding new lines between the text that still exists. Any tips on how to avoid this?
I've tried print line, and print (line,) and print (line), none seem to work.
@user2506096 use print(line, end = "") on py3.x
@JonClements thanks for your inputs, I was not able to reply due to poor internet connection and looks like you handled everything. :)
note: you can't use inplace=1 and fileinput.hook_encoded() simultaneously so fileinput-based solution won't work if you need to decode the file content using encoding other than the default locale.getpreferredencoding(False).
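
The blank lines mentioned in these comments appear because each line read from a file already ends in a newline and print adds another one. A small sketch of the difference, using made-up lines and capturing stdout roughly the way inplace mode redirects it (the render helper is hypothetical):

```python
import contextlib
import io

# Made-up lines, each ending in "\n" as iteration over a file yields them.
lines = ["first\n", "second\n"]

def render(**print_kwargs):
    """Capture what print() writes for each line (hypothetical helper)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        for line in lines:
            print(line, **print_kwargs)
    return buffer.getvalue()

doubled = render()      # print appends its own "\n" after the line's "\n"
clean = render(end="")  # end="" suppresses the extra newline
```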

Here's a Python 3 variant of @Ashwini Chaudhary's answer, to remove all lines that contain a regex pattern from a given file:

    #!/usr/bin/env python3
    """Usage: remove-pattern <pattern> <file>"""
    import fileinput
    import re
    import sys

    def main():
        pattern, filename = sys.argv[1:] # get pattern, filename from command-line
        matched = re.compile(pattern).search
        with fileinput.FileInput(filename, inplace=1, backup='.bak') as file:
            for line in file:
                if not matched(line): # save lines that do not match
                    print(line, end='') # this goes to filename due to inplace=1

    main()

It assumes locale.getpreferredencoding(False) == input_file_encoding; otherwise it might break on non-ASCII characters.

To make it work regardless of what the current locale is, or for input files that have a different encoding:

    #!/usr/bin/env python3
    import os
    import re
    import sys
    from tempfile import NamedTemporaryFile

    def main():
        encoding = 'utf-8'
        pattern, filename = sys.argv[1:]
        matched = re.compile(pattern).search
        with open(filename, encoding=encoding) as input_file:
            with NamedTemporaryFile(mode='w', encoding=encoding,
                                    dir=os.path.dirname(filename),
                                    delete=False) as outfile:
                for line in input_file:
                    if not matched(line):
                        print(line, end='', file=outfile)
        os.replace(outfile.name, input_file.name)

    main()
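
The same temp-file-and-replace idea can be wrapped in a function so it can be called from other code. This is only a sketch: remove_matching_lines is a hypothetical name, the UTF-8 default is an assumption, and the demo runs on a throwaway file with made-up content.

```python
import os
import re
from tempfile import NamedTemporaryFile, TemporaryDirectory

def remove_matching_lines(pattern, filename, encoding="utf-8"):
    """Rewrite filename without the lines that match pattern (hypothetical helper)."""
    matched = re.compile(pattern).search
    directory = os.path.dirname(os.path.abspath(filename))
    with open(filename, encoding=encoding) as input_file, \
         NamedTemporaryFile(mode="w", encoding=encoding,
                            dir=directory, delete=False) as outfile:
        for line in input_file:
            if not matched(line):
                outfile.write(line)
    # Both files are closed at this point, before the original is replaced.
    os.replace(outfile.name, filename)

# Demo on a throwaway file with made-up, partly non-ASCII content.
with TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "remove.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("keep\nuser@yahoo.com\nkéép too\n")
    remove_matching_lines(r"yahoo", path)
    with open(path, encoding="utf-8") as f:
        result = f.read()
```

Writing the temporary file in the same directory as the target keeps both on the same filesystem, which is what lets os.replace swap them in a single step.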

You have to read the file. Try something like:

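    import re

    path = 'C:\\temp\\Scripts\\remove.txt'

    # Read the contents first; opening with 'w' would truncate the file before it is read.
    with open(path, encoding="utf8") as inputfile:
        text = inputfile.read()

    # Write the text back with every match of the pattern removed.
    with open(path, 'w', encoding="utf8") as outputfile:
        outputfile.write(re.sub(r"\[(.*?)yahoo(.*?)\n", "", text))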
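
If the goal is to drop every whole line containing yahoo, rather than only the span that starts at a "[", one option is to anchor the pattern to line starts with re.MULTILINE. A sketch on made-up text; the pattern here is an illustration, not the one from the question:

```python
import re

text = "alpha\nfoo@yahoo.com\nbeta\nyahoo\n"  # made-up sample

# With re.MULTILINE, ^ matches at the start of every line. Since "." does not
# match "\n", each match covers one line plus its trailing newline, if present.
line_with_yahoo = re.compile(r"^.*yahoo.*\n?", re.MULTILINE)

cleaned = line_with_yahoo.sub("", text)
```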
