Using Python to Remove All Lines Matching Regex

Question

I'm attempting to remove all lines where my regex matches (the regex simply looks for any line that has yahoo in it). Each match is on its own line, so there's no need for the multiline option.

This is what I have so far...

    import re
    inputfile = open('C:\\temp\\Scripts\\remove.txt','w',encoding="utf8")
    inputfile.write(re.sub("\[(.*?)yahoo(.*?)\n","",inputfile))
    inputfile.close()

I'm receiving the following error:

    Traceback (most recent call last):
      line 170, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    TypeError: expected string or buffer

You are not reading the file. You need something like inputfile.readlines().
– karthikr, Commented Jun 20, 2013 at 18:44
You're trying to close 2 files you never opened, and naming a file opened for writing inputfile is confusing at best.
– Wooble, Commented Jun 20, 2013 at 18:44
... and re.sub is about replacing the matching content of a string, not testing whether a string matches.
– Sylvain Leroux, Commented Jun 20, 2013 at 18:45
I'm trying to replace the matched content with nothing, hence "".
– MrMr, Commented Jun 20, 2013 at 18:47

Anonyme2000 · Accepted Answer · 2021-09-22 17:12:07Z

17

Use the fileinput module if you want to modify the original file in place:

    import re
    import fileinput

    for line in fileinput.input(r'C:\temp\Scripts\remove.txt', inplace = True):
        if not re.search(r'\byahoo\b', line):
            print(line, end="")
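A note on the pattern: \b anchors the match at word boundaries, so \byahoo\b matches yahoo as a whole word but not when it is embedded in a longer word. The sketch below contrasts it with a plain substring pattern. The sample lines are made up for illustration and are not data from the question.

```python
import re

# Hypothetical sample lines, for illustration only.
lines = [
    "mail from user@yahoo.com\n",
    "visit myyahoosite today\n",
    "nothing to see here\n",
]

whole_word = re.compile(r"\byahoo\b")  # pattern used in the answer
substring = re.compile(r"yahoo")       # plain substring match

# Keep only the lines that each pattern does NOT match.
kept_whole_word = [ln for ln in lines if not whole_word.search(ln)]
kept_substring = [ln for ln in lines if not substring.search(ln)]

print(kept_whole_word)
print(kept_substring)
```

If every line containing the letters yahoo should go, including ones like the second sample, the plain substring pattern is the one to use.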

answered Jun 20, 2013 at 18:45 by Ashwini Chaudhary; edited Sep 22, 2021 at 17:12 by Anonyme2000


7 Comments

MrMr Over a year ago

It's adding new lines between the text that still exists. Any tips on how to avoid this?

MrMr Over a year ago

I've tried print line, and print (line,) and print (line), none seem to work.

Ashwini Chaudhary Over a year ago

@user2506096 use print(line, end = "") on py3.x

Ashwini Chaudhary Over a year ago

@JonClements thanks for your inputs, I was not able to reply due to poor internet connection and looks like you handled everything. :)

jfs Over a year ago

Note: you can't use inplace=1 and fileinput.hook_encoded() simultaneously, so a fileinput-based solution won't work if you need to decode the file content using an encoding other than the default locale.getpreferredencoding(False).
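The blank lines reported in the comments above appear because each line read from a file already ends with a newline, and print then appends one of its own. A small sketch that reproduces the effect on in-memory buffers instead of a real file (the sample text is made up):

```python
import io

line = "keep me\n"  # lines read from a file keep their trailing newline

doubled = io.StringIO()
print(line, file=doubled)        # print appends a second "\n"

fixed = io.StringIO()
print(line, end="", file=fixed)  # end="" suppresses the extra newline

print(repr(doubled.getvalue()))
print(repr(fixed.getvalue()))
```

Calling sys.stdout.write(line) instead of print avoids the extra newline as well, since write never adds one.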


jfs · Accepted Answer · 2018-10-28 07:05:13Z

Here's a Python 3 variant of @Ashwini Chaudhary's answer, to remove all lines that match a regex pattern from a given file:

    #!/usr/bin/env python3
    """Usage: remove-pattern <pattern> <file>"""
    import fileinput
    import re
    import sys

    def main():
        pattern, filename = sys.argv[1:] # get pattern, filename from command-line
        matched = re.compile(pattern).search
        with fileinput.FileInput(filename, inplace=1, backup='.bak') as file:
            for line in file:
                if not matched(line): # save lines that do not match
                    print(line, end='') # this goes to filename due to inplace=1

    main()

It assumes locale.getpreferredencoding(False) == input_file_encoding; otherwise it might break on non-ASCII characters.

To make it work regardless of the current locale, or for input files that have a different encoding:

    #!/usr/bin/env python3
    import os
    import re
    import sys
    from tempfile import NamedTemporaryFile

    def main():
        encoding = 'utf-8'
        pattern, filename = sys.argv[1:]
        matched = re.compile(pattern).search
        with open(filename, encoding=encoding) as input_file:
            with NamedTemporaryFile(mode='w', encoding=encoding,
                                    dir=os.path.dirname(filename),
                                    delete=False) as outfile:
                for line in input_file:
                    if not matched(line):
                        print(line, end='', file=outfile)
        os.replace(outfile.name, input_file.name)

    main()

Victor Castillo Torres · Accepted Answer · 2013-06-20 18:45:02Z

You have to read the file first. Try something like:

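    import re

    path = 'C:\\temp\\Scripts\\remove.txt'

    # Read the content first; opening with 'w' right away would truncate the file.
    with open(path, encoding="utf8") as inputfile:
        text = inputfile.read()

    # Apply the pattern from the question to the text, then write the result back.
    with open(path, 'w', encoding="utf8") as outputfile:
        outputfile.write(re.sub(r"\[(.*?)yahoo(.*?)\n", "", text))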
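If the whole file fits in memory, complete lines can also be removed with a single re.sub call, which is close to what the question attempted. The following is a sketch under that assumption; the helper name drop_lines and the sample text are hypothetical.

```python
import re


def drop_lines(text, word="yahoo"):
    """Return text without any line that contains word."""
    # Under re.MULTILINE, ^ and $ match at line boundaries, and "." never
    # crosses a newline, so each match covers exactly one line. The trailing
    # \n? also consumes the line break, if there is one.
    pattern = re.compile(rf"^.*{re.escape(word)}.*$\n?", re.MULTILINE)
    return pattern.sub("", text)


sample = "first line\nsearch on yahoo today\nlast line\n"
print(drop_lines(sample))
```

Unlike the pattern in the question, this one does not require a literal [ before yahoo, so it removes the entire line wherever the word appears in it.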
