Parsing large text file using regex

Question

I have a large text file (60 MB) that looks like the following:

:VPN ()
:add_adtr_rule (true)
:additional_products ()
:addr_type_indication (IPv4)
:certificates ()
:color (black)
:comments ()
:connectra (false)
:connectra_settings ()
:cp_products_installed (false)
:data_source (not-installed)
:data_source_settings ()
:edges ()
:enforce_gtp_rate_limit (false)
:firewall (not-installed)
:floodgate (not-installed)
:gtp_rate_limit (2048)
:interfaces ()
:ipaddr (10.19.45.18)

For every instance in which :add_adtr_rule is true, there are thousands of ':add_adtr_rule (false)' entries. Whenever the rule is true, I need the value of ipaddr, so in this instance I would need 10.19.45.18. How can I use a regex to extract this information?

I have tried the following code, which returns an empty list:

import re

with open("objects_5_0_C-Mod.txt", "r") as f:
    text = f.read()

ip=re.findall(r':add_adtr_rule [\(]true[\)]\s+.*\s+.*\s+.*\s+.*\s+:ipaddr\s+[\(](.*)[\)]', text)
print(ip)

Assuming that the file consists of repeated blocks like the above, and given that I am not a regex expert, I would have started by writing a generator yielding one block at a time. This would be generically useful for querying the file. I would then have tested each block for 'true' and extracted or ignored it accordingly. But siam's regex looks good for a one-off job.
– Terry Jan Reedy, Commented Mar 6, 2017 at 20:20
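The streaming idea in this comment could be sketched roughly as follows. This is only an illustration, not code from the thread: `iter_true_ips` and the two helper patterns are hypothetical names, and the sketch assumes that each attribute sits on its own line and that `:add_adtr_rule` appears before `:ipaddr` within a block. Instead of yielding whole blocks, it tracks the most recent `:add_adtr_rule` value, which avoids having to know how blocks are delimited.

```python
import re
from typing import Iterable, Iterator

# Hypothetical helper patterns (assumption: one attribute per line).
RULE_RE = re.compile(r"^\s*:add_adtr_rule \((true|false)\)")
IPADDR_RE = re.compile(r"^\s*:ipaddr \(([^)]*)\)")


def iter_true_ips(lines: Iterable[str]) -> Iterator[str]:
    """Yield the :ipaddr value that follows each ':add_adtr_rule (true)'."""
    rule_is_true = False
    for line in lines:
        rule = RULE_RE.match(line)
        if rule:
            # Remember the state of the most recent :add_adtr_rule entry.
            rule_is_true = rule.group(1) == "true"
            continue
        if rule_is_true:
            ip = IPADDR_RE.match(line)
            if ip:
                yield ip.group(1)
                # Reset so a later :ipaddr is not attributed to this rule.
                rule_is_true = False


# Made-up sample lines; the 10.0.0.1 entry is invented for illustration.
sample = [
    ":add_adtr_rule (false)",
    ":ipaddr (10.0.0.1)",
    ":add_adtr_rule (true)",
    ":color (black)",
    ":ipaddr (10.19.45.18)",
]
print(list(iter_true_ips(sample)))  # prints ['10.19.45.18']
```

Because a file object iterates line by line, passing the open file directly (for example `iter_true_ips(f)` inside a `with open(...) as f:` block) would scan the 60 MB file without reading it into memory at once.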

m87 · Accepted Answer · 2017-03-07 08:02:40Z

The following regex should do it:

(?s)(?:add_adtr_rule\s\(true\)).*?:ipaddr\s\((.*?)\)

See the regex demo / explanation.

Python (demo):

import re

s = """:VPN ()
:add_adtr_rule (true)
:additional_products ()
:addr_type_indication (IPv4)
:certificates ()
:color (black)
:comments ()
:connectra (false)
:connectra_settings ()
:cp_products_installed (false)
:data_source (not-installed)
:data_source_settings ()
:edges ()
:enforce_gtp_rate_limit (false)
:firewall (not-installed)
:floodgate (not-installed)
:gtp_rate_limit (2048)
:interfaces ()
:ipaddr (10.19.45.18)"""

r = r"(?s)(?:add_adtr_rule\s\(true\)).*?:ipaddr\s\((.*?)\)"
ip = re.findall(r, s)
print(ip)
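To see how the pattern behaves when 'false' entries are mixed in, here is a small sketch on a made-up two-block sample. The 10.0.0.1 block and the one-attribute-per-line layout are assumptions for illustration, not data from the question.

```python
import re

# Made-up two-block sample; the first block (10.0.0.1) is hypothetical.
text = """:add_adtr_rule (false)
:color (black)
:ipaddr (10.0.0.1)
:add_adtr_rule (true)
:color (black)
:ipaddr (10.19.45.18)
"""

# Same pattern as in the answer: (?s) lets '.' match newlines, and the
# lazy '.*?' stops at the first ':ipaddr (...)' after a true rule.
pattern = re.compile(r"(?s)(?:add_adtr_rule\s\(true\)).*?:ipaddr\s\((.*?)\)")

ips = pattern.findall(text)
print(ips)  # prints ['10.19.45.18']
```

One thing to keep in mind: because `.*?` can span newlines here, a true block that happened to have no `:ipaddr` entry would pick up the next `:ipaddr` in the file, even one belonging to a false block.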

Nice. By working through your regex with the manual, docs.python.org/3/library/re.html#regular-expression-syntax, I learned some new features.
@Clyde Glad it worked! BTW, an accept would be much appreciated tho :-)
@Siam - What do you mean by accept? Sorry I'm fairly new to this site.

Jan · Accepted Answer · 2017-03-06 20:05:39Z

You might want to add anchors to speed up things. Consider the following example with MULTILINE and VERBOSE turned on:

^:add_adtr_rule\ \(true\)        # start of line, followed by :add_ ...
[\s\S]+?                         # everything else afterwards, lazily
^:ipaddr\ \((?P<ip>[^)]+)\)      # start of line, ip and group "ip" between ()

See a demo on regex101.com.

With your given code this comes down to:

import re

rx = re.compile(r'''
    ^:add_adtr_rule\ \(true\)
    [\s\S]+?
    ^:ipaddr\ \((?P<ip>[^)]+)\)
    ''', re.MULTILINE | re.VERBOSE)

with open("objects_5_0_C-Mod.txt", "r") as f:
    text = f.read()

ips = [match.group('ip') for match in rx.finditer(text)]
print(ips)
