
I have 3 files in /some/dir:

```
$ ls /some/dir
fiot_csv2apex_nomuratest.xml  fiot_csv2apex_nomurauat.xml  fiot_csv2apex_nomura.xml
```

I want my script to select only the file whose name does NOT contain the substring "uat" or "test".

To start off simply, I'm only trying to exclude the "uat" substring, but my attempts fail.

Here is the entire script that does NOT try to exclude any of those 3 files:

```python
#!/usr/bin/env python
import xml.etree.ElementTree as ET, sys, os, re, fnmatch

param = sys.argv[1]
client = param.split('_')[0]
market = param.split('_')[1]
suffix = param.split('_')[2]

toapex_pattern = market + '*2apex*' + client + '*' + '.xml'

files_dir = '/some/dir'
config_files = os.listdir(files_dir)
for f in config_files:
    if fnmatch.fnmatch(f, toapex_pattern):
        print(f)
```

The above script outputs all 3 files in /some/dir, as expected. The script is run like this:

```
python /test/scripts/regex.py nomura_fiot_b
```

I attempted to exclude "uat" by modifying toapex_pattern variable like this:

```python
toapex_pattern = market + '*2apex*' + client + '(?!uat)' + '*' + '.xml'
```

However, after that the script did not produce any output.

I also tried this:

```python
toapex_pattern = re.compile(market + '*2apex*' + client + '(?!uat)' + '*' + '.xml')
```

But this resulted in a type error:

```
TypeError: object of type '_sre.SRE_Pattern' has no len()
```

And if I try this:

```python
toapex_pattern = market + '*2apex*' + client + '[^uat]' + '*' + '.xml'
```

the output is:

```
fiot_csv2apex_nomuratest.xml
fiot_csv2apex_nomurauat.xml
```

The desired output is:

```
fiot_csv2apex_nomura.xml
```

How should I modify the toapex_pattern variable to achieve the desired output?


1 Answer


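An fnmatch pattern is not a regular expression. It only understands *, ?, [seq] and [!seq], so things like (?!...) won't work: in your pattern the (, !, u, a, t and ) are matched literally and ? matches any single character, which is why nothing was printed. Likewise, fnmatch.fnmatch() expects the pattern as a string, so passing it a compiled re object fails with the TypeError you saw.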
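To see what fnmatch actually does with the two string patterns from the question, here is a small check using fnmatch.filter() on the three file names, with market "fiot" and client "nomura" substituted by hand. It also shows why [^uat] misbehaved: fnmatch negates a set with !, not ^, so [^uat] is just a set containing ^, u, a and t.

```python
import fnmatch

names = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]

# The asker's attempted patterns, with market="fiot" and client="nomura"
lookahead_attempt = "fiot*2apex*nomura(?!uat)*.xml"
negation_attempt = "fiot*2apex*nomura[^uat]*.xml"

# "(" and ")" are literal characters here, so no file name can match
print(fnmatch.filter(names, lookahead_attempt))  # []

# "[^uat]" matches one of "^", "u", "a", "t", so the "test" and "uat" files match
print(fnmatch.filter(names, negation_attempt))
```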

Generally, patterns that try to exclude something will not work well with fnmatch. You can do something like this

```
[!u][!a][!t]
```

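to match three characters where the first is not "u", the second is not "a" and the third is not "t"... but that still implicitly requires at least three characters at that position (so fiot_csv2apex_nomura.xml itself would no longer match), and it rejects more than just "uat", for example anything whose next character is "u".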
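A quick check of that [!u][!a][!t] idea against the question's file names, again with "fiot" and "nomura" filled in by hand. It does drop the "uat" file, but it keeps the "test" file and loses the one that was actually wanted:

```python
import fnmatch

names = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]

# Exactly three non-u / non-a / non-t characters must follow "nomura"
pattern = "fiot*2apex*nomura[!u][!a][!t]*.xml"

# "nomura.xml" has too few characters left after the three-character check,
# so only the "test" file survives
print(fnmatch.filter(names, pattern))  # ['fiot_csv2apex_nomuratest.xml']
```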

Spare yourself the hassle, use fnmatch to get into the general ballpark, and then use a second step to exclude things you don't want.

```python
files_dir = '/some/dir'
config_files = os.listdir(files_dir)
for file_name in config_files:
    if fnmatch.fnmatch(file_name, toapex_pattern) and "uat" not in file_name:
        print(file_name)
```
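Extending the same two-step idea to both "uat" and "test" might look like the sketch below. The function name select_config_files and the excluded parameter are hypothetical; the file names are hard-coded so the example runs anywhere, whereas in the real script they would come from os.listdir('/some/dir').

```python
import fnmatch

# File names from the question (stand-in for os.listdir('/some/dir'))
NAMES = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]


def select_config_files(names, market, client, excluded=("uat", "test")):
    """Step 1: glob-match with fnmatch. Step 2: drop names containing any excluded substring."""
    pattern = market + "*2apex*" + client + "*.xml"
    return [
        name
        for name in names
        if fnmatch.fnmatch(name, pattern)
        and not any(word in name for word in excluded)
    ]


print(select_config_files(NAMES, "fiot", "nomura"))  # ['fiot_csv2apex_nomura.xml']
```

Note that the substring check looks at the whole file name, so a market or client name that itself contained "test" would be excluded too.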

Alternatively, use regex from the start.

```python
import os
import re

files_dir = '/some/dir'
config_files = os.listdir(files_dir)
# ...
toapex_pattern = re.escape(market) + '.*2apex.*' + re.escape(client) + r'(?!uat).*\.xml$'
for file_name in config_files:
    if re.match(toapex_pattern, file_name):
        print(file_name)
```
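To exclude "test" as well, the negative lookahead can take an alternation, (?!uat|test). A minimal sketch with a hypothetical helper select_with_regex, again using hard-coded names instead of os.listdir(). Keep in mind the lookahead only inspects the text immediately after the client name, so a name such as fiot_csv2apex_nomura_uat.xml would still be accepted.

```python
import re

# File names from the question (stand-in for os.listdir('/some/dir'))
NAMES = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]


def select_with_regex(names, market, client):
    """Keep MARKET...2apex...CLIENT...xml names unless "uat" or "test" directly follows CLIENT."""
    pattern = re.compile(
        re.escape(market) + r".*2apex.*" + re.escape(client) + r"(?!uat|test).*\.xml"
    )
    # fullmatch() requires the pattern to cover the whole file name
    return [name for name in names if pattern.fullmatch(name)]


print(select_with_regex(NAMES, "fiot", "nomura"))  # ['fiot_csv2apex_nomura.xml']
```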

Just throwing it in, you could call the script as python /test/scripts/regex.py nomura fiot b and use sys.argv[1], sys.argv[2] and sys.argv[3] directly, without having to split anything yourself first.
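A minimal sketch of reading the three values straight from the command line; parse_args is a hypothetical helper, and tuple unpacking is used so a wrong number of arguments produces a usage message instead of an IndexError:

```python
def parse_args(argv):
    """Unpack CLIENT MARKET SUFFIX from an argv-style list, e.g. for
    python /test/scripts/regex.py nomura fiot b
    """
    try:
        _script, client, market, suffix = argv
    except ValueError:
        raise SystemExit("usage: regex.py CLIENT MARKET SUFFIX") from None
    return client, market, suffix


# In the script itself (after `import sys`):
#     client, market, suffix = parse_args(sys.argv)
```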
