
I have a directory with files like img-0001.jpg, img-0005.jpg, img-0006.jpg, ..., img-xxxx.jpg. The numbering has gaps. I need a list of all files from number 0238 onward, that is, starting with img-0238.jpg. The next existing filename after that is img-0240.jpg.

Right now I use glob to get all filenames.

list_images = glob.glob(path_images + "*.jpg") 

Thanks in advance

Edit:

-> The last filename is img-0315.jpg

  • If that's "literally" the file name you want, you don't need a glob. If you want *0100*.jpg or perhaps *01[0-9][0-9].jpg then ... those are the wildcards. Commented Feb 17, 2020 at 13:29
  • I need all from 0238 on Commented Feb 17, 2020 at 13:30
  • That seems to directly conflict with what's in your question. Probably edit it to show in more detail what files exactly you have, and which of those you want. Commented Feb 17, 2020 at 13:32
  • edited and put some additional information Commented Feb 17, 2020 at 13:39

3 Answers


Glob doesn't support regex filtering, but you can filter the list right after you receive all matching files. Here is how it would look using re:

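import glob
import re

# The alternatives are grouped so that \.jpg$ applies to every branch,
# not only to the last one.
pattern = re.compile(r'(?:[1-9]\d{3}|0[3-9]\d{2}|02[4-9]\d|023[89])\.jpg$')
list_images = [f for f in glob.glob(path_images + "*.jpg")
               if pattern.search(f)]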

The regular expression verifies that the file name ends with a 4-digit number greater than or equal to 0238.

You can play around with regular expression using https://regex101.com/

Basically, we check whether the number:

  • starts with a digit from 1 to 9, followed by any 3 digits
  • or starts with 0 and a digit from 3 to 9, followed by any 2 digits
  • or starts with 02 and a digit from 4 to 9, followed by any 1 digit
  • or starts with 023, followed by either 8 or 9.
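
The branches listed above can be checked against a few sample names. This is only a minimal sketch: it assumes the alternatives are wrapped in a group so that the \.jpg$ suffix applies to each branch, and the sample file names are made up for illustration.

```python
import re

# Grouping is an assumption here: it makes \.jpg$ apply to every alternative.
PATTERN = re.compile(r'(?:[1-9]\d{3}|0[3-9]\d{2}|02[4-9]\d|023[89])\.jpg$')

# Hypothetical file names, chosen to sit on both sides of the 0238 boundary.
samples = ["img-0001.jpg", "img-0237.jpg", "img-0238.jpg",
           "img-0240.jpg", "img-0315.jpg", "img-1000.jpg"]

# Keep only the names whose trailing 4-digit number is 0238 or higher.
matched = [name for name in samples if PATTERN.search(name)]
print(matched)
```

Note that this pattern has no upper bound, so it also accepts numbers above 0315.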

But it would probably be easier to do a simple string comparison:

list_images = [f for f in glob.glob(path_images + "*.jpg")
               if "0237" < f[-8:-4] < "0316"]
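
The string comparison works because fixed-width, zero-padded digit strings sort in the same order as the numbers they represent. A small sketch of that idea, using made-up paths and assuming every name ends in exactly four digits followed by .jpg:

```python
# Hypothetical paths; the slice assumes each one ends in exactly NNNN.jpg.
names = ["dir/img-0099.jpg", "dir/img-0238.jpg",
         "dir/img-0240.jpg", "dir/img-0316.jpg"]

# n[-8:-4] picks out the four digits just before the ".jpg" extension.
selected = [n for n in names if "0238" <= n[-8:-4] <= "0315"]
print(selected)

# Zero padding is what keeps string order and numeric order in agreement.
padded_ok = "0099" < "0238"     # True, same as 99 < 238
unpadded_ok = "99" < "238"      # False, because "9" sorts after "2"
print(padded_ok, unpadded_ok)
```

If the names were not zero-padded to the same width, converting the slice with int() would be the safer choice.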

2 Comments

tried the second solution out, the list_images list is empty
What's your path_images value? Make sure it ends with a directory separator, and make sure the path passed to glob.glob(...) is valid and returns a list of files. I just checked, and it works perfectly if the path is valid.

You can run several wildcard patterns to match all files whose number matches 023[89], 02[4-9][0-9], 030[0-9], or 031[0-5]:

import glob
import os

list_images = []
# Each pattern covers one slice of the range 0238..0315.
for pattern in ('023[89]', '02[4-9][0-9]', '030[0-9]', '031[0-5]'):
    list_images.extend(glob.glob(
        os.path.join(path_images, '*{0}.jpg'.format(pattern))))

or you can just filter out the ones you don't want.

list_images = [x for x in glob.glob(os.path.join(path_images, "*.jpg")) if 238 <= int(x[-8:-4]) <= 315] 
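
Since the question asks for everything from 0238 on, the filter can also be written with only a lower bound and without relying on slice positions. The following is a rough sketch under the assumption that files are named img-<digits>.jpg; the helper name images_from and the temporary directory are hypothetical and exist only to make the example self-contained.

```python
import glob
import os
import re
import tempfile

# Assumed naming scheme from the question: img-<digits>.jpg
NAME_RE = re.compile(r'img-(\d+)\.jpg$')

def images_from(directory, start):
    """Return sorted paths of img-NNNN.jpg files whose number is >= start."""
    result = []
    for path in glob.glob(os.path.join(directory, "img-*.jpg")):
        match = NAME_RE.search(os.path.basename(path))
        if match and int(match.group(1)) >= start:
            result.append(path)
    return sorted(result)

# Demo on a throwaway directory holding a few empty files.
with tempfile.TemporaryDirectory() as demo_dir:
    for number in (1, 237, 238, 240, 315):
        open(os.path.join(demo_dir, "img-{:04d}.jpg".format(number)), "w").close()
    found = [os.path.basename(p) for p in images_from(demo_dir, 238)]

print(found)
```

Using os.path.join here also avoids the problem of a missing trailing separator in the directory path.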

1 Comment

tried both out, list_images stays empty in both cases

For something like this, you could try the wcmatch library. It's a library that aims to enhance file globbing and wildcard matching.

In this example, we enable brace expansion and demonstrate the pattern by filtering a list of files:

from wcmatch import glob

# Generate list of files from img-0000.jpg to img-0315.jpg
files = []
for x in range(316):
    files.append('path/img-{:04d}.jpg'.format(x))

print(glob.globfilter(files, 'path/img-{0238..0315}.jpg', flags=glob.BRACE))

And we get the following output:

['path/img-0238.jpg', 'path/img-0239.jpg', 'path/img-0240.jpg', 'path/img-0241.jpg', 'path/img-0242.jpg', 'path/img-0243.jpg', 'path/img-0244.jpg', 'path/img-0245.jpg', 'path/img-0246.jpg', 'path/img-0247.jpg', 'path/img-0248.jpg', 'path/img-0249.jpg', 'path/img-0250.jpg', 'path/img-0251.jpg', 'path/img-0252.jpg', 'path/img-0253.jpg', 'path/img-0254.jpg', 'path/img-0255.jpg', 'path/img-0256.jpg', 'path/img-0257.jpg', 'path/img-0258.jpg', 'path/img-0259.jpg', 'path/img-0260.jpg', 'path/img-0261.jpg', 'path/img-0262.jpg', 'path/img-0263.jpg', 'path/img-0264.jpg', 'path/img-0265.jpg', 'path/img-0266.jpg', 'path/img-0267.jpg', 'path/img-0268.jpg', 'path/img-0269.jpg', 'path/img-0270.jpg', 'path/img-0271.jpg', 'path/img-0272.jpg', 'path/img-0273.jpg', 'path/img-0274.jpg', 'path/img-0275.jpg', 'path/img-0276.jpg', 'path/img-0277.jpg', 'path/img-0278.jpg', 'path/img-0279.jpg', 'path/img-0280.jpg', 'path/img-0281.jpg', 'path/img-0282.jpg', 'path/img-0283.jpg', 'path/img-0284.jpg', 'path/img-0285.jpg', 'path/img-0286.jpg', 'path/img-0287.jpg', 'path/img-0288.jpg', 'path/img-0289.jpg', 'path/img-0290.jpg', 'path/img-0291.jpg', 'path/img-0292.jpg', 'path/img-0293.jpg', 'path/img-0294.jpg', 'path/img-0295.jpg', 'path/img-0296.jpg', 'path/img-0297.jpg', 'path/img-0298.jpg', 'path/img-0299.jpg', 'path/img-0300.jpg', 'path/img-0301.jpg', 'path/img-0302.jpg', 'path/img-0303.jpg', 'path/img-0304.jpg', 'path/img-0305.jpg', 'path/img-0306.jpg', 'path/img-0307.jpg', 'path/img-0308.jpg', 'path/img-0309.jpg', 'path/img-0310.jpg', 'path/img-0311.jpg', 'path/img-0312.jpg', 'path/img-0313.jpg', 'path/img-0314.jpg', 'path/img-0315.jpg'] 

So, we could apply this to a file search:

from wcmatch import glob

list_images = glob.glob('path/img-{0238..0315}.jpg', flags=glob.BRACE)

In this example, we've hard coded the path, but in your example, make sure path_images has a trailing / so that the pattern is constructed correctly. Others have suggested this might be an issue. Print out your pattern to confirm the pattern is correct.
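
An empty result is often caused by how the pattern is built rather than by the filter. The sketch below, which uses the standard library glob and a temporary directory as a stand-in for the real path_images, compares plain concatenation without a trailing separator against os.path.join and prints both patterns:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as path_images:  # stand-in for the real directory
    open(os.path.join(path_images, "img-0238.jpg"), "w").close()

    # Without a trailing separator, concatenation gives "<dir>*.jpg",
    # which searches next to the directory instead of inside it.
    bad_pattern = path_images + "*.jpg"
    # os.path.join inserts the separator, giving "<dir>/*.jpg" on POSIX.
    good_pattern = os.path.join(path_images, "*.jpg")

    bad_matches = glob.glob(bad_pattern)
    good_matches = glob.glob(good_pattern)

    # Printing the pattern next to its matches makes the problem visible.
    print(bad_pattern, bad_matches)
    print(good_pattern, good_matches)
```

If the first pattern printed for your own path looks like the "bad" one, that would explain why list_images stays empty.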
