\$\begingroup\$

I wrote code to remove the background from 8000 images, but it takes approximately 8 hours to produce the result.

How can I reduce its running time? I will have to work on a larger dataset in the future.

    from rembg import remove
    import cv2
    import glob

    for img in glob.glob('../images/*.jpg'):
      a = img.split('../images/')
      a1 = a[1].split('.jpg')
      try:
         cv_img = cv2.imread(img)
         output = remove(cv_img)
      except:
         continue
      cv2.imwrite('../output image/' + str(a1[0]) + '.png', output)
\$\endgroup\$

1 Answer

\$\begingroup\$

Performance

This is a simple loop, and I would expect that the majority of time is spent in rembg.remove() - but you should profile to demonstrate that.
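
One way to check that guess is with the standard-library `cProfile` module. The following is a minimal sketch, not a definitive recipe; the helper name `profile_call` is hypothetical and the choice of ten report lines is arbitrary.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report).

    The report lists the ten entries with the largest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(10)
    return result, stream.getvalue()
```

Wrapping the processing of a handful of images in such a helper and printing the report should show whether `rembg.remove()` dominates the cumulative time, or whether reading and writing files matters more than expected.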

If my guess is correct, and if that method is single-threaded, the simplest approach is to divide the work across more cores, to process images in parallel.
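
A sketch of that idea using `concurrent.futures` from the standard library, assuming each image can be processed independently. The names `convert_one` and `convert_all` are hypothetical, as are the `worker` and `executor_cls` parameters, which exist only so that the pool type and the per-file function can be swapped. Whether this helps depends on how many cores `rembg.remove()` already uses, so it is worth measuring on a small sample first.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path


def convert_one(in_path, out_dir):
    """Remove the background from one image; return the output path."""
    # Imported inside the function so that each worker process loads
    # the libraries itself.
    import cv2
    import rembg

    image = cv2.imread(str(in_path))
    if image is None:
        raise OSError("read failed")
    out_path = Path(out_dir) / Path(in_path).with_suffix('.png').name
    if not cv2.imwrite(str(out_path), rembg.remove(image)):
        raise OSError("write failed")
    return out_path


def convert_all(paths, out_dir, worker=convert_one,
                executor_cls=ProcessPoolExecutor, max_workers=None):
    """Call worker(path, out_dir) for every path in parallel.

    Returns (done, failed): worker results and the paths that raised.
    """
    done, failed = [], []
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p, out_dir): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                done.append(future.result())
            except Exception as e:
                # Report the failure and carry on with the other files.
                print(f"{path}: {e}", file=sys.stderr)
                failed.append(path)
    return done, failed
```

A script would call something like `convert_all(sorted(Path('../images').glob('*.jpg')), Path('../output image'))` from under an `if __name__ == '__main__':` guard, which process pools need on platforms that start workers by re-importing the main module.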


General code review

PEP-8 recommends that indentation should be 4 spaces per level, rather than variously 2 and 3.

Some of the names could be better - img is actually the input filename; it's not an image until we read it. a and a1 are utterly meaningless.

Instead of using str.split() to compose the output filename, we can use os.path or pathlib.
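
To illustrate the difference, here is a small sketch of building the output filename with `pathlib`. The helper name `output_path` is hypothetical; the directory names are the ones from the question.

```python
from pathlib import Path


def output_path(in_path, out_dir):
    """Return out_dir/<input file name> with the suffix replaced by .png."""
    # with_suffix() replaces only the final suffix, and .name drops the
    # input directory, so no string splitting is needed.
    return Path(out_dir) / Path(in_path).with_suffix('.png').name


print(output_path('../images/cat.jpg', '../output image'))
```

Unlike splitting on `'.jpg'`, this only touches the final extension, so a name such as `a.jpg.backup.jpg` becomes `a.jpg.backup.png` rather than being cut at the first `.jpg`.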

I think that except: continue isn't very useful error handling. You probably want to have some messages on the error stream indicating which files weren't converted, and possibly also write a log file.
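
One possible way to get both, sketched with the standard `logging` module. The function name `make_logger`, the logger name and the message format are assumptions chosen for illustration.

```python
import logging
import sys


def make_logger(log_file, name='remove_background'):
    """Return a logger that writes to stderr and appends to log_file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Drop handlers left over from an earlier call, so that messages
    # are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    for handler in (logging.StreamHandler(sys.stderr),
                    logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Inside the loop, the `except` branch could then call `logger.error("%s: %s", path, e)` instead of silently continuing, which leaves a list of unconverted files to retry later.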

I would probably move the cv2.imwrite() within the try block too - if that fails, we want to know about it.

We can get a cleaner implementation, and use this as the basis for parallelising:

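    import sys
    from pathlib import Path

    import cv2
    import rembg

    in_dir = Path('../images')
    out_dir = Path('../output image')

    for path in in_dir.glob('*.jpg'):
        try:
            # imread() signals failure by returning None rather than raising
            image = cv2.imread(str(path))
            if image is None or image.size == 0:
                raise cv2.error("read failed")
            output = rembg.remove(image)
            # Same file name in the output directory, with a .png suffix
            out_path = out_dir / path.with_suffix('.png').name
            # imwrite() returns False on failure; str() because not every
            # OpenCV version accepts Path objects
            if not cv2.imwrite(str(out_path), output):
                raise cv2.error("write failed")
        except Exception as e:
            print(f"{path}: {e}", file=sys.stderr)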
\$\endgroup\$
  • 2
    \$\begingroup\$ For super easy parallelisation I’d recommend using something like p_map or p_umap from the p_tqdm package, which comes with a progress bar and ETA. \$\endgroup\$ Commented Sep 14, 2022 at 9:05
