Remove background from a directory of JPEG images

Question

I wrote some code to remove the background from 8000 images, but it takes approximately 8 hours to produce the result.

How can I reduce its running time? I will have to work on a larger dataset in the future.

    from rembg import remove
    import cv2
    import glob

    for img in glob.glob('../images/*.jpg'):
      a = img.split('../images/')
      a1 = a[1].split('.jpg')
      try:
         cv_img = cv2.imread(img)
         output = remove(cv_img)
      except:
        continue
      cv2.imwrite('../output image/' + str(a1[0]) + '.png', output)

Toby Speight · Accepted Answer · 2022-09-13 17:28:45Z

Performance

This is a simple loop, and I would expect that the majority of time is spent in rembg.remove() - but you should profile to demonstrate that.
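One way to check that guess is with the standard-library profiler. The following is only a sketch: `profile_call` is a helper name made up for this illustration, and you would pass it your own function that processes a small sample of images (say, 20 files) rather than all 8000.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats('cumulative').print_stats(top)
    return result, buffer.getvalue()
```

If the line for `remove` accounts for nearly all of the cumulative time in the report, the guess holds. Running the whole script with `python -m cProfile -s cumulative script.py` gives a similar report without any code changes.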

If my guess is correct, and if that method is single-threaded, the simplest approach is to divide the work across more cores, to process images in parallel.

General code review

PEP-8 recommends that indentation should be 4 spaces per level, rather than variously 2 and 3.

Some of the names could be better - img is actually the input filename; it's not an image until we read it. a and a1 are utterly meaningless.

Instead of using str.split() to compose the output filename, we can use os.path or pathlib.
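As a rough illustration of both options, here is how the output name might be built. The function names are invented for this sketch; the directory names are the ones from the question.

```python
import os.path
from pathlib import Path


def png_name_os_path(in_file, out_dir):
    """os.path version: strip the directory and extension, then rejoin."""
    stem, _ext = os.path.splitext(os.path.basename(in_file))
    return os.path.join(out_dir, stem + '.png')


def png_name_pathlib(in_file, out_dir):
    """pathlib version: swap the suffix and keep only the final component."""
    return Path(out_dir) / Path(in_file).with_suffix('.png').name
```

Unlike splitting on '.jpg', both versions only touch the final extension, so a name such as photo.jpg.backup.jpg keeps its inner '.jpg'.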

I think that except: continue isn't very useful error handling. You probably want to have some messages on the error stream indicating which files weren't converted, and possibly also write a log file.
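A minimal sketch of that kind of reporting, using the standard logging module. The names here (`make_logger`, `convert_each`, the `convert` callback and the logger name) are all made up for the example; `convert` stands for whatever function reads, processes and writes a single image.

```python
import logging
import sys


def make_logger(log_file):
    """Return a logger that writes to both stderr and log_file."""
    logger = logging.getLogger('remove_background')
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(logging.StreamHandler(sys.stderr))
    logger.addHandler(logging.FileHandler(log_file))
    return logger


def convert_each(paths, convert, logger):
    """Call convert(path) for every path; log and collect the failures."""
    failed = []
    for path in paths:
        try:
            convert(path)
        except Exception as e:
            logger.error("%s: %s", path, e)
            failed.append(path)
    return failed
```

The returned list makes it easy to retry only the files that failed, instead of re-running the whole batch.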

I would probably move the cv2.imwrite() within the try block too - if that fails, we want to know about it.

We can get a cleaner implementation, and use this as the basis for parallelising:

    import cv2
    import rembg
    import sys
    from pathlib import Path

    in_dir = Path('../images')
    out_dir = Path('../output image')

    for path in in_dir.glob('*.jpg'):
        try:
            image = cv2.imread(str(path))
            if image is None or not image.data:
                raise cv2.error("read failed")
            output = rembg.remove(image)
            path = out_dir / path.with_suffix('.png').name
            # str() because some OpenCV versions reject pathlib.Path filenames
            cv2.imwrite(str(path), output)
        except Exception as e:
            print(f"{path}: {e}", file=sys.stderr)
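Building on that loop, one possible way to spread the work across cores is concurrent.futures from the standard library. Treat this as a sketch under assumptions rather than a tuned solution: `convert_one` and `convert_all` are names invented here, the `executor_cls` parameter exists only so that a thread pool can be swapped in for comparison, and whether it helps at all depends on rembg.remove() not already using every core.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path


def convert_one(in_path, out_dir):
    """Worker: remove the background from one image and write it as PNG."""
    # Imported here so that each worker process loads the libraries itself.
    import cv2
    import rembg
    image = cv2.imread(str(in_path))
    if image is None:
        raise OSError("read failed")
    out_path = Path(out_dir) / Path(in_path).with_suffix('.png').name
    if not cv2.imwrite(str(out_path), rembg.remove(image)):
        raise OSError("write failed")
    return out_path


def convert_all(in_paths, out_dir, worker=convert_one,
                executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run worker over all inputs in parallel; return [(path, error), ...]."""
    failures = []
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p, out_dir): p for p in in_paths}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as e:
                failures.append((futures[future], e))
                print(f"{futures[future]}: {e}", file=sys.stderr)
    return failures
```

It would be called as convert_all(Path('../images').glob('*.jpg'), Path('../output image')), placed under an `if __name__ == '__main__':` guard so that worker processes do not re-run it on platforms that spawn rather than fork. Time it on a small sample with a few values of max_workers before committing to the full run.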

For super easy parallelisation I’d recommend using something like p_map or p_umap from the p_tqdm package, which comes with a progress bar and ETA.
– Seb, Commented Sep 14, 2022 at 9:05
