$\begingroup$

I am working on a video processing pipeline where the frames are much wider than they are tall (wide aspect ratio). My main goal is action recognition on human-object interactions. The problem is that when I apply random cropping to the frames during data augmentation, there is a significant chance that the subject of interest (the person or the object they are interacting with) gets cropped out.

The dimensions of the frames are something like 1280x720 (width is much larger than height), and I am concerned that random cropping might focus on areas without the object of interest, which could negatively affect model performance.

$\endgroup$

1 Answer

$\begingroup$

You can try letterboxing.

Letterboxing resizes each frame to fit the target size while preserving its aspect ratio, then pads the leftover area on the sides or on the top and bottom. Because nothing is cropped out, the human-object interaction always stays in the frame, which is especially useful for wide frames such as 1280x720.

Here is a letterboxing function in Python, using OpenCV:

```python
import cv2


def class_letterbox(im, new_shape=(640, 640), color=(0, 0, 0), scaleup=True):
    """Resize an image to fit inside new_shape, keeping its aspect ratio, and pad the rest.

    Args:
        im (np.ndarray): Input image, shape (height, width[, channels]).
        new_shape (int or tuple, optional): Target (height, width). Defaults to (640, 640).
        color (tuple, optional): Padding color. Defaults to (0, 0, 0).
        scaleup (bool, optional): Allow enlarging small images. Defaults to True.

    Returns:
        np.ndarray: Letterboxed image of shape new_shape.
    """
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    if shape[0] == new_shape[0] and shape[1] == new_shape[1]:
        return im

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute the resized size and the padding needed to reach new_shape
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))  # (width, height)
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    # The +/- 0.1 sends an odd padding pixel to the bottom/right side
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im
```

Alternatively, to keep random cropping from removing the object of interest during augmentation, you could try one of these strategies:

Center Cropping:

Instead of random cropping, you could use center cropping, ensuring that the middle of the frame (where the human-object interaction likely occurs) is always preserved.
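
As a minimal sketch of center cropping, assuming frames are NumPy arrays in (height, width, channels) layout (the function name `center_crop` is just illustrative):

```python
import numpy as np


def center_crop(frame: np.ndarray, crop_h: int, crop_w: int) -> np.ndarray:
    """Return the central crop_h x crop_w window of an HWC (or HW) frame."""
    h, w = frame.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed frame size")
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return frame[top:top + crop_h, left:left + crop_w]


# Example: take the central 720x720 square of a 1280x720 frame
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
square = center_crop(frame, 720, 720)
```

Note that this only helps if the interaction usually happens near the middle of the frame; anything in the outer 280 columns on each side of a 1280-wide frame is still lost.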

Object-aware Cropping:

Use a bounding box or region proposal to locate the human and object before cropping. You can crop around this region to ensure the object of interest remains in the frame.
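
One way this could look, as a sketch rather than a definitive implementation: sample a random crop window that is constrained to contain the bounding box, and apply the same window to every frame of a clip so the crop is consistent over time. The box is assumed to come from your own annotations or a detector and to cover both the person and the object; the helper names and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
import numpy as np


def sample_crop_containing_box(frame_hw, box, crop_hw, rng):
    """Sample (top, left) of a crop window that fully contains `box`.

    frame_hw: (H, W). box: (x1, y1, x2, y2) in pixels, x2/y2 exclusive.
    crop_hw: (crop_h, crop_w). If the box is larger than the crop along
    an axis, the crop is centered on the box along that axis instead.
    """
    H, W = frame_hw
    crop_h, crop_w = crop_hw
    if crop_h > H or crop_w > W:
        raise ValueError("crop size must not exceed frame size")
    x1, y1, x2, y2 = box

    def sample_axis(lo, hi, crop, size):
        # Valid starts s satisfy: s <= lo, s + crop >= hi, 0 <= s <= size - crop
        s_min = max(0, hi - crop)
        s_max = min(lo, size - crop)
        if s_min > s_max:
            # Box does not fit in the crop: center on the box, clamp to the frame
            start = (lo + hi) // 2 - crop // 2
            return int(np.clip(start, 0, size - crop))
        return int(rng.integers(s_min, s_max + 1))

    return sample_axis(y1, y2, crop_h, H), sample_axis(x1, x2, crop_w, W)


def crop_clip(clip, box, crop_hw, rng):
    """Apply one shared crop window to every frame of a (T, H, W, C) clip."""
    top, left = sample_crop_containing_box(clip.shape[1:3], box, crop_hw, rng)
    crop_h, crop_w = crop_hw
    return clip[:, top:top + crop_h, left:left + crop_w]


# Example: 4-frame clip, interaction located on the right side of the frame
rng = np.random.default_rng(0)
clip = np.zeros((4, 720, 1280, 3), dtype=np.uint8)
cropped = crop_clip(clip, box=(900, 200, 1100, 600), crop_hw=(720, 720), rng=rng)
```

If the person or object moves during the clip, a reasonable choice is to pass the union of the per-frame boxes so the single window covers the whole motion.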

Random Resizing with Padding:

Instead of cropping, resize the frames while keeping the aspect ratio, and pad the remaining area with a solid color (usually black or white). This would retain the object of interest and help the model generalize to different sizes.
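
A rough sketch of this idea follows. To keep it dependency-free it uses a simple nearest-neighbor resize written in NumPy as a stand-in for `cv2.resize`; the function names, the `min_scale` parameter, and the choice to center the resized frame on the canvas are all assumptions for illustration.

```python
import numpy as np


def resize_nearest(frame, new_h, new_w):
    """Nearest-neighbor resize via index lookup (stand-in for cv2.resize)."""
    h, w = frame.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return frame[rows[:, None], cols]


def random_resize_pad(frame, out_hw, rng, min_scale=0.6, pad_value=0):
    """Resize with a random scale (aspect ratio kept), then pad to out_hw."""
    out_h, out_w = out_hw
    h, w = frame.shape[:2]
    fit = min(out_h / h, out_w / w)          # largest scale that still fits
    r = fit * rng.uniform(min_scale, 1.0)    # random shrink relative to the fit
    new_h = min(out_h, max(1, int(round(h * r))))
    new_w = min(out_w, max(1, int(round(w * r))))
    resized = resize_nearest(frame, new_h, new_w)

    # Solid-color canvas with the resized frame placed in the center
    canvas = np.full((out_h, out_w) + frame.shape[2:], pad_value, dtype=frame.dtype)
    top = (out_h - new_h) // 2
    left = (out_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


# Example: a 1280x720 frame placed on a 640x640 canvas
rng = np.random.default_rng(0)
frame = np.full((720, 1280, 3), 255, dtype=np.uint8)
padded = random_resize_pad(frame, (640, 640), rng)
```

For video, you would probably draw the scale once per clip and reuse it for every frame, so the apparent size of the subject does not jump between frames.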

Custom Cropping Policies:

You can define a set of policies that guide cropping. For example, limit the crop to a central area or the region where humans and objects tend to appear, rather than performing purely random crops.
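
As one possible policy, sketched under assumptions: keep the crop random, but limit how far it may drift from the centered position. The `max_shift` parameter here is hypothetical; 0 reproduces a center crop and 1 reproduces an unconstrained random crop.

```python
import numpy as np


def constrained_crop_offsets(frame_hw, crop_hw, rng, max_shift=0.25):
    """Sample (top, left) for a crop that stays near the frame center.

    max_shift is the fraction of the available slack the crop may move
    away from the centered position, in either direction.
    """
    H, W = frame_hw
    crop_h, crop_w = crop_hw
    if crop_h > H or crop_w > W:
        raise ValueError("crop size must not exceed frame size")
    if not 0.0 <= max_shift <= 1.0:
        raise ValueError("max_shift must be in [0, 1]")

    def sample_axis(size, crop):
        slack = size - crop                    # total room to move along this axis
        center = slack // 2                    # offset of the centered crop
        half = int(max_shift * slack / 2)      # allowed drift on each side
        return int(rng.integers(center - half, center + half + 1))

    return sample_axis(H, crop_h), sample_axis(W, crop_w)


# Example: 720x720 crops from a 1280x720 frame, drifting at most 70 px from center
rng = np.random.default_rng(0)
top, left = constrained_crop_offsets((720, 1280), (720, 720), rng, max_shift=0.25)
```

The allowed region could also be estimated from data, for example from where annotated people and objects tend to appear in your training set, instead of assuming the center.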

$\endgroup$
