Feature: Batch- and axis-aware image/video/volume operations

Describe what you are looking for

Feature request: Batch- and axis-aware image/video/volume operations

Summary

We need batch- and axis-aware versions of common image operations so that videos and volumes can be processed without Python loops over frames/slices. The same APIs should accept any of the following shapes:

Single image: (H, W, C)
Video / batch of frames: (N, H, W, C) — N frames (or batch size)
Volume (3D stack): (D, H, W, C) — D depth slices

The general form is one leading dimension (N or D) followed by (H, W, C), so that video (N frames) and volume (D slices) are both first-class. An arbitrary batch axis (e.g. apply along axis 0 of any ndarray) may also be useful. Operations should work on the shapes exactly as passed, with no need to reshape or loop in Python.

Critical point: cv2.flip (like the other cv2 functions listed below) does not work on videos and volumes, because OpenCV expects a single image (H, W, C). For video (N, H, W, C) or volume (D, H, W, C) we must loop over frames/slices in Python, which is slow and prevents using optimized batch paths. We need flip (and all operations below) to accept one leading dimension (N or D) followed by (H, W, C), so that a single call can process the whole array.

Target semantics

Input shape: The caller passes an array in one of the supported shapes, e.g.:
- (H, W, C) — single image
- (N, H, W, C) — N images (video frames or batch)
- (D, H, W, C) — D slices (volume, e.g. 3D medical imaging)
- General form: one leading dimension (N or D) + (H, W, C). Both (N, H, W, C) and (D, H, W, C) should be explicitly supported.
- Optionally: a generic "batch axis" (e.g. apply op along axis 0 for any ndarray).
Output shape: Same as the input shape unless documented otherwise (e.g. resize changes H and W).
No Python loop: The implementation should process all batch elements (or the chosen axis) in one call, using vectorized/backend code rather than a per-frame loop in Python.
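
To make the shape contract above concrete, here is a minimal sketch of how a wrapper could accept both a single image and an array with one leading dimension. The name `apply_with_leading_axis` and the example op are hypothetical and only illustrate the requested semantics; they are not an existing API.

```python
import numpy as np


def apply_with_leading_axis(op, arr):
    """Run a batch-aware op on (H, W, C) or (L, H, W, C) input.

    `op` is any callable that expects a 4D array (L, H, W, C), where L
    stands for N (frames) or D (slices). Hypothetical helper for illustration.
    """
    arr = np.asarray(arr)
    if arr.ndim == 3:
        # Single image: add a leading axis of length 1, then strip it again.
        return op(arr[np.newaxis])[0]
    if arr.ndim == 4:
        return op(arr)
    raise ValueError(f"expected 3D or 4D input, got shape {arr.shape}")


def invert(batch):
    # Example op: vectorized over the leading axis, no per-frame loop.
    return 255 - batch


image = np.zeros((4, 5, 3), dtype=np.uint8)
video = np.zeros((7, 4, 5, 3), dtype=np.uint8)
image_out = apply_with_leading_axis(invert, image)
video_out = apply_with_leading_axis(invert, video)
print(image_out.shape, video_out.shape)
```

The output shape matches the input shape in both cases, which is the default behaviour requested above.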

List of operations we need (batch/axis-aware)

Flip — cv2 analogue: cv2.flip
Flip along any axis (or multiple axes), with the same semantics as numpy.flip(array, axis=...). For a batch (N, H, W, C), flip along the spatial axes: axis 1 (height) gives a vertical flip and axis 2 (width) gives a horizontal flip; for a single image (H, W, C) these are axes 0 and 1. For a 3D volume (D, H, W, C), flip along axis 0 (depth), 1 (height), 2 (width), or any combination (e.g. axis=(0, 2)). One API for 2D and 3D.
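
As a reference for the requested flip semantics, the following NumPy sketch maps cv2.flip's flipCode convention onto axes counted from the end, so the same call works with or without a leading N or D axis. The function name `flip_cv2_style` is made up for this example.

```python
import numpy as np


def flip_cv2_style(arr, flip_code):
    """Flip (H, W, C) or (L, H, W, C) arrays using cv2.flip's flipCode convention.

    flip_code == 0: vertical flip (around the x-axis, reverses H)
    flip_code > 0:  horizontal flip (around the y-axis, reverses W)
    flip_code < 0:  both
    """
    # Negative axes address H and W regardless of a leading N or D axis.
    if flip_code == 0:
        axes = (-3,)
    elif flip_code > 0:
        axes = (-2,)
    else:
        axes = (-3, -2)
    return np.flip(arr, axis=axes)


video = np.arange(2 * 3 * 4 * 1).reshape(2, 3, 4, 1)
horizontal = flip_cv2_style(video, 1)
vertical = flip_cv2_style(video, 0)
# Treating the same array as a volume: flip depth and width together.
depth_and_width = np.flip(video, axis=(0, 2))
```

numpy.flip returns a view, so no pixel data is copied until the result is written to or made contiguous.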
Warp affine — cv2 analogue: cv2.warpAffine
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H', W', C). Apply 2D affine (2×3 or 3×3 matrix) per frame; one matrix per batch element or one shared. Same N; spatial size can change to (H', W').
- 3D volume: (D, H, W, C) → (D', H', W', C). Apply 3D affine (4×4 matrix) with trilinear (or nearest) interpolation. Enables true 3D rotation/scaling/translation. No cv2 equivalent; analogous to scipy.ndimage.affine_transform.
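
For the 2D batch case with one shared matrix, a vectorized warp can be sketched in plain NumPy as an inverse-mapping gather. This is only an illustration under simplifying assumptions (nearest-neighbour sampling, constant border, 2×3 matrix only); `warp_affine_batch_nearest` is a hypothetical name and the result is not guaranteed to match cv2.warpAffine pixel for pixel.

```python
import numpy as np


def warp_affine_batch_nearest(batch, matrix, out_hw, fill=0):
    """Nearest-neighbour affine warp of (N, H, W, C) with one shared 2x3 matrix.

    Follows cv2.warpAffine's default convention: `matrix` maps source (x, y)
    to destination (x, y), so each output pixel samples the inverse mapping.
    """
    n, h, w, c = batch.shape
    out_h, out_w = out_hw
    # Promote 2x3 to 3x3 and invert to get the destination -> source mapping.
    full = np.vstack([np.asarray(matrix, dtype=np.float64), [0.0, 0.0, 1.0]])
    inv = np.linalg.inv(full)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    src_x = inv[0, 0] * xs + inv[0, 1] * ys + inv[0, 2]
    src_y = inv[1, 0] * xs + inv[1, 1] * ys + inv[1, 2]
    ix = np.rint(src_x).astype(np.intp)
    iy = np.rint(src_y).astype(np.intp)
    valid = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
    # Clip so the gather is safe, then overwrite out-of-bounds pixels with `fill`.
    gathered = batch[:, np.clip(iy, 0, h - 1), np.clip(ix, 0, w - 1)]
    return np.where(valid[None, :, :, None], gathered, fill).astype(batch.dtype)


frames = np.arange(1, 2 * 3 * 4 + 1, dtype=np.uint8).reshape(2, 3, 4, 1)
shift_right = [[1, 0, 1], [0, 1, 0]]  # translate by +1 pixel in x
shifted = warp_affine_batch_nearest(frames, shift_right, out_hw=(3, 4))
```

The index arrays are computed once and reused for every frame, which is the part a per-frame Python loop repeats N times. Per-frame matrices would need one pair of index arrays per frame.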
Warp perspective — cv2 analogue: cv2.warpPerspective
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H', W', C). Apply 2D perspective (3×3 matrix) per frame; one matrix per batch element or one shared. Same N; spatial size can change to (H', W').
- 3D volume: (D, H, W, C) → (D', H', W', C). Apply 3D projective transform (4×4 matrix with perspective divide). No cv2 equivalent.
Resize — cv2 analogue: cv2.resize
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H', W', C). Resize spatial dimensions (height, width) per frame; same or per-frame target size. Extends cv2.resize.
- 3D volume: (D, H, W, C) → (D', H', W', C). Resize in all three spatial directions (depth, height, width). No cv2 equivalent.
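
The two resize variants differ only in how many axes before C are resampled. A minimal nearest-neighbour sketch in NumPy, assuming a channels-last layout and a hypothetical `resize_nearest` helper, could look like this (index rounding may differ slightly from cv2.resize with INTER_NEAREST):

```python
import numpy as np


def resize_nearest(arr, out_size):
    """Nearest-neighbour resize over the spatial axes of a channels-last array.

    out_size has one entry per axis to resize, aligned to the axes just
    before C: (H', W') for images/videos, (D', H', W') for volumes.
    """
    out = arr
    first_axis = arr.ndim - 1 - len(out_size)
    # This loop runs over 2 or 3 spatial axes, not over frames or slices.
    for offset, new_len in enumerate(out_size):
        axis = first_axis + offset
        old_len = arr.shape[axis]
        # floor(dst_index * old / new) picks the source sample for each output index.
        idx = np.minimum(np.arange(new_len) * old_len // new_len, old_len - 1)
        out = np.take(out, idx, axis=axis)
    return out


video = np.zeros((5, 4, 6, 3), dtype=np.uint8)
volume = np.zeros((4, 4, 6, 3), dtype=np.uint8)
video_small = resize_nearest(video, (8, 3))       # N stays 5
volume_small = resize_nearest(volume, (2, 8, 3))  # D is resized too
```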
Remap — cv2 analogue: cv2.remap
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H', W', C) (or same shape). Generic remap with map_x, map_y; same or per-frame maps. Extends cv2.remap.
- 3D volume: (D, H, W, C) → (D', H', W', C) (or same shape). 3D remap with coordinate maps for all three axes. No cv2 equivalent.
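
With maps shared across frames, the remap semantics reduce to a single gather. The sketch below assumes integer, in-bounds maps and skips interpolation and border handling; the function names are hypothetical.

```python
import numpy as np


def remap_nearest_batch(batch, map_x, map_y):
    """Gather-based remap for (N, H, W, C) with integer maps shared by all frames.

    Mirrors cv2.remap's convention:
    out[n, y, x] = batch[n, map_y[y, x], map_x[y, x]].
    """
    map_x = np.asarray(map_x, dtype=np.intp)
    map_y = np.asarray(map_y, dtype=np.intp)
    return batch[:, map_y, map_x]


def remap_nearest_volume(volume, map_d, map_y, map_x):
    """3D analogue for (D, H, W, C): one integer coordinate map per spatial axis."""
    return volume[np.asarray(map_d, dtype=np.intp),
                  np.asarray(map_y, dtype=np.intp),
                  np.asarray(map_x, dtype=np.intp)]


video = np.arange(2 * 3 * 4 * 2).reshape(2, 3, 4, 2)
ys, xs = np.mgrid[0:3, 0:4]
# A horizontal flip expressed as a remap: sample column (W - 1 - x).
mirrored = remap_nearest_batch(video, map_x=3 - xs, map_y=ys)

ds, vy, vx = np.mgrid[0:2, 0:3, 0:4]
same_volume = remap_nearest_volume(video, ds, vy, vx)  # identity maps
```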
Copy make border (pad) — cv2 analogue: cv2.copyMakeBorder
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H+top+bottom, W+left+right, C). Pad height and width. Extends cv2.copyMakeBorder.
- 3D volume: (D, H, W, C) → (D+front+back, H+top+bottom, W+left+right, C). Pad in all three directions. No cv2 equivalent.
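
The padding semantics for both cases can be expressed with numpy.pad by leaving the leading and channel axes unpadded. The sketch below uses a hypothetical `pad_spatial` helper and takes border types as strings so that it does not depend on cv2 being installed; the mapping between cv2 border types and numpy.pad modes is the part worth agreeing on.

```python
import numpy as np

# cv2 border type name -> numpy.pad mode with the same edge behaviour.
BORDER_TO_NUMPY_MODE = {
    "BORDER_CONSTANT": "constant",
    "BORDER_REPLICATE": "edge",
    "BORDER_REFLECT": "symmetric",
    "BORDER_REFLECT_101": "reflect",
    "BORDER_WRAP": "wrap",
}


def pad_spatial(arr, pads, border="BORDER_CONSTANT", value=0):
    """Pad the spatial axes of a channels-last array.

    pads: sequence of (before, after) pairs, one per padded axis, aligned
    to the axes just before C. Leading axes not covered stay unpadded.
    """
    n_lead = arr.ndim - 1 - len(pads)
    pad_width = [(0, 0)] * n_lead + [tuple(p) for p in pads] + [(0, 0)]
    mode = BORDER_TO_NUMPY_MODE[border]
    if mode == "constant":
        return np.pad(arr, pad_width, mode=mode, constant_values=value)
    return np.pad(arr, pad_width, mode=mode)


video = np.zeros((5, 4, 6, 3), dtype=np.uint8)
# 2D batch: (top, bottom), (left, right); N is untouched.
padded_video = pad_spatial(video, [(1, 2), (3, 4)])
# 3D volume: (front, back), (top, bottom), (left, right).
padded_volume = pad_spatial(video, [(1, 1), (1, 2), (3, 4)], border="BORDER_REPLICATE")
```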
Blur (box filter) — cv2 analogue: cv2.blur
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Rectangular kernel blur per frame. Extends cv2.blur.
- 3D volume: See Blur3D below (3D-only op).
Gaussian blur — cv2 analogue: cv2.GaussianBlur
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Gaussian kernel blur per frame. Extends cv2.GaussianBlur.
- 3D volume: See GaussianBlur3D below (3D-only op).
Median blur — cv2 analogue: cv2.medianBlur
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Median filter per frame. Extends cv2.medianBlur.
- 3D volume: See MedianBlur3D below (3D-only op).
Filter2D (2D convolution) — cv2 analogue: cv2.filter2D
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Custom 2D kernel per frame; same or per-element kernel. Extends cv2.filter2D.
- 3D volume: See Filter3D below (3D-only op).
SepFilter2D (separable 2D filter) — cv2 analogue: cv2.sepFilter2D
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Separable (kernelX, kernelY) per frame. Extends cv2.sepFilter2D.
- 3D volume: See SepFilter3D below (3D-only op).
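
A separable filter is a sequence of 1D passes, one per spatial axis, so the 2D batch and 3D variants can share one code path. The sketch below is an assumption-laden illustration: it uses correlation with reflect-101 borders, assumes odd kernel sizes, returns float64, and orders kernels by axis (H before W), which is the reverse of cv2.sepFilter2D's (kernelX, kernelY) argument order. The helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_along_axis(arr, kernel, axis):
    """Correlate `arr` with a 1D kernel along one axis, reflect-101 borders."""
    kernel = np.asarray(kernel, dtype=np.float64)
    k = kernel.size
    pad = [(0, 0)] * arr.ndim
    pad[axis] = (k // 2, k - 1 - k // 2)
    padded = np.pad(arr.astype(np.float64), pad, mode="reflect")
    # Windows land on a new trailing axis; contracting it with the kernel
    # filters every frame, row, column, and channel in one vectorized step.
    windows = sliding_window_view(padded, k, axis=axis)
    return windows @ kernel


def sep_filter(arr, kernels):
    """Apply one 1D kernel per spatial axis of a channels-last array.

    kernels: (kernel_h, kernel_w) for (N, H, W, C) or (H, W, C);
             (kernel_d, kernel_h, kernel_w) for (D, H, W, C).
    """
    first_axis = arr.ndim - 1 - len(kernels)
    out = arr
    for offset, kernel in enumerate(kernels):
        out = correlate_along_axis(out, kernel, first_axis + offset)
    return out


box3 = np.full(3, 1.0 / 3.0)
video = np.ones((4, 5, 6, 2))
smoothed_2d = sep_filter(video, (box3, box3))        # per-frame 3x3 box blur
smoothed_3d = sep_filter(video, (box3, box3, box3))  # 3x3x3 box blur on a volume
```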
Erode / Dilate / MorphologyEx — cv2 analogues: cv2.erode, cv2.dilate, cv2.morphologyEx
We need two variants:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Morphological ops per frame with 2D structuring element. Extends cv2 erode/dilate/morphologyEx.
- 3D volume: See Erode3D / Dilate3D / MorphologyEx3D below (3D-only ops).

3D-only operations (no cv2 equivalent)

These operate on volumes (D, H, W, C) and have no cv2 equivalent; they are the natural 3D extensions of the 2D ops above.

Blur3D — Box filter in 3D. (D, H, W, C) → (D, H, W, C). Rectangular 3D kernel.
GaussianBlur3D — Gaussian blur in 3D. (D, H, W, C) → (D, H, W, C). Kernel size and/or sigma per axis.
MedianBlur3D — Median filter in 3D. (D, H, W, C) → (D, H, W, C).
Filter3D — 3D convolution with a 3D kernel. (D, H, W, C) → (D, H, W, C). Custom 3D kernel.
SepFilter3D — Separable 3D filter (e.g. kernelD, kernelH, kernelW). (D, H, W, C) → (D, H, W, C). More efficient than full 3D kernel when separable.
Erode3D / Dilate3D / MorphologyEx3D — Morphological ops with a 3D structuring element. (D, H, W, C) → (D, H, W, C).
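
Several of these 3D ops share one structure: reduce a cubic neighbourhood around each voxel. As a reference for the expected results (not a performance target), here is a NumPy sketch with a hypothetical `window_op_3d` helper. It assumes an odd kernel size, a full cubic structuring element, and reflect-101 borders, which differs from the default border handling of cv2.erode and cv2.dilate.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_op_3d(volume, ksize, reducer, pad_mode="reflect"):
    """Apply a reducer over a cubic (k, k, k) neighbourhood of a (D, H, W, C) volume.

    reducer: np.mean -> box blur, np.median -> median blur,
             np.min -> erosion, np.max -> dilation.
    """
    r = ksize // 2  # assumes odd ksize
    padded = np.pad(volume, [(r, r), (r, r), (r, r), (0, 0)], mode=pad_mode)
    # Shape becomes (D, H, W, C, k, k, k); the windows are views, not copies.
    windows = sliding_window_view(padded, (ksize, ksize, ksize), axis=(0, 1, 2))
    return reducer(windows, axis=(-3, -2, -1))


volume = np.zeros((4, 5, 6, 1))
volume[2, 2, 3, 0] = 1.0  # a single bright voxel

dilated = window_op_3d(volume, 3, np.max)      # Dilate3D
blurred = window_op_3d(volume, 3, np.mean)     # Blur3D
eroded = window_op_3d(volume, 3, np.min)       # Erode3D
medianed = window_op_3d(volume, 3, np.median)  # MedianBlur3D
```

The windowed view itself is cheap, but reducers such as np.median materialize k³ values per voxel, so a real backend would want a dedicated kernel.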

Why this matters

Videos (N, H, W, C): Today we loop over N and call cv2.flip, cv2.resize, cv2.warpAffine, etc. once per frame. This forgoes SIMD/GPU batch optimizations and adds Python overhead on every call.
Volumes (D, H, W, C): Same for 3D data: we loop over D and call the same cv2 APIs. We need one call that applies along the leading dimension. Specifying both N (video) and D (volume) makes it clear we need the same semantics for both shapes.
Consistency: All of the above operations are used in augmentation pipelines (e.g. AlbumentationsX). Having them in a single backend with consistent “shape we want” semantics would let us support video and volume augmentation without per-frame loops.

Summary table

2D batch = direct cv2 extension: (N, H, W, C) → (N, H', W', C) (or same spatial size). 3D volume = (D, H, W, C) → (D', H', W', C) (or same size).

| Operation | cv2 / current API | 2D batch (N,H,W,C) | 3D volume (D,H,W,C) |
|---|---|---|---|
| Flip | cv2.flip | Yes (flip along any axis) | Yes (axis 0,1,2 or combo, like numpy.flip) |
| Warp affine | cv2.warpAffine | (N,H,W,C)→(N,H',W',C) | (D,H,W,C)→(D',H',W',C) |
| Warp perspective | cv2.warpPerspective | (N,H,W,C)→(N,H',W',C) | (D,H,W,C)→(D',H',W',C) |
| Resize | cv2.resize | (N,H,W,C)→(N,H',W',C) | (D,H,W,C)→(D',H',W',C) (resize in D,H,W) |
| Remap | cv2.remap | Yes | Yes (3D coordinate maps) |
| Copy make border | cv2.copyMakeBorder | Yes (pad H,W) | Yes (pad D,H,W) |
| Blur | cv2.blur | Yes | Blur3D |
| Gaussian blur | cv2.GaussianBlur | Yes | GaussianBlur3D |
| Median blur | cv2.medianBlur | Yes | MedianBlur3D |
| Filter2D | cv2.filter2D | Yes | Filter3D |
| SepFilter2D | cv2.sepFilter2D | Yes | SepFilter3D |
| Erode / Dilate / MorphologyEx | cv2.erode, cv2.dilate, cv2.morphologyEx | Yes | Erode3D / Dilate3D / MorphologyEx3D |

3D-only (no cv2 equivalent): Blur3D, GaussianBlur3D, MedianBlur3D, Filter3D, SepFilter3D, Erode3D, Dilate3D, MorphologyEx3D — all on (D, H, W, C).

All 2D batch ops should accept both a single image (H, W, C) and a video (N, H, W, C). All 3D ops should accept a volume (D, H, W, C). Each op should process its input in a single call, without Python loops.

Can you contribute to the implementation?

I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

No response

Is there an existing issue for this?

I have searched the existing issues

Code of Conduct

I agree to follow this project's Code of Conduct

Feature: Batch- and axis-aware image/video/volume operations #313
