Description
Feature request: Batch- and axis-aware image/video/volume operations
Summary
We need batch- and axis-aware versions of common image operations so that videos and volumes can be processed without Python loops over frames/slices. The same APIs should accept all of the shapes we choose:
- Single image: (H, W, C)
- Video / batch of frames: (N, H, W, C), with N frames (or batch size)
- Volume (3D stack): (D, H, W, C), with D depth slices
The general form is one leading dimension (N or D) + (H, W, C): both (N, H, W, C) and (D, H, W, C) should be supported so that video (N frames) and volume (D slices) are first-class. We may also want an arbitrary batch axis (e.g. apply the op along axis 0 of any ndarray). Operations should work on the exact shapes we pass, without the caller having to reshape or loop in Python.
Critical point: `cv2.flip` (and similarly other cv2 functions) does not work on videos and volumes, because OpenCV expects a single 2D image, (H, W) or (H, W, C). For video (N, H, W, C) or volume (D, H, W, C) we must loop over frames/slices in Python, which is slow and prevents using optimized batch paths. We need flip (and all operations below) to accept both shapes, and in general one leading dimension (N or D) + (H, W, C), so that a single call can process the whole array.
Target semantics
- Input shape: the caller passes an array of the shape we want, e.g.:
  - (H, W, C): single image
  - (N, H, W, C): N images (video frames or batch)
  - (D, H, W, C): D slices (volume, e.g. 3D medical imaging)
- General form: one leading dimension (N or D) + (H, W, C). Both (N, H, W, C) and (D, H, W, C) should be explicitly supported.
- Optionally: a generic "batch axis" (e.g. apply the op along axis 0 of any ndarray).
- Output shape: Same as input shape (or explicitly documented, e.g. resize changes H, W).
- No Python loop: The implementation should process all batch elements (or the chosen axis) in one go, using vectorized/backend code, not a per-frame loop in Python.
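A minimal sketch of this shape contract, assuming hypothetical helpers `as_batch`, `from_batch` and `apply_batched` (illustrative names, not an existing API): a single image is lifted to a batch of one, so each op only has to be written once against a leading axis, vectorized over that axis.

```python
import numpy as np

def as_batch(arr: np.ndarray) -> tuple[np.ndarray, bool]:
    """Return a 4D (B, H, W, C) view plus a flag saying whether a leading axis was added.

    Accepts (H, W, C) single images and (N, H, W, C) / (D, H, W, C) stacks.
    """
    if arr.ndim == 3:
        return arr[np.newaxis], True
    if arr.ndim == 4:
        return arr, False
    raise ValueError(f"expected (H, W, C) or (B, H, W, C), got shape {arr.shape}")

def from_batch(arr: np.ndarray, added: bool) -> np.ndarray:
    """Undo as_batch: drop the leading axis again if as_batch added it."""
    return arr[0] if added else arr

def apply_batched(op, arr: np.ndarray) -> np.ndarray:
    """Run an op written for (B, H, W, C) on any supported input shape."""
    batch, added = as_batch(arr)
    return from_batch(op(batch), added)
```

For example, `apply_batched(lambda b: b[:, ::-1], x)` reverses H for both a single image and a whole video, with no per-frame loop in either case.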
List of operations we need (batch/axis-aware)
Flip (cv2 analogue: `cv2.flip`)
Flip along any axis (or multiple axes), similar to `numpy.flip(array, axis=...)`. Same semantics as NumPy; one API for 2D and 3D.
- 2D / batch (N, H, W, C) or (H, W, C): flip along the spatial axes. For (N, H, W, C), axis 1 (H) flips upside down (`cv2.flip` code 0) and axis 2 (W) mirrors left to right (`cv2.flip` code 1); for a single (H, W, C) image these are axes 0 and 1.
- 3D volume (D, H, W, C): flip along axis 0 (depth), 1 (height), 2 (width), or any combination (e.g. `axis=(0, 2)`).
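To illustrate the intended semantics, here is a sketch in plain NumPy rather than OpenCV: `np.flip` already handles every shape in one call, and the check below confirms it matches the per-frame loop we want to avoid. The wrapper name `flip` is only illustrative.

```python
import numpy as np

def flip(arr: np.ndarray, axis) -> np.ndarray:
    """Flip along one or more axes in a single vectorized call (returns a view)."""
    return np.flip(arr, axis=axis)

rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(4, 5, 6, 3), dtype=np.uint8)   # (N, H, W, C)
volume = rng.integers(0, 256, size=(7, 5, 6, 1), dtype=np.uint8)  # (D, H, W, C)

# Left-right mirror of every frame (cv2.flip code 1 per frame) = reverse W, i.e. axis 2.
mirrored = flip(video, axis=2)
loop = np.stack([frame[:, ::-1] for frame in video])  # the per-frame loop we want to avoid
assert np.array_equal(mirrored, loop)

# Volume: flip depth and width together.
flipped_dw = flip(volume, axis=(0, 2))
assert np.array_equal(flipped_dw, volume[::-1, :, ::-1])
```

Because `np.flip` returns a non-contiguous view, a backend that needs contiguous memory (as OpenCV does) would have to call `np.ascontiguousarray` on the result.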
Warp affine (cv2 analogue: `cv2.warpAffine`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H', W', C). Apply a 2D affine transform (2×3 or 3×3 matrix) to each frame, with either one matrix per batch element or one shared matrix. N stays the same; the spatial size can change to (H', W').
- 3D volume: (D, H, W, C) → (D', H', W', C). Apply a 3D affine transform (4×4 matrix) with trilinear (or nearest) interpolation. This enables true 3D rotation/scaling/translation. No cv2 equivalent; analogous to `scipy.ndimage.affine_transform`.
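A rough sketch of the 2D batch case in NumPy, assuming a hypothetical `warp_affine_batch` that takes forward (source → destination) 2×3 matrices, as `cv2.warpAffine` does without `WARP_INVERSE_MAP`. All matrices are inverted in one batched call and the destination grid is mapped back for every frame at once. For brevity it uses nearest-neighbour sampling and a zero constant border, whereas `cv2.warpAffine` defaults to bilinear interpolation; 3×3 inputs would need their last row dropped first.

```python
import numpy as np

def warp_affine_batch(images: np.ndarray, M, out_hw: tuple[int, int]) -> np.ndarray:
    """Nearest-neighbour affine warp of (N, H, W, C) in one vectorized pass.

    M: forward 2x3 matrices, shape (N, 2, 3), or a single (2, 3) shared by all frames.
    out_hw: (H', W'). Note that cv2.warpAffine's dsize is (width, height) instead.
    Pixels that map outside the source are filled with 0.
    """
    N, H, W, C = images.shape
    Ho, Wo = out_hw
    # Promote each 2x3 matrix to 3x3 and invert all of them at once.
    full = np.zeros((N, 3, 3))
    full[:, :2, :] = np.broadcast_to(np.asarray(M, dtype=np.float64), (N, 2, 3))
    full[:, 2, 2] = 1.0
    inv = np.linalg.inv(full)[:, :2, :]                          # destination -> source
    ys, xs = np.mgrid[0:Ho, 0:Wo]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(Ho * Wo)])   # (3, H'*W') homogeneous grid
    src = inv @ dst                                              # (N, 2, H'*W'), all frames at once
    sx = np.rint(src[:, 0]).astype(np.intp)
    sy = np.rint(src[:, 1]).astype(np.intp)
    valid = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H)
    n = np.arange(N)[:, None]                                    # frame index for the gather
    out = images[n, np.clip(sy, 0, H - 1), np.clip(sx, 0, W - 1)]  # (N, H'*W', C)
    out = np.where(valid[..., None], out, 0)
    return out.reshape(N, Ho, Wo, C).astype(images.dtype)
```

Passing one (2, 3) matrix broadcasts it to every frame, which covers the "one shared matrix" case with the same code path.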
Warp perspective (cv2 analogue: `cv2.warpPerspective`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H', W', C). Apply a 2D perspective transform (3×3 matrix) to each frame, with either one matrix per batch element or one shared matrix. N stays the same; the spatial size can change to (H', W').
- 3D volume: (D, H, W, C) → (D', H', W', C). Apply a 3D projective transform (4×4 matrix with perspective divide). No cv2 equivalent.
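One possible way to vectorize the coordinate part of the 2D perspective case, assuming a hypothetical `perspective_source_coords` that receives inverse (destination → source) homographies. Source coordinates for all frames come from a single matmul followed by the divide by the third homogeneous coordinate; the resulting maps would then be sampled like a batched remap. Since `cv2.warpPerspective` inverts the given matrix unless `WARP_INVERSE_MAP` is set, forward matrices could first be inverted with `np.linalg.inv`, which also accepts a stack of matrices.

```python
import numpy as np

def perspective_source_coords(Hinv: np.ndarray, out_hw: tuple[int, int]):
    """Map every output pixel (x, y) of every frame through its inverse homography.

    Hinv: (N, 3, 3) destination -> source matrices.
    Returns map_x, map_y, each of shape (N, H', W').
    """
    Ho, Wo = out_hw
    ys, xs = np.mgrid[0:Ho, 0:Wo].astype(np.float64)
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(Ho * Wo)])  # (3, P) homogeneous grid
    src = Hinv @ dst                                            # (N, 3, P): all frames at once
    w = src[:, 2]                                               # perspective divide term
    map_x = (src[:, 0] / w).reshape(-1, Ho, Wo)
    map_y = (src[:, 1] / w).reshape(-1, Ho, Wo)
    return map_x, map_y
```

The 3D projective variant would follow the same pattern with a (4, P) grid and (N, 4, 4) or single 4×4 matrices, dividing by the fourth coordinate.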
Resize (cv2 analogue: `cv2.resize`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H', W', C). Resize the spatial dimensions (height, width) of each frame, with the same or a per-frame target size. Extends `cv2.resize`.
- 3D volume: (D, H, W, C) → (D', H', W', C). Resize in all three spatial directions (depth, height, width). No cv2 equivalent.
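A sketch of the 3D volume case, assuming a hypothetical `resize_volume_nearest` that uses centre-aligned nearest-neighbour sampling (this is our choice for the example and is not claimed to match any particular cv2 interpolation flag). One index array is built per spatial axis and a single `np.ix_` gather resizes D, H and W together; the only Python loop runs over the three axes, never over slices.

```python
import numpy as np

def resize_volume_nearest(vol: np.ndarray, out_dhw: tuple[int, int, int]) -> np.ndarray:
    """Nearest-neighbour resize of a (D, H, W, C) volume along D, H and W in one gather."""
    idx = []
    for in_size, out_size in zip(vol.shape[:3], out_dhw):
        # Map each output voxel centre back to input coordinates and take the voxel it lands in.
        centres = (np.arange(out_size) + 0.5) * in_size / out_size
        idx.append(np.minimum(np.floor(centres).astype(np.intp), in_size - 1))
    d, h, w = np.ix_(*idx)  # broadcastable index grids of shape (D',1,1), (1,H',1), (1,1,W')
    return vol[d, h, w]     # (D', H', W', C); channels are carried along untouched
```

Linear/trilinear interpolation would replace the single gather with eight weighted gathers, but the "index arrays per axis, one vectorized gather" structure stays the same.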
Remap (cv2 analogue: `cv2.remap`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H', W', C) (or the same shape). Generic remap with `map_x`, `map_y`; either the same maps for all frames or per-frame maps. Extends `cv2.remap`.
- 3D volume: (D, H, W, C) → (D', H', W', C) (or the same shape). 3D remap with coordinate maps for all three axes. No cv2 equivalent.
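A minimal NumPy sketch of a batched bilinear remap, assuming a hypothetical `remap_batch_bilinear` with float maps of shape (N, H', W') (per-frame) or (H', W') (shared, via broadcasting). Coordinates are clamped into the image before interpolation, which behaves like a replicated border; other border modes are left out.

```python
import numpy as np

def remap_batch_bilinear(images: np.ndarray, map_x: np.ndarray, map_y: np.ndarray) -> np.ndarray:
    """Bilinear remap of (N, H, W, C); returns float64 of shape (N, H', W', C)."""
    N, H, W, C = images.shape
    x = np.clip(map_x, 0, W - 1)              # clamp = edge pixels replicated outward
    y = np.clip(map_y, 0, H - 1)
    x0 = np.floor(x).astype(np.intp)
    y0 = np.floor(y).astype(np.intp)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    fx = (x - x0)[..., None]                  # fractional offsets, broadcast over channels
    fy = (y - y0)[..., None]
    n = np.arange(N)[:, None, None]           # frame index, pairs frame n with map n
    img = images.astype(np.float64)
    top = img[n, y0, x0] * (1 - fx) + img[n, y0, x1] * fx
    bottom = img[n, y1, x0] * (1 - fx) + img[n, y1, x1] * fx
    return top * (1 - fy) + bottom * fy
```

The 3D remap would add a `map_z`, a third floor/fraction pair and eight corner gathers (trilinear), with the same broadcasting over the leading axis.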
Copy make border, i.e. pad (cv2 analogue: `cv2.copyMakeBorder`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H+top+bottom, W+left+right, C). Pad height and width. Extends `cv2.copyMakeBorder`.
- 3D volume: (D, H, W, C) → (D+front+back, H+top+bottom, W+left+right, C). Pad in all three directions. No cv2 equivalent.
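Padding is the easiest op to batch, because `np.pad` already takes one (before, after) pair per axis. A sketch, assuming a hypothetical `copy_make_border` with string border names; the mapping from OpenCV border types to `np.pad` modes is our reading of both docs (the patterns in the comments follow the OpenCV notation).

```python
import numpy as np

# Assumed mapping from OpenCV border names to np.pad modes.
BORDER_TO_NP_MODE = {
    "BORDER_CONSTANT": "constant",     # iiiiii|abcdefgh|iiiiiii
    "BORDER_REPLICATE": "edge",        # aaaaaa|abcdefgh|hhhhhhh
    "BORDER_REFLECT": "symmetric",     # fedcba|abcdefgh|hgfedcb
    "BORDER_REFLECT_101": "reflect",   # gfedcb|abcdefgh|gfedcba
    "BORDER_WRAP": "wrap",             # cdefgh|abcdefgh|abcdefg
}

def copy_make_border(arr, top, bottom, left, right, front=0, back=0,
                     border="BORDER_CONSTANT", value=0):
    """Pad (H, W, C), (N, H, W, C) or (D, H, W, C) in a single np.pad call.

    front/back pad the leading axis and are meant for volumes (D, H, W, C);
    for a video (N, H, W, C) they should stay 0 so no frames are invented.
    """
    widths = [(top, bottom), (left, right), (0, 0)]  # H, W, C (channels are never padded)
    if arr.ndim == 4:
        widths = [(front, back)] + widths            # leading N/D axis
    elif front or back:
        raise ValueError("front/back padding needs a 4D (D, H, W, C) input")
    mode = BORDER_TO_NP_MODE[border]
    kwargs = {"constant_values": value} if mode == "constant" else {}
    return np.pad(arr, widths, mode=mode, **kwargs)
```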
Blur, box filter (cv2 analogue: `cv2.blur`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Rectangular-kernel blur per frame. Extends `cv2.blur`.
- 3D volume: see Blur3D below (3D-only op).
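A sketch of a batched box filter via a summed-area table (integral image), assuming a hypothetical `box_blur_batch` restricted to odd kernel sizes. Each output pixel costs four lookups regardless of kernel size, and all frames are handled by the same cumulative sums. Borders use reflect-101 padding, which we believe is what OpenCV's `BORDER_DEFAULT` means.

```python
import numpy as np

def box_blur_batch(images: np.ndarray, ksize: tuple[int, int]) -> np.ndarray:
    """Mean filter over (N, H, W, C); ksize = (kh, kw), both odd. Returns float64."""
    kh, kw = ksize
    assert kh % 2 == 1 and kw % 2 == 1, "sketch only handles odd kernel sizes"
    N, H, W, C = images.shape
    ph, pw = kh // 2, kw // 2
    x = np.pad(images.astype(np.float64), ((0, 0), (ph, ph), (pw, pw), (0, 0)), mode="reflect")
    # Integral image with a leading zero row/column: S[:, i, j] = sum of x[:, :i, :j].
    S = np.zeros((N, x.shape[1] + 1, x.shape[2] + 1, C))
    S[:, 1:, 1:] = x.cumsum(axis=1).cumsum(axis=2)
    # Window sum for output (i, j) covers padded rows i..i+kh-1 and cols j..j+kw-1.
    window_sum = (S[:, kh:kh + H, kw:kw + W] - S[:, :H, kw:kw + W]
                  - S[:, kh:kh + H, :W] + S[:, :H, :W])
    return window_sum / (kh * kw)
```

The same idea extends to Blur3D by adding a third cumulative sum and the eight-corner inclusion-exclusion formula.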
Gaussian blur (cv2 analogue: `cv2.GaussianBlur`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Gaussian-kernel blur per frame. Extends `cv2.GaussianBlur`.
- 3D volume: see GaussianBlur3D below (3D-only op).
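Gaussian blur is separable, so the Gaussian-specific part is just the 1D kernel for each blurred axis (two for a 2D batch, three for GaussianBlur3D with per-axis sigma). A sketch with hypothetical helper names, using the formula given in the `cv2.getGaussianKernel` documentation: G_i = α·exp(−(i − (ksize − 1)/2)² / (2σ²)), with σ = 0.3·((ksize − 1)·0.5 − 1) + 0.8 when σ ≤ 0. OpenCV may use precomputed kernels for small sizes in that case, so values can differ slightly.

```python
import numpy as np

def gaussian_kernel_1d(ksize: int, sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian of odd length ksize, centred on index (ksize - 1) / 2."""
    if sigma <= 0:
        # Fallback sigma derived from the kernel size (formula from the OpenCV docs).
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    t = np.arange(ksize) - (ksize - 1) / 2
    g = np.exp(-(t ** 2) / (2 * sigma ** 2))
    return g / g.sum()  # alpha chosen so the weights sum to 1

def gaussian_kernels(ksizes, sigmas) -> list[np.ndarray]:
    """One kernel per blurred axis: (kH, kW) for a 2D batch, (kD, kH, kW) for GaussianBlur3D."""
    return [gaussian_kernel_1d(k, s) for k, s in zip(ksizes, sigmas)]
```

These kernels can then be applied axis by axis with any separable-filter routine, such as the SepFilter2D/SepFilter3D operations requested here.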
Median blur (cv2 analogue: `cv2.medianBlur`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Median filter per frame. Extends `cv2.medianBlur`.
- 3D volume: see MedianBlur3D below (3D-only op).
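A sketch of a batched median filter, assuming a hypothetical `median_blur_batch` and an odd `ksize`. `sliding_window_view` exposes every ksize × ksize neighbourhood of every frame as extra axes without copying, and one `np.median` call reduces them. Borders replicate the edge pixel, which we believe matches the OpenCV documentation's note that `medianBlur` uses `BORDER_REPLICATE` internally. Note that `np.median` materializes a ksize² larger temporary, so memory grows quickly for large kernels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_blur_batch(images: np.ndarray, ksize: int) -> np.ndarray:
    """Median over a ksize x ksize window for every frame of (N, H, W, C) at once."""
    r = ksize // 2
    padded = np.pad(images, ((0, 0), (r, r), (r, r), (0, 0)), mode="edge")
    # windows has shape (N, H, W, C, ksize, ksize) and is a view, not a copy.
    windows = sliding_window_view(padded, (ksize, ksize), axis=(1, 2))
    return np.median(windows, axis=(-2, -1)).astype(images.dtype)
```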
Filter2D, 2D convolution (cv2 analogue: `cv2.filter2D`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Custom 2D kernel applied to each frame, with either one shared kernel or one kernel per batch element. Extends `cv2.filter2D`.
- 3D volume: see Filter3D below (3D-only op).
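A sketch supporting both a shared kernel and one kernel per frame, assuming a hypothetical `filter2d_batch` with odd kernel sizes and a centred anchor. Like `cv2.filter2D` it computes correlation (the kernel is not flipped), and it uses reflect-101 borders. `np.einsum` pairs frame n with kernel n, so per-frame kernels need no loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter2d_batch(images: np.ndarray, kernels) -> np.ndarray:
    """Correlate each frame of (N, H, W, C) with a (kh, kw) kernel.

    kernels: (kh, kw) shared by all frames, or (N, kh, kw) with one kernel per frame.
    Returns float64 of shape (N, H, W, C).
    """
    N = images.shape[0]
    kernels = np.asarray(kernels, dtype=np.float64)
    kernels = np.broadcast_to(kernels, (N,) + kernels.shape[-2:])
    kh, kw = kernels.shape[1:]
    padded = np.pad(images.astype(np.float64),
                    ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)), mode="reflect")
    windows = sliding_window_view(padded, (kh, kw), axis=(1, 2))  # (N, H, W, C, kh, kw)
    # Sum over the window axes i, j; the shared index n pairs frame n with kernel n.
    return np.einsum("nhwcij,nij->nhwc", windows, kernels)
```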
SepFilter2D, separable 2D filter (cv2 analogue: `cv2.sepFilter2D`)
We need two transforms:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Separable filter with (`kernelX`, `kernelY`) applied to each frame. Extends `cv2.sepFilter2D`.
- 3D volume: see SepFilter3D below (3D-only op).
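A sketch of a separable filter that works on arbitrary axes, assuming a hypothetical `sep_filter` with odd kernel lengths, a centred anchor and reflect-101 borders. `axes=(1, 2)` gives SepFilter2D on every frame of (N, H, W, C), and `axes=(0, 1, 2)` gives SepFilter3D on (D, H, W, C). The Python loop runs over the two or three axes only, never over frames.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sep_filter(arr: np.ndarray, kernels, axes) -> np.ndarray:
    """Apply one 1D correlation kernel per axis (e.g. kernelY on H, kernelX on W)."""
    out = arr.astype(np.float64)
    for axis, k in zip(axes, kernels):
        k = np.asarray(k, dtype=np.float64)
        r = len(k) // 2
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="reflect")
        # The window along `axis` is appended as the last dimension; contract it with k.
        out = sliding_window_view(padded, len(k), axis=axis) @ k
    return out
```

Note the argument order here is by axis (Y before X for (N, H, W, C)), whereas `cv2.sepFilter2D` takes `kernelX` before `kernelY`.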
Erode / Dilate / MorphologyEx (cv2 analogues: `cv2.erode`, `cv2.dilate`, `cv2.morphologyEx`)
We need two variants:
- 2D batch (direct cv2 extension): (N, H, W, C) → (N, H, W, C). Morphological ops per frame with a 2D structuring element. Extends cv2 erode/dilate/morphologyEx.
- 3D volume: see Erode3D / Dilate3D / MorphologyEx3D below (3D-only ops).
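A sketch of batched erosion and dilation with an arbitrary boolean structuring element, assuming hypothetical `erode_batch` / `dilate_batch` names and integer image dtypes (because `np.iinfo` is used for the border value). The border is padded with the dtype's max for erosion and its min for dilation so that it never wins the min/max; we believe this mirrors OpenCV's default morphology border, but that is an assumption. MorphologyEx ops compose these two, e.g. opening = dilate(erode(x)).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _morph(images, kernel, reduce, pad_value):
    """Shared body of erode/dilate on (N, H, W, C) with a boolean (kh, kw) element."""
    kernel = np.asarray(kernel, dtype=bool)
    kh, kw = kernel.shape
    padded = np.pad(images, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)),
                    mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, (kh, kw), axis=(1, 2))  # (N, H, W, C, kh, kw)
    # Boolean-mask the window axes: only positions set in the element take part.
    return reduce(windows[..., kernel], axis=-1)

def erode_batch(images, kernel):
    return _morph(images, kernel, np.min, np.iinfo(images.dtype).max)

def dilate_batch(images, kernel):
    return _morph(images, kernel, np.max, np.iinfo(images.dtype).min)
```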
3D-only operations (no cv2 equivalent)
These operate on volumes (D, H, W, C) and have no 2D cv2 counterpart; they are the natural 3D extensions of the 2D ops above.
- Blur3D: box filter in 3D. (D, H, W, C) → (D, H, W, C). Rectangular 3D kernel.
- GaussianBlur3D: Gaussian blur in 3D. (D, H, W, C) → (D, H, W, C). Kernel size and/or sigma per axis.
- MedianBlur3D: median filter in 3D. (D, H, W, C) → (D, H, W, C).
- Filter3D: 3D convolution with a custom 3D kernel. (D, H, W, C) → (D, H, W, C).
- SepFilter3D: separable 3D filter (e.g. kernelD, kernelH, kernelW). (D, H, W, C) → (D, H, W, C). More efficient than a full 3D kernel when the kernel is separable.
- Erode3D / Dilate3D / MorphologyEx3D: morphological ops with a 3D structuring element. (D, H, W, C) → (D, H, W, C).
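Several of these 3D-only ops reduce a (kd, kh, kw) neighbourhood of every voxel, so one vectorized window view covers them. A sketch, assuming a hypothetical `window_reduce_3d` with odd window sizes and replicated-edge borders: `reduce=np.median` gives a MedianBlur3D, `np.max` / `np.min` give Dilate3D / Erode3D with a cuboid structuring element, and `np.mean` gives a Blur3D (with this border choice). For max/min, edge replication gives the same result as an infinite border value, because the replicated voxel is already inside any window that reaches the border.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_reduce_3d(vol: np.ndarray, ksize: tuple[int, int, int], reduce) -> np.ndarray:
    """Apply `reduce` over a (kd, kh, kw) neighbourhood of every voxel of (D, H, W, C)."""
    pads = [(k // 2, k // 2) for k in ksize] + [(0, 0)]      # never pad channels
    padded = np.pad(vol, pads, mode="edge")
    windows = sliding_window_view(padded, ksize, axis=(0, 1, 2))  # (D, H, W, C, kd, kh, kw)
    return reduce(windows, axis=(-3, -2, -1))
```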
Why this matters
- Videos (N, H, W, C): today we loop over N and call `cv2.flip`, `cv2.resize`, `cv2.warpAffine`, etc. per frame. That loses SIMD/GPU batch optimizations and adds Python overhead.
- Volumes (D, H, W, C): the same problem applies to 3D data, where we loop over D and call the same cv2 APIs. We need one call that applies along the leading dimension. Specifying both N (video) and D (volume) makes it clear we need the same semantics for both shapes.
- Consistency: all of the above operations are used in augmentation pipelines (e.g. AlbumentationsX). Having them in a single backend with consistent "shape we want" semantics would let us support video and volume augmentation without per-frame loops.
Summary table
2D batch = direct cv2 extension: (N, H, W, C) → (N, H', W', C) (or same spatial size). 3D volume = (D, H, W, C) → (D', H', W', C) (or same size).
| Operation | cv2 / current API | 2D batch (N,H,W,C) | 3D volume (D,H,W,C) |
|---|---|---|---|
| Flip | cv2.flip | Yes (flip along any axis) | Yes (axis 0,1,2 or combo, like numpy.flip) |
| Warp affine | cv2.warpAffine | (N,H,W,C)→(N,H',W',C) | (D,H,W,C)→(D',H',W',C) |
| Warp perspective | cv2.warpPerspective | (N,H,W,C)→(N,H',W',C) | (D,H,W,C)→(D',H',W',C) |
| Resize | cv2.resize | (N,H,W,C)→(N,H',W',C) | (D,H,W,C)→(D',H',W',C) (resize in D,H,W) |
| Remap | cv2.remap | Yes | Yes (3D coordinate maps) |
| Copy make border | cv2.copyMakeBorder | Yes (pad H,W) | Yes (pad D,H,W) |
| Blur | cv2.blur | Yes | Blur3D |
| Gaussian blur | cv2.GaussianBlur | Yes | GaussianBlur3D |
| Median blur | cv2.medianBlur | Yes | MedianBlur3D |
| Filter2D | cv2.filter2D | Yes | Filter3D |
| SepFilter2D | cv2.sepFilter2D | Yes | SepFilter3D |
| Erode / Dilate / MorphologyEx | cv2.erode, dilate, morphologyEx | Yes | Erode3D / Dilate3D / MorphologyEx3D |
3D-only (no cv2 equivalent): Blur3D, GaussianBlur3D, MedianBlur3D, Filter3D, SepFilter3D, Erode3D, Dilate3D, MorphologyEx3D — all on (D, H, W, C).
All 2D batch ops should accept a single image (H, W, C) and a video (N, H, W, C). All 3D ops should accept a volume (D, H, W, C). Each op should process the whole array in a single call, without Python loops.
Can you contribute to the implementation?
- I can contribute
Is your feature request specific to a certain interface?
It applies to everything
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct