
Feature: Axis-aware / batched APIs to avoid Python loops #311

@ternaus

Description


Describe what you are looking for

Summary

We are evaluating NumKong for use in AlbumentationsX (image augmentation library). NumKong’s single-vector APIs are fast, but in real code we often need reductions along an axis or batched operations (e.g. per column, per row). Right now we have to loop in Python and call NumKong once per column/row, which loses to NumPy’s vectorized axis= support.

Unifying request: Expose the same operations you already have (sum, moments, min, max, argmin, argmax, scale, euclidean, dot), but with axis or batch semantics so a single NumKong call can process many vectors without a Python loop.

Below are the concrete APIs we need. Each follows the same pattern: “we need this operation over an axis or over a batch, not just on a single vector.”


1. Elementwise ops on 2D/3D arrays

Current: nk.scale(a, alpha=..., beta=...) requires 1D; we use scale(img.ravel(), ...).reshape(img.shape).

Request: Support 2D and 3D arrays (e.g. (H, W) or (H, W, C)) with elementwise semantics, so we don’t have to flatten/reshape. Same for other elementwise ops (e.g. blend, fma) if they are currently 1D-only.
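To pin down the elementwise semantics we have in mind, here is a sketch in plain NumPy (`scale_ref` is a placeholder name for illustration, not an existing or proposed NumKong function):

```python
import numpy as np

def scale_ref(a, alpha=1.0, beta=0.0):
    # Reference semantics for an N-D scale: alpha * a + beta, elementwise,
    # preserving the input shape. Today the 1D-only API forces a
    # ravel()/reshape() round-trip as a workaround.
    return alpha * np.asarray(a) + beta

img = np.arange(12, dtype=np.float32).reshape(2, 2, 3)  # (H, W, C)
out = scale_ref(img, alpha=2.0, beta=1.0)
assert out.shape == img.shape  # no flatten/reshape needed
```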


2. Sum and moments along an axis

Current: nk.moments(nk.Tensor(x)) returns one (sum, sum_sq) for a 1D array. For 2D we either flatten (losing axis structure) or loop over columns.

Request:

  • Either Tensor.sum(axis=None | 0 | 1) with standard axis semantics,
  • and/or moments(..., axis=...) returning (sum, sum_sq) per column or per row for 2D arrays.

Example: for x with shape (n_samples, n_features), moments(x, axis=0) would return two arrays of shape (n_features,) (sum and sum_sq per feature). That would replace a Python loop over columns and make NumKong competitive for StandardScaler-style mean/var computation.
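The requested semantics, expressed as a NumPy reference sketch (`moments_ref` is a placeholder name, not part of NumKong's API):

```python
import numpy as np

def moments_ref(x, axis=None):
    # Requested semantics: per-axis sum and sum of squares in one pass.
    x = np.asarray(x)
    return x.sum(axis=axis), (x * x).sum(axis=axis)

x = np.arange(12, dtype=np.float64).reshape(3, 4)  # (n_samples, n_features)
s, sq = moments_ref(x, axis=0)                     # both have shape (4,)
mean = s / x.shape[0]
var = sq / x.shape[0] - mean ** 2  # StandardScaler-style mean/var per feature
```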


3. Min, max, argmin, argmax along an axis

Current: Tensor.minmax() and argmin/argmax() work on a single 1D array. For 2D we loop over columns.

Request:

  • Tensor.minmax(axis=0) (and optionally axis=1) on 2D arrays, returning four arrays of shape (n_cols,): min, argmin, max, argmax (same as np.min(x, axis=0), np.argmin(x, axis=0), etc.).
  • Or separate Tensor.argmin(axis=...) and Tensor.argmax(axis=...) with the same axis semantics.

This would help in peak-finding (e.g. keypoint recovery from distance maps) and other axis-wise reductions without Python loops.
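For clarity, the combined variant sketched in NumPy (`minmax_ref` is a placeholder name for illustration):

```python
import numpy as np

def minmax_ref(x, axis=0):
    # Requested semantics: min, argmin, max, argmax along an axis in one call,
    # matching np.min(x, axis=...), np.argmin(x, axis=...), etc.
    x = np.asarray(x)
    return (x.min(axis=axis), x.argmin(axis=axis),
            x.max(axis=axis), x.argmax(axis=axis))

x = np.array([[3, 1],
              [2, 5],
              [0, 4]])
mn, amn, mx, amx = minmax_ref(x, axis=0)  # each of shape (n_cols,)
```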


4. Per-row L2 norms (batched euclidean from zero)

Current: nk.euclidean(a, b) for two 1D vectors. To get per-row norms we loop: for each row r, call nk.euclidean(r, zeros).

Request: A batched API, e.g. nk.row_norms(matrix) or nk.euclidean_norms(matrix), where matrix has shape (n, d) and the result has shape (n,) (L2 norm of each row). Alternatively, a batched “distance from zero” that returns the same.

Use case: normalizing rows (e.g. stain vectors in histology: vectors / np.sqrt(np.sum(vectors**2, axis=1, keepdims=True))).
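The requested behavior, sketched in NumPy (`row_norms_ref` is a placeholder name; nk.row_norms / nk.euclidean_norms above are only suggested spellings):

```python
import numpy as np

def row_norms_ref(matrix):
    # Requested semantics: L2 norm of each row of an (n, d) matrix -> shape (n,).
    m = np.asarray(matrix, dtype=np.float64)
    return np.sqrt(np.einsum('ij,ij->i', m, m))

vectors = np.array([[3.0, 4.0], [0.0, 5.0]])
norms = row_norms_ref(vectors)    # -> array([5., 5.])
unit = vectors / norms[:, None]   # the row-normalization use case above
```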


5. Batched dot (matrix–vector products)

Current: nk.dot(a, b) for two 1D vectors. For “each row of A dotted with b” we loop over rows.

Request: e.g. nk.dot(A, b) with A shape (n, d) and b shape (d,) returning shape (n,), i.e. one dot product per row, in a single call. Same idea for other batch sizes as needed.
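The requested semantics are exactly a matrix–vector product; a NumPy reference sketch (`batched_dot_ref` is a placeholder name, not an existing NumKong function):

```python
import numpy as np

def batched_dot_ref(A, b):
    # Requested semantics: one dot product per row of A, replacing the
    # per-row Python loop `[nk.dot(r, b) for r in A]`.
    return np.asarray(A) @ np.asarray(b)

A = np.arange(6, dtype=np.float64).reshape(3, 2)  # (n, d)
b = np.array([1.0, 2.0])                          # (d,)
out = batched_dot_ref(A, b)                       # shape (n,)
```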


6. Documentation of shape/axis support

Current: It’s not always clear which functions accept only 1D and which (if any) support 2D and axis=. We hit “All tensors must be vectors” for scale on 2D.

Request: In the docs, for each function, clearly state accepted shapes (e.g. “1D only”, “1D or 2D with optional axis=…”). If axis/batch support is planned, a short note in the README or a “Roadmap” section would help downstream integrators plan.


Why this matters

We benchmarked NumKong vs NumPy/albucore. NumKong wins when we can call it once on a large buffer (e.g. scale on a flattened image, single euclidean). It loses when we have to loop (e.g. moments per column, argmin per column, euclidean per row). Providing axis-aware or batched versions of the same operations would let us use NumKong in many more hot paths without sacrificing performance to Python loop overhead.

Thank you for considering these requests.

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

No response

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
