
Feature: Axis-aware / batched APIs to avoid Python loops #311

@ternaus

Description


Describe what you are looking for

Summary

We are evaluating NumKong for use in AlbumentationsX (image augmentation library). NumKong’s single-vector APIs are fast, but in real code we often need reductions along an axis or batched operations (e.g. per column, per row). Right now we have to loop in Python and call NumKong once per column/row, which loses to NumPy’s vectorized axis= support.

Unifying request: Expose the same operations you already have (sum, moments, min, max, argmin, argmax, scale, euclidean, dot), but with axis or batch semantics so a single NumKong call can process many vectors without a Python loop.

Below are the concrete APIs we need. Each follows the same pattern: “we need this operation over an axis or over a batch, not just on a single vector.”


1. Elementwise ops on 2D/3D arrays

Current: nk.scale(a, alpha=..., beta=...) requires 1D; we use scale(img.ravel(), ...).reshape(img.shape).

Request: Support 2D and 3D arrays (e.g. (H, W) or (H, W, C)) with elementwise semantics, so we don’t have to flatten/reshape. Same for other elementwise ops (e.g. blend, fma) if they are currently 1D-only.
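To pin down the elementwise semantics we have in mind, here is a sketch in plain NumPy (`scale_ref` is a placeholder name for illustration, not an existing or proposed NumKong function):

```python
import numpy as np

def scale_ref(a, alpha=1.0, beta=0.0):
    # Reference semantics for an N-D scale: alpha * a + beta, elementwise,
    # preserving the input shape. Today the 1D-only API forces a
    # ravel()/reshape() round-trip as a workaround.
    return alpha * np.asarray(a) + beta

img = np.arange(12, dtype=np.float32).reshape(2, 2, 3)  # (H, W, C)
out = scale_ref(img, alpha=2.0, beta=1.0)
assert out.shape == img.shape  # no flatten/reshape needed
```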


2. Sum and moments along an axis

Current: nk.moments(nk.Tensor(x)) returns one (sum, sum_sq) for a 1D array. For 2D we either flatten (losing axis structure) or loop over columns.

Request:

  • Either Tensor.sum(axis=None | 0 | 1) with standard axis semantics,
  • and/or moments(..., axis=...) returning (sum, sum_sq) per column or per row for 2D arrays.

Example: for x with shape (n_samples, n_features), moments(x, axis=0) would return two arrays of shape (n_features,) (sum and sum_sq per feature). That would replace a Python loop over columns and make NumKong competitive for StandardScaler-style mean/var computation.
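The requested semantics, expressed as a NumPy reference sketch (`moments_ref` is a placeholder name, not part of NumKong's API):

```python
import numpy as np

def moments_ref(x, axis=None):
    # Requested semantics: per-axis sum and sum of squares in one pass.
    x = np.asarray(x)
    return x.sum(axis=axis), (x * x).sum(axis=axis)

x = np.arange(12, dtype=np.float64).reshape(3, 4)  # (n_samples, n_features)
s, sq = moments_ref(x, axis=0)                     # both have shape (4,)
mean = s / x.shape[0]
var = sq / x.shape[0] - mean ** 2  # StandardScaler-style mean/var per feature
```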


3. Min, max, argmin, argmax along an axis

Current: Tensor.minmax() and argmin/argmax() work on a single 1D array. For 2D we loop over columns.

Request:

  • Tensor.minmax(axis=0) (and optionally axis=1) on 2D arrays, returning four arrays of shape (n_cols,): min, argmin, max, argmax (same as np.min(x, axis=0), np.argmin(x, axis=0), etc.).
  • Or separate Tensor.argmin(axis=...) and Tensor.argmax(axis=...) with the same axis semantics.

This would help in peak-finding (e.g. keypoint recovery from distance maps) and other axis-wise reductions without Python loops.
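For clarity, the combined variant sketched in NumPy (`minmax_ref` is a placeholder name for illustration):

```python
import numpy as np

def minmax_ref(x, axis=0):
    # Requested semantics: min, argmin, max, argmax along an axis in one call,
    # matching np.min(x, axis=...), np.argmin(x, axis=...), etc.
    x = np.asarray(x)
    return (x.min(axis=axis), x.argmin(axis=axis),
            x.max(axis=axis), x.argmax(axis=axis))

x = np.array([[3, 1],
              [2, 5],
              [0, 4]])
mn, amn, mx, amx = minmax_ref(x, axis=0)  # each of shape (n_cols,)
```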


4. Per-row L2 norms (batched euclidean from zero)

Current: nk.euclidean(a, b) for two 1D vectors. To get per-row norms we loop: for each row r, call nk.euclidean(r, zeros).

Request: A batched API, e.g. nk.row_norms(matrix) or nk.euclidean_norms(matrix), where matrix has shape (n, d) and the result has shape (n,) (L2 norm of each row). Alternatively, a batched “distance from zero” that returns the same.

Use case: normalizing rows (e.g. stain vectors in histology: vectors / np.sqrt(np.sum(vectors**2, axis=1, keepdims=True))).
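The requested behavior, sketched in NumPy (`row_norms_ref` is a placeholder name; nk.row_norms / nk.euclidean_norms above are only suggested spellings):

```python
import numpy as np

def row_norms_ref(matrix):
    # Requested semantics: L2 norm of each row of an (n, d) matrix -> shape (n,).
    m = np.asarray(matrix, dtype=np.float64)
    return np.sqrt(np.einsum('ij,ij->i', m, m))

vectors = np.array([[3.0, 4.0], [0.0, 5.0]])
norms = row_norms_ref(vectors)    # -> array([5., 5.])
unit = vectors / norms[:, None]   # the row-normalization use case above
```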


5. Batched dot (matrix–vector products)

Current: nk.dot(a, b) for two 1D vectors. For “each row of A dotted with b” we loop over rows.

Request: e.g. nk.dot(A, b) with A shape (n, d) and b shape (d,) returning shape (n,), i.e. one dot product per row, in a single call. Same idea for other batch sizes as needed.
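The requested semantics are exactly a matrix–vector product; a NumPy reference sketch (`batched_dot_ref` is a placeholder name, not an existing NumKong function):

```python
import numpy as np

def batched_dot_ref(A, b):
    # Requested semantics: one dot product per row of A, replacing the
    # per-row Python loop `[nk.dot(r, b) for r in A]`.
    return np.asarray(A) @ np.asarray(b)

A = np.arange(6, dtype=np.float64).reshape(3, 2)  # (n, d)
b = np.array([1.0, 2.0])                          # (d,)
out = batched_dot_ref(A, b)                       # shape (n,)
```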


6. Documentation of shape/axis support

Current: It’s not always clear which functions accept only 1D and which (if any) support 2D and axis=. We hit “All tensors must be vectors” for scale on 2D.

Request: In the docs, for each function, clearly state accepted shapes (e.g. “1D only”, “1D or 2D with optional axis=…”). If axis/batch support is planned, a short note in the README or a “Roadmap” section would help downstream integrators plan.


Why this matters

We benchmarked NumKong vs NumPy/albucore. NumKong wins when we can call it once on a large buffer (e.g. scale on a flattened image, single euclidean). It loses when we have to loop (e.g. moments per column, argmin per column, euclidean per row). Providing axis-aware or batched versions of the same operations would let us use NumKong in many more hot paths without sacrificing performance to Python loop overhead.

Thank you for considering these requests.

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

No response

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
