
Add tests, docstrings and validation to metrics.py#5284

Open
us wants to merge 1 commit into google:main from us:worktree-metrics-tests-docstrings

Conversation

Contributor

@us us commented Feb 26, 2026

What does this PR do?

Addresses the TODO at flax/nnx/training/metrics.py line 26 (# TODO: add tests and docstrings) and the MultiMetric validation TODOs at lines 389 and 410.

Changes

flax/nnx/training/metrics.py

  • Add docstring to Statistics dataclass (the only public class missing one)
  • Add reserved name validation in MultiMetric.__init__ — prevents metric names that shadow methods (reset, update, compute, split) or internal attributes (_metric_names, _expected_kwargs)
  • Add unmatched kwarg detection in MultiMetric.update — catches typos by validating keyword arguments against the expected set derived from underlying metrics via inspect.signature
  • Update MultiMetric.update docstring to document the new TypeError behavior
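The reserved-name check described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual flax implementation (the real `MultiMetric.__init__` may structure it differently; only the constant name `_MULTIMETRIC_RESERVED_NAMES` and the listed names come from this PR):

```python
# Minimal sketch of the reserved-name validation; the exact code in
# flax/nnx/training/metrics.py may differ.
_MULTIMETRIC_RESERVED_NAMES = frozenset(
    {"reset", "update", "compute", "split", "_metric_names", "_expected_kwargs"}
)

def validate_metric_names(metric_names):
    """Raise ValueError for names that would shadow MultiMetric attributes."""
    for name in metric_names:
        if name in _MULTIMETRIC_RESERVED_NAMES:
            raise ValueError(
                f"metric name {name!r} is reserved by MultiMetric; "
                "pick a different name"
            )
```

With this guard in place, `MultiMetric(reset=...)` fails fast with a clear message instead of silently overwriting the `reset` method.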

tests/nnx/metrics_test.py — 25 new tests across 4 new test classes:

  • TestAverage: initial NaN, single/multiple batch, reset, custom argname, missing argname, scalar float/int
  • TestWelford: multiple batches (mean + std + SEM), reset (all 3 stats), custom argname, missing argname
  • TestAccuracy: int64 label conversion, invalid float labels, threshold type error
  • TestMultiMetric: reserved name errors (parameterized over all 4), internal name errors (parameterized over 2), unmatched kwarg with guard assertion, compute dict structure, split/merge preservation, validation-disabled-with-var-keyword
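For context on what the TestWelford cases verify: Welford's online algorithm maintains a running mean and variance in a single pass. A minimal standalone sketch (this uses the population-variance convention for illustration; the exact convention of the nnx Welford metric is not shown in this PR):

```python
import math

def welford(values):
    """One-pass running mean, standard deviation, and standard error (SEM)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count          # update running mean
        m2 += delta * (x - mean)       # accumulate sum of squared deviations
    variance = m2 / count              # population variance (assumed convention)
    std = math.sqrt(variance)
    sem = std / math.sqrt(count)
    return mean, std, sem
```

The multiple-batch tests check that all three statistics match a batched computation, and the reset test checks that all three return to their initial state.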

Checklist

  • This PR fixes a minor issue (e.g., a typo or small bug) or improves the docs (you can dismiss the other checks if that's the case).
  • This change is discussed in a GitHub issue/discussion (please add a link).
  • The documentation and docstrings adhere to the documentation guidelines.
  • This change includes necessary high-coverage tests. (No quality testing = no merge!)
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the flax/nnx/training/metrics.py module by integrating crucial validation mechanisms and expanding its test coverage. The changes aim to improve the module's reliability and developer experience by catching common errors early, ensuring that metric definitions and updates adhere to expected patterns, and providing clear documentation for key data structures.

Highlights

  • Docstring and Validation for metrics.py: Added a comprehensive docstring to the Statistics dataclass and implemented robust validation checks within the MultiMetric class to prevent naming conflicts and detect unexpected keyword arguments.
  • Reserved Name Validation: Introduced validation in MultiMetric.__init__ to prevent users from defining metrics with names that conflict with internal methods or attributes (e.g., 'reset', 'update', 'compute').
  • Keyword Argument Validation: Implemented dynamic validation in MultiMetric.update to detect and raise TypeError for unexpected keyword arguments, improving error detection for typos or incorrect usage.
  • Extensive Unit Tests: Added 25 new unit tests across four new test classes (TestAverage, TestWelford, TestAccuracy, TestMultiMetric) to ensure the correctness and robustness of the metrics.py module, covering various scenarios including validation logic.


Changelog
  • flax/nnx/training/metrics.py
    • Imported the inspect module for dynamic signature introspection.
    • Defined _MULTIMETRIC_RESERVED_NAMES to list keywords that cannot be used as metric names.
    • Added a detailed docstring to the Statistics dataclass.
    • Implemented validation in MultiMetric.__init__ to check for reserved metric names, raising a ValueError if a conflict is found.
    • Introduced _expected_kwargs attribute in MultiMetric to store keyword arguments expected by underlying metrics' update methods, dynamically determined using inspect.signature.
    • Added logic to MultiMetric.update to raise a TypeError if unexpected keyword arguments are provided, enhancing error detection.
    • Updated the docstring for MultiMetric.update to reflect the new TypeError behavior.
  • tests/nnx/metrics_test.py
    • Added TestAverage class with tests for initial NaN state, single/multiple batch updates, reset functionality, custom argument names, missing argument names, and scalar float/integer inputs.
    • Added TestWelford class with tests for multiple batch updates (mean, standard deviation, SEM), reset functionality, custom argument names, and missing argument names.
    • Added new tests to TestAccuracy for handling int64 labels, invalid float label dtypes, and TypeError for incorrect threshold types.
    • Added TestMultiMetric class with tests for ValueError when using reserved names, TypeError for unmatched keyword arguments, correct dictionary structure from compute, preservation of state after split/merge, and validation being correctly disabled when var_keyword arguments are present.
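The `_expected_kwargs` mechanism described in the changelog can be sketched as follows. This is an illustrative reconstruction (the helper names `expected_update_kwargs` and `check_update_kwargs` are hypothetical, not flax API); it shows how `inspect.signature` yields the expected keyword set and why a metric whose `update` accepts `**kwargs` disables validation:

```python
import inspect

def expected_update_kwargs(update_fns):
    """Collect keyword parameters accepted by the metrics' update functions.

    Returns None when any function accepts **kwargs, in which case
    validation is disabled (any keyword could be legal).
    """
    expected = set()
    for fn in update_fns:
        for param in inspect.signature(fn).parameters.values():
            if param.kind is inspect.Parameter.VAR_KEYWORD:
                return None  # arbitrary kwargs accepted: cannot validate
            if param.kind in (
                inspect.Parameter.POSITIONAL_OR_KEYWORD,
                inspect.Parameter.KEYWORD_ONLY,
            ):
                expected.add(param.name)
    return expected

def check_update_kwargs(expected, kwargs):
    """Raise TypeError for kwargs outside the expected set (typo guard)."""
    if expected is None:
        return
    unmatched = set(kwargs) - expected
    if unmatched:
        raise TypeError(f"unexpected keyword arguments: {sorted(unmatched)}")
```

For example, with metrics expecting `values` and `logits`/`labels`, a typo like `update(value=...)` raises a TypeError instead of being silently ignored.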
Activity
  • The author addressed the TODO at flax/nnx/training/metrics.py line 26 regarding adding tests and docstrings.
  • The author addressed the MultiMetric validation TODOs at lines 389 and 410.
  • The author confirmed adherence to documentation guidelines.
  • The author confirmed the inclusion of necessary high-coverage tests.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly improves flax/nnx/training/metrics.py by adding comprehensive tests, docstrings, and validation logic. The added validation in MultiMetric prevents the use of reserved names and catches typos in keyword arguments passed to update, which is a great enhancement for usability. The new tests are thorough and cover a wide range of scenarios, including edge cases. The code is well-written and the changes are well-documented. I have one suggestion to make the keyword argument validation in MultiMetric even more robust for custom user-defined metrics.

@us force-pushed the worktree-metrics-tests-docstrings branch from 8d076c5 to 20aecf5 on February 26, 2026 at 22:04.
