Skip to content

Conversation

@Shaobo-Zhou
Copy link
Contributor

@Shaobo-Zhou Shaobo-Zhou commented Nov 26, 2025

Description

This PR introduces a shaped, step-wise reward signal for the RL-based compiler.

For the figures of merit expected_fidelity and estimated_success_probability, the reward is computed in two regimes:

  1. Exact regime (native + mapped circuits)
    If the circuit consists only of device-native gates and respects the device’s coupling map, the step reward is based on the change in the exact calibration-aware metric between successive steps.

  2. Approximate regime (non-native / unmapped circuits)
    If the circuit still contains non-native gates or violates the device topology, a conservative canonical cost model is used to approximate the expected fidelity and ESP. The intermediate reward is then derived from the change in this approximate metric.

Checklist:

  • The pull request only contains commits that are focused and relevant to this change.
  • I have added appropriate tests that cover the new/changed functionality.
  • I have updated the documentation to reflect these changes.
  • I have added entries to the changelog for any noteworthy additions, changes, fixes, or removals.
  • I have added migration instructions to the upgrade guide (if needed).
  • The changes follow the project's style guidelines and introduce no new warnings.
  • The changes are fully tested and pass the CI checks.
  • I have reviewed my own code changes.
Shaobo Zhou and others added 30 commits March 29, 2025 19:20
Update action space and feature space Update actions Update action space
Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations Remove example_test.py Remove example_test.py
Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations
Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>
Fix bugs Fix bugs Fix bugs
Fix windows runtime warning problem Fix windows runtime warning issue
@Shaobo-Zhou Shaobo-Zhou changed the title Add step-wise intermediate reward Add stepwise intermediate reward Nov 26, 2025
@Shaobo-Zhou Shaobo-Zhou changed the title Add stepwise intermediate reward Add stepwise intermediate reward for RL Nov 26, 2025
@codecov
Copy link

codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 90.36145% with 24 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/mqt/predictor/rl/predictorenv.py 89.3% 19 Missing ⚠️
src/mqt/predictor/rl/cost_model.py 91.9% 5 Missing ⚠️

📢 Thoughts on this report? Let us know!

@Shaobo-Zhou Shaobo-Zhou marked this pull request as ready for review December 3, 2025 22:07
@coderabbitai
Copy link

coderabbitai bot commented Dec 3, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added device-aware canonical gate cost model with per-device cost tables.
    • Introduced dual-mode reward computation (exact/approximate) for enhanced flexibility.
    • Added intermediate rewards for RL reward design.
  • Bug Fixes

    • Fixed multiprocessing compatibility on Windows with Python 3.13+.
    • Improved idle-time decay calculations with discrete time unit handling.
  • Documentation

    • Updated setup guide with new benchmark configuration examples.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds a device-aware canonical cost model and dual-path RL reward computation (exact vs approximate), integrates dt-based idle decay into ESP, introduces a Windows/Python 3.13 multiprocessing fallback, updates RL environment step/reset and layout handling, and adjusts docs/tests and packaging metadata (BenchmarkLevel -> INDEP, new pyproject sdist config).

Changes

Cohort / File(s) Change Summary
Changelog & Docs
CHANGELOG.md, docs/setup.md
Updated CHANGELOG entries and PR links; example in setup changed BenchmarkLevel.ALGBenchmarkLevel.INDEP.
Packaging
pyproject.toml
Added [tool.check-sdist] with sdist-only entry and a new warnings filter for tests.
ML predictor parallelism
src/mqt/predictor/ml/predictor.py
Added module-level NO_PARALLEL constant and conditional single-thread fallbacks for compile_training_circuits, generate_training_data, and train_random_forest_model to avoid Windows/Python 3.13 multiprocessing issues.
Reward idle-time scaling
src/mqt/predictor/reward.py
Incorporates discrete time unit dt into idle-time exponential decay for single-qubit delay gates in estimated_success_probability.
Utilities / Logging
src/mqt/predictor/utils.py
Replaced Windows RuntimeWarning with logger.info for timeout-not-supported path; removed warnings.warn usage.
RL cost model (new module)
src/mqt/predictor/rl/cost_model.py
New device-aware canonical cost tables (ibm_torino, ankaa_3, emerald) and APIs: get_cost_table, canonical_cost, estimate_counts, approx_expected_fidelity, approx_estimated_success_probability with fallbacks and clamping.
RL environment & reward refactor
src/mqt/predictor/rl/predictorenv.py
Major changes: new `calculate_reward(mode="auto"
Tests — RL compilation
tests/compilation/test_predictor_rl.py
Tests updated to use BenchmarkLevel.INDEP; added cost-model imports; new tests for unknown-device/gate fallback and calculate_reward branches; mapping checks via CheckMap.
Tests — ML device selection
tests/device_selection/test_predictor_ml.py
Switched several test cases from ALG to INDEP, adjusting generated benchmark inputs and prediction scenarios.
Tests — Hellinger distance
tests/hellinger_distance/test_estimated_hellinger_distance.py
Removed Windows-only warning branch, increased timeout, removed num_workers from calls and unified call signatures to match ML predictor changes.

Sequence Diagram

sequenceDiagram actor Agent as RL Agent participant Env as PredictorEnv participant Calc as calculate_reward() participant Cost as cost_model participant Dev as Device Averages Cache participant Reward as reward.py Agent->>Env: step(action) Env->>Calc: calculate_reward(mode="auto") -- pre-action alt auto → native & mapped Calc->>Dev: _ensure_device_averages_cached() Dev-->>Calc: device averages (p1,p2,tau1,tau2,tbar) Calc->>Reward: expected_fidelity / estimated_success_probability (device data) Reward-->>Calc: exact_reward else auto → fallback Calc->>Cost: approx_expected_fidelity / approx_estimated_success_probability (cost model) Cost-->>Calc: approx_reward end Env->>Env: apply action (via _apply_qiskit_action / pm_property_set) Env->>Calc: calculate_reward(mode=exact/approx) -- post-action Calc-->>Env: post_reward Env->>Env: delta = post_reward - pre_reward → shape & apply penalties Env-->>Agent: observation, reward (delta-shaped), done, info 
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • src/mqt/predictor/rl/predictorenv.py — reward flow, device-average extraction, new signatures and pm_property_set propagation.
    • src/mqt/predictor/rl/cost_model.py — correctness of canonical tables, gate-cost mapping, and ESP/fidelity approximations.
    • src/mqt/predictor/ml/predictor.py and tests/hellinger_distance/* — parallelism fallback impact and updated call signatures.
    • Tests added/modified for cost-model and calculate_reward branches (validate assertions and monkeypatches).

Poem

🐰
I nibble costs from gates so fine,
dual-path rewards now hop in line.
Idle time counts with dt's small beat,
Windows rests while threads take a seat.
Circuits map, rewards sing — what a treat!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add stepwise intermediate reward for RL' accurately captures the main objective of the PR, which is to introduce a shaped, step-wise reward signal for the RL-based compiler.
Description check ✅ Passed The PR description provides a clear summary of changes, explains the two reward regimes (exact and approximate), and includes the template checklist. However, none of the checklist items are marked as complete, and the description lacks explicit statements about the status of testing, documentation, and other checklist items.
Docstring Coverage ✅ Passed Docstring coverage is 85.29% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8259b4d and 4b5bf6f.

📒 Files selected for processing (1)
  • pyproject.toml (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-10T08:09:54.528Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/core PR: 1246 File: bindings/CMakeLists.txt:0-0 Timestamp: 2025-10-10T08:09:54.528Z Learning: In the Munich Quantum Toolkit (MQT) Core project, scikit-build-core is configured with `wheel.install-dir = "mqt/core"` in pyproject.toml, which automatically prefixes all CMake `DESTINATION` paths with `mqt/core/` during wheel installation. Therefore, CMake install destinations are relative to the `mqt/core` package namespace, not `site-packages`. 

Applied to files:

  • pyproject.toml
📚 Learning: 2025-10-10T08:10:16.394Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/core PR: 1246 File: test/python/na/test_na_fomac.py:35-0 Timestamp: 2025-10-10T08:10:16.394Z Learning: In the munich-quantum-toolkit/core repository, scikit-build-core is configured with `wheel.install-dir = "mqt/core"` in pyproject.toml, which means CMake `install()` commands with `DESTINATION <path>` install files relative to `mqt/core/` in the wheel, making them accessible via `files("mqt.core").joinpath("<path>")`. 

Applied to files:

  • pyproject.toml
📚 Learning: 2025-12-02T10:08:36.022Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/core-plugins-catalyst PR: 1 File: .github/workflows/cd.yml:40-48 Timestamp: 2025-12-02T10:08:36.022Z Learning: In munich-quantum-toolkit workflows, the reusable-python-packaging-sdist.yml workflow uploads the sdist artifact with the name `cibw-sdist` (or `dev-cibw-sdist` for pull requests), which means it is covered by the `cibw-*` glob pattern when downloading artifacts for deployment. 

Applied to files:

  • pyproject.toml
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
  • GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
  • GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04
🔇 Additional comments (2)
pyproject.toml (2)

130-130: Verify the warning filter pattern matches the actual cost model warning.

The new warning filter suppresses "No canonical cost table defined for device..." messages. Ensure that:

  1. The canonical cost model code emits this exact warning message during tests.
  2. The regex pattern correctly matches all expected variations.
  3. This suppression is scoped appropriately (only warnings during tests, not production).

97-98: Verify check-sdist tool integration in build pipeline.

The new [tool.check-sdist] configuration correctly marks _version.py as sdist-only, which aligns with the hatch-vcs versioning strategy (the version file is auto-generated per build). Ensure that check-sdist is properly integrated into your build and CI workflows to validate this constraint during the build process.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad5ae0b and 0c42b41.

📒 Files selected for processing (13)
  • CHANGELOG.md (3 hunks)
  • docs/setup.md (1 hunks)
  • noxfile.py (3 hunks)
  • pyproject.toml (1 hunks)
  • src/mqt/predictor/ml/predictor.py (5 hunks)
  • src/mqt/predictor/reward.py (1 hunks)
  • src/mqt/predictor/rl/cost_model.py (1 hunks)
  • src/mqt/predictor/rl/parsing.py (1 hunks)
  • src/mqt/predictor/rl/predictorenv.py (10 hunks)
  • src/mqt/predictor/utils.py (1 hunks)
  • tests/compilation/test_predictor_rl.py (5 hunks)
  • tests/device_selection/test_predictor_ml.py (3 hunks)
  • tests/hellinger_distance/test_estimated_hellinger_distance.py (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-10-14T14:37:38.047Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/yaqs PR: 212 File: CHANGELOG.md:12-15 Timestamp: 2025-10-14T14:37:38.047Z Learning: In the munich-quantum-toolkit/yaqs project, changelog entries follow the template: "- $TITLE ([#$NUMBER]($URL)) ([**AUTHOR**](https://github.com/$AUTHOR))". Issue references should not be included in changelog entries; the PR number is sufficient for traceability. 

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-11-05T09:23:46.540Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/core PR: 1287 File: src/qdmi/dd/Device.cpp:492-521 Timestamp: 2025-11-05T09:23:46.540Z Learning: In the munich-quantum-toolkit/core repository, the `qasm3::Importer::imports()` function is backwards compatible with OpenQASM 2 programs. Therefore, it can be used to parse both QASM2 and QASM3 program formats without requiring separate importers for each version. 

Applied to files:

  • src/mqt/predictor/rl/parsing.py
📚 Learning: 2025-11-04T14:26:25.420Z
Learnt from: marcelwa Repo: munich-quantum-toolkit/core PR: 1243 File: test/python/qdmi/qiskit/conftest.py:11-19 Timestamp: 2025-11-04T14:26:25.420Z Learning: In the munich-quantum-toolkit/core repository, Qiskit is always available as a dependency during testing, so import guards for qiskit-dependent imports in test files (e.g., test/python/qdmi/qiskit/*.py) are not necessary. 

Applied to files:

  • src/mqt/predictor/rl/parsing.py
  • src/mqt/predictor/rl/predictorenv.py
📚 Learning: 2025-10-10T08:09:54.528Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/core PR: 1246 File: bindings/CMakeLists.txt:0-0 Timestamp: 2025-10-10T08:09:54.528Z Learning: In the Munich Quantum Toolkit (MQT) Core project, scikit-build-core is configured with `wheel.install-dir = "mqt/core"` in pyproject.toml, which automatically prefixes all CMake `DESTINATION` paths with `mqt/core/` during wheel installation. Therefore, CMake install destinations are relative to the `mqt/core` package namespace, not `site-packages`. 

Applied to files:

  • pyproject.toml
📚 Learning: 2025-10-10T08:10:16.394Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/core PR: 1246 File: test/python/na/test_na_fomac.py:35-0 Timestamp: 2025-10-10T08:10:16.394Z Learning: In the munich-quantum-toolkit/core repository, scikit-build-core is configured with `wheel.install-dir = "mqt/core"` in pyproject.toml, which means CMake `install()` commands with `DESTINATION <path>` install files relative to `mqt/core/` in the wheel, making them accessible via `files("mqt.core").joinpath("<path>")`. 

Applied to files:

  • pyproject.toml
📚 Learning: 2025-12-02T10:08:36.022Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/core-plugins-catalyst PR: 1 File: .github/workflows/cd.yml:40-48 Timestamp: 2025-12-02T10:08:36.022Z Learning: In munich-quantum-toolkit workflows, the reusable-python-packaging-sdist.yml workflow uploads the sdist artifact with the name `cibw-sdist` (or `dev-cibw-sdist` for pull requests), which means it is covered by the `cibw-*` glob pattern when downloading artifacts for deployment. 

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-27T16:58:08.564Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/ddsim PR: 720 File: noxfile.py:57-60 Timestamp: 2025-11-27T16:58:08.564Z Learning: In Nox sessions, when using `session.install()` to install a tool and then `session.run(..., external=True)` to execute it, this pattern is valid because the session's virtual environment is activated and its bin directory is on PATH. The `external=True` flag allows calling executables on PATH, which includes the activated venv's bin directory, so it will find tools installed via `session.install()`. 

Applied to files:

  • noxfile.py
🧬 Code graph analysis (3)
tests/hellinger_distance/test_estimated_hellinger_distance.py (1)
src/mqt/predictor/ml/predictor.py (2)
  • compile_training_circuits (206-240)
  • generate_training_data (242-300)
src/mqt/predictor/reward.py (2)
tests/hellinger_distance/test_estimated_hellinger_distance.py (1)
  • device (48-50)
tests/compilation/test_reward.py (1)
  • device (38-40)
tests/compilation/test_predictor_rl.py (2)
src/mqt/predictor/rl/cost_model.py (2)
  • canonical_cost (161-173)
  • get_cost_table (140-158)
src/mqt/predictor/rl/predictorenv.py (1)
  • calculate_reward (272-355)
🪛 Ruff (0.14.7)
src/mqt/predictor/rl/predictorenv.py

210-210: Logging statement uses f-string

(G004)


241-241: Logging statement uses f-string

(G004)


269-269: Unused noqa directive (non-enabled: SLF001)

Remove unused noqa directive

(RUF100)


666-668: try-except-continue detected, consider logging the exception

(S112)


666-666: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (19)
noxfile.py (2)

21-21: LGTM!

The pathlib import is necessary for the filesystem operations in the new _cleanup function.


112-113: LGTM!

The conditional cleanup call is appropriately placed and only runs in CI environments to save disk space. The pattern is consistent with other CI checks in the file.

CHANGELOG.md (1)

14-15: LGTM! Changelog entries follow the project template.

The changelog entries, PR links, and contributor reference are correctly formatted and consistent with the project's changelog template. Based on learnings, the format matches the expected template.

Also applies to: 49-50, 68-68

pyproject.toml (1)

97-98: LGTM! Packaging configuration correctly added.

The check-sdist configuration correctly specifies that _version.py should only be present in source distributions (sdist), which is the expected behavior for version files generated by hatch-vcs during the build process.

src/mqt/predictor/utils.py (1)

54-54: LGTM! Appropriate logging level for platform limitation.

The change from warnings.warn with RuntimeWarning to logger.info is appropriate. Since running without timeout on Windows is a known platform limitation rather than an error condition, an informational log message is more suitable than a warning.

tests/device_selection/test_predictor_ml.py (1)

46-46: LGTM! Test benchmark level updated consistently.

The change from BenchmarkLevel.ALG to BenchmarkLevel.INDEP is consistent with the PR objective and changelog entry about updating the test circuit level for the RL predictor. All test cases in this file have been updated consistently.

Also applies to: 66-66, 96-96

src/mqt/predictor/ml/predictor.py (4)

64-64: LGTM! Platform-specific workaround for multiprocessing issue.

The NO_PARALLEL constant correctly identifies the Windows + Python 3.13+ combination where joblib's "loky" backend is broken. This is a well-documented issue with the missing _posixsubprocess module.


232-235: LGTM! Clear documentation and appropriate fallback.

The comment clearly explains the Windows/Python 3.13 multiprocessing limitation, and the fallback to single-threaded execution (num_jobs=1) is the appropriate workaround for this platform issue.


275-276: LGTM! Consistent application of multiprocessing workaround.

The same NO_PARALLEL conditional is correctly applied to the generate_training_data method, ensuring consistent behavior across all parallel execution paths.


415-418: LGTM! Multiprocessing workaround applied to ML training.

The NO_PARALLEL conditional is correctly applied to the GridSearchCV training, ensuring the RandomForest model training also respects the Windows/Python 3.13 limitation.

docs/setup.md (1)

112-112: LGTM! Documentation updated to reflect new benchmark level.

The documentation example has been updated from BenchmarkLevel.ALG to BenchmarkLevel.INDEP, which is consistent with the changes in test files and the changelog entry about updating the test circuit level for the RL predictor.

tests/compilation/test_predictor_rl.py (3)

46-54: LGTM!

The switch from BenchmarkLevel.ALG to BenchmarkLevel.INDEP aligns with the PR's focus on device-independent circuits for RL training. The test logic remains correct.


108-115: LGTM!

Good addition of mapping verification. The test now validates both native gates and proper coupling map adherence, which is essential for the new dual-path reward system.


156-167: LGTM!

Comprehensive test coverage for cost model fallback behavior. The identity check (is TORINO_CANONICAL_COSTS) correctly verifies the exact table is returned rather than a copy.

src/mqt/predictor/rl/cost_model.py (2)

1-35: LGTM!

Well-documented module header and type definitions. The CanonicalCostTable type alias is appropriately defined as a Mapping for read-only access.


211-237: LGTM!

The approximate fidelity calculation is correct. The defensive max(n_1q, 0) guards and final clamping to [0, 1] ensure numerical robustness.

src/mqt/predictor/rl/predictorenv.py (3)

179-188: LGTM!

Good addition of instance variables for the dual-path reward system. The device averages cache (_p1_avg, _p2_avg, etc.) and reward tracking variables (prev_reward, prev_reward_kind) are appropriately initialized.


731-740: LGTM!

The method correctly checks both native gate basis and coupling map compliance. The logic is clear and matches the determine_valid_actions_for_state pattern.


458-464: Undefined method fom_aware_compile and attribute max_iter.

The code references self.fom_aware_compile and self.max_iter, but their definitions cannot be verified without access to the codebase. If these are not defined in the class or inherited from a parent class, this will raise an AttributeError at runtime when a stochastic action is applied.

noxfile.py Outdated
def _cleanup(session: nox.Session) -> None:
"""Remove this session's virtualenv to save disk space in CI."""
version = session.python
if version != "3.13": # keep cache for last run
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Hardcoded version check may break if test order changes.

The check if version != "3.13" assumes 3.13 is always the last test run. If PYTHON_ALL_VERSIONS is reordered or a newer version (e.g., 3.14) is added, this logic will fail to preserve the correct cache.

Consider dynamically determining the last version:

+ # Keep cache for the last Python version in the test matrix + last_version = PYTHON_ALL_VERSIONS[-1] - if version != "3.13": # keep cache for last run + if version != last_version: # keep cache for last run venv_dir = session.virtualenv.location
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if version != "3.13": # keep cache for last run
# Keep cache for the last Python version in the test matrix
last_version = PYTHON_ALL_VERSIONS[-1]
if version != last_version: # keep cache for last run
venv_dir = session.virtualenv.location
🤖 Prompt for AI Agents
In noxfile.py around line 67, the hardcoded check `if version != "3.13"` assumes 3.13 is always the last version and will break if PYTHON_ALL_VERSIONS is reordered or new versions are added; change the logic to compute the last version dynamically from PYTHON_ALL_VERSIONS (e.g., take the last element or compute the max/semver-aware last) and compare `version` against that computed value so the cache-preserve branch always targets the actual last configured Python version. 
noxfile.py Outdated
Comment on lines 73 to 76
gha_temp = pathlib.Path("/home/runner/work/_temp/setup-uv-cache")
if gha_temp.exists():
shutil.rmtree(gha_temp, ignore_errors=True)
session.log(f"Cleaned GitHub Actions uv temp cache at {gha_temp}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider adding a comment for GitHub Actions-specific cleanup.

The hardcoded path /home/runner/work/_temp/setup-uv-cache is specific to GitHub Actions and won't exist in other CI environments. While ignore_errors=True makes this safe, adding a comment would improve clarity.

+ # GitHub Actions-specific uv cache cleanup gha_temp = pathlib.Path("/home/runner/work/_temp/setup-uv-cache") if gha_temp.exists(): shutil.rmtree(gha_temp, ignore_errors=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
gha_temp = pathlib.Path("/home/runner/work/_temp/setup-uv-cache")
if gha_temp.exists():
shutil.rmtree(gha_temp, ignore_errors=True)
session.log(f"Cleaned GitHub Actions uv temp cache at {gha_temp}")
# GitHub Actions-specific uv cache cleanup
gha_temp = pathlib.Path("/home/runner/work/_temp/setup-uv-cache")
if gha_temp.exists():
shutil.rmtree(gha_temp, ignore_errors=True)
session.log(f"Cleaned GitHub Actions uv temp cache at {gha_temp}")
🤖 Prompt for AI Agents
noxfile.py around lines 73 to 76: the cleanup block removes a hardcoded GitHub Actions temp path which is GH‑Actions specific; add a one-line comment above this block indicating that the path is GitHub Actions specific (e.g., "Clean up GitHub Actions setup-uv-cache temp dir if present") so readers understand why that exact path is used and that it's safe in other environments due to ignore_errors=True. 
self.prev_reward = new_val
self.prev_reward_kind = new_kind

self.state._layout = self.layout # noqa: SLF001
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Unused noqa directive per static analysis.

The static analysis flags this noqa: SLF001 as unused. If SLF001 is not enabled in your linter configuration, the directive can be removed.

- self.state._layout = self.layout # noqa: SLF001 + self.state._layout = self.layout
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
self.state._layout = self.layout # noqa: SLF001
self.state._layout = self.layout
🧰 Tools
🪛 Ruff (0.14.7)

269-269: Unused noqa directive (non-enabled: SLF001)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents
In src/mqt/predictor/rl/predictorenv.py around line 269, the trailing " # noqa: SLF001" on the assignment self.state._layout = self.layout is flagged as an unused/unsupported noqa directive; remove the " # noqa: SLF001" suffix so the line becomes a plain assignment, or if the linter actually requires suppression, replace it with a supported directive for your project's linter configuration — otherwise delete the directive to satisfy static analysis. 
Shaobo-Zhou and others added 4 commits December 3, 2025 23:18
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/mqt/predictor/rl/predictorenv.py (2)

269-269: Remove unused noqa directive.

Static analysis indicates SLF001 is not enabled, making this directive unnecessary.

- self.state._layout = self.layout # noqa: SLF001 + self.state._layout = self.layout

256-264: Clarify symmetric reward scaling intent.

Both positive and negative delta branches multiply by reward_scale, while no_effect_penalty is only used for zero delta. The naming suggests asymmetric behavior but implementation is symmetric. Consider renaming for clarity:

- self.reward_scale = 1 - self.no_effect_penalty = 0.001 + self.delta_scale = 1 + self.no_progress_penalty = 0.001

And update usage accordingly. This makes the symmetric scaling intent explicit.

src/mqt/predictor/rl/cost_model.py (1)

300-301: Remove redundant initialization of t_hat.

Line 300 initializes t_hat = 0.0 but it's immediately overwritten on line 301. This is dead code.

- t_hat = 0.0 t_hat = (n_1q * tau1_avg + n_2q * tau2_avg) / k_eff
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c42b41 and 8259b4d.

📒 Files selected for processing (5)
  • src/mqt/predictor/reward.py (1 hunks)
  • src/mqt/predictor/rl/cost_model.py (1 hunks)
  • src/mqt/predictor/rl/predictorenv.py (10 hunks)
  • tests/compilation/test_predictor_rl.py (5 hunks)
  • tests/hellinger_distance/test_estimated_hellinger_distance.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-01T15:57:31.153Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/core PR: 1283 File: src/qir/runtime/QIR.cpp:196-201 Timestamp: 2025-11-01T15:57:31.153Z Learning: In the QIR runtime (src/qir/runtime/QIR.cpp), the PRX gate (__quantum__qis__prx__body) is an alias for the R gate (Phased X-Rotation) and should call runtime.apply<qc::R>(theta, phi, qubit), not runtime.apply<qc::RX>() which is a single-parameter rotation gate. 

Applied to files:

  • src/mqt/predictor/rl/cost_model.py
📚 Learning: 2025-10-09T13:20:11.483Z
Learnt from: DRovara Repo: munich-quantum-toolkit/core PR: 1108 File: mlir/test/Dialect/MQTOpt/Transforms/lift-measurements.mlir:269-288 Timestamp: 2025-10-09T13:20:11.483Z Learning: In the MQT MLIR dialect, the `rz` gate should not be included in the `DIAGONAL_GATES` set for the `ReplaceBasisStateControlsWithIfPattern` because its operator matrix does not have the required shape | 1 0 | / | 0 x | for the targets-as-controls optimization. It is only included in `LiftMeasurementsAboveGatesPatterns` where the matrix structure requirement differs. 

Applied to files:

  • src/mqt/predictor/rl/cost_model.py
📚 Learning: 2025-11-27T21:26:39.677Z
Learnt from: burgholzer Repo: munich-quantum-toolkit/qmap PR: 846 File: python/mqt/qmap/plugins/qiskit/sc/load_calibration.py:34-34 Timestamp: 2025-11-27T21:26:39.677Z Learning: In the qmap project, the Ruff linter has the "PL" (pylint) rule category enabled, which includes PLC0415 (import-outside-top-level). Therefore, `# noqa: PLC0415` directives on lazy imports are appropriate and necessary, not unused. 

Applied to files:

  • src/mqt/predictor/rl/predictorenv.py
📚 Learning: 2025-11-04T14:26:25.420Z
Learnt from: marcelwa Repo: munich-quantum-toolkit/core PR: 1243 File: test/python/qdmi/qiskit/conftest.py:11-19 Timestamp: 2025-11-04T14:26:25.420Z Learning: In the munich-quantum-toolkit/core repository, Qiskit is always available as a dependency during testing, so import guards for qiskit-dependent imports in test files (e.g., test/python/qdmi/qiskit/*.py) are not necessary. 

Applied to files:

  • src/mqt/predictor/rl/predictorenv.py
🧬 Code graph analysis (4)
tests/hellinger_distance/test_estimated_hellinger_distance.py (2)
src/mqt/predictor/ml/predictor.py (2)
  • compile_training_circuits (206-240)
  • generate_training_data (242-300)
tests/device_selection/test_predictor_ml.py (2)
  • path_uncompiled_circuits (27-29)
  • path_compiled_circuits (33-35)
tests/compilation/test_predictor_rl.py (2)
src/mqt/predictor/rl/cost_model.py (2)
  • canonical_cost (171-183)
  • get_cost_table (150-168)
src/mqt/predictor/rl/predictorenv.py (1)
  • calculate_reward (272-356)
src/mqt/predictor/rl/predictorenv.py (5)
src/mqt/predictor/rl/actions.py (4)
  • CompilationOrigin (92-98)
  • DeviceDependentAction (136-146)
  • PassType (101-110)
  • Action (114-127)
src/mqt/predictor/rl/cost_model.py (2)
  • approx_estimated_success_probability (250-308)
  • approx_expected_fidelity (221-247)
src/mqt/predictor/rl/helper.py (1)
  • create_feature_dict (72-94)
src/mqt/predictor/rl/parsing.py (1)
  • postprocess_vf2postlayout (229-250)
src/mqt/predictor/utils.py (1)
  • calc_supermarq_features (93-145)
src/mqt/predictor/reward.py (2)
tests/hellinger_distance/test_estimated_hellinger_distance.py (1)
  • device (48-50)
tests/compilation/test_reward.py (1)
  • device (38-40)
🪛 Ruff (0.14.7)
src/mqt/predictor/rl/predictorenv.py

210-210: Logging statement uses f-string

(G004)


241-241: Logging statement uses f-string

(G004)


269-269: Unused noqa directive (non-enabled: SLF001)

Remove unused noqa directive

(RUF100)

🔇 Additional comments (13)
tests/hellinger_distance/test_estimated_hellinger_distance.py (1)

219-226: LGTM! API calls correctly updated to use keyword arguments.

The compile_training_circuits and generate_training_data calls now properly use keyword arguments, aligning with the updated function signatures in src/mqt/predictor/ml/predictor.py. The removal of num_workers is consistent with the new NO_PARALLEL constant handling parallelism internally.

src/mqt/predictor/rl/cost_model.py (4)

41-147: Well-structured canonical cost tables with clear documentation.

The per-device cost tables are well-organized with clear comments explaining the decomposition rationale. The registry pattern with DEVICE_CANONICAL_COSTS provides clean extensibility.


150-168: Safe fallback behavior with appropriate warnings.

The unknown-device fallback to Torino with both a warning and log message ensures visibility while maintaining functionality. The stacklevel=3 is appropriate for the call chain.


186-218: Robust gate cost estimation with conservative fallbacks.

The iteration pattern correctly uses the modern CircuitInstruction access. The conservative fallback for unknown gates (3×1q or 1×2q+4×1q) is reasonable for approximation purposes.


221-247: Clean implementation of approximate fidelity and ESP calculations.

Both functions properly clamp outputs to [0, 1] and handle edge cases. The ESP model's idle-time penalty with liveness modulation is a reasonable approximation approach.

Also applies to: 250-308

src/mqt/predictor/reward.py (1)

206-212: Proper handling of discrete time unit with safe fallback.

The device.dt or 1.0 fallback correctly handles backends without discrete-time calibration, preventing TypeError when device.dt is None. The dt-scaled duration is appropriately used in the exponential decay calculation.

tests/compilation/test_predictor_rl.py (3)

156-167: Good coverage for unknown-device fallback and unknown-gate defaults.

The test properly verifies that unknown devices trigger a UserWarning and fall back to TORINO_CANONICAL_COSTS, and that unknown gates return (0, 0). This aligns with the behavior defined in get_cost_table and canonical_cost.


170-219: Comprehensive test coverage for dual-path reward calculation.

The test covers all three modes (exact, approx, auto) for ESP and verifies that critical_depth always returns exact results. The addition of the auto-exact path test (lines 197-200) addresses the previous review suggestion. The monkeypatch approach for forcing the approx path is appropriate.


108-115: Good addition of mapping validation.

The CheckMap assertion ensures the compiled circuit respects the device's coupling map, complementing the existing native-gates check.

src/mqt/predictor/rl/predictorenv.py (4)

272-356: Well-designed dual-path reward calculation with proper mode handling.

The calculate_reward method cleanly separates exact (calibration-based) and approximate (cost-model-based) paths. The auto mode correctly uses _is_native_and_mapped to determine the appropriate path. The device_id parameter is now correctly passed to both approx_expected_fidelity and approx_estimated_success_probability.


608-730: Thorough device calibration caching with proper error handling.

The _ensure_device_averages_cached method properly aggregates 1q/2q error rates, durations, and coherence times. Key strengths:

  • Handles dt scaling for durations correctly
  • Tries flipped orientation for uni-directional couplings (lines 690-691)
  • Uses specific exceptions (KeyError, AttributeError) per past review feedback
  • Provides informative error messages when calibration data is missing

732-741: Clean helper for native/mapped circuit detection.

The _is_native_and_mapped method properly combines both checks using GatesInBasis and CheckMap passes, returning a boolean suitable for auto-mode reward selection.


190-270: Solid step-wise reward shaping implementation.

The refactored step() method implements the shaped reward signal as described in the PR objectives:

  • Pre-action reward evaluation establishes baseline
  • Post-action delta computation drives intermediate rewards
  • The prev_kind == "approx" and new_kind == "exact" check (line 253-254) correctly handles regime transitions
  • Terminal actions use exact metrics for final reward

One note: The logging statements at lines 210 and 241 use f-strings (flagged by Ruff G004), but this is a stylistic preference and doesn't affect functionality.

passes = action.transpile_pass(
self.device.operation_names,
CouplingMap(self.device.build_coupling_map()) if self.layout else None,
pm_property_set: PropertySet | None = {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Type annotation mismatch for pm_property_set.

The variable is initialized as an empty dict ({}) but the type hint in _handle_qiskit_layout_postprocessing expects dict[str, Any] | None. While this works at runtime, consider initializing as None for consistency with the type, or use explicit typing:

- pm_property_set: PropertySet | None = {} + pm_property_set: dict[str, Any] | None = None

Then ensure downstream code handles the None case (which it already does at line 501).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pm_property_set: PropertySet | None = {}
pm_property_set: dict[str, Any] | None = None
🤖 Prompt for AI Agents
In src/mqt/predictor/rl/predictorenv.py around line 458, pm_property_set is annotated as PropertySet | None but initialized to an empty dict; change the initialization to None and/or add an explicit type annotation (e.g., pm_property_set: PropertySet | None = None) so the runtime value matches the declared type, and keep existing downstream None-handling (see line 501). 
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant