
Conversation

@nirga
Member

@nirga nirga commented Nov 20, 2025

Summary

Adds support for logging the gen_ai.request.structured_output_schema attribute for Anthropic Claude and Google Gemini APIs, completing coverage across all major LLM providers.

Changes

Anthropic Claude

  • Added logging of the output_format parameter with json_schema type (see the request sketch below)
  • Supports Claude's new Structured Outputs feature (launched November 14, 2025)
  • Works with Sonnet 4.5 and Opus 4.1 models
  • Requires beta header: anthropic-beta: structured-outputs-2025-11-13
  • Implementation: packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
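
For context, a request exercising this feature looks roughly like the sketch below. It is hedged: the output_format shape (a direct schema field) and the beta flag follow this PR's tests, while the model id, prompt, and env var handling are illustrative.

```python
import os

import anthropic

# API key from the environment, per repo guidelines (never hardcode secrets).
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.beta.messages.create(
    model="claude-sonnet-4-5",  # illustrative; a dated variant also works
    max_tokens=1024,
    betas=["structured-outputs-2025-11-13"],  # the beta flag named above
    messages=[
        {"role": "user", "content": "Tell me a joke about OpenTelemetry and rate it from 1 to 10"}
    ],
    output_format={
        "type": "json_schema",
        "schema": {
            "type": "object",
            "properties": {
                "joke": {"type": "string"},
                "rating": {"type": "integer"},
            },
            "required": ["joke", "rating"],
            "additionalProperties": False,
        },
    },
)
print(response.content[0].text)  # JSON text conforming to the schema
```

The instrumentation change reads this output_format dict and records the schema on the span as gen_ai.request.structured_output_schema.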

Google Gemini

  • Added logging of response_schema from the generation_config parameter (see the request sketch below)
  • Also checks for direct response_schema kwargs
  • Implementation: packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
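
A minimal sketch of the corresponding Gemini request, mirroring the paths the instrumentation checks; the env var name and model id are assumptions, and the dict schema form follows this PR's demo app:

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # assumed env var name

joke_schema = {
    "type": "object",
    "properties": {
        "joke": {"type": "string"},
        "rating": {"type": "integer"},
    },
    "required": ["joke", "rating"],
}

model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model id
response = model.generate_content(
    "Tell me a joke about OpenTelemetry and rate it from 1 to 10",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        # This is the field the instrumentation serializes into
        # gen_ai.request.structured_output_schema.
        response_schema=joke_schema,
    ),
)
print(response.text)
```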

OpenAI

  • Already supported (no changes needed)
  • Uses the existing implementation (see the request sketch below)
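
For completeness, the already-instrumented OpenAI path uses the SDK's Pydantic-based parse helper, as in this sketch (model id, Joke model, and parse endpoint as described in the review comments below; minor details may differ from the actual demo):

```python
import os

from openai import OpenAI
from pydantic import BaseModel


class Joke(BaseModel):
    joke: str
    rating: int


client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry"}],
    response_format=Joke,  # schema derived from the Pydantic model
)
print(completion.choices[0].message.parsed)
```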

Sample Apps

Added demonstration apps for all three providers:

  • packages/sample-app/sample_app/openai_structured_outputs_demo.py (tested ✅)
  • packages/sample-app/sample_app/anthropic_structured_outputs_demo.py
  • packages/sample-app/sample_app/gemini_structured_outputs_demo.py

Testing

The OpenAI sample app was tested successfully and shows the gen_ai.request.structured_output_schema attribute being logged correctly.

Related Documentation

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added support for tracking structured output schemas in instrumentation for Anthropic and Google Generative AI.
    • Added example applications demonstrating structured outputs with Anthropic, Google Gemini, and OpenAI models.
  • Tests

    • Added test suite for Anthropic structured output instrumentation.
  • Chores

    • Updated Anthropic test dependency to latest version.



Important

Adds structured output schema logging for Anthropic and Google Gemini APIs, with sample apps and tests.

  • Behavior:
    • Adds logging of gen_ai.request.structured_output_schema for Anthropic and Google Gemini APIs.
    • Anthropic: Logs output_format with json_schema type in span_utils.py.
    • Google Gemini: Logs response_schema from generation_config or kwargs in span_utils.py.
  • Testing:
    • Adds test_structured_outputs.py for Anthropic, currently skipped due to SDK version.
  • Sample Apps:
    • Adds anthropic_structured_outputs_demo.py, gemini_structured_outputs_demo.py, and openai_structured_outputs_demo.py for demonstration.

This description was created by Ellipsis for ca5f423.

Add support for logging gen_ai.request.structured_output_schema attribute for Anthropic Claude and Google Gemini APIs, completing coverage across all major LLM providers.

Changes:

- Anthropic: Log output_format parameter with json_schema type. Supports Claude's new structured outputs feature (launched Nov 2025) for Sonnet 4.5 and Opus 4.1 models.
- Gemini: Log response_schema from generation_config parameter. Supports both generation_config.response_schema and direct response_schema kwargs.
- OpenAI: Already supported (no changes needed).

Sample apps added to demonstrate structured outputs for all three providers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Nov 20, 2025

Walkthrough

This PR extends OpenTelemetry instrumentation for multiple LLM providers (Anthropic, Google Generative AI, OpenAI) to track structured output schemas via a new span attribute, includes demo scripts and comprehensive test coverage, and updates dependencies.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Instrumentation: Structured Output Schema Support**<br>`packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`, `packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py` | Added logic to extract and populate the LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA attribute. Anthropic: handles an output_format dict with type "json_schema" in aset_input_attributes. Google Generative AI: extracts from generation_config.response_schema or response_schema in kwargs in set_model_request_attributes, with error handling. |
| **Demo Scripts: Structured Outputs Examples**<br>`packages/sample-app/sample_app/anthropic_structured_outputs_demo.py`, `packages/sample-app/sample_app/gemini_structured_outputs_demo.py`, `packages/sample-app/sample_app/openai_structured_outputs_demo.py` | Three new demonstration scripts showing structured output usage for each provider. Each loads the environment, initializes Traceloop, defines a JSON schema (joke + rating), sends an API request with the schema, and prints the response. The OpenAI version includes a Pydantic Joke model. |
| **Test Support: Structured Outputs Test Coverage**<br>`packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py` | New test module with constants JOKE_SCHEMA and OUTPUT_FORMAT, and three test functions validating structured outputs across legacy and event instrumentation modes; validates schema presence, span attributes, response content, and logging behavior. |
| **Test Fixtures: HTTP Cassettes**<br>`packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_structured_outputs/*` | Three VCR cassette YAML files recording HTTP interactions with the Anthropic API for structured output requests and responses. |
| **Dependency Update**<br>`packages/opentelemetry-instrumentation-anthropic/pyproject.toml` | Bumped the Anthropic test dependency constraint from >=0.36.0 to >=0.74.0. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant App as Application
    participant Instr as Instrumentation<br/>(span_utils)
    participant LLMClient as LLM Client<br/>(Anthropic/Google/OpenAI)
    participant Span as Span Exporter
    App->>Instr: Call LLM with structured<br/>output schema
    activate Instr
    Instr->>Instr: Extract schema from<br/>output_format or<br/>generation_config
    Instr->>Span: Set LLM_REQUEST_<br/>STRUCTURED_OUTPUT_SCHEMA
    deactivate Instr
    Instr->>LLMClient: Forward API request
    activate LLMClient
    LLMClient-->>Instr: Response
    deactivate LLMClient
    Instr->>Span: Record span with<br/>schema attribute
    Span-->>App: Instrumented trace
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Span utilities logic: Two files with schema extraction logic that requires verification of correct path handling and error boundaries (anthropic aset_input_attributes vs. google set_model_request_attributes)
  • Test coverage: New test module with three test functions; requires validation of schema format expectations, cassette content alignment, and span attribute assertions
  • Demo scripts: Three similar demo scripts that should be reviewed for consistency in approach and correctness of API usage
  • Potential focus areas:
    • Verify the schema JSON serialization approach in Anthropic span_utils (JSON.dumps vs. dict handling)
    • Confirm Google Generative AI schema extraction covers both generation_config.response_schema and kwargs paths correctly
    • Validate cassette HTTP recordings match the instrumentation expectations
    • Check that demo scripts accurately reflect current SDK API signatures

Possibly related PRs

  • fix(anthropic): various fixes around tools parsing #3204: Modifies the same Anthropic span_utils.py file (aset_input_attributes) to add tool_use parsing and tooling-related attributes—shares instrumentation surface and may require coordination on span attribute ordering or conflict handling.

Suggested reviewers

  • doronkopit5

Poem

🐰 Schemas now shimmer in traces so bright,
Structured outputs dancing in telemetry light,
Anthropic, Google, and OpenAI play,
Each instrumented span reveals what they say!
Analytics hopping to new paradigm heights!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly summarizes the main change: adding structured outputs schema logging for Anthropic and Gemini, which aligns with the core modifications across both SDKs. |
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch feat/structured-outputs-logging

Comment @coderabbitai help to get the list of available commands and usage tips.

@nirga nirga changed the title feat: add structured outputs schema logging for Anthropic and Gemini fix: add structured outputs schema logging for Anthropic and Gemini Nov 20, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)

170-177: Consider logging structured_output_schema even when prompt capture is disabled

output_format handling sits under should_send_prompts(), so SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA won’t be set when prompt/content capture is turned off, even though this schema is typically configuration rather than user content. Consider moving this block outside the should_send_prompts() guard so the attribute is always populated when output_format is present, aligning with how other providers log this attribute.
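
A sketch of what that restructure could look like; set_span_attribute and should_send_prompts are the helpers referenced in this file, and the surrounding logic is abridged:

```python
import json

from opentelemetry.semconv_ai import SpanAttributes


async def aset_input_attributes(span, kwargs):
    # The schema is request configuration, not user content, so record it
    # regardless of the prompt-capture setting.
    output_format = kwargs.get("output_format")
    if isinstance(output_format, dict) and output_format.get("type") == "json_schema":
        set_span_attribute(  # helper from this package's utils module
            span,
            SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
            json.dumps(output_format.get("schema")),
        )

    if should_send_prompts():
        ...  # prompt/completion content stays behind the privacy guard
```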

packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (1)

395-414: Avoid silent try/except/pass when serializing response_schema

Both blocks swallow all exceptions when calling json.dumps(...), which makes schema/serialization issues hard to debug and triggers Ruff warnings (S110, BLE001). Consider narrowing the exception type and logging instead of passing silently, e.g.:

```diff
-    if generation_config and hasattr(generation_config, "response_schema"):
-        try:
-            _set_span_attribute(
-                span,
-                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
-                json.dumps(generation_config.response_schema),
-            )
-        except Exception:
-            pass
+    if generation_config and hasattr(generation_config, "response_schema"):
+        try:
+            _set_span_attribute(
+                span,
+                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
+                json.dumps(generation_config.response_schema),
+            )
+        except (TypeError, ValueError) as exc:
+            logger.debug(
+                "Failed to serialize generation_config.response_schema for span: %s",
+                exc,
+            )
@@
-    if "response_schema" in kwargs:
-        try:
-            _set_span_attribute(
-                span,
-                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
-                json.dumps(kwargs.get("response_schema")),
-            )
-        except Exception:
-            pass
+    if "response_schema" in kwargs:
+        try:
+            _set_span_attribute(
+                span,
+                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
+                json.dumps(kwargs.get("response_schema")),
+            )
+        except (TypeError, ValueError) as exc:
+            logger.debug(
+                "Failed to serialize kwargs['response_schema'] for span: %s",
+                exc,
+            )
```

This keeps failures non-fatal while giving observability into bad schemas.

Please verify with your supported generation_config.response_schema / response_schema types that json.dumps(...) (or any custom encoder you choose) behaves as expected across the Google Generative AI SDK versions you intend to support.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between da7ec49 and 1de9ffa.

📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1 hunks)
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (1 hunks)
  • packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1 hunks)
  • packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1 hunks)
  • packages/sample-app/sample_app/openai_structured_outputs_demo.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/gemini_structured_outputs_demo.py
  • packages/sample-app/sample_app/anthropic_structured_outputs_demo.py
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
  • packages/sample-app/sample_app/openai_structured_outputs_demo.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
🧬 Code graph analysis (5)
packages/sample-app/sample_app/gemini_structured_outputs_demo.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
  • Traceloop (37-275)
  • init (49-206)
packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1)
  • main (15-52)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (1)
  • main (22-35)
packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
  • Traceloop (37-275)
  • init (49-206)
packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1)
  • main (15-45)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (1)
  • main (22-35)
packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (2)
packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (1)
  • _set_span_attribute (18-22)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
  • Traceloop (37-275)
  • init (49-206)
packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1)
  • main (15-52)
packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1)
  • main (15-45)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
  • set_span_attribute (21-25)
🪛 Flake8 (7.3.0)
packages/sample-app/sample_app/anthropic_structured_outputs_demo.py

[error] 1-1: 'os' imported but unused

(F401)

packages/sample-app/sample_app/openai_structured_outputs_demo.py

[error] 4-4: 'opentelemetry.sdk.trace.export.ConsoleSpanExporter' imported but unused

(F401)

🪛 Ruff (0.14.5)
packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py

403-404: try-except-pass detected, consider logging the exception

(S110)


403-403: Do not catch blind exception: Exception

(BLE001)


413-414: try-except-pass detected, consider logging the exception

(S110)


413-413: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Lint
🔇 Additional comments (1)
packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1)

1-49: Gemini structured outputs demo looks good

The demo cleanly configures the client from environment, defines a simple JSON schema, and uses GenerationConfig.response_schema consistently with the other providers. No changes needed from my side.

Remove unused imports to fix flake8 lint errors:

- Remove unused 'os' import from anthropic_structured_outputs_demo.py
- Remove unused 'ConsoleSpanExporter' import from openai_structured_outputs_demo.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (1)

25-25: Consider aligning the prompt with other demos.

The prompt in this demo doesn't explicitly request a rating, while the Anthropic and Gemini demos both ask to "rate it." Although structured outputs will enforce the schema regardless, explicitly requesting the rating improves output quality and consistency across demos.

- messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry"}], + messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry and rate it from 1 to 10"}],
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1de9ffa and d6360b2.

📒 Files selected for processing (2)
  • packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1 hunks)
  • packages/sample-app/sample_app/openai_structured_outputs_demo.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/sample-app/sample_app/anthropic_structured_outputs_demo.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/openai_structured_outputs_demo.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR | Repo: traceloop/openllmetry | PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR | Repo: traceloop/openllmetry | PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2025-08-17T15:06:48.109Z
Learning: For debugging OpenTelemetry spans, use ConsoleSpanExporter with Traceloop to print spans to console

Applied to files:

  • packages/sample-app/sample_app/openai_structured_outputs_demo.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
  • Traceloop (37-275)
  • init (49-206)
packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1)
  • main (14-51)
packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1)
  • main (15-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (6)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (6)

1-5: LGTM! Unused import issue resolved.

The imports are clean and all used in the code. The previously flagged ConsoleSpanExporter import has been removed.


7-9: LGTM! Proper API key handling.

Environment variables are loaded correctly, and the API key is retrieved from the environment as per coding guidelines.


11-13: LGTM!

Traceloop initialization is correct with an appropriate app name for this demo.


16-18: LGTM!

The Pydantic model is well-defined for structured output validation.


37-38: LGTM!

Standard entry point implementation is correct.


23-27: Model and beta API endpoint verified as available; note known SDK parsing issues.

Verification confirms that gpt-4o-2024-08-06 is still available and actively supported by OpenAI (including for fine-tuning), and the client.beta.chat.completions.parse beta endpoint is available. However, the openai-python SDK has known integration bugs with parse() related to JSON validation and edge cases in parsed responses. Test your structured output handling thoroughly and monitor the openai-python repository for bug fixes.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed d6360b2 in 13 minutes and 44 seconds.
  • Reviewed 21 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. packages/sample-app/sample_app/anthropic_structured_outputs_demo.py:1
  • Draft comment:
    Good removal of unused 'os' import to keep the code clean.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/sample-app/sample_app/openai_structured_outputs_demo.py:4
  • Draft comment:
    Removed unused 'ConsoleSpanExporter' import; this is a good cleanup.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_IqIYoUKp7bNNE3SH


Add comprehensive test coverage for Anthropic structured outputs feature:

- Three test scenarios: legacy attributes, with content events, without content
- Tests verify the gen_ai.request.structured_output_schema attribute is logged
- Enhanced span_utils.py to handle both json_schema and json output formats

Note: Tests are currently skipped as they require anthropic SDK >= 0.50.0, which supports the output_format parameter. The feature was announced in November 2025, but the installed SDK version (0.49.0) doesn't yet support it. Tests will be enabled once the SDK is updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 1de9ffa in 89 minutes and 38 seconds.
  • Reviewed 214 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:170
  • Draft comment:
    Consider handling cases where the provided schema might not be JSON serializable. Logging or error handling would help diagnose issues.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%.

    The comment is suggesting defensive programming for json.dumps(), but this appears to be speculative rather than identifying a real issue. The function is already wrapped with the @dont_throw decorator, which should handle exceptions. Additionally, the same pattern of calling json.dumps() without explicit try-catch is used throughout the file (lines 154, 167, 243, 311), so this would be an inconsistent suggestion unless applied everywhere. The comment doesn't point to a specific bug introduced by this change - it's more of a general code quality suggestion that could apply to many places in the codebase. According to the rules, speculative comments should be removed, and comments should only be kept if there's strong evidence of an issue.

    Could the schema contain non-serializable objects that would cause json.dumps() to fail? Perhaps the @dont_throw decorator doesn't provide adequate error visibility, and explicit logging would be better for debugging. Maybe this specific case is more prone to serialization issues than the other json.dumps() calls in the file. While it's theoretically possible for the schema to be non-serializable, the comment is speculative and doesn't provide evidence that this is a real issue. The @dont_throw decorator already provides error handling at the function level, and the same pattern is used consistently throughout the file. If this were a real concern, it would apply to all json.dumps() calls, not just this one. The comment doesn't identify a specific problem with the change.

    This comment should be deleted. It's a speculative suggestion about potential error handling that doesn't identify a specific issue with the code change. The function is already protected by the @dont_throw decorator, and the same json.dumps() pattern is used consistently throughout the file without additional error handling.
2. packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py:395
  • Draft comment:
    If both generation_config.response_schema and kwargs['response_schema'] are provided, the latter overwrites the former. Verify if this override behavior is intended.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the author to verify if the override behavior is intended, which is against the rules. It does not provide a specific suggestion or ask for a test to be written. Therefore, it should be removed.
3. packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py:396
  • Draft comment:
    Consider logging exceptions in the try/except blocks when setting the structured output schema to aid future debugging.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%.

    This comment is about code that was added in the diff (lines 395-414). It's suggesting a code quality improvement - adding logging to exception handlers. The file already has logging infrastructure in place and uses it elsewhere (lines 70, 118 show similar patterns with logger.warning). The suggestion is actionable and clear. However, I need to consider the rules: "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This seems to fit that criteria. But I also need to consider if this is "obvious or unimportant" - the code currently silently swallows exceptions, which could make debugging harder. The comment is making a reasonable suggestion for better observability. However, the pattern of silent exception handling might be intentional here (similar to the @dont_throw decorator used elsewhere). Looking more carefully, there are other places in the code with similar bare exception handlers that don't log (lines 43-44 in _is_image_part). This suggests the silent exception handling might be a deliberate pattern in this codebase.

    The silent exception handling might be intentional and consistent with other patterns in the codebase (e.g., _is_image_part function). The comment is suggesting a style preference rather than fixing a clear bug. Without knowing if there's a specific debugging issue or if this pattern is deliberately chosen for performance or other reasons, this might be speculative or stylistic rather than clearly necessary. While there are some instances of silent exception handling in the codebase, there are also examples where exceptions ARE logged (lines 70, 118). The new code is handling JSON serialization which could fail in various ways, and logging would help understand why. However, this is still a "nice to have" suggestion rather than a clear bug fix. The rule states comments should be about clear code changes required, not suggestions for improvements unless they're clearly actionable quality refactors.

    This is a reasonable code quality suggestion, but it's not a required change - it's more of a "nice to have" improvement. The codebase shows mixed patterns (some exceptions logged, some not), and the silent exception handling might be intentional. Given the rule to only keep comments with STRONG EVIDENCE they are correct and necessary, and that by default we should assume comments are not useful, this should be deleted.

Workflow ID: wflow_QU6hLC1XyLCmcwJo


Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed ca5f423 in 34 minutes and 3 seconds.
  • Reviewed 222 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:178
  • Draft comment:
    Consider caching the result of output_format.get('json_schema') in a variable for clarity before accessing the 'schema' key.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:172
  • Draft comment:
    Add an inline comment explaining the difference between 'json_schema' and 'json' types in output_format to aid future maintenance.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py:43
  • Draft comment:
    Remove the duplicate pytest.mark.skip decorator to avoid redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_ZrOmwwx7Az5swzKf


Contributor

@galkleinman galkleinman left a comment


LGTM

neat: consider moving magic strings to consts


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (3)

17-40: Structured output schema and OUTPUT_FORMAT definition

This setup looks correct for a JSON-schema‑backed structured output; if you want the example to fully reflect the “1 to 10” description, you could optionally add "minimum": 1 / "maximum": 10 to the rating property, but it isn’t required for validating the instrumentation behavior.


43-60: Duplicate skip decorator and ARG001 on fixtures

You have two identical @pytest.mark.skip decorators on this test; one is sufficient. Also, Ruff’s ARG001 on instrument_legacy is expected here because it’s a pytest fixture injected by name, so it doesn’t need to be referenced in the body.

```diff
-@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
-@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
+@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
```

106-151: Reduce duplication and make log-count assertions less brittle

test_anthropic_structured_outputs_with_events_with_content and ..._with_no_content are almost identical apart from the instrumentation fixture and expected logging, so you could factor shared setup/assertions into a helper or parametrize over (fixture, expected_log_count) to cut duplication. Also, hard-coding len(logs) == 2 may be fragile if the instrumentation later adds extra events; consider asserting a minimum count or filtering logs by an identifying attribute instead. ARG001 on the instrument_with_content / instrument_with_no_content parameters is similarly expected pytest-fixture behavior. A sketch of this parametrization follows below.

Also applies to: 153-197
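
A hedged sketch of that parametrization (test and fixture names follow this PR; the shared structured-outputs call and assertions are abridged):

```python
import pytest


@pytest.mark.vcr
@pytest.mark.parametrize(
    "instrumentation_fixture, expected_min_logs",
    [
        ("instrument_with_content", 2),
        ("instrument_with_no_content", 2),
    ],
)
def test_anthropic_structured_outputs_events(
    request,
    instrumentation_fixture,
    expected_min_logs,
    anthropic_client,
    span_exporter,
    log_exporter,
    reader,
):
    # Activate the instrumentation mode under test by fixture name.
    request.getfixturevalue(instrumentation_fixture)

    # ... shared structured-outputs call and span/schema assertions ...

    logs = log_exporter.get_finished_logs()
    # A minimum count is less brittle than an exact match if more
    # events are added later.
    assert len(logs) >= expected_min_logs
```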

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d6360b2 and 6f12631.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (4)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)
  • verify_metrics (7-71)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Ruff (0.14.5)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py

47-47: Unused function argument: instrument_legacy

(ARG001)


109-109: Unused function argument: instrument_with_content

(ARG001)


156-156: Unused function argument: instrument_with_no_content

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Lint
🔇 Additional comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (1)

62-104: Span, schema, metrics, and logs assertions for legacy path

The assertions on gen‑ai prompt/completion attributes, LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA, request/response models, metrics, and legacy log behavior give solid end‑to‑end coverage for the Anthropic structured‑output path once the SDK version is bumped.

- Update anthropic SDK constraint from >=0.36.0 to >=0.50.0 to support structured outputs
- Updated to version 0.74.1, which includes the output_format parameter
- Remove skip decorators from structured outputs tests
- Tests are ready to run once VCR cassettes are recorded with a valid API key

To record cassettes:

```bash
export ANTHROPIC_API_KEY=your_key_here
cd packages/opentelemetry-instrumentation-anthropic
poetry run pytest tests/test_structured_outputs.py --record-mode=once
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (1)

44-44: Consider using the stable model identifier "claude-sonnet-4-5" instead of the dated variant.

The model identifier "claude-sonnet-4-5-20250929" is valid and officially supported by Anthropic. However, using the stable identifier "claude-sonnet-4-5" (without the date suffix) would be more maintainable and future-proof, as it automatically uses the latest available version of Sonnet 4.5 rather than pinning to a specific release date. If you need to pin to a specific version, the current approach is fine; otherwise, consider updating lines 44, 105, and 151 to use "claude-sonnet-4-5" for consistency with standard Anthropic API practices.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6f12631 and 13f1e33.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-anthropic/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (5)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)
  • verify_metrics (7-71)
packages/opentelemetry-instrumentation-anthropic/tests/conftest.py (1)
  • anthropic_client (70-71)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Ruff (0.14.5)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py

41-41: Unused function argument: instrument_legacy

(ARG001)


102-102: Unused function argument: instrument_with_content

(ARG001)


148-148: Unused function argument: instrument_with_no_content

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (2)

12-35: LGTM! Well-structured schema definitions.

The JOKE_SCHEMA and OUTPUT_FORMAT structures are clearly defined and align with Anthropic's structured outputs API format. The schema properly constrains the response with required fields and additionalProperties set to False for strict validation.


40-42: Static analysis warnings are false positives.

The Ruff warnings about unused function arguments (instrument_legacy, instrument_with_content, instrument_with_no_content) are false positives. These are pytest fixtures used for their side effects—they configure the instrumentation before each test runs. This is a standard pytest pattern where fixtures don't need to be explicitly referenced in the test body.

Also applies to: 101-103, 147-149

```python
}


@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
```

⚠️ Potential issue | 🟡 Minor

Apply skip decorator consistently across all structured output tests.

Only the first test has the skip decorator for SDK version >= 0.50.0, but all three tests use the same beta.messages.create API with output_format and betas=["structured-outputs-2025-11-13"]. If the SDK version requirement applies to the first test, it should apply to all three tests that exercise the same structured outputs feature.

Apply this diff to add the skip decorator to the remaining tests:

```diff
+@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
 @pytest.mark.vcr
 def test_anthropic_structured_outputs_with_events_with_content(
     instrument_with_content, anthropic_client, span_exporter, log_exporter, reader
 ):
```

```diff
+@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
 @pytest.mark.vcr
 def test_anthropic_structured_outputs_with_events_with_no_content(
     instrument_with_no_content, anthropic_client, span_exporter, log_exporter, reader
 ):
```

Also applies to: 100-100, 146-146

🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py around lines 38, 100, and 146, the pytest.mark.skip decorator for "Requires anthropic SDK >= 0.50.0 with structured outputs support" is only applied to the first test; add the same @pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support") decorator immediately above the other two test functions (lines ~100 and ~146) so all tests using beta.messages.create with output_format and betas=["structured-outputs-2025-11-13"] are consistently skipped when the SDK requirement is not met. 
nirga and others added 3 commits November 23, 2025 13:59
Replace LogRecord with Event from opentelemetry._events to fix compatibility with OpenTelemetry SDK 1.38.0. The LogRecord API no longer supports the event_name parameter; events are now emitted using the Event class with a name parameter.

Changes:

- Replace LogRecord import with Event from opentelemetry._events
- Update event_emitter.py to create Event instances instead of LogRecord
- Update test assertions to use event.name instead of event.event_name

All 40 tests now passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove @pytest.mark.skip decorator from test_anthropic_structured_outputs_legacy now that anthropic SDK 0.74.1 is installed.

Tests require VCR cassettes to be recorded with a valid ANTHROPIC_API_KEY:

```bash
poetry run pytest tests/test_structured_outputs.py --record-mode=once
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…tputs

Update the minimum version constraint from >=0.50.0 to >=0.74.0 to ensure structured outputs support is available. Version 0.74.1 includes the necessary .parse() and transform_schema() methods for structured outputs.

Verified:

- Anthropic SDK 0.74.1 installed
- beta.messages.parse() method available
- beta.messages.create() with output_format supported
- All 15 legacy tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/event_emitter.py (1)

6-7: Event-based emission wiring looks correct; only minor naming nit

Switching to Event with name, body, and attributes=EVENT_ATTRIBUTES keeps semantics aligned with the tests that assert on log.log_record.name and gen_ai.system. Only minor nit is shadowing the event parameter with a local event variable in both helpers; consider renaming the local (e.g. otel_event) for clarity, but it's not functionally problematic.

Also applies to: 212-218, 236-241

packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (3)

12-35: Structured-output schema and OUTPUT_FORMAT are clear and reusable

The JOKE_SCHEMA and OUTPUT_FORMAT constants are well-factored and make the tests readable. If you ever want stronger validation, you could also assert additionalProperties/required in the tests when inspecting the schema attribute, but that's optional.


38-97: Legacy structured-outputs test covers key attributes and metrics

The legacy path test exercises span prompt/completion, validates LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA contents, checks request/response model attribution, parses the JSON response, and verifies metrics and absence of events when using legacy attributes. This is a solid end-to-end check of the new schema attribute on spans.

Static analysis warning about instrument_legacy being unused is expected for a pytest fixture used only for side effects; if ARG001 is enforced in CI you can silence it with a # noqa: ARG001 on that parameter, but functionally this is fine.


99-188: Event-mode structured-outputs tests look good; consider asserting log contents if needed

Both event-mode tests (with and without content) correctly:

  • exercise the same beta.messages.create structured-outputs path,
  • assert the presence and basic structure of LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
  • confirm request/response model attribution,
  • validate the JSON response shape, and
  • verify metrics plus the expected event count (2 logs).

That gives good coverage of the new behavior. If you later want stronger regression protection, you might reuse assert_message_in_logs from test_messages.py here to check event bodies as well, but it's not strictly necessary.

Similar to the first test, instrument_with_content and instrument_with_no_content being unused in the body is normal for fixtures; add # noqa: ARG001 only if your linter treats this as an error.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 13f1e33 and f5ca45f.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/event_emitter.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/event_emitter.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (4)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)
  • verify_metrics (7-71)
packages/opentelemetry-instrumentation-anthropic/tests/conftest.py (1)
  • anthropic_client (70-71)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Ruff (0.14.5)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py

40-40: Unused function argument: instrument_legacy

(ARG001)


101-101: Unused function argument: instrument_with_content

(ARG001)


147-147: Unused function argument: instrument_with_no_content

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)

2442-2453: Helper update matches new Event naming semantics

Asserting on log.log_record.name is consistent with using Event(name=...) in the emitter. The remaining checks on system attribute and body still validate the important parts of the payload.

Reverting changes from commit 0f2d11a that incorrectly changed from LogRecord to Event API. Main branch is working correctly with LogRecord and event_name.
- Fix output_format structure to use direct 'schema' field instead of nested 'json_schema.schema'
- Update span_utils.py to extract schema from correct location
- Simplify tests for events mode (no request attribute checks)
- Add VCR cassettes for structured outputs tests

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (2)

32-87: Good coverage of span attributes; consider tightening schema assertion

This test nicely validates:

  • Prompt and completion content/roles
  • Presence of LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA
  • Request/response model attributes
  • Legacy mode avoiding logs

To strengthen regression protection around structured outputs, you could optionally assert full equality between schema_attr and JOKE_SCHEMA (not just presence of keys) so schema drift is caught immediately:

- assert "properties" in schema_attr - assert "joke" in schema_attr["properties"] - assert "rating" in schema_attr["properties"] + assert schema_attr == JOKE_SCHEMA

34-35: Handle Ruff ARG001 warnings for fixture-only parameters

Ruff flags instrument_legacy, instrument_with_content, and instrument_with_no_content as unused arguments, even though they are pytest fixtures used for side effects only.

If Ruff is enforced on tests, consider one of:

  • Explicitly “use” them in the body:
```python
def test_...(..., instrument_legacy, ...):
    _ = instrument_legacy  # noqa: ARG001
    ...
```
  • Or add a per-line/per-file # noqa: ARG001 as appropriate.
  • Or configure Ruff to ignore ARG001 in test files.

This keeps the fixture pattern while satisfying the linter.

Also applies to: 91-92, 120-121

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c91f7c6 and d8c8b35.

📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_structured_outputs/test_anthropic_structured_outputs_legacy.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_structured_outputs/test_anthropic_structured_outputs_with_events_with_content.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_structured_outputs/test_anthropic_structured_outputs_with_events_with_no_content.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_structured_outputs/test_anthropic_structured_outputs_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_structured_outputs/test_anthropic_structured_outputs_legacy.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_structured_outputs/test_anthropic_structured_outputs_with_events_with_content.yaml
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR | Repo: traceloop/openllmetry | PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
🪛 Ruff (0.14.5)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py

34-34: Unused function argument: instrument_legacy

(ARG001)


91-91: Unused function argument: instrument_with_content

(ARG001)


120-120: Unused function argument: instrument_with_no_content

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (3)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (3)

10-29: Schema and OUTPUT_FORMAT setup looks solid

The shared JOKE_SCHEMA and OUTPUT_FORMAT definitions are clear, reusable across tests, and accurately capture the structured output shape you’re exercising. This aligns well with the intent to validate structured-output schema handling.


89-116: Events-with-content test focuses correctly on logs and payload

This test exercises the events-with-content path: it reuses the same structured output call, asserts the JSON payload has the expected keys, and verifies that two logs are emitted. That’s a good, focused check on the logging behavior without duplicating all the span assertions from the legacy test.


118-144: Events-with-no-content test mirrors content-path behavior appropriately

This test mirrors the previous one for the “no content” instrumentation variant while revalidating span count, operation name, and the structured JSON payload plus log count. The symmetry between the two variants makes the behavior easy to compare and maintain.

Comment on lines +58 to +104
```yaml
headers:
  CF-RAY:
  - 9a30f7bc0cccf169-TLV
  Connection:
  - keep-alive
  Content-Encoding:
  - gzip
  Content-Type:
  - application/json
  Date:
  - Sun, 23 Nov 2025 13:21:09 GMT
  Server:
  - cloudflare
  Transfer-Encoding:
  - chunked
  X-Robots-Tag:
  - none
  anthropic-organization-id:
  - 617d109c-a187-4902-889d-689223d134aa
  anthropic-ratelimit-input-tokens-limit:
  - '2000000'
  anthropic-ratelimit-input-tokens-remaining:
  - '2000000'
  anthropic-ratelimit-input-tokens-reset:
  - '2025-11-23T13:21:07Z'
  anthropic-ratelimit-output-tokens-limit:
  - '400000'
  anthropic-ratelimit-output-tokens-remaining:
  - '400000'
  anthropic-ratelimit-output-tokens-reset:
  - '2025-11-23T13:21:09Z'
  anthropic-ratelimit-tokens-limit:
  - '2400000'
  anthropic-ratelimit-tokens-remaining:
  - '2400000'
  anthropic-ratelimit-tokens-reset:
  - '2025-11-23T13:21:07Z'
  cf-cache-status:
  - DYNAMIC
  request-id:
  - req_011CVQtbb1HQigBLDM6oAQT3
  retry-after:
  - '53'
  strict-transport-security:
  - max-age=31536000; includeSubDomains; preload
  x-envoy-upstream-service-time:
  - '3261'
```


🛠️ Refactor suggestion | 🟠 Major

Scrub Anthropic identifiers from cassette response headers

The response headers currently include anthropic-organization-id and request-id with what appear to be real identifiers. Per the repo guidelines for cassettes (“never commit secrets or PII; scrub sensitive data”), these should be replaced with stable placeholders (e.g., org_XXXXXXXX / req_XXXXXXXX) before committing.

🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_structured_outputs/test_anthropic_structured_outputs_with_events_with_content.yaml around lines 58 to 104, scrub the sensitive Anthropic identifiers in the response headers by replacing the real anthropic-organization-id and request-id values with stable placeholders (for example use org_XXXXXXXX and req_XXXXXXXX respectively), preserving the YAML structure and quoting style so the cassette remains valid. 
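
One way to enforce this for future recordings is a scrubbing hook in the test conftest. A hedged sketch, assuming pytest-recording (consistent with the --record-mode usage elsewhere in this PR) and vcrpy's documented hooks; the header names match the cassette above:

```python
# conftest.py (sketch)
import pytest


def _scrub_response_headers(response):
    # vcrpy passes the recorded response dict before it is written to the
    # cassette; header values are lists of strings.
    headers = response.get("headers", {})
    for name, placeholder in (
        ("anthropic-organization-id", "org_XXXXXXXX"),
        ("request-id", "req_XXXXXXXX"),
    ):
        if name in headers:
            headers[name] = [placeholder]
    return response


@pytest.fixture(scope="module")
def vcr_config():
    return {
        # filter_headers scrubs request headers (e.g. the API key).
        "filter_headers": [("authorization", "REDACTED"), ("x-api-key", "REDACTED")],
        # Response headers need a before_record_response hook.
        "before_record_response": _scrub_response_headers,
    }
```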
@nirga nirga merged commit 72b28c6 into main Nov 23, 2025
12 checks passed
@nirga nirga deleted the feat/structured-outputs-logging branch November 23, 2025 13:37
