Skip to content

Conversation

@devsergiy
Copy link
Member

@devsergiy devsergiy commented Nov 5, 2025

Use cache for variables normalization and variables remapping.

Closes ENG-8477

Summary by CodeRabbit

  • Performance

    • Added caches for variable normalization and variable remapping to speed request processing.
  • Observability

    • Cache-hit info recorded in telemetry with new metrics and span attributes for normalization and remapping.
  • New Features / Debug

    • Debug response headers now report normalization and remapping HIT/MISS via a unified cache-header flag.
  • Tests

    • Added comprehensive tests covering normalization/remapping cache behavior, headers, telemetry, and varied request scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds variables-normalization and remap-variables caches and cache-entry types; NormalizeVariables and RemapVariables now return cache-hit flags; cache-hit booleans recorded on operation context, emitted as HTTP headers and OTel attributes; GraphMux creates/registers these caches and tests/assertions added for header telemetry.

Changes

Cohort / File(s) Summary
Tests & test helpers
router-tests/normalization_cache_test.go, router-tests/telemetry/telemetry_test.go, router-tests/testenv/testenv.go, router-tests/testenv/*.go
Adds cacheHit type, assertCacheHeaders helper and TestVarsNormalizationRemappingCaches; expands telemetry expectations to include variables_normalization and remap_variables metrics/attributes; adapts testenv to consolidated EnableCacheResponseHeaders.
Graph server & cache lifecycle
router/core/graph_server.go
Adds VariablesNormalizationCache and RemapVariablesCache fields to graphMux; creates/configures these caches in buildOperationCaches, registers metrics, exposes them via OperationProcessorOptions, and closes them on shutdown.
Operation processor, cache types & wiring
router/core/operation_processor.go, router/core/operation_processor_test.go, router/core/cache_warmup.go
Introduces VariablesNormalizationCacheEntry and RemapVariablesCacheEntry; removes operationID from NormalizationCacheEntry; adds normalizeVariablesCacheKey() and remapVariablesCacheKey(); wires new caches into OperationCache and NewOperationProcessor; updates NormalizeVariables and RemapVariables signatures to return cache-hit flags and updates tests/callers.
Call-site updates
router/core/cache_warmup.go, router/core/websocket.go, router/core/operation_processor_test.go, router/core/graphql_prehandler.go
Updates callers to new return arities: NormalizeVariables now returns (cached bool, ..., error), RemapVariables returns (cached bool, error); callers capture/propagate cache-hit flags and adjust error handling and span attributes.
Request context fields & prehandler telemetry
router/core/context.go, router/core/graphql_prehandler.go
Adds variablesNormalizationCacheHit and variablesRemappingCacheHit to operationContext; records cache-hit booleans on normalization/remap spans and stores them in request/operation context.
HTTP response headers, OTel attributes & config
router/core/graphql_handler.go, router/pkg/otel/attributes.go, router/pkg/config/config.go, router/pkg/config/config.schema.json, router/pkg/config/fixtures/*, router/pkg/config/testdata/*
Adds VariablesNormalizationCacheHeader and VariablesRemappingCacheHeader; centralizes cache header emission under EnableCacheResponseHeaders; adds OTel keys WgVariablesNormalizationCacheHit and WgVariablesRemappingCacheHit; updates config/schema/fixtures/testdata to replace per-header flags with EnableCacheResponseHeaders.
Lambda & misc formatting
aws-lambda-router/cmd/main.go
Removes EnableExecutionPlanCacheResponseHeader from EngineExecutionConfiguration literal and adjusts imports/formatting.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • All updated signatures for NormalizeVariables and RemapVariables and every call site update.
    • Removal of operationID from NormalizationCacheEntry and persisted-cache interactions.
    • Correctness and collision risk in normalizeVariablesCacheKey() / remapVariablesCacheKey().
    • Ristretto cache initialization, sizing, metrics registration, concurrency and shutdown in graphMux.
    • Consistency between cache-hit flags, emitted HTTP headers, OTel span attributes, and updated telemetry tests.

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: cache vars and remapping normalization' accurately summarizes the main change: implementing caching for variables normalization and variables remapping.
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d8b263 and a56bf24.

📒 Files selected for processing (2)
  • router-tests/normalization_cache_test.go (1 hunks)
  • router/core/graphql_handler.go (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2252 File: router-tests/telemetry/telemetry_test.go:9684-9693 Timestamp: 2025-10-01T20:39:16.113Z Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed. 

Applied to files:

  • router-tests/normalization_cache_test.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma Repo: wundergraph/cosmo PR: 2181 File: router/core/operation_processor.go:0-0 Timestamp: 2025-09-02T12:52:27.677Z Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur. 

Applied to files:

  • router/core/graphql_handler.go
🧬 Code graph analysis (1)
router-tests/normalization_cache_test.go (2)
router/core/graphql_handler.go (3)
  • NormalizationCacheHeader (39-39)
  • VariablesNormalizationCacheHeader (40-40)
  • VariablesRemappingCacheHeader (41-41)
router/core/operation_processor.go (1)
  • GraphQLRequest (194-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (8)
router-tests/normalization_cache_test.go (4)

14-20: LGTM!

The struct is well-documented and provides a clean abstraction for asserting cache hit/miss status across the three normalization stages.


22-38: LGTM!

The helper function is well-structured with proper use of t.Helper() and descriptive assertion messages for each cache header.


43-63: LGTM!

The test properly validates cache behavior across variable value changes. The use of named struct fields addresses prior review feedback and improves readability.


65-156: LGTM!

Comprehensive test coverage for edge cases including inline value extraction, query structure changes, cache key isolation, and list coercion. The inline comments at lines 151-153 clearly explain the expected cache behavior for the list coercion test.

router/core/graphql_handler.go (4)

36-42: LGTM!

The new cache header constants follow the established X-WG-* naming convention and are logically grouped with related headers.


68-84: LGTM!

The consolidation of cache header options into EnableCacheResponseHeaders addresses prior review feedback and simplifies configuration.


86-105: LGTM!

The tracer initialization and handler field wiring are correct. The new cache-related fields are properly initialized from options.


442-469: LGTM!

The unified enableCacheResponseHeaders flag correctly controls all five cache headers. The explicit if-else pattern for each header maintains clarity and is consistent with the existing implementation.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Nov 5, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-814390db96b8942ae86fb6cdfccc3afe096104ec 
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/core/operation_processor.go (1)

944-979: Copy cached variables before storing entries

VariablesNormalizationCacheEntry.variables currently aliases o.parsedOperation.Request.Variables, which is backed by the pooled parseKit buffer. After the entry is cached, the kit gets reused (OperationProcessor.freeKit), so later requests mutate that shared backing array, corrupting the cached payload and any future hits.

Please copy the slice before caching so the entry owns its data. Apply the same fix in both places where we construct VariablesNormalizationCacheEntry.

@@ -entry := VariablesNormalizationCacheEntry{ +cachedVariables := append([]byte(nil), o.parsedOperation.Request.Variables...) +entry := VariablesNormalizationCacheEntry{	uploadsMapping: uploadsMapping,	id: o.parsedOperation.ID,	normalizedRepresentation: o.parsedOperation.NormalizedRepresentation, -variables: o.parsedOperation.Request.Variables, +variables: cachedVariables,	reparse: false,	} @@ -entry := VariablesNormalizationCacheEntry{ +cachedVariables := append([]byte(nil), o.parsedOperation.Request.Variables...) +entry := VariablesNormalizationCacheEntry{	uploadsMapping: uploadsMapping,	id: o.parsedOperation.ID,	normalizedRepresentation: o.parsedOperation.NormalizedRepresentation, -variables: o.parsedOperation.Request.Variables, +variables: cachedVariables,	reparse: true,	}
🧹 Nitpick comments (1)
router/core/graph_server.go (1)

707-731: Expose cache metrics for the new caches

The new variables/remap caches are now part of the mux, but we never register them with metricInfos, so their hit/miss stats stay invisible in OTLP/Prometheus. Please append both caches to metricInfos alongside the existing ones so operators can monitor them.

@@	if s.normalizationCache != nil {	metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("query_normalization", srv.engineExecutionConfiguration.NormalizationCacheSize, s.normalizationCache.Metrics))	} +if s.variablesNormalizationCache != nil { +metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("variables_normalization", srv.engineExecutionConfiguration.NormalizationCacheSize, s.variablesNormalizationCache.Metrics)) +} + +if s.remapVariablesCache != nil { +metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("variables_remap", srv.engineExecutionConfiguration.NormalizationCacheSize, s.remapVariablesCache.Metrics)) +} +	if s.persistedOperationCache != nil {	metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("persisted_query_normalization", 1024, s.persistedOperationCache.Metrics))	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f865487 and 5096314.

📒 Files selected for processing (3)
  • router-tests/normalization_cache_test.go (1 hunks)
  • router/core/graph_server.go (4 hunks)
  • router/core/operation_processor.go (10 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2252 File: router-tests/telemetry/telemetry_test.go:9684-9693 Timestamp: 2025-10-01T20:39:16.113Z Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed. 

Applied to files:

  • router-tests/normalization_cache_test.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2172 File: router/core/graph_server.go:0-0 Timestamp: 2025-09-17T20:55:39.456Z Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration. 

Applied to files:

  • router/core/graph_server.go
🧬 Code graph analysis (2)
router-tests/normalization_cache_test.go (2)
router-tests/testenv/testenv.go (4)
  • Run (105-122)
  • Config (284-341)
  • Environment (1731-1767)
  • GraphQLRequest (1907-1915)
router/core/graphql_handler.go (1)
  • NormalizationCacheHeader (39-39)
router/core/graph_server.go (1)
router/core/operation_processor.go (3)
  • NormalizationCacheEntry (755-759)
  • VariablesNormalizationCacheEntry (761-767)
  • RemapVariablesCacheEntry (769-773)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: build_test
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
@ysmolski ysmolski self-assigned this Nov 11, 2025
@ysmolski ysmolski changed the title fix: normalization cache fix: cache vars and remapping normalization Nov 13, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
router/core/operation_processor.go (2)

813-817: Consider handling or documenting jsonparser.Delete errors.

The jsonparser.Delete calls on Line 816 can return errors, but they're being silently ignored. While delete operations on valid JSON should rarely fail, consider either:

  1. Handling the error (log or return it)
  2. Adding a comment explaining why it's safe to ignore

Example:

 for _, varName := range skipIncludeVariableNames { -o.parsedOperation.Request.Variables = jsonparser.Delete(o.parsedOperation.Request.Variables, varName) +var err error +o.parsedOperation.Request.Variables, err = jsonparser.Delete(o.parsedOperation.Request.Variables, varName) +if err != nil { +// Log or handle error +} }

943-950: Address or remove the TODO-style comment.

The comment on Line 943 suggests uncertainty about keyGen reset discipline: "should not be needed if we properly reset after use - check do we have any remaining places where we do not reset keygen - maybe wrap into a type which will reset once we got key".

After reviewing the code, the resets appear properly placed (lines 885, 950, 995, 1045, 1053, 1193, 1202). Consider either:

  1. Removing the comment if the concern has been addressed
  2. Creating a wrapper type as suggested for better encapsulation
  3. Converting this to a proper TODO with a tracking ticket
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2bf4a4 and dcbf441.

📒 Files selected for processing (1)
  • router/core/operation_processor.go (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router/core/operation_processor.go (1)
router-tests/testenv/testenv.go (1)
  • GraphQLRequest (1909-1917)
🔇 Additional comments (7)
router/core/operation_processor.go (7)

48-78: LGTM! Documentation improvements.

The enhanced comments provide clearer explanations of the field purposes and when they're available.


106-128: LGTM! Cache fields properly added.

The new cache fields follow the established naming conventions and type patterns used for other caches in the struct.


163-176: LGTM! Cache fields properly integrated.

The cache fields are correctly added to the OperationCache struct following existing patterns.


767-794: LGTM! Well-documented cache entry types.

The cache entry structures are well-designed with clear documentation. The reparse flag in VariablesNormalizationCacheEntry is a smart optimization to avoid unnecessary document reparsing when the normalized form hasn't changed.


889-990: LGTM! Well-implemented caching logic with smart optimization.

The caching implementation is solid:

  • Cache key generation appropriately uses both variables and normalized representation
  • The reparse flag optimization avoids unnecessary document reparsing when the normalized form is unchanged (lines 954-963)
  • Proper error handling and cache miss fallback
  • Upload mappings are correctly preserved in cache entries

992-1073: LGTM! RemapVariables caching correctly implemented.

The caching logic is well-implemented:

  • Cache key appropriately uses only the normalized representation (line 993), since variable remapping depends on operation structure, not input variable values
  • Correct handling of cache hits with document reparsing (lines 1008-1010)
  • Proper error handling and fallback
  • keyGen properly reset after each use

1432-1441: LGTM! Cache initialization properly wired.

The new caches are correctly integrated into the NewOperationProcessor initialization:

  • Condition check includes the new cache types (lines 1432-1433)
  • Fields properly assigned from options (lines 1439-1440)
  • Follows the established pattern for other caches
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/operation_processor.go (1)

944-944: Consider removing the TODO-style comment.

The comment suggests checking whether keyGen is properly reset after use, but inspection shows keyGen.Reset() is consistently called after each use throughout the file (lines 885, 951, 996, 1047, 1055). This appears to be a leftover development note that could be removed for code clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcbf441 and 11e6116.

📒 Files selected for processing (2)
  • router/core/graphql_prehandler.go (3 hunks)
  • router/core/operation_processor.go (12 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma Repo: wundergraph/cosmo PR: 2181 File: router/core/operation_processor.go:0-0 Timestamp: 2025-09-02T12:52:27.677Z Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur. 

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma Repo: wundergraph/cosmo PR: 2181 File: router/core/operation_processor.go:0-0 Timestamp: 2025-09-02T12:52:27.677Z Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior. 

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2252 File: router-tests/telemetry/telemetry_test.go:9684-9693 Timestamp: 2025-10-01T20:39:16.113Z Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed. 

Applied to files:

  • router/core/graphql_prehandler.go
🧬 Code graph analysis (2)
router/core/graphql_prehandler.go (1)
router/pkg/otel/attributes.go (2)
  • WgVariablesNormalizationCacheHit (39-39)
  • WgVariablesRemappingCacheHit (40-40)
router/core/operation_processor.go (1)
router-tests/testenv/testenv.go (1)
  • GraphQLRequest (1909-1917)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_test
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
🔇 Additional comments (8)
router/core/operation_processor.go (5)

767-794: Well-documented cache entry types.

The new cache entries are clearly documented with helpful comments explaining each field's purpose. The reparse flag in VariablesNormalizationCacheEntry is a smart optimization that avoids unnecessary document reparsing when the normalized form hasn't changed.

Based on learnings: This addresses a past review comment requesting comments and descriptions for these structures.


812-817: Correct handling of skip/include variables after cache hit.

This logic properly removes skip/include variables when retrieving a cached normalized operation, ensuring consistency with the non-cached path where these variables are removed during normalization.


880-991: Solid caching implementation for variable normalization.

The cache key appropriately includes both variables and normalized representation since normalization depends on both. The reparse flag optimization is well-designed: when the normalized form doesn't change, it avoids the expensive setAndParseOperationDoc() call on cache retrieval.


993-1075: Appropriate caching implementation for variable remapping.

The cache key correctly includes only the normalized representation (not variable values) since remapping depends on operation structure. Unlike NormalizeVariables, this always calls setAndParseOperationDoc() after a cache hit, which is appropriate since remapping modifies the operation structure.


1434-1443: Proper wiring of new caches into OperationProcessor.

The new caches are correctly integrated following the existing pattern for other cache types.

router/core/graphql_prehandler.go (3)

786-786: Correct integration of variables normalization cache.

The cached boolean is properly captured from NormalizeVariables and propagated to both the OTEL span and the request context for telemetry tracking.

Also applies to: 801-802


804-831: Excellent documentation for file upload path mapping.

The enhanced comments with concrete examples make the complex upload path remapping logic much clearer. The distinction between direct argument passing vs. nested object usage is now well-explained.


833-836: Correct integration of variable remapping cache.

The cached boolean is properly captured and propagated to telemetry. The updated comment at lines 833-835 helpfully clarifies the key-value relationship in the remapVariables map (new names as keys, old names as values).

Also applies to: 852-853

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
router-tests/normalization_cache_test.go (4)

87-106: Clarify or remove misleading comment.

The comment on line 90 states "Test with unused variables that should be removed," but the test doesn't include any unused variables. The $id variable is used in both queries. Consider removing this comment or clarifying what aspect of variable handling is being tested.


108-131: Add t.Parallel() for consistency.

This subtest doesn't call t.Parallel(), unlike the other subtests in this test function. For consistency and to improve test execution performance, consider adding it unless there's a specific reason for sequential execution.

Apply this diff:

	t.Run("Cache key isolation - different operations don't collide", func(t *testing.T) { +t.Parallel()	testenv.Run(t, &testenv.Config{}, func(t *testing.T, xEnv *testenv.Environment) {

64-85: Consider adding explanatory comments for cache hit patterns.

Line 83 expects cacheHit{false, false, true} (normalization MISS, variables MISS, remapping HIT) when the inline value changes from id: 1 to id: 2. While the logic is correct, a brief comment explaining why remapping hits while the others miss would improve readability and make the test's intent clearer for future maintainers.


133-151: Consider adding explanatory comments for cache hit patterns.

Line 149 expects cacheHit{true, false, true} (normalization HIT, variables MISS, remapping HIT) when the variable value changes. A brief comment explaining this pattern would help clarify that normalization hits because the query structure is unchanged, variables misses because the value differs, and remapping hits because the structure remains the same.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11e6116 and ca894b4.

📒 Files selected for processing (1)
  • router-tests/normalization_cache_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2252 File: router-tests/telemetry/telemetry_test.go:9684-9693 Timestamp: 2025-10-01T20:39:16.113Z Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed. 

Applied to files:

  • router-tests/normalization_cache_test.go
🧬 Code graph analysis (1)
router-tests/normalization_cache_test.go (2)
router-tests/testenv/testenv.go (2)
  • TestResponse (1924-1928)
  • GraphQLRequest (1909-1917)
router/core/graphql_handler.go (3)
  • NormalizationCacheHeader (39-39)
  • VariablesNormalizationCacheHeader (40-40)
  • VariablesRemappingCacheHeader (41-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
router-tests/normalization_cache_test.go (1)

22-38: LGTM! Well-designed test helper.

The assertCacheHeaders helper effectively reduces duplication and provides clear, specific error messages for each cache stage. Good use of t.Helper() for accurate test failure reporting.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/operation_processor.go (1)

1430-1439: Cache wiring is correct; consider extracting condition for readability.

The new caches are properly wired into the OperationCache. The conditional on lines 1430-1431 checks all six cache options, which is functionally correct.

The multi-condition check could be extracted for improved readability:

+func (opts *OperationProcessorOptions) hasCaching() bool { +return opts.NormalizationCache != nil || opts.ValidationCache != nil ||  +opts.QueryDepthCache != nil || opts.OperationHashCache != nil ||  +opts.EnablePersistedOperationsCache || opts.VariablesNormalizationCache != nil ||  +opts.RemapVariablesCache != nil +} + -if opts.NormalizationCache != nil || opts.ValidationCache != nil || opts.QueryDepthCache != nil || opts.OperationHashCache != nil || opts.EnablePersistedOperationsCache || -opts.VariablesNormalizationCache != nil || opts.RemapVariablesCache != nil { +if opts.hasCaching() {	processor.operationCache = &OperationCache{
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca894b4 and 47893d8.

📒 Files selected for processing (2)
  • router-tests/normalization_cache_test.go (1 hunks)
  • router/core/operation_processor.go (11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2252 File: router-tests/telemetry/telemetry_test.go:9684-9693 Timestamp: 2025-10-01T20:39:16.113Z Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed. 

Applied to files:

  • router-tests/normalization_cache_test.go
🧬 Code graph analysis (2)
router-tests/normalization_cache_test.go (2)
router-tests/testenv/testenv.go (2)
  • TestResponse (1924-1928)
  • GraphQLRequest (1909-1917)
router/core/graphql_handler.go (3)
  • NormalizationCacheHeader (39-39)
  • VariablesNormalizationCacheHeader (40-40)
  • VariablesRemappingCacheHeader (41-41)
router/core/operation_processor.go (1)
router-tests/testenv/testenv.go (1)
  • GraphQLRequest (1909-1917)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (7)
router-tests/normalization_cache_test.go (2)

14-38: LGTM - Clean test infrastructure.

The cacheHit struct and assertCacheHeaders helper provide a clean abstraction for validating cache behavior. The error messages in the assertions are descriptive and will aid debugging if tests fail.


40-157: Excellent test coverage for cache interactions.

The test suite comprehensively validates normalization, variables normalization, and remapping cache behavior across multiple scenarios. The comment on lines 150-152 is particularly helpful in explaining why specific caches hit or miss during list coercion.

The expected cache patterns align correctly with the cache key generation logic:

  • Normalization cache keys include query structure
  • Variables normalization keys include both normalized representation and variable values
  • Remapping keys include only normalized representation
router/core/operation_processor.go (5)

767-794: Well-designed cache entry structures with clear documentation.

The cache entry types appropriately capture the state needed for each normalization phase:

  • VariablesNormalizationCacheEntry includes the reparse flag optimization to avoid unnecessary document reparsing
  • RemapVariablesCacheEntry captures the internal ID used for downstream caches
  • Documentation clearly explains each field's purpose

880-887: Cache key generation correctly reflects dependencies.

The cache key design appropriately captures dependencies:

  • normalizeVariablesCacheKey includes both variables and normalized representation, since variable normalization depends on actual values
  • remapVariablesCacheKey includes only normalized representation, since remapping depends only on query structure

This design maximizes cache hits while maintaining correctness.

Also applies to: 991-996


890-989: Solid cache integration with smart reparse optimization.

The NormalizeVariables cache implementation is well-designed:

  1. Cache key includes both variables and normalized representation - correctly captures dependencies
  2. Reparse optimization (lines 952-965) avoids expensive document reparsing when only variable values changed but structure remained unchanged
  3. Error handling is consistent with existing patterns
  4. Return signature now includes cached flag and mapping for upload path handling

The logic correctly handles both cache hit and miss paths while maintaining all necessary state.


999-1071: RemapVariables cache integration follows consistent pattern.

The RemapVariables cache implementation correctly:

  1. Uses normalized representation as cache key - remapping depends only on query structure, not variable values
  2. Always reparses after cache hit (line 1008) - necessary since downstream processing needs the parsed document
  3. Calculates InternalID - used as the cache key for validation, complexity, and planner caches downstream
  4. Returns cached flag for observability (used in headers and telemetry)

The logic is consistent with the NormalizeVariables pattern and correctly maintains all required state.


813-817: Correct cleanup of skip/include variables after cache hit.

This code ensures consistency between cache hit and miss paths:

  • On cache miss: normalization removes skip/include variables from doc.Input.Variables, which is copied to request variables (line 838)
  • On cache hit: the normalized representation is loaded without skip/include, but the original request variables still have them

The manual deletion ensures both paths produce identical results.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/telemetry/telemetry_test.go (1)

3758-3758: Attribute count bumps on “Operation - Normalize” spans may be brittle

Bumping require.Len(t, sn[2].Attributes(), …) to account for the two new normalization/remapping cache‑hit attributes is correct now, but these tests are sensitive to any future attribute additions. Consider asserting the presence (and values) of the new cache‑related attributes explicitly and relaxing the length check (e.g. >= baseline) to reduce churn when span metadata evolves.

Also applies to: 4480-4480, 4935-4935, 7097-7097, 9554-9554

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faacc11 and 9e0a7ac.

📒 Files selected for processing (1)
  • router-tests/telemetry/telemetry_test.go (31 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2252 File: router-tests/telemetry/telemetry_test.go:9684-9693 Timestamp: 2025-10-01T20:39:16.113Z Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed. 

Applied to files:

  • router-tests/telemetry/telemetry_test.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech Repo: wundergraph/cosmo PR: 2327 File: graphqlmetrics/core/metrics_service.go:435-442 Timestamp: 2025-11-12T18:38:46.619Z Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration. 

Applied to files:

  • router-tests/telemetry/telemetry_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (6)
router-tests/telemetry/telemetry_test.go (6)

581-612: New cache metrics for variables_normalization/remap_variables match existing patterns

The added hit/key/cost/max datapoints for cache_type = variables_normalization and remap_variables are consistent with the existing cache families (plan/query_normalization/validation) and correctly use the same base attributes and capacities for the default, no–feature-flag scenario.

Also applies to: 704-749, 839-868, 923-936


1081-1112: Persisted/non‑persisted cache telemetry extended coherently to new cache types

For the mixed persisted/non‑persisted case, the hit/miss, keys added, and cost totals for variables_normalization and remap_variables align with the request sequence and mirror how the tests treat the existing cache types. The expectations look internally consistent.

Also applies to: 1204-1249, 1339-1368, 1423-1436


1545-1576: Prometheus+OTLP cache telemetry keeps new cache types in sync

In the Prometheus-enabled scenario, the additional variables_normalization/remap_variables datapoints are wired exactly like the OTLP-only case (same attributes, same counts), so both exporters see a consistent cache story.

Also applies to: 1668-1713, 1803-1832, 1887-1900


2011-2042: Small validation cache eviction metrics cover new cache types without skewing semantics

The “small validation cache” test now reports request/key/cost/max metrics for variables_normalization and remap_variables with added/evicted/updated values that stay in step with the other caches and don’t disturb the validation‑cache eviction expectations.

Also applies to: 2134-2179, 2269-2298, 2353-2366


2476-2515: Feature‑flag cache telemetry correctly duplicates new cache types across main/FF dimensions

For the feature‑flag scenario, variables_normalization and remap_variables are added everywhere the other cache types appear (requests/keys/cost/max), with parallel counts for mainAttributes and featureFlagAttributes. That keeps the attribute surface symmetric across feature‑flagged vs non‑flagged traffic.

Also applies to: 2491-2505, 2507-2521, 2523-2537, 2603-2634, 2728-2775, 2874-2921, 3015-3046, 3112-3143, 3201-3214, 3244-3257


11704-11816: Exclusion filtering updated to cover new cache_type labels

In the custom‑exporter exclusion test, including variables_normalization and remap_variables in the expected router.graphql.cache.requests.stats datapoints ensures label‑filtering behavior is validated for all cache families, not just the original ones. Values are intentionally ignored, so the added datapoints are safe.

@ysmolski ysmolski requested a review from Noroth November 20, 2025 13:36
@codecov
Copy link

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 85.02994% with 25 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@c914f13). Learn more about missing BASE report.

Files with missing lines Patch % Lines
router/core/operation_processor.go 79.76% 12 Missing and 5 partials ⚠️
router/core/graph_server.go 92.15% 2 Missing and 2 partials ⚠️
router/core/cache_warmup.go 0.00% 2 Missing ⚠️
router/core/websocket.go 66.66% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@ Coverage Diff @@ ## main #2318 +/- ## ======================================= Coverage ? 37.78% ======================================= Files ? 209 Lines ? 22674 Branches ? 0 ======================================= Hits ? 8567 Misses ? 13077 Partials ? 1030 

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/operation_processor.go (1)

896-912: Consider defensive copying of cached slice return value.

The cache-hit path returns entry.uploadsMapping directly (line 910), which is a reference type. If callers modify the returned slice, it could corrupt cached data.

Consider either:

  1. Returning a copy of the slice:
- return true, entry.uploadsMapping, nil + uploadsCopy := make([]uploads.UploadPathMapping, len(entry.uploadsMapping)) + copy(uploadsCopy, entry.uploadsMapping) + return true, uploadsCopy, nil
  1. Documenting that the returned slice must be treated as read-only

The same consideration applies to the non-cached paths (lines 970, 994).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db0e4a4 and aee6117.

📒 Files selected for processing (1)
  • router/core/operation_processor.go (12 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_test
🔇 Additional comments (12)
router/core/operation_processor.go (12)

50-78: LGTM! Documentation improvements enhance clarity.

The expanded documentation for ParsedOperation fields and the addition of NormalizationCacheHit tracking are well-structured and clearly describe their purpose.


116-117: LGTM! Cache fields follow established patterns.

The new VariablesNormalizationCache and RemapVariablesCache fields are consistent with existing cache field conventions.


148-164: LGTM! Documentation clarifies keyGen usage pattern.

The comment about resetting keyGen after Sum64 is helpful for maintainers and addresses previous review feedback.


174-175: LGTM! Cache fields properly integrated into OperationCache.

The new cache fields follow the established pattern and are correctly typed.


770-797: LGTM! Cache entry structures are well-documented.

The new cache entry types appropriately capture the state needed for caching normalization and remapping results. The detailed field documentation is helpful.

Note: These entries store reference types (slices, maps). The current usage appears safe since retrieved values are not mutated, but maintainers should continue to treat cached reference-type values as read-only to prevent cache corruption.


815-821: LGTM! Skip/include variable removal is correct for cache-hit path.

Since NormalizationCacheEntry doesn't store variables, this manual removal is necessary to match the behavior of the normalization process. The use of jsonparser.Delete (which returns a new slice) is appropriate and avoids mutating cached data.


841-862: LGTM! Comments clarify caching strategy.

The updated comments effectively explain why variables are not included in NormalizationCacheEntry - this allows reuse across different variable values, which is an appropriate optimization since normalization doesn't depend on variable values (except for skip/include directives).


886-895: LGTM! Cache key generation follows established patterns.

The normalizeVariablesCacheKey() helper correctly includes both variables and normalized representation, ensuring cache entries are specific to each unique input combination.


914-995: LGTM! Variables normalization caching logic is sound.

The caching implementation correctly handles:

  • The optimization to skip reparsing when the document is unchanged (reparse=false)
  • Proper detection of document changes by comparing before/after state
  • Appropriate cache entry creation and storage

The cached flag return value enables telemetry tracking of cache effectiveness.


997-1002: LGTM! Cache key generation is correct for remapping.

The remapVariablesCacheKey() method correctly keys only on normalizedRepresentation since remapping is operation-structure-dependent, not variable-value-dependent.


1005-1077: LGTM! Remapping cache implementation is sound.

The caching logic correctly:

  • Always reparses from cache since the document structure cannot be directly cached
  • Computes internalID even when remapping is disabled (necessary for planner cache key)
  • Stores the complete remapping result state
  • Returns appropriate cache-hit flags for telemetry

The signature change to return a cached boolean enables downstream monitoring of cache effectiveness.


1438-1446: LGTM! Cache initialization properly integrates new caches.

The new caches are correctly wired into the OperationCache creation logic, following the established pattern for existing caches.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
router/core/operation_processor.go (1)

1084-1093: Related to previous concern about disabled parameter.

The cache entry is stored regardless of the disabled flag value, which could lead to incorrect cache hits as mentioned in the previous comment.

🧹 Nitpick comments (2)
router/core/operation_processor.go (2)

814-820: Consider batching variable deletions for efficiency.

Each jsonparser.Delete call allocates a new byte slice. For requests with multiple skip/include variables, this creates repeated allocations. Consider collecting all variable names and deleting them in a single pass if jsonparser supports it, or using a more efficient approach.

However, this is a minor concern since skip/include variables are typically few in number.


849-849: Minor: Inconsistent comment indentation.

Line 849 starts with spaces instead of a tab, unlike the other comment lines. This is a minor formatting inconsistency.

- // +//
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aee6117 and 815e29e.

📒 Files selected for processing (1)
  • router/core/operation_processor.go (13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (11)
router/core/operation_processor.go (11)

48-78: LGTM!

The enhanced documentation for ParsedOperation fields improves code clarity and maintainability.


147-164: LGTM!

The comment on keyGen clarifies the usage pattern and helps prevent bugs from incorrect Reset/Sum64 ordering.


174-175: LGTM!

New cache fields are properly typed and consistent with existing cache patterns.


770-786: LGTM!

The VariablesNormalizationCacheEntry struct is well-documented with clear explanations for each field, including the reparse flag and uploadsMapping purpose.


788-796: LGTM!

The RemapVariablesCacheEntry struct is appropriately structured with clear documentation on the internalID usage.


900-907: LGTM!

The cache key generation correctly includes both variables and normalized representation, ensuring unique cache entries per variable combination.


976-988: LGTM!

Correctly stores cache entry with reparse: false when the operation didn't change during normalization, avoiding unnecessary reparsing on cache hit.


1000-1012: LGTM!

Correctly stores cache entry with reparse: true when the operation changed during normalization, ensuring the document is properly reparsed on cache hit.


1014-1019: LGTM!

The cache key correctly uses only the normalized representation since variable remapping depends on operation structure, not variable values.


1457-1466: LGTM!

The initialization correctly wires the new caches into OperationCache and updates the creation condition to include the new cache types.


1363-1365: LGTM!

Clear documentation explaining the method's purpose.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 815e29e and a7db1a8.

📒 Files selected for processing (2)
  • router/core/graph_server.go (5 hunks)
  • router/core/graphql_handler.go (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma Repo: wundergraph/cosmo PR: 2181 File: router/core/operation_processor.go:0-0 Timestamp: 2025-09-02T12:52:27.677Z Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur. 

Applied to files:

  • router/core/graphql_handler.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech Repo: wundergraph/cosmo PR: 2327 File: graphqlmetrics/core/metrics_service.go:435-442 Timestamp: 2025-11-12T18:38:46.619Z Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration. 

Applied to files:

  • router/core/graph_server.go
📚 Learning: 2025-08-14T17:46:00.930Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2137 File: router/pkg/metric/oltp_event_metric_store.go:68-68 Timestamp: 2025-08-14T17:46:00.930Z Learning: In the Cosmo router codebase, meter providers are shut down centrally as part of the router lifecycle, so individual metric stores like otlpEventMetrics use no-op Shutdown() methods rather than calling meterProvider.Shutdown(ctx) directly. 

Applied to files:

  • router/core/graph_server.go
🧬 Code graph analysis (2)
router/core/graphql_handler.go (2)
connect/src/wg/cosmo/platform/v1/platform_pb.ts (1)
  • Header (9607-9645)
cli/src/utils.ts (1)
  • Header (23-26)
router/core/graph_server.go (2)
router/core/operation_processor.go (4)
  • NormalizationCacheEntry (764-768)
  • ComplexityCacheEntry (798-803)
  • VariablesNormalizationCacheEntry (770-786)
  • RemapVariablesCacheEntry (788-796)
router/pkg/metric/cache_metrics.go (2)
  • CacheMetrics (49-55)
  • NewCacheMetricInfo (33-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: codecov/patch
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image
🔇 Additional comments (6)
router/core/graphql_handler.go (2)

37-41: LGTM!

The new cache header constants follow the established naming convention and are appropriately exposed as public API constants.


446-463: LGTM!

The helper function s eliminates duplication and makes the code more maintainable. The uniform pattern for setting cache headers is clean and consistent.

router/core/graph_server.go (4)

508-525: LGTM!

The new cache fields in graphMux follow the established pattern and use appropriate type parameters for the cache entry types.


712-718: LGTM!

The cache metrics registration for the new caches follows the established pattern and uses descriptive metric type names.


748-755: LGTM!

The shutdown logic correctly closes the new caches alongside existing ones. The simplified approach (without nil checks) is appropriate since the Close() method handles nil internally.


1249-1250: LGTM!

The new caches are properly wired into OperationProcessorOptions, following the same pattern as existing caches.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
router/core/operation_processor.go (1)

913-929: Critical: parsedOperation.Variables not synchronized after cache restore.

When restoring from the variables normalization cache (line 920), o.parsedOperation.Request.Variables is updated from entry.variables, but o.parsedOperation.Variables (the *fastjson.Object) is not re-parsed to match. This creates a data inconsistency since downstream code (e.g., line 1234 in writeSkipIncludeCacheKeyToKeyGen) uses parsedOperation.Variables.Get() expecting it to reflect the current Request.Variables.

The setAndParseOperationDoc() call (lines 922-926) updates kit.doc.Input.Variables but does not update parsedOperation.Variables.

This is the same issue flagged in the past review comments and remains unresolved.

Apply this fix immediately after line 920:

	o.parsedOperation.NormalizedRepresentation = entry.normalizedRepresentation	o.parsedOperation.ID = entry.id	o.parsedOperation.Request.Variables = entry.variables +o.parsedOperation.Variables = fastjson.MustParseBytes(entry.variables).GetObject()	if entry.reparse {

This ensures the fastjson object stays in sync with the JSON bytes, matching the pattern used in unmarshalOperation (lines 369, 371, 382).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50cd276 and 9301cf9.

📒 Files selected for processing (1)
  • router/core/operation_processor.go (13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_test
🔇 Additional comments (5)
router/core/operation_processor.go (5)

770-796: LGTM! Well-designed cache entry types.

The new cache entry structures are well-documented and contain the appropriate fields for restoring cached state. The reparse flag in VariablesNormalizationCacheEntry correctly tracks when document reparsing is needed.


1014-1024: Good fix! Cache key now accounts for the disabled parameter.

This addresses the concern from the past review where the cache key didn't distinguish between disabled=true and disabled=false, which could have caused incorrect cache hits. The fix correctly incorporates the disabled flag into the cache key.


1462-1471: LGTM! Cache initialization properly wired.

The cache initialization correctly creates the OperationCache when any cache is provided and properly assigns all cache fields including the new variablesNormalizationCache and remapVariablesCache.


976-1012: Cache storage logic is well-structured.

The cache storage correctly differentiates between cases where the operation document changed vs. didn't change, setting the reparse flag appropriately. The cache entries store all necessary state for restoration.

Note: The cache restoration logic (lines 917-927) has a separate critical issue already flagged above regarding parsedOperation.Variables not being updated.


1026-1099: LGTM! RemapVariables caching correctly implemented.

The caching logic for RemapVariables is well-implemented:

  • Cache key properly includes the disabled parameter (fixed from past review)
  • Cache hit restores all necessary state and rebuilds the document
  • Cache storage captures the complete result
@ysmolski ysmolski requested a review from jensneuse November 26, 2025 12:48
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
router/core/graph_server.go (1)

747-755: Relying on cache Close() nil-guard in graphMux.Shutdown is acceptable

Calling Close() unconditionally on all caches (plan, persisted, normalization, variablesNormalization, remapVariables, complexity, validation, operationHash) simplifies the shutdown logic and is safe given Cache.Close already handles nil receivers and closed caches. This keeps cleanup consistent as new caches are added.

🧹 Nitpick comments (3)
aws-lambda-router/cmd/main.go (1)

7-10: Lambda main wiring matches new engine config; consider softer shutdown error handling

  • The updated EngineExecutionConfiguration literal (only setting EnableSingleFlight, EnableRequestTracing, and MaxConcurrentResolvers) correctly drops the now-removed execution-plan cache header flag and will compile against the new config surface.
  • Using HTTP_PORT to switch between pure HTTP mode and Lambda, and ignoring the expected “server closed” error in the HTTP path, is a reasonable pattern here.
  • In the SIGTERM shutdown callback, you currently panic on r.Shutdown error; given this runs in the termination path, it might be slightly friendlier to log the error at Error/Warn level and return, instead of panicking, unless you explicitly want crashes to surface in logs as hard failures.

Also applies to: 26-33, 59-63, 76-81, 89-97

router/pkg/config/config.schema.json (1)

2755-2758: Schema property for unified cache response headers looks consistent

The new engine.debug.enable_cache_response_headers boolean cleanly replaces the previous per-cache header toggles and matches the naming used in fixtures and testdata. The description is clear; if more cache types later expose hit/miss headers, you might broaden the wording to “GraphQL operation-level caches” instead of listing specific caches, but that’s optional.

router/core/graph_server.go (1)

508-525: New variables/remap caches and metrics wiring are coherent

The additional variablesNormalizationCache and remapVariablesCache fields on graphMux, their creation in buildOperationCaches, and their inclusion in configureCacheMetrics (as variables_normalization and remap_variables) are internally consistent:

  • They are enabled under the same EnableNormalizationCache / NormalizationCacheSize gate as the main normalization cache.
  • They use the same sizing and metrics flags as other caches.
  • Metric registration only occurs when the caches are non-nil.

This cleanly extends both caching and cache metrics without changing existing behavior.

You could optionally gate remapVariablesCache creation on !srv.engineExecutionConfiguration.DisableVariablesRemapping to avoid allocating an unused cache when remapping is disabled, but that’s not required for correctness.

Also applies to: 567-603, 676-718

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c39ddf0 and 8d8b263.

📒 Files selected for processing (9)
  • aws-lambda-router/cmd/main.go (2 hunks)
  • router-tests/testenv/testenv.go (1 hunks)
  • router/core/graph_server.go (6 hunks)
  • router/core/graphql_handler.go (4 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (1 hunks)
  • router/pkg/config/fixtures/full.yaml (1 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech Repo: wundergraph/cosmo PR: 2327 File: graphqlmetrics/core/metrics_service.go:435-442 Timestamp: 2025-11-12T18:38:46.619Z Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration. 

Applied to files:

  • router/core/graph_server.go
📚 Learning: 2025-08-14T17:46:00.930Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2137 File: router/pkg/metric/oltp_event_metric_store.go:68-68 Timestamp: 2025-08-14T17:46:00.930Z Learning: In the Cosmo router codebase, meter providers are shut down centrally as part of the router lifecycle, so individual metric stores like otlpEventMetrics use no-op Shutdown() methods rather than calling meterProvider.Shutdown(ctx) directly. 

Applied to files:

  • router/core/graph_server.go
📚 Learning: 2025-07-30T09:29:46.660Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2090 File: router/pkg/config/config.schema.json:0-0 Timestamp: 2025-07-30T09:29:46.660Z Learning: The "operation_name_trim_limit" configuration property in router/pkg/config/config.schema.json should be placed at the security level as a sibling to complexity_limits, not inside the complexity_limits object. 

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-07-21T15:06:36.664Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2067 File: router/pkg/config/config.schema.json:1637-1644 Timestamp: 2025-07-21T15:06:36.664Z Learning: In the Cosmo router project, when extending JSON schema validation for security-sensitive fields like JWKS secrets, backwards compatibility is maintained by implementing warnings in the Go code rather than hard validation constraints in the schema. This allows existing configurations to continue working while alerting users to potential security issues. 

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2167 File: router/internal/expr/retry_context.go:3-10 Timestamp: 2025-08-26T17:19:28.178Z Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations. 

Applied to files:

  • aws-lambda-router/cmd/main.go
📚 Learning: 2025-08-12T13:50:45.964Z
Learnt from: Noroth Repo: wundergraph/cosmo PR: 2132 File: router-plugin/plugin.go:139-146 Timestamp: 2025-08-12T13:50:45.964Z Learning: In the Cosmo router plugin system, the plugin framework creates its own logger independently. When creating a logger in NewRouterPlugin, it's only needed for the gRPC server setup (passed via setup.GrpcServerInitOpts.Logger) and doesn't need to be assigned back to serveConfig.Logger. 

Applied to files:

  • aws-lambda-router/cmd/main.go
📚 Learning: 2025-08-12T09:13:38.973Z
Learnt from: Noroth Repo: wundergraph/cosmo PR: 2132 File: router-plugin/plugin.go:88-104 Timestamp: 2025-08-12T09:13:38.973Z Learning: In the Cosmo router plugin system, plugin logs are written to stdout and incorporated by the router into its zap logger, which handles timestamping. Therefore, plugin loggers should use DisableTime: true to avoid redundant timestamps that could interfere with the router's log processing. 

Applied to files:

  • aws-lambda-router/cmd/main.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma Repo: wundergraph/cosmo PR: 2181 File: router/core/operation_processor.go:0-0 Timestamp: 2025-09-02T12:52:27.677Z Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur. 

Applied to files:

  • router/core/graphql_handler.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma Repo: wundergraph/cosmo PR: 2181 File: router/core/operation_processor.go:0-0 Timestamp: 2025-09-02T12:52:27.677Z Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior. 

Applied to files:

  • router/core/graphql_handler.go
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki Repo: wundergraph/cosmo PR: 2273 File: router/core/graphql_handler.go:0-0 Timestamp: 2025-11-19T15:13:57.821Z Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code. 

Applied to files:

  • router/core/graphql_handler.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon Repo: wundergraph/cosmo PR: 2252 File: router-tests/telemetry/telemetry_test.go:9684-9693 Timestamp: 2025-10-01T20:39:16.113Z Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed. 

Applied to files:

  • router-tests/testenv/testenv.go
🧬 Code graph analysis (2)
router/core/graph_server.go (3)
router/core/operation_processor.go (3)
  • NormalizationCacheEntry (764-768)
  • VariablesNormalizationCacheEntry (770-786)
  • RemapVariablesCacheEntry (788-796)
router/core/engine_loader_hooks.go (1)
  • NewEngineRequestHooks (55-86)
router/core/graphql_handler.go (1)
  • HandlerOptions (68-84)
router-tests/testenv/testenv.go (1)
router/pkg/config/config.go (1)
  • EngineDebugConfiguration (355-371)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: Analyze (go)
🔇 Additional comments (8)
router/pkg/config/fixtures/full.yaml (1)

353-356: Engine debug fixture aligned with new cache header flag

engine.debug.enable_cache_response_headers: false matches the new consolidated flag in the schema and keeps this “full” fixture consistent with the JSON testdata. No issues from a config-surface perspective.

router/pkg/config/testdata/config_defaults.json (1)

349-366: Default debug config correctly reflects unified cache header toggle

Adding "EnableCacheResponseHeaders": false under EngineExecutionConfiguration.Debug cleanly replaces the old per-header booleans and keeps the defaults JSON in sync with the schema and YAML fixture. Looks good.

router/pkg/config/testdata/config_full.json (1)

729-746: Full-config testdata updated for consolidated cache header flag

The addition of "EnableCacheResponseHeaders": false to the debug block aligns this “full” config snapshot with the new unified cache response header setting. No behavioral concerns.

router-tests/testenv/testenv.go (1)

1303-1327: Engine execution/debug config wiring looks consistent with new cache/header flags

The execution config in configureRouter correctly enables normalization cache, sets a reasonable cache size, and uses the new unified EnableCacheResponseHeaders debug flag; this should exercise the new cache behavior and headers in tests without surprises.

router/pkg/config/config.go (1)

355-370: Unified debug flag EnableCacheResponseHeaders is well-integrated

The updated EngineDebugConfiguration cleanly consolidates cache header toggles under EnableCacheResponseHeaders with consistent env/yaml tags; this matches how the flag is consumed in graph_server.go and graphql_handler.go.

router/core/graphql_handler.go (2)

36-42: Cache headers and unified setDebugCacheHeaders behavior look correct

The new cache header constants for execution plan, persisted operations, normalization, variables normalization, and variables remapping align with the new caches added in graph_server.go. GraphQLHandler’s enableCacheResponseHeaders flag and setDebugCacheHeaders helper cleanly emit "HIT"/"MISS" across all cache types in one place, only when enabled, which keeps header behavior consistent and easy to reason about.

Also applies to: 129-133, 442-456


68-103: Based on my verification, the review comment is incorrect. The codebase has an explicit design pattern to always initialize tracerProvider to avoid nil pointer panics. At router initialization (router.go:240-242), a noop tracer provider is always created. HandlerOptions always receives this non-nil provider from s.tracerProvider, and there are no code paths where it could be nil. This aligns with the codebase's deliberate pattern documented in the router comment: "Create noop tracer and meter to avoid nil pointer panics and to avoid checking for nil everywhere."

Nil TracerProvider defensive guard is unnecessary

The suggested defensive check contradicts the codebase's established design pattern. The router always initializes TracerProvider with a noop provider (router.go:242) specifically to eliminate nil checks throughout the codebase. All callers of NewGraphQLHandler provide a non-nil tracer provider from the router instance, making the proposed guard redundant.

router/core/graph_server.go (1)

1241-1255: Operation processor wiring and loader hooks integration for new caches look correct

  • OperationProcessorOptions now receives VariablesNormalizationCache and RemapVariablesCache from the corresponding graphMux fields, matching the entry types defined in operation_processor.go.
  • NewEngineRequestHooks is initialized with the mux’s metric store, optional subgraph access logger, tracer provider, and attribute expressions, and then passed via HandlerOptions.EngineLoaderHooks.
  • HandlerOptions also correctly wires EnableCacheResponseHeaders from s.engineExecutionConfiguration.Debug.EnableCacheResponseHeaders and enables response header propagation when headerRules are present.

This keeps the new caching behavior and loader instrumentation fully connected through the server → mux → handler stack.

Also applies to: 1344-1352, 1355-1364

},
"EnableSingleFlight": true,
"EnableRequestTracing": true,
"EnableExecutionPlanCacheResponseHeader": false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was it removed? We need to deprecate it

WgAcquireResolverWaitTimeMs = attribute.Key("wg.engine.resolver.wait_time_ms")
WgNormalizationCacheHit = attribute.Key("wg.engine.normalization_cache_hit")
WgVariablesNormalizationCacheHit = attribute.Key("wg.engine.variables_normalization_cache_hit")
WgVariablesRemappingCacheHit = attribute.Key("wg.engine.variables_remapping_cache_hit")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should show this in the trace UI in the studio too. Please create an issue.

Comment on lines +1344 to +1352
loaderHooks := NewEngineRequestHooks(
gm.metricStore,
subgraphAccessLogger,
s.tracerProvider,
tracingAttExpressions,
telemetryAttExpressions,
metricAttExpressions,
exprManager.VisitorManager.IsSubgraphResponseBodyUsedInExpressions(),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you extract the hooks into a variable here? It's only used once anyway

Comment on lines 442 to +458
func (h *GraphQLHandler) setDebugCacheHeaders(w http.ResponseWriter, opCtx *operationContext) {
if h.enableNormalizationCacheResponseHeader {
if h.enableCacheResponseHeaders {
if opCtx.normalizationCacheHit {
w.Header().Set(NormalizationCacheHeader, "HIT")
} else {
w.Header().Set(NormalizationCacheHeader, "MISS")
}
}
if h.enablePersistedOperationCacheResponseHeader {
if opCtx.variablesNormalizationCacheHit {
w.Header().Set(VariablesNormalizationCacheHeader, "HIT")
} else {
w.Header().Set(VariablesNormalizationCacheHeader, "MISS")
}
if opCtx.variablesRemappingCacheHit {
w.Header().Set(VariablesRemappingCacheHeader, "HIT")
} else {
w.Header().Set(VariablesRemappingCacheHeader, "MISS")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a bit of cleanup here:

func (h *GraphQLHandler) setDebugCacheHeader(w http.ResponseWriter, headerName string, hit bool) { headerValue := "MISS" if hit { headerValue = "HIT" } w.Header().Set(headerName, headerValue) } func (h *GraphQLHandler) setDebugCacheHeaders(w http.ResponseWriter, opCtx *operationContext) { if h.enableCacheResponseHeaders { h.setDebugCacheHeader(w, NormalizationCacheHeader, opCtx.normalizationCacheHit) h.setDebugCacheHeader(w, VariablesNormalizationCacheHeader, opCtx.variablesNormalizationCacheHit) h.setDebugCacheHeader(w, VariablesRemappingCacheHeader, opCtx.variablesRemappingCacheHit) h.setDebugCacheHeader(w, PersistedOperationCacheHeader, opCtx.persistedOperationCacheHit) h.setDebugCacheHeader(w, ExecutionPlanCacheHeader, opCtx.planCacheHit)	} }
// NewUploadPath string - if variable was used in the inline object like this `arg: {f: $f}` this field will hold the new extracted path, example "variables.a.f", if it is an empty, there was no change in the path
// }
uploadsMapping, err := operationKit.NormalizeVariables()
cached, uploadsMapping, err := operationKit.NormalizeVariables()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the comment removed? Should it be updated to explain how the NormalizeVariables function has changed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment