chore(datadog_metrics sink): switch series v2 and sketches to zstd compression #24956

vladimir-dd wants to merge 4 commits into master
Conversation
Force-pushed c4c80b6 to fa052b6.

@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76fb1c59bd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Force-pushed f5faf86 to 783f621, 67c992a to eccada1, and 4042a17 to d2df3d5.

…ssion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hput

The `egress_throughput` goal measures compressed bytes received by the blackhole. Switching v2 series from zlib to zstd produces smaller compressed payloads (a better compression ratio), which registers as a false regression in egress bytes/sec. `ingress_throughput` measures how fast Vector consumes statsd data from the generator, which is compression-agnostic and reflects actual pipeline performance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed d2df3d5 to 48bdb12.

💡 Codex Review
Reviewed commit: 48bdb12f7e
…zero limits

Start proptest ranges at 1 instead of 0 for `uncompressed_limit` and `compressed_limit`. The old `validate_payload_size_limits` rejected zero limits, but `with_payload_limits` is now infallible, so `finish()` can panic on division by zero when computing `recommended_splits`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
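The hazard this commit fixes can be illustrated with a minimal sketch. The function name mirrors the PR description, but the body is hypothetical — the real computation lives in the encoder's `finish()`:

```rust
// Hypothetical illustration of the panic described above: recommending a
// split count divides the payload size by the compressed limit, so a limit
// of 0 (which the removed validation used to reject) would divide by zero.
fn recommended_splits(payload_len: usize, compressed_limit: usize) -> usize {
    // Ceiling division; panics if compressed_limit == 0.
    payload_len.div_ceil(compressed_limit)
}

fn main() {
    // A 1000-byte payload against a 512-byte limit needs 2 splits.
    assert_eq!(recommended_splits(1000, 512), 2);
    // recommended_splits(1000, 0) would panic — hence the proptest ranges
    // now start at 1 instead of 0.
    println!("ok");
}
```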
…sion

Sketches endpoint now uses zstd instead of zlib, matching Series v2. Only Series v1 remains on zlib. Validated against the real Datadog API: 36/36 correctness checks passed; all 18 metrics match between v1 and v2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codex review

💡 Codex Review
Reviewed commit: a4f8e56d64
```rust
impl DatadogMetricsCompression {
    pub(super) const fn content_encoding(self) -> &'static str {
        match self {
            Self::Zstd => "zstd",
```
Use Datadog’s expected zstd1 content encoding token
The new Zstd branch returns "zstd", but Datadog’s metrics v2 API documentation and generated clients use MetricContentEncoding::ZSTD1 / contentEncoding: "zstd1" for compressed submits. Because this value is propagated directly to the Content-Encoding header, Series v2 (the default) and sketches requests can be rejected with 4xx on environments that enforce the documented enum, causing dropped metrics instead of the intended compression improvement.
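To make the review's point concrete, here is a minimal sketch — enum and method names follow the PR description, not the actual source — showing how the token flows verbatim into the header:

```rust
// Sketch of the enum under review. Per the comment above, Datadog's v2 docs
// and generated clients use the token "zstd1", while this branch sends "zstd".
#[derive(Clone, Copy)]
enum DatadogMetricsCompression {
    Zlib,
    Zstd,
}

impl DatadogMetricsCompression {
    const fn content_encoding(self) -> &'static str {
        match self {
            Self::Zlib => "deflate",
            Self::Zstd => "zstd", // the review suggests "zstd1" may be required
        }
    }
}

fn main() {
    // The value is propagated directly into the Content-Encoding header.
    let header = format!(
        "Content-Encoding: {}",
        DatadogMetricsCompression::Zstd.content_encoding()
    );
    assert_eq!(header, "Content-Encoding: zstd");
    println!("{header}");
}
```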
Summary

Rationale: Switch Series v2 (`/api/v2/series`) and sketches (`/api/beta/sketches`) to zstd compression.

- Add a `DatadogMetricsCompression` enum (Zlib/Zstd) in `config.rs` with `compressor()`, `content_encoding()`, and `max_compressed_size()` methods
- Add a `compression()` method on `DatadogMetricsEndpoint`: Series v2 and Sketches → Zstd, Series v1 → Zlib
- Define `max_compressed_size(n)` for each scheme: Zlib uses the DEFLATE stored-block worst-case formula; Zstd mirrors the `ZSTD_compressBound` C macro
- Thread `content_encoding` through `DatadogMetricsRequest` and the request builder instead of hardcoding `"deflate"`
- Make `DatadogMetricsEncoder::new()` infallible — production limits from `payload_limits()` are always valid; remove `CreateError` and `validate_payload_size_limits`
- Track `buffered_bound` for all compressor types (zstd 128KB blocks, zlib 4KB BufWriter) to avoid underestimating compressed payload size
- SMP benchmark (`statsd_to_datadog_metrics`): switch to `ingress_throughput`, which is a better default benchmark of overall throughput

Compressed size capacity estimate:
The encoder needs to decide whether accepting one more metric would exceed the compressed payload limit, without being able to back out a compressor write. The estimate splits into two parts:

- already-flushed compressed bytes (`get_ref().len()`) — exact compressed size
- `max_compressed_size(buffered_bound + n)` — worst-case upper bound for still-buffered input plus the candidate metric

All compressors buffer internally before flushing (zstd: 128 KB per block, zlib: 4 KB BufWriter).
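The two-part estimate can be sketched as follows. This is a simplified model built from the formulas quoted in this PR; the function and parameter names are illustrative, not the actual encoder code:

```rust
// zstd buffers up to one block of input before emitting anything.
const ZSTD_BLOCKSIZE_MAX: usize = 128 * 1024;

// DEFLATE stored-block worst case: 5 bytes of block header per 16 KiB of input.
fn zlib_max_compressed_size(n: usize) -> usize {
    n + (1 + n.saturating_sub(6) / 16384) * 5
}

// Mirrors the ZSTD_compressBound C macro: n + (n >> 8), plus an extra
// margin for inputs smaller than one 128 KiB block.
fn zstd_max_compressed_size(n: usize) -> usize {
    let margin = if n < ZSTD_BLOCKSIZE_MAX {
        (ZSTD_BLOCKSIZE_MAX - n) >> 11
    } else {
        0
    };
    n + (n >> 8) + margin
}

// Would appending `n` more uncompressed bytes risk exceeding the limit?
// `flushed` plays the role of get_ref().len(); `buffered_bound` bounds the
// uncompressed bytes the compressor is still holding internally.
fn would_exceed(flushed: usize, buffered_bound: usize, n: usize, limit: usize) -> bool {
    flushed + zstd_max_compressed_size(buffered_bound + n) > limit
}

fn main() {
    assert_eq!(zlib_max_compressed_size(0), 5);
    assert!(zstd_max_compressed_size(512 * 1024) >= 512 * 1024);
    // With zstd buffering up to 128 KiB, nothing may be flushed yet even
    // though the worst-case estimate already exceeds a small limit.
    assert!(would_exceed(0, 100_000, 4_096, 512));
    println!("ok");
}
```

The key property is that the estimate never underestimates: when in doubt (bytes buffered but not yet flushed), it charges the worst-case bound rather than the optimistic compressed size.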
`buffered_bound` tracks an upper bound on uncompressed bytes not yet visible in `get_ref().len()`, resetting to `n` when a flush is detected.

Tests added:
- `max_compressed_size_is_upper_bound`: empirically validates that both the Zlib and Zstd formulas are true upper bounds using incompressible (Xorshift64) data, and are not overly conservative (slack ≤ 1% + 64 bytes)
- `zstd_v2_payload_never_exceeds_512kb_with_incompressible_data`: end-to-end test with the real 512KB limit; verifies payload ≤ 512KB (safety) and > 95% utilization (efficiency) using high-entropy printable ASCII metric names
- `compressed_limit_is_respected_regardless_of_compressor_internal_buffering`: regression test for zstd's 128KB internal buffering — uses a 512-byte compressed limit where `get_ref().len()` stays 0 throughout, verifying the encoder stops after a handful of metrics (not 100)
- `zstd_buffered_bound_resets_to_last_metric_size_after_block_flush`: white-box test directly verifying `buffered_bound` resets to exactly `n` (not 0) after a zstd block flush
- `encode_series_v2_breaks_out_when_limit_reached_compressed`: verifies the hot-path compressed-limit check works correctly for the zstd path
- `encoding_check_for_payload_limit_edge_cases_v2`: proptest that any Series v2 payload decompresses cleanly with zstd and stays within configured limits
- `v2_series_default_limits_split_large_batches`: validates that 120k metrics are correctly split across multiple batches with v2 limits
- `default_batch_config_uses_endpoint_specific_size_limits` / `v1_batch_config_uses_v1_size_limit` / `explicit_max_bytes_applies_to_both_endpoints`: verify per-endpoint batch size limits

Correctness analysis
V1/zlib path preserved

- `Series(V1).compression()` returns `Zlib` — no change in compressor selection for v1 (Sketches moved to Zstd)
- `Zlib.content_encoding()` returns `"deflate"` — same as the previously hardcoded `Content-Encoding` header
- `Zlib.compressor()` returns `Compression::zlib_default().into()` — identical to the old `get_compressor()`
- `write_payload_header`/`write_payload_footer` still emit JSON wrapping (`{"series":[` / `]}`) for V1, nothing for V2/Sketches
- `max_compressed_size(n)` is algebraically identical to the old `n + max_compressed_overhead_len(n)`: both compute `n + (1 + n.saturating_sub(6) / 16384) * 5`
- `buffered_bound` now makes the compressed-size estimate slightly more conservative by accounting for the ~4 KB BufWriter buffer; this is more correct than before, and the impact is negligible against the 3.2 MB compressed limit

V2/zstd path
- The `ZSTD_compressBound` formula (`n + (n >> 8)` plus a correction for inputs < 128KB) matches the C library macro exactly
- `buffered_bound` tracking is sound: it accumulates on each write (`+= n`) and resets to `n` (not 0) when a flush is detected — because the triggering write may straddle the block boundary, `n` is a safe upper bound on what remains buffered
- header and footer bytes are also tracked in `buffered_bound` (header via `try_encode`; the footer is 0 bytes for V2)
- `reset_state()` creates the correct compressor for the endpoint (previously always zlib via `Default`)
- `finish()` retains its existing safety net: if the payload exceeds the compressed limit after finalization, it returns `TooLarge` with a recommended split count

Removed code
- `CreateError`/`FailedToBuild`: construction is now infallible since limits always come from `payload_limits()`
- `validate_payload_size_limits`: no longer needed — `with_payload_limits()` is gated behind `#[cfg(test)]`; production code always uses well-known API limits
- `is_series()`: its only consumer was the removed `validate_payload_size_limits`
- `get_compressor()`/`max_compressed_overhead_len()`/`max_compression_overhead_len()`: replaced by `DatadogMetricsCompression::compressor()` and `max_compressed_size()`

Vector configuration
How did you test this PR?
Unit tests (`cargo test --no-default-features --features sinks-datadog_metrics`).
Ran `scripts/validate_dd_metrics_correctness.py` against the real Datadog API. All 18 metric checks passed for both v1 and v2, with identical values — all 18 metrics match between v1 and v2.
v1/zlib vs v2/zstd performance benchmark (branch)
Ran `scripts/benchmark_dd_metrics_v1_v2.py` against the real API at 50k events/sec, 2 repeats, 15s warmup, 60s measure:

- `bytes_sent()` in the DD metrics service was changed from `request_encoded_size()` (uncompressed) to `request_wire_size()` (compressed/on-the-wire)

Key takeaway: v2 delivers the same metric throughput as v1 while using 22% less CPU, 66% less memory, and 67% less bandwidth. The higher HTTP request rate is expected due to the smaller v2 payload limit (512KB vs 3.2MB).
SMP regression benchmark
The `statsd_to_datadog_metrics` SMP benchmark reported a -69% drop in `egress_throughput` (compressed bytes received by the blackhole), while `ingress_throughput` increased by ~75%.

ingress_throughput benchmark: (screenshot)

egress_throughput benchmark — the "regression" here is an improvement (OPW sends out 3x fewer bytes): (screenshot)
Change Type
Is this a breaking change?
Does this PR include user facing changes?
References
Notes
- Mention `@vectordotdev/vector` to reach out to us regarding this PR.
- `pre-push` hook: please see this template.
- `make fmt`
- `make check-clippy` (if there are failures it's possible some of them can be fixed with `make clippy-fix`)
- `make test`