
chore(datadog_metrics sink): switch series v2 and sketches to zstd compression#24956

Open
vladimir-dd wants to merge 4 commits into master from vladimir-dd/metrics-v2-zstd

Conversation

@vladimir-dd vladimir-dd commented Mar 18, 2026

Summary

Rationale: switch Series v2 (/api/v2/series) and Sketches (/api/beta/sketches) to zstd compression.

  • Add DatadogMetricsCompression enum (Zlib/Zstd) in config.rs with compressor(), content_encoding(), and max_compressed_size() methods
  • Add compression() method on DatadogMetricsEndpoint: Series v2 and Sketches → Zstd, Series v1 → Zlib
  • Add max_compressed_size(n) for each scheme: Zlib uses the DEFLATE stored-block worst-case formula; Zstd mirrors the ZSTD_compressBound C macro
  • Propagate content_encoding through DatadogMetricsRequest and the request builder instead of hardcoding "deflate"
  • Make DatadogMetricsEncoder::new() infallible — production limits from payload_limits() are always valid; remove CreateError and validate_payload_size_limits
  • Track buffered_bound for all compressor types (zstd 128KB blocks, zlib 4KB BufWriter) to avoid underestimating compressed payload size
  • Fix SMP regression benchmark (statsd_to_datadog_metrics): switch to ingress_throughput, which is a better default benchmark of overall throughput
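The compression selector described above can be sketched as follows. This is a minimal sketch: the enum, method names, and formulas come from this PR description, but the exact signatures and visibility in config.rs may differ.

```rust
/// Sketch of the per-endpoint compression scheme. The DEFLATE stored-block
/// worst case and the ZSTD_compressBound mirror follow the formulas quoted
/// in the summary above.
#[derive(Clone, Copy, Debug)]
pub enum DatadogMetricsCompression {
    Zlib,
    Zstd,
}

impl DatadogMetricsCompression {
    /// Value sent in the Content-Encoding header.
    pub const fn content_encoding(self) -> &'static str {
        match self {
            Self::Zlib => "deflate",
            Self::Zstd => "zstd",
        }
    }

    /// Worst-case compressed size for `n` uncompressed bytes.
    pub const fn max_compressed_size(self, n: usize) -> usize {
        match self {
            // DEFLATE stored-block worst case: 5 header bytes per 16 KB block.
            Self::Zlib => n + (1 + n.saturating_sub(6) / 16_384) * 5,
            // Mirror of the ZSTD_compressBound C macro: n + (n >> 8), plus a
            // small correction for inputs under one 128 KB block.
            Self::Zstd => {
                let margin = if n < 128 << 10 { ((128 << 10) - n) >> 11 } else { 0 };
                n + (n >> 8) + margin
            }
        }
    }
}
```

Note that the Codex review below questions whether the zstd token should be "zstd1" rather than "zstd" for the Datadog metrics v2 API.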

Compressed size capacity estimate:

The encoder needs to decide whether accepting one more metric would exceed the compressed payload limit, without being able to back out a compressor write. The estimate splits into two parts:

  1. Bytes already flushed to the output buffer (get_ref().len()) — exact compressed size
  2. Bytes still in the compressor's internal buffer — estimated via max_compressed_size(buffered_bound + n) (worst-case upper bound)

All compressors buffer internally before flushing (zstd: 128 KB per block, zlib: 4 KB BufWriter). buffered_bound tracks an upper bound on uncompressed bytes not yet visible in get_ref().len(), resetting to n when a flush is detected.
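The two-part estimate can be expressed as a single predicate. This is a hypothetical sketch with illustrative names; the real encoder folds this check into its encode path rather than exposing a free function.

```rust
/// Would writing `incoming` more uncompressed bytes risk exceeding the
/// compressed payload limit? `flushed_len` is the exact compressed size
/// already visible in the output buffer (get_ref().len()); bytes still
/// inside the compressor are covered by the worst-case bound.
fn would_fit(
    flushed_len: usize,
    buffered_bound: usize,
    incoming: usize,
    compressed_limit: usize,
    max_compressed_size: impl Fn(usize) -> usize,
) -> bool {
    flushed_len + max_compressed_size(buffered_bound + incoming) <= compressed_limit
}
```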

Tests added:

  • max_compressed_size_is_upper_bound: empirically validates that both the Zlib and Zstd formulas are true upper bounds using incompressible (Xorshift64) data, and that they are not overly conservative (slack ≤ 1% + 64 bytes)
  • zstd_v2_payload_never_exceeds_512kb_with_incompressible_data: end-to-end test with real 512KB limit, verifies payload ≤ 512KB (safety) and > 95% utilization (efficiency) using high-entropy printable ASCII metric names
  • compressed_limit_is_respected_regardless_of_compressor_internal_buffering: regression test for zstd's 128KB internal buffering — uses a 512-byte compressed limit where get_ref().len() stays 0 throughout, verifying the encoder stops after a handful of metrics (not 100)
  • zstd_buffered_bound_resets_to_last_metric_size_after_block_flush: white-box test directly verifying buffered_bound resets to exactly n (not 0) after a zstd block flush
  • encode_series_v2_breaks_out_when_limit_reached_compressed: verifies the hot-path compressed-limit check works correctly for the zstd path
  • encoding_check_for_payload_limit_edge_cases_v2: proptest that any Series v2 payload decompresses cleanly with zstd and stays within configured limits
  • v2_series_default_limits_split_large_batches: validates 120k metrics are correctly split across multiple batches with v2 limits
  • default_batch_config_uses_endpoint_specific_size_limits / v1_batch_config_uses_v1_size_limit / explicit_max_bytes_applies_to_both_endpoints: verify per-endpoint batch size limits
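For reference, a minimal Xorshift64 generator of the kind the upper-bound test describes. The exact shift triple here (13/7/17, the classic Marsaglia constants) is an assumption; the PR only names the algorithm.

```rust
/// Tiny xorshift64 PRNG; its output is high-entropy and effectively
/// incompressible, which is what the upper-bound test needs.
struct Xorshift64(u64);

impl Xorshift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}
```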
Correctness analysis

V1/zlib path preserved

  • Series(V1).compression() and Sketches.compression() both return Zlib — no change in compressor selection
  • Zlib.content_encoding() returns "deflate" — same as the previously hardcoded Content-Encoding header
  • Zlib.compressor() returns Compression::zlib_default().into() — identical to the old get_compressor()
  • write_payload_header / write_payload_footer still emit JSON wrapping ({"series":[ / ]}) for V1, nothing for V2/Sketches
  • The zlib max_compressed_size(n) formula is algebraically identical to the old n + max_compressed_overhead_len(n):
    both compute n + (1 + n.saturating_sub(6) / 16384) * 5
  • The only behavioral change for zlib: buffered_bound now makes the compressed-size estimate slightly more conservative by accounting for the ~4 KB BufWriter buffer. This is more correct than before and the impact is negligible against the 3.2 MB compressed limit

V2/zstd path

  • The ZSTD_compressBound formula (n + (n >> 8) + correction for <128KB) matches the C library macro exactly
  • buffered_bound tracking is sound: accumulates on each write (+= n), resets to n (not 0) when a flush is detected — because the triggering write may straddle the block boundary, n is a safe upper bound on what remains buffered
  • Header/footer bytes written to the compressor are tracked in buffered_bound (header via try_encode, footer is 0 bytes for V2)
  • reset_state() creates the correct compressor for the endpoint (was previously always zlib via Default)
  • finish() retains its existing safety net: if the payload exceeds the compressed limit after finalization, it returns TooLarge with a recommended split count
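The accumulate/reset rule for buffered_bound can be sketched as below. This is illustrative: the flush-detection mechanism (comparing the output buffer length before and after a write) is how the description above reads, but the real encoder's bookkeeping may be structured differently.

```rust
/// Tracks an upper bound on uncompressed bytes the compressor is still
/// holding internally (not yet visible in get_ref().len()).
struct BoundTracker {
    buffered_bound: usize,
    last_flushed_len: usize,
}

impl BoundTracker {
    /// Record a write of `n` uncompressed bytes; `flushed_len_now` is the
    /// output buffer length observed after the write.
    fn record_write(&mut self, n: usize, flushed_len_now: usize) {
        if flushed_len_now > self.last_flushed_len {
            // A block flush happened. The triggering write may straddle the
            // block boundary, so reset to n (not 0): n is a safe upper bound
            // on what remains buffered.
            self.buffered_bound = n;
            self.last_flushed_len = flushed_len_now;
        } else {
            self.buffered_bound += n;
        }
    }
}
```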

Removed code

  • CreateError / FailedToBuild: construction is now infallible since limits always come from payload_limits()
  • validate_payload_size_limits: no longer needed — with_payload_limits() is gated behind #[cfg(test)], production code always uses well-known API limits
  • is_series(): only consumer was the removed validate_payload_size_limits
  • get_compressor() / max_compressed_overhead_len() / max_compression_overhead_len(): replaced by DatadogMetricsCompression::compressor() and max_compressed_size()

Vector configuration

```yaml
sinks:
  datadog_metrics:
    type: datadog_metrics
    inputs: [...]
    default_api_key: "${DD_API_KEY}"
    series_api_version: v2 # now correctly uses zstd
```

How did you test this PR?

  • Unit tests: all datadog metrics encoder tests pass (cargo test --no-default-features --features sinks-datadog_metrics).

End-to-end correctness test (branch)

Ran scripts/validate_dd_metrics_correctness.py against the real Datadog API. All 18 metric checks passed for both v1 and v2, with identical values:

| Metric | v1 | v2 |
| --- | --- | --- |
| counter | 50.0 | 50.0 ✅ |
| gauge | 42.5 | 42.5 ✅ |
| set | 1.0 | 1.0 ✅ |
| dist avg/count/sum/min/max | | |
| histogram count/avg | | |
| summary sum/count/ratio | | |
| multi-tag counter (group:a/b/*) | | |
| multi-tag gauge (group:a/b) | | |

All 18 metrics match between v1 and v2.

v1/zlib vs v2/zstd performance benchmark (branch)

Ran scripts/benchmark_dd_metrics_v1_v2.py against the real API at 50k events/sec, 2 repeats, 15s warmup, 60s measure:

| Metric | v1/zlib | v2/zstd | Delta |
| --- | --- | --- | --- |
| Sent events/s | 50,922 | 50,311 | -1.2% (≈equal) |
| Compressed bytes/s* | 3.33 MB/s | 1.11 MB/s | -66.6% (better compression) |
| Avg CPU % | 169.7 | 131.7 | -22.4% |
| Avg RSS (MB) | 7,334 | 2,478 | -66.2% |
| Peak RSS (MB) | 10,162 | 2,710 | -73.3% |
| Delivery ratio | 1.27 | 1.20 | ≈equal |
| HTTP requests/s | 10.4 | 124.4 | +1093% (expected: smaller 512KB batches vs 3.2MB) |

\* bytes_sent() in the DD metrics service was changed from request_encoded_size() (uncompressed) to request_wire_size() (compressed/on-the-wire).

Key takeaway: v2 delivers the same metric throughput as v1 while using 22% less CPU, 66% less memory, and 67% less bandwidth. The higher HTTP request rate is expected due to the smaller v2 payload limit (512KB vs 3.2MB).

SMP regression benchmark

The statsd_to_datadog_metrics SMP benchmark reported a -69% drop in egress_throughput (compressed bytes received by the blackhole), while ingress_throughput increased by ~75%:

ingress_throughput benchmark:
[screenshot: Screenshot 2026-03-20 at 08 10 46]

egress_throughput benchmark; the "regression" here is an improvement (OPW sends out 3x fewer bytes):
[screenshot: Screenshot 2026-03-20 at 08 20 03]

Change Type

  • New feature

Is this a breaking change?

  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Mar 18, 2026
@vladimir-dd vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch 14 times, most recently from c4c80b6 to fa052b6 Compare March 18, 2026 19:28
@vladimir-dd vladimir-dd changed the title feat(datadog_metrics sink): add zstd compression for series v2 endpoint feat(datadog_metrics sink): switch series v2 endpoint to zstd compression Mar 18, 2026
@vladimir-dd

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76fb1c59bd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@vladimir-dd vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch 3 times, most recently from f5faf86 to 783f621 Compare March 19, 2026 09:34
@github-actions github-actions bot added the domain: releasing Anything related to releasing Vector label Mar 19, 2026
@vladimir-dd vladimir-dd changed the title feat(datadog_metrics sink): switch series v2 endpoint to zstd compression WIP: feat(datadog_metrics sink): switch series v2 endpoint to zstd compression Mar 19, 2026
@vladimir-dd vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch 5 times, most recently from 67c992a to eccada1 Compare March 19, 2026 16:41
@vladimir-dd vladimir-dd changed the title WIP: feat(datadog_metrics sink): switch series v2 endpoint to zstd compression chore(datadog_metrics sink): switch series v2 endpoint to zstd compression Mar 20, 2026
@vladimir-dd vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch from 4042a17 to d2df3d5 Compare March 20, 2026 07:56
@github-actions github-actions bot removed the domain: releasing Anything related to releasing Vector label Mar 20, 2026
vladimir-dd and others added 2 commits March 20, 2026 09:27
…ssion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hput The egress_throughput goal measures compressed bytes received by the blackhole. Switching v2 series from zlib to zstd produces smaller compressed payloads (better compression ratio), which registers as a false regression in egress bytes/sec. ingress_throughput measures how fast Vector consumes statsd data from the generator, which is compression-agnostic and reflects actual pipeline performance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vladimir-dd vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch from d2df3d5 to 48bdb12 Compare March 20, 2026 08:28
@vladimir-dd vladimir-dd marked this pull request as ready for review March 20, 2026 08:28
@vladimir-dd vladimir-dd requested a review from a team as a code owner March 20, 2026 08:28
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48bdb12f7e

vladimir-dd and others added 2 commits March 20, 2026 10:13
…zero limits Start proptest ranges at 1 instead of 0 for uncompressed_limit and compressed_limit. The old validate_payload_size_limits rejected zero limits, but with_payload_limits is now infallible, so finish() can panic on division-by-zero when computing recommended_splits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sion Sketches endpoint now uses zstd instead of zlib, matching Series v2. Only Series v1 remains on zlib. Validated against real Datadog API: 36/36 correctness checks passed, all 18 metrics match between v1 and v2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vladimir-dd vladimir-dd changed the title chore(datadog_metrics sink): switch series v2 endpoint to zstd compression chore(datadog_metrics sink): switch series v2 and sketches to zstd compression Mar 20, 2026
@pront
Copy link
Member

pront commented Mar 20, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4f8e56d64

```rust
impl DatadogMetricsCompression {
    pub(super) const fn content_encoding(self) -> &'static str {
        match self {
            Self::Zstd => "zstd",
```

P1: Use Datadog’s expected zstd1 content encoding token

The new Zstd branch returns "zstd", but Datadog’s metrics v2 API documentation and generated clients use MetricContentEncoding::ZSTD1 / contentEncoding: "zstd1" for compressed submits. Because this value is propagated directly to the Content-Encoding header, Series v2 (the default) and sketches requests can be rejected with 4xx on environments that enforce the documented enum, causing dropped metrics instead of the intended compression improvement.

