@teh-cmc teh-cmc commented Dec 2, 2025

LLM summary:

This PR introduces RRD manifest support as the second part of the RRD footers series. The manifest provides an index of chunks in an RRD recording, enabling efficient query planning without loading actual data. Key changes include:

  • Extending RrdFooter and adding RrdManifest protobuf definitions
  • Implementing RrdManifest and RrdManifestBuilder to catalog chunks with their metadata (ID, size, offset, timeline ranges)
  • Adding transport layer conversions between application-level and protobuf types
  • Refactoring schema hash computation to a shared utility method

To which I'll add:

This PR introduces the core data structure that this whole footer business is all about: the RrdManifest.

The RrdManifest is a protobuf message (which, as always, comes with both a transport-level and an application-level implementation) that carries various important metadata about the associated recording.
The most important piece of metadata is the actual manifest record-batch, which lists all the chunks in the recording at a fine-grained enough level of detail that relevancy queries become possible without ever having to load the data in a local chunk-store first.
Effectively, you can view this record-batch as a dataframe representation of the time panel.
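
To make the "relevancy query" idea concrete, here is a minimal sketch of what filtering on manifest rows could look like. `ManifestRow` and `relevant_chunks` are hypothetical names made up for illustration; the real manifest is an Arrow record-batch, not a `Vec` of structs, and the sketch assumes that `start`/`end` are inclusive bounds.

```rust
/// Hypothetical, simplified view of one manifest row (one row per chunk).
/// Field names mirror the schema columns, but this struct is illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestRow {
    pub chunk_id: u128,
    pub chunk_entity_path: String,
    pub chunk_is_static: bool,
    pub chunk_byte_offset: u64,
    pub chunk_byte_size: u64,
    /// Assumed inclusive `frame_nr:start` / `frame_nr:end`; `None` if the
    /// chunk has no data on that timeline (e.g. a static chunk).
    pub frame_nr_range: Option<(i64, i64)>,
}

/// Returns the rows whose chunks may hold data for `entity_path` within the
/// inclusive `[query_start, query_end]` range on `frame_nr`.
/// Static chunks are always considered relevant; no chunk data is touched.
pub fn relevant_chunks<'a>(
    manifest: &'a [ManifestRow],
    entity_path: &str,
    query_start: i64,
    query_end: i64,
) -> Vec<&'a ManifestRow> {
    manifest
        .iter()
        .filter(|row| row.chunk_entity_path == entity_path)
        .filter(|row| match row.frame_nr_range {
            // No range on this timeline: only relevant if the chunk is static.
            None => row.chunk_is_static,
            // Standard interval-overlap test on inclusive bounds.
            Some((start, end)) => start <= query_end && query_start <= end,
        })
        .collect()
}
```

The point of the sketch is that only the small per-chunk metadata is inspected; the byte offset and size of each hit then tell the caller which parts of the recording are worth fetching.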

There is a lot of code, but it's all fairly trivial and mechanical, bordering on boilerplate in some cases.
In fact, what you should really focus on is the fact that this code:

[
    {
        let frame1 = TimeInt::new_temporal(10);
        let frame2 = TimeInt::new_temporal(20);
        let frame3 = TimeInt::new_temporal(30);
        let frame4 = TimeInt::new_temporal(40);
        let points1 = MyPoint::from_iter(0..1);
        let points3 = MyPoint::from_iter(2..3);
        let points4 = MyPoint::from_iter(3..4);
        let colors2 = MyColor::from_iter(1..2);
        let colors3 = MyColor::from_iter(2..3);
        Chunk::builder_with_id(next_chunk_id(), entity_path)
            .with_sparse_component_batches(
                next_row_id(),
                build_timepoint(frame1),
                [(MyPoints::descriptor_points(), Some(&points1 as _))],
            )
            .with_sparse_component_batches(
                next_row_id(),
                build_timepoint(frame2),
                [(MyPoints::descriptor_colors(), Some(&colors2 as _))],
            )
            .with_sparse_component_batches(
                next_row_id(),
                build_timepoint(frame3),
                [
                    (MyPoints::descriptor_points(), Some(&points3 as _)),
                    (MyPoints::descriptor_colors(), Some(&colors3 as _)),
                ],
            )
            .with_sparse_component_batches(
                next_row_id(),
                build_timepoint(frame4),
                [(MyPoints::descriptor_points(), Some(&points4 as _))],
            )
            .build()
            .unwrap()
            .to_arrow_msg()
            .unwrap()
    },
    {
        let labels = vec![MyLabel("simple".to_owned())];
        Chunk::builder_with_id(next_chunk_id(), entity_path)
            .with_sparse_component_batches(
                next_row_id(),
                TimePoint::default(),
                [(MyPoints::descriptor_labels(), Some(&labels as _))],
            )
            .build()
            .unwrap()
            .to_arrow_msg()
            .unwrap()
    },
]

yields this manifest record-batch:

image

with this schema:

chunk_byte_offset: u64
chunk_byte_size: u64
chunk_entity_path: Utf8
chunk_id: FixedSizeBinary[16]
chunk_is_static: bool
elapsed:end: Duration(ns) [
    rerun:index:elapsed
]
elapsed:example_MyPoints:colors:end: Duration(ns) [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:colors
    rerun:component_type:example.MyColor
    rerun:index:elapsed
]
elapsed:example_MyPoints:colors:start: Duration(ns) [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:colors
    rerun:component_type:example.MyColor
    rerun:index:elapsed
]
elapsed:example_MyPoints:points:end: Duration(ns) [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:points
    rerun:component_type:example.MyPoint
    rerun:index:elapsed
]
elapsed:example_MyPoints:points:start: Duration(ns) [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:points
    rerun:component_type:example.MyPoint
    rerun:index:elapsed
]
elapsed:start: Duration(ns) [
    rerun:index:elapsed
]
example_MyPoints:colors:has_static_data: bool [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:colors
    rerun:component_type:example.MyColor
    rerun:index:rerun:static
]
example_MyPoints:labels:has_static_data: bool [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:labels
    rerun:component_type:example.MyLabel
    rerun:index:rerun:static
]
example_MyPoints:points:has_static_data: bool [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:points
    rerun:component_type:example.MyPoint
    rerun:index:rerun:static
]
frame_nr:end: i64 [
    rerun:index:frame_nr
]
frame_nr:example_MyPoints:colors:end: i64 [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:colors
    rerun:component_type:example.MyColor
    rerun:index:frame_nr
]
frame_nr:example_MyPoints:colors:start: i64 [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:colors
    rerun:component_type:example.MyColor
    rerun:index:frame_nr
]
frame_nr:example_MyPoints:points:end: i64 [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:points
    rerun:component_type:example.MyPoint
    rerun:index:frame_nr
]
frame_nr:example_MyPoints:points:start: i64 [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:points
    rerun:component_type:example.MyPoint
    rerun:index:frame_nr
]
frame_nr:start: i64 [
    rerun:index:frame_nr
]
log_time:end: Timestamp(ns) [
    rerun:index:log_time
]
log_time:example_MyPoints:colors:end: Timestamp(ns) [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:colors
    rerun:component_type:example.MyColor
    rerun:index:log_time
]
log_time:example_MyPoints:colors:start: Timestamp(ns) [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:colors
    rerun:component_type:example.MyColor
    rerun:index:log_time
]
log_time:example_MyPoints:points:end: Timestamp(ns) [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:points
    rerun:component_type:example.MyPoint
    rerun:index:log_time
]
log_time:example_MyPoints:points:start: Timestamp(ns) [
    rerun:archetype:example.MyPoints
    rerun:component:example.MyPoints:points
    rerun:component_type:example.MyPoint
    rerun:index:log_time
]
log_time:start: Timestamp(ns) [
    rerun:index:log_time
]
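
As a rough illustration of where the `:start` / `:end` columns in the schema above could come from, the sketch below derives a chunk-level range and one range per component for a single timeline. `LoggedRow` and `frame_nr_ranges` are hypothetical helpers, and the sketch assumes that a range is simply the min/max of the time values at which data was logged; the actual builder may well differ.

```rust
use std::collections::BTreeMap;

/// One logged row: its `frame_nr` value and the components set on that row.
/// Hypothetical stand-in for a row of a temporal chunk.
pub struct LoggedRow<'a> {
    pub frame_nr: i64,
    pub components: &'a [&'a str],
}

/// Inclusive `(start, end)` range.
pub type Range = (i64, i64);

/// Computes the chunk-level range plus one range per component, i.e. the
/// values that would fill `frame_nr:start` / `frame_nr:end` and
/// `frame_nr:<component>:start` / `frame_nr:<component>:end`.
/// Assumption: a range is the min/max of the time values seen.
pub fn frame_nr_ranges(rows: &[LoggedRow<'_>]) -> (Option<Range>, BTreeMap<String, Range>) {
    let mut chunk_range: Option<Range> = None;
    let mut per_component: BTreeMap<String, Range> = BTreeMap::new();
    for row in rows {
        let t = row.frame_nr;
        // Widen the chunk-level range to include this row.
        chunk_range = Some(match chunk_range {
            None => (t, t),
            Some((lo, hi)) => (lo.min(t), hi.max(t)),
        });
        // Widen the range of every component that has data on this row.
        for component in row.components {
            per_component
                .entry((*component).to_owned())
                .and_modify(|(lo, hi)| {
                    *lo = (*lo).min(t);
                    *hi = (*hi).max(t);
                })
                .or_insert((t, t));
        }
    }
    (chunk_range, per_component)
}
```

Applied to the first chunk of the example (points at frames 10, 30 and 40, colors at frames 20 and 30), this min/max assumption gives a chunk-level range of 10 to 40, a points range of 10 to 40 and a colors range of 20 to 30.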


Part of RRD footers series of PRs:

@teh-cmc teh-cmc added 📉 performance Optimization, memory use, etc do-not-merge Do not merge this PR include in changelog 🔩 data model Sorbet 🪵 Log & send APIs Affects the user-facing API for all languages dataplatform Rerun Data Platform integration labels Dec 2, 2025

github-actions bot commented Dec 2, 2025

Web viewer failed to build.

| Result | Commit | Link | Manifest |
| ------ | ------ | ---- | -------- |
| ❌ | | https://rerun.io/viewer/pr/12047 | +nightly +main |

View image diff on kitdiff.

Note: This comment is updated whenever you push a commit.

Contributor

Copilot AI left a comment

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| ---- | ----------- |
| crates/store/re_protos/proto/rerun/v1alpha1/log_msg.proto | Added RrdManifest message definition with schema, store ID, and dataframe fields |
| crates/store/re_protos/src/v1alpha1/rerun.log_msg.v1alpha1.rs | Generated Rust code for the new protobuf definitions |
| crates/store/re_log_encoding/src/rrd/footer/instances.rs | Implemented RrdManifest with schema validation, column accessors, and sanity checks |
| crates/store/re_log_encoding/src/rrd/footer/builders.rs | Added RrdManifestBuilder for constructing manifests from chunks |
| crates/store/re_log_encoding/src/transport_to_app.rs | Implemented bidirectional conversions between transport and application types |
| crates/store/re_log_encoding/src/rrd/encoder.rs | Updated encoder to initialize empty manifests map |
| crates/store/re_server/src/store/layer.rs | Refactored to use shared schema hash computation utility |
| crates/store/re_server/Cargo.toml | Removed unused sha2 dependency |
| crates/store/re_log_encoding/Cargo.toml | Added required dependencies (re_arrow_util, re_types_core, sha2) |
| crates/store/re_log_encoding/src/rrd/footer/mod.rs | Updated exports to include new manifest types |
| crates/store/re_log_encoding/src/rrd/mod.rs | Exposed RrdManifest and RrdManifestBuilder publicly |
Comments suppressed due to low confidence (1)

crates/store/re_log_encoding/src/rrd/footer/instances.rs:1

  • The has_static_data should be false for temporal component-level data, not true. This contradicts line 147 where temporal chunk-level columns correctly set it to false, and the comment indicates this is temporal data.
use std::collections::HashMap; 

@teh-cmc teh-cmc force-pushed the cmc/rrd_footers_2_rrd_manifests branch 2 times, most recently from 964c086 to c1e6776 Compare December 2, 2025 15:51
Member Author

This is a very dumbed down version of the implementation we've been using since forever on the PM side. I haven't really changed anything besides removing a lot of columns and adding docs.

/// can generally be found (Lance, external dataframe libraries, etc).
///
/// If caller doesn't provide any part (i.e. all are `None`), an empty string is returned.
pub fn compute_column_name(
Member Author

This is a copypasta of the column naming routine that we've been using for ages on the Platform side, so it has been very well battle tested.
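
For readers who have not seen that routine, here is a rough sketch of the behaviour hinted at by the doc comment and by the column names in the PR description. `sketch_column_name` and its three parameters are hypothetical and are not the real signature of `compute_column_name`; the `.` to `_` replacement is inferred from names such as `frame_nr:example_MyPoints:points:start` and is an assumption.

```rust
/// Hypothetical sketch, NOT the real `compute_column_name` signature.
/// Joins whichever parts are provided with `:`, replacing `.` with `_` in
/// each part (assumed, based on the column names shown in the PR description).
/// Returns an empty string if every part is `None`.
pub fn sketch_column_name(
    index: Option<&str>,
    component: Option<&str>,
    suffix: Option<&str>,
) -> String {
    [index, component, suffix]
        .iter()
        .flatten() // skip the parts that were not provided
        .map(|part| part.replace('.', "_"))
        .collect::<Vec<_>>()
        .join(":")
}
```

Under these assumptions, `sketch_column_name(Some("frame_nr"), None, Some("end"))` gives `frame_nr:end`, which matches the shape of the chunk-level columns in the schema.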

}
}

// Sanity checks
Member Author

RRD manifests are effectively user input, so we need to be at least a little defensive so that we don't register complete nonsense. All the checks marked cheap basically only look at schema concerns, never the data itself.

There are some extra paranoid layers on the dataplatform side.
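
To give an idea of what a "cheap" schema-only check might look like, here is a minimal sketch. `ColumnType`, `REQUIRED_COLUMNS` and `check_schema` are hypothetical stand-ins (the real code works on Arrow schemas), and the start/end pairing rule is an assumed example of such a check rather than something taken from this PR.

```rust
/// Minimal stand-in for an Arrow datatype; only what this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    UInt64,
    Utf8,
    FixedSizeBinary16,
    Boolean,
    Int64,
}

/// Chunk-level columns listed in the schema of the PR description.
const REQUIRED_COLUMNS: &[(&str, ColumnType)] = &[
    ("chunk_byte_offset", ColumnType::UInt64),
    ("chunk_byte_size", ColumnType::UInt64),
    ("chunk_entity_path", ColumnType::Utf8),
    ("chunk_id", ColumnType::FixedSizeBinary16),
    ("chunk_is_static", ColumnType::Boolean),
];

/// Cheap check: inspects only `(name, type)` pairs, never the column data.
pub fn check_schema(schema: &[(&str, ColumnType)]) -> Result<(), String> {
    // Every required column must exist with the expected type.
    for (name, expected) in REQUIRED_COLUMNS {
        match schema.iter().find(|(n, _)| n == name) {
            None => return Err(format!("missing required column `{}`", name)),
            Some((_, actual)) if actual != expected => {
                return Err(format!(
                    "column `{}` has type {:?}, expected {:?}",
                    name, actual, expected
                ));
            }
            Some(_) => {}
        }
    }
    // Assumed rule: every `:start` column is paired with an `:end` column.
    for (name, _) in schema {
        if let Some(prefix) = name.strip_suffix(":start") {
            let end = format!("{}:end", prefix);
            if !schema.iter().any(|(n, _)| *n == end) {
                return Err(format!("column `{}` has no matching `{}`", name, end));
            }
        }
    }
    Ok(())
}
```

Because nothing here reads row data, the cost depends only on the number of columns, which is what makes this kind of check cheap enough to run on every registration.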

@teh-cmc teh-cmc marked this pull request as ready for review December 2, 2025 17:07
emilk added a commit that referenced this pull request Dec 2, 2025
I'll actually apply it once these PRs are merged:
* #12044
* #12047
* #12048
Member

@zehiko zehiko left a comment

I know the manifest builder and I know it works, and this looks, as you said, like just a slightly simplified version of it. Hence: ship it!

@teh-cmc teh-cmc removed the do-not-merge Do not merge this PR label Dec 3, 2025
teh-cmc added a commit that referenced this pull request Dec 3, 2025
LLM summary:

> This PR introduces the foundational framing infrastructure for RRD stream footers, which will enable richer metadata and improved stream navigation. It establishes the protocol-level types and state machine changes needed to support footers without yet populating them with content.
>
> Key changes include:
> - Addition of `RrdFooter` message types at both transport (protobuf) and application levels
> - Introduction of `StreamFooter` frame for locating and validating RRD footers
> - Enhanced decoder state machine to handle the new footer frames
> - Updated encoder to emit footers with basic state tracking infrastructure

To which I would add the following:

This introduces all the framing infrastructure so that RRD streams are now capable of carrying footers in all circumstances (including pipes).

Specifically, this introduces a couple new types:
* `RrdFooter`, a(n empty) protobuf message that will from now on act as the payload for messages of type `MessageKind::End`.
  * `MessageKind::End` isn't new, it's something that was always there, but until now was always empty.
  * As always, this has both a cheap transport-level definition and a less cheap app-level definition.
* `StreamFooter`, a simple binary frame that mirrors `StreamHeader`, and whose main job is to keep track of where the `RrdFooter` is located.
  * This is used for O(1) access to the `RrdFooter`, e.g. when registering data on a Redap-compliant server.

For now, these footers are always empty protobuf messages. We will be filling them up in the following PRs.

---

Part of RRD footers series of PRs:
* #12044
* #12047
* #12048
* rerun-io/dataplatform#2060

## TODO
* [x] `@rerun-bot full-check`

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Base automatically changed from cmc/rrd_footers_1_framing to main December 3, 2025 10:41
@teh-cmc teh-cmc force-pushed the cmc/rrd_footers_2_rrd_manifests branch from eae5d66 to fd1e2ef Compare December 3, 2025 10:42
@teh-cmc teh-cmc merged commit 0f8b061 into main Dec 3, 2025
31 of 34 checks passed
@teh-cmc teh-cmc deleted the cmc/rrd_footers_2_rrd_manifests branch December 3, 2025 10:43
teh-cmc added a commit that referenced this pull request Dec 3, 2025
LLM summary:

> This PR implements encoding and decoding of RRD manifests in footers for the Rerun RRD file format. The changes enable random access to chunks within RRD files by storing metadata about chunk locations and properties in footer manifests.
>
> Key changes:
> - Adds manifest building during encoding, tracking chunk metadata (offsets, sizes, entity paths, etc.)
> - Implements manifest parsing during decoding with transport-to-application conversion
> - Adds CLI support for displaying parsed footers (`--footers` flag) and recomputing manifests during routing (`--recompute-manifests` flag)

To which I actually don't have all that much to add.

This PR is basically all the remaining glue so that, whenever one uses our `Encoder` or one of our `Decoder` variants, RRD footers and manifests will automagically be computed, injected and serialized/deserialized.

The most important part of this PR is arguably the addition of a `footer_roundtrip` test, that encodes a recording and then manually decodes all of its chunks directly using the generated RRD manifest, instead of using a `Decoder`.

---

Part of RRD footers series of PRs:
* #12044
* #12047
* #12048
* rerun-io/dataplatform#2060
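
The random-access idea behind that `footer_roundtrip` test can be sketched roughly as follows. `read_chunk_bytes` is a hypothetical helper, not part of the Rerun API; it assumes that `chunk_byte_offset` is measured from the start of the stream and that `chunk_byte_size` is the length of the encoded chunk. Turning the returned bytes back into a chunk is out of scope here.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads the raw (still encoded) bytes of one chunk, given the
/// `chunk_byte_offset` and `chunk_byte_size` values from its manifest row.
/// Assumption: the offset is relative to the start of `reader`.
pub fn read_chunk_bytes<R: Read + Seek>(
    reader: &mut R,
    byte_offset: u64,
    byte_size: u64,
) -> io::Result<Vec<u8>> {
    // Jump straight to the chunk, skipping everything before it.
    reader.seek(SeekFrom::Start(byte_offset))?;
    let mut buf = Vec::new();
    // `take` caps the read at `byte_size` bytes.
    let read = reader.by_ref().take(byte_size).read_to_end(&mut buf)?;
    if read as u64 != byte_size {
        // The manifest promised more bytes than the stream holds.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "chunk is truncated",
        ));
    }
    Ok(buf)
}
```

Since manifests are effectively user input, the explicit length check matters: a row that points past the end of the stream yields an error instead of a silently short buffer.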

Labels

🔩 data model Sorbet dataplatform Rerun Data Platform integration include in changelog 🪵 Log & send APIs Affects the user-facing API for all languages 📉 performance Optimization, memory use, etc

3 participants