Coordination-free distributed state. Nanosecond latency. Algebraic correctness.
Fact → Bloom Clock → Invariant → Frontier → Gossip

286 ns admission. 3.5M ops/sec. Zero consensus. Full observability.

Distributed systems trade consistency for latency. Cuttlefish doesn't. It's a state kernel that enforces strict invariants at L1-cache speed by exploiting algebraic properties: if your operations commute, you don't need coordination.
The thesis: Correctness is algebra, not execution order. Commutative ops replicate without consensus. Non-commutative ops fail fast at admission. Bounded ops use escrow partitioning. The kernel tells you which is which.
| Operation | Latency | Notes |
|---|---|---|
| Full admission cycle | 286 ns | Causality + invariant + frontier |
| Instrumented admission | 298 ns | +12ns metrics overhead (4%) |
| Histogram record | 1.3 ns | Power-of-two buckets, RDTSC |
| Causal clock dominance | 700 ps | SIMD Bloom filter comparison |
| Tiered hash verification | 280 ns | BLAKE3 checkpoint integrity |
| Durable admission | 5.2 ns | Staged to io_uring, async fsync |
| WAL hash (200B payload) | 230 ns | 940 MiB/s throughput |
Comparison: etcd pays 1-10ms for linearizable writes. CockroachDB pays 1-50ms. Cuttlefish pays 286ns for causal+ consistency with full Prometheus observability.
```toml
[dependencies]
ctfs = "1.0.2"
zerocopy = "0.8"
```

```rust
use ctfs::prelude::*;
use ctfs::invariants::total_supply::{ConservationState, TotalSupplyInvariant};
use zerocopy::IntoBytes;

fn main() {
    // Initialize: balance=0, bounds=[MIN, MAX]
    let state = ConservationState::new(0, i128::MIN, i128::MAX);
    let mut cell = StateCell::new();
    cell.as_slice_mut().copy_from_slice(state.as_bytes());
    let mut kernel = Kernel::with_state(TotalSupplyInvariant::new(), cell);

    // Admit a fact: +100 to balance
    let fact_id: FactId = [0u8; 32];
    let payload = 100i128.to_le_bytes();
    kernel.admit_raw(&fact_id, &[], &payload).unwrap();
}
```

```
┌─────────────────────────────────────────────────────────────┐
│                           Kernels                           │
├─────────────┬─────────────┬─────────────┬───────────────────┤
│ Kernel      │ TwoLane     │ Escalating  │ Durable/Network   │
│ (basic)     │ (BFVC+Exact)│ (auto-mode) │ (io_uring/gossip) │
├─────────────┴─────────────┴─────────────┴───────────────────┤
│                        Causal Layer                         │
├─────────────┬─────────────┬─────────────────────────────────┤
│ CausalClock │ PreciseClock│ ExactCausalIndex                │
│ (512b Bloom)│ (LRU exact) │ (Robin Hood, SIMD)              │
├─────────────┴─────────────┴─────────────────────────────────┤
│                       Invariant Layer                       │
├─────────────┬─────────────┬─────────────┬───────────────────┤
│ TotalSupply │ Max/GCounter│ Uniqueness  │ GGraph            │
│ (conserve)  │ (monotonic) │ (idempotent)│ (grow-only graph) │
├─────────────┴─────────────┴─────────────┴───────────────────┤
│                      Persistence Layer                      │
├─────────────┬─────────────┬─────────────────────────────────┤
│ WALArena    │ SPSC Buffer │ io_uring Worker                 │
│ (zero-copy) │ (lock-free) │ (async fsync)                   │
├─────────────┴─────────────┴─────────────────────────────────┤
│                      Checkpoint Layer                       │
├─────────────────────────────────────────────────────────────┤
│  Tiered BLAKE3 Hash │ Attestation │ Re-anchoring            │
└─────────────────────────────────────────────────────────────┘
```

Run any example with `cargo run --example <name>`:
```sh
cargo run --example basic_kernel      # Basic fact admission and causality
cargo run --example multi_tenant      # Isolated causal domains per tenant
cargo run --example custom_invariant  # Implement your own invariant
cargo run --example two_lane_kernel   # BFVC + ExactCausalIndex
cargo run --example escalating_kernel # Auto-switch Bloom → Precise mode
cargo run --example graph_invariant   # Grow-only graph with constraints
```

```rust
use ctfs::core::{Kernel, StateCell};
use ctfs::core::topology::FactId;
use ctfs::invariants::total_supply::{ConservationState, TotalSupplyInvariant};
use zerocopy::IntoBytes;

fn main() {
    // Initialize state
    let state = ConservationState::new(1000, 0, 1_000_000);
    let mut cell = StateCell::new();
    cell.as_slice_mut().copy_from_slice(state.as_bytes());
    let mut kernel = Kernel::with_state(TotalSupplyInvariant::new(), cell);

    // Admit facts
    let fact1: FactId = [1u8; 32];
    kernel.admit_raw(&fact1, &[], &500i128.to_le_bytes()).unwrap();

    // Admit with causal dependency
    let fact2: FactId = [2u8; 32];
    kernel.admit_raw(&fact2, &[fact1], &(-200i128).to_le_bytes()).unwrap();
}
```

```rust
use ctfs::core::{Invariant, InvariantError};

struct MonotonicInvariant;

impl Invariant for MonotonicInvariant {
    fn apply(&self, payload: &[u8], state: &mut [u8]) -> Result<(), InvariantError> {
        let proposed = u64::from_le_bytes(payload[0..8].try_into().unwrap());
        let current = u64::from_le_bytes(state[0..8].try_into().unwrap());
        if proposed < current {
            return Err(InvariantError::Underflow);
        }
        state[0..8].copy_from_slice(&proposed.to_le_bytes());
        Ok(())
    }
}
```

Immutable, content-addressed state transitions. Each fact has a 32-byte ID, optional causal dependencies, and a payload. Facts form a DAG, not a chain.
Algebraic constraints enforced at admission. Pure functions: Δ_I(payload, state) → Result<(), Error>. O(1), allocation-free, branchless where possible.
512-bit Bloom filter vector clocks. Probabilistic but fast (~700ps dominance check). When saturation exceeds 40%, kernels escalate to precise tracking.
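To make the dominance check concrete, here is a minimal sketch of a 512-bit Bloom clock. The type and method names (`BloomClock`, `dominates`, `saturation_bits`) are illustrative, not the ctfs API; the real BFVC's hashing and SIMD layout will differ.

```rust
// Illustrative 512-bit Bloom-filter clock sketch (not the ctfs implementation).
#[derive(Clone, Copy, Default)]
struct BloomClock {
    bits: [u64; 8], // 512 bits total
}

impl BloomClock {
    /// Record a fact ID by setting hash-derived bits (one per 8-byte chunk).
    fn insert(&mut self, fact_id: &[u8; 32]) {
        for chunk in fact_id.chunks(8) {
            let h = u64::from_le_bytes(chunk.try_into().unwrap());
            let bit = (h % 512) as usize;
            self.bits[bit / 64] |= 1u64 << (bit % 64);
        }
    }

    /// `self` dominates `other` iff every bit set in `other` is also set in
    /// `self`. Eight AND/CMP word ops — branch-light and SIMD-friendly, which
    /// is why this check can run in sub-nanosecond time.
    fn dominates(&self, other: &BloomClock) -> bool {
        self.bits.iter().zip(other.bits.iter()).all(|(a, b)| a & b == *b)
    }

    /// Number of set bits; past ~40% saturation (≈204 of 512 bits) the
    /// false-positive rate climbs and kernels escalate to precise tracking.
    fn saturation_bits(&self) -> u32 {
        self.bits.iter().map(|w| w.count_ones()).sum()
    }
}
```

A clock that has seen a superset of facts dominates one that has seen fewer; false positives (claiming dominance without it) are possible, which is exactly what the escalation threshold guards against.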
64-byte cache-aligned POD. Bit-for-bit deterministic. No pointers and no heap.
Tiered BLAKE3 hash of state + frontier. Verified on load—corrupt checkpoints are rejected.
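The verify-on-load flow can be sketched as below. Std's `DefaultHasher` stands in for BLAKE3 so the snippet is self-contained; it is not cryptographic, and the `Checkpoint`/`load_checkpoint` names are hypothetical, not the ctfs API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch of checkpoint integrity checking. The real kernel hashes with
// tiered BLAKE3; DefaultHasher is a non-cryptographic stand-in here.
struct Checkpoint {
    state: Vec<u8>,
    frontier: Vec<u8>,
    digest: u64,
}

/// Digest covers both state and frontier, so tampering with either is caught.
fn digest(state: &[u8], frontier: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    state.hash(&mut h);
    frontier.hash(&mut h);
    h.finish()
}

fn write_checkpoint(state: &[u8], frontier: &[u8]) -> Checkpoint {
    Checkpoint {
        state: state.to_vec(),
        frontier: frontier.to_vec(),
        digest: digest(state, frontier),
    }
}

/// On load, recompute the digest; any mismatch rejects the checkpoint.
fn load_checkpoint(cp: &Checkpoint) -> Result<&[u8], &'static str> {
    if digest(&cp.state, &cp.frontier) == cp.digest {
        Ok(&cp.state)
    } else {
        Err("corrupt checkpoint rejected")
    }
}
```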
| Kernel | Causality | Persistence | Use Case |
|---|---|---|---|
| `Kernel<I>` | BFVC only | None | Embedded, single-node |
| `TwoLaneKernel<I>` | BFVC + ExactIndex | None | Production, precise deps |
| `EscalatingKernel<I>` | Auto BFVC→Precise | None | Long-running, adaptive |
| `TenantKernel<I>` | Per-tenant BFVC+Exact | None | Multi-tenant isolation |
| `DurableKernel<I>` | BFVC | io_uring WAL | Crash-safe single-node |
| `TwoLaneDurableKernel<I>` | BFVC + ExactIndex | io_uring + Arena | Production crash-safe |
| `NetworkingKernel<I>` | BFVC | Broadcast buffer | Gossip replication |
| `MultiKernel` | BFVC | None | Multiple invariants |
| `DualKernel<I1, I2>` | BFVC | None | Two invariants, zero overhead |
`TwoLaneDurableKernel` now supports non-blocking durability via `DurableHandle`:
```rust
// Non-blocking: returns immediately with a handle
let handle = kernel.admit_async(&fact_id, &deps, &payload)?;

// Poll for durability (or use in async context)
while !handle.is_durable() {
    // Do other work...
}

// Or block if needed
handle.spin_wait();
```

| Invariant | Algebraic Class | Coordination |
|---|---|---|
| `TotalSupplyInvariant` | Abelian group | None (commutative) |
| `MaxInvariant` | Join-semilattice | None (monotonic) |
| `GCounterInvariant` | Commutative monoid | None |
| `BoundedGCounterInvariant` | Bounded monoid | None until bound |
| `LWWInvariant` | Last-writer-wins | Tiebreaker only |
| `UniquenessInvariant` | Idempotent set | None |
| `GGraphInvariant` | Grow-only graph | None |
Rule: If Δ_I(a) ∘ Δ_I(b) = Δ_I(b) ∘ Δ_I(a), no coordination required.
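The rule above can be checked empirically: apply two deltas in both orders and compare the resulting states. The helpers below are illustrative stand-ins, not a ctfs API; `apply_add` models an abelian-group invariant, `apply_lww` a blind overwrite that fails the test.

```rust
// Abelian group (e.g. TotalSupply-style balance deltas): order-independent.
fn apply_add(state: i128, delta: i128) -> i128 {
    state + delta
}

// Blind last-write overwrite with no tiebreaker: order-dependent.
fn apply_lww(_state: i128, write: i128) -> i128 {
    write
}

/// Δ_I(a) ∘ Δ_I(b) = Δ_I(b) ∘ Δ_I(a) for this state and pair of payloads?
fn commutes(f: fn(i128, i128) -> i128, s: i128, a: i128, b: i128) -> bool {
    f(f(s, a), b) == f(f(s, b), a)
}
```

A single counterexample is enough to show an operation needs coordination (or a deterministic tiebreaker, as `LWWInvariant` uses); commutativity over all inputs is what licenses coordination-free replication.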
Linux-only. Requires io_uring (kernel 5.1+).
```toml
[dependencies]
ctfs = { version = "1.0.0", features = ["persistence"] }
```

- WALArena: 4K-aligned memory arena for zero-copy fact staging. 4096 slots, 256 bytes each.
- SPSC Buffer: Lock-free producer-consumer queue between kernel and persistence worker.
- io_uring Worker: Async batched writes with explicit fsync before advancing persistence frontier.
- Checkpoints: Periodic state snapshots with tiered BLAKE3 integrity verification.
Facts are durable when `persistence_frontier.dominates(fact_clock)`. The frontier advances only after fsync completes—not after write submission.
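This ordering rule can be modeled in a few lines. Monotonic sequence numbers stand in for clock dominance, and the `PersistenceFrontier` type and its methods are hypothetical names for illustration, not the ctfs API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Model of the durability rule: the frontier advances only after fsync
// confirms data is on disk, never after mere write submission.
struct PersistenceFrontier {
    fsynced: AtomicU64, // highest sequence number confirmed durable
}

impl PersistenceFrontier {
    fn new() -> Self {
        Self { fsynced: AtomicU64::new(0) }
    }

    /// Called by the persistence worker once fsync completes through `seq`.
    /// Release ordering publishes the written data to readers of the frontier.
    fn advance_after_fsync(&self, seq: u64) {
        self.fsynced.fetch_max(seq, Ordering::Release);
    }

    /// A fact staged at `seq` is durable iff the frontier covers it
    /// (the sequence-number analogue of `frontier.dominates(fact_clock)`).
    fn is_durable(&self, seq: u64) -> bool {
        self.fsynced.load(Ordering::Acquire) >= seq
    }
}
```

The point of the separation: a fact submitted to io_uring but not yet fsynced reports non-durable, so a crash between submission and fsync can never acknowledge data that the WAL would lose.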
```toml
[dependencies]
ctfs = { version = "1.0.0", features = ["networking"] }
```

Gossip-based replication via `NetworkingKernel`. Facts are broadcast to peers; causality is enforced on receipt. Convergence is guaranteed for commutative invariants.
- GossipClock: Periodic Bloom clock exchange via UDP
- PushFact: Reliable fact broadcast via TCP
- PullRequest/PullResponse: Anti-entropy fact recovery
- QuotaRequest/QuotaGrant: Escrow quota transfer for bounded invariants
Bounded invariants (e.g., inventory limits) use escrow partitioning. Each node gets a quota slice. When a node exhausts its local quota, it broadcasts QuotaRequest to peers. Nodes with surplus respond with QuotaGrant. Exponential backoff (500ms → 4s) prevents thundering herd.
```rust
use ctfs::algebra::escrow::EscrowManager;

fn main() -> Result<(), ctfs::algebra::escrow::EscrowError> {
    let node_a = 1u64;
    let node_b = 2u64;

    let mut escrow = EscrowManager::new(1000);  // Global limit
    escrow.initialize(&[node_a, node_b])?;      // Split 500/500

    escrow.try_consume(node_a, 400)?;           // Local op, no coordination
    escrow.transfer(node_a, node_b, 50)?;       // Quota grant
    Ok(())
}
```

Feature-gated on `networking`. Exposes a `GET /metrics` endpoint.
```rust
use ctfs::core::{LatencyHistogram, MetricsServer};
use std::sync::Arc;

fn main() -> std::io::Result<()> {
    let admission_hist = Arc::new(LatencyHistogram::new());
    let persistence_hist = Arc::new(LatencyHistogram::new());

    let _server = MetricsServer::spawn(
        "0.0.0.0:9090",
        admission_hist.clone(),
        Some(persistence_hist.clone()),
    )?;

    // Record latency manually or use kernel.admit_instrumented()
    admission_hist.record(286); // 286ns latency sample
    Ok(())
}
```

Exported metrics:
- `ctfs_admission_latency_bucket{le="..."}` — Histogram buckets (power-of-two, 8 ns → 2^66 ns)
- `ctfs_admission_latency_count` — Total admissions
- `ctfs_admission_latency_p50/p90/p99` — Percentile gauges
- `ctfs_persistence_latency_*` — Disk I/O latency (SPSC push → io_uring CQE)
64 buckets, power-of-two scale. Cache-line aligned. Lock-free atomics. RDTSC timing on x86_64 (~6.6ns overhead per call).
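The power-of-two scheme means bucket selection is a single count-leading-zeros instruction rather than a search or division. A minimal sketch of the index math (illustrative only; ctfs's exact bucket boundaries may shift this by a constant):

```rust
/// Map a latency sample to its power-of-two bucket: bucket i covers
/// [2^i, 2^(i+1)) ns. The `| 1` makes zero well-defined without a branch;
/// the rest compiles to one CLZ and a subtraction.
fn bucket_index(nanos: u64) -> usize {
    (63 - (nanos | 1).leading_zeros()) as usize
}
```

This is why `record` costs ~1.3 ns: compute the index, then one relaxed atomic increment into a cache-line-aligned array.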
```rust
use ctfs::core::LatencyHistogram;

fn main() {
    let hist = LatencyHistogram::new();
    hist.record(286); // Record 286ns latency

    let (p50, p90, p99) = hist.percentiles();
    let buckets = hist.snapshot(); // [u64; 64]
    println!("p50={}, p90={}, p99={}", p50, p90, p99);
}
```

```sh
# All benchmarks
cargo bench

# Specific suites
cargo bench --bench kernel
cargo bench --bench hardening
cargo bench --bench metrics_overhead
cargo bench --features persistence --bench durable_admission
```

```
admit_baseline            286.0 ns
admit_instrumented        298.0 ns  (+12ns overhead)
histogram_record            1.3 ns
nanos_now_rdtsc             6.6 ns
causal_clock_dominates      0.7 ns
checkpoint/tiered_hash    280.0 ns
durable_admission           5.2 ns
wal_hasher/200B           230.0 ns  (940 MiB/s)
```

Forbidden in hot path:
- Heap allocation (`Box`, `Vec`, `HashMap`)
- Locks (`Mutex`, `RwLock`)
- Syscalls (except staged io_uring)
- Pointer chasing
- O(n) scans
Required:
- `#[repr(C, align(64))]` for cache-line alignment
- Fixed-width types (`i128`, `u64`, `[u8; 32]`)
- Little-endian byte order
- Branchless operations where possible
- SIMD-friendly memory layouts
Bit determinism: Same input → same output, byte-for-byte, across all nodes. No floats. No std::collections. No non-deterministic iteration.
- SQL or query languages
- Secondary indexes
- Full-text search
- Global total ordering
- Wall-clock timestamps
- Dynamic schema
- "Convenient" APIs that hide complexity
This is a kernel, not a database. Build your database on top.
Cuttlefish is grounded in:
- CALM Theorem: Consistency As Logical Monotonicity. Monotonic programs don't need coordination.
- CRDTs: Conflict-free Replicated Data Types. Lattice-based merge semantics.
- Bloom Clocks: Probabilistic vector clocks with O(1) space and O(1) dominance checks.
- Algebraic Effects: Invariants as pure functions over state, composable via semiring operations.
If you want the math: Alvaro et al., "Consistency Analysis in Bloom"
MIT
"The fastest distributed system is the one that doesn't coordinate."