
Probes III#2154

Open
disconcision wants to merge 325 commits into dev from probes-III

@disconcision disconcision commented Feb 28, 2026

Note: This incorporates fixes #2133, #2139, and #2145 which may or may not get merged first.

Closes #2122
Closes #2129
Closes #2142
Closes #2151
Closes #2158
Closes #2160
Closes #2062
Closes #2174


Probe System Overhaul

This continues work on the probes system (#1420 and #1879), incorporating richer call stack navigation, smarter automatic probe placement, a redesigned value abbreviation engine, and numerous correctness/performance fixes throughout the evaluation and editing pipelines. See #2058 for features and feature concepts deferred to future work.

In addition, this PR expands the Hazel CLI with improved static error reporting and the ability to get probe results inline in text form. The main goal is to permit agentic Hazel program development via external harnesses.


Nomenclature Changes

  • "Auto probe" → "Multi probe" for the per-line probe data structure
  • "Auto probe mode" is the new cursor-following feature (the toggle that automatically places multi probes)
  • "Dynamic cursor" → "Sample focus" internally, "Probe focus" in user-facing UI

Major User-Facing Features

Unified Probe Action (Cmd+E)

Replaces the old two-shortcut system (manual probe vs auto probe) with a single context-sensitive Cmd+E:

  • On definition forms (let bindings, tests): adds a multi probe that expands to per-line probes across the definition body
  • On other expressions: adds a manual probe on exactly that expression
  • On a selection: probes the enclosing term of the selection (rounds up if the selection is not a complete term)

The context menu adapts its label accordingly. The old Cmd+Shift+E shortcut for auto probe is removed.
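
The dispatch described above can be sketched as a small function. This is only an illustration in Python, not the editor's Reason code: the `Term` record, the `DEFINITION_FORMS` set, and the choice to treat a rounded-up selection as a manual probe are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical term kinds standing in for Hazel's definition forms.
DEFINITION_FORMS = {"let", "test"}

@dataclass
class Term:
    kind: str
    parent: Optional["Term"] = None

def probe_action(term: Term, partial_selection: bool = False) -> Tuple[str, Term]:
    """Decide what one context-sensitive probe shortcut does."""
    if partial_selection and term.parent is not None:
        # A selection that is not a complete term rounds up to its enclosing term.
        # Treating the result as a manual probe is an assumption of this sketch.
        return ("manual", term.parent)
    if term.kind in DEFINITION_FORMS:
        # Definition forms get a multi probe covering the definition body.
        return ("multi", term)
    # Any other expression gets a manual probe on exactly that expression.
    return ("manual", term)
```

For example, `probe_action(Term("let"))` yields a `"multi"` action, while a literal inside a binary operation yields `"manual"`.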

Auto Probe Mode (Cmd+P)

A new toggle mode that automatically places a multi probe on whichever top-level definition the cursor is inside, following the cursor as it moves between definitions. Walks the ancestor chain to find the enclosing let/test body, and reconstitutes the probe only when the target changes.

Sample Focus Bar (Breadcrumb Navigation)

Replaces the old "Dynamic Cursor" sidebar panel with a horizontal breadcrumb bar showing the call stack as navigable function names:

  • Windowed display: Slides a visible window around the focused entry for long call stacks, with ellipsis markers
  • Function name resolution: Shows resolved names from evaluation, lambda symbol for anonymous functions
  • Navigation: Arrow keys move call depth, Enter/click jumps to call sites, body icon jumps to definition
  • Pin indicators: Visual pins on pinned entries with click-to-unpin
  • Clear all: Button to remove all probes at once
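
As a rough illustration of the windowed display, the following Python sketch slides a fixed-width window around the focused entry and marks hidden entries with an ellipsis. The function name, the centering rule, and the `width` parameter are assumptions; the actual bar may size its window differently.

```python
def windowed(names, focus, width):
    """Return the visible breadcrumb entries around index `focus`.

    Hidden entries on either side are represented by a single "…" marker.
    """
    if len(names) <= width:
        return list(names)
    # Center the window on the focused entry, then clamp it to the list bounds.
    start = max(0, min(focus - width // 2, len(names) - width))
    end = start + width
    visible = list(names[start:end])
    if start > 0:
        visible.insert(0, "…")
    if end < len(names):
        visible.append("…")
    return visible
```

With a seven-entry stack and a window of three, focusing the middle entry shows ellipsis markers on both sides, while focusing the first or last entry shows only one.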

Enriched Call Stack Frames (Supporting the Sample Focus Bar)

Call stack frames now carry function names and definition-site IDs (not just application IDs). This enables the breadcrumb display, jump-to-definition, and named call displays throughout the UI.

Argument Value Display

Probes on function applications now show argument values inline: f(x = 5) or multi-line for tuple arguments. The environment dropdown filters out variables already shown in the call display.

Pinning Improvements

  • Pin Call (meta-click on application probes): Pins a specific call sample and creates a manual probe (if not already present) so pins persist when the cursor moves in auto-probe mode.
  • Pin Enclosing Call (meta-click on non-application probes with call stacks): A new action, not function-application-specific, which pins the enclosing call of any sample from any expression.
  • Visual distinction: The pinned call sample gets a red pin icon; a smaller red dot marks samples within the pinned call, making pinning visible even when the pinned application is off-screen.

Sample Color Schemes

Three selectable schemes color samples relative to the current focused sample:

  • Calls: Colors based on relative call stack position (previous default).
  • Steps: Colors based on relative step range — before, after, contains, or inside.
  • Hybrid: Mostly Steps, except samples in the same closure as the focused one are colored green (as in Calls) for direct association.

A segmented control (Calls / Hybrid / Steps) in the sidebar switches between schemes, with tooltips.
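
The four relations used by the Steps scheme can be sketched as an interval comparison. This Python sketch assumes each sample carries an inclusive `(start, end)` range of evaluation steps, which is a guess at the representation; the `"overlaps"` fallback and the treatment of identical ranges as `"contains"` are also assumptions.

```python
def step_relation(sample, focus):
    """Classify a sample's step range relative to the focused sample's range.

    Both arguments are inclusive (start, end) pairs of step indices.
    """
    s0, s1 = sample
    f0, f1 = focus
    if s1 < f0:
        return "before"
    if s0 > f1:
        return "after"
    if s0 <= f0 and f1 <= s1:
        return "contains"  # the sample's range encloses the focus
    if f0 <= s0 and s1 <= f1:
        return "inside"    # the sample's range lies within the focus
    return "overlaps"      # partial overlap, not one of the four named cases
```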

Redesigned Probe Sidebar

  • Legend: Tooltips explaining each color, adapts to active scheme
  • Toggle controls: Auto Probe on/off, Samples Shown (One/Many), keyboard shortcut badges
  • Quick reference panel: Keyboard shortcuts and icon meanings
  • Print panel: Toggle at the top to activate an alternate sidebar mode for print statements. Structured entries with sequence numbers, source lines, word wrap; Manual/Auto eval modes.

Improved Keyboard Controls

  • Cmd+Enter and end-of-line bounce to switch focus from editor to probes
  • Up/Down arrow navigation between probes when focused
  • All probe functions including showing environment now keyboard-controllable

Improved Manual/Multi Probe Interactions

When a manual probe exists on a term, the multi-probe system no longer creates a duplicate ephemeral probe on the same term. Ephemeral probes are suppressed when they would overlap with manual ones. Additionally, ephemeral probes can be toggled on/off the same way manual probes are, within the lifetime of the containing multi-probe.

Sample Focus Fixes

Turns out it is hard to maintain sample focus alignment across all reasonable interactions. This is now well-tested.

  • Recursive probe sample navigation fix (59100ce019): Fix for effective-first suffix scan in recursive probe navigation.
  • Probe alignment lag fix (2ef5bd1d7a): Detects stale focus and forces a second pass to fix alignment lag.
  • Intent preservation fixes (212b959bad, 1f5bfebc79): Fixes for sample focus intent preservation at 3+ nesting levels.

Abbreviation System Rewrite

Complete overhaul of the value abbreviation engine.

Hard-Cap Budget

To enable predictable resizing, rendered output never exceeds the budget (for budget >= 1). Enforced via safety-net retry: if a form overshoots, retry at budget-1 (preserves monotonicity).
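
A minimal sketch of the safety-net retry, in Python. The `naive_abbreviate` renderer is a toy that deliberately overshoots, and widths are measured with `len` rather than display columns; both are simplifications for illustration, not the engine's behavior.

```python
def naive_abbreviate(text, budget):
    """Toy renderer that can overshoot: keeps `budget` characters, then adds an ellipsis."""
    if len(text) <= budget:
        return text
    return text[:budget] + "…"

def capped(text, budget):
    """Render `text` so the output never exceeds `budget` (for budget >= 1)."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    attempt = budget
    while attempt >= 1:
        out = naive_abbreviate(text, attempt)
        if len(out) <= budget:
            return out
        attempt -= 1  # overshoot: retry with a smaller budget
    return "…"  # last resort, width 1
```

Here `capped("hello world", 5)` gives `"hell…"`: the first attempt at budget 5 produces six characters, so the retry at budget 4 is used.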

Improved Sequence Abbreviation

  • Pre-computed item count: Chooses largest count where items + separators + annotation fit
  • Count annotations: Truncated sequences show ...+3 instead of bare ...
  • Even distribution for records: Budget split evenly across labeled tuple fields so all field names appear simultaneously
  • Two-tier fallback: Items can degrade to bare ellipses before full collapse
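
The item-count selection with a count annotation might look like the following Python sketch. It handles flat lists of already-rendered strings only, measures width with `len`, and ignores the hard cap for very small budgets; the bracket and separator choices are assumptions.

```python
def abbreviate_list(items, budget, sep=", "):
    """Render a list, keeping the largest item count that fits in `budget`.

    Truncated output ends with a count annotation such as `...+3`.
    """
    full = "[" + sep.join(items) + "]"
    if len(full) <= budget:
        return full
    # Try the largest item count first; each candidate accounts for
    # items, separators, and the annotation together.
    for count in range(len(items) - 1, -1, -1):
        hidden = len(items) - count
        parts = items[:count] + [f"...+{hidden}"]
        candidate = "[" + sep.join(parts) + "]"
        if len(candidate) <= budget:
            return candidate
    return "..."  # full collapse
```

For five single-digit items, a budget of 14 yields `[1, 2, ...+3]`, and a budget of 8 yields `[...+5]`.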

Additional Improvements

  • Proper Unicode width-aware accounting (display columns, not bytes/graphemes)
  • Invalid form classification for truncated labels (avoids backtick quoting overhead)
  • Coverage expanded to Module, Match, FixF, Theorem, ProofObject, Use, TyAlias, Closure, TypAp
  • Unit-cost atoms (EmptyHole, Deferral) no longer abbreviated to ellipsis
  • Font-metrics-based drag resizing instead of hardcoded pixel values
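
To show why display columns differ from bytes and code points, here is an approximate width function in Python using the standard `unicodedata` module. It is a sketch, not the accounting Hazel uses: it treats East Asian wide and fullwidth characters as two columns and combining marks as zero, and does not handle emoji sequences or zero-width joiners.

```python
import unicodedata

def display_width(s):
    """Approximate the number of terminal columns `s` occupies."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        # Wide ("W") and fullwidth ("F") characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```

For instance, `"日本"` is two code points and six UTF-8 bytes but occupies four columns, so budgeting by bytes or by code points would both be wrong.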

Hazel CLI

Command-line interface (src/CLI/Cli.re, src/CLI/Run.re):

hazel test <file>

Runs all test...end blocks and reports PASS/FAIL/INDET with line numbers and hints. Exits non-zero on failure. The --verbose flag also reports passing tests.

hazel analyze <file>

Static analysis with Rust-style error formatting: file path, line/column, source line with ^ caret underlines.

hazel probe <file>

Probe output with Unicode brackets ⟦⟧. --auto flag auto-probes all expressions (one per line) without manual annotations.

hazel bench-parse <file>

Benchmarks parsing performance: unsegmented vs segmented parsing, slow vs fast paste. Reports per-file timings and speedup ratios.


Multi-Probe Placement Algorithm

Rewrite of the probe placement logic (MultiProbe.re) that selects which expression to probe on each line of a definition:

  • Priority-based pipeline: rightmost-ending, largest term at that position
  • Predicate filtering chain: term sort, delimiter prefix rejection, module declaration skip, hole avoidance, function-type avoidance, container redundancy, let-hole skip, avoid targeting sub-structural forms like case rule tiles
  • Special adjustments for multi-line if (promote else branch) and incomplete binding forms (promote trailing sibling)
  • Extensively documented with examples
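
The core selection rule (rightmost-ending, then largest) can be sketched in Python as below. The `Candidate` record and the single `is_hole` predicate are hypothetical stand-ins for the full filtering chain in MultiProbe.re, and the special adjustments for multi-line if and incomplete bindings are not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    line: int
    start: int  # start column
    end: int    # end column (exclusive)
    is_hole: bool = False

def pick_per_line(candidates):
    """Pick one probe target per line: rightmost-ending, then largest span."""
    # One stand-in predicate; the real chain has many more filters.
    allowed = [c for c in candidates if not c.is_hole]
    best = {}
    for c in allowed:
        key = (c.end, c.end - c.start)
        current = best.get(c.line)
        if current is None or key > (current.end, current.end - current.start):
            best[c.line] = c
    return best
```

On a line containing `f(x) + 1`, the whole binary expression wins over `f(x)` (which ends earlier) and over `1` (which ends at the same column but is smaller).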

Evaluator/Dynamics Fixes

Nested Probe Start Stack

pending_probe_starts changed from single int to stack per probe_id, correctly handling nested recursive calls.
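
The reason a stack is needed can be seen in a small Python sketch. It assumes the pending value is the step index at which a probed expression began evaluating, which is a guess at what the integer represents; the class and method names are hypothetical.

```python
from collections import defaultdict

class ProbeStarts:
    """Tracks pending start steps per probe, one stack per probe id."""

    def __init__(self):
        self._stacks = defaultdict(list)

    def enter(self, probe_id, step):
        # Push rather than overwrite, so an outer pending start survives
        # while a nested (recursive) evaluation of the same probe runs.
        self._stacks[probe_id].append(step)

    def exit(self, probe_id, step):
        # Pop the most recent start: it belongs to the innermost evaluation.
        start = self._stacks[probe_id].pop()
        return (start, step)
```

With a single integer per probe, the inner `enter` of a recursive call would overwrite the outer start, and the outer sample would be recorded with the wrong range.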

Transient State Cleanup

clear_transient removes app_args (can be 100MB+), pending_probe_starts, and targets before worker serialization.

Fix Probe Sample Doubling from Nested Return Type Annotations

When a function with a return type annotation calls an inner function that also has a return type annotation, the outer Asc distribution would re-evaluate inner results, recording probe samples at a different (shorter) call stack. Fixed by using Ascriptions.transition_multiple with is_value: true to fully resolve all Asc layers in one step.


Editor Infrastructure

Parser Optimization: Segmented Parsing

Batch parser operations now use segmented parsing for significantly faster performance on large files. Also included: a CLI benchmark (bench-parse).

TyDi Improvements

  • Minimum 2-character prefix before suggestions appear
  • Type-aware ordering: keywords first when expected type is unknown, variables first when type is known
  • Exact match suppression: all suggestions hidden when any candidate exactly matches the token
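
The three rules combine roughly as in this Python sketch. Prefix matching and the flat keyword/variable lists are assumptions for illustration; the real completion logic is type-directed and considerably richer.

```python
MIN_PREFIX = 2  # minimum token length before suggestions appear

def suggestions(token, keywords, variables, expected_type_known):
    """Return completion candidates for `token` under the three rules."""
    if len(token) < MIN_PREFIX:
        return []
    kws = [k for k in keywords if k.startswith(token)]
    vs = [v for v in variables if v.startswith(token)]
    # Variables first when the expected type is known, keywords first otherwise.
    candidates = vs + kws if expected_type_known else kws + vs
    # Hide everything if any candidate exactly matches what was typed.
    if token in candidates:
        return []
    return candidates
```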

InfixDelimiterPrefix Precedence

Lowered from Precedence.max to Precedence.concave_grout — fixes multi-probe incorrectly selecting partial keyword tokens.

Editing Robustness

  • Shard theft guard: Prevents shard theft when typing before complete multi-shard tiles
  • Shard-index monotonicity: Enforced in rescan to prevent out-of-order reassembly crash
  • Inner-caret remolding: Fix parent tile not remolded when caret inside bidelimited form
  • Space suppression through grout lifecycle: Tracks suppressed spaces correctly

Performance: General

  • Undo stack capped at 50 entries (prevents unbounded memory growth)
  • Dirty-tracking autosave: Only re-persists slides marked dirty, eliminating expensive segment comparisons on every autosave cycle
  • Original doc segment caching: Avoids re-parsing original slides on each autosave
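
A bounded undo stack and dirty-tracked autosave can be sketched together in Python. The `Editor` class, its fields, and the `persisted` log are hypothetical; only the cap of 50 and the idea of writing just the dirty slides come from the description above.

```python
from collections import deque

UNDO_LIMIT = 50  # cap stated in the description above

class Editor:
    def __init__(self, slides):
        self.slides = dict(slides)
        # A deque with maxlen drops the oldest entry once the cap is reached.
        self.undo = deque(maxlen=UNDO_LIMIT)
        self.dirty = set()
        self.persisted = []  # log of slide names written, for illustration

    def edit(self, name, content):
        self.undo.append((name, self.slides[name]))
        self.slides[name] = content
        self.dirty.add(name)  # mark instead of comparing contents later

    def autosave(self):
        for name in sorted(self.dirty):
            self.persisted.append(name)  # stand-in for the real write
        self.dirty.clear()
```

After many edits to one slide, the undo stack holds at most 50 entries, and an autosave writes only that slide; a second autosave with no intervening edits writes nothing.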

Performance: Throttle Statics During Typing

Statics are now throttled across all editor modes during typing, reducing latency and CPU usage on fast keystroke sequences.

Performance: Per-Projector View Cache

Projector and refractor views are cached per-projector, keyed on map identity + status + model + settings_version. Avoids redundant re-renders when only one projector's state changes. ChangeLength and ToggleShowEnv bump settings_version to invalidate correctly.
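
The caching scheme can be sketched in Python as one cached entry per projector, reused only when the key is unchanged. The `ViewCache` class and `render` callback are hypothetical, and map identity is modeled with Python's `is` on a retained reference.

```python
class ViewCache:
    """One cached view per projector, reused while its key is unchanged."""

    def __init__(self, render):
        self._render = render
        self._entries = {}  # projector_id -> (map, key, view)

    def view(self, projector_id, map_, status, model, settings_version):
        key = (status, model, settings_version)
        entry = self._entries.get(projector_id)
        if entry is not None:
            cached_map, cached_key, cached_view = entry
            # Map is compared by identity; the other parts by equality.
            if cached_map is map_ and cached_key == key:
                return cached_view
        rendered = self._render(model)
        self._entries[projector_id] = (map_, key, rendered)
        return rendered
```

Bumping `settings_version` changes the key and forces a re-render, while a change to one projector leaves the cached views of the others untouched.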

Performance: Remove Ineffective Caches

  • Removed Core.Memo.general memoization caches that leaked memory without providing measurable speedup
  • Removed a hashtable-based indent cache that wasn't providing benefit

Copy/Paste

  • Indentation preservation: Editor copy now includes indentation so pasted text preserves structure
  • Fast paste path: Segment splice for faster paste operations. The existing segment paste cache is retained, but completely internalized to avoid polluting the log with the verbose Paste(Segment(...)) action.

Select Term Improvement

Cmd/Ctrl+D now selects the next containing term up when a term is already selected.


UI/CSS

  • Redesigned sidebar header and layout
  • Sample focus bar overlaid on editor
  • Keyboard shortcut badges in sidebar
  • Box-sizing normalization (border-box). Should help with mystery scroll bar issues
  • Phantom horizontal scrollbar fix

Test Coverage

Feature                        | Tests                            | Scope
Multi-probe placement          | Test_MultiProbe.re               | 13 categories, ~40 tests
Sample selection (unit)        | Test_SampleSelection.re          | 7 categories, ~25 tests
Sample selection (integration) | Test_Evaluator_ProbeSelection.re | 5 categories, ~12 tests
Abbreviation hard cap          | Test_Abbreviate.re               | 22 expression types verified
Duplicate sample prevention    | Test_Evaluator_Probes.re         | 9 dedup tests
Ascription ID preservation     | Test_Evaluator_Probes.re         | constructor, list, let/seq

disconcision and others added 30 commits February 4, 2026 19:21
The autosave was slow because ScratchMode.persist was calling PersistentZipper.unpersist on every original documentation slide on every save, just to compare if they had changed. This adds a lazily-initialized cache in Init.re that stores the original segments, so the comparison is now a fast map lookup instead of re-parsing S-expressions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The stack_frame type was a tuple (Id.t, option(string)) where the optional string (function name) is purely informational. Many comparison sites used structural equality (==) on stack frames, which broke after step-into because the cursor constructs frames with name=None while evaluation produces frames with name=Some("f"). This caused the sample selection logic to fail even though filter_by_pin (which used ids_of_stack) worked correctly.
Changes:
- Convert stack_frame from tuple to record {id, name}
- Add equal_stack_frame (id-only) and equal_call_stack helpers
- Fix all comparison sites to use id-only equality
- Add ~eq parameter to ListUtil suffix/comparison functions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test_SampleSelection.re (28 unit tests): Tests pure Selection/Cursor logic with hand-crafted samples. Covers equal_stack_frame, Cursor.relation, filter_by_pin, select in Single/Many modes, is_same_call, closest_to_cursor, and get_empty_status. Includes regression tests for None-vs-Some name mismatch that caused the step-into bug. Test_Evaluator_ProbeSelection.re (11 integration tests): Evaluates real programs, then pipes samples through Selection with simulated cursor/pin states. Tests step-into scenarios, pin filtering between calls, cursor relation classification, and Single vs Many mode behavior with real data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add fn_def_id to stack_frame for jump-to-definition from built-in internal calls. Add fallback navigation for built-in function entries: when clicking a name or separator that points to non-navigable built-in code, walk up the call stack to the nearest user-visible call site. Key fix: HazelFn built-ins use a "+" suffix on internal names (e.g. "fold_left+"), so strip it before checking Builtins.env_init to correctly identify built-in entries and skip non-navigable definition targets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Simplify click handlers to uniformly set cursor index to the clicked entry's depth, regardless of whether the jump target is a direct definition or a fallback. The jump target and cursor index are independent concerns: the cursor reflects what depth you're inspecting, not where the syntax jump lands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change name clicks from jump-to-definition to jump-to-call-site for more intuitive backtracking. Separators become decorative (no click). Add body icon at end of bar to jump to the innermost function definition, preserving the ability to navigate forward after backtracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When stepping deep into a call stack, probes on application expressions that are part of the pinned call chain now remain visible instead of being filtered out. Only probes whose ap_id matches a frame in the pinned stack are kept, so sibling branches and non-application probes at shallower depths are still filtered. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a function argument is syntactically a bare variable reference, the call display now shows "name = value" instead of just the value. Non-variable arguments (expressions, literals) render as before. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mous Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tooltip now distinguishes direct call sites from internal calls that jump to the enclosing user-visible call site. Fix pointer cursor and green focused highlight for anonymous/unknown breadcrumb entries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Step-into only works for user-defined named functions, so hide it from the sample context menu for built-ins, anonymous functions, and constructors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a "Clear all" button on the far right of the closure cursor bar that removes all probes (manual and ephemeral) and resets the sample cursor. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ard toggle Replace flat text output with per-entry divs showing sequence number (left), value (center, word-wrapping), and source line number (right). Replace bespoke eval mode buttons with standard Widgets.toggle_named. Enable text selection in print panel. Remove unused eval-mode-button CSS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The label name is already visible in source code, so probing the whole TupLabel (e.g. `brush = "a"`) is redundant. Now TupLabel is rejected like Parens, falling through to the value expression on the RHS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When stepping into a function call, ensure a manual probe exists on the application expression being jumped from. This preserves the probe even when auto probes would otherwise be removed (e.g. cursor movement in auto-auto mode). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
maybe_rm_pin now checks has_probe before clearing a pin: if the expression still has any probe (manual or ephemeral), the pin is kept. Fixes pin loss during auto-auto def switches after step-into, where the manual probe on the source expression should anchor the pin. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ephemerals are derived from autos.ids and get rebuilt automatically by add_ids_from_auto_term on the next editor calculate cycle. Clearing autos.ids ensures auto probes are actually removed; previously only ephemerals were cleared and auto probes would immediately reappear. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
disconcision and others added 30 commits March 22, 2026 18:42
Agent state (system prompts, tool definitions, chat history) was being serialized into every scratchpad localStorage entry. With 52 documentation slides each carrying ~80-100KB of agent boilerplate, this alone nearly exhausted the ~5MB localStorage quota before any user interaction. Add StoreIDB.re, a reusable IndexedDB-backed store functor (async counterpart to Store.F). Agent data is now saved to IndexedDB keyed by mode:scratchpad_name, while localStorage gets a minimal empty stub. On load, editors render immediately with fresh agent state; IndexedDB populates real chat history asynchronously via LoadAgentData action. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite uses_of_binding to scan info_map instead of climbing ancestors. The old approach only handled Let/Fun; now works for all binding forms (match/case, fixf, theorem, etc.) without enumerating them.
- Fix multiple references: old code used VarMap.lookup (List.assoc_opt) which only returned first co_ctx entry due to CoCtx.union concatenation.
- Add bidirectional type variable highlighting (InfoTPat binding→uses).
- Disable hover dispatch (infrastructure retained for future re-enablement).
- Clean up VarHighlight.re: use DecUtil.abs_style, remove debug logging.
- Move styling to CSS (border-bottom + subtle background tint).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9 tests: let ref→binding, let binding→refs, fun params, match/case bindings (both directions), shadowing, multiple refs, type variable ref→binding, type variable binding→refs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CoCtx.union: properly merge entries for same variable name instead of concatenating (which caused VarMap.lookup to shadow duplicates).
- uses_of_binding: use InfoPat.co_ctx directly instead of scanning the entire info_map. O(1) lookup instead of O(n) scan.
- Ctx.add_ctrs: use each variant's own annotation ID instead of the type definition's ID, so constructor highlights point to individual constructor declarations rather than the whole sum type.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Populate InfoTPat.tvar_co_ctx via a post-processing pass in Statics.mk so uses_of_tvar_binding does O(1) lookup instead of scanning the info_map. Add constructor highlight tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hover dispatch restored in mousemove handler with equality check to avoid redundant dispatches. Hover highlights shown with dashed underline; caret highlights remain as fallback when hover target has no variable info. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pattern's co_ctx was only receiving body.co_ctx, so recursive self-references inside the definition weren't visible as uses. Now unions def.co_ctx with body.co_ctx. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop hover_id from Model, Hover action variant, mousemove dispatch, piece_at_point, hover CSS, and all related wiring. Caret-based highlighting is sufficient. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Constructor references now register in co_ctx (like Var does), so uses bubble up to the enclosing TyAlias. uses_of_ctr_binding climbs ancestors to read them. Enables definition→uses and reference→sibling highlighting for constructors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace localStorage with a unified IndexedDB database (HazelDB) for all app state. Scratch/documentation editors use per-slide granularity so autosave only writes the current slide. Agent data loads synchronously from an in-memory cache populated at startup.
Key changes:
- HazelDB.re: single shared database with KV + log tables and sync cache
- Per-slide persistence for scratch/doc (meta + editor + agent per slide)
- Store.F backed by IndexedDB with legacy localStorage migration fallback
- Export/import reconstructs monolithic format from per-slide data
- Bonsai startup wrapped in async HazelDB.kv_load_all callback
- Removed: StoreIDB.re, Agent.Persistent.empty, LoadAgentData action
- Consolidated localStorage access into Store.Legacy module
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pattern constructors (e.g. `A` in `| A => ...`) now contribute to co_ctx via collect_pat_ctr_refs, wired into all expression forms with patterns (Fun, Let, Theorem, FixF, Case). This enables bidirectional highlighting between constructor definitions, expression uses, and pattern uses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Consolidate ScratchMode persistence into Persist submodule
- Move clear_legacy_storage to HazelDB (eliminates StoreLegacy aliases)
- Rename export/import_monolithic → Persist.export_all/import_all
- Remove Store.Legacy module (legacy_get stays as private helper)
- Remove stale Page.re Start action comment
- Run formatter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ezjs_idb captures IDBKeyRange and window.indexedDB at module init time. Since the test binary links the web library, HazelDB.re loads in Node.js where these browser globals do not exist. Provide minimal stubs via --require so module initialization succeeds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…' into probes-III # Conflicts: #	src/haz3lcore/projectors/ProbeText.re
…-III # Conflicts: #	src/web/app/editors/Editors.re


3 participants