Skip to content

Feature/sem review#19

Open
antiartificial wants to merge 12 commits intoAtaraxy-Labs:mainfrom
antiartificial:feature/sem-review
Open

Feature/sem review#19
antiartificial wants to merge 12 commits intoAtaraxy-Labs:mainfrom
antiartificial:feature/sem-review

Conversation

@antiartificial
Copy link

@antiartificial antiartificial commented Mar 8, 2026

Adds a few new features:

sem review and sem log work with historical state whereas sem graph and sem impact work with current state. Signature change detection (new capability, powers the [signature changed] / [body only] labels in review and changelog).

  • sem log full history of any function (entity), following it through renames and moves across commits
  • sem changelog auto-categorized release notes from a commit range with a suggested semver bump
  • sem review groups changes by impact (API surface, internal, config) with dependency counts and a risk assessment

Other items in this PR:

  • This includes a slight fix to the version string handling, previously this was hardcoded and could become stale, now reflects what's in the Cargo.toml.
  • Test coverage for all three new modules (19 new tests)
  • README updated with documentation for the new commands

Example usage/output

sem review --from HEAD~2 --to HEAD # last two commits ┌─ Internal Changes ────────────────────────────────── │ ∆ function getSupportedMimeType [modified] │ ∆ variable types [modified] └─────────────────────────────────────────────────────── Summary: 2 internal Risk: low (internal/config changes only, no public API impact) 

Example usages:
sem log --entity authenticateUser # finds all references to entity (ex. a function)
sem log --entity login --file src/auth.ts # constrains search to specific file

Example output:

>sem log --entity getSupportedMimeType (base) prism-web/src/lib/audio.ts :: function :: getSupportedMimeType 349f65f 2026-02-18 Aaron Barton [modified] Fix Brave misdetected as Safari: prefer webm over mp4 MIME type 45b819f 2026-02-06 antiartifici [added] Add Prism: voice-driven Socratic interview simulator 

Example usages:
sem changelog --from v1.2.0 --to HEAD # ex changes spanning a tag range (or commits)
sem changelog --from v1.0.0 --to v2.0.0 --format markdown

Example output:

sem changelog --from HEAD~1 --to HEAD (base) ## Unreleased — 2026-03-08 ### Internal · `getSupportedMimeType` — function modified in prism-web/src/lib/audio.ts · `types` — variable modified in prism-web/src/lib/audio.ts Suggested version bump: PATCH (no new public API, no breaking changes) 
@antiartificial antiartificial marked this pull request as ready for review March 8, 2026 23:31
@rs545837
Copy link
Member

rs545837 commented Mar 9, 2026

Thanks I will take a look at this asap.

@rs545837
Copy link
Member

rs545837 commented Mar 9, 2026

Replied on the discussion with more context on how this relates to inspect and the broader roadmap. The sem log and sem changelog modules look good and fill real gaps in the CLI. sem review overlaps significantly with inspect which does entity-level triage with graph-based risk scoring and LLM integration.

Going to review the code in detail. Main things I want to check: how the signature detection interacts with sem-core's existing structural_hash (which already distinguishes cosmetic vs structural changes), and whether the log module's git history walking could live in sem-core as a library function so inspect and weave can use it too.

Palanikannan1437

This comment was marked as off-topic.

Copy link
Contributor

@Palanikannan1437 Palanikannan1437 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great feature set — review, changelog, and log are solid additions. Left some inline comments. The two I'd want addressed before merge are the signature breaking classification (return type changes + language-aware param handling) and the entity name collision in log history tracking. The rest are suggestions for follow-ups.

Also we could discuss more on what should stay here vs in inspect as @rs545837 mentioned above

pub fn is_breaking(&self) -> bool {
matches!(
self,
SignatureChangeKind::ParamsRemoved { .. }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ReturnTypeChanged should be in here too — changing a return type will break every caller that depends on the old type. Right now it silently gets classified as non-breaking, which means changelog/semver will miss real breakage.

pub fn get_log(&self, limit: usize) -> Result<Vec<CommitInfo>, GitError> {
/// Walk commits lazily from `to` (or HEAD) backward, stopping at `from`.
/// The callback returns `true` to continue or `false` to stop early.
pub fn for_each_commit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice API — the lazy callback with early-stop is a good design for entity log. One thing to consider for later: the callback receives CommitInfo with sha as a String, but callers like build_entity_log immediately re-resolve that SHA back into trees via resolve_tree. Passing richer data (or at least the Oid + parent) would avoid that round-trip.

@antiartificial
Copy link
Author

I'll have more time to digest and reflect this evening, thank you for your eyes and input.

@Palanikannan1437
Copy link
Contributor

Hey @antiartificial did you get some time to check them out?

@antiartificial
Copy link
Author

I've had some guests from out of town. Catching up!

@antiartificial
Copy link
Author

antiartificial commented Mar 17, 2026

Replied on the discussion with more context on how this relates to inspect and the broader roadmap. The sem log and sem changelog modules look good and fill real gaps in the CLI. sem review overlaps significantly with inspect which does entity-level triage with graph-based risk scoring and LLM integration.

Going to review the code in detail. Main things I want to check: how the signature detection interacts with sem-core's existing structural_hash (which already distinguishes cosmetic vs structural changes), and whether the log module's git history walking could live in sem-core as a library function so inspect and weave can use it too.

Thank you, the idea with sem review was to offer an offline first no LLM, would you be interested in merging the logic here, keeping sem review as the offline baseline?

the structural_hash and analyze_signature_change operate independently, I feel they compliment one another but don't coordinate. build_entity_log already uses both for formatting only plus signature analysis for the detail.

structural_hash ast level answers "did anything meaningful change, or was it just formatting or comments?" acts as a fast pre-filter.
analyze_signature_change function signature only (params and return type) answers "how exactly did the signature change? is it breaking"

sem review uses structural_change to label things "cosmetic" but doesn't run signature analysis so a review will show a parameter removal as just "modified" and it won't say "parameter removed (breaking). The changelog and log commands do run signature analysis. if inspect wants breaking code change detection in its triage:

  1. call the analyze_signature_change function directly
  2. use the changeling/log output which includes signature details

structural_hash is the "did anything real change" gate and signature analysis is the "what exactly changed in the API contract"

As far as some of the feedback: targeted single-entity parsing instead of computing a full semantic diff for all changed files in a commit parse only the tracked file and compare only the tracked entity and shared "analyzed change" structure when calling build_review and reclassifying everything into changelog categories and using an intermedia structure. I'm happy to tackle these in this effort if you don't mind it, or happy to do a follow up PR for each.

 Groups entity-level changes into API surface (cross-file dependents), internal (body-only, deletions, no cross-file refs), and config/data categories. Includes approximate dependent counts from the entity graph and a risk heuristic (low/medium/high). Supports terminal and JSON output.
Classifies entity-level changes into Keep-a-Changelog categories (Breaking, Added, Changed, Removed, Internal) with semver bump suggestion. Includes Conventional Commits parsing from commit messages and supports terminal, markdown, and JSON output formats. Also adds get_log_range() to GitBridge for commit message access.
Re-parses before/after entity content with tree-sitter to extract parameter lists and classify changes as breaking (params removed or reordered) vs non-breaking (params added, body-only). Supports TypeScript, JavaScript, Python, Rust, Go, Java, C, C++, and more. Integrated into `sem changelog` for automatic breaking change detection.
Add README sections for entity history tracking, semantic review, and changelog generation. Update language table with Swift, Elixir, Bash, and Vue. Document sem diff file comparison mode.
19 new tests covering risk assessment, config classification, entity resolution, changelog generation, and modification classification. All 83 tests pass.
- ReturnTypeChanged now classified as breaking (breaks callers) - ParamsAdded label changed to "signature changed" (can't verify defaults) - Entity log matches file_path + entity_name to prevent cross-file misattribution - Replace --follow (always true, no way to disable) with --no-follow flag - Remove aggressive has_breaking_commits && dependent_count >= 5 heuristic - compute_semver considers removed public entities with dependents as MAJOR - Add --full flag to sem changelog and sem review to show all changes
…s diff.rs, review.rds, changelog.rs consolidated normalize_extts. Config changesi n risk assessment, now considers modified config/data as medium risk instead of silently falling through to low. Richer CommitInfo.
…ommon::open_git_or_exit(). Replaces raw ANSI escapes with colored crate for consistency. Simplifies resolve_file_changes match bloxes to a git_or_exit helper. find_conventional_type now uses prebuilt HashMap<&str, &str> for o(1)instead of o(n)
@rs545837
Copy link
Member

Hey @antiartificial, sorry for the slow reply!

Took another look and the signature classification and entity disambiguation are both handled nicely, good work on those.

Here's what I'd like to do: let's merge sem log and sem changelog from this PR, but leave out sem review for now since it overlaps with inspect and I want to think through the right boundary there.

Could you split out the sem review parts so we can merge the rest? Once that's done this is good to go.

@antiartificial
Copy link
Author

@rs545837 no problem, we are all busy, certainly I can proceed with sem log and changelog sans review.

@rs545837
Copy link
Member

Awesome sounds great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants