Instead of lines changed, sem tells you what entities changed: functions, methods, classes.
```bash
sem diff
```

```
┌─ src/auth/login.ts ──────────────────────────────────
│
│  ⊕ function  validateToken          [added]
│  ∆ function  authenticateUser       [modified]
│  ⊖ function  legacyAuth             [deleted]
│
└──────────────────────────────────────────────────────

┌─ config/database.yml ─────────────────────────────────
│
│  ∆ property  production.pool_size   [modified]
│    - 5
│    + 20
│
└──────────────────────────────────────────────────────

Summary: 1 added, 1 modified, 1 deleted across 2 files
```

Install with Homebrew:

```bash
brew install sem-cli
```

Or build from source (requires Rust):
```bash
git clone https://github.com/Ataraxy-Labs/sem
cd sem/crates
cargo install --path sem-cli
```

Or grab a binary from GitHub Releases.
Or run via Docker:
```bash
docker build -t sem .
docker run --rm -it -u "$(id -u):$(id -g)" -v "$(pwd):/repo" sem diff
```

Works in any Git repo. No setup required. Also works outside Git for arbitrary file comparison.
```bash
# Semantic diff of working changes
sem diff

# Staged changes only
sem diff --staged

# Specific commit
sem diff --commit abc1234

# Commit range
sem diff --from HEAD~5 --to HEAD

# Plain text output (git status style)
sem diff --format plain

# JSON output (for AI agents, CI pipelines)
sem diff --format json

# Compare any two files (no git repo needed)
sem diff file1.ts file2.ts

# Read file changes from stdin (no git repo needed)
echo '[{"filePath":"src/main.rs","status":"modified","beforeContent":"...","afterContent":"..."}]' \
  | sem diff --stdin --format json

# Only specific file types
sem diff --file-exts .py .rs

# Entity dependency graph
sem graph

# Impact analysis (what breaks if this entity changes?)
sem impact validateToken

# Entity-level blame
sem blame src/auth.ts
```

Replace git diff output with entity-level diffs. Agents and humans get sem output automatically without changing any commands.
```bash
# Set sem as your git diff tool
git config --global diff.external sem-diff-wrapper

# Create the wrapper script
echo '#!/bin/sh
sem diff "$2" "$5"' > ~/.local/bin/sem-diff-wrapper
chmod +x ~/.local/bin/sem-diff-wrapper
```

Now `git diff` shows entity-level changes instead of line-level ones. No prompts and no agent configuration are needed: everything that calls `git diff` gets sem output automatically.
To disable and go back to normal git diff:
```bash
git config --global --unset diff.external
```

21 programming languages with full entity extraction via tree-sitter:
| Language | Extensions | Entities |
|---|---|---|
| TypeScript | .ts .tsx | functions, classes, interfaces, types, enums, exports |
| JavaScript | .js .jsx .mjs .cjs | functions, classes, variables, exports |
| Python | .py | functions, classes, decorated definitions |
| Go | .go | functions, methods, types, vars, consts |
| Rust | .rs | functions, structs, enums, impls, traits, mods, consts |
| Java | .java | classes, methods, interfaces, enums, fields, constructors |
| C | .c .h | functions, structs, enums, unions, typedefs |
| C++ | .cpp .cc .hpp | functions, classes, structs, enums, namespaces, templates |
| C# | .cs | classes, methods, interfaces, enums, structs, properties |
| Ruby | .rb | methods, classes, modules |
| PHP | .php | functions, classes, methods, interfaces, traits, enums |
| Swift | .swift | functions, classes, protocols, structs, enums, properties |
| Elixir | .ex .exs | modules, functions, macros, guards, protocols |
| Bash | .sh | functions |
| HCL/Terraform | .hcl .tf .tfvars | blocks, attributes (qualified names for nested blocks) |
| Kotlin | .kt .kts | classes, interfaces, objects, functions, properties, companion objects |
| Fortran | .f90 .f95 .f | functions, subroutines, modules, programs |
| Vue | .vue | template/script/style blocks + inner TS/JS entities |
| XML | .xml .plist .svg .csproj | elements (nested, tag-name identity) |
| ERB | .erb .html.erb | blocks, expressions, code tags |
Plus structured data formats:
| Format | Extensions | Entities |
|---|---|---|
| JSON | .json | properties, objects (RFC 6901 paths) |
| YAML | .yml .yaml | sections, properties (dot paths) |
| TOML | .toml | sections, properties |
| CSV | .csv .tsv | rows (first column as identity) |
| Markdown | .md .mdx | heading-based sections |
Everything else falls back to chunk-based diffing.
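The structured-data table identifies JSON properties by RFC 6901 paths and YAML properties by dot paths. The sketch below shows what such identities could look like for already-parsed data. It is an illustration only, not sem's implementation: the function names `json_pointer_paths`, `dot_paths`, and `changed_properties` are hypothetical.

```python
def escape_pointer_token(token: str) -> str:
    # RFC 6901 escaping: "~" becomes "~0" and "/" becomes "~1" ("~" must go first).
    return token.replace("~", "~0").replace("/", "~1")


def json_pointer_paths(value, prefix=""):
    """Yield (pointer, leaf) pairs, e.g. ("/production/pool_size", 20)."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from json_pointer_paths(child, f"{prefix}/{escape_pointer_token(str(key))}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from json_pointer_paths(child, f"{prefix}/{index}")
    else:
        yield prefix, value


def dot_paths(value, prefix=""):
    """Yield (dot path, leaf) pairs, e.g. ("production.pool_size", 20).

    Assumption: keys contain no dots; lists are treated as leaves.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from dot_paths(child, f"{prefix}.{key}" if prefix else str(key))
    else:
        yield prefix, value


def changed_properties(before, after, walker):
    """Return {path: (old, new)} for paths present on both sides with different values."""
    old, new = dict(walker(before)), dict(walker(after))
    return {p: (old[p], new[p]) for p in old.keys() & new.keys() if old[p] != new[p]}


before = {"production": {"pool_size": 5}}
after = {"production": {"pool_size": 20}}
print(changed_properties(before, after, dot_paths))
print(changed_properties(before, after, json_pointer_paths))
```

Keying changes by path rather than by line is what lets a diff report `production.pool_size` as modified even if the surrounding file was reordered or reindented.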
Three-phase entity matching:
- Exact ID match — same entity in before/after = modified or unchanged
- Structural hash match — same AST structure, different name = renamed or moved (ignores whitespace/comments)
- Fuzzy similarity — >80% token overlap = probable rename
This means sem detects renames and moves, not just additions and deletions. Structural hashing also distinguishes cosmetic changes (whitespace, formatting) from real logic changes.
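The three phases above can be sketched in a few lines of Python. This is a simplified model, not sem's code: it uses a regex tokenizer in place of a tree-sitter AST, SHA-256 in place of xxhash, and one possible definition of "token overlap". The `Entity` class and all function names are hypothetical.

```python
import hashlib
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str  # e.g. "src/auth.ts::function::validateToken"
    name: str
    body: str


def normalized_tokens(entity):
    # Crude stand-in for an AST: drop "//" and "#" line comments, ignore
    # whitespace, and replace the entity's own name with a placeholder.
    source = re.sub(r"(//|#).*", "", entity.body)
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)
    return ["<name>" if t == entity.name else t for t in tokens]


def structural_hash(entity):
    return hashlib.sha256("\x00".join(normalized_tokens(entity)).encode()).hexdigest()


def token_overlap(a, b):
    # Shared tokens (with multiplicity) divided by the longer token count.
    ca, cb = Counter(normalized_tokens(a)), Counter(normalized_tokens(b))
    longest = max(sum(ca.values()), sum(cb.values()))
    return sum((ca & cb).values()) / longest if longest else 1.0


def match_entities(before, after, threshold=0.8):
    results, unmatched = [], []
    after_by_id = {e.entity_id: e for e in after}
    # Phase 1: exact ID match -> modified or unchanged.
    for old in before:
        new = after_by_id.pop(old.entity_id, None)
        if new is None:
            unmatched.append(old)
        else:
            same = structural_hash(old) == structural_hash(new)
            results.append(("unchanged" if same else "modified", old, new))
    remaining = list(after_by_id.values())
    # Phase 2: structural hash match -> renamed or moved.
    leftover = []
    for old in unmatched:
        new = next((c for c in remaining if structural_hash(c) == structural_hash(old)), None)
        if new is None:
            leftover.append(old)
        else:
            remaining.remove(new)
            results.append(("renamed", old, new))
    # Phase 3: fuzzy similarity above the threshold -> probable rename.
    for old in leftover:
        best = max(remaining, key=lambda c: token_overlap(old, c), default=None)
        if best is not None and token_overlap(old, best) > threshold:
            remaining.remove(best)
            results.append(("renamed", old, best))
        else:
            results.append(("deleted", old, None))
    results.extend(("added", None, new) for new in remaining)
    return results
```

Because the hash is computed over normalized tokens, a reformatted function with an added comment still matches its earlier version in phase 1 and is reported as unchanged rather than modified.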
```bash
sem diff --format json
```

```json
{
  "summary": {
    "fileCount": 2,
    "added": 1,
    "modified": 1,
    "deleted": 1,
    "total": 3
  },
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "filePath": "src/auth.ts"
    }
  ]
}
```

sem-core can be used as a Rust library dependency:
```toml
[dependencies]
sem-core = { git = "https://github.com/Ataraxy-Labs/sem", version = "0.3" }
```

Used by weave (semantic merge driver) and inspect (entity-level code review).
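For scripts and CI pipelines that do not link against the Rust library, the JSON output of `sem diff --format json` can be consumed directly. The following is a minimal sketch that assumes only the fields shown in the sample output (`changes`, `changeType`, `entityType`, `entityName`, `filePath`); the helper names `run_sem_diff` and `group_changes` are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def run_sem_diff() -> str:
    # Assumes `sem` is on PATH and the current directory is a Git repo.
    result = subprocess.run(
        ["sem", "diff", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def group_changes(report_json: str) -> dict:
    """Group entity changes by change type, e.g. {"added": [...], "deleted": [...]}."""
    report = json.loads(report_json)
    grouped = defaultdict(list)
    for change in report.get("changes", []):
        label = f'{change["filePath"]}: {change["entityType"]} {change["entityName"]}'
        grouped[change["changeType"]].append(label)
    return dict(grouped)


# Sample report copied from the documented output format.
sample = """
{
  "summary": {"fileCount": 2, "added": 1, "modified": 1, "deleted": 1, "total": 3},
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "filePath": "src/auth.ts"
    }
  ]
}
"""

for change_type, entities in group_changes(sample).items():
    print(change_type, entities)
```

In a real pipeline, `group_changes(run_sem_diff())` would replace the hard-coded sample, for instance to flag a pull request whenever the `deleted` group is non-empty.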
- tree-sitter for code parsing (native Rust, not WASM)
- git2 for Git operations
- rayon for parallel file processing
- xxhash for structural hashing
- Plugin system for adding new languages and formats
Want to add a new language? See CONTRIBUTING.md for a step-by-step guide.
MIT OR Apache-2.0