A high-performance Rust implementation of Python's difflib.unified_diff function with PyO3 bindings.
This package provides a Rust-based implementation of the unified diff algorithm, offering significant performance improvements over Python's built-in difflib module while maintaining API compatibility.
- 🚀 Typically 3-5x Faster: Outperforms Python's difflib in every benchmark listed, across all file sizes and change patterns; measured speedups range from 1.29x to 5.41x (see Performance section for detailed benchmarks)
- 100% Compatible: Drop-in replacement for `difflib.unified_diff` with identical output
- Thoroughly Tested: Comprehensive test suite ensuring byte-for-byte compatibility with Python's implementation
- Easy to use: Simple Python API with PyO3 bindings
Install from PyPI:

```bash
pip install difflib-rs
```

Or build from source:

```bash
# Clone the repository
git clone https://github.com/sweepai/difflib-rs.git
cd difflib-rs

# Set up virtual environment
python -m venv venv
source venv/bin/activate

# Install build dependencies
pip install maturin pytest

# Build and install
maturin develop --release
```

This is a drop-in replacement for Python's `difflib.unified_diff`. Simply replace your import:
```diff
- from difflib import unified_diff
+ from difflib_rs import unified_diff
```

```python
from difflib_rs import unified_diff

# Compare two sequences of lines. Each line keeps its trailing newline so that
# print(line, end='') below does not run the content lines together.
a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']

diff = unified_diff(
    a, b,
    fromfile='original.txt',
    tofile='modified.txt',
    fromfiledate='2023-01-01',
    tofiledate='2023-01-02'
)

for line in diff:
    print(line, end='')
```

Note: Currently, only `unified_diff` is supported. Other `difflib` functions are not implemented, but pull requests are welcome!
Most agents (including Sweep) can add support for any other methods if needed. A copy of the Python implementation is provided in src/difflib.py for reference.
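To check the drop-in claim on your own inputs, a small comparison against the standard library can help. The sketch below is illustrative only: `same_output` is a hypothetical helper, and the `ImportError` fallback exists solely so the snippet runs even when `difflib_rs` is not installed (in which case it compares the standard library with itself).

```python
import difflib

try:
    # Assumes the package is installed and exposes unified_diff as described above.
    from difflib_rs import unified_diff as rust_unified_diff
except ImportError:
    # Fallback so the snippet still runs without the package installed.
    rust_unified_diff = difflib.unified_diff


def same_output(a, b, **kwargs):
    """Return True if both implementations yield the same diff lines."""
    expected = list(difflib.unified_diff(a, b, **kwargs))
    actual = list(rust_unified_diff(a, b, **kwargs))
    return expected == actual


a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']

print(same_output(a, b, fromfile='original.txt', tofile='modified.txt', n=1))
```

Wrapping both results in `list()` keeps the comparison valid whether the call returns a generator or a list.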
For additional convenience, use unified_diff_str directly with (unsplit) strings:
```python
from difflib_rs import unified_diff_str

# Compare two strings directly - no need to split first!
text_a = """line1
line2
line3"""

text_b = """line1
modified
line3"""

# The function handles splitting internally (more efficient)
diff = unified_diff_str(
    text_a, text_b,
    fromfile='original.txt',
    tofile='modified.txt',
    keepends=False  # Whether to keep line endings in the diff
)

for line in diff:
    print(line, end='')
```

The `unified_diff_str` function:
- Takes strings directly instead of lists
- Handles line splitting internally in Rust (faster than Python's `splitlines()`)
- Supports `\n`, `\r\n`, and `\r` line endings
- Has a `keepends` parameter to preserve line endings in the output
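As a rough mental model of the behavior listed above, the function can be thought of as "split both strings, then diff the resulting line lists". The pure-Python sketch below illustrates that idea using the standard library; `split_lines` and `unified_diff_str_reference` are hypothetical names, and whether the Rust implementation matches this in every edge case is an assumption, not something this README states.

```python
import difflib
import re

# Matches the three line endings listed above. CRLF is tried first so it is
# treated as a single terminator rather than as CR followed by LF.
_LINE_END = re.compile(r'\r\n|\r|\n')


def split_lines(text, keepends=False):
    """Split text on LF, CRLF, and CR only (hypothetical helper)."""
    lines = []
    start = 0
    for match in _LINE_END.finditer(text):
        end = match.end() if keepends else match.start()
        lines.append(text[start:end])
        start = match.end()
    if start < len(text):
        lines.append(text[start:])  # trailing text without a final line ending
    return lines


def unified_diff_str_reference(text_a, text_b, keepends=False, **kwargs):
    """Pure-Python stand-in: split both strings, then diff the line lists."""
    return difflib.unified_diff(
        split_lines(text_a, keepends),
        split_lines(text_b, keepends),
        **kwargs,
    )


diff = unified_diff_str_reference(
    "line1\r\nline2\rline3\n",
    "line1\r\nmodified\rline3\n",
    fromfile='original.txt',
    tofile='modified.txt',
    lineterm='',
)
for line in diff:
    print(line)
```

A regular expression is used instead of `str.splitlines()` because `splitlines()` also breaks on characters such as form feed and vertical tab, which are not among the three endings listed.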
The Rust implementation consistently outperforms Python's built-in difflib module while producing identical output:
| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 86.0μs | 38.3μs | 2.24x | 71 |
| 500 lines | 450.6μs | 130.3μs | 3.46x | 300 |
| 1,000 lines | 910.2μs | 220.8μs | 4.12x | 587 |
| 2,000 lines | 2203.1μs | 482.3μs | 4.57x | 1,222 |
| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 167.9μs | 49.3μs | 3.41x | 131 |
| 500 lines | 1028.5μs | 252.0μs | 4.08x | 655 |
| 1,000 lines | 1925.0μs | 414.3μs | 4.65x | 1,285 |
| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 5 | 2842.0μs | 859.7μs | 3.31x | 47 |
| 10,000 lines | 5 | 5003.2μs | 1471.3μs | 3.40x | 47 |
| 20,000 lines | 5 | 8470.5μs | 2821.6μs | 3.00x | 47 |
| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 250 | 7985.5μs | 1579.4μs | 5.06x | 1,869 |
| 10,000 lines | 500 | 14692.5μs | 2833.8μs | 5.18x | 3,793 |
| 20,000 lines | 1,000 | 34949.0μs | 6461.2μs | 5.41x | 7,569 |
| Test Case | Python Time | Rust Time | Speedup |
|---|---|---|---|
| Identical sequences (5,000 lines) | 1773.1μs | 406.1μs | 4.37x |
| Completely different (1,000 lines) | 284.5μs | 219.8μs | 1.29x |
Performance comparison of unified_diff_str vs unified_diff with Python splitlines():
| File Size | Python split + Rust diff | All Rust (unified_diff_str) | Speedup |
|---|---|---|---|
| 100 lines | 54.8μs | 21.1μs | 2.59x |
| 500 lines | 169.9μs | 118.3μs | 1.44x |
| 1,000 lines | 316.1μs | 248.3μs | 1.27x |
| 2,000 lines | 654.8μs | 550.4μs | 1.19x |
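To get comparable numbers on your own machine, a small timing harness along the following lines may be useful. This is a minimal sketch, not the project's benchmark (that lives in `tests/test_benchmark.py`): the synthetic workload, the 5% change rate, and the helper names `make_inputs`, `time_us`, and `benchmark` are all assumptions made for illustration, so absolute figures will differ from the tables above.

```python
import difflib
import random
import timeit

try:
    # Assumes the package is installed; otherwise only Python is timed.
    from difflib_rs import unified_diff as rust_unified_diff
except ImportError:
    rust_unified_diff = None


def make_inputs(n_lines, change_rate=0.05, seed=0):
    """Build n_lines of text and a copy with roughly change_rate of lines edited."""
    rng = random.Random(seed)
    a = [f"line {i}\n" for i in range(n_lines)]
    b = [f"changed {i}\n" if rng.random() < change_rate else line
         for i, line in enumerate(a)]
    return a, b


def time_us(func, a, b, repeat=5):
    """Best-of-repeat wall time in microseconds for fully consuming the diff."""
    timer = timeit.Timer(lambda: list(func(a, b, fromfile='a', tofile='b')))
    return min(timer.repeat(repeat=repeat, number=1)) * 1e6


def benchmark(sizes=(100, 500, 1000)):
    """Return (size, python_us, rust_us or None) for each input size."""
    rows = []
    for n in sizes:
        a, b = make_inputs(n)
        py = time_us(difflib.unified_diff, a, b)
        rs = time_us(rust_unified_diff, a, b) if rust_unified_diff else None
        rows.append((n, py, rs))
    return rows


for n, py, rs in benchmark():
    if rs is not None:
        rust = f"{rs:.1f}μs ({py / rs:.2f}x)"
    else:
        rust = "n/a (difflib_rs not installed)"
    print(f"{n:>6} lines | Python {py:.1f}μs | Rust {rust}")
```

The diff is wrapped in `list()` so that lazy generators are fully consumed inside the timed region; without that, the standard library call would return almost immediately.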
```python
def unified_diff(a, b, fromfile='', tofile='', fromfiledate='',
                 tofiledate='', n=3, lineterm='\n'):
    """
    Compare two sequences of lines; generate the unified diff.

    Unified diffs are a compact way of showing line changes and a few
    lines of context. The number of context lines is set by n which
    defaults to three.

    Parameters:
        a: Sequence of lines to compare (the 'from' file)
        b: Sequence of lines to compare (the 'to' file)
        fromfile: Label to use for the 'from' file in the diff header
        tofile: Label to use for the 'to' file in the diff header
        fromfiledate: Modification date of the 'from' file
        tofiledate: Modification date of the 'to' file
        n: Number of context lines (default: 3)
        lineterm: Line terminator to use (default: '\n')

    Returns:
        Generator yielding unified diff format strings

    Note: This is a high-performance Rust implementation that provides
    3-5x speedup over Python's difflib while maintaining 100% compatibility.
    """
    pass
```

```bash
# Activate virtual environment
source venv/bin/activate

# Run tests
python -m pytest tests/ -v

# Run benchmarks
python -m pytest tests/test_benchmark.py -s

# Build the package with optimizations
maturin develop --release
```

If you want a feature or have an idea, just create a pull request! Contributions are welcome.
Everything in this project was written by Sweep AI, an AI agent for JetBrains IDEs.