The Erbsland Regular Expression Library is a secure and reliable regular expression engine for modern C++.
It is designed to be lightweight, dependency-free, and predictable, while offering solid UTF-8 and Unicode support out of the box.
You can embed the library directly into your project to provide regular expression matching without pulling in large external dependencies. The pattern syntax is inspired by a pragmatic mix of PCRE and Python regular expressions, focusing on clarity, safety, and maintainability.
Internally, the engine implements a carefully optimized variant of Thompson's NFA algorithm, adapted for modern C++ and robust execution under strict resource limits.
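To make the idea of Thompson-style matching more concrete, the following is a minimal sketch of a lockstep NFA simulation. It is an illustration of the general technique only, not the library's implementation: the names `Op`, `Inst`, `addState`, `matches`, and `programAPlusB` are hypothetical, and the program is hand-assembled for the pattern `a+b`. All active states advance together on each input character, so the work per character is bounded by the number of program states and no backtracking takes place.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Instruction kinds of a tiny, hypothetical NFA program.
enum class Op { Char, Split, Match };

struct Inst {
    Op op;
    char ch = 0;        // character to match (Char only)
    std::size_t x = 0;  // first branch target (Split only)
    std::size_t y = 0;  // second branch target (Split only)
};

// Add a state and everything reachable from it without consuming input.
void addState(const std::vector<Inst> &prog, std::size_t pc,
              std::vector<bool> &onList, std::vector<std::size_t> &list) {
    if (onList[pc]) {
        return; // each state is visited at most once per step
    }
    onList[pc] = true;
    if (prog[pc].op == Op::Split) {
        addState(prog, prog[pc].x, onList, list);
        addState(prog, prog[pc].y, onList, list);
    } else {
        list.push_back(pc);
    }
}

// Returns true if the program accepts the whole input.
bool matches(const std::vector<Inst> &prog, std::string_view input) {
    std::vector<std::size_t> current;
    std::vector<std::size_t> next;
    std::vector<bool> onList(prog.size(), false);
    addState(prog, 0, onList, current);
    for (char c : input) {
        onList.assign(prog.size(), false);
        next.clear();
        // Advance every active state by one character, in lockstep.
        for (std::size_t pc : current) {
            if (prog[pc].op == Op::Char && prog[pc].ch == c) {
                addState(prog, pc + 1, onList, next);
            }
        }
        current.swap(next);
    }
    for (std::size_t pc : current) {
        if (prog[pc].op == Op::Match) {
            return true;
        }
    }
    return false;
}

// Hand-assembled program for the pattern `a+b`.
std::vector<Inst> programAPlusB() {
    return {
        {Op::Char, 'a'},       // 0: match 'a'
        {Op::Split, 0, 0, 2},  // 1: loop back to 0 or continue at 2
        {Op::Char, 'b'},       // 2: match 'b'
        {Op::Match},           // 3: accept
    };
}
```

With this sketch, `matches(programAPlusB(), "aab")` accepts the input, while `"b"` and `"abb"` are rejected. A production engine would additionally track capture groups, match priorities, and resource limits.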
- Zero dependencies: easy to embed and audit.
- Strong focus on security and reliability.
- Full UTF-8 support with strict input validation.
- Rich regular expression syntax:
  - Greedy, lazy, and possessive quantifiers
  - Atomic groups
- Optional syntax compatibility with PCRE, Python, and other popular regex engines.
- Built-in time and memory limits with safe defaults.
- Solid Unicode support without relying on ICU:
  - Full Unicode character classes
  - Case-insensitive matching using simple case folding
- Configurable string type support:
  - `std::string` or `std::u8string` (selected at build time)
- Human-readable error messages when parsing regular expressions.
- Rich API for:
  - Finding first or all matches
  - Replacing text using placeholders
- Efficient coroutine-based matching.
- String-view-based processing with no unnecessary allocations.
- Abstract input interface for custom or streaming input sources.
- Diagnostic tools:
  - Disassembler to display the generated code for any pattern.
  - Assembler to write custom regex engines.
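The feature list mentions strict UTF-8 input validation. As a rough illustration of what "strict" usually means, the sketch below rejects truncated sequences, stray continuation bytes, overlong encodings, surrogate code points, and values above U+10FFFF. The function `isStrictUtf8` is a hypothetical helper written for this example and is not part of the library's API; the library's own checks may differ in detail.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: returns true only for well-formed UTF-8.
bool isStrictUtf8(std::string_view text) {
    std::size_t i = 0;
    while (i < text.size()) {
        const auto lead = static_cast<std::uint8_t>(text[i]);
        std::size_t length = 0;
        std::uint32_t codePoint = 0;
        std::uint32_t minimum = 0; // smallest code point allowed for this length
        if (lead < 0x80) {
            length = 1;
            codePoint = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2;
            codePoint = lead & 0x1F;
            minimum = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3;
            codePoint = lead & 0x0F;
            minimum = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4;
            codePoint = lead & 0x07;
            minimum = 0x10000;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (i + length > text.size()) {
            return false; // truncated sequence
        }
        for (std::size_t k = 1; k < length; ++k) {
            const auto byte = static_cast<std::uint8_t>(text[i + k]);
            if ((byte & 0xC0) != 0x80) {
                return false; // expected a continuation byte
            }
            codePoint = (codePoint << 6) | (byte & 0x3F);
        }
        if (codePoint < minimum) {
            return false; // overlong encoding
        }
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
            return false; // surrogates are not valid in UTF-8
        }
        if (codePoint > 0x10FFFF) {
            return false; // beyond the Unicode range
        }
        i += length;
    }
    return true;
}
```

For instance, the two-byte sequence `C0 AF` is an overlong encoding of `/` and is rejected, even though a lenient decoder might accept it.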
This library intentionally avoids certain features to remain predictable, secure, and memory-efficient:
- To keep the memory footprint low and matching deterministic:
  - No Unicode normalization
  - No multi-character case folding
  - No Unicode character names
- The primary goal is security, not maximum throughput:
  - Matching is efficient, but the library is not intended for workloads where regex performance is the main bottleneck.
  - Typical matching is fast, but not as fast as RE2 or PCRE, because the compiled program is barely optimized, the input is strictly validated, and the NFA algorithm adds overhead by design.
- Some advanced constructs are deliberately not supported:
  - No backreferences
  - No lookahead assertions
  - No conditional patterns
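The practical effect of "simple case folding only" can be shown with a small sketch. Simple folding maps every code point to exactly one code point, so the small sharp s (U+00DF) never folds to `ss` as it would under full, multi-character folding. The functions `simpleFold` and `equalsSimpleFold` below are hypothetical and use a deliberately tiny folding table; they illustrate the concept and do not reflect the library's internals.

```cpp
#include <cstddef>
#include <string_view>

// Tiny, hypothetical stand-in for a simple case folding table:
// every code point folds to exactly one code point.
char32_t simpleFold(char32_t c) {
    if (c >= U'A' && c <= U'Z') {
        return static_cast<char32_t>(c + 32); // ASCII upper case to lower case
    }
    if (c == U'\u1E9E') {
        return U'\u00DF'; // capital sharp s folds to small sharp s
    }
    return c; // small sharp s has no simple folding and stays unchanged
}

// Compares two strings code point by code point after simple folding.
bool equalsSimpleFold(std::u32string_view a, std::u32string_view b) {
    if (a.size() != b.size()) {
        return false; // one-to-one folding never changes the length
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (simpleFold(a[i]) != simpleFold(b[i])) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions, `straße` compares equal to `STRAẞE` (with the capital sharp s, U+1E9E), but not to `STRASSE`, which would require multi-character folding.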
- Stable and suitable for production use (for UTF-8-based strings).
- Public API is stable.
- Tested on:
  - Linux (GCC)
  - macOS (Clang)
  - Windows (MSVC)
- UTF-16 and UTF-32 support:
  - Implemented but not fully tested. Use it at your own risk.
  - Not documented.
```cpp
#include <iostream>
#include <string>

#include <el/re/regex.hpp>

using namespace el::re;

int main() {
    try {
        // Compile a pattern that matches one or more digits.
        auto re = RegEx::compile(R"(\d+)");
        auto text = std::string{"abc 12345 xyz"};
        // Search for the first match in the text.
        if (auto match = re->findFirst(text); match != nullptr) {
            // Print the matched text.
            std::cout << "Found a number: " << match->content(0) << "\n";
        }
    } catch (const Error &error) {
        // Report any error raised by the library.
        std::cerr << error.what() << "\n";
        return 1;
    }
    return 0;
}
```

Direct performance comparisons between regular expression engines are often misleading.
The following benchmarks are provided only as a rough indication of performance characteristics.
All benchmarks were run on a 2021 MacBook Pro with an Apple M1 Max CPU and ample memory.
Important notes:
- The Erbsland Regular Expression Library:
  - Always reads and validates UTF-8 input.
  - Performs Unicode-aware comparisons in all modes.
  - Does not optimize the program compiled from the pattern for performance.
  - Has almost no speed optimizations in the engine.
- Neither PCRE nor `std::regex` enforces strict UTF-8 validation (for this benchmark).
- "ASCII mode" in Erbsland RE only affects character class handling; input is still processed as Unicode.
These are the results for version 1.0.0 of the library.
Benchmarking file: war_and_peace.txt (3.20 MB)

| Pattern | Library | Mode | Time (ms) | Relative time (%) | Matches |
|---|---|---|---:|---:|---:|
| Words | erbsland-re | Unicode | 145.751 | 100 | 576584 |
| | pcre2 | Unicode | 66.280 | 45 | 576584 |
| | erbsland-re | Ascii | 144.482 | 99 | 586871 |
| | std::regex | Ascii | 232.292 | 159 | 586871 |
| | pcre2 | Ascii | 45.934 | 32 | 586871 |
| Capitalized | erbsland-re | Unicode | 86.961 | 100 | 50038 |
| | pcre2 | Unicode | 8.242 | 9 | 50038 |
| | erbsland-re | Ascii | 86.307 | 99 | 60182 |
| | std::regex | Ascii | 215.900 | 248 | 60182 |
| | pcre2 | Ascii | 7.872 | 9 | 60182 |

Benchmarking file: shakespeare.html (6.98 MB)

| Pattern | Library | Mode | Time (ms) | Relative time (%) | Matches |
|---|---|---|---:|---:|---:|
| Words | erbsland-re | Unicode | 310.203 | 100 | 1301628 |
| | pcre2 | Unicode | 166.467 | 54 | 1301628 |
| | erbsland-re | Ascii | 309.185 | 100 | 1301773 |
| | std::regex | Ascii | 521.527 | 168 | 1301773 |
| | pcre2 | Ascii | 100.895 | 33 | 1301773 |
| Capitalized | erbsland-re | Unicode | 205.873 | 100 | 184903 |
| | pcre2 | Unicode | 29.793 | 14 | 184903 |
| | erbsland-re | Ascii | 217.043 | 105 | 185027 |
| | std::regex | Ascii | 493.012 | 239 | 185027 |
| | pcre2 | Ascii | 27.376 | 13 | 185027 |
| URI | erbsland-re | Unicode | 146.271 | 100 | 10 |
| | pcre2 | Unicode | 8.598 | 6 | 10 |
| | erbsland-re | Ascii | 146.246 | 100 | 10 |
| | std::regex | Ascii | 409.767 | 280 | 10 |
| | pcre2 | Ascii | 8.365 | 6 | 10 |
| ExtractTocLinks | erbsland-re | Unicode | 432.889 | 100 | 44 |
| | pcre2 | Unicode | 6.351 | 1 | 44 |
| | erbsland-re | Ascii | 432.571 | 100 | 44 |
| | std::regex | Ascii | 557.580 | 129 | 44 |
| | pcre2 | Ascii | 6.175 | 1 | 44 |
| ExtractLicenseDiv | erbsland-re | Unicode | 432.680 | 100 | 1 |
| | pcre2 | Unicode | 6.412 | 1 | 1 |
| | erbsland-re | Ascii | 432.290 | 100 | 1 |
| | std::regex | Ascii | 543.849 | 126 | 1 |
| | pcre2 | Ascii | 6.205 | 1 | 1 |
| HTML Tags | erbsland-re | Unicode | 195.751 | 100 | 159584 |
| | pcre2 | Unicode | 18.203 | 9 | 159584 |
| | erbsland-re | Ascii | 190.732 | 97 | 159584 |
| | std::regex | Ascii | 416.028 | 213 | 159584 |
| | pcre2 | Ascii | 18.031 | 9 | 159584 |

Benchmarking file: shakespeare.txt (5.38 MB)

| Pattern | Library | Mode | Time (ms) | Relative time (%) | Matches |
|---|---|---|---:|---:|---:|
| Words | erbsland-re | Unicode | 247.719 | 100 | 996052 |
| | pcre2 | Unicode | 119.407 | 48 | 996052 |
| | erbsland-re | Ascii | 242.508 | 98 | 996199 |
| | std::regex | Ascii | 403.254 | 163 | 996199 |
| | pcre2 | Ascii | 79.064 | 32 | 996199 |
| Capitalized | erbsland-re | Unicode | 212.670 | 100 | 180019 |
| | pcre2 | Unicode | 27.846 | 13 | 180019 |
| | erbsland-re | Ascii | 162.808 | 77 | 180140 |
| | std::regex | Ascii | 380.359 | 179 | 180140 |
| | pcre2 | Ascii | 26.088 | 12 | 180140 |

- Words: `\w+`
- Capitalized: `\b[A-Z][a-z]*\b`
- URI: `https?://[a-zA-Z0-9\.]+`
- ExtractTocLinks: `<a href="#(chap([0-9]{2}))" class="pginternal">([^<]+)</a>`
- ExtractLicenseDiv: `<div id="(([^-\"]+)-([^-"]+)-([^"]+))">([^<]+)</div>`
- HTML Tags: `<[a-z1-6]+[^>]*>`
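Judging from the numbers, the percentage column in the benchmark tables appears to express each run's time relative to the erbsland-re Unicode run of the same pattern and file, rounded to the nearest whole percent. This reading is an inference from the data, not something the benchmark output states. A small sketch of that calculation, with the hypothetical helper `relativePercent`:

```cpp
#include <cmath>

// Hypothetical helper: time of one run relative to a baseline run,
// rounded to the nearest whole percent.
long relativePercent(double timeMs, double baselineMs) {
    return std::lround(timeMs / baselineMs * 100.0);
}
```

For the Words pattern on war_and_peace.txt, `relativePercent(66.280, 145.751)` yields 45 for pcre2 in Unicode mode and `relativePercent(232.292, 145.751)` yields 159 for `std::regex`, which agrees with the table.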
- A C++20-compliant compiler:
  - Clang
  - GCC
  - MSVC
- CMake 3.23 or newer
Copyright © 2026 Tobias Erbsland
https://erbsland.dev/
Licensed under the Apache License, Version 2.0.
You may obtain a copy at:
http://www.apache.org/licenses/LICENSE-2.0
Distributed on an "AS IS" basis, without warranties or conditions of any kind.
See the LICENSE file for full details.