Invisible Unicode Steganography Toolkit
Python Compatibility:
- Requires Python 3.7+
- License: MIT
Overview: ZeroWidthStego is a comprehensive, modular steganography and detection tool built in Python. It focuses on the abuse and analysis of zero-width Unicode characters and homoglyph-based text obfuscation, and it supports multiple encoding schemes for covert data embedding and extraction.
This tool provides:
- Encoding and decoding logic
- Auto-detection of steganographic schemes
- Static and semantic analysis
- High-density data hiding techniques
- Integration into CTFs, CI/CD pipelines, and source code workflows
Built For:
- Security researchers and threat analysts
- CTF players and tool developers
- Red and blue teams
- Source code integrity auditors
Features:
- 12+ Encoding Schemes: Binary, Quaternary, Octal, UTF-8, UTF-16, Directional, Homoglyph, and more
- ZWSP Spacing Mode with full decode support
- Homoglyph substitution with carrier-aware injection
- AES Encryption Layer (optional symmetric encryption)
- Threshold-based Encoding/Decoding
- Advanced Brute-Force Mode with base switching and heuristics
- Static and Dynamic Stego Detection Engine
- Entropy & Flag Pattern Analysis (flag{}, picoctf{}, etc.)
- Carrier file injection and extraction with cleanup control
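
To make the homoglyph substitution feature listed above more concrete, here is a minimal sketch of one way a 1-bit homoglyph scheme can work: each substitutable Latin letter in the carrier carries one bit, staying Latin for 0 and becoming its Cyrillic look-alike for 1. The `PAIRS` table and the `embed` and `recover` names are assumptions made for illustration; they are not taken from the tool's source, and the tool's actual mapping may differ.

```python
# Hypothetical Latin -> Cyrillic look-alike table (illustrative, not the tool's own).
PAIRS = {"a": "\u0430", "c": "\u0441", "e": "\u0435",
         "o": "\u043e", "p": "\u0440", "x": "\u0445"}
REVERSE = {cyr: lat for lat, cyr in PAIRS.items()}


def embed(carrier: str, message: str) -> str:
    """Hide message bits in carrier: Latin letter = 0, Cyrillic twin = 1."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    out, used = [], 0
    for ch in carrier:
        if ch in PAIRS and used < len(bits):
            out.append(PAIRS[ch] if bits[used] == "1" else ch)
            used += 1
        else:
            out.append(ch)
    if used < len(bits):
        raise ValueError("carrier has too few substitutable letters")
    return "".join(out)


def recover(stego: str) -> str:
    """Read one bit per substitutable position and rebuild the bytes."""
    bits = "".join("1" if ch in REVERSE else "0"
                   for ch in stego if ch in PAIRS or ch in REVERSE)
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    # Unused carrier positions decode as zero bits, so drop trailing NUL bytes.
    return data.rstrip(b"\x00").decode("utf-8", errors="replace")


carrier = "a peace accord can open each ocean"
stego = embed(carrier, "A")
assert len(stego) == len(carrier) and stego != carrier
assert recover(stego) == "A"
```

This sketch assumes the carrier contains no Cyrillic look-alikes of its own; a carrier that already mixes scripts would corrupt the recovered bits.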
Installation:

    git clone https://github.com/yourusername/zero-width-steganography.git
    cd zero-width-steganography
    # No dependencies required - pure Python 3

Encoding:

    python zero_width_tool.py encode "secret message" -o hidden.txt
    python zero_width_tool.py encode -i secret.txt -o hidden.txt
    python zero_width_tool.py encode "flag{hidden}" -o output.txt --carrier normal_text.txt

Decoding:

    python zero_width_tool.py decode -i hidden.txt
    python zero_width_tool.py decode -i hidden.txt -o decoded.txt
    python zero_width_tool.py decode -i hidden.txt --scheme simple_8bit

Analysis:

    python zero_width_tool.py analyze -i suspicious.txt
    python zero_width_tool.py analyze -i document.txt

Selecting a Scheme:

    python zero_width_tool.py encode "message" -o output.txt --scheme simple_8bit
    python zero_width_tool.py encode "message" -o output.txt --scheme basic_utf16
    python zero_width_tool.py encode "message" -o output.txt --scheme quaternary_utf8
    python zero_width_tool.py encode "secret" -o output.txt --scheme homoglyph_binary_utf8 --carrier carrier.txt
    python zero_width_tool.py encode "message" -o output.txt --scheme zwsp_spacing --carrier carrier.txt

Pipelines and Batch Processing:

    echo "secret message" \
      | python zero_width_tool.py encode -i - --scheme simple_8bit \
      | python zero_width_tool.py decode -i -

    for file in *.txt; do
      echo "=== $file ==="
      python zero_width_tool.py analyze -i "$file"
    done

Additional Examples:

    python zero_width_tool.py decode -i challenge.txt --scheme basic_utf16
    python zero_width_tool.py decode -i challenge.txt --scheme quaternary_utf8

    find . -name "*.txt" -exec python zero_width_tool.py analyze -i {} \;

    python zero_width_tool.py decode -i memo.txt -o extracted.txt

Supported Schemes:

| Scheme Name | Bits/Symbol | Description |
|---|---|---|
| simple_8bit | 1-bit | ZWSP=0, ZWNJ=1 (exact decoder logic) |
| basic_utf8 | 1-bit | Standard UTF-8 encoding with ZWSP/ZWNJ |
| basic_utf16 | 1-bit | UTF-16, 16-bit encoding units |
| quaternary_utf8 | 2-bit | 2-bit per symbol using ZWSP, ZWNJ, ZWJ, WJ |
| octal_utf8 | 3-bit | 3-bit per symbol using 8 zero-width characters |
| binary_directional_utf8 | 1-bit | Uses directional LRM=0, RLM=1 |
| homoglyph_binary_utf8 | 1-bit | Homoglyph substitution (Cyrillic & Latin) |
| zwsp_spacing | Pattern | Letters separated by ZWSP |
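
As an illustration of the 1-bit rows in the table above, the following sketch encodes each UTF-8 byte as eight zero-width characters using the ZWSP=0, ZWNJ=1 mapping given for `simple_8bit`. The function names `encode_bits` and `decode_bits` are hypothetical, and details such as bit order (most significant bit first here) are assumptions; the tool's own implementation may differ.

```python
ZWSP = "\u200b"  # zero-width space, stands for bit 0
ZWNJ = "\u200c"  # zero-width non-joiner, stands for bit 1


def encode_bits(message: str) -> str:
    """Map every UTF-8 byte of message to eight zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    return "".join(ZWNJ if bit == "1" else ZWSP for bit in bits)


def decode_bits(stego: str) -> str:
    """Recover the message, skipping any visible carrier characters."""
    bits = "".join("1" if ch == ZWNJ else "0"
                   for ch in stego if ch in (ZWSP, ZWNJ))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


hidden = encode_bits("hi")
assert len(hidden) == 16                      # 2 bytes * 8 symbols
assert decode_bits("visible text" + hidden) == "hi"
```

At one bit per symbol, every payload byte costs eight invisible characters, which is the overhead the 2-bit and 3-bit schemes reduce.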

Adversary Behavior Mapping:

| Adversary Behavior | Category | MITRE ATT&CK ID |
|---|---|---|
| Invisible Covert Channel | Command & Control | T1001.003 |
| Payload Injection in Text | Defense Evasion | T1027 |
| Source Code Backdooring | Supply Chain Comp. | T1195 |
| Data Exfiltration via Unicode | Exfiltration | T1048 |
| C2 Signaling in Comments | Defense Evasion | T1564.004 |
| Content Obfuscation | Impact | T1498 |

Command line:

    python zerowidthstego.py decode -i file.txt --scheme basic_utf16
    python zerowidthstego.py decode -i file.txt --scheme zwsp_spacing
    python zerowidthstego.py decode -i file.txt --force
    python zerowidthstego.py analyze -i file.txt

    python zerowidthstego.py encode "msg" -o hidden.txt
    python zerowidthstego.py decode -i hidden.txt
    python zerowidthstego.py analyze -i file.txt
    python zerowidthstego.py encode "msg" -o out.txt --carrier text.txt --scheme homoglyph_binary_utf8
    python zerowidthstego.py decode -i file.txt --scheme basic_utf16 --force

Python:

    from zwstego import ZeroWidthEncoder, EncodingScheme

    # Pick a scheme, hide a message, then read it back.
    encoder = ZeroWidthEncoder(EncodingScheme.SIMPLE_8BIT)
    stego = encoder.encode("flag{hidden_data}")
    decoded = encoder.decode(stego)

Extend by modifying the SCHEMES dictionary and the EncodingScheme enum.

Summary: Zero-width character steganography is a covert encoding method that uses invisible Unicode format characters. Most editors and renderers do not display these characters, so readers never see them, yet they persist through storage, transmission, and copy/paste.
Key Properties:
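- Invisible in most editors, browsers, and messaging apps
- Persists through copy/paste and conversion between Unicode encodings such as UTF-8 and UTF-16
- Bypasses naive keyword and pattern filters
- Often missed by antivirus, SIEM, and DLP tooling that does not inspect invisible code points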
Encoding Techniques:
- Binary (1 bit per symbol) using ZWSP/ZWNJ
- Quaternary (2 bits per symbol) using ZWSP, ZWNJ, ZWJ, and WJ
- Octal (3 bits per symbol) using eight zero-width characters
- Homoglyph substitution
- Compressed binary → zero-width hybrid
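
The density gain of the 2-bit technique in the list above can be sketched as follows: each byte is split into four base-4 digits, and each digit selects one of four zero-width characters. The symbol order in `SYMBOLS` (ZWSP=0, ZWNJ=1, ZWJ=2, WJ=3) and the function names are assumptions for illustration only; the tool's `quaternary_utf8` scheme may assign them differently.

```python
# Assumed digit order: ZWSP=0, ZWNJ=1, ZWJ=2, WJ=3.
SYMBOLS = ["\u200b", "\u200c", "\u200d", "\u2060"]
INDEX = {ch: digit for digit, ch in enumerate(SYMBOLS)}


def encode_quaternary(message: str) -> str:
    """Emit four zero-width symbols per UTF-8 byte, high bit pair first."""
    out = []
    for byte in message.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            out.append(SYMBOLS[(byte >> shift) & 0b11])
    return "".join(out)


def decode_quaternary(stego: str) -> str:
    """Collect base-4 digits from the text and regroup them into bytes."""
    digits = [INDEX[ch] for ch in stego if ch in INDEX]
    usable = len(digits) - len(digits) % 4  # ignore a trailing partial byte
    data = bytearray()
    for i in range(0, usable, 4):
        byte = 0
        for digit in digits[i:i + 4]:
            byte = (byte << 2) | digit
        data.append(byte)
    return data.decode("utf-8", errors="replace")


payload = encode_quaternary("hi")
assert len(payload) == 8                      # 2 bytes * 4 symbols
assert decode_quaternary(payload) == "hi"
```

Compared with the binary mapping, the same message needs half as many invisible characters, at the cost of relying on four code points surviving transport instead of two.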
Insertion Strategies:
- HTML/JS comments
- JSON/YAML/Markdown
- Software documentation & Git commits
- Natural language stego in text
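
Whatever the host format, insertion comes down to placing an invisible payload somewhere inside visible carrier text and later separating the two again. The sketch below places the payload after the first word of the carrier and shows how extraction can also return a cleaned copy. The `inject` and `extract` helpers and the default two-symbol alphabet are hypothetical, and the placement rule is only one of many possible choices.

```python
from typing import Iterable, Tuple


def inject(carrier: str, payload: str) -> str:
    """Insert an invisible payload directly after the carrier's first word."""
    head, sep, tail = carrier.partition(" ")
    return head + payload + sep + tail


def extract(text: str,
            alphabet: Iterable[str] = ("\u200b", "\u200c")) -> Tuple[str, str]:
    """Return (payload, cleaned_text) by splitting on the stego alphabet."""
    symbols = set(alphabet)
    payload = "".join(ch for ch in text if ch in symbols)
    cleaned = "".join(ch for ch in text if ch not in symbols)
    return payload, cleaned


carrier = "Release notes for v2"
payload = "\u200b\u200c\u200c"
stego = inject(carrier, payload)
assert extract(stego) == (payload, carrier)
```

Returning the cleaned text alongside the payload mirrors the idea of extraction with cleanup control: the caller decides whether to keep the original file or write back the sanitized version.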
Malicious Use:
- Supply chain attacks
- Payload concealment & exfiltration
- Hidden C2 signaling
- Code integrity compromise
Legitimate Use:
- Watermarking and attribution
- Anti-scraping defense
- Secure provenance verification
Detection:
- Regex: [\u200B-\u200F\u202A-\u202E\u2060\uFEFF]
- Display control characters in editors
- Printable vs. invisible entropy analysis
- Hunt for CTF markers: flag{}, ctf{}, picoctf{}
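
A rough sketch of how these detection steps might be combined is shown below. It scans text with a character class covering U+200B to U+200F, U+202A to U+202E, U+2060, and U+FEFF, reports each hit with its code point, computes the Shannon entropy of the invisible characters found, and looks for CTF-style flag markers. The `scan` function, its result keys, and the use of entropy over the invisible characters alone are assumptions for illustration; they do not describe the tool's detection engine.

```python
import math
import re
from collections import Counter

# Character class assumed for this sketch (zero-width, directional, WJ, BOM).
INVISIBLE = re.compile(r"[\u200B-\u200F\u202A-\u202E\u2060\uFEFF]")
FLAG = re.compile(r"(?:picoctf|flag|ctf)\{[^}]*\}", re.IGNORECASE)


def shannon_entropy(chars) -> float:
    """Shannon entropy in bits per symbol; 0.0 for an empty sequence."""
    counts = Counter(chars)
    total = sum(counts.values())
    if not total:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def scan(text: str) -> dict:
    """Report invisible characters, their entropy, and flag-like strings."""
    hits = [(m.start(), "U+%04X" % ord(m.group()))
            for m in INVISIBLE.finditer(text)]
    return {
        "hits": hits,
        "entropy": shannon_entropy(text[pos] for pos, _ in hits),
        "flags": FLAG.findall(text),
    }


report = scan("a\u200bb\u200cc flag{x}")
assert report["hits"] == [(1, "U+200B"), (3, "U+200C")]
assert report["flags"] == ["flag{x}"]
```

A lone byte order mark yields zero entropy, while a stream that mixes several invisible symbols scores higher, which can hint at encoded data rather than incidental formatting.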
Countermeasures:
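- Strip zero-width characters by default
- Unicode normalization (NFKC/NFKD) to fold compatibility look-alikes such as fullwidth letters; normalization alone leaves zero-width characters in place, so pair it with stripping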
- CI checks for supply chain integrity
- SOC awareness and incident playbooks
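
The first two countermeasures can be combined in a few lines using Python's standard `unicodedata` module. The sketch below deletes a fixed set of zero-width and directional characters and then applies NFKC; it also shows that NFKC on its own does not remove a zero-width space. The `sanitize` name and the exact set of stripped code points are assumptions chosen for illustration.

```python
import unicodedata

# Code points removed by this sketch: ZWSP, ZWNJ, ZWJ, LRM, RLM, WJ, BOM.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u200e\u200f\u2060\ufeff"))


def sanitize(text: str) -> str:
    """Delete zero-width characters, then fold compatibility forms with NFKC."""
    return unicodedata.normalize("NFKC", text.translate(ZERO_WIDTH))


sample = "pa\u200bss\uff57ord"  # ZWSP plus a fullwidth 'w'

# NFKC folds the fullwidth letter but leaves the zero-width space in place.
assert unicodedata.normalize("NFKC", sample) == "pa\u200bssword"

# Stripping first and normalizing second removes both.
assert sanitize(sample) == "password"
```

Blanket stripping is not always safe: ZWJ and ZWNJ are meaningful in emoji sequences and in scripts such as Persian, so a production filter may need an allowlist for those contexts. NFKC also does not map cross-script homoglyphs such as Cyrillic letters onto Latin ones.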
MIT License. Attribution appreciated.
Based on:
- Steganography and Unicode security research
- CTF adversarial tradecraft
- Red team stealth and evasion methodologies
- Industry cases of text-based payload delivery
"Not all things need to be encrypted to be hidden. And not all who hide deceive."