Idiomatic Python 3.8+ bindings for SNKV — a lightweight, ACID-compliant embedded key-value store built directly on SQLite's B-Tree engine.
If you find it useful, a ⭐ on GitHub goes a long way!
- Dict-style API — `db["key"] = value`, `val = db["key"]`, `del db["key"]`, `"key" in db`
- Context managers — `with KVStore(...) as db` and `with db.create_column_family(...) as cf` for guaranteed cleanup
- Prefix iterators — efficient namespace scans with `db.prefix_iterator(b"user:")`
- Reverse iterators — walk keys in descending order with `db.reverse_iterator()` and `db.reverse_prefix_iterator(b"user:")`
- WAL checkpoint control — PASSIVE / FULL / RESTART / TRUNCATE modes via `db.checkpoint()`
- Auto-checkpoint — set `wal_size_limit=N` to checkpoint automatically after every N WAL frames
- Typed exceptions — `NotFoundError`, `BusyError`, `LockedError`, `ReadOnlyError`, `CorruptError` all subclass `snkv.Error`
- No Python dependencies — pure CPython C extension; only requires a C compiler and `python3-dev`
- Native TTL — per-key expiry with `put(ttl=seconds)`, dict-style `db[key, ttl] = value`, lazy expiry on get, and `purge_expired()`
- Encryption — per-value XChaCha20-Poly1305 encryption with Argon2id key derivation; transparent to all existing APIs
- Seek iterators — jump to any key in O(log N) with `it.seek(key)`, chainable and works on prefix/reverse iterators
- Conditional insert — atomic `put_if_absent(key, value, ttl=None)` returns `True` if inserted; safe for distributed locks and dedup
- Bulk clear — `db.clear()` / `cf.clear()` truncates all keys in O(pages) without dropping the store
- Key count — `db.count()` / `cf.count()` returns entry count in O(pages); CF counts are fully isolated
- Extended stats — `db.stats()` exposes 12 counters including `bytes_read`, `bytes_written`, `wal_commits`, `ttl_expired`, `db_pages`; reset with `db.stats_reset()`
- Vector search — integrated HNSW approximate nearest-neighbour index via `snkv[vector]`; sidecar persistence, quantization (f32/f16/i8), metadata filtering, exact rerank, TTL on vectors, and encryption support
- 471 tests — full pytest suite covering ACID, WAL, crash recovery, concurrency, column families, TTL, encryption, and vector search
Pre-built binary wheels are available for Linux, macOS, and Windows — no compiler needed.
Windows / macOS:

```bash
pip install snkv
```

Linux (Debian/Ubuntu):

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install snkv
```

Linux system Python is "externally managed" (PEP 668) and blocks system-wide `pip` installs. Use a virtual environment.
Linux:

```bash
# System dependencies
sudo apt-get install -y build-essential python3-dev python3-pip

# Python build dependencies
pip3 install setuptools wheel pytest

# Build
cd python
python3 setup.py build_ext --inplace
```

macOS:

```bash
# Compiler (skip if already installed)
xcode-select --install

# Python build dependencies
pip3 install setuptools wheel pytest

# Build
cd python
python3 setup.py build_ext --inplace
```

Windows — Native Python:

- Install Python 3.8+ — check "Add Python to PATH"
- Install Visual Studio Build Tools — select "Desktop development with C++"
- Open "x64 Native Tools Command Prompt for VS 2022" from the Start Menu (required for 64-bit Python; "Developer PowerShell for VS" defaults to 32-bit and will fail)

```bat
:: Python build dependencies
pip install setuptools wheel pytest

:: Build
cd python
python setup.py build_ext --inplace
```

Windows — MSYS2: open the MSYS2 MinGW64 shell (not plain MSYS2, not cmd.exe):

```bash
# System + Python dependencies (one-time)
pacman -S --needed mingw-w64-x86_64-python \
    mingw-w64-x86_64-python-pip \
    mingw-w64-x86_64-python-setuptools \
    mingw-w64-x86_64-python-pytest

# Build
cd python
python3 setup.py build_ext --inplace
```

On all platforms, `setup.py` automatically locates `snkv.h` — no manual header step needed. On Linux/macOS it regenerates it via `make snkv.h`; on Windows it falls back to the pre-built `snkv.h` included in the repo.
```python
from snkv import KVStore

with KVStore("mydb.db") as db:
    db["hello"] = "world"
    print(db["hello"].decode())  # world
```

```python
from snkv import KVStore, JOURNAL_WAL, JOURNAL_DELETE, SYNC_NORMAL, SYNC_OFF, SYNC_FULL

with KVStore(
    "mydb.db",
    journal_mode=JOURNAL_WAL,  # JOURNAL_WAL (default) or JOURNAL_DELETE
    sync_level=SYNC_NORMAL,    # SYNC_NORMAL (default), SYNC_OFF, SYNC_FULL
    cache_size=2000,           # pages (~8 MB default)
    page_size=4096,            # bytes; new databases only
    busy_timeout=5000,         # ms to retry on SQLITE_BUSY (default 0)
    read_only=False,           # open read-only
    wal_size_limit=100,        # auto-checkpoint every 100 WAL frames (0 = off)
) as db:
    ...
```

```python
# Write
db["key"] = b"value"         # bytes or str keys/values are both accepted
db["key"] = "value"          # str is UTF-8 encoded automatically

# Read
val = db["key"]              # returns bytes; raises NotFoundError if missing
val = db.get("key")          # returns bytes or None
val = db.get("key", b"def")  # with default

# Check existence
exists = "key" in db
exists = db.exists(b"key")

# Delete
del db["key"]
db.delete(b"key")            # same as del; no error if key absent

# Upsert
db.put(b"key", b"value")     # identical to db["key"] = value
```

```python
db.begin(write=True)
db["a"] = "1"
db["b"] = "2"
db.commit()    # persist

db.begin(write=True)
db["c"] = "3"
db.rollback()  # discard — "c" is never written
```

Auto-commit is the default: each `db["key"] = value` outside an explicit transaction is committed immediately.
Logical namespaces within a single database file. Always close `cf` before `db`.
```python
# Create (first use)
with db.create_column_family("users") as cf:
    cf[b"alice"] = b"admin"
    cf[b"bob"] = b"viewer"

# Open (subsequent uses)
with db.open_column_family("users") as cf:
    print(cf[b"alice"])  # b"admin"

# List all column families
names = db.list_column_families()  # ["users", ...]

# Drop
db.drop_column_family("users")
```

```python
# Full scan — yields (key, value) tuples in key order
for key, value in db.iterator():
    print(key, value)

# Prefix scan
for key, value in db.prefix_iterator(b"user:"):
    print(key, value)

# Manual control
it = db.iterator()
it.first()
while not it.eof:
    print(it.key, it.value)
    it.next()
it.close()

# As a context manager
with db.iterator() as it:
    for key, value in it:
        ...
```

Walk keys in descending order — no full scan, no sort, pure B-tree traversal.
```python
# Full reverse scan
for key, value in db.reverse_iterator():
    print(key, value)

# Reverse prefix scan — visits only matching keys, largest first
for key, value in db.reverse_prefix_iterator(b"user:"):
    print(key, value)

# Manual control
it = db.reverse_iterator()
it.last()
while not it.eof:
    print(it.key, it.value)
    it.prev()
it.close()

# As a context manager
with db.reverse_prefix_iterator(b"log:") as it:
    for key, value in it:
        ...
```

Column families support reverse iterators identically via `cf.reverse_iterator()` and `cf.reverse_prefix_iterator()`.
```python
from snkv import CHECKPOINT_PASSIVE, CHECKPOINT_FULL, CHECKPOINT_RESTART, CHECKPOINT_TRUNCATE

# Returns (nLog, nCkpt) — WAL frames total / frames written to DB
nlog, nckpt = db.checkpoint(CHECKPOINT_PASSIVE)   # copy frames without blocking
nlog, nckpt = db.checkpoint(CHECKPOINT_FULL)      # wait for writers, flush all
nlog, nckpt = db.checkpoint(CHECKPOINT_RESTART)   # like FULL, reset write position
nlog, nckpt = db.checkpoint(CHECKPOINT_TRUNCATE)  # like RESTART, truncate WAL file
```

Must be called outside an active write transaction. Use `wal_size_limit` to auto-checkpoint instead.
Jump to any position in O(log N) without scanning from the start.
```python
with db.iterator() as it:
    it.seek(b"user:bob")  # forward: position at first key >= target
    while not it.eof:
        print(it.key, it.value)
        it.next()

with db.iterator(reverse=True) as it:
    it.last()
    it.seek(b"user:bob")  # reverse: position at last key <= target
    while not it.eof:
        print(it.key, it.value)
        it.prev()

# Works on prefix iterators too — boundary still enforced
with db.iterator(prefix=b"user:") as it:
    it.seek(b"user:carol")  # skip straight to "user:carol"
    while not it.eof:
        print(it.key)
        it.next()

# seek() returns self for chaining
key = db.iterator().seek(b"target").key
```

Atomically insert a key only when it is absent — safe for distributed locks and deduplication.
```python
# Returns True if inserted, False if the key already existed.
inserted = db.put_if_absent(b"lock", b"owner:alice")

# With TTL — the key auto-releases after the given number of seconds.
inserted = db.put_if_absent(b"session:42", b"token-xyz", ttl=30)

# Column families support the same method.
with db.create_column_family("dedup") as cf:
    if cf.put_if_absent(b"msg:001", b"hello"):
        process(b"msg:001")  # only the first caller reaches here
```

Truncate all entries from a store or column family in O(pages) — no iterating, no individual deletes.
```python
db.clear()  # remove every key from the default CF

with db.create_column_family("cache") as cf:
    cf.clear()  # only this CF is affected; other CFs are untouched
```

TTL index entries are cleared atomically alongside data entries. Close all iterators before calling `clear()`.
Count entries without scanning individual keys.
```python
n = db.count()  # total entries in the default CF

with db.open_column_family("users") as cf:
    n = cf.count()  # only this CF; TTL index not counted

# count() includes expired-but-not-yet-purged keys.
# Call purge_expired() first for an accurate live count.
db.purge_expired()
n = db.count()
```

```python
db.sync()             # flush OS write buffers (fsync)
db.vacuum(100)        # reclaim up to 100 unused pages incrementally
db.integrity_check()  # raises CorruptError if database is corrupt

# Extended stats — 12 counters
stats = db.stats()
# Keys: puts, gets, deletes, iterations, errors,
#       bytes_read, bytes_written, wal_commits, checkpoints,
#       ttl_expired, ttl_purged, db_pages

# Reset all cumulative counters (db_pages is always live)
db.stats_reset()
```

Per-key TTL with automatic lazy expiry on read.
```python
# Put with TTL (seconds, float precision)
db.put(b"session", b"tok123", ttl=60)  # expires in 60 s
db[b"token", 30] = b"bearer-xyz"       # dict-style shorthand

# Get — expired keys are silently evicted; db[key] raises NotFoundError,
# while get() returns None
val = db.get(b"session")  # returns bytes or None if expired

# Check remaining lifetime
from snkv import NotFoundError
try:
    remaining = db.ttl(b"session")  # seconds remaining (float)
except NotFoundError:
    remaining = None  # key expired or never set

# Purge all expired keys from disk (returns count removed)
n = db.purge_expired()

# Column families support TTL identically
with db.create_column_family("cache") as cf:
    cf.put(b"item", b"data", ttl=10)
    cf[b"item2", 5] = b"data2"
    n = cf.purge_expired()
```

Transparent per-value encryption. All existing APIs work without modification.
```python
from snkv import KVStore, AuthError

# Create / open encrypted store
with KVStore.open_encrypted("mydb.db", b"hunter2") as db:
    db[b"secret"] = b"classified"
    print(db.is_encrypted())  # True
    print(db[b"secret"])      # b"classified" — transparent decrypt

# Wrong password raises AuthError
try:
    KVStore.open_encrypted("mydb.db", b"wrong")
except AuthError:
    print("bad password")

# Change password in-place (re-encrypts all values atomically)
with KVStore.open_encrypted("mydb.db", b"hunter2") as db:
    db.reencrypt(b"new-strong-pass")

# Remove encryption permanently
with KVStore.open_encrypted("mydb.db", b"new-strong-pass") as db:
    db.remove_encryption()

with KVStore("mydb.db") as db:  # plain open works now
    print(db[b"secret"])
```

| Method | Description |
|---|---|
| `KVStore.open_encrypted(path, password, **kwargs)` | Class method — open or create encrypted store |
| `db.is_encrypted()` | Returns `True` if store is encrypted |
| `db.reencrypt(new_password)` | Change password; re-encrypts all values atomically |
| `db.remove_encryption()` | Decrypt in-place; store becomes plain |
Cryptographic details: XChaCha20-Poly1305 per value · Argon2id KDF (64 MB, 3 iterations) · 40-byte overhead per value (nonce + MAC) · key wiped from memory on close.
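As a sanity check on the 40-byte figure, the per-value overhead is simply the XChaCha20 extended nonce plus the Poly1305 authentication tag. A back-of-the-envelope sketch (plain arithmetic, independent of snkv itself):

```python
# Per-value encryption overhead: XChaCha20 nonce + Poly1305 MAC.
NONCE_BYTES = 24  # XChaCha20 uses a 192-bit extended nonce
MAC_BYTES = 16    # Poly1305 tag is 128 bits

overhead = NONCE_BYTES + MAC_BYTES
print(overhead)  # 40 — matches the documented per-value overhead

# e.g. a 100-byte value occupies 140 bytes of payload before page overhead
print(100 + overhead)  # 140
```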
Integrated HNSW approximate nearest-neighbour index backed by usearch. All vectors and KV data live in the same .db file — no separate index file, no external service.
```bash
pip install snkv[vector]
```

```python
from snkv.vector import VectorStore
import numpy as np

with VectorStore("store.db", dim=128, space="cosine") as vs:
    vs.vector_put(b"doc:1", b"hello world", np.random.rand(128).astype("f4"))
    results = vs.search(np.random.rand(128).astype("f4"), top_k=5)
    for r in results:
        print(r.key, r.distance, r.value)
```

| Parameter | Default | Description |
|---|---|---|
| `path` | — | Path to `.db` file. `None` for in-memory. |
| `dim` | — | Vector dimension. Fixed for the lifetime of the store. |
| `space` | `"l2"` | Distance metric: `"l2"` (squared L2), `"cosine"`, or `"ip"` (inner product). |
| `connectivity` | `16` | HNSW M parameter. |
| `expansion_add` | `128` | HNSW expansion during index build. |
| `expansion_search` | `None` | HNSW expansion at query time. `None` restores the stored value (default 64). |
| `dtype` | `"f32"` | In-memory index precision: `"f32"`, `"f16"` (half RAM), or `"i8"` (quarter RAM). On-disk storage is always float32. |
| `password` | `None` | Open/create an encrypted store. Sidecar is disabled for encrypted stores. |
`dtype` controls the in-memory HNSW graph precision only — on-disk storage in `_snkv_vec_` is always float32.
| dtype | RAM per vector (dim=768) | Notes |
|---|---|---|
| `"f32"` | 3072 bytes | Full precision (default) |
| `"f16"` | 1536 bytes | Half RAM, negligible recall loss |
| `"i8"` | 768 bytes | Quarter RAM, small recall cost |
For 1 M vectors at dim=768: f32 ≈ 3 GB → f16 ≈ 1.5 GB → i8 ≈ 768 MB.
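The totals above fall straight out of dim × bytes-per-component. A quick sketch of the arithmetic (vector storage only — the HNSW graph links add some RAM on top, so treat these as lower bounds):

```python
# Approximate in-memory vector storage for 1M vectors at dim=768.
dim = 768
n_vectors = 1_000_000
bytes_per_component = {"f32": 4, "f16": 2, "i8": 1}

for dtype, nbytes in bytes_per_component.items():
    per_vec = dim * nbytes                    # bytes per vector
    total_mib = n_vectors * per_vec / 2**20   # total in MiB
    print(f"{dtype}: {per_vec} bytes/vector, ~{total_mib:.0f} MiB total")
```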
```python
# Half RAM for the in-memory index; on-disk vectors still float32
with VectorStore("store.db", dim=768, space="cosine", dtype="f16") as vs:
    vs.vector_put(b"doc:1", b"hello", np.random.rand(768).astype("f4"))
```

For unencrypted file-backed stores, the HNSW index is saved to `{path}.usearch` on `close()` and reloaded on the next open — skipping the O(n×d) CF rebuild. A companion `{path}.usearch.nid` stamp file detects any write that occurred after the last clean close (including crash scenarios). Stale or corrupt sidecars are silently discarded and the index is rebuilt from the column families.
Encrypted stores and in-memory stores always rebuild from column families.
```python
# Write
vs.vector_put(b"key", b"value", vec, ttl=None, metadata=None)
vs.vector_put_batch([(b"key", b"value", vec), ...], ttl=None)

# Search
results = vs.search(query_vec, top_k=10)                          # ANN
results = vs.search(query_vec, top_k=10, filter={"topic": "ml"})  # metadata filter
results = vs.search(query_vec, top_k=10, rerank=True)             # exact rerank
results = vs.search(query_vec, top_k=10, max_distance=0.5)        # distance cutoff
pairs = vs.search_keys(query_vec, top_k=10)                       # keys + distances only

# SearchResult fields: key, value, distance, metadata
# NOTE: result.metadata is None unless filter= is passed to search().
# To access metadata without filtering, call get_metadata(key) after the search:
for r in results:
    meta = vs.get_metadata(r.key)  # dict or None — always works

# Read
vec = vs.vector_get(b"key")     # np.ndarray(dim,) float32
val = vs.get(b"key")            # value bytes from KV store
meta = vs.get_metadata(b"key")  # dict or None

# Delete / maintenance
vs.delete(b"key")
n = vs.vector_purge_expired()   # remove expired vectors from index + CFs

# Stats
stats = vs.vector_stats()
# Keys: dim, space, dtype, connectivity, expansion_add, expansion_search,
#       count, capacity, fill_ratio, vec_cf_count, has_metadata, sidecar_enabled

# Drop index (KV data preserved)
vs.drop_vector_index()
```

```python
from snkv import AuthError

with VectorStore("store.db", dim=128, password=b"secret") as vs:
    vs.vector_put(b"doc:1", b"classified", np.random.rand(128).astype("f4"))

try:
    VectorStore("store.db", dim=128, password=b"wrong")
except AuthError:
    print("bad password")
```

```
snkv.Error (base)
├── snkv.NotFoundError   (also KeyError — raised by db["missing"])
├── snkv.BusyError       (SQLITE_BUSY — another writer holds the lock)
├── snkv.LockedError     (SQLITE_LOCKED)
├── snkv.ReadOnlyError   (write attempted on read-only store)
├── snkv.CorruptError    (database file is corrupt)
└── snkv.AuthError       (wrong password or not an encrypted store)

snkv.vector.VectorIndexError (index dropped or empty; not a subclass of snkv.Error)
```

```python
import snkv

try:
    val = db["missing_key"]
except snkv.NotFoundError:
    val = b"default"

try:
    db["key"] = b"value"
except snkv.BusyError:
    # retry after a delay
    ...
```

Linux / macOS
```bash
cd python
python3 -m pytest tests/ -v
```

Windows — Native Python (x64 Native Tools Command Prompt for VS 2022)

```bat
cd python
set PYTHONPATH=.
python -m pytest tests\ -v
```

Windows — MSYS2 MinGW64 shell

```bash
cd python
PYTHONPATH=. python3 -m pytest tests/ -v
```

All 471 tests should pass.
Linux / macOS
```bash
cd python
PYTHONPATH=. python3 examples/basic.py             # CRUD, binary data, in-memory store
PYTHONPATH=. python3 examples/transactions.py      # begin/commit/rollback
PYTHONPATH=. python3 examples/column_families.py   # logical namespaces
PYTHONPATH=. python3 examples/iterators.py         # ordered scan, prefix scan
PYTHONPATH=. python3 examples/config.py            # journal mode, sync, cache, WAL limit
PYTHONPATH=. python3 examples/checkpoint.py        # manual + auto WAL checkpoint
PYTHONPATH=. python3 examples/session_store.py     # real-world session store pattern
PYTHONPATH=. python3 examples/ttl.py               # TTL expiry, rate limiter demo
PYTHONPATH=. python3 examples/encryption.py        # encrypted store, wrong-password, reencrypt
PYTHONPATH=. python3 examples/iterator_reverse.py  # reverse iterators, descending scans
PYTHONPATH=. python3 examples/new_apis.py          # seek, put_if_absent, clear, count, stats
PYTHONPATH=. python3 examples/multiprocess.py      # 5 concurrent processes, busy_timeout
PYTHONPATH=. python3 examples/vector.py            # vector search, quantization, sidecar, TTL, encryption
```

Windows — Native Python (x64 Native Tools Command Prompt for VS 2022)
```bat
cd python
set PYTHONPATH=.
python examples\basic.py
python examples\transactions.py
python examples\column_families.py
python examples\iterators.py
python examples\config.py
python examples\checkpoint.py
python examples\session_store.py
python examples\ttl.py
python examples\encryption.py
python examples\iterator_reverse.py
python examples\new_apis.py
python examples\multiprocess.py
python examples\all_apis.py
python examples\vector.py
```

Windows — MSYS2 MinGW64 shell
```bash
cd python
PYTHONPATH=. python3 examples/basic.py
PYTHONPATH=. python3 examples/transactions.py
# ... same pattern for all examples
```

Each thread must use its own `KVStore` instance. WAL mode serialises concurrent writers at the SQLite level — a `BusyError` is raised (or retried for up to `busy_timeout` ms) when two writers collide. Multiple readers always make progress concurrently in WAL mode.
```python
import threading
from snkv import KVStore, JOURNAL_WAL

def worker(db_path, worker_id):
    # Each thread opens its own connection
    with KVStore(db_path, journal_mode=JOURNAL_WAL, busy_timeout=5000) as db:
        db[f"key_{worker_id}".encode()] = b"value"

threads = [threading.Thread(target=worker, args=("mydb.db", i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The snkv Python package relies on the following third-party libraries:
| Library | Version | License | Notes |
|---|---|---|---|
| SQLite | 3.x (amalgamation subset) | Public Domain | B-tree, pager, WAL, OS layer |
| Monocypher | 4.x | CC0-1.0 (Public Domain) | XChaCha20-Poly1305 + Argon2id |
| usearch | ≥ 2.9 | Apache 2.0 | HNSW vector index (optional — `pip install snkv[vector]`) |
SQLite and Monocypher are statically linked into the extension module — no separate installation required.
SQLite and Monocypher are public domain — no attribution is legally required, but credit is given here in the spirit of good practice. usearch is an optional runtime dependency and is not bundled.
Apache License 2.0 © 2025 Hash Anu