Backend engineer with 6+ years building data-intensive systems — SQL-native tuning across 6 heterogeneous databases (PostgreSQL, Oracle, MariaDB, MSSQL, DB2, Netezza), full-stack from backend to mobile, and now building AI service pipelines end-to-end.
I believe the next wave of useful software won't come from training bigger models, but from building better plumbing around them — reliable extraction, structured search, safety guardrails, and pipelines that run without cloud API keys.
Currently building QuartzUnit: composable Python tools that let AI agents collect, extract, search, and monitor data — entirely locally.
- Mirror Agent — local autonomous AI agent with custom ReAct pipeline, RAG (Qdrant + Neo4j), persistent memory, and 2-tier LLM fallback. No framework, no cloud. Runs entirely on-premise.
- Forge — multi-source data refinement into LLM-ready warehouses. Any domain, any source — news, blogs, enterprise back-data, legacy DB migrations. Includes fact-checking at 83.6% production accuracy.
- QubicAI — automated data mart generation + natural language → SQL. Turns Forge warehouses into queryable marts that LLMs can use directly.
- QuartzUnit — personal data asset platform. Consumption/production knowledge archiving + AI labeling + mobile app. Building toward a personal data marketplace.
- QuartzUnit OSS — 10 Python packages on PyPI extracted from the above projects. The composable tool layer that powers the ecosystem.

LLM pipelines that work identically online and offline: collection → extraction → RAG → fact-checking, all on local inference.

6 years delivering DW/DM, BI/OLAP, and analytics to government and enterprise clients, often on air-gapped networks. Cross-DB migrations, query tuning, report automation.

Most AI agent frameworks give you an orchestrator but expect you to figure out the I/O yourself. QuartzUnit is the I/O layer — 10 focused tools that each solve one problem well, designed to be chained together.
Every tool ships with three interfaces: CLI for scripting, async Python API for integration, and MCP server for AI agent consumption. Zero cloud dependency — everything runs on your machine.
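
One common way to get a CLI and an async Python API from the same code is to keep a single async core function and wrap it in a thin synchronous entry point. The sketch below illustrates that pattern only; `extract`, `exampletool`, and the returned fields are hypothetical names, not the API of any QuartzUnit package, and the MCP server layer is left out.

```python
import argparse
import asyncio
from typing import Optional, Sequence


async def extract(url: str) -> dict:
    """Hypothetical async core; a real tool would fetch and parse the page here."""
    await asyncio.sleep(0)  # stand-in for network I/O
    return {"url": url, "markdown": f"# Content of {url}"}


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line parser for the hypothetical tool."""
    parser = argparse.ArgumentParser(prog="exampletool")
    parser.add_argument("url", help="page to convert")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """CLI entry point: a thin synchronous wrapper over the async core."""
    args = build_parser().parse_args(argv)
    result = asyncio.run(extract(args.url))
    print(result["markdown"])
    return 0
```

Library callers would `await extract(...)` directly, while a script would call `main()`; both paths share the same logic, so their behavior cannot drift apart.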
Stages: Collect, Extract, Search, Monitor, Guard
- feedkit (RSS/Atom) → markgrab (HTML/PDF/YouTube) → embgrep (semantic) → diffgrab (web change tracking)
- docpick (OCR→JSON)
- browsegrab (browser agent) → snapgrab (screenshot)
- Guard: agent-action-policy, agent-loop-guard, llm-degen-guard

I needed these tools for my own data pipelines: collecting news from 444 RSS feeds, extracting article content for fact-checking, searching across collected documents by meaning, and monitoring pages for changes. Every tool started as a module in a private project, then got extracted into a standalone package when it became useful on its own.
The guard libraries (loop detection, degeneration detection, action policies) came from running autonomous agents that would occasionally get stuck in loops, produce garbage output, or try to access things they shouldn't. Rather than adding ad-hoc checks, I built proper detectors that any agent framework can use.
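
To make the loop-detection idea concrete, here is a minimal sketch of one possible approach: keep a sliding window of recent actions and flag the agent when the tail of that window is the same short cycle repeated several times. The `LoopDetector` class, its parameters, and the string action labels are assumptions for illustration, not the agent-loop-guard API.

```python
from collections import deque


class LoopDetector:
    """Flags an agent that keeps repeating the same short cycle of actions."""

    def __init__(self, window: int = 12, min_repeats: int = 3):
        self.history = deque(maxlen=window)  # most recent actions only
        self.min_repeats = min_repeats

    def record(self, action: str) -> bool:
        """Store an action; return True if the history ends in a repeated cycle."""
        self.history.append(action)
        items = list(self.history)
        # Try every cycle length that fits min_repeats times in the history.
        for cycle_len in range(1, len(items) // self.min_repeats + 1):
            tail = items[-cycle_len * self.min_repeats:]
            cycle = tail[:cycle_len]
            if all(tail[i] == cycle[i % cycle_len] for i in range(len(tail))):
                return True
        return False


detector = LoopDetector()
flags = [detector.record(a) for a in ["search", "read"] * 3]
```

With the defaults, `flags` stays `False` until the `search`/`read` pair has occurred three times in a row. A production detector would likely compare normalized tool calls and arguments instead of raw strings.
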
| Package | What it does | Tests |
|---|---|---|
| markgrab | URL → LLM-ready markdown (HTML, YouTube, PDF, DOCX) | 114 |
| docpick | Schema-driven document OCR → structured JSON | 217 |
| feedkit | RSS/Atom collection with 444 curated feeds | 34 |
| browsegrab | Token-efficient browser agent for local LLMs | 200 |
| snapgrab | URL → screenshot + metadata (Claude Vision optimized) | 29 |
| diffgrab | Web page change tracking with structured diffs | 89 |
| embgrep | Local semantic search (embedding-powered grep) | 74 |
| llm-degen-guard | LLM output degeneration detector | 55 |
| agent-loop-guard | Agent infinite loop detection | 78 |
| agent-action-policy | Declarative action policies for AI agents | 69 |

959 tests across 10 packages · Open-source (MIT / Apache-2.0) · Korean + English documentation
End-to-end examples showing how QuartzUnit packages chain together:
| Project | Pipeline | What it does |
|---|---|---|
| newswatch | feedkit → markgrab → embgrep → diffgrab | Collect RSS feeds, extract articles, build semantic search index, track changes |
| watchdeck | diffgrab → markgrab → snapgrab → guard trio | Monitor web pages for changes with visual diffs and safety guards |
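
The chaining idea behind these pipelines can be sketched with plain `asyncio`. The stage functions below are stand-ins written for this example: they do not call feedkit, markgrab, or embgrep, and the keyword match only imitates where a semantic search step would sit.

```python
import asyncio


async def collect(feed_urls):
    """Stand-in for the collection stage: returns article URLs for each feed."""
    return [f"{feed}/article-{i}" for feed in feed_urls for i in range(2)]


async def extract(url):
    """Stand-in for the extraction stage: returns a document for one URL."""
    await asyncio.sleep(0)  # placeholder for fetching and parsing
    return {"url": url, "text": f"text extracted from {url}"}


async def run_pipeline(feed_urls, limit=4):
    """Collect URLs, extract them concurrently, and build a simple index."""
    urls = await collect(feed_urls)
    sem = asyncio.Semaphore(limit)  # cap the number of concurrent extractions

    async def bounded(url):
        async with sem:
            return await extract(url)

    docs = await asyncio.gather(*(bounded(u) for u in urls))
    return {doc["url"]: doc["text"] for doc in docs}


def search(index, keyword):
    """Naive keyword match standing in for the semantic search stage."""
    return [url for url, text in index.items() if keyword in text]


index = asyncio.run(run_pipeline(["https://a.example/feed"]))
hits = search(index, "article-1")
```

Each stage takes the previous stage's output as plain data, so a stage can be swapped or tested in isolation.
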
Architecture deep-dives with quantifiable results and honest failure analysis:
| Case Study | Domain | Key Result |
|---|---|---|
| Fact-Checking Pipeline | NLP / Verification | 83.6% accuracy through 14 iterations, 127K claims |
| Autonomous AI Agent | Agent Systems | Zero-framework ReAct + RAG, 1,235 tests, 101s → 3s |
| Data Mart Automation | Data Engineering | 47.4M rows, 90% Text2SQL with zero manual config |
| Figure | What it measures |
|---|---|
| 6 | Database engines tuned in production (PostgreSQL, Oracle, MariaDB, MSSQL, DB2, Netezza) |
| 10 | Open-source Python packages on PyPI |
| 959 | Tests across the QuartzUnit ecosystem |
| 444 | Curated, verified RSS feeds in the feedkit catalog |
| 800K+ | Quality articles collected across 115 domains |
| 83.6% | Fact-checking pipeline effective accuracy (production) |
