MJ Hwang (ArkNill)

About

Backend engineer with 6+ years building data-intensive systems — SQL-native tuning across 6 heterogeneous databases (PostgreSQL, Oracle, MariaDB, MSSQL, DB2, Netezza), full-stack from backend to mobile, and now building AI service pipelines end-to-end.

I believe the next wave of useful software won't come from training bigger models, but from building better plumbing around them — reliable extraction, structured search, safety guardrails, and pipelines that run without cloud API keys.

Currently building QuartzUnit: composable Python tools that let AI agents collect, extract, search, and monitor data — entirely locally.

Currently Working On

Mirror Agent — local autonomous AI agent with custom ReAct pipeline, RAG (Qdrant + Neo4j), persistent memory, and 2-tier LLM fallback. No framework, no cloud. Runs entirely on-premise.
Forge — multi-source data refinement into LLM-ready warehouses. Any domain, any source — news, blogs, enterprise back-data, legacy DB migrations. Includes fact-checking at 83.6% production accuracy.
QubicAI — automated data mart generation + natural language → SQL. Turns Forge warehouses into queryable marts that LLMs can use directly.
QuartzUnit — personal data asset platform. Consumption/production knowledge archiving + AI labeling + mobile app. Building toward a personal data marketplace.
QuartzUnit OSS — 10 Python packages on PyPI extracted from the above projects. The composable tool layer that powers the ecosystem.

What I Build

Data & AI Pipelines

LLM pipelines that work identically online and offline — collection → extraction → RAG → fact-checking, all on local inference.

Autonomous AI agent with RAG (Qdrant + Neo4j), persistent memory, and custom ReAct pipeline
Automated fact-checking: claim extraction → tiered evidence search → verdict generation
800K+ articles collected, indexed, and searchable across 115 domains

Python vLLM PostgreSQL Qdrant Neo4j Redis
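
The fact-checking flow listed above (claim extraction, tiered evidence search, verdict generation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production pipeline: every name here (`Evidence`, `tiered_search`, `local_index`, `archive`) is hypothetical, sentence splitting stands in for LLM claim extraction, and the toy tiers stand in for real search backends.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    source: str
    supports: bool  # True if the snippet backs the claim


# A search tier maps a claim to the evidence it can find (hypothetical type).
SearchTier = Callable[[str], list[Evidence]]


def extract_claims(article: str) -> list[str]:
    """Stand-in for LLM claim extraction: treat each sentence as one claim."""
    return [s.strip() for s in article.split(".") if s.strip()]


def tiered_search(claim: str, tiers: list[SearchTier], min_hits: int = 1) -> list[Evidence]:
    """Query tiers in order and stop once enough evidence has accumulated."""
    found: list[Evidence] = []
    for search in tiers:
        found.extend(search(claim))
        if len(found) >= min_hits:
            break
    return found


def verdict(evidence: list[Evidence]) -> str:
    """Collapse the evidence into a coarse label."""
    if not evidence:
        return "unverified"
    supporting = sum(e.supports for e in evidence)
    if supporting == len(evidence):
        return "supported"
    return "refuted" if supporting == 0 else "mixed"


def check(article: str, tiers: list[SearchTier]) -> dict[str, str]:
    """Run extraction, tiered search, and verdict for every claim."""
    return {c: verdict(tiered_search(c, tiers)) for c in extract_claims(article)}


# Toy tiers: a cheap local index first, a slower archive second.
def local_index(claim: str) -> list[Evidence]:
    return [Evidence("local", True)] if "water" in claim.lower() else []


def archive(claim: str) -> list[Evidence]:
    return [Evidence("archive", False)] if "moon" in claim.lower() else []


results = check(
    "Water boils at 100 C. The moon is made of cheese. Cats can fly",
    [local_index, archive],
)
```

The point of ordering the tiers is cost: the second tier is only queried when the first one comes back short, so most claims never touch the slower source.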

Full-stack & Data Engineering

6 years delivering DW/DM, BI/OLAP, and analytics to government and enterprise clients — often on air-gapped networks. Cross-DB migrations, query tuning, report automation.

LLM chatbot that turns a natural-language request into a BI dashboard (a 5-step UI flow reduced to one sentence)
Video analytics with PGVector semantic search + LLM-powered metadata structuring
Mobile + web: Next.js dashboards, Kotlin/Compose Android, React Native

FastAPI Spring Boot Next.js Kotlin JPA React Native

QuartzUnit — LLM-native Tool Ecosystem

Most AI agent frameworks give you an orchestrator but expect you to figure out the I/O yourself. QuartzUnit is the I/O layer — 10 focused tools that each solve one problem well, designed to be chained together.

Every tool ships with three interfaces: CLI for scripting, async Python API for integration, and MCP server for AI agent consumption. Zero cloud dependency — everything runs on your machine.

Stages: Collect · Extract · Search · Monitor · Guard

feedkit (RSS/Atom) ───→ markgrab (HTML/PDF/YouTube) ───→ embgrep (semantic) ───→ diffgrab (web change tracking)
docpick (OCR→JSON)
browsegrab (browser agent) ───→ snapgrab (screenshot)
Guard: agent-action-policy, agent-loop-guard, llm-degen-guard

Why I built this

I needed these tools for my own data pipelines — collecting news from 444 RSS feeds, extracting article content for fact-checking, searching across collected documents by meaning, and monitoring pages for changes. Every tool started as a module in a private project, then got extracted into a standalone package when it became useful on its own.

The guard libraries (loop detection, degeneration detection, action policies) came from running autonomous agents that would occasionally get stuck in loops, produce garbage output, or try to access things they shouldn't. Rather than adding ad-hoc checks, I built proper detectors that any agent framework can use.
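
To make the idea of a "proper detector" concrete, here is a minimal sketch of two such checks using only the standard library. It is an illustration, not the API of agent-loop-guard or llm-degen-guard: the class `LoopDetector`, the function `repetition_ratio`, and the window and threshold values are all assumptions chosen for the example.

```python
from collections import Counter, deque


class LoopDetector:
    """Flags an agent that repeats the same (tool, args) action too often
    within a sliding window of recent actions."""

    def __init__(self, window: int = 6, max_repeats: int = 3):
        self.recent = deque(maxlen=window)  # oldest actions fall off automatically
        self.max_repeats = max_repeats

    def record(self, tool: str, args: str) -> bool:
        """Record an action; return True if it now looks like a loop."""
        self.recent.append((tool, args))
        return Counter(self.recent)[(tool, args)] >= self.max_repeats


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that are duplicates (0.0 means no repetition)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


detector = LoopDetector()
flags = [detector.record("search", "same query") for _ in range(3)]
healthy = repetition_ratio("the quick brown fox jumps over the lazy dog")
degenerate = repetition_ratio("the cat " * 10)
```

In this run `flags` ends with `True` on the third identical action, `healthy` is `0.0`, and `degenerate` is close to `0.89`, so a caller could stop the agent or retry generation once either signal crosses a chosen threshold.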

Packages

Package	What it does	Tests
markgrab	URL → LLM-ready markdown (HTML, YouTube, PDF, DOCX)	114
docpick	Schema-driven document OCR → structured JSON	217
feedkit	RSS/Atom collection with 444 curated feeds	34
browsegrab	Token-efficient browser agent for local LLMs	200
snapgrab	URL → screenshot + metadata (Claude Vision optimized)	29
diffgrab	Web page change tracking with structured diffs	89
embgrep	Local semantic search (embedding-powered grep)	74
llm-degen-guard	LLM output degeneration detector	55
agent-loop-guard	Agent infinite loop detection	78
agent-action-policy	Declarative action policies for AI agents	69

959 tests across 10 packages · Open-source (MIT / Apache-2.0) · Korean + English documentation
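
As an illustration of what "structured diffs" in the table above can mean, the sketch below turns two versions of a page's text into a list of change records instead of a text patch. It uses Python's standard `difflib` and is not diffgrab's actual API; the function name `structured_diff` and the record fields are hypothetical.

```python
import difflib


def structured_diff(old: str, new: str) -> list[dict]:
    """Return line-level changes as a list of dicts instead of a text patch."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged regions are not reported
        changes.append({
            "op": tag,  # "replace", "delete", or "insert"
            "old": old_lines[i1:i2],
            "new": new_lines[j1:j2],
        })
    return changes


changes = structured_diff(
    "Price: 10\nIn stock\n",
    "Price: 12\nIn stock\nFree shipping\n",
)
```

Here `changes` holds one `replace` record for the price line and one `insert` record for the new shipping line, which is easier for an agent or alerting rule to consume than a unified diff.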

Showcase

End-to-end examples showing how QuartzUnit packages chain together:

Project	Pipeline	What it does
newswatch	feedkit → markgrab → embgrep → diffgrab	Collect RSS feeds, extract articles, build semantic search index, track changes
watchdeck	diffgrab → markgrab → snapgrab → guard trio	Monitor web pages for changes with visual diffs and safety guards

Case Studies

Architecture deep-dives with quantified results and honest failure analysis:

Case Study	Domain	Key Result
Fact-Checking Pipeline	NLP / Verification	83.6% accuracy through 14 iterations, 127K claims
Autonomous AI Agent	Agent Systems	Zero-framework ReAct + RAG, 1,235 tests, 101s → 3s
Data Mart Automation	Data Engineering	47.4M rows, 90% Text2SQL with zero manual config

By the Numbers


6	Database engines tuned in production (PostgreSQL, Oracle, MariaDB, MSSQL, DB2, Netezza)
10	Open-source Python packages on PyPI
959	Tests across the QuartzUnit ecosystem
444	Curated, verified RSS feeds in the feedkit catalog
800K+	Quality articles collected across 115 domains
83.6%	Fact-checking pipeline effective accuracy (production)
