ArkNill/README.md

About

Backend engineer with 6+ years building data-intensive systems — SQL-native tuning across 6 heterogeneous databases (PostgreSQL, Oracle, MariaDB, MSSQL, DB2, Netezza), full-stack from backend to mobile, and now building AI service pipelines end-to-end.

I believe the next wave of useful software won't come from training bigger models, but from building better plumbing around them — reliable extraction, structured search, safety guardrails, and pipelines that run without cloud API keys.

Currently building QuartzUnit: composable Python tools that let AI agents collect, extract, search, and monitor data — entirely locally.


Currently Working On

  • Mirror Agent — local autonomous AI agent with custom ReAct pipeline, RAG (Qdrant + Neo4j), persistent memory, and 2-tier LLM fallback. No framework, no cloud. Runs entirely on-premise.
  • Forge — multi-source data refinement into LLM-ready warehouses. Any domain, any source — news, blogs, enterprise back-data, legacy DB migrations. Includes fact-checking at 83.6% production accuracy.
  • QubicAI — automated data mart generation + natural language → SQL. Turns Forge warehouses into queryable marts that LLMs can use directly.
  • QuartzUnit — personal data asset platform. Consumption/production knowledge archiving + AI labeling + mobile app. Building toward a personal data marketplace.
  • QuartzUnit OSS — 10 Python packages on PyPI extracted from the above projects. The composable tool layer that powers the ecosystem.

What I Build

Data & AI Pipelines

LLM pipelines that work identically online and offline — collection → extraction → RAG → fact-checking, all on local inference.

  • Autonomous AI agent with RAG (Qdrant + Neo4j), persistent memory, and custom ReAct pipeline
  • Automated fact-checking: claim extraction → tiered evidence search → verdict generation
  • 800K+ articles collected, indexed, and searchable across 115 domains
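
The three fact-checking stages above (claim extraction, tiered evidence search, verdict generation) can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the names `extract_claims`, `search_tiers`, `check_text`, and `Verdict` are hypothetical, the sentence-splitting extractor and the two-label verdict are toy stand-ins, and none of it reflects the actual production pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: an evidence tier is any function mapping a claim to snippets.
EvidenceTier = Callable[[str], List[str]]


@dataclass
class Verdict:
    claim: str
    label: str                 # "supported" or "unverified" in this toy version
    evidence: List[str]
    tier_used: Optional[int]   # index of the tier that supplied the evidence


def extract_claims(text: str) -> List[str]:
    """Toy claim extraction: treat each non-empty sentence as one claim."""
    return [s.strip() for s in text.split(".") if s.strip()]


def search_tiers(
    claim: str, tiers: Sequence[EvidenceTier], min_hits: int = 1
) -> Tuple[List[str], Optional[int]]:
    """Query tiers in order and stop at the first one with enough evidence."""
    for index, tier in enumerate(tiers):
        hits = tier(claim)
        if len(hits) >= min_hits:
            return hits, index
    return [], None


def check_text(text: str, tiers: Sequence[EvidenceTier]) -> List[Verdict]:
    """Run extraction, tiered search, and a trivial verdict rule per claim."""
    verdicts = []
    for claim in extract_claims(text):
        evidence, tier_used = search_tiers(claim, tiers)
        label = "supported" if evidence else "unverified"
        verdicts.append(Verdict(claim, label, evidence, tier_used))
    return verdicts
```

Ordering the tiers from cheapest to most expensive (for example a local index first, a slower search second) means later tiers are only queried when earlier ones come back empty.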

Python · vLLM · PostgreSQL · Qdrant · Neo4j · Redis

Full-stack & Data Engineering

6 years delivering DW/DM, BI/OLAP, and analytics to government and enterprise clients — often on air-gapped networks. Cross-DB migrations, query tuning, report automation.

  • LLM chatbot that turns a natural-language request into a BI dashboard (a 5-step UI flow reduced to 1 sentence)
  • Video analytics with PGVector semantic search + LLM-powered metadata structuring
  • Mobile + web: Next.js dashboards, Kotlin/Compose Android, React Native

FastAPI · Spring Boot · Next.js · Kotlin · JPA · React Native


QuartzUnit — LLM-native Tool Ecosystem

Most AI agent frameworks give you an orchestrator but expect you to figure out the I/O yourself. QuartzUnit is the I/O layer — 10 focused tools that each solve one problem well, designed to be chained together.

Every tool ships with three interfaces: CLI for scripting, async Python API for integration, and MCP server for AI agent consumption. Zero cloud dependency — everything runs on your machine.

    Collect          Extract          Search         Monitor        Guard
    ─────────        ─────────        ─────────      ─────────      ─────────
    feedkit    ───→  markgrab   ───→  embgrep  ───→  diffgrab       agent-action-policy
    (RSS/Atom)       (HTML/PDF/       (semantic)     (web change    agent-loop-guard
                      YouTube)                        tracking)     llm-degen-guard

                     docpick
                     (OCR→JSON)

    browsegrab ───→  snapgrab
    (browser         (screenshot)
     agent)

Why I built this

I needed these tools for my own data pipelines — collecting news from 444 RSS feeds, extracting article content for fact-checking, searching across collected documents by meaning, and monitoring pages for changes. Every tool started as a module in a private project, then got extracted into a standalone package when it became useful on its own.

The guard libraries (loop detection, degeneration detection, action policies) came from running autonomous agents that would occasionally get stuck in loops, produce garbage output, or try to access things they shouldn't. Rather than adding ad-hoc checks, I built proper detectors that any agent framework can use.
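
As a rough picture of what loop detection can look like, here is a minimal sketch that flags an agent when the same action repeats too often within a recent window. This is an illustration under stated assumptions and not the API of agent-loop-guard: the class name `RepeatLoopDetector`, its parameters, and the exact-match rule are all hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Hashable, Tuple


class RepeatLoopDetector:
    """Flags a loop when one action repeats too often in a recent window."""

    def __init__(self, window: int = 10, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        # deque(maxlen=...) silently drops the oldest action once the window is full.
        self._recent: Deque[Hashable] = deque(maxlen=window)

    def record(self, tool: str, args: Tuple) -> bool:
        """Record one action; return True if it now looks like a loop."""
        action = (tool, args)  # args must be hashable, hence a tuple
        self._recent.append(action)
        return Counter(self._recent)[action] >= self.max_repeats


detector = RepeatLoopDetector(window=5, max_repeats=3)
for _ in range(3):
    stuck = detector.record("fetch", ("https://example.com",))
# After the third identical call inside the window, `stuck` is True.
```

An agent runtime could call `record` before each tool invocation and abort or re-plan when it returns True. Exact-match counting misses loops that vary their arguments slightly, so a real detector would likely need fuzzier comparison.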

Packages

| Package | What it does | PyPI | Tests |
|---|---|---|---|
| markgrab | URL → LLM-ready markdown (HTML, YouTube, PDF, DOCX) | PyPI | 114 |
| docpick | Schema-driven document OCR → structured JSON | PyPI | 217 |
| feedkit | RSS/Atom collection with 444 curated feeds | PyPI | 34 |
| browsegrab | Token-efficient browser agent for local LLMs | PyPI | 200 |
| snapgrab | URL → screenshot + metadata (Claude Vision optimized) | PyPI | 29 |
| diffgrab | Web page change tracking with structured diffs | PyPI | 89 |
| embgrep | Local semantic search (embedding-powered grep) | PyPI | 74 |
| llm-degen-guard | LLM output degeneration detector | PyPI | 55 |
| agent-loop-guard | Agent infinite loop detection | PyPI | 78 |
| agent-action-policy | Declarative action policies for AI agents | PyPI | 69 |

959 tests across 10 packages · Open-source (MIT / Apache-2.0) · Korean + English documentation

Showcase

End-to-end examples showing how QuartzUnit packages chain together:

| Project | Pipeline | What it does |
|---|---|---|
| newswatch | feedkit → markgrab → embgrep → diffgrab | Collect RSS feeds, extract articles, build semantic search index, track changes |
| watchdeck | diffgrab → markgrab → snapgrab → guard trio | Monitor web pages for changes with visual diffs and safety guards |

Case Studies

Architecture deep-dives with quantifiable results and honest failure analysis:

| Case Study | Domain | Key Result |
|---|---|---|
| Fact-Checking Pipeline | NLP / Verification | 83.6% accuracy through 14 iterations, 127K claims |
| Autonomous AI Agent | Agent Systems | Zero-framework ReAct + RAG, 1,235 tests, 101s → 3s |
| Data Mart Automation | Data Engineering | 47.4M rows, 90% Text2SQL with zero manual config |

By the Numbers

6 Database engines tuned in production (PostgreSQL, Oracle, MariaDB, MSSQL, DB2, Netezza)
10 Open-source Python packages on PyPI
959 Tests across the QuartzUnit ecosystem
444 Curated, verified RSS feeds in the feedkit catalog
800K+ Quality articles collected across 115 domains
83.6% Fact-checking pipeline effective accuracy (production)

Tech Stack

Languages

Python · Java · Kotlin · TypeScript · SQL

Backend & AI

FastAPI · Spring Boot · vLLM · Qdrant · Neo4j

Database

PostgreSQL · Oracle · MariaDB · MSSQL · DB2 · Redis

Frontend & Mobile

Next.js · React Native · Jetpack Compose

Infra

Docker · GitHub Actions · Cloudflare · Playwright

Pinned

  1. QuartzUnit/markgrab (Python): Universal web content extraction, any URL to LLM-ready markdown
  2. QuartzUnit/browsegrab (Python): Token-efficient browser agent for local LLMs. Playwright + accessibility tree + MarkGrab, MCP native.
  3. QuartzUnit/docpick (Python): Lightweight OCR + Local LLM → Schema-based Structured JSON Extraction
  4. QuartzUnit/newswatch (Python): News monitoring pipeline, a feedkit + markgrab + embgrep + diffgrab showcase