Fast summaries from URLs, files, and media. Web API server with a Preact frontend.
- Web API for programmatic summarization (POST JSON or SSE streaming).
- Preact frontend with chat, slides, and history.
- YouTube slides: screenshots + OCR + transcript cards, timestamped seek.
- Media-aware summaries: auto-detect video/audio vs page content.
- Streaming Markdown + metrics + cache-aware status.
- URLs, files, and media: web pages, PDFs, images, audio/video, YouTube, TikTok, podcasts, RSS.
- Slide extraction for video sources (YouTube/direct media) with OCR + timestamped cards.
- Transcript-first media flow: published transcripts when available, then Groq/ONNX/whisper.cpp/AssemblyAI/Gemini/OpenAI/FAL transcription fallback when not.
- Streaming output with Markdown rendering, metrics, and cache-aware status.
- Local, paid, and free models: OpenAI-compatible local endpoints, paid providers, plus an OpenRouter free preset.
- Output modes: Markdown/text, JSON diagnostics, extract-only, metrics, timing, and cost estimates.
- Smart default: if content is shorter than the requested length, return it as-is.
HTTP API for programmatic summarization — POST a URL or text, get a JSON summary back.
```sh
export ANTHROPIC_API_KEY="sk-..."
node dist/esm/server/main.js
```

```sh
curl -X POST http://localhost:3000/v1/summarize \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "length": "short"}'
```

Docker:

```sh
docker build -t summarize-api . && docker run -p 3000:3000 --env-file .env summarize-api
```
Full docs: docs/api-server.md
- `POST /v1/summarize` — summarize a URL (JSON or SSE)
- `GET /v1/history` — list past summaries
- `POST /v1/chat` — chat about a summary
- `GET /v1/summarize/:id/slides` — get slides for a summary
- `GET /v1/slides/:sourceId/:index` — get a single slide image
- `GET /v1/me` — current account info
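As a sketch, a minimal TypeScript client for the non-streaming JSON mode might look like the following. The request fields (`url`, `length`) and the `Authorization: Bearer` header follow the docs above; the response shape is an assumption for illustration.

```typescript
// Sketch of a client call to POST /v1/summarize (JSON mode).
// The response field names are NOT guaranteed by this README.
type SummaryLength = "short" | "medium" | "long" | "xl" | "xxl";

function buildSummarizeRequest(token: string, url: string, length: SummaryLength = "short") {
  return {
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, length }),
  };
}

async function summarize(baseUrl: string, token: string, url: string): Promise<unknown> {
  const res = await fetch(`${baseUrl}/v1/summarize`, buildSummarizeRequest(token, url));
  if (!res.ok) throw new Error(`summarize failed: HTTP ${res.status}`);
  return res.json(); // exact fields depend on the server; see docs/api-server.md
}
```

For the SSE variant, the same endpoint streams `text/event-stream` instead of a single JSON body.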
- Dev: `pnpm -C apps/web dev` (Vite on port 5173, proxies `/v1` to the API on port 3000)
- Build: `pnpm -C apps/web build`
Use gateway-style ids: `<provider>/<model>`.
Examples:
- `openai/gpt-5-mini`
- `anthropic/claude-sonnet-4-5`
- `xai/grok-4-fast-non-reasoning`
- `google/gemini-3-flash`
- `zai/glm-4.7`
- `openrouter/openai/gpt-5-mini` (force OpenRouter)
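The id format splits on the first `/`: the prefix picks the provider and the remainder names the model, which is why `openrouter/openai/gpt-5-mini` routes `openai/gpt-5-mini` through OpenRouter. A sketch of that parsing rule, as a hypothetical helper (not the tool's actual code):

```typescript
// Illustrative parsing of gateway-style ids: the segment before the
// first "/" selects the provider; the remainder (which may itself
// contain "/") names the model.
function parseModelId(id: string): { provider: string; model: string } {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`expected <provider>/<model>, got: ${id}`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```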
The `length` parameter controls how much output we ask for (a guideline), not a hard cap.
- Presets: `short|medium|long|xl|xxl`
- Character targets: `1500`, `20k`, `20000`
- Optional hard cap: `maxOutputTokens` (e.g. `2000`, `2k`)
- Preset targets (source of truth: `src/core/prompts/summary-lengths.ts`):
  - short: target ~900 chars (range 600-1,200)
  - medium: target ~1,800 chars (range 1,200-2,500)
  - long: target ~4,200 chars (range 2,500-6,000)
  - xl: target ~9,000 chars (range 6,000-14,000)
  - xxl: target ~17,000 chars (range 14,000-22,000)
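The accepted values above (presets, plain numbers, `k`-suffixed numbers) can be normalized to a character target roughly like this. The preset numbers mirror the table above; the helper itself and its handling of `20k`-style syntax are illustrative assumptions, not the tool's implementation:

```typescript
// Illustrative normalization of the `length` parameter to a character
// target. Preset targets mirror the README table; "20k"-style parsing
// is an assumption about the accepted syntax.
const PRESET_TARGETS: Record<string, number> = {
  short: 900,
  medium: 1800,
  long: 4200,
  xl: 9000,
  xxl: 17000,
};

function lengthToChars(length: string): number {
  const preset = PRESET_TARGETS[length];
  if (preset !== undefined) return preset;
  const m = /^(\d+)(k?)$/i.exec(length.trim());
  if (!m) throw new Error(`unrecognized length: ${length}`);
  return Number(m[1]) * (m[2] ? 1000 : 1);
}
```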
Config location: `~/.summarize/config.json`
```json
{
  "accounts": [{ "name": "default", "token": "your-secret-token" }],
  "model": { "id": "openai/gpt-5-mini" },
  "env": { "OPENAI_API_KEY": "sk-..." }
}
```

Also supported:
- `model: { "mode": "auto" }` (automatic model selection + fallback)
- `model.rules` (customize candidates / ordering)
- `models` (define presets selectable via the `model` parameter)
- `env` (generic env var defaults; process env still wins)
- `apiKeys` (legacy shortcut, mapped to env names; prefer `env` for new configs)
- `cache.media` (media download cache: TTL 7 days, 2048 MB cap by default)
- `media.videoMode`: `"auto" | "transcript" | "understand"`
- `slides.enabled` / `slides.max` / `slides.ocr` / `slides.dir`
- `openai.useChatCompletions: true` (force OpenAI-compatible chat completions)
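The `env` block supplies defaults only: a variable already set in the process environment takes precedence over the same key in `config.json`. A minimal sketch of that precedence rule (the helper name is hypothetical):

```typescript
// Illustrative precedence rule: a value present in the process
// environment wins over a default from config.json's "env" block.
function resolveEnv(
  configEnv: Record<string, string>,
  processEnv: Record<string, string | undefined>
): Record<string, string | undefined> {
  const fromProcess = Object.fromEntries(
    Object.entries(processEnv).filter(([, v]) => v !== undefined)
  );
  return { ...configEnv, ...fromProcess };
}
```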
Set the key matching your chosen model:

- `OPENAI_API_KEY` (for `openai/...`)
- `NVIDIA_API_KEY` (for `nvidia/...`)
- `ANTHROPIC_API_KEY` (for `anthropic/...`)
- `XAI_API_KEY` (for `xai/...`)
- `Z_AI_API_KEY` (for `zai/...`; supports `ZAI_API_KEY` alias)
- `GEMINI_API_KEY` (for `google/...`; also accepts `GOOGLE_GENERATIVE_AI_API_KEY` and `GOOGLE_API_KEY` as aliases)
- `OPENROUTER_API_KEY` (for `openrouter/...`)
Optional services:
- `FIRECRAWL_API_KEY` (website extraction fallback)
- `YT_DLP_PATH` (path to yt-dlp binary for audio extraction)
- `GROQ_API_KEY` (Groq Whisper transcription)
- `ASSEMBLYAI_API_KEY` (AssemblyAI transcription)
- `FAL_KEY` (FAL AI API key for audio transcription via Whisper)
- `APIFY_API_TOKEN` (YouTube transcript fallback)
Install these if you want media-heavy features:

- `ffmpeg`: required for slides and many local media/transcription flows
- `yt-dlp`: required for YouTube slide extraction and TikTok video transcription
- `tesseract`: optional, for slide OCR
- Optional cloud transcription providers (alternative to local whisper): `GROQ_API_KEY`, `ASSEMBLYAI_API_KEY`, `GEMINI_API_KEY`, `OPENAI_API_KEY`, `FAL_KEY`
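To sanity-check the local binaries before running media flows, something like the following works. The helper is illustrative (not part of the tool) and assumes a POSIX shell; on Windows, `where` would be needed instead of `command -v`:

```typescript
import { spawnSync } from "node:child_process";

// Illustrative preflight check: returns true when `name` resolves on PATH.
function hasBinary(name: string): boolean {
  const res = spawnSync("sh", ["-c", `command -v ${name}`], { stdio: "ignore" });
  return res.status === 0;
}

for (const bin of ["ffmpeg", "yt-dlp", "tesseract"]) {
  if (!hasBinary(bin)) console.warn(`missing optional dependency: ${bin}`);
}
```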
```sh
pnpm install
pnpm check
```

- YouTube handling: docs/youtube.md
- Media pipeline: docs/media.md
- API server: docs/api-server.md
- Deployment: docs/deployment.md
License: MIT