✨ Never Build Slides from Scratch Again ✨

| 📄 Universal File Support | 🎯 RAG-Powered Precision | 🎨 Custom Styling | ⚡ Lightning Speed |

Turns your research papers, reports, and documents into professional slides & posters in minutes.
- **📄 Universal Document Support**: Seamlessly process PDF, Word, Excel, PowerPoint, Markdown, and other file formats, even simultaneously.
- **🎯 Comprehensive Content Extraction**: A RAG-powered mechanism ensures every critical insight, figure, and data point is captured with precision.
- **🔗 Source-Linked Accuracy**: Maintains direct traceability between generated content and original sources, eliminating information drift.
- **🎨 Custom Styling Freedom**: Choose from professional built-in themes or describe your vision in natural language for custom styling.
- **⚡ Lightning-Fast Generation**: Instant preview mode enables rapid experimentation and real-time refinements.
- **💾 Seamless Session Management**: An advanced checkpoint system preserves all progress: pause, resume, or switch themes instantly without loss.
- **✨ Professional-Grade Visuals**: Delivers polished, presentation-ready slides and posters with publication-quality design standards.
```bash
# One command to generate slides from a paper
python -m paper2slides --input paper.pdf --output slides --style doraemon --length medium --fast --parallel 2
```

- [2025.12.09] Added parallel slide generation (`--parallel`) for faster processing
- [2025.12.08] Paper2Slides is now open source!
*(Style preview images: doraemon | academic | custom)*

✨ Multiple styles available; simply modify the `--style` parameter.
Examples from DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
💡 Custom Style Example: Totoro Theme
```bash
--style "Studio Ghibli anime style with warm whimsical aesthetic. Use soft watercolor Morandi tones with light cream background, muted sage green and dusty pink accents. Totoro character can appear as a friendly guide relating to the content, with nature elements like soft clouds or leaves."
```

```bash
# Clone repository
git clone https://github.com/HKUDS/Paper2Slides.git
cd Paper2Slides

# Create and activate conda environment
conda create -n paper2slides python=3.12 -y
conda activate paper2slides

# Install dependencies
pip install -r requirements.txt
```

Note
Create a `.env` file in the `paper2slides/` directory with your API keys. Refer to `paper2slides/.env.example` for the required variables.
```bash
# Basic usage - generate slides from a paper
python -m paper2slides --input paper.pdf --output slides --length medium

# Generate poster with custom style
python -m paper2slides --input paper.pdf --output poster --style "minimalist with blue theme" --density medium

# Fast mode
python -m paper2slides --input paper.pdf --output slides --fast

# Enable parallel generation (2 workers by default)
python -m paper2slides --input paper.pdf --output slides --parallel 2

# List all processed outputs
python -m paper2slides --list
```

CLI Options:
| Option | Description | Default |
|---|---|---|
| `--input`, `-i` | Input file(s) or directory | Required |
| `--output` | Output type: `slides` or `poster` | `poster` |
| `--content` | Content type: `paper` or `general` | `paper` |
| `--style` | Style: `academic`, `doraemon`, or custom text | `doraemon` |
| `--length` | Slides length: `short`, `medium`, `long` | `short` |
| `--density` | Poster density: `sparse`, `medium`, `dense` | `medium` |
| `--fast` | Fast mode: skip RAG indexing | `false` |
| `--parallel` | Enable parallel slide generation: `--parallel` uses 2 workers, `--parallel N` uses N workers | 1 (sequential without this option) |
| `--from-stage` | Force restart from stage: `rag`, `summary`, `plan`, `generate` | Auto-detect |
| `--debug` | Enable debug logging | `false` |
💾 Checkpoint & Resume:
Paper2Slides intelligently saves your progress at every key stage, allowing you to:
| Scenario | Command |
|---|---|
| Resume after interruption | Just run the same command again; it auto-detects and continues |
| Change style only | Add `--from-stage plan` to skip re-parsing |
| Regenerate images | Add `--from-stage generate` to keep the same plan |
| Full restart | Add `--from-stage rag` to start from scratch |
Tip
Checkpoints are auto-saved. Just run the same command to resume. Use `--from-stage` only to force a restart from a specific stage.
Launch both backend and frontend services:
```bash
./scripts/start.sh
```

Or start services independently:

```bash
# Terminal 1: Start backend API
./scripts/start_backend.sh

# Terminal 2: Start frontend
./scripts/start_frontend.sh
```

Access the web interface at http://localhost:5173 (default).
Paper2Slides transforms documents through a 4-stage pipeline designed for reliability and efficiency:
| Stage | Description | Checkpoint | Output |
|---|---|---|---|
| 🔍 RAG | Parse documents and construct an intelligent retrieval index using RAG | `checkpoint_rag.json` | Searchable knowledge base |
| 📊 Analysis | Extract document structure; identify key figures, tables, and content hierarchy | `checkpoint_summary.json` | Structured content map |
| 📋 Planning | Generate an optimized content layout and slide/poster organization strategy | `checkpoint_plan.json` | Presentation blueprint |
| 🎨 Creation | Render final high-quality slides and poster visuals | Output directory | Polished presentation materials |
Each stage automatically saves progress checkpoints, enabling seamless resumption from any point if the process is interrupted, with no need to start over.
| Mode | Processing Pipeline | Use Cases |
|---|---|---|
| Normal | Complete RAG indexing with deep document analysis | Complex research papers, lengthy documents, multi-section content |
| Fast | Skip RAG indexing, direct LLM query | Short documents, instant previews, quick revisions |
Use `--fast` when:
- The document (text + figures) is short enough to fit in the LLM context
- You need a quick preview or rapid iteration
- You don't want to wait for RAG indexing
Use normal mode (default) when:
- The document is long or has many figures
- There are multiple files to process together
- You need retrieval for better context selection
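A rough way to make the "fits in context" call is to estimate the token count from the document's raw size. The sketch below assumes ~4 bytes of English text per token and reserves 25% of the window for the prompt and the model's output; both numbers are illustrative rules of thumb, not Paper2Slides internals:

```python
def fits_in_context(doc_size_bytes: int, context_tokens: int = 128_000) -> bool:
    """Heuristic for choosing --fast: estimate tokens from raw byte size.

    Assumes ~4 bytes per token (typical for English prose) and keeps
    25% of the context window free for the prompt and generated output.
    """
    estimated_tokens = doc_size_bytes / 4
    return estimated_tokens < context_tokens * 0.75
```

By this estimate a 200 KB paper is a fast-mode candidate, while a 5 MB multi-document batch is better served by the normal RAG pipeline.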
```
outputs/
├── <project_name>/
│   ├── <content_type>/                  # paper or general
│   │   ├── <mode>/                      # fast or normal
│   │   │   ├── checkpoint_rag.json      # RAG query results & parsed file paths
│   │   │   ├── checkpoint_summary.json  # Extracted content, figures, tables
│   │   │   ├── summary.md               # Human-readable summary
│   │   │   └── <config_name>/           # e.g., slides_doraemon_medium
│   │   │       ├── state.json           # Current pipeline state
│   │   │       ├── checkpoint_plan.json # Content plan for slides/poster
│   │   │       └── <timestamp>/         # Generated outputs
│   │   │           ├── slide_01.png
│   │   │           ├── slide_02.png
│   │   │           ├── ...
│   │   │           └── slides.pdf       # Final PDF output
│   │   └── rag_output/                  # RAG index storage
│   └── ...
└── ...
```

Checkpoint Files:
| File | Description | Reusable When |
|---|---|---|
| `checkpoint_rag.json` | Parsed document content | Same input files |
| `checkpoint_summary.json` | Figures, tables, structure | Same input files |
| `checkpoint_plan.json` | Content layout plan | Same style & length/density |
| Style | Description |
|---|---|
| `academic` | Clean, professional academic presentation style |
| `doraemon` | Colorful, friendly style with illustrations |
| `custom` | Any text description for an LLM-generated style |
- Set `IMAGE_GEN_PROVIDER` in `paper2slides/.env` to choose the backend:
  - `openrouter` (default): uses `IMAGE_GEN_API_KEY`, `IMAGE_GEN_BASE_URL`, and `IMAGE_GEN_MODEL` (default `google/gemini-3-pro-image-preview`)
  - `google`: uses the official Gemini API at `GOOGLE_GENAI_BASE_URL` (default `https://generativelanguage.googleapis.com/v1beta`), `IMAGE_GEN_API_KEY`, `IMAGE_GEN_MODEL` (default `models/gemini-3-pro-image-preview`, must be image-capable), and `IMAGE_GEN_RESPONSE_MIME_TYPE` (default `text/plain`; use text types if your model does not support image responses)
- Reference figures are sent as inline data when supported (Google) or as `image_url` attachments (OpenRouter).
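Putting the variables above together, a `paper2slides/.env` for the default OpenRouter backend might look like this sketch. Values are placeholders, the base URL shown is the standard OpenRouter endpoint, and `paper2slides/.env.example` remains the authoritative list of variables:

```
# Image generation backend: openrouter (default) or google
IMAGE_GEN_PROVIDER=openrouter
IMAGE_GEN_API_KEY=your-api-key-here
IMAGE_GEN_BASE_URL=https://openrouter.ai/api/v1
IMAGE_GEN_MODEL=google/gemini-3-pro-image-preview

# For IMAGE_GEN_PROVIDER=google, the documented defaults apply instead:
# GOOGLE_GENAI_BASE_URL=https://generativelanguage.googleapis.com/v1beta
# IMAGE_GEN_RESPONSE_MIME_TYPE=text/plain
```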
Tip
By default Paper2Slides uses `gemini-3-pro-image-preview` (OpenRouter) for image generation; you can switch to an image-capable Google Gemini model (e.g., `models/gemini-1.5-flash`) via `IMAGE_GEN_PROVIDER=google`.

Key findings:
- Mood Keywords: Words like "warm", "elegant", "vibrant" strongly influence the overall color palette
- Layout vs Style: Fine-grained layout instructions ground well; fine-grained element styling does not
- Prompt Length: Simple prompts generally outperform detailed ones
- Multi-slide Generation: Native multi-image output is story-like; for consistent slides, we use iterative single-image generation
| Module | Description |
|---|---|
| `paper2slides/core/` | Pipeline orchestration, 4-stage execution |
| `paper2slides/raganything/` | Document parsing & RAG indexing |
| `paper2slides/summary/` | Content extraction: figures, tables, paper structure |
| `paper2slides/generator/` | Content planning & image generation |
| `api/` | FastAPI backend for web interface |
| `frontend/` | React frontend (Vite + TailwindCSS) |
Click to expand full project structure
```
Paper2Slides/
├── paper2slides/                  # Core library
│   ├── main.py                    # CLI entry point
│   ├── core/
│   │   ├── pipeline.py            # Main pipeline orchestration
│   │   ├── state.py               # Checkpoint state management
│   │   └── stages/
│   │       ├── rag_stage.py       # Stage 1: Parse & index
│   │       ├── summary_stage.py   # Stage 2: Extract content
│   │       ├── plan_stage.py      # Stage 3: Plan layout
│   │       └── generate_stage.py  # Stage 4: Generate images
│   ├── raganything/
│   │   ├── raganything.py         # RAG processor
│   │   └── parser.py              # Document parser
│   ├── summary/
│   │   ├── paper.py               # Paper structure extraction
│   │   └── extractors/            # Figure/table extractors
│   ├── generator/
│   │   ├── content_planner.py     # Slide/poster planning
│   │   ├── image_generator.py     # Image generation
│   │   └── prompts/               # LLM prompt templates
│   └── utils/                     # Utilities
├── api/server.py                  # FastAPI backend
├── frontend/src/                  # React frontend
└── scripts/                       # Shell scripts (start/stop)
```

- LightRAG: Graph-Empowered RAG
- RAG-Anything: Multi-Modal RAG
- VideoRAG: RAG with Extremely-Long Videos
🌟 Found Paper2Slides helpful? Star us on GitHub!

🚀 Turn any document into professional presentations in minutes!