Quick Start · Core Modules · FAQ
中文 · 日本語 · Español · Français · العربية
Massive Document Knowledge Q&A • Interactive Learning Visualization
Knowledge Reinforcement • Deep Research & Idea Generation
[2025.12.30] We release DeepTutor v0.1 ✨
- Smart Knowledge Base: Upload textbooks, research papers, technical manuals, and domain-specific documents to build a comprehensive AI-powered knowledge repository for instant access.
- Multi-Agent Problem Solving: Dual-loop reasoning architecture with RAG, web search, paper search, and code execution, delivering step-by-step solutions with precise citations.
- Knowledge Simplification & Explanations: Transform complex concepts and algorithms into easy-to-understand visual aids, detailed step-by-step breakdowns, and engaging interactive demonstrations.
- Personalized Q&A: Context-aware conversations that adapt to your learning progress, with interactive pages and session-based knowledge tracking.
- Intelligent Exercise Creation: Generate targeted quizzes, practice problems, and customized assessments tailored to your current knowledge level and specific learning objectives.
- Authentic Exam Simulation: Upload reference exams to generate practice questions that match the original style, format, and difficulty, giving you realistic preparation for the actual test.
- Comprehensive Research & Literature Review: Conduct in-depth topic exploration with systematic analysis; identify patterns, connect related concepts across disciplines, and synthesize existing research findings.
- Novel Insight Discovery: Generate structured learning materials, uncover knowledge gaps, and identify promising new research directions through intelligent cross-domain knowledge synthesis.
Document Q&A and Step-by-Step Problem Solving | Interactive AI Learning with Knowledge Visual Explanations |
Custom Questions | Mimic Questions |
Deep Research | Automated IdeaGen | Interactive IdeaGen |
Personal Knowledge Base | Personal Notebook |
Use DeepTutor in Dark Mode!
- Intuitive Interaction: Simple bidirectional query-response flow.
- Structured Output: Response generation that organizes complex information into actionable outputs.
- Problem Solving & Assessment: Step-by-step problem solving and custom assessment generation.
- Research & Learning: Deep Research for topic exploration and Guided Learning with visualization.
- Idea Generation: Automated and interactive concept development with multi-source insights.
- Information Retrieval: RAG hybrid retrieval, real-time web search, and academic paper databases.
- Processing & Analysis: Python code execution, query item lookup, and PDF parsing for document analysis.
- Knowledge Graph: Entity-relation mapping for semantic connections and knowledge discovery.
- Vector Store: Embedding-based semantic search for intelligent content retrieval.
- Memory System: Session state management and citation tracking for contextual continuity.
Star the repo to follow our future updates!
- Project-based learning
- Deep coding from idea generation
- Personalized memory
```bash
# Clone the repository
git clone https://github.com/HKUDS/DeepTutor.git
cd DeepTutor

# Set up a virtual environment (choose one option)
# Option A: Using conda (recommended)
conda create -n aitutor python=3.10
conda activate aitutor

# Option B: Using venv
python -m venv venv
# On Windows: venv\Scripts\activate
# On macOS/Linux: source venv/bin/activate
```

Run the automated installation script to install all required dependencies:
```bash
# Recommended: automated installation
bash scripts/install_all.sh

# Alternative: manual installation
python scripts/install_all.py

# Or install dependencies manually
pip install -r requirements.txt
npm install
```

Create a `.env` file in the project root directory based on `.env.example`:
```bash
# Copy from the .env.example template (if it exists)
cp .env.example .env
# Then edit the .env file with your API keys
```

By default, the application uses:
- Backend (FastAPI): port 8001
- Frontend (Next.js): port 3782
You can modify these ports in `config/main.yaml` by editing the `server.backend_port` and `server.frontend_port` values.
LLM Configuration: Agent settings for `temperature` and `max_tokens` are centralized in `config/agents.yaml`. Each module (guide, solve, research, question, ideagen, co_writer) has customizable parameters. See the Configuration Documentation for details.
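The per-module layout of `config/agents.yaml` can be sketched as below. Only the `research` values are documented elsewhere in this README; the other modules' numbers are placeholders, so check the shipped file for the authoritative schema:

```yaml
# config/agents.yaml (illustrative sketch; values other than `research`
# are placeholders, not the project's actual defaults)
solve:
  temperature: 0.3
  max_tokens: 8000
research:
  temperature: 0.5
  max_tokens: 12000
guide:
  temperature: 0.7
  max_tokens: 6000
```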
Experience the system quickly with two pre-built knowledge bases and a collection of challenging questions with usage examples.
Research Papers Collection: 5 papers (20-50 pages each)
A curated collection of 5 research papers from our lab covering RAG and Agent fields. This demo showcases broad knowledge coverage for research scenarios.
Used Papers: AI-Researcher | AutoAgent | RAG-Anything | LightRAG | VideoRAG
Data Science Textbook: 8 chapters, 296 pages
A comprehensive data science textbook with challenging content. This demo showcases deep knowledge depth for learning scenarios.
Book Link: Deep Representation Learning Book
Download and Setup:
- Download the demo package: Google Drive
- Extract the compressed files directly into the `data/` directory
- Knowledge bases will be automatically available once you start the system
Note: Our demo knowledge bases use `text-embedding-3-large` with `dimensions = 3072`. Ensure your embedding model uses matching dimensions (3072) for compatibility.
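If you swap embedding models, a quick sanity check like the following (a hypothetical helper, not part of DeepTutor) can catch dimension mismatches before indexing:

```python
def check_embedding_dim(vector, expected=3072):
    """Raise if an embedding's length doesn't match the knowledge base's
    configured dimensionality (3072 for the text-embedding-3-large demos)."""
    if len(vector) != expected:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, expected {expected}; "
            "re-embed with matching settings or rebuild the knowledge base."
        )
    return True

# A 3072-dimensional vector passes the check
check_embedding_dim([0.0] * 3072)
```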
```bash
# Activate the virtual environment
conda activate aitutor  # or: source venv/bin/activate

# Start the web interface (frontend + backend)
python scripts/start_web.py

# Alternative: CLI interface only
python scripts/start.py

# Stop the service: Ctrl+C
```

Create custom knowledge bases through the web interface, which supports multiple file formats.
- Access Knowledge Base: Navigate to http://localhost:{frontend_port}/knowledge
- Create New Base: Click "New Knowledge Base"
- Configure Settings: Enter a unique name for your knowledge base
- Upload Content: Add single or multiple files for batch processing
- Monitor Progress: Track processing status in the terminal running `start_web.py`
- Large files may take several minutes to complete
- The knowledge base becomes available once processing finishes
Tips: Large files may require several minutes to process. Multiple files can be uploaded simultaneously for efficient batch processing.
| Service | URL | Description |
|---|---|---|
| Frontend | http://localhost:{frontend_port} | Main web interface |
| API Docs | http://localhost:{backend_port}/docs | Interactive API documentation |
| Health | http://localhost:{backend_port}/api/v1/knowledge/health | System health check |
All user content and system data are stored in the data/ directory:
```
data/
├── knowledge_bases/         # Knowledge base storage
└── user/                    # User activity data
    ├── solve/               # Problem solving results and artifacts
    ├── question/            # Generated questions
    ├── research/            # Research reports and cache
    ├── co-writer/           # Interactive IdeaGen documents and audio files
    ├── notebook/            # Notebook records and metadata
    ├── guide/               # Guided learning sessions
    ├── logs/                # System logs
    └── run_code_workspace/  # Code execution workspace
```

Results are automatically saved during all activities. Directories are created automatically as needed.
Smart Solver
Intelligent problem-solving system built on a dual-loop (Analysis Loop + Solve Loop) architecture, supporting multi-mode reasoning and dynamic knowledge retrieval.
Core Features
| Feature | Description |
|---|---|
| Dual-Loop Architecture | Analysis Loop: InvestigateAgent → NoteAgent; Solve Loop: PlanAgent → ManagerAgent → SolveAgent → CheckAgent → Format |
| Multi-Agent Collaboration | Specialized agents: InvestigateAgent, NoteAgent, PlanAgent, ManagerAgent, SolveAgent, CheckAgent |
| Real-time Streaming | WebSocket transmission with live reasoning process display |
| Tool Integration | RAG (naive/hybrid), Web Search, Query Item, Code Execution |
| Persistent Memory | JSON-based memory files for context preservation |
| Citation Management | Structured citations with reference tracking |
Usage
- Visit http://localhost:{frontend_port}/solver
- Select a knowledge base
- Enter your question, click "Solve"
- Watch the real-time reasoning process and final answer
Python API
```python
import asyncio
from src.agents.solve import MainSolver

async def main():
    solver = MainSolver(kb_name="ai_textbook")
    result = await solver.solve(
        question="Calculate the linear convolution of x=[1,2,3] and h=[4,5]",
        mode="auto"
    )
    print(result['formatted_solution'])

asyncio.run(main())
```

Output Location
```
data/user/solve/solve_YYYYMMDD_HHMMSS/
├── investigate_memory.json   # Analysis Loop memory
├── solve_chain.json          # Solve Loop steps & tool records
├── citation_memory.json      # Citation management
├── final_answer.md           # Final solution (Markdown)
├── performance_report.json   # Performance monitoring
└── artifacts/                # Code execution outputs
```

Question Generator
Dual-mode question generation system supporting custom knowledge-based generation and reference exam paper mimicking with automatic validation.
Core Features
| Feature | Description |
|---|---|
| Custom Mode | Background Knowledge → Question Planning → Generation → Single-Pass Validation. Analyzes question relevance without rejection logic |
| Mimic Mode | PDF Upload → MinerU Parsing → Question Extraction → Style Mimicking. Generates questions based on the reference exam's structure |
| ReAct Engine | QuestionGenerationAgent with autonomous decision-making (think → act → observe) |
| Validation Analysis | Single-pass relevance analysis with kb_coverage and extension_points |
| Question Types | Multiple choice, fill-in-the-blank, calculation, written response, etc. |
| Batch Generation | Parallel processing with progress tracking |
| Complete Persistence | All intermediate files saved (background knowledge, plan, individual results) |
| Timestamped Output | Mimic mode creates batch folders: mimic_YYYYMMDD_HHMMSS_{pdf_name}/ |
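The timestamped batch-folder convention above can be reproduced with a few lines of Python (an illustrative helper, not the project's actual code):

```python
from datetime import datetime
from pathlib import Path

def mimic_batch_dir(pdf_name: str, root: str = "data/user/question/mimic_papers") -> Path:
    """Build a batch folder path of the form mimic_YYYYMMDD_HHMMSS_{pdf_name}/."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(root) / f"mimic_{stamp}_{pdf_name}"

print(mimic_batch_dir("midterm"))  # e.g. data/user/question/mimic_papers/mimic_20241209_120000_midterm
```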
Usage
Custom Mode:
- Visit http://localhost:{frontend_port}/question
- Fill in requirements (topic, difficulty, question type, count)
- Click "Generate Questions"
- View generated questions with validation reports
Mimic Mode:
- Visit http://localhost:{frontend_port}/question
- Switch to "Mimic Exam" tab
- Upload PDF or provide parsed exam directory
- Wait for parsing → extraction → generation
- View generated questions alongside original references
Python API
Custom Mode - Full Pipeline:
```python
import asyncio
from src.agents.question import AgentCoordinator

async def main():
    coordinator = AgentCoordinator(
        kb_name="ai_textbook",
        output_dir="data/user/question"
    )
    # Generate multiple questions from a text requirement
    result = await coordinator.generate_questions_custom(
        requirement_text="Generate 3 medium-difficulty questions about deep learning basics",
        difficulty="medium",
        question_type="choice",
        count=3
    )
    print(f"✅ Generated {result['completed']}/{result['requested']} questions")
    for q in result['results']:
        print(f"- Relevance: {q['validation']['relevance']}")

asyncio.run(main())
```

Mimic Mode - PDF Upload:
```python
from src.agents.question.tools.exam_mimic import mimic_exam_questions

result = await mimic_exam_questions(
    pdf_path="exams/midterm.pdf",
    kb_name="calculus",
    output_dir="data/user/question/mimic_papers",
    max_questions=5
)
print(f"✅ Generated {result['successful_generations']} questions")
print(f"Output: {result['output_file']}")
```

Output Location
Custom Mode:
```
data/user/question/custom_YYYYMMDD_HHMMSS/
├── background_knowledge.json   # RAG retrieval results
├── question_plan.json          # Question planning
├── question_1_result.json      # Individual question results
├── question_2_result.json
└── ...
```

Mimic Mode:
```
data/user/question/mimic_papers/
└── mimic_YYYYMMDD_HHMMSS_{pdf_name}/
    ├── {pdf_name}.pdf                                        # Original PDF
    ├── auto/{pdf_name}.md                                    # MinerU parsed markdown
    ├── {pdf_name}_YYYYMMDD_HHMMSS_questions.json             # Extracted questions
    └── {pdf_name}_YYYYMMDD_HHMMSS_generated_questions.json   # Generated questions
```

Guided Learning
Personalized learning system based on notebook content, automatically generating progressive learning paths through interactive pages and smart Q&A.
Core Features
| Feature | Description |
|---|---|
| Multi-Agent Architecture | LocateAgent: identifies 3-5 progressive knowledge points; InteractiveAgent: converts them to visual HTML pages; ChatAgent: provides contextual Q&A; SummaryAgent: generates learning summaries |
| Smart Knowledge Location | Automatic analysis of notebook content |
| Interactive Pages | HTML page generation with bug fixing |
| Smart Q&A | Context-aware answers with explanations |
| Progress Tracking | Real-time status with session persistence |
| Cross-Notebook Support | Select records from multiple notebooks |
Usage Flow
- Select Notebook(s) → Choose one or multiple notebooks (cross-notebook selection supported)
- Generate Learning Plan → LocateAgent identifies 3-5 core knowledge points
- Start Learning → InteractiveAgent generates HTML visualization
- Learning Interaction → Ask questions, click "Next" to proceed
- Complete Learning → SummaryAgent generates learning summary
Output Location
```
data/user/guide/
└── session_{session_id}.json   # Complete session state, knowledge points, chat history
```

Interactive IdeaGen (Co-Writer)
Intelligent Markdown editor supporting AI-assisted writing, auto-annotation, and TTS narration.
Core Features
| Feature | Description |
|---|---|
| Rich Text Editing | Full Markdown syntax support with live preview |
| EditAgent | Rewrite: custom instructions with optional RAG/web context; Shorten: compress while preserving key information; Expand: add details and context |
| Auto-Annotation | Automatic key content identification and marking |
| NarratorAgent | Script generation, TTS audio, multiple voices (Cherry, Stella, Annie, Cally, Eva, Bella) |
| Context Enhancement | Optional RAG or web search for additional context |
| Multi-Format Export | Markdown, PDF, etc. |
Usage
- Visit http://localhost:{frontend_port}/co_writer
- Enter or paste text in the editor
- Use AI features: Rewrite, Shorten, Expand, Auto Mark, Narrate
- Export to Markdown or PDF
Output Location
```
data/user/co-writer/
├── audio/                               # TTS audio files
│   └── {operation_id}.mp3
├── tool_calls/                          # Tool call history
│   └── {operation_id}_{tool_type}.json
└── history.json                         # Edit history
```

Deep Research
DR-in-KG (Deep Research in Knowledge Graph): a systematic deep research system based on a Dynamic Topic Queue architecture, enabling multi-agent collaboration across three phases: Planning → Researching → Reporting.
Core Features
| Feature | Description |
|---|---|
| Three-Phase Architecture | Phase 1 (Planning): RephraseAgent (topic optimization) + DecomposeAgent (subtopic decomposition); Phase 2 (Researching): ManagerAgent (queue scheduling) + ResearchAgent (research decisions) + NoteAgent (info compression); Phase 3 (Reporting): deduplication → three-level outline generation → report writing with citations |
| Dynamic Topic Queue | Core scheduling system with TopicBlock state management: PENDING → RESEARCHING → COMPLETED/FAILED. Supports dynamic topic discovery during research |
| Execution Modes | Series Mode: sequential topic processing; Parallel Mode: concurrent multi-topic processing with AsyncCitationManagerWrapper for thread-safe operations |
| Multi-Tool Integration | RAG (hybrid/naive), Query Item (entity lookup), Paper Search, Web Search, Code Execution, dynamically selected by ResearchAgent |
| Unified Citation System | Centralized CitationManager as the single source of truth for citation ID generation, ref_number mapping, and deduplication |
| Preset Configurations | quick: fast research (1-2 subtopics, 1-2 iterations); medium/standard: balanced depth (5 subtopics, 4 iterations); deep: thorough research (8 subtopics, 7 iterations); auto: agent autonomously decides depth |
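The TopicBlock lifecycle (PENDING → RESEARCHING → COMPLETED/FAILED) can be sketched as a small state machine; the class and transition table below are illustrative, not DeepTutor's actual implementation:

```python
from enum import Enum

class TopicState(Enum):
    PENDING = "pending"
    RESEARCHING = "researching"
    COMPLETED = "completed"
    FAILED = "failed"

# Legal transitions in the topic-block lifecycle
TRANSITIONS = {
    TopicState.PENDING: {TopicState.RESEARCHING},
    TopicState.RESEARCHING: {TopicState.COMPLETED, TopicState.FAILED},
    TopicState.COMPLETED: set(),   # terminal
    TopicState.FAILED: set(),      # terminal
}

def advance(state: TopicState, target: TopicState) -> TopicState:
    """Move a topic block to `target`, rejecting illegal jumps."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"Illegal transition {state.name} -> {target.name}")
    return target
```

Dynamic topic discovery then amounts to appending new PENDING blocks to the queue while other blocks are mid-flight.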
Citation System Architecture
The citation system follows a centralized design with CitationManager as the single source of truth:
```
CitationManager
├── ID Generation: PLAN-XX (planning) / CIT-X-XX (research)
├── ref_number Map: citation_id → ref_number
└── Deduplication: papers only

Consumers:
• DecomposeAgent / ResearchAgent / NoteAgent → obtain citation IDs
• ReportingAgent → inline [N] citations
• References Section → numbered reference entries
```

| Component | Description |
|---|---|
| ID Format | PLAN-XX (planning stage RAG queries) + CIT-X-XX (research stage, X=block number) |
| ref_number Mapping | Sequential 1-based numbers built from sorted citation IDs, with paper deduplication |
| Inline Citations | Simple [N] format in LLM output, post-processed to clickable [[N]](#ref-N) links |
| Citation Table | Clear reference table provided to the LLM: Cite as [1] → (RAG) query preview... |
| Post-processing | Automatic format conversion + validation to remove invalid citation references |
| Parallel Safety | Thread-safe async methods (get_next_citation_id_async, add_citation_async) for concurrent execution |
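The post-processing step can be approximated with a regex pass like the following (a simplified sketch, not the project's actual reporting code):

```python
import re

def postprocess_citations(text: str, valid_refs: set) -> str:
    """Convert plain [N] citations to clickable [[N]](#ref-N) links and
    drop any reference number missing from the citation map."""
    def repl(match):
        n = int(match.group(1))
        if n not in valid_refs:
            return ""  # invalid citation: remove the reference
        return f"[[{n}]](#ref-{n})"
    return re.sub(r"\[(\d+)\]", repl, text)

print(postprocess_citations("Attention is key [1], see also [7].", {1, 2}))
# -> Attention is key [[1]](#ref-1), see also .
```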
Parallel Execution Architecture
When execution_mode: "parallel" is enabled, multiple topic blocks are researched concurrently:
```
Parallel Research Execution
├── DynamicTopicQueue: Topic 1-5 (PENDING), dispatched through an
│   asyncio Semaphore (max=5)
├── AsyncCitationManagerWrapper: thread-safe wrapper for CitationManager
│   ├── get_next_citation_id_async()
│   └── add_citation_async()
├── Concurrent ResearchAgent tasks: Task 1 (Topic 1), Task 2 (Topic 2), ...
└── AsyncManagerAgentWrapper: thread-safe queue state updates
```

| Component | Description |
|---|---|
| `asyncio.Semaphore` | Limits concurrent tasks to `max_parallel_topics` (default: 5) |
| `AsyncCitationManagerWrapper` | Wraps CitationManager with `asyncio.Lock()` for thread-safe ID generation |
| `AsyncManagerAgentWrapper` | Ensures queue state updates are atomic across parallel tasks |
| Real-time Progress | Live display of all active research tasks with status indicators |
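The concurrency cap follows the standard `asyncio.Semaphore` pattern; the minimal sketch below (hypothetical names, not DeepTutor's code) shows how at most `max_parallel_topics` research tasks run at once:

```python
import asyncio

async def research_topic(name: str, sem: asyncio.Semaphore, done: list) -> None:
    async with sem:              # at most max_parallel_topics tasks inside
        await asyncio.sleep(0)   # stand-in for the real research work
        done.append(name)

async def run_parallel(topics, max_parallel_topics: int = 5):
    sem = asyncio.Semaphore(max_parallel_topics)
    done: list = []
    await asyncio.gather(*(research_topic(t, sem, done) for t in topics))
    return done

print(asyncio.run(run_parallel([f"topic-{i}" for i in range(8)])))
```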
Agent Responsibilities
| Agent | Phase | Responsibility |
|---|---|---|
| RephraseAgent | Planning | Optimizes user input topic, supports multi-turn user interaction for refinement |
| DecomposeAgent | Planning | Decomposes topic into subtopics with RAG context, obtains citation IDs from CitationManager |
| ManagerAgent | Researching | Queue state management, task scheduling, dynamic topic addition |
| ResearchAgent | Researching | Knowledge sufficiency check, query planning, tool selection, requests citation IDs before each tool call |
| NoteAgent | Researching | Compresses raw tool outputs into summaries, creates ToolTraces with pre-assigned citation IDs |
| ReportingAgent | Reporting | Builds citation map, generates three-level outline, writes report sections with citation tables, post-processes citations |
Report Generation Pipeline
1. Build Citation Map → `CitationManager.build_ref_number_map()`
2. Generate Outline → three-level headings (H1 → H2 → H3)
3. Write Sections → LLM uses [N] citations with the provided citation table
4. Post-process → convert [N] → [[N]](#ref-N), validate references
5. Generate References → academic-style entries with collapsible source details

Usage
- Visit http://localhost:{frontend_port}/research
- Enter research topic
- Select research mode (quick/medium/deep/auto)
- Watch real-time progress with parallel/series execution
- View structured report with clickable inline citations
- Export as Markdown or PDF (with proper page splitting and Mermaid diagram support)
CLI
```bash
# Quick mode (fast research)
python src/agents/research/main.py --topic "Deep Learning Basics" --preset quick

# Medium mode (balanced)
python src/agents/research/main.py --topic "Transformer Architecture" --preset medium

# Deep mode (thorough research)
python src/agents/research/main.py --topic "Graph Neural Networks" --preset deep

# Auto mode (agent decides depth)
python src/agents/research/main.py --topic "Reinforcement Learning" --preset auto
```

Python API
```python
import asyncio
from src.agents.research import ResearchPipeline
from src.core.core import get_llm_config, load_config_with_main

async def main():
    # Load configuration (main.yaml merged with any module-specific overrides)
    config = load_config_with_main("research_config.yaml")
    llm_config = get_llm_config()

    # Create pipeline (agent parameters loaded from agents.yaml automatically)
    pipeline = ResearchPipeline(
        config=config,
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"],
        kb_name="ai_textbook"  # Optional: override knowledge base
    )

    # Run research
    result = await pipeline.run(topic="Attention Mechanisms in Deep Learning")
    print(f"Report saved to: {result['final_report_path']}")

asyncio.run(main())
```

Output Location
```
data/user/research/
├── reports/                          # Final research reports
│   ├── research_YYYYMMDD_HHMMSS.md   # Markdown report with clickable citations [[N]](#ref-N)
│   └── research_*_metadata.json      # Research metadata and statistics
└── cache/                            # Research process cache
    └── research_YYYYMMDD_HHMMSS/
        ├── queue.json                 # DynamicTopicQueue state (TopicBlocks + ToolTraces)
        ├── citations.json             # Citation registry with ID counters and ref_number mapping
        │                              #   - citations: {citation_id: citation_info}
        │                              #   - counters: {plan_counter, block_counters}
        ├── step1_planning.json        # Planning phase results (subtopics + PLAN-XX citations)
        ├── planning_progress.json     # Planning progress events
        ├── researching_progress.json  # Researching progress events
        ├── reporting_progress.json    # Reporting progress events
        ├── outline.json               # Three-level report outline structure
        └── token_cost_summary.json    # Token usage statistics
```

Citation File Structure (citations.json):
```json
{
  "research_id": "research_20241209_120000",
  "citations": {
    "PLAN-01": {"citation_id": "PLAN-01", "tool_type": "rag_hybrid", "query": "...", "summary": "..."},
    "CIT-1-01": {"citation_id": "CIT-1-01", "tool_type": "paper_search", "papers": [...], ...}
  },
  "counters": {
    "plan_counter": 2,
    "block_counters": {"1": 3, "2": 2}
  }
}
```

Configuration Options
Key configuration lives in `config/main.yaml` (research section) and `config/agents.yaml`:
```yaml
# config/agents.yaml - Agent LLM parameters
research:
  temperature: 0.5
  max_tokens: 12000

# config/main.yaml - Research settings
research:
  # Execution mode
  researching:
    execution_mode: "parallel"      # "series" or "parallel"
    max_parallel_topics: 5          # Max concurrent topics
    max_iterations: 5               # Max iterations per topic

  # Tool switches
  enable_rag_hybrid: true           # Hybrid RAG retrieval
  enable_rag_naive: true            # Basic RAG retrieval
  enable_paper_search: true         # Academic paper search
  enable_web_search: true           # Web search (also controlled by tools.web_search.enabled)
  enable_run_code: true             # Code execution

  # Queue limits
  queue:
    max_length: 5                   # Maximum topics in queue

  # Reporting
  reporting:
    enable_inline_citations: true   # Enable clickable [N] citations in the report

  # Presets: quick, medium, deep, auto

# Global tool switches in the tools section
tools:
  web_search:
    enabled: true                   # Global web search switch (higher priority)
```

Automated IdeaGen
Research idea generation system that extracts knowledge points from notebook records and generates research ideas through multi-stage filtering.
Core Features
| Feature | Description |
|---|---|
| MaterialOrganizerAgent | Extracts knowledge points from notebook records |
| Multi-Stage Filtering | Loose Filter → Explore Ideas (5+ per point) → Strict Filter → Generate Markdown |
| Idea Exploration | Innovative thinking from multiple dimensions |
| Structured Output | Organized markdown with knowledge points and ideas |
| Progress Callbacks | Real-time updates for each stage |
Usage
- Visit http://localhost:{frontend_port}/ideagen
- Select a notebook with records
- Optionally provide user thoughts/preferences
- Click "Generate Ideas"
- View generated research ideas organized by knowledge points
Python API
```python
import asyncio
from src.agents.ideagen import IdeaGenerationWorkflow, MaterialOrganizerAgent
from src.core.core import get_llm_config

async def main():
    llm_config = get_llm_config()

    # Step 1: Extract knowledge points from materials
    organizer = MaterialOrganizerAgent(
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"]
    )
    knowledge_points = await organizer.extract_knowledge_points(
        "Your learning materials or notebook content here"
    )

    # Step 2: Generate research ideas
    workflow = IdeaGenerationWorkflow(
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"]
    )
    result = await workflow.process(knowledge_points)
    print(result)  # Markdown-formatted research ideas

asyncio.run(main())
```

Dashboard + Knowledge Base Management
Unified system entry providing activity tracking, knowledge base management, and system status monitoring.
Key Features
| Feature | Description |
|---|---|
| Activity Statistics | Recent solving/generation/research records |
| Knowledge Base Overview | KB list, statistics, incremental updates |
| Notebook Statistics | Notebook counts, record distribution |
| Quick Actions | One-click access to all modules |
Usage
- Web Interface: Visit http://localhost:{frontend_port} to view system overview
- Create KB: Click "New Knowledge Base", upload PDF/Markdown documents
- View Activity: Check recent learning activities on Dashboard
Notebook
Unified learning record management, connecting outputs from all modules to create a personalized learning knowledge base.
Core Features
| Feature | Description |
|---|---|
| Multi-Notebook Management | Create, edit, delete notebooks |
| Unified Record Storage | Integrate solving/generation/research/Interactive IdeaGen records |
| Categorization Tags | Auto-categorize by type, knowledge base |
| Custom Appearance | Color, icon personalization |
Usage
- Visit http://localhost:{frontend_port}/notebook
- Create new notebook (set name, description, color, icon)
- After completing tasks in other modules, click "Add to Notebook"
- View and manage all records on the notebook page
| Configuration | Data Directory | API Backend | Core Utilities |
| Knowledge Base | Tools | Web Frontend | Solve Module |
| Question Module | Research Module | Interactive IdeaGen Module | Guide Module |
| Automated IdeaGen Module | |||
Backend fails to start?
Checklist
- Confirm Python version >= 3.10
- Confirm all dependencies are installed: `pip install -r requirements.txt`
- Check whether port 8001 is in use (configurable in `config/main.yaml`)
- Check the `.env` file configuration
Solutions
- Change the port: edit `server.backend_port` in `config/main.yaml`
- Check logs: review terminal error messages
Port occupied after Ctrl+C?
Problem
After pressing Ctrl+C during a running task (e.g., deep research), restarting shows "port already in use" error.
Cause
Ctrl+C sometimes only terminates the frontend process while the backend continues running in the background.
Solution
```bash
# macOS/Linux: find and kill the process
lsof -i :8001
kill -9 <PID>

# Windows: find and kill the process
netstat -ano | findstr :8001
taskkill /PID <PID> /F
```

Then restart the service with `python scripts/start_web.py`.
npm: command not found error?
Problem
Running `scripts/start_web.py` shows `npm: command not found` or exit status 127.
Checklist
- Check if npm is installed: `npm --version`
- Check if Node.js is installed: `node --version`
- Confirm the conda environment is activated (if using conda)
Solutions
```bash
# Option A: Using conda (recommended)
conda install -c conda-forge nodejs

# Option B: Using the official installer
# Download from https://nodejs.org/

# Option C: Using nvm
nvm install 18
nvm use 18
```

Verify Installation
```bash
node --version  # Should show v18.x.x or higher
npm --version   # Should show a version number
```

Frontend cannot connect to backend?
Checklist
- Confirm backend is running (visit http://localhost:8001/docs)
- Check browser console for error messages
Solution
Create `.env.local` in the `web` directory:

```bash
NEXT_PUBLIC_API_BASE=http://localhost:8001
```

WebSocket connection fails?
Checklist
- Confirm backend is running
- Check firewall settings
- Confirm WebSocket URL is correct
Solution
- Check backend logs
- Confirm the URL format: `ws://localhost:8001/api/v1/...`
Where are module outputs stored?
| Module | Output Path |
|---|---|
| Solve | data/user/solve/solve_YYYYMMDD_HHMMSS/ |
| Question | data/user/question/question_YYYYMMDD_HHMMSS/ |
| Research | data/user/research/reports/ |
| Interactive IdeaGen | data/user/co-writer/ |
| Notebook | data/user/notebook/ |
| Guide | data/user/guide/session_{session_id}.json |
| Logs | data/user/logs/ |
How to add a new knowledge base?
Web Interface
- Visit http://localhost:{frontend_port}/knowledge
- Click "New Knowledge Base"
- Enter knowledge base name
- Upload PDF/TXT/MD documents
- System will process documents in background
CLI
```bash
python -m src.knowledge.start_kb init <kb_name> --docs <pdf_path>
```

How to incrementally add documents to an existing KB?
CLI (Recommended)
```bash
python -m src.knowledge.add_documents <kb_name> --docs <new_document.pdf>
```

Benefits
- Processes only the new documents, saving time and API costs
- Automatically merges with existing knowledge graph
- Preserves all existing data
Numbered items extraction failed with uvloop.Loop error?
Problem
When initializing a knowledge base, you may encounter this error:
```
ValueError: Can't patch loop of type <class 'uvloop.Loop'>
```

This occurs because Uvicorn uses the uvloop event loop by default, which is incompatible with `nest_asyncio`.
Solution
Use one of the following methods to extract numbered items:
```bash
# Option 1: Using the shell script (recommended)
./scripts/extract_numbered_items.sh <kb_name>

# Option 2: Direct Python command
python src/knowledge/extract_numbered_items.py --kb <kb_name> --base-dir ./data/knowledge_bases
```

This extracts numbered items (Definitions, Theorems, Equations, etc.) from your knowledge base without reinitializing it.
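To see why the workaround is needed: `nest_asyncio` can only patch asyncio's pure-Python event loop, while uvloop's C-implemented loop raises the `ValueError` above. A quick check like this (an illustrative helper, not part of DeepTutor) tells you whether the current loop class is patchable:

```python
import asyncio

def can_patch_with_nest_asyncio(loop) -> bool:
    """nest_asyncio patches only asyncio's pure-Python loop classes;
    uvloop.Loop (module 'uvloop') is rejected with a ValueError."""
    return type(loop).__module__.startswith("asyncio")

loop = asyncio.new_event_loop()
print(can_patch_with_nest_asyncio(loop))  # True for the stdlib default loop
loop.close()
```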
This project is licensed under the AGPL-3.0 License.
We welcome contributions from the community! To ensure code quality and consistency, please follow the guidelines below.
Development Setup
This project uses pre-commit hooks to automatically format code and check for issues before commits.
Step 1: Install pre-commit
```bash
# Using pip
pip install pre-commit

# Or using conda
conda install -c conda-forge pre-commit
```

Step 2: Install Git hooks
```bash
cd DeepTutor
pre-commit install
```

Step 3: (Optional) Run checks on all files
```bash
pre-commit run --all-files
```

Every time you run `git commit`, the pre-commit hooks will automatically:
- Format Python code with Ruff
- Format frontend code with Prettier
- Check for syntax errors
- Validate YAML/JSON files
- Detect potential security issues
| Tool | Purpose | Configuration |
|---|---|---|
| Ruff | Python linting & formatting | pyproject.toml |
| Prettier | Frontend code formatting | web/.prettierrc.json |
| detect-secrets | Security check | .secrets.baseline |
Note: The project uses Ruff format instead of Black to avoid formatting conflicts.
```bash
# Normal commit (hooks run automatically)
git commit -m "Your commit message"

# Manually check all files
pre-commit run --all-files

# Update hooks to the latest versions
pre-commit autoupdate

# Skip hooks (not recommended; emergencies only)
git commit --no-verify -m "Emergency fix"
```

- Fork and Clone: Fork the repository and clone your fork
- Create Branch: Create a feature branch from `main`
- Install Pre-commit: Follow the setup steps above
- Make Changes: Write your code following the project's style
- Test: Ensure your changes work correctly
- Commit: Pre-commit hooks will automatically format your code
- Push and PR: Push to your fork and create a Pull Request
- Use GitHub Issues to report bugs or suggest features
- Provide detailed information about the issue
- Include steps to reproduce if it's a bug
❤️ We thank all our contributors for their valuable contributions.
| LightRAG | RAG-Anything | DeepCode | AI-Researcher |
|---|---|---|---|
| Simple and Fast RAG | Multimodal RAG | AI Code Assistant | Research Automation |
Star us · Report a bug · Discussions
✨ Thanks for visiting DeepTutor!