Skip to content

CLoaKY233/Multi-cycle-RAG

Repository files navigation

Reflexion RAG Engine

Last Commit Top Language Language Count

Built with the tools and technologies:

Markdown Typer TOML .ENV Rich NumPy

Python uv SurrealDB Pydantic YAML


A production-ready Retrieval Augmented Generation (RAG) system with advanced self-correction, iterative refinement, and comprehensive web search integration. Built for complex reasoning tasks requiring multi-step analysis and comprehensive knowledge synthesis.


🚀 Key Features

🧠 Advanced Reflexion Architecture

  • Self-Evaluation System: Iterative cycles with confidence scoring and dynamic query refinement
  • Gap Detection: Intelligent identification of missing information and knowledge gaps
  • Multi-Cycle Processing: Automatic follow-up queries for comprehensive answers
  • Smart Decision Engine: Four-tier framework (CONTINUE, COMPLETE, REFINE_QUERY, INSUFFICIENT_DATA)

🔄 Multi-LLM Orchestration

  • Specialized Model Allocation: Dedicated models for generation, evaluation, and synthesis
  • Generation Model: Meta-Llama-3.1-405B for primary answer generation
  • Evaluation Model: Cohere-command-r for self-assessment and confidence scoring
  • Summary Model: Meta-Llama-3.1-70B for final synthesis across cycles
  • 40+ GitHub Models: Access to the full GitHub Models ecosystem

🌐 Web Search Integration

  • Google Custom Search: Real-time web search with configurable modes
  • Content Extraction: Advanced web content extraction using Crawl4AI
  • Hybrid Retrieval: Seamlessly combines vector store and web search results
  • Intelligent Filtering: Content quality assessment and relevance scoring

🚀 High-Performance Infrastructure

  • Azure AI Inference: Superior semantic understanding with 3072-dimensional embeddings
  • SurrealDB Vector Store: Native vector search with HNSW indexing for production scalability
  • Intelligent Memory Caching: LRU-based cache with hit rate tracking
  • Streaming Architecture: Real-time response streaming with progress indicators
  • Async Design: Non-blocking operations throughout the pipeline

🎯 Enterprise-Ready Features

  • YAML Prompt Management: Template-based prompt system with versioning
  • Production Monitoring: Comprehensive logging, error handling, and performance metrics
  • Modular Design: Clean architecture with dependency injection and clear interfaces
  • Context-Aware Processing: Dynamic retrieval scaling with intelligent context management
  • Error Resilience: Graceful degradation to simpler RAG modes when reflexion fails

📊 Performance Metrics

  • 40%+ improvement in answer comprehensiveness compared to traditional RAG
  • 60%+ improvement in semantic similarity accuracy with 3072D embeddings
  • 25%+ performance boost in vector search with SurrealDB HNSW indexing
  • Real-time web search integration for up-to-date information
  • Sub-linear search performance even with millions of documents

🛠 Quick Start

Prerequisites

  • Python 3.13+ with UV package manager (recommended)
  • GitHub Personal Access Token with repo and read:org scopes.
  • (Optional) Google Custom Search API Key and CSE ID for web search.
  • SurrealDB instance (local or cloud). Refer to the official SurrealDB installation guide.
  • Google Search API credentials (optional, for web search)
  • 8GB+ RAM recommended for optimal performance
  • uv package manager (recommended). If installed skip the Install UV Package Manager step.

Install UV Package Manager

UV is a lightning-fast Python package manager written in Rust that significantly outperforms traditional pip:

# Linux/macOS curl -LsSf https://astral.sh/uv/install.sh | sh # Windows (PowerShell as Administrator) powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" # Alternative: via Homebrew brew install uv # Verify installation uv --version

Installation

# 1. Clone the repository git clone https://github.com/cloaky233/multi-cycle-rag.git cd multi-cycle-rag # 2. Create virtual environment and install dependencies uv venv && source .venv/bin/activate # macOS/Linux# .venv\Scripts\activate # Windows uv sync

uv sync : This single command installs all production dependencies including SurrealDB Python SDK, Azure AI Inference, Crawl4AI for web scraping, and all LLM related libraries.

⚙ Configuration

Create a .env file in the project root:

# GitHub Models Configuration GITHUB_TOKEN=your_github_pat_token_here LLM_MODEL=meta/Meta-Llama-3.1-405B-Instruct EVALUATION_MODEL=cohere/Cohere-command-r SUMMARY_MODEL=meta/Meta-Llama-3.1-70B-Instruct # Azure AI Inference Embeddings EMBEDDING_MODEL=text-embedding-3-large EMBEDDING_ENDPOINT=https://models.inference.ai.azure.com # SurrealDB Configuration SURREALDB_URL=wss://your-surreal-instance.surreal.cloud SURREALDB_NS=rag SURREALDB_DB=rag SURREALDB_USER=your_username SURREALDB_PASS=your_password # Reflexion Settings MAX_REFLEXION_CYCLES=3 CONFIDENCE_THRESHOLD=0.85 INITIAL_RETRIEVAL_K=3 REFLEXION_RETRIEVAL_K=5 # Web Search Configuration (Optional) WEB_SEARCH_MODE=off # off, initial_only, every_cycle GOOGLE_API_KEY=your_google_api_key GOOGLE_CSE_ID=your_custom_search_engine_id # Performance Settings ENABLE_MEMORY_CACHE=true MAX_CACHE_SIZE=100 CHUNK_SIZE=1000 CHUNK_OVERLAP=200

Set up Google Custom Search

1. Obtain a Google Custom Search API Key

The API key authenticates your project's requests to Google's services.

  • Go to the Google Cloud Console: Navigate to the Google Cloud Console and create a new project if you don't have one already.
  • Enable the API: In your project's dashboard, go to the "APIs & Services" section. Find and enable the Custom Search API.
  • Create Credentials: Go to the "Credentials" tab within "APIs & Services". Click "Create Credentials" and select "API key".
  • Copy and Secure the Key: A new API key will be generated. Copy this key and store it securely. It is recommended to restrict the key's usage to only the "Custom Search API" for security purposes.

2. Create a Programmable Search Engine and get the CSE ID

The CSE ID (also called the Search Engine ID or cx) tells Google what to search (e.g., the entire web or specific sites you define).

  • Go to the Programmable Search Engine Page: Visit the Google Programmable Search Engine website and sign in with your Google account.
  • Create a New Search Engine: Click "Add" or "New search engine" to start the setup process.
  • Configure Your Engine:
    • Give your search engine a name.
    • Under "Sites to search," you can specify particular websites or enable the option to "Search the entire web."
    • Click "Create" when you are done.
  • Find Your Search Engine ID (CSE ID): After creating the engine, go to the "Setup" or "Overview" section of its control panel. Your Search Engine ID will be displayed there. Copy this ID.

3. Update Your Project Configuration

Finally, take the two values you have obtained and place them in your project's .env file:

# .env file ... GOOGLE_API_KEY=your_google_api_key_here GOOGLE_CSE_ID=your_google_cse_id_here ... 

Setup Crawl4AI for Web Search

For web search, you must have the google api key and cse id, for

# Install Crawl4AI with browser dependencies uv run crawl4ai-setup # Verify installation uv run crawl4ai-doctor # Manual browser setup if needed python -m playwright install chromium

SurrealDB schema setup

Run all the queries in the schema directory (either as a query or in surrealist)

🎮 Usage

# Ingest documents uv run rag.py ingest --docs_path=./docs

Command Line Interface

# Interactive chat with reflexion engine uv run rag.py chat # Ingest documents from a directory uv run rag.py ingest --docs_path=/path/to/documents # View current configuration uv run rag.py config # Delete all documents from vector store uv run rag.py delete

Programmatic Usage

from src.rag.engine import RAGEngine import asyncio async def main(): # Initialize the RAG engine engine = RAGEngine() # Process a query with reflexion response = "" async for chunk in engine.query_stream("What are the benefits of renewable energy?"): response += chunk.content print(chunk.content, end="") return response # Run the async function asyncio.run(main())

Advanced Example with Metadata

import asyncio from src.rag.engine import RAGEngine async def advanced_query(): engine = RAGEngine() query = "Compare different machine learning approaches for natural language processing" print("🔄 Starting Reflexion Analysis...") current_cycle = 0 async for chunk in engine.query_stream(query): # Handle metadata if chunk.metadata: cycle = chunk.metadata.get("cycle_number", 1) confidence = chunk.metadata.get("confidence_score", 0) if cycle != current_cycle: current_cycle = cycle print(f"\n--- Cycle {cycle} (Confidence: {confidence:.2f}) ---") # Print content print(chunk.content, end="") # Check for completion if chunk.is_complete and chunk.metadata.get("reflexion_complete"): stats = chunk.metadata print(f"\n\n✅ Analysis Complete!") print(f"Total Cycles: {stats.get('total_cycles', 0)}") print(f"Processing Time: {stats.get('total_processing_time', 0):.2f}s") print(f"Final Confidence: {stats.get('final_confidence', 0):.2f}") asyncio.run(advanced_query())

🏗 Architecture Overview

Core Components

Reflexion RAG Engine ├── Generation Pipeline (Meta-Llama-405B) │ ├── Initial Response Generation │ ├── Context Retrieval & Web Search │ └── Streaming Output ├── Evaluation System (Cohere-command-r) │ ├── Confidence Scoring │ ├── Gap Analysis │ ├── Follow-up Generation │ └── Decision Classification ├── Memory Cache (LRU) │ ├── Query Caching │ ├── Hit Rate Tracking │ └── Automatic Eviction ├── Web Search Engine │ ├── Google Custom Search │ ├── Content Extraction │ ├── Quality Assessment │ └── Hybrid Retrieval └── Decision Engine ├── CONTINUE (confidence < threshold) ├── REFINE_QUERY (specific gaps identified) ├── COMPLETE (high confidence ≥0.85) └── INSUFFICIENT_DATA (knowledge base gaps) Document Pipeline ├── Multi-format Loading (PDF, TXT, DOCX, MD, HTML) ├── Intelligent Chunking (1000 chars, 200 overlap) ├── Azure AI Embeddings (3072D vectors) └── SurrealDB Storage (HNSW indexing) 

Reflexion Flow

graph TB A[User Query] --> B[Initial Generation] B --> C[Self-Evaluation] C --> D{Confidence ≥ 0.85?} D -->|Yes| E[Complete Response] D -->|No| F[Gap Analysis] F --> G[Generate Follow-up Queries] G --> H[Enhanced Retrieval + Web Search] H --> I[Synthesis Cycle] I --> C E --> J[Final Answer] 
Loading

📁 Project Structure

rag/ ├── src/ # Main source code │ ├── config/ # Configuration management │ │ ├── __init__.py │ │ └── settings.py # Pydantic settings with env support │ ├── core/ # Core interfaces and exceptions │ │ ├── __init__.py │ │ ├── exceptions.py # Custom exception classes │ │ └── interfaces.py # Abstract base classes │ ├── data/ # Document loading and processing │ │ ├── __init__.py │ │ ├── loader.py # Multi-format document loader │ │ └── processor.py # Text chunking and preprocessing │ ├── embeddings/ # Embedding providers │ │ ├── __init__.py │ │ └── github_embeddings.py # Azure AI Inference │ ├── llm/ # LLM interfaces and implementations │ │ ├── __init__.py │ │ └── github_llm.py # GitHub Models integration │ ├── memory/ # Caching and memory management │ │ ├── __init__.py │ │ └── cache.py # LRU cache for reflexion memory │ ├── rag/ # Main RAG engine │ │ ├── __init__.py │ │ ├── engine.py # Main RAG engine interface │ │ └── reflexion_engine.py # Reflexion implementation │ ├── reflexion/ # Reflexion evaluation logic │ │ ├── __init__.py │ │ └── evaluator.py # Smart evaluation and follow-up │ ├── utils/ # Utility functions │ │ ├── __init__.py │ │ └── logging.py # Structured logging │ ├── vectorstore/ # Vector storage implementations │ │ ├── __init__.py │ │ └── surrealdb_store.py # SurrealDB vector store │ └── websearch/ # Web search integration │ ├── __init__.py │ └── google_search.py # Google Search with content extraction ├── prompts/ # YAML prompt templates │ ├── __init__.py │ ├── manager.py # Prompt template manager │ ├── evaluation/ # Evaluation prompts │ ├── generation/ # Generation prompts │ ├── synthesis/ # Synthesis prompts │ └── templates/ # Base templates ├── schema/ # SurrealDB schema definitions │ ├── documents.surql # Document table schema │ ├── web_search.surql # Web search results schema │ └── *.surql # Database functions ├── Documentation/ # Comprehensive documentation ├── rag.py # Main CLI entry point ├── pyproject.toml # Project dependencies and metadata ├── .env.example # Example environment configuration └── README.md # This file 

🔧 Advanced Configuration

Model Selection

Note : Model Names might change with provider updates, please refer GitHub Models to find the model catalogue.

# Generation Models (Primary Response) LLM_MODEL=meta/Meta-Llama-3.1-405B-Instruct # High-quality generation LLM_MODEL=meta/Meta-Llama-3.1-70B-Instruct # Balanced performance LLM_MODEL=microsoft/Phi-3-mini-4k-instruct # Fast responses # Evaluation Models (Self-Assessment) EVALUATION_MODEL=cohere/Cohere-command-r # Recommended EVALUATION_MODEL=mistralai/Mistral-7B-Instruct-v0.3 # Summary Models (Final Synthesis) SUMMARY_MODEL=meta/Meta-Llama-3.1-70B-Instruct SUMMARY_MODEL=meta/Meta-Llama-3.1-8B-Instruct

Performance Tuning

# Reflexion Parameters MAX_REFLEXION_CYCLES=3 # Faster responses MAX_REFLEXION_CYCLES=5 # More comprehensive answers CONFIDENCE_THRESHOLD=0.7 # Lower threshold for completion CONFIDENCE_THRESHOLD=0.9 # Higher quality requirement # Retrieval Configuration INITIAL_RETRIEVAL_K=3 # Documents for first cycle REFLEXION_RETRIEVAL_K=5 # Documents for follow-up cycles # Web Search Configuration WEB_SEARCH_MODE=off # Disable web search WEB_SEARCH_MODE=initial_only # Search only on first cycle WEB_SEARCH_MODE=every_cycle # Search on every cycle # Memory Management ENABLE_MEMORY_CACHE=true # Enable LRU caching MAX_CACHE_SIZE=1000 # Cache size (adjust for RAM)

📈 Monitoring and Performance

Real-time Metrics

  • Reflexion Cycles: Track iteration count and decision points
  • Confidence Scoring: Monitor answer quality and completion confidence
  • Memory Cache: Hit rates and performance improvements
  • Processing Time: End-to-end response time analysis
  • Web Search: Integration success and content quality
  • Vector Search: SurrealDB query performance and indexing efficiency

Performance Dashboard

# Get comprehensive engine statistics from src.rag.engine import RAGEngine engine = RAGEngine() engine_info = engine.get_engine_info() print(f"Engine Type: {engine_info['engine_type']}") print(f"Max Cycles: {engine_info['max_reflexion_cycles']}") print(f"Memory Cache: {engine_info['memory_cache_enabled']}") if 'memory_stats' in engine_info: memory = engine_info['memory_stats'] print(f"Cache Hit Rate: {memory.get('hit_rate', 0):.2%}") print(f"Cache Size: {memory.get('size', 0)}/{memory.get('max_size', 0)}")

🔍 Web Search Integration

Google Custom Search Setup

  1. Create Google Cloud Project: Enable Custom Search API
  2. Create Custom Search Engine: Configure search scope and preferences
  3. Get API Credentials: Obtain API key and Custom Search Engine ID
  4. Configure Environment: Add credentials to .env file

Web Search Modes

  • OFF: Traditional RAG without web search
  • INITIAL_ONLY: Web search only on the first reflexion cycle
  • EVERY_CYCLE: Web search on every reflexion cycle for maximum coverage

Content Extraction

  • Crawl4AI Integration: Advanced web content extraction
  • Quality Assessment: Content validation and filtering
  • Smart Truncation: Token-aware content limiting
  • Error Handling: Graceful fallback to snippets

🤝 Contributing

We welcome contributions! Areas for improvement:

  • Additional LLM Providers: Support for more model providers
  • Vector Stores: Alternative vector storage backends
  • Web Search: Additional search engines and providers
  • Performance: Optimization and caching improvements
  • UI/UX: Web interface and visualization tools

📚 Documentation

🛣 Roadmap

  • Model Context Protocol (MCP): AI-powered document ingestion
  • Advanced Web Search: Multi-engine search with fact checking
  • Rust Performance: High-performance Rust extensions
  • Modern Web Interface: React/Vue.js frontend with FastAPI backend

See ROADMAP.md for detailed future plans.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with:

👨‍💻 Author

Lay Sheth (@cloaky233)

  • AI Engineer & Enthusiast
  • B.Tech Computer Science Student at VIT Bhopal
  • Portfolio: cloaky.works

📞 Support


Production-ready RAG with human-like iterative reasoning, real-time web search, and enterprise-grade vector storage.

About

RAG with web search, self correction, gap detection and query loops

Topics

Resources

License

Stars

Watchers

Forks

Languages