
🔬 AI System Testbed


An advanced AI-powered search platform featuring three core capabilities: Search & Recommendation, Context Engineering, and Image Search. Built with modern MLOps practices for production-ready deployment.

🌟 Features

🎯 Three Core Capabilities

1. 🔍 Search & Recommendation System

  • Intelligent Indexing: TF-IDF based inverted index with Chinese word segmentation
  • CTR Prediction: Machine learning models (Logistic Regression and Wide & Deep) for click-through rate prediction
  • Real-time Ranking: Dynamic ranking strategy adjustment based on user behavior
  • Knowledge Graph: LLM-based NER technology for enhanced semantic search
  • A/B Testing: Experiment management for ranking algorithm comparison
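The indexing and ranking ideas above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the project's actual IndexService: real Chinese word segmentation (e.g. with a segmenter library) is replaced here by whitespace tokenization.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    return index

def tfidf_search(query, docs, index, top_k=10):
    """Score documents by summed TF-IDF over matching query terms."""
    n_docs = len(docs)
    scores = defaultdict(float)
    for term in query.split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k]

docs = {
    "d1": "red car on the street",
    "d2": "cat sleeping on a bed",
    "d3": "red sunset over the street",
}
index = build_inverted_index(docs)
print(tfidf_search("red car", docs, index))
```

In the real system this initial TF-IDF ranking would then be re-ordered by the CTR model.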

2. 🤖 Context Engineering

  • Hybrid Retrieval: Combines inverted index and knowledge graph for comprehensive information retrieval
  • LLM Integration: Seamless integration with Ollama for local LLM inference
  • Prompt Engineering: Optimized prompt templates with full transparency
  • Context Management: Intelligent context selection and ranking for accurate responses
  • Multi-source Context: Retrieval from documents, knowledge graphs, and structured data
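The retrieve-then-prompt flow above can be sketched as follows; the template wording, character budget, and record fields are illustrative assumptions, not the project's actual prompt format.

```python
def assemble_context(question, retrieved_docs, max_chars=1500):
    """Concatenate top-ranked snippets until the character budget is used,
    then wrap them in a question-answering prompt."""
    parts, used = [], 0
    for doc in retrieved_docs:  # assumed already ranked by relevance
        snippet = f"[{doc['title']}] {doc['text']}"
        if used + len(snippet) > max_chars:
            break  # context selection: drop lower-ranked sources first
        parts.append(snippet)
        used += len(snippet)
    context = "\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = assemble_context(
    "What is an inverted index?",
    [{"title": "IR basics", "text": "An inverted index maps terms to documents."}],
)
print(prompt)
```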

3. 🖼️ Image Search System

  • CLIP-powered: OpenAI CLIP model via Hugging Face Transformers
  • Multi-modal Search: Image-to-image and text-to-image search capabilities
  • Semantic Understanding: 512-dimensional embedding vectors for precise similarity matching
  • Real-time Processing: Sub-second search response with efficient similarity calculation
  • Scalable Storage: Image library bounded only by available disk space, with optimized storage management

🏗️ Shared Infrastructure

  • Microservice Architecture: Decoupled services (Data, Index, Model, Image, Experiment)
  • Unified Service Management: Centralized service discovery and management
  • MLOps Pipeline: Complete workflow from data collection to model deployment
  • Monitoring & Observability: Real-time performance tracking and health checks
  • Web Interface: Modern Gradio-based UI with responsive design
  • Production Ready: Comprehensive error handling, logging, and scalability features
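A unified service manager of the kind described can be as small as a lazy registry. The sketch below mirrors the service names listed above but is an assumption-laden illustration, not the project's ServiceManager.

```python
class ServiceManager:
    """Minimal service registry: register factories, construct on first use."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def get(self, name):
        if name not in self._instances:
            # lazy initialization: a service is built only when first requested
            self._instances[name] = self._factories[name]()
        return self._instances[name]

manager = ServiceManager()
manager.register("index", lambda: {"kind": "IndexService"})
print(manager.get("index"))
print(manager.get("index") is manager.get("index"))  # one instance per name
```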

📚 Documentation

🚀 Quick Start

Requirements

  • Python 3.8+
  • Memory: At least 2GB
  • Storage: At least 1GB available space
  • GPU (optional): For better CLIP model performance

Optional Dependencies

  • Ollama (for Context Engineering/KG): local LLM inference service, default at http://localhost:11434
  • datasets (for data tools): pip install datasets, used by tools/wikipedia_downloader.py

Installation

```shell
# Clone the repository
git clone https://github.com/tylerelyt/test_bed.git
cd test_bed

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Preloaded Dataset (Read-Only)

If data/preloaded_documents.json exists, the system loads these Chinese Wikipedia documents as a read-only core dataset:

  • Immutable: Preloaded documents are read-only in the UI
  • Auto-loading: Automatically loads data/preloaded_documents.json at startup (if present)
  • User Documents: Importing/editing via the UI is not supported in this version
  • Data Source: Typically generated from Hugging Face fjcanyue/wikipedia-zh-cn via tooling

Note: If no preloaded file is present, the system will still start but the text index may be empty until data is provided offline.
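Startup loading of the read-only dataset might look like the sketch below. The JSON layout is an assumption, and the real loader lives inside the indexing code; the point is the graceful fallback when the file is absent.

```python
import json
import os

def load_preloaded_documents(path="data/preloaded_documents.json"):
    """Return the preloaded document list, or [] so startup still succeeds."""
    if not os.path.exists(path):
        return []  # system starts with an empty text index
    with open(path, encoding="utf-8") as f:
        return json.load(f)

docs = load_preloaded_documents()
print(f"Loaded {len(docs)} preloaded documents")
```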

Preloaded Knowledge Graph (Read-Only)

The system automatically loads a preloaded Chinese knowledge graph if available:

  • Primary Source: data/openkg_triples.tsv - Real OpenKG concept hierarchy data (290 entities, 254 relations)
  • Fallback: data/preloaded_knowledge_graph.json - Alternative format if TSV not available
  • Auto-generation: Run python tools/openkg_generator.py to download fresh OpenKG sample data
  • Format: TSV format with concept-category relationships (e.g., "移动应用 属于 软件")
  • Data Source: OpenKG OpenConcepts project from GitHub

The knowledge graph powers entity recognition and context engineering features.
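Parsing the triples into an adjacency structure is straightforward. A sketch assuming one tab-separated `subject predicate object` triple per line (the example relation 属于 means "is a kind of"); the actual loader may differ.

```python
from collections import defaultdict

def load_triples(lines):
    """Parse TSV triples into {subject: [(predicate, object), ...]}."""
    graph = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed rows rather than failing startup
        subj, pred, obj = parts
        graph[subj].append((pred, obj))
    return graph

sample = ["移动应用\t属于\t软件", "软件\t属于\t产品"]
graph = load_triples(sample)
print(graph["移动应用"])
```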

Start the System

```shell
# Method 1: Using startup script
./quick_start.sh

# Method 2: Direct startup
python start_system.py
```

After the system starts, visit http://localhost:7861 to use the interface.

Configuration

Basic configuration is done in code. Optional environment variables include LLM provider credentials used by NER/RAG (see comments in src/search_engine/index_tab/ner_service.py).

System Architecture Overview

The platform is organized into three main functional areas with shared infrastructure:

🔍 Search & Recommendation Module

  • Index Building Tab: Offline index construction, document management, and knowledge graph building
  • Search Tab: Online retrieval and ranking with CTR-based optimization
  • Training Tab: CTR data collection and Wide & Deep model training

🤖 Context Engineering Module

  • Context Q&A Tab: Context‑augmented answering with Ollama integration
  • Knowledge Graph Integration: Semantic search with LLM-based entity recognition
  • Multi-source Retrieval: Documents, graphs, and structured data integration

Note: Context Engineering / KG rely on a locally running Ollama service and available models. If Ollama is not running or the model hasn't been pulled, the page will show a connection error, but other parts of the system remain available.
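A pre-flight check against the default endpoint turns that connection error into a clear message. A standard-library-only sketch; `/api/tags` is Ollama's model-listing route, and the timeout value is an arbitrary choice.

```python
import urllib.error
import urllib.request

def ollama_available(base_url="http://localhost:11434", timeout=2):
    """Return True if an Ollama server responds at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # server down or unreachable: degrade gracefully

if not ollama_available():
    print("Ollama not reachable; Context Q&A disabled, other tabs still work.")
```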

🖼️ Image Search Module

  • Image Search Tab: CLIP-based image retrieval supporting image-to-image and text-to-image search
  • Image Management: Upload, indexing, and library management
  • Multi-modal Understanding: Cross-modal semantic search capabilities

🏗️ Shared Infrastructure

  • Service Management: Unified service discovery and orchestration
  • Monitoring Tab: System performance monitoring and health checks
  • Data Pipeline: Centralized data processing and storage
  • Web Interface: Modern responsive UI with Gradio framework

🖼️ Image Search System

Overview

The image search system leverages OpenAI's CLIP model to provide intelligent image retrieval capabilities:

  • 📤 Image Upload: Store images with descriptions and tags
  • 🔍 Image-to-Image Search: Find visually similar images using query images
  • 💬 Text-to-Image Search: Search images using natural language descriptions
  • 📋 Image Management: Comprehensive image library management

Technical Details

  • Model: OpenAI CLIP ViT-B/32 via Hugging Face Transformers
  • Embedding Dimension: 512-dimensional vectors
  • Similarity Metric: Cosine similarity
  • Supported Formats: JPG, PNG, GIF, BMP, and more
  • Performance: Sub-second search response times
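With cosine similarity as the metric, ranking a library reduces to comparing the query embedding against every stored vector. A pure-Python sketch with toy 3-dimensional vectors standing in for the 512-dimensional CLIP embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, library, k=5):
    """Rank stored embeddings by cosine similarity to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in library.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]

library = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "cat.jpg":    [0.0, 1.0, 0.2],
}
print(top_k([1.0, 0.0, 0.0], library, k=1))
```

In production the same computation would typically be vectorized (e.g. one matrix-vector product over pre-normalized embeddings) rather than looped.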

Usage Examples

Text-to-Image Search

```
# Examples of search queries
"a red car on the street"
"cat sleeping on a bed"
"beautiful sunset landscape"
"person running"
# Non-English queries are also supported
```

Upload and Index Images

  1. Navigate to "🖼️ Image Search System" → "📤 Image Upload"
  2. Select image files and add descriptions/tags
  3. Click "📤 Upload Image" to index

Search Similar Images

  1. Go to "🔍 Image-to-Image" tab
  2. Upload a query image
  3. Adjust the number of results (1-20)
  4. View results in table and gallery format

For detailed usage instructions, see:

📖 User Guide

Basic Usage

  1. Index Building: The system automatically loads preloaded documents (if present) and builds the index on startup; manual document addition via UI is not supported
  2. Search Testing: Enter queries in the search box to retrieve relevant documents
  3. Click Feedback: Clicking search results records user behavior for model training
  4. Model Training: After collecting sufficient data, train CTR prediction models
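The impression/click loop in steps 2-4 can be sketched as a tiny logger. Field and method names here are illustrative, not the project's DataService schema.

```python
from collections import defaultdict

class ClickLogger:
    """Record impressions and clicks per document, then compute CTR."""

    def __init__(self):
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def record_impression(self, doc_id):
        self.impressions[doc_id] += 1  # result was shown to the user

    def record_click(self, doc_id):
        self.clicks[doc_id] += 1       # user clicked the shown result

    def ctr(self, doc_id):
        shown = self.impressions[doc_id]
        return self.clicks[doc_id] / shown if shown else 0.0

log = ClickLogger()
for _ in range(10):
    log.record_impression("d1")
log.record_click("d1")
print(log.ctr("d1"))  # 0.1
```

Once enough (impression, click) pairs accumulate, they become training samples for the CTR models.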

Advanced Features

1. Batch Data Import

```python
from src.search_engine.data_utils import import_ctr_data

result = import_ctr_data("path/to/your/data.json")
```

2. Custom Ranking Strategy

```python
from src.search_engine.service_manager import get_index_service

index_service = get_index_service()
results = index_service.search("query terms", top_k=10)
```

3. Experiment Management

The system supports A/B testing with configurable ranking strategies for comparison in the monitoring interface.
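Deterministic bucketing is the usual way to split traffic between ranking strategies. A sketch hashing the user id (md5 rather than Python's built-in `hash()`, which is salted per process); the function names are hypothetical.

```python
import hashlib

def assign_bucket(user_id, experiment, treatment_share=0.5):
    """Deterministically assign a user to 'treatment' or 'control'."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    ratio = int(digest[:8], 16) / 0xFFFFFFFF  # uniform-ish value in [0, 1]
    return "treatment" if ratio < treatment_share else "control"

print(assign_bucket("user_42", "ranking_v2"))
```

Keying the hash on both user and experiment means the same user can land in different buckets across experiments, while staying stable within one.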

🏗️ Architecture Design

System Architecture

```mermaid
graph TB
    subgraph "🖥️ Web Interface Layer"
        Portal["Portal<br/>🚪 Main Entry"]
    end
    subgraph "📱 Application Layer"
        SearchMod["🔍 Search & Recommendation<br/>• Index Building<br/>• Text Search<br/>• CTR Training"]
        RAGMod["🤖 Context Engineering<br/>• Context Q&A<br/>• Knowledge Graph<br/>• Multi-source Retrieval"]
        ImageMod["🖼️ Image Search<br/>• Image Upload<br/>• Image-to-Image<br/>• Text-to-Image"]
    end
    subgraph "🏗️ Service Layer"
        DataSvc["DataService<br/>📊 CTR Data Management"]
        IndexSvc["IndexService<br/>📚 Text Indexing & Search"]
        ModelSvc["ModelService<br/>🤖 ML Model Management"]
        ImageSvc["ImageService<br/>🖼️ CLIP-based Search"]
        ExpSvc["ExperimentService<br/>🧪 A/B Testing"]
    end
    subgraph "📊 Infrastructure Layer"
        Monitor["Monitoring<br/>📈 Performance Tracking"]
        Storage["Storage<br/>💾 Data Persistence"]
        ServiceMgr["ServiceManager<br/>🔧 Service Orchestration"]
    end
    Portal --> SearchMod
    Portal --> RAGMod
    Portal --> ImageMod
    SearchMod --> DataSvc
    SearchMod --> IndexSvc
    SearchMod --> ModelSvc
    RAGMod --> IndexSvc
    RAGMod --> ModelSvc
    ImageMod --> ImageSvc
    DataSvc --> ServiceMgr
    IndexSvc --> ServiceMgr
    ModelSvc --> ServiceMgr
    ImageSvc --> ServiceMgr
    ExpSvc --> ServiceMgr
    ServiceMgr --> Monitor
    ServiceMgr --> Storage
```

Data Flow

```mermaid
graph LR
    subgraph "🔍 Search & Recommendation Flow"
        A1[User Query] --> A2[Index Retrieval]
        A2 --> A3[Initial Ranking]
        A3 --> A4[CTR Prediction]
        A4 --> A5[Re-ranking]
        A5 --> A6[Results Display]
        A6 --> A7[User Click]
        A7 --> A8[Behavior Recording]
        A8 --> A9[Model Training]
        A9 --> A4
    end
    subgraph "🤖 Context Engineering Flow"
        B1[User Question] --> B2[Document Retrieval]
        B2 --> B3[Knowledge Graph Query]
        B3 --> B4[Context Assembly]
        B4 --> B5[LLM Generation]
        B5 --> B6[Response Display]
    end
    subgraph "🖼️ Image Search Flow"
        C1[Image/Text Query] --> C2[CLIP Encoding]
        C2 --> C3[Similarity Calculation]
        C3 --> C4[Result Ranking]
        C4 --> C5[Image Gallery Display]
        C5 --> C6[User Interaction]
        C6 --> C7[Usage Analytics]
    end
```

📊 Notes

This project is a testbed for learning and experimentation. Any performance numbers depend on environment, data size, and configuration and are not guaranteed.

🛠️ Development Guide

Project Structure

```
Testbed/
├── src/                              # Source code
│   └── search_engine/
│       ├── data_service.py           # Data service (CTR data management)
│       ├── index_service.py          # Index service (text search & indexing)
│       ├── model_service.py          # Model service (CTR & Wide&Deep models)
│       ├── image_service.py          # Image service (CLIP-based image search)
│       ├── experiment_service.py     # Experiment management service
│       ├── service_manager.py        # Service manager (unified service access)
│       ├── data_utils.py             # Data processing utilities
│       ├── portal.py                 # Main UI entry point
│       ├── index_tab/                # Index building & knowledge graph UI
│       │   ├── index_tab.py
│       │   ├── knowledge_graph.py
│       │   ├── ner_service.py
│       │   └── offline_index.py
│       ├── search_tab/               # Text search UI
│       │   ├── search_tab.py
│       │   └── search_engine.py
│       ├── image_tab/                # Image search UI
│       │   └── image_tab.py
│       ├── training_tab/             # Model training UI
│       │   ├── training_tab.py
│       │   ├── ctr_model.py
│       │   ├── ctr_wide_deep_model.py
│       │   └── ctr_config.py
│       ├── rag_tab/                  # RAG Q&A system UI
│       │   ├── rag_tab.py
│       │   └── rag_service.py
│       └── monitoring_tab/           # System monitoring UI
│           └── monitoring_tab.py
├── models/                           # Model files and data storage
│   ├── ctr_model.pkl                 # Trained CTR model
│   ├── wide_deep_ctr_model.h5        # Wide & Deep model
│   ├── index_data.json               # Text search index
│   ├── knowledge_graph.pkl           # Knowledge graph data
│   └── images/                       # Image storage and embeddings
│       ├── image_index.json
│       └── image_embeddings.npy
├── data/                             # Training and experiment data
│   └── preloaded_documents.json      # Preloaded Chinese Wikipedia documents
├── docs/                             # Documentation (simplified)
│   ├── SEARCH_GUIDE.md               # Search & Recommendation guide
│   ├── CONTEXT_ENGINEERING_GUIDE.md  # Context Engineering guide
│   └── IMAGE_SEARCH_GUIDE.md         # Image search guide
├── examples/                         # Example scripts
├── tools/                            # Utility and monitoring tools
├── test/ & tests/                    # Test suites
├── start_system.py                   # System startup script
├── quick_start.sh                    # Quick start script
└── requirements.txt                  # Python dependencies
```

Extension Development

Adding New Ranking Algorithms

  1. Create new ranking module in src/search_engine/ranking/
  2. Implement RankingInterface interface
  3. Register new algorithm in IndexService
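Step 2 implies a common contract for rankers. A hypothetical sketch (the real RankingInterface signature in the codebase may differ):

```python
from abc import ABC, abstractmethod

class RankingInterface(ABC):
    """Hypothetical contract for pluggable rankers."""

    @abstractmethod
    def rank(self, query, candidates):
        """Return candidates ordered by decreasing relevance."""

class TermOverlapRanker(RankingInterface):
    """Toy ranker: prefer documents sharing more terms with the query."""

    def rank(self, query, candidates):
        terms = set(query.split())
        return sorted(
            candidates,
            key=lambda doc: len(terms & set(doc["text"].split())),
            reverse=True,
        )

ranker = TermOverlapRanker()
docs = [{"text": "red car"}, {"text": "red car street"}]
print(ranker.rank("red car street", docs)[0])
```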

Adding New Features

  1. Define new features in CTRSampleConfig
  2. Calculate feature values in DataService.record_impression
  3. Update model training logic

Adding New Image Search Features

  1. Extend ImageService class with new methods
  2. Update image_tab.py UI components
  3. Test with various image types and queries

🧪 Testing

```shell
# Run unit tests (if present)
python -m pytest tests/
```

📈 Monitoring

The system provides multi-dimensional monitoring:

  • System Monitoring: CPU, memory, disk usage
  • Business Monitoring: Search QPS, click-through rate, response time
  • Data Monitoring: Data quality, model performance metrics
  • Image Search Monitoring: CLIP model performance, search accuracy
  • Alert Mechanism: Anomaly detection and automatic alerting

🤝 Contributing

  1. Fork the project
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Create a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Contact

About

⭐️ A comprehensive platform designed for AI Algorithm Engineers, AI System Engineers, and AI Research Engineers to explore, experiment, and validate industrial-grade AI systems. From classical search algorithms to cutting-edge LLM training pipelines, this testbed provides complete implementations and research-grade experimentation capabilities.
