A 100% private, local Retrieval-Augmented Generation (RAG) stack using:
- EmbeddingGemma-300m for embeddings
- SQLite-vec for vector storage
- Qwen3:4b for language generation
- 100% Private & Offline Capable
Build a completely private, offline RAG application right on your laptop. This system combines Google's new EmbeddingGemma model for best-in-class local embeddings, SQLite-vec for a dead-simple vector database, and Ollama for a powerful, local LLM. No API keys, no costs, no data sent to the cloud.
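At query time the pipeline embeds the question, finds the nearest stored chunks, and hands them to the LLM as context. A minimal, dependency-free sketch of that retrieve-then-generate loop (the `embed` and `answer` stand-ins below are toy stubs for illustration; the real script calls EmbeddingGemma, SQLite-vec, and Qwen3 instead):

```python
import math

def embed(text):
    # Toy stand-in for EmbeddingGemma: a normalized bag-of-letters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

docs = [
    "sqlite-vec stores embeddings in a virtual table",
    "ollama serves local language models",
    "uv manages python project dependencies",
]
# In the real system these vectors live in the SQLite-vec database.
index = [(doc, embed(doc)) for doc in docs]

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(query):
    # Stand-in for the Qwen3 call: the real script sends query + context to Ollama.
    context = "\n".join(retrieve(query))
    return f"Context used:\n{context}"

print(answer("how does sqlite-vec store embeddings?"))
```

The real components swap in cleanly: `embed` becomes a sentence-transformers call, `index` becomes a SQLite-vec table, and `answer` becomes a prompt to Ollama.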
- Python 3.9+
- Modern laptop with at least 8GB RAM
- Internet connection for initial model downloads
```bash
git clone <your-repo>
cd embeddinggemma
```

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
pip install uv
```

```bash
# Install all project dependencies
uv sync
```

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve &

# Pull the Qwen3 model (2.5GB download)
ollama pull qwen3:4b
```

EmbeddingGemma requires Hugging Face access:
- Request access at: https://huggingface.co/google/embeddinggemma-300m
- Wait for approval (usually within 24 hours)
- Login via CLI:
```bash
# Login to Hugging Face
uv run huggingface-cli login
```

```bash
# Run the RAG system
uv run python rag_demo.py
```

To use this project with Jupyter notebooks in a standalone virtual environment:
```bash
# Add Jupyter packages to your project
uv add jupyter notebook ipykernel

# Register your virtual environment as a Jupyter kernel
uv run python -m ipykernel install --user --name embeddinggemma --display-name "EmbeddingGemma RAG"

# Start Jupyter
uv run jupyter notebook

# Or use Jupyter Lab
uv run jupyter lab
```

- Open your notebook
- Go to Kernel → Change kernel → EmbeddingGemma RAG
- Now all your project dependencies are available!
```
embeddinggemma/
├── .venv/             # Virtual environment
├── docs/              # Scraped documentation
├── rag_demo.py        # Main RAG demonstration script
├── rag_demo.ipynb     # Complete tutorial notebook
├── pyproject.toml     # Project dependencies (uv format)
├── requirements.txt   # Alternative pip format
└── vectors_docs.db    # SQLite vector database
```

Key parameters you can modify:
```python
EMBEDDING_MODEL = "google/embeddinggemma-300m"
EMBEDDING_DIMS = 256    # 256 for 3x speed, 768 for max quality
LLM_MODEL = "qwen3:4b"  # Try: qwen3:7b, llama3:8b, mistral:7b
DRY_RUN = False         # Set True to test without LLM
```

```bash
uv run python rag_demo.py
```

```python
from rag_docs import *

# Query the system
response = semantic_search_and_query("How do I use SQLite-vec with Python?")
```

Solution: Make sure you're using the correct kernel
- Register the kernel:
  ```bash
  uv run python -m ipykernel install --user --name embeddinggemma --display-name "EmbeddingGemma RAG"
  ```
- Switch the kernel in Jupyter to "EmbeddingGemma RAG"
Solution: Install Jupyter in your environment

```bash
uv add jupyter notebook ipykernel
uv sync
```

Solution: Request access and wait for approval
- Visit: https://huggingface.co/google/embeddinggemma-300m
- Click "Request access to this repo"
- Wait 24 hours for approval
- Run `uv run huggingface-cli login`
Solution: Ensure Ollama is running

```bash
# Check if running
ps aux | grep ollama

# Start if not running
ollama serve &

# Pull model if needed
ollama pull qwen3:4b
```

Solutions:
- Reduce `EMBEDDING_DIMS` to 256
- Use smaller batch sizes
- Try `qwen3:1.5b` instead of `qwen3:4b`
- Close other applications
Check your setup:
```bash
# Verify environment is activated
which python  # Should show .venv path

# Test imports
uv run python -c "import sqlite_vec, ollama, sentence_transformers; print('All imports working!')"

# Check Ollama
ollama list  # Should show qwen3:4b

# Test Jupyter kernel
jupyter kernelspec list  # Should show embeddinggemma kernel
```

- RAM: 8GB minimum, 16GB recommended
- Storage: ~3GB for models + data
- Models Downloaded:
- EmbeddingGemma-300m: ~600MB
- Qwen3:4b: ~2.5GB
Edit `DOCUMENTATION_URLS` in the script to scrape your own docs.
- Embeddings: Try `google/embeddinggemma-768` for higher quality
- LLM: Try `qwen3:7b`, `llama3:8b`, or `mistral:7b`
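The 256 vs. 768 dimension trade-off works because EmbeddingGemma's vectors can be truncated to a prefix and re-normalized (Matryoshka-style embeddings). The operation itself is just slicing; a sketch with a random stand-in vector, since loading the real model requires Hugging Face access:

```python
import math
import random

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length,
    as is done with Matryoshka-style embeddings like EmbeddingGemma's."""
    head = vec[:dims]
    norm = math.sqrt(sum(v * v for v in head)) or 1.0
    return [v / norm for v in head]

random.seed(0)
full = [random.gauss(0, 1) for _ in range(768)]  # stand-in for a model output
small = truncate_embedding(full, 256)

print(len(small))                           # 256
print(round(sum(v * v for v in small), 6))  # 1.0 (unit length again)
```

Shorter vectors mean roughly 3x less storage and faster distance computations in SQLite-vec, at a modest quality cost.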
Modify token-based chunking parameters:
```python
max_tokens = 2048      # Chunk size
overlap_tokens = 100   # Overlap between chunks
```
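The chunking strategy behind these parameters can be sketched in a few lines. Here whitespace-split words stand in for real tokenizer tokens, and `chunk_tokens` is an illustrative name, not necessarily the script's actual function:

```python
def chunk_tokens(tokens, max_tokens=2048, overlap_tokens=100):
    """Split a token list into windows of max_tokens, each sharing
    its first overlap_tokens tokens with the end of the previous window."""
    if max_tokens <= overlap_tokens:
        raise ValueError("max_tokens must exceed overlap_tokens")
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break
    return chunks

words = ("lorem ipsum " * 10).split()  # 20 toy "tokens"
chunks = chunk_tokens(words, max_tokens=8, overlap_tokens=2)
print([len(c) for c in chunks])  # [8, 8, 8]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides.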
- ✅ 100% Private: All processing happens locally
- ✅ Zero Cost: No API fees after initial setup
- ✅ Mobile-Optimized: EmbeddingGemma is designed for on-device deployment
- ✅ Fast: SQLite-vec provides sub-millisecond vector search
- ✅ Smart: Qwen3 rivals much larger models with 256K context
- ✅ Standalone: Complete isolation in a virtual environment
This project is open source. See individual model licenses:
- EmbeddingGemma: Gemma License
- Qwen3: Apache 2.0
- SQLite-vec: Apache 2.0
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request