A 100% private, local Retrieval-Augmented Generation (RAG) stack using:
- EmbeddingGemma-300m for embeddings
- SQLite-vec for vector storage
- Qwen3:4b for language generation
- 100% Private & Offline Capable
Build a completely private, offline RAG application right on your laptop. This system combines Google's new EmbeddingGemma model for best-in-class local embeddings, SQLite-vec for a dead-simple vector database, and Ollama for a powerful, local LLM. No API keys, no costs, no data sent to the cloud.
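At query time the pipeline embeds the question, finds the nearest stored chunks, and hands them to the LLM as context. A minimal, dependency-free sketch of that retrieve-then-generate loop (the `embed` and `answer` stand-ins below are toy stubs for illustration; the real script calls EmbeddingGemma, SQLite-vec, and Qwen3 instead):

```python
import math

def embed(text):
    # Toy stand-in for EmbeddingGemma: a normalized bag-of-letters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

docs = [
    "sqlite-vec stores embeddings in a virtual table",
    "ollama serves local language models",
    "uv manages python project dependencies",
]
# In the real system these vectors live in the SQLite-vec database.
index = [(doc, embed(doc)) for doc in docs]

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(query):
    # Stand-in for the Qwen3 call: the real script sends query + context to Ollama.
    context = "\n".join(retrieve(query))
    return f"Context used:\n{context}"

print(answer("how does sqlite-vec store embeddings?"))
```

The real components swap in cleanly: `embed` becomes a sentence-transformers call, `index` becomes a SQLite-vec table, and `answer` becomes a prompt to Ollama.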
- Python 3.9+
- Modern laptop with at least 8GB RAM
- Internet connection for initial model downloads
```bash
git clone <your-repo>
cd embeddinggemma
```

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
pip install uv
```

```bash
# Install all project dependencies
uv sync
```

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve &

# Pull the Qwen3 model (2.5GB download)
ollama pull qwen3:4b
```

EmbeddingGemma requires Hugging Face access:
- Request access at: https://huggingface.co/google/embeddinggemma-300m
- Wait for approval (usually within 24 hours)
- Login via CLI:
```bash
# Login to Hugging Face
uv run huggingface-cli login
```

```bash
# Run the RAG system
uv run python rag_demo.py
```

To use this project with Jupyter notebooks in a standalone virtual environment:
```bash
# Add Jupyter packages to your project
uv add jupyter notebook ipykernel

# Register your virtual environment as a Jupyter kernel
uv run python -m ipykernel install --user --name embeddinggemma --display-name "EmbeddingGemma RAG"

# Start Jupyter
uv run jupyter notebook

# Or use Jupyter Lab
uv run jupyter lab
```

- Open your notebook
- Go to Kernel → Change kernel → EmbeddingGemma RAG
- Now all your project dependencies are available!
```
embeddinggemma/
├── .venv/             # Virtual environment
├── docs/              # Scraped documentation
├── rag_demo.py        # Main RAG demonstration script
├── rag_demo.ipynb     # Complete tutorial notebook
├── pyproject.toml     # Project dependencies (uv format)
├── requirements.txt   # Alternative pip format
└── vectors_docs.db    # SQLite vector database
```

Key parameters you can modify:
```python
EMBEDDING_MODEL = "google/embeddinggemma-300m"
EMBEDDING_DIMS = 256    # 256 for 3x speed, 768 for max quality
LLM_MODEL = "qwen3:4b"  # Try: qwen3:7b, llama3:8b, mistral:7b
DRY_RUN = False         # Set True to test without LLM
```

```bash
uv run python rag_demo.py
```

```python
from rag_docs import *

# Query the system
response = semantic_search_and_query("How do I use SQLite-vec with Python?")
```

Solution: Make sure you're using the correct kernel
- Register the kernel:
  ```bash
  uv run python -m ipykernel install --user --name embeddinggemma --display-name "EmbeddingGemma RAG"
  ```
- Switch the kernel in Jupyter to "EmbeddingGemma RAG"
Solution: Install Jupyter in your environment

```bash
uv add jupyter notebook ipykernel
uv sync
```

Solution: Request access and wait for approval
- Visit: https://huggingface.co/google/embeddinggemma-300m
- Click "Request access to this repo"
- Wait 24 hours for approval
- Run `uv run huggingface-cli login`
Solution: Ensure Ollama is running

```bash
# Check if running
ps aux | grep ollama

# Start if not running
ollama serve &

# Pull model if needed
ollama pull qwen3:4b
```

Solutions:
- Reduce `EMBEDDING_DIMS` to 256
- Use smaller batch sizes
- Try `qwen3:1.5b` instead of `qwen3:4b`
- Close other applications
Check your setup:
```bash
# Verify environment is activated
which python  # Should show .venv path

# Test imports
uv run python -c "import sqlite_vec, ollama, sentence_transformers; print('All imports working!')"

# Check Ollama
ollama list  # Should show qwen3:4b

# Test Jupyter kernel
jupyter kernelspec list  # Should show embeddinggemma kernel
```

- RAM: 8GB minimum, 16GB recommended
- Storage: ~3GB for models + data
- Models Downloaded:
- EmbeddingGemma-300m: ~600MB
- Qwen3:4b: ~2.5GB
Edit `DOCUMENTATION_URLS` in the script to scrape your own docs.
- Embeddings: Try `google/embeddinggemma-768` for higher quality
- LLM: Try `qwen3:7b`, `llama3:8b`, or `mistral:7b`
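The 256 vs. 768 dimension trade-off works because EmbeddingGemma's vectors can be truncated to a prefix and re-normalized (Matryoshka-style embeddings). The operation itself is just slicing; a sketch with a random stand-in vector, since loading the real model requires Hugging Face access:

```python
import math
import random

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length,
    as is done with Matryoshka-style embeddings like EmbeddingGemma's."""
    head = vec[:dims]
    norm = math.sqrt(sum(v * v for v in head)) or 1.0
    return [v / norm for v in head]

random.seed(0)
full = [random.gauss(0, 1) for _ in range(768)]  # stand-in for a model output
small = truncate_embedding(full, 256)

print(len(small))                           # 256
print(round(sum(v * v for v in small), 6))  # 1.0 (unit length again)
```

Shorter vectors mean roughly 3x less storage and faster distance computations in SQLite-vec, at a modest quality cost.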
Modify token-based chunking parameters:
```python
max_tokens = 2048      # Chunk size
overlap_tokens = 100   # Overlap between chunks
```
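The chunking strategy behind these parameters can be sketched in a few lines. Here whitespace-split words stand in for real tokenizer tokens, and `chunk_tokens` is an illustrative name, not necessarily the script's actual function:

```python
def chunk_tokens(tokens, max_tokens=2048, overlap_tokens=100):
    """Split a token list into windows of max_tokens, each sharing
    its first overlap_tokens tokens with the end of the previous window."""
    if max_tokens <= overlap_tokens:
        raise ValueError("max_tokens must exceed overlap_tokens")
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break
    return chunks

words = ("lorem ipsum " * 10).split()  # 20 toy "tokens"
chunks = chunk_tokens(words, max_tokens=8, overlap_tokens=2)
print([len(c) for c in chunks])  # [8, 8, 8]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides.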
- ✅ 100% Private: All processing happens locally
- ✅ Zero Cost: No API fees after initial setup
- ✅ Mobile-Optimized: EmbeddingGemma is designed for on-device deployment
- ✅ Fast: SQLite-vec provides sub-millisecond vector search
- ✅ Smart: Qwen3 rivals much larger models with 256K context
- ✅ Standalone: Complete isolation in a virtual environment
This project is open source. See individual model licenses:
- EmbeddingGemma: Gemma License
- Qwen3: Apache 2.0
- SQLite-vec: Apache 2.0
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request