Quick Start: Get Evaluations Running in a Flash
Get started with Ragas in minutes. Create a complete evaluation project with just a few commands.
Step 1: Create Your Project
Choose one of the following methods:
Step 2: Install Dependencies
Install the project dependencies:
Or if you prefer pip:
Step 3: Set Your API Key
By default, the quickstart example uses OpenAI. Set your API key and you're ready to go. You can also switch to another provider with a minor change:
The quickstart project is already configured to use OpenAI. You're all set!
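If the key isn't already in your environment, export it first. `OPENAI_API_KEY` is the variable the OpenAI SDK reads; the value shown is a placeholder:

```shell
export OPENAI_API_KEY="your-openai-key"
```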
Set your Anthropic API key:
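`ANTHROPIC_API_KEY` is the variable the Anthropic SDK reads; substitute your real key for the placeholder:

```shell
export ANTHROPIC_API_KEY="your-anthropic-key"
```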
Then update the LLM initialization in evals.py:
Set up your Google credentials:
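Export your Gemini key as `GOOGLE_API_KEY` (the variable the snippet below passes to `genai.configure`); the value shown is a placeholder:

```shell
export GOOGLE_API_KEY="your-google-key"
```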
Then update the LLM initialization in evals.py:
Option 1: Using Google's Official Library (Recommended)
```python
import os

import google.generativeai as genai
from ragas.llms import llm_factory

genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
client = genai.GenerativeModel("gemini-2.0-flash")

llm = llm_factory("gemini-2.0-flash", provider="google", client=client)
# Adapter is auto-detected as "litellm" for the google provider
```

For more Gemini options and detailed setup, see the Google Gemini Integration Guide.
Install and run Ollama locally, then update the LLM initialization in evals.py:
For any LLM with OpenAI-compatible API:
```python
from openai import OpenAI
from ragas.llms import llm_factory

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-api-endpoint",
)
llm = llm_factory("model-name", provider="openai", client=client)
```

For more details, learn about LLM integrations.
Project Structure
Your generated project includes:
```
rag_eval/
├── README.md           # Project documentation
├── pyproject.toml      # Project configuration
├── rag.py              # Your RAG application
├── evals.py            # Evaluation workflow
├── __init__.py         # Makes this a Python package
└── evals/
    ├── datasets/       # Test data files
    ├── experiments/    # Evaluation results
    └── logs/           # Execution logs
```

Step 4: Run Your Evaluation
Run the evaluation script:
Or if you installed with pip:
The evaluation will:

- Load test data from the load_dataset() function in evals.py
- Query your RAG application with test questions
- Evaluate responses
- Display results in the console
- Save results to CSV in the evals/experiments/ directory
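The saved results are plain CSV files, so you can inspect them with the standard library alone. A minimal sketch (the evals/experiments/ path comes from the project layout above; the filenames inside it depend on your runs):

```python
import csv
from pathlib import Path


def summarize_experiments(root="evals/experiments"):
    """Return {filename: row_count} for each results CSV under root."""
    summary = {}
    for path in sorted(Path(root).glob("*.csv")):
        with path.open(newline="") as f:
            summary[path.name] = len(list(csv.DictReader(f)))
    return summary


print(summarize_experiments())
```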
Congratulations! You have a complete evaluation setup running. 🎉
Customize Your Evaluation
Add More Test Cases
Edit the load_dataset() function in evals.py to add more test questions:
```python
from ragas import Dataset


def load_dataset():
    """Load test dataset for evaluation."""
    dataset = Dataset(
        name="test_dataset",
        backend="local/csv",
        root_dir=".",
    )
    data_samples = [
        {
            "question": "What is Ragas?",
            "grading_notes": "Ragas is an evaluation framework for LLM applications",
        },
        {
            "question": "How do metrics work?",
            "grading_notes": "Metrics evaluate the quality and performance of LLM responses",
        },
        # Add more test cases here
    ]
    for sample in data_samples:
        dataset.append(sample)
    dataset.save()
    return dataset
```

Customize Evaluation Metrics
The template includes a DiscreteMetric for custom evaluation logic. You can customize the evaluation in several ways:

- Modify the metric prompt: change the evaluation criteria
- Adjust allowed values: update the valid output categories
- Add more metrics: create additional metrics for different aspects
Example of modifying the metric:
```python
from ragas.metrics import DiscreteMetric

my_metric = DiscreteMetric(
    name="custom_evaluation",
    prompt="Evaluate this response: {response} based on: {context}. Return 'excellent', 'good', or 'poor'.",
    allowed_values=["excellent", "good", "poor"],
)
```

What's Next?
- Learn the concepts: Read the Evaluate a Simple LLM Application guide for deeper understanding
- Custom metrics: Create your own metrics using simple decorators
- Production integration: Integrate evaluations into your CI/CD pipeline
- RAG evaluation: Evaluate RAG systems with specialized metrics
- Agent evaluation: Explore AI agent evaluation
- Test data generation: Generate synthetic test datasets for your evaluations
