Skip to content

Error Handling

adham90 edited this page Feb 14, 2026 · 3 revisions

Error Handling

Understanding and handling errors in RubyLLM::Agents.

Error Class Hierarchy

StandardError └── RubyLLM::Agents::Error (base class) ├── RubyLLM::Agents::BudgetExceededError ├── RubyLLM::Agents::CircuitBreakerOpenError ├── RubyLLM::Agents::ConfigurationError ├── RubyLLM::Agents::TimeoutError └── RubyLLM::Agents::ValidationError 

Error Types

BudgetExceededError

Raised when budget limits are exceeded (with :hard enforcement):

begin result = LLM::ExpensiveAgent.call(query: params[:query]) rescue RubyLLM::Agents::BudgetExceededError => e e.message # => "Daily budget exceeded: $100.00 limit reached" e.budget_type # => :daily or :monthly e.budget_scope # => :global, :per_agent, or :tenant e.limit # => 100.0 e.current # => 105.50 e.tenant_budget? # => true/false (v0.4.0+) # Handle gracefully render json: { error: "Service temporarily unavailable" }, status: 503 end

CircuitBreakerOpenError

Raised when the circuit breaker is open (too many recent failures):

begin result = LLM::MyAgent.call(query: params[:query]) rescue RubyLLM::Agents::CircuitBreakerOpenError => e e.message # => "Circuit breaker open for MyAgent" e.agent_type # => "MyAgent" e.cooldown_ends # => Time object when circuit will close e.remaining_ms # => Milliseconds until retry is allowed # Suggest retry time render json: { error: "Service temporarily unavailable", retry_after: e.remaining_ms / 1000 }, status: 503 end

ConfigurationError

Raised when agent configuration is invalid:

# Missing required configuration class BadAgent < ApplicationAgent # No model specified param :query, required: true end BadAgent.call(query: "test") # => Raises ConfigurationError: "Model must be configured"

TimeoutError

Raised when total_timeout is exceeded:

begin result = LLM::SlowAgent.call(query: params[:query]) rescue RubyLLM::Agents::TimeoutError => e e.message # => "Total timeout of 30s exceeded" e.timeout # => 30 e.elapsed # => 31.5 e.attempts # => 3 (attempts made before timeout) render json: { error: "Request timed out" }, status: 504 end

ValidationError

Raised when parameter validation fails:

class TypedAgent < ApplicationAgent param :count, type: :integer, required: true end TypedAgent.call(count: "not a number") # => Raises ValidationError: "Parameter 'count' must be Integer, got String"

Retryable vs Non-Retryable Errors

RubyLLM::Agents automatically classifies errors:

Retryable Errors (automatic retry)

  • Faraday::TimeoutError - Request timeout
  • Faraday::ConnectionFailed - Connection issues
  • RubyLLM::RateLimitError - Rate limit exceeded
  • Net::OpenTimeout - Connection timeout
  • Errno::ECONNREFUSED - Connection refused

Non-Retryable Errors (fail immediately)

  • RubyLLM::AuthenticationError - Invalid API key
  • RubyLLM::InvalidRequestError - Bad request parameters
  • ArgumentError - Missing required parameters
  • RubyLLM::Agents::BudgetExceededError - Budget exceeded
  • RubyLLM::Agents::CircuitBreakerOpenError - Circuit open

Checking Error Type in Results

result = LLM::MyAgent.call(query: "test") unless result.success? if result.retryable? # Safe to retry later RetryJob.perform_later(query: "test") else # Don't retry, handle differently notify_admin(result.error) end end

Recovery Patterns

Basic Error Handling

def search(query) result = LLM::SearchAgent.call(query: query) if result.success? result.content else # Return cached/default response cached_search(query) || default_response end rescue RubyLLM::Agents::BudgetExceededError { error: "Search temporarily unavailable", results: [] } rescue RubyLLM::Agents::CircuitBreakerOpenError => e { error: "Service degraded, retry in #{e.remaining_ms / 1000}s", results: [] } end

Graceful Degradation

class SearchService def search(query) # Try AI-powered search first ai_search(query) rescue RubyLLM::Agents::Error # Fall back to basic search basic_search(query) end private def ai_search(query) result = LLM::SearchAgent.call(query: query) raise result.error unless result.success? result.content[:results] end def basic_search(query) # Simple database search as fallback Product.search(query).limit(10) end end

Retry with Backoff

class AgentRetryService MAX_RETRIES = 3 BASE_DELAY = 1 def call(agent_class, **params) retries = 0 begin agent_class.call(**params) rescue RubyLLM::Agents::Error => e raise unless e.retryable? && retries < MAX_RETRIES retries += 1 sleep(BASE_DELAY * (2 ** retries)) retry end end end

Queue for Later Processing

class AgentJob < ApplicationJob retry_on RubyLLM::Agents::CircuitBreakerOpenError, wait: :polynomially_longer discard_on RubyLLM::Agents::BudgetExceededError def perform(agent_class_name, **params) agent_class = agent_class_name.constantize result = agent_class.call(**params) if result.success? ResultHandler.process(result) else handle_failure(result) end end private def handle_failure(result) Rails.logger.error("Agent failed: #{result.error}") notify_admin(result) end end

Controller Error Handling

Rescue From Pattern

class ApplicationController < ActionController::Base rescue_from RubyLLM::Agents::BudgetExceededError, with: :handle_budget_exceeded rescue_from RubyLLM::Agents::CircuitBreakerOpenError, with: :handle_circuit_open rescue_from RubyLLM::Agents::TimeoutError, with: :handle_timeout private def handle_budget_exceeded(error) render json: { error: "Service limit reached", type: "budget_exceeded" }, status: :service_unavailable end def handle_circuit_open(error) response.headers["Retry-After"] = (error.remaining_ms / 1000).to_s render json: { error: "Service temporarily unavailable", type: "circuit_open", retry_after: error.remaining_ms / 1000 }, status: :service_unavailable end def handle_timeout(error) render json: { error: "Request timed out", type: "timeout" }, status: :gateway_timeout end end

API-Specific Handling

class Api::V1::SearchController < Api::BaseController def search result = LLM::SearchAgent.call(query: params[:q]) if result.success? render json: { data: result.content, meta: { tokens: result.total_tokens, cost: result.total_cost, duration_ms: result.duration_ms } } else render json: { error: result.error, retryable: result.retryable? }, status: :unprocessable_entity end end end

Monitoring and Alerting

Error Rate Monitoring

# Track error rates error_rate = RubyLLM::Agents::Execution .today .by_agent("MyAgent") .then { |e| e.failed.count.to_f / e.count } if error_rate > 0.1 # >10% error rate SlackNotifier.alert("High error rate for MyAgent: #{(error_rate * 100).round}%") end

Error Type Breakdown

# Analyze failure reasons RubyLLM::Agents::Execution .today .failed .group(:error_message) .count .sort_by { |_, count| -count } .first(5) # => [["Rate limit exceeded", 45], ["Timeout", 12], ...]

Setting Up Alerts

# config/initializers/ruby_llm_agents.rb RubyLLM::Agents.configure do |config| config.on_alert = ->(event, payload) { case event when :breaker_open PagerDuty.trigger( summary: "Circuit breaker open for #{payload[:agent_type]}", severity: "warning" ) Slack::Notifier.new(ENV["SLACK_WEBHOOK"]).ping( "Circuit breaker opened: #{payload[:agent_type]}" ) when :budget_hard_cap PagerDuty.trigger( summary: "Budget exceeded: $#{payload[:total_cost]}", severity: "critical" ) when :budget_soft_cap Rails.logger.warn("Budget warning: #{payload}") end } end

Best Practices

  1. Always check result.success? - Don't assume calls succeed
  2. Use rescue blocks sparingly - Prefer checking result status
  3. Log errors with context - Include agent type, parameters, and timing
  4. Set up monitoring - Track error rates and patterns
  5. Implement graceful degradation - Have fallback strategies
  6. Use circuit breakers - Prevent cascade failures
  7. Configure appropriate timeouts - Balance responsiveness and reliability

Related Pages

Clone this wiki locally