
feat(ai): Add LiteLLM-like router infrastructure (#286)#439

Open
matiasmagni wants to merge 1 commit into arakoodev:ts from matiasmagni:feature/litellm-router-286-v2

Conversation


@matiasmagni matiasmagni commented Mar 18, 2026

Summary

  • Implement load balancing between multiple LLM deployments (OpenAI, Google Palm/Gemini, Cohere)
  • Routing strategies: least-tokens (default), simple-shuffle, latency-based
  • Timeout/retry with exponential backoff via axios interceptors
  • Streaming support for all providers
  • Token usage tracking with cost calculation
  • Sentry and Posthog logging callbacks
  • JSONNet configuration support
  • Mock servers for testing
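The timeout/retry behavior above can be illustrated with a minimal exponential-backoff schedule. This is a hypothetical sketch: the function name, base delay, and cap are assumptions for illustration, not the PR's actual interceptor code.

```typescript
// Hypothetical backoff schedule (illustrative, not the PR's implementation):
// the delay doubles on each retry attempt and is capped at an upper bound,
// so retries back off quickly without waiting unboundedly long.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
    return Math.min(capMs, baseMs * 2 ** attempt);
}

// attempt 0 waits the base delay; later attempts double until the cap.
const delays = [0, 1, 2, 3].map((attempt) => backoffDelayMs(attempt));
```

In practice a schedule like this would be applied inside an axios response interceptor that re-issues the request after the computed delay.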

Demo Video

https://youtu.be/Zij1XabtJnk

Features Implemented

  1. Load Balancing: Picks the deployment that is below its rate limit and has the fewest tokens used
  2. Reliability: Timeouts, retries, exponential backoff
  3. Streaming: Full streaming support
  4. Token Usage: Tracks prompt/completion/total tokens and cost
  5. Logging: Sentry + Posthog callbacks
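The least-tokens strategy from points 1 and 4 can be sketched as a small selection function. The `Deployment` shape and `pickLeastTokens` name here are illustrative assumptions, not the router's real internals.

```typescript
// Illustrative deployment record; the real router's state likely differs.
interface Deployment {
    modelName: string;
    tpm: number;        // tokens-per-minute budget for this deployment
    tokensUsed: number; // tokens consumed in the current window
}

// Sketch of least-tokens routing: keep only deployments still under their
// TPM budget, then pick the one with the fewest tokens consumed.
// Returns undefined when every deployment is at its limit.
function pickLeastTokens(deployments: Deployment[]): Deployment | undefined {
    return deployments
        .filter((d) => d.tokensUsed < d.tpm)
        .sort((a, b) => a.tokensUsed - b.tokensUsed)[0];
}

const best = pickLeastTokens([
    { modelName: "gpt-3.5-turbo", tpm: 90000, tokensUsed: 90000 }, // at limit, skipped
    { modelName: "gpt-3.5-turbo", tpm: 90000, tokensUsed: 42000 },
    { modelName: "gpt-3.5-turbo", tpm: 90000, tokensUsed: 1200 },  // fewest tokens, chosen
]);
```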

Tests

  • 8 passing E2E tests covering all features
  • Mock servers for OpenAI, Gemini, Cohere

Usage Example

```ts
import { Router } from "@arakoodev/edgechains.js/ai";

const router = new Router({
    modelList: [
        { modelName: "gpt-3.5-turbo", provider: "openai", apiKey: "sk-xxx", rpm: 3000, tpm: 90000 },
        { modelName: "gpt-3.5-turbo", provider: "openai", apiKey: "sk-yyy", rpm: 3000, tpm: 90000 },
    ],
    routingStrategy: "least-tokens",
    numRetries: 3,
    timeout: 30000,
});

const response = await router.completion({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello!" }],
});
```
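The token-usage tracking with cost calculation mentioned above could be implemented along these lines. This is a sketch under assumptions: the `Usage` shape and the per-1k-token pricing parameters are illustrative, not the PR's actual types or rates.

```typescript
// Illustrative usage record, mirroring the prompt/completion/total split
// the PR says it tracks (field names are assumptions).
interface Usage {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
}

// Sketch of a cost calculation: prompt and completion tokens are often
// priced separately, quoted per 1,000 tokens. Prices here are parameters,
// not real provider rates.
function costUsd(usage: Usage, promptPricePer1k: number, completionPricePer1k: number): number {
    return (
        (usage.promptTokens / 1000) * promptPricePer1k +
        (usage.completionTokens / 1000) * completionPricePer1k
    );
}

const cost = costUsd(
    { promptTokens: 2000, completionTokens: 1000, totalTokens: 3000 },
    0.0015, // assumed prompt price per 1k tokens
    0.002   // assumed completion price per 1k tokens
);
```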

/claim #286

- Implement load balancing between multiple LLM deployments (OpenAI, Gemini, Cohere)
- Support least-tokens, simple-shuffle, and latency-based routing strategies
- Add timeout/retry logic with exponential backoff via axios interceptors
- Implement streaming support for all providers
- Add token usage tracking with cost calculation
- Add Sentry and Posthog logging callbacks
- Add JSONNet configuration example
- Add mock servers and E2E tests
@matiasmagni (Author)

I have read the Arakoo CLA Document and I hereby sign the CLA.
