Call all LLM APIs using the OpenAI format [Anthropic, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]
| Docs | Discord | 100+ Supported Models |
LiteLLM manages:
- Translating inputs to the provider's completion and embedding endpoints
- Guaranteeing consistent output: text responses are always available at `['choices'][0]['message']['content']`
- Exception mapping: common exceptions across providers are mapped to the OpenAI exception types (see the exception-handling sketch after the quick start below)
```shell
pip install litellm
```

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
```
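Because exceptions are mapped to the OpenAI exception types, a single `try/except` block can cover multiple providers. Here is a minimal sketch, assuming the pre-1.0 `openai` package (where the exception classes live under `openai.error`) and a per-call `api_key` override with a deliberately invalid key:

```python
from openai.error import AuthenticationError, OpenAIError
from litellm import completion

try:
    # a bad key should surface as an OpenAI-style auth error, even for a non-OpenAI model
    completion(
        model="command-nightly",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        api_key="invalid-key",  # assumption: per-call api_key override
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except OpenAIError as e:
    print(f"Provider error, surfaced as an OpenAI exception: {e}")
```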
Streaming (Docs)
LiteLLM supports streaming the model response back; pass `stream=True` to get a streaming iterator in the response. Streaming is supported for OpenAI, Azure, Anthropic, and Huggingface models.
```python
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2
result = completion('claude-2', messages, stream=True)
for chunk in result:
    print(chunk['choices'][0]['delta'])
```

Caching (Docs)
LiteLLM supports caching `completion()` and `embedding()` calls for all LLMs. A hosted cache is also available via the LiteLLM API.
```python
import litellm
from litellm.caching import Cache
import os

litellm.cache = Cache()
os.environ['OPENAI_API_KEY'] = ""

# add to cache
response1 = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "why is LiteLLM amazing?"}],
    caching=True
)

# returns cached response
response2 = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "why is LiteLLM amazing?"}],
    caching=True
)

print(f"response1: {response1}")
print(f"response2: {response2}")
```
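The default `Cache()` above is in-memory. A shared backend such as Redis can be configured instead; a sketch, assuming the `Cache` constructor's `type="redis"` option with standard connection parameters (check the caching docs for the exact signature in your version):

```python
import litellm
from litellm.caching import Cache

# assumption: Redis-backed cache via type="redis" with host/port/password kwargs
litellm.cache = Cache(
    type="redis",
    host="localhost",  # hypothetical local Redis instance
    port="6379",
    password="",       # set if your Redis requires auth
)

# subsequent completion(..., caching=True) calls now read/write the shared Redis cache
```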
OpenAI Proxy Server (Docs)
Spin up a local server to translate OpenAI API calls to any non-OpenAI model (e.g. Huggingface, TogetherAI, Ollama, etc.)
This works for async + streaming as well.

```shell
litellm --model <model_name>
```

Running your model locally or on a custom endpoint? Set the `--api-base` parameter (see how).
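Once the proxy is running, any OpenAI-compatible client can point at it. A minimal sketch, assuming the pre-1.0 `openai` Python client and the proxy's default local address of `http://0.0.0.0:8000` (use whatever host/port the command prints on startup):

```python
import openai

# point the standard OpenAI client at the local LiteLLM proxy
openai.api_base = "http://0.0.0.0:8000"  # assumption: default proxy address
openai.api_key = "anything"              # provider credentials are handled by the proxy

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the proxy routes requests to the model passed via --model
    messages=[{"role": "user", "content": "Hello from the proxy!"}],
)
print(response["choices"][0]["message"]["content"])
```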
Supported Providers (Docs)
| Provider | Completion | Streaming | Async Completion | Async Streaming |
|---|---|---|---|---|
| openai | ✅ | ✅ | ✅ | ✅ |
| cohere | ✅ | ✅ | ✅ | ✅ |
| anthropic | ✅ | ✅ | ✅ | ✅ |
| replicate | ✅ | ✅ | ✅ | ✅ |
| huggingface | ✅ | ✅ | ✅ | ✅ |
| together_ai | ✅ | ✅ | ✅ | ✅ |
| openrouter | ✅ | ✅ | ✅ | ✅ |
| vertex_ai | ✅ | ✅ | ✅ | ✅ |
| palm | ✅ | ✅ | ✅ | ✅ |
| ai21 | ✅ | ✅ | ✅ | ✅ |
| baseten | ✅ | ✅ | ✅ | ✅ |
| azure | ✅ | ✅ | ✅ | ✅ |
| sagemaker | ✅ | ✅ | ✅ | ✅ |
| bedrock | ✅ | ✅ | ✅ | ✅ |
| vllm | ✅ | ✅ | ✅ | ✅ |
| nlp_cloud | ✅ | ✅ | ✅ | ✅ |
| aleph alpha | ✅ | ✅ | ✅ | ✅ |
| petals | ✅ | ✅ | ✅ | ✅ |
| ollama | ✅ | ✅ | ✅ | ✅ |
| deepinfra | ✅ | ✅ | ✅ | ✅ |
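The Async Completion and Async Streaming columns above refer to `litellm.acompletion`. A minimal sketch of an async, streaming call, reusing the environment variables from the quick start (chunk shapes can vary slightly by provider):

```python
import asyncio
from litellm import acompletion

async def main():
    # async completion with streaming; the same call shape works across the providers above
    response = await acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        stream=True,
    )
    async for chunk in response:
        print(chunk['choices'][0]['delta'])

asyncio.run(main())
```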
To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.
Here's how to modify the repo locally:

Step 1: Clone the repo

```shell
git clone https://github.com/BerriAI/litellm.git
```

Step 2: Navigate into the project, and install dependencies:

```shell
cd litellm
poetry install
```

Step 3: Test your change:

```shell
cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .
```

Step 4: Submit a PR with your changes! 🚀
- Push your fork to your GitHub repo
- Submit a PR from there
Learn more on how to make a PR
- Schedule Demo 👋
- Community Discord 💭
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
- Need for simplicity: our code started to get extremely complicated managing & translating calls between Azure, OpenAI, and Cohere.