Anthropic
Install
To use AnthropicModel models, you need to either install pydantic-ai, or install pydantic-ai-slim with the anthropic optional group:
```bash
pip install "pydantic-ai-slim[anthropic]"
# or, with uv:
uv add "pydantic-ai-slim[anthropic]"
```

Configuration
To use Anthropic through their API, go to console.anthropic.com/settings/keys to generate an API key.
AnthropicModelName contains a list of available Anthropic models.
Environment variable
Once you have the API key, you can set it as an environment variable:
```bash
export ANTHROPIC_API_KEY='your-api-key'
```

You can then use AnthropicModel by name:
```python
from pydantic_ai import Agent

agent = Agent('anthropic:claude-sonnet-4-5')  # or 'gateway/anthropic:claude-sonnet-4-5'
...
```

Or initialize the model directly with just the model name:
```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel

model = AnthropicModel('claude-sonnet-4-5')
agent = Agent(model)
...
```

provider argument
You can provide a custom Provider via the provider argument:
```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

model = AnthropicModel(
    'claude-sonnet-4-5',
    provider=AnthropicProvider(api_key='your-api-key'),
)
agent = Agent(model)
...
```

Custom HTTP Client
You can customize the AnthropicProvider with a custom httpx.AsyncClient:
```python
from httpx import AsyncClient

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

# Custom client with a 30-second timeout
custom_http_client = AsyncClient(timeout=30)
model = AnthropicModel(
    'claude-sonnet-4-5',
    provider=AnthropicProvider(api_key='your-api-key', http_client=custom_http_client),
)
agent = Agent(model)
...
```

Prompt Caching
Anthropic supports prompt caching to reduce costs by caching parts of your prompts. Pydantic AI provides four ways to use prompt caching:
- Cache User Messages with CachePoint: Insert a CachePoint marker in your user messages to cache everything before it
- Cache System Instructions: Set AnthropicModelSettings.anthropic_cache_instructions to True (uses 5m TTL by default) or specify '5m' / '1h' directly
- Cache Tool Definitions: Set AnthropicModelSettings.anthropic_cache_tool_definitions to True (uses 5m TTL by default) or specify '5m' / '1h' directly
- Cache All Messages: Set AnthropicModelSettings.anthropic_cache_messages to True to automatically cache all messages
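As a rough illustration of how the True / '5m' / '1h' values described above relate to a TTL, the sketch below maps a setting to a TTL string and then to a cache_control-style dictionary. The helper names resolve_cache_ttl and cache_control_block are hypothetical and not part of Pydantic AI. The dictionary shape is an assumption about Anthropic's request format and is shown only for orientation.

```python
from typing import Literal, Optional, Union

# Values accepted by the cache settings described above
CacheSetting = Union[bool, Literal['5m', '1h']]


def resolve_cache_ttl(setting: CacheSetting) -> Optional[str]:
    """Map a cache setting to a TTL string, or None when caching is off.

    Hypothetical helper for illustration; not part of pydantic_ai.
    """
    if setting is True:
        return '5m'  # True means "use the default 5m TTL"
    if setting is False:
        return None
    if setting in ('5m', '1h'):
        return setting
    raise ValueError(f'unsupported cache setting: {setting!r}')


def cache_control_block(setting: CacheSetting) -> Optional[dict]:
    """Build a cache_control-style dict (assumed shape), or None if disabled."""
    ttl = resolve_cache_ttl(setting)
    if ttl is None:
        return None
    return {'type': 'ephemeral', 'ttl': ttl}


print(cache_control_block(True))
print(cache_control_block('1h'))
```

In practice you only set the AnthropicModelSettings fields. The mapping above is meant to make the "True means 5m" default explicit.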
Example 1: Automatic Message Caching
Use anthropic_cache_messages to automatically cache all messages up to and including the newest user message:
```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',  # or 'gateway/anthropic:claude-sonnet-4-5'
    system_prompt='You are a helpful assistant.',
    model_settings=AnthropicModelSettings(
        anthropic_cache_messages=True,  # Automatically caches the last message
    ),
)

# The last message is automatically cached - no need for manual CachePoint
result1 = agent.run_sync('What is the capital of France?')

# Subsequent calls with similar conversation benefit from cache
result2 = agent.run_sync('What is the capital of Germany?')
print(f'Cache write: {result1.usage().cache_write_tokens}')
print(f'Cache read: {result2.usage().cache_read_tokens}')
```

Example 2: Comprehensive Caching Strategy
Combine multiple cache settings for maximum savings:
```python
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',  # or 'gateway/anthropic:claude-sonnet-4-5'
    system_prompt='Detailed instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # Cache system instructions
        anthropic_cache_tool_definitions='1h',  # Cache tool definitions with 1h TTL
        anthropic_cache_messages=True,  # Also cache the last message
    ),
)


@agent.tool
def search_docs(ctx: RunContext, query: str) -> str:
    """Search documentation."""
    return f'Results for {query}'


result = agent.run_sync('Search for Python best practices')
print(result.output)
```

Example 3: Fine-Grained Control with CachePoint
Use manual CachePoint markers to control cache locations precisely:
```python
from pydantic_ai import Agent, CachePoint

agent = Agent(
    'anthropic:claude-sonnet-4-5',  # or 'gateway/anthropic:claude-sonnet-4-5'
    system_prompt='Instructions...',
)

# Manually control cache points for specific content blocks
result = agent.run_sync([
    'Long context from documentation...',
    CachePoint(),  # Cache everything up to this point
    'First question',
])
print(result.output)
```

Accessing Cache Usage Statistics
Access cache usage statistics via result.usage():
```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',  # or 'gateway/anthropic:claude-sonnet-4-5'
    system_prompt='Instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # Default 5m TTL
    ),
)

result = agent.run_sync('Your question')
usage = result.usage()
print(f'Cache write tokens: {usage.cache_write_tokens}')
print(f'Cache read tokens: {usage.cache_read_tokens}')
```

Cache Point Limits
Anthropic enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit to ensure your requests always comply without errors.
How Cache Points Are Allocated
Cache points can be placed in three locations:
- System Prompt: Via the anthropic_cache_instructions setting (adds a cache point to the last system prompt block)
- Tool Definitions: Via the anthropic_cache_tool_definitions setting (adds a cache point to the last tool definition)
- Messages: Via CachePoint markers or the anthropic_cache_messages setting (adds cache points to message content)
Each setting uses at most 1 cache point, but you can combine them.
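The arithmetic behind combining these sources can be sketched as follows. This is a minimal illustration that assumes each enabled setting consumes exactly one cache point, as stated above. The function names are hypothetical and do not exist in Pydantic AI.

```python
MAX_CACHE_POINTS = 4  # Anthropic's per-request limit, as stated above


def settings_cache_points(
    cache_instructions: object = False,
    cache_tool_definitions: object = False,
    cache_messages: object = False,
) -> int:
    """Count cache points consumed by settings (one per enabled setting).

    Any truthy value (True, '5m', '1h') counts as enabled.
    Hypothetical helper for illustration only.
    """
    enabled = (cache_instructions, cache_tool_definitions, cache_messages)
    return sum(1 for setting in enabled if setting)


def manual_markers_that_fit(points_from_settings: int) -> int:
    """Number of manual CachePoint markers that still fit within the limit."""
    return max(MAX_CACHE_POINTS - points_from_settings, 0)


used = settings_cache_points(
    cache_instructions=True,
    cache_tool_definitions='1h',
    cache_messages=True,
)
print(f'{used} used by settings, {manual_markers_that_fit(used)} marker(s) left')
```

With all three settings enabled, three cache points are taken, which leaves room for one manual marker.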
Example: Using All 3 Cache Point Sources
Define an agent with all cache settings enabled:
```python
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',  # or 'gateway/anthropic:claude-sonnet-4-5'
    system_prompt='Detailed instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # 1 cache point
        anthropic_cache_tool_definitions=True,  # 1 cache point
        anthropic_cache_messages=True,  # 1 cache point
    ),
)


@agent.tool_plain
def my_tool() -> str:
    return 'result'


# This uses 3 cache points (instructions + tools + last message)
# You can add 1 more CachePoint marker before hitting the limit
result = agent.run_sync([
    'Context',
    CachePoint(),  # 4th cache point - OK
    'Question',
])
print(result.output)
usage = result.usage()
print(f'Cache write tokens: {usage.cache_write_tokens}')
print(f'Cache read tokens: {usage.cache_read_tokens}')
```

Automatic Cache Point Limiting
When cache points from all sources (settings + CachePoint markers) exceed 4, Pydantic AI automatically removes excess cache points from older message content (keeping the most recent ones).
Define an agent with 2 cache points from settings:
```python
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',  # or 'gateway/anthropic:claude-sonnet-4-5'
    system_prompt='Instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # 1 cache point
        anthropic_cache_tool_definitions=True,  # 1 cache point
    ),
)


@agent.tool_plain
def search() -> str:
    return 'data'


# Already using 2 cache points (instructions + tools)
# Can add 2 more CachePoint markers (4 total limit)
result = agent.run_sync([
    'Context 1', CachePoint(),  # Oldest - will be removed
    'Context 2', CachePoint(),  # Will be kept (3rd point)
    'Context 3', CachePoint(),  # Will be kept (4th point)
    'Question',
])
# Final cache points: instructions + tools + Context 2 + Context 3 = 4
print(result.output)
usage = result.usage()
print(f'Cache write tokens: {usage.cache_write_tokens}')
print(f'Cache read tokens: {usage.cache_read_tokens}')
```

Key Points:
- System and tool cache points are always preserved
- The cache point created by anthropic_cache_messages is always preserved (as it's the newest message cache point)
- Additional CachePoint markers in messages are removed from oldest to newest when the limit is exceeded
- This ensures critical caching (instructions/tools) is maintained while you still benefit from message-level caching
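The oldest-first removal rule can be sketched in plain Python. This is a simplified model under stated assumptions, not Pydantic AI's actual implementation: Marker is a hypothetical stand-in for CachePoint, and reserved is the number of cache points already taken by settings (instructions, tool definitions, and the anthropic_cache_messages point).

```python
from dataclasses import dataclass
from typing import List, Union

MAX_CACHE_POINTS = 4  # Anthropic's per-request limit


@dataclass(frozen=True)
class Marker:
    """Hypothetical stand-in for a CachePoint marker in user content."""


Content = List[Union[str, Marker]]


def limit_markers(content: Content, reserved: int) -> Content:
    """Drop the oldest markers so reserved + kept markers <= MAX_CACHE_POINTS.

    Points reserved by settings are never touched; only manual markers are
    removed, starting from the oldest (earliest in the list).
    """
    budget = max(MAX_CACHE_POINTS - reserved, 0)
    positions = [i for i, item in enumerate(content) if isinstance(item, Marker)]
    excess = len(positions) - budget
    to_drop = set(positions[:excess]) if excess > 0 else set()
    return [item for i, item in enumerate(content) if i not in to_drop]


content: Content = [
    'Context 1', Marker(),
    'Context 2', Marker(),
    'Context 3', Marker(),
    'Question',
]
# Two points are reserved by settings, so only the two newest markers survive.
print(limit_markers(content, reserved=2))
```

The sketch mirrors the example above: with two points reserved, the marker after 'Context 1' is dropped and the markers after 'Context 2' and 'Context 3' are kept.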