pydantic_ai.settings

ModelSettings

Bases: TypedDict

Settings to configure an LLM.

Here we include only settings which apply to multiple models / model providers, though not all of these settings are supported by all models.

Source code in pydantic_ai_slim/pydantic_ai/settings.py
class ModelSettings(TypedDict, total=False):
    """Settings to configure an LLM.

    Here we include only settings which apply to multiple models / model providers,
    though not all of these settings are supported by all models.
    """

    max_tokens: int
    """The maximum number of tokens to generate before stopping.

    Supported by:

    * Gemini
    * Anthropic
    * OpenAI
    * Groq
    * Cohere
    * Mistral
    * Bedrock
    * MCP Sampling
    * Outlines (all providers)
    """

    temperature: float
    """Amount of randomness injected into the response.

    Use `temperature` closer to `0.0` for analytical / multiple choice, and closer to a model's
    maximum `temperature` for creative and generative tasks.

    Note that even with `temperature` of `0.0`, the results will not be fully deterministic.

    Supported by:

    * Gemini
    * Anthropic
    * OpenAI
    * Groq
    * Cohere
    * Mistral
    * Bedrock
    * Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
    """

    top_p: float
    """An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass.
    So 0.1 means only the tokens comprising the top 10% probability mass are considered.

    You should either alter `temperature` or `top_p`, but not both.

    Supported by:

    * Gemini
    * Anthropic
    * OpenAI
    * Groq
    * Cohere
    * Mistral
    * Bedrock
    * Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
    """

    timeout: float | Timeout
    """Override the client-level default timeout for a request, in seconds.

    Supported by:

    * Gemini
    * Anthropic
    * OpenAI
    * Groq
    * Mistral
    """

    parallel_tool_calls: bool
    """Whether to allow parallel tool calls.

    Supported by:

    * OpenAI (some models, not o1)
    * Groq
    * Anthropic
    """

    seed: int
    """The random seed to use for the model, theoretically allowing for deterministic results.

    Supported by:

    * OpenAI
    * Groq
    * Cohere
    * Mistral
    * Gemini
    * Outlines (LlamaCpp, VLLMOffline)
    """

    presence_penalty: float
    """Penalize new tokens based on whether they have appeared in the text so far.

    Supported by:

    * OpenAI
    * Groq
    * Cohere
    * Gemini
    * Mistral
    * Outlines (LlamaCpp, SgLang, VLLMOffline)
    """

    frequency_penalty: float
    """Penalize new tokens based on their existing frequency in the text so far.

    Supported by:

    * OpenAI
    * Groq
    * Cohere
    * Gemini
    * Mistral
    * Outlines (LlamaCpp, SgLang, VLLMOffline)
    """

    logit_bias: dict[str, int]
    """Modify the likelihood of specified tokens appearing in the completion.

    Supported by:

    * OpenAI
    * Groq
    * Outlines (Transformers, LlamaCpp, VLLMOffline)
    """

    stop_sequences: list[str]
    """Sequences that will cause the model to stop generating.

    Supported by:

    * OpenAI
    * Anthropic
    * Bedrock
    * Mistral
    * Groq
    * Cohere
    * Google
    """

    extra_headers: dict[str, str]
    """Extra headers to send to the model.

    Supported by:

    * OpenAI
    * Anthropic
    * Groq
    """

    extra_body: object
    """Extra body to send to the model.

    Supported by:

    * OpenAI
    * Anthropic
    * Groq
    * Outlines (all providers)
    """

max_tokens instance-attribute

max_tokens: int 

The maximum number of tokens to generate before stopping.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • MCP Sampling
  • Outlines (all providers)
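Since ModelSettings is a TypedDict, you can construct it directly or pass an equivalent plain dict. A minimal sketch of applying max_tokens to every run of an agent; the 'openai:gpt-4o' model name is a placeholder assumption:

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Cap every response from this agent at 1024 completion tokens.
agent = Agent('openai:gpt-4o', model_settings=ModelSettings(max_tokens=1024))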

temperature instance-attribute

temperature: float 

Amount of randomness injected into the response.

Use temperature closer to 0.0 for analytical / multiple choice, and closer to a model's maximum temperature for creative and generative tasks.

Note that even with temperature of 0.0, the results will not be fully deterministic.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
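Settings can also be supplied for a single run, taking precedence over any agent-level settings. A sketch, again with a placeholder model name:

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

agent = Agent('openai:gpt-4o')

# Near-zero temperature for an analytical, mostly-repeatable classification.
result = agent.run_sync(
    'Classify the sentiment of: "I love it!"',
    model_settings=ModelSettings(temperature=0.0),
)
print(result.output)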

top_p instance-attribute

top_p: float 

An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top_p probability mass.

So 0.1 means only the tokens comprising the top 10% probability mass are considered.

You should alter either temperature or top_p, but not both.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
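A sketch of opting for nucleus sampling instead of temperature:

from pydantic_ai.settings import ModelSettings

# Only tokens in the top 10% probability mass are considered.
# Set top_p *or* temperature, not both.
settings = ModelSettings(top_p=0.1)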

timeout instance-attribute

timeout: float | Timeout 

Override the client-level default timeout for a request, in seconds.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Mistral
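The Timeout form is httpx.Timeout, which allows a separate budget for each phase of the request. A sketch:

from httpx import Timeout

from pydantic_ai.settings import ModelSettings

# Simple: one overall budget of 30 seconds.
settings = ModelSettings(timeout=30.0)

# Fine-grained: 30s overall, but fail fast if the connection can't be opened.
settings = ModelSettings(timeout=Timeout(30.0, connect=5.0))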

parallel_tool_calls instance-attribute

parallel_tool_calls: bool 

Whether to allow parallel tool calls.

Supported by:

  • OpenAI (some models, not o1)
  • Groq
  • Anthropic
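A sketch of disabling parallel calls, so the model issues at most one tool call per response and tools run sequentially; the model name is a placeholder:

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Tools registered on this agent will be invoked one at a time
# rather than in a single parallel batch.
agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(parallel_tool_calls=False),
)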

seed instance-attribute

seed: int 

The random seed to use for the model, theoretically allowing for deterministic results.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Gemini
  • Outlines (LlamaCpp, VLLMOffline)
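Pairing a fixed seed with zero temperature gives the best chance of repeatable output, though full determinism is still not guaranteed. A sketch:

from pydantic_ai.settings import ModelSettings

# Best-effort reproducibility: fixed seed plus zero temperature.
settings = ModelSettings(seed=42, temperature=0.0)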

presence_penalty instance-attribute

presence_penalty: float 

Penalize new tokens based on whether they have appeared in the text so far.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Gemini
  • Mistral
  • Outlines (LlamaCpp, SgLang, VLLMOffline)
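A sketch; OpenAI-style APIs typically accept values between -2.0 and 2.0, but check your provider's documented range:

from pydantic_ai.settings import ModelSettings

# A positive penalty discourages tokens that have appeared at all,
# nudging the model toward new topics.
settings = ModelSettings(presence_penalty=0.6)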

frequency_penalty instance-attribute

frequency_penalty: float 

Penalize new tokens based on their existing frequency in the text so far.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Gemini
  • Mistral
  • Outlines (LlamaCpp, SgLang, VLLMOffline)
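Unlike presence_penalty, which applies once a token has appeared at all, this penalty grows with how often the token has already been used. A sketch, with the same typical -2.0 to 2.0 range caveat:

from pydantic_ai.settings import ModelSettings

# A positive penalty scales with each repetition, reducing verbatim loops.
settings = ModelSettings(frequency_penalty=0.5)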

logit_bias instance-attribute

logit_bias: dict[str, int] 

Modify the likelihood of specified tokens appearing in the completion.

Supported by:

  • OpenAI
  • Groq
  • Outlines (Transformers, LlamaCpp, VLLMOffline)
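Keys are token IDs from the model's own tokenizer, expressed as strings; OpenAI-style APIs accept bias values roughly in the -100 to 100 range, where -100 effectively bans a token. A sketch with made-up placeholder IDs:

from pydantic_ai.settings import ModelSettings

# '1234' and '5678' are hypothetical token IDs; look up real ones
# with the tokenizer for your specific model.
settings = ModelSettings(logit_bias={'1234': -100, '5678': 25})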

stop_sequences instance-attribute

stop_sequences: list[str] 

Sequences that will cause the model to stop generating.

Supported by:

  • OpenAI
  • Anthropic
  • Bedrock
  • Mistral
  • Groq
  • Cohere
  • Google
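A sketch of cutting generation off at a sentinel marker; providers typically exclude the matched stop sequence from the returned text:

from pydantic_ai.settings import ModelSettings

# Generation halts as soon as the model emits either sequence.
settings = ModelSettings(stop_sequences=['\n\n', 'END'])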

extra_headers instance-attribute

extra_headers: dict[str, str] 

Extra headers to send to the model.

Supported by:

  • OpenAI
  • Anthropic
  • Groq
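A sketch; the header name and value are hypothetical stand-ins for whatever provider-specific header you need:

from pydantic_ai.settings import ModelSettings

# Hypothetical header, e.g. for request tracing on your side.
settings = ModelSettings(extra_headers={'X-Request-Source': 'batch-job-42'})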

extra_body instance-attribute

extra_body: object 

Extra body to send to the model.

Supported by:

  • OpenAI
  • Anthropic
  • Groq
  • Outlines (all providers)
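A sketch; the field name is hypothetical, standing in for whatever extra JSON field your provider accepts in the request body:

from pydantic_ai.settings import ModelSettings

# Hypothetical provider-specific field merged into the request body.
settings = ModelSettings(extra_body={'custom_routing_tier': 'priority'})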