LangChain Reference
Python › langsmith › evaluation_runner › evaluate_comparative
Function · Since v0.1

evaluate_comparative

Evaluate existing experiment runs against each other.

This lets you use pairwise preference scoring to generate more reliable feedback in your experiments.

evaluate_comparative(
    experiments: tuple[EXPERIMENT_T, EXPERIMENT_T],
    /,
    evaluators: Sequence[COMPARATIVE_EVALUATOR_T],
    experiment_prefix: Optional[str] = None,
    description: Optional[str] = None,
    max_concurrency: int = 5,
    client: Optional[langsmith.Client] = None,
    metadata: Optional[dict] = None,
    load_nested: bool = False,
    randomize_order: bool = False,
) -> ComparativeExperimentResults

Parameters

experiments (required): Tuple[Union[str, uuid.UUID], Union[str, uuid.UUID]]
    The identifiers of the two experiments to compare.

evaluators (required): Sequence[COMPARATIVE_EVALUATOR_T]
    A list of evaluators to run on each example.

experiment_prefix: Optional[str], default None
    A prefix to provide for your experiment name.

description: Optional[str], default None
    A free-form text description for the experiment.

max_concurrency: int, default 5
    The maximum number of concurrent evaluations to run.

client: Optional[langsmith.Client], default None
    The LangSmith client to use.

metadata: Optional[dict], default None
    Metadata to attach to the experiment.

load_nested: bool, default False
    Whether to load all child runs for the experiment. The default is to
    load only the top-level root runs.

randomize_order: bool, default False
    Whether to randomize the order of the outputs for each evaluation.
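As an illustration, here is a minimal sketch of pairwise comparison with a custom comparative evaluator. The experiment names and the length-based scoring rule are hypothetical, and the evaluator contract shown (the paired runs plus the dataset example in, a dict with a feedback key and per-run scores out) reflects the common comparative-evaluator pattern; check the SDK reference for the exact shape expected by your version.

```python
def prefer_shorter_answer(runs, example):
    """Hypothetical comparative evaluator: score each paired run.

    Receives one run per experiment for a single dataset example and
    returns per-run scores keyed by run id (higher = preferred).
    """
    lengths = {run.id: len(str(run.outputs or "")) for run in runs}
    shortest = min(lengths, key=lengths.get)
    return {
        "key": "prefer_shorter",
        "scores": {run_id: 1 if run_id == shortest else 0 for run_id in lengths},
    }


def run_comparison():
    # Requires the langsmith package and existing experiments;
    # the experiment names below are placeholders.
    from langsmith.evaluation import evaluate_comparative

    return evaluate_comparative(
        ("my-experiment-v1", "my-experiment-v2"),  # hypothetical names
        evaluators=[prefer_shorter_answer],
        randomize_order=True,  # hide positional bias from the evaluator
        max_concurrency=5,
    )
```

Setting randomize_order=True is generally worthwhile for preference scoring, since it keeps an evaluator (especially an LLM judge) from systematically favoring whichever output appears first.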
