LangChain Reference
Python › langsmith › evaluation_runner › evaluate_comparative
Function · Since v0.1

evaluate_comparative

Evaluate existing experiment runs against each other.

This lets you use pairwise preference scoring to generate more reliable feedback in your experiments.

evaluate_comparative(
    experiments: tuple[EXPERIMENT_T, EXPERIMENT_T],
    /,
    evaluators: Sequence[COMPARATIVE_EVALUATOR_T],
    experiment_prefix: Optional[str] = None,
    description: Optional[str] = None,
    max_concurrency: int = 5,
    client: Optional[langsmith.Client] = None,
    metadata: Optional[dict] = None,
    load_nested: bool = False,
    randomize_order: bool = False,
) -> ComparativeExperimentResults

Parameters

experiments (required): Tuple[Union[str, uuid.UUID], Union[str, uuid.UUID]]
    The identifiers of the two experiments to compare.

evaluators (required): Sequence[COMPARATIVE_EVALUATOR_T]
    A list of evaluators to run on each example.

experiment_prefix: Optional[str], default None
    A prefix to provide for your experiment name.

description: Optional[str], default None
    A free-form text description for the experiment.

max_concurrency: int, default 5
    The maximum number of concurrent evaluations to run.

client: Optional[langsmith.Client], default None
    The LangSmith client to use.

metadata: Optional[dict], default None
    Metadata to attach to the experiment.

load_nested: bool, default False
    Whether to load all child runs for the experiment. The default is to
    load only the top-level root runs.

randomize_order: bool, default False
    Whether to randomize the order of the outputs for each evaluation.
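As an illustration, here is a minimal sketch of pairwise comparison with a custom comparative evaluator. The experiment names and the length-based scoring rule are hypothetical, and the evaluator contract shown (the paired runs plus the dataset example in, a dict with a feedback key and per-run scores out) reflects the common comparative-evaluator pattern; check the SDK reference for the exact shape expected by your version.

```python
def prefer_shorter_answer(runs, example):
    """Hypothetical comparative evaluator: score each paired run.

    Receives one run per experiment for a single dataset example and
    returns per-run scores keyed by run id (higher = preferred).
    """
    lengths = {run.id: len(str(run.outputs or "")) for run in runs}
    shortest = min(lengths, key=lengths.get)
    return {
        "key": "prefer_shorter",
        "scores": {run_id: 1 if run_id == shortest else 0 for run_id in lengths},
    }


def run_comparison():
    # Requires the langsmith package and existing experiments;
    # the experiment names below are placeholders.
    from langsmith.evaluation import evaluate_comparative

    return evaluate_comparative(
        ("my-experiment-v1", "my-experiment-v2"),  # hypothetical names
        evaluators=[prefer_shorter_answer],
        randomize_order=True,  # hide positional bias from the evaluator
        max_concurrency=5,
    )
```

Setting randomize_order=True is generally worthwhile for preference scoring, since it keeps an evaluator (especially an LLM judge) from systematically favoring whichever output appears first.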
