Run the Evaluation

Once sessions are imported (either simulated or production) and you have added evaluators, click Run Evaluation to apply the selected evaluators to the selected sessions or traces.

The system runs all applicable evaluators, processes the data, and computes scores for each relevant session or trace segment. Evaluation results are displayed in the session grid, with each evaluator contributing one or more dedicated columns.
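To make the shape of the results concrete, the following is a minimal conceptual sketch of how evaluators could map onto grid columns. It is not the product's API: `run_evaluation`, `task_completed`, `turn_stats`, and the session fields (`id`, `resolved`, `turns`) are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical types: a session is a plain dict, and an evaluator maps
# a session to one or more named scores (one grid column per name).
Session = Dict[str, Any]
Evaluator = Callable[[Session], Dict[str, float]]


def task_completed(session: Session) -> Dict[str, float]:
    """Single-column evaluator: 1.0 if the session was resolved."""
    return {"task_completed": 1.0 if session.get("resolved") else 0.0}


def turn_stats(session: Session) -> Dict[str, float]:
    """Multi-column evaluator: turn count and average turn length."""
    turns = session.get("turns", [])
    count = len(turns)
    avg_chars = sum(len(t) for t in turns) / count if count else 0.0
    return {"turn_count": float(count), "avg_turn_chars": avg_chars}


def run_evaluation(
    sessions: List[Session], evaluators: List[Evaluator]
) -> List[Dict[str, Any]]:
    """Apply every evaluator to every session and build one row per session."""
    grid = []
    for session in sessions:
        row: Dict[str, Any] = {"session_id": session["id"]}
        for evaluator in evaluators:
            row.update(evaluator(session))  # each evaluator adds its columns
        grid.append(row)
    return grid


sessions = [
    {"id": "s1", "resolved": True, "turns": ["hi", "hello"]},
    {"id": "s2", "resolved": False, "turns": []},
]
grid = run_evaluation(sessions, [task_completed, turn_stats])
for row in grid:
    print(row)
```

In this sketch, `task_completed` contributes one column and `turn_stats` contributes two, which mirrors the "one or more dedicated columns" behavior described above.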

Note

Evaluations can be run multiple times, for example, after updating agent logic, adding new simulated sessions, or importing additional production data. This allows developers to benchmark performance across different agent versions and testing conditions.
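
As a rough illustration of benchmarking across runs, the sketch below averages each score column for two evaluation runs and reports the difference. The row layout (a `session_id` key plus numeric score columns) and the helper names `summarize` and `compare` are assumptions made for this example, not an export format or API documented here.

```python
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


def summarize(grid: List[Row]) -> Dict[str, float]:
    """Average every score column across the rows of one evaluation run."""
    if not grid:
        return {}
    columns = [name for name in grid[0] if name != "session_id"]
    return {name: mean(row[name] for row in grid) for name in columns}


def compare(baseline: List[Row], candidate: List[Row]) -> Dict[str, float]:
    """Per-column change in mean score from the baseline run to the candidate run."""
    base, cand = summarize(baseline), summarize(candidate)
    return {name: cand[name] - base[name] for name in base if name in cand}


# Hypothetical results from two runs against the same sessions,
# before and after an agent logic update.
run_v1 = [
    {"session_id": "s1", "task_completed": 1.0},
    {"session_id": "s2", "task_completed": 0.0},
]
run_v2 = [
    {"session_id": "s1", "task_completed": 1.0},
    {"session_id": "s2", "task_completed": 1.0},
]

delta = compare(run_v1, run_v2)
print(delta)
```

A positive value means the candidate run scored higher on that evaluator on average. Comparing runs this way is only meaningful when both runs cover the same sessions and evaluators.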