Viewing Evaluation Results
When the evaluation run completes, the session grid is updated with evaluator-specific columns that display the average score for each session or trace. Click a score to drill down into the detailed results. For example, selecting a Trajectory Evaluation score shows which paths the agent followed and which it missed.
These results show how your agents behave on real usage, which makes it easier to debug issues, check output quality, and improve performance.
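The evaluation tool's own scoring logic is not described in this section, so the following is only an illustrative sketch of the two ideas above, not its implementation. It assumes each evaluator produces one numeric score per trace, that a session's grid value is the plain mean of its trace scores, and that a trajectory evaluation compares an expected list of steps against the steps the agent actually took, ignoring order. The names `trace_scores`, `session_averages`, and `trajectory_breakdown` are hypothetical.

```python
from statistics import mean

# Hypothetical per-trace scores from a single evaluator, grouped by session ID.
trace_scores = {
    "session-a": [0.8, 1.0, 0.6],
    "session-b": [0.5, 0.7],
}


def session_averages(scores_by_session: dict[str, list[float]]) -> dict[str, float]:
    """Average each session's trace-level scores into one session-level score.

    Assumption: the grid shows a plain arithmetic mean; sessions with no
    scored traces are skipped.
    """
    return {sid: mean(scores) for sid, scores in scores_by_session.items() if scores}


def trajectory_breakdown(expected: list[str], actual: list[str]) -> dict[str, list[str]]:
    """Split a trajectory into followed, missed, and unexpected steps.

    Assumption: a simple membership comparison that ignores step order;
    a real trajectory evaluator may also score ordering.
    """
    taken = set(actual)
    expected_set = set(expected)
    return {
        "followed": [step for step in expected if step in taken],
        "missed": [step for step in expected if step not in taken],
        "unexpected": [step for step in actual if step not in expected_set],
    }


if __name__ == "__main__":
    print(session_averages(trace_scores))
    print(trajectory_breakdown(
        ["search", "fetch", "summarize"],
        ["search", "summarize", "translate"],
    ))
```

In this sketch, the second call reports `fetch` as missed and `translate` as an unexpected extra step, which mirrors the "followed versus missed" view that the drill-down provides.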