
Import Data for Evaluation

To evaluate how your AI agents perform in real-world conditions, start by importing data into Evaluation Studio. You can choose between:

  • Production data – Sessions and traces generated by real users of your deployed Agentic apps, so the evaluation reflects actual user interactions.
  • Simulated data – Sessions generated from your simulation runs, allowing you to test agent performance in controlled, repeatable scenarios.

Import Production Data

Prerequisites

Make sure you have already interacted with the app and generated conversation sessions in your production environment.

Steps to import production data:

  1. Go to Evaluation Studio → Agentic Evaluation → Evaluations.

  2. Select the evaluation you want to import data into, and then select Import sessions.

  3. In the Import Session dialog, specify the following, and then click Import to start the import:

    1. Version: Select the version of the Agentic app to evaluate.
    2. Environment: Choose the environment that contains the session data you want to import.
    3. Date: Select the time range to filter sessions by a specific release or timeframe.


After the import is complete, the session data appears in the Sessions and Traces tabs on the Evaluation page.
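The three fields in the Import Session dialog narrow down which sessions are imported. The following Python sketch models that selection conceptually. The `Session` class, its fields, and the `select_sessions` function are hypothetical illustrations, not part of the Evaluation Studio API, and treating the date range as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    # Hypothetical fields standing in for the metadata the dialog filters on.
    session_id: str
    app_version: str
    environment: str
    created_at: datetime


def select_sessions(sessions, version, environment, start, end):
    """Return sessions that match the chosen version and environment and
    whose creation time falls within [start, end] (inclusive, assumed)."""
    return [
        s
        for s in sessions
        if s.app_version == version
        and s.environment == environment
        and start <= s.created_at <= end
    ]


sessions = [
    Session("s1", "v2", "production", datetime(2024, 5, 3)),
    Session("s2", "v1", "production", datetime(2024, 5, 4)),
    Session("s3", "v2", "staging", datetime(2024, 5, 5)),
    Session("s4", "v2", "production", datetime(2024, 6, 1)),
]

picked = select_sessions(
    sessions, "v2", "production", datetime(2024, 5, 1), datetime(2024, 5, 31)
)
print([s.session_id for s in picked])  # ['s1']
```

Only `s1` matches all three filters: `s2` has a different version, `s3` a different environment, and `s4` falls outside the date range.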

Import Simulated Data

Prerequisites

Before you can import simulated data, at least one simulation run with the relevant personas and test scenarios must exist.

Steps to import simulated data:

  1. Go to Evaluation Studio → Agentic Evaluation → Evaluations.

  2. Select the evaluation you want to import data into, and then select Import simulated data.

  3. Choose the simulation to import sessions from, and then click Import.

After the import is complete, the simulated sessions appear in the Sessions and Traces tabs, just like production data.

Understanding the Imported Data

Imported data is organized into two tabs, Sessions and Traces, which correspond to the two levels at which you can apply evaluators: session and trace. This makes it easy to add, run, and review evaluations at each level.

  • Sessions: Displays the list of sessions with details like session ID, number of traces, creation time, and duration. Users can add session-level evaluators here to measure overall outcomes. Learn more about Session evaluators.
  • Traces: Breaks each session into individual traces, each representing a single user input and the Agentic app's response to it. Users can add trace-level evaluators to check specific actions, like whether the correct agent or tool was used. Learn more about Trace evaluators.
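To make the difference between the two levels concrete, the following Python sketch models a session as an ordered list of traces and runs one evaluator per session and one per trace. All class and function names, and both example evaluators, are hypothetical; they illustrate the structure described above, not how Evaluation Studio implements evaluators.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    # Hypothetical model: one user input and the Agentic app's response to it.
    user_input: str
    response: str
    tools_called: list[str] = field(default_factory=list)


@dataclass
class Session:
    # Hypothetical model: a session is an ordered list of traces.
    session_id: str
    traces: list[Trace]


def run_session_level(sessions, evaluator):
    # One result per session: the evaluator sees the whole conversation.
    return {s.session_id: evaluator(s) for s in sessions}


def run_trace_level(sessions, evaluator):
    # One result per trace, keyed by (session ID, trace index).
    return {
        (s.session_id, i): evaluator(t)
        for s in sessions
        for i, t in enumerate(s.traces)
    }


sessions = [
    Session("s1", [
        Trace("Find flights to Paris", "Here are three options.", ["search_flights"]),
        Trace("Book the first one", "Your booking is confirmed.", ["book_flight"]),
    ]),
    Session("s2", [
        Trace("Find flights to Rome", "Sorry, I can't help with that."),
    ]),
]

# Hypothetical session-level check: did the conversation end in a confirmation?
ended_confirmed = run_session_level(
    sessions, lambda s: "confirmed" in s.traces[-1].response.lower()
)
# Hypothetical trace-level check: did this turn call any tool?
called_tool = run_trace_level(sessions, lambda t: len(t.tools_called) > 0)

print(ended_confirmed)  # {'s1': True, 's2': False}
print(called_tool)      # {('s1', 0): True, ('s1', 1): True, ('s2', 0): False}
```

With two sessions containing three traces in total, the session-level evaluator produces two results, while the trace-level evaluator produces three, one for each user input and response pair.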
