Testing the Unpredictable: Agent Trajectories
Agents do not follow a fixed execution path: given the same prompt twice, an agent may call different tools, or the same tools in a different order. Standard unit tests that assert an exact match against one expected sequence or output therefore fail, even when both runs are valid. Instead, we perform Trajectory Evaluation, analyzing the sequence of steps, the tools selected, and the final output.
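The sketch below contrasts the two kinds of check. It assumes a trajectory is recorded as a list of (tool, arguments) steps. The `Step` type, the tool names `search_docs` and `lookup_order`, and both checker functions are hypothetical illustrations, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step in an agent trajectory (hypothetical record format)."""
    tool: str
    args: tuple  # a tuple keeps the step hashable and comparable


def exact_match(expected: list[Step], actual: list[Step]) -> bool:
    # Unit-test style check: the whole sequence must be identical, order included.
    return expected == actual


def covers_required_tools(required: set[str], actual: list[Step]) -> bool:
    # Trajectory-style check: every required tool was called at least once,
    # in any order.
    called = {step.tool for step in actual}
    return required <= called


# Two runs of the same prompt that use the same tools in a different order.
run_a = [Step("search_docs", ("refund policy",)), Step("lookup_order", ("A-17",))]
run_b = [Step("lookup_order", ("A-17",)), Step("search_docs", ("refund policy",))]

print(exact_match(run_a, run_b))                                      # False
print(covers_required_tools({"search_docs", "lookup_order"}, run_b))  # True
```

The order-insensitive check is deliberately loose; a fuller evaluator would also inspect the final output and penalize redundant steps.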
Trajectory Score Metric
Evaluators check three key criteria: the correctness of the final output, efficiency (the number of loops the agent runs), and the accuracy of its tool calls (avoiding redundant tool activations).
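One possible way to fold the three criteria into a single trajectory score is a weighted sum, sketched below. The `Trajectory` fields, the `max_loops` budget, the exact-string correctness check, and the default weights are all assumptions chosen for illustration; the criteria themselves do not prescribe how they should be combined.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    final_output: str
    tool_calls: list[tuple[str, str]]  # (tool name, serialized arguments)
    loops: int  # number of agent loop iterations run


def score_trajectory(
    traj: Trajectory,
    expected_output: str,
    max_loops: int = 10,  # hypothetical loop budget
    weights: tuple[float, float, float] = (0.5, 0.25, 0.25),  # hypothetical weights
) -> float:
    # 1. Correctness: exact string comparison here; real evaluators often use
    #    a rubric or a model-based judge instead.
    correctness = 1.0 if traj.final_output.strip() == expected_output.strip() else 0.0

    # 2. Efficiency: fewer loops score higher; reaching max_loops or more scores 0.
    efficiency = max(0.0, 1.0 - traj.loops / max_loops)

    # 3. Tool-call accuracy: fraction of calls that are not exact repeats of an
    #    earlier (tool, args) pair, i.e. non-redundant activations.
    if traj.tool_calls:
        tool_accuracy = len(set(traj.tool_calls)) / len(traj.tool_calls)
    else:
        tool_accuracy = 1.0

    w_c, w_e, w_t = weights
    return w_c * correctness + w_e * efficiency + w_t * tool_accuracy


# Correct answer, 4 loops, and one redundant lookup out of three calls.
example = Trajectory(
    final_output="Refund approved",
    tool_calls=[("lookup_order", "A-17"), ("lookup_order", "A-17"), ("issue_refund", "A-17")],
    loops=4,
)
print(round(score_trajectory(example, "Refund approved"), 3))  # 0.817
```

Weighting correctness most heavily reflects the view that an efficient but wrong trajectory should never outscore a correct one; other weightings are equally defensible.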