Overview

Tool call testing lets you verify that your voice AI agent is invoking the correct tools, with the correct parameters, at the right moments during a conversation. Roark provides built-in metrics for this through the Tool Invocation Analysis package. You can run tool call testing in two contexts:
  • Simulations — Proactively test tool calling behavior across scenarios before deploying changes
  • Production — Continuously monitor tool calling quality on live customer calls
Both approaches use the same metrics — you configure the metric once, then apply it wherever you need it.

Prerequisites

Before testing tool calls in either context, you need to set up the tool invocation metrics you want to evaluate.

1. Find the Tool Invocation Analysis Package

Navigate to Metrics > Library and look for the Tool Invocation Analysis package. This package contains five built-in metrics:
  • Tool Invocation Correct (Yes/No): Whether the agent invoked the correct tools at the appropriate times
  • Tool Invocation Count (Count): Total number of tool calls made during the conversation
  • Tool Invocation Order Correct (Yes/No): Whether tools were called in the correct logical sequence
  • Tool Invocation Parameters Correct (Yes/No): Whether the correct parameters were passed to each tool call
  • Tool Invocation Result Correct (Yes/No): Whether the agent correctly interpreted and used the results returned by each tool
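These metrics operate on the sequence of tool calls captured from a conversation. As a rough illustration of what the Count and Order Correct checks evaluate (a simplified sketch, not Roark's internal implementation, and the field names are hypothetical):

```python
# Illustrative only: a simplified model of the Tool Invocation Count
# and Tool Invocation Order Correct checks. The dict schema here is
# hypothetical, not Roark's actual call data format.

captured_calls = [
    {"tool": "lookupCustomer", "parameters": {"phone": "+15551234567"}},
    {"tool": "fetchWeather", "parameters": {"city": "Austin"}},
]

# The sequence of tools the scenario is expected to trigger
expected_order = ["lookupCustomer", "fetchWeather"]

tool_invocation_count = len(captured_calls)  # Count metric
order_correct = [c["tool"] for c in captured_calls] == expected_order  # Yes/No

print(tool_invocation_count)  # 2
print(order_correct)          # True
```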

2. Choose and Configure Your Metrics

Select the metrics relevant to your testing goals. For example, if you want to verify that your agent calls the right tool when a customer asks about the weather:
  1. Click on Tool Invocation Correct in the library
  2. Under Tool Scoping, select the specific tool you want to evaluate (e.g., fetchWeather)
  3. Click Edit on the scoped tool to set the evaluation criteria — define when the tool should be called and what result is expected
Tool Scoping Configuration
You can leave tool scoping empty to evaluate all tools, or scope to specific tools for targeted testing. Use the Additional Instructions field to add cross-tool rules like “Never call any tool before greeting the customer.”
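To make the scoped criteria concrete, here is a minimal sketch of the kind of rule a Tool Invocation Parameters Correct evaluation expresses for a single tool. The criteria format, function, and field names below are hypothetical illustrations, not Roark's configuration schema:

```python
# Hypothetical sketch of a parameters-correct rule for one scoped tool.
# Roark evaluates this from your configured criteria; this code only
# illustrates the idea.

def parameters_correct(call, required_params):
    """Yes/No: every required parameter is present and non-empty."""
    return all(call["parameters"].get(p) for p in required_params)

call = {"tool": "fetchWeather", "parameters": {"city": "Austin", "unit": "F"}}

print(parameters_correct(call, ["city"]))         # True
print(parameters_correct(call, ["city", "zip"]))  # False: "zip" missing
```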

Testing in Simulations

Simulation testing lets you proactively validate tool calling behavior across different scenarios and personas before changes reach production.
Tool call testing in simulations requires enriched simulations — your agent must send its call data to Roark so that tool invocation data is available for analysis. See Enriched Simulations to set this up.

Setup

1. Create a Run Plan

Follow the run plan guide to set up your simulation. Choose the scenarios that should trigger the tool calls you want to test.
2. Add Tool Invocation Metrics

In the Metrics section of the run plan, select the tool invocation metrics you configured in the prerequisites.
3. Enable Enriched Simulations

Make sure your agent sends its call data to Roark via an integration or the API/SDK. This is required for tool call data to be available — without it, Roark only has its simulation agent’s recording and cannot see your agent’s tool calls.
4. Run the Simulation

Execute the run plan. Once your agent’s call is matched and merged with the simulation call, the tool invocation metrics will be evaluated automatically.

Testing in Production

Production testing lets you continuously monitor tool calling quality on live customer calls using metric policies.

Setup

1. Create a Metric Policy

Navigate to Metrics > Policies and create a new policy. See the policies guide for details.
2. Add Tool Invocation Metrics

In the policy, select the tool invocation metrics you configured in the prerequisites.
3. Set Conditions (Optional)

Use policy conditions to filter which calls should be evaluated. For example, you might only want to evaluate tool calls for a specific agent or call direction.
4. Activate the Policy

Once active, the policy will automatically evaluate tool calling on every matching call that comes in.
Your calls must include tool invocation data for these metrics to work. Tool calls are automatically captured for Vapi and Retell integrations. If you use the API directly or LiveKit, you must include tool invocations when creating calls.
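For direct API usage, that means attaching the tool-call history to the payload you send when creating the call. The shape below is a hypothetical sketch of the idea only; the endpoint and field names are placeholders, so consult Roark's API reference for the real schema:

```python
# Hypothetical payload sketch: the field names below are placeholders,
# not Roark's actual API schema. Check the API reference for the real
# create-call format.
import json

call_payload = {
    "recordingUrl": "https://example.com/call.wav",
    "toolInvocations": [
        {
            "name": "fetchWeather",
            "arguments": {"city": "Austin"},
            "result": {"tempF": 74},
            "startedAt": "2024-01-01T12:00:05Z",
        }
    ],
}

# You would POST this JSON to the create-call endpoint with your HTTP client
body = json.dumps(call_payload)
print("toolInvocations" in body)  # True
```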

Next Steps