Overview
Tool call testing lets you verify that your voice AI agent is invoking the correct tools, with the correct parameters, at the right moments during a conversation. Roark provides built-in metrics for this through the Tool Invocation Analysis package. You can run tool call testing in two contexts:
- Simulations: Proactively test tool calling behavior across scenarios before deploying changes
- Production: Continuously monitor tool calling quality on live customer calls
Prerequisites
Before testing tool calls in either context, you need to set up the tool invocation metrics you want to evaluate.
1. Find the Tool Invocation Analysis Package
Navigate to Metrics > Library and look for the Tool Invocation Analysis package. This package contains five built-in metrics:

| Metric | Type | What It Measures |
|---|---|---|
| Tool Invocation Correct | Yes/No | Whether the agent invoked the correct tools at the appropriate times |
| Tool Invocation Count | Count | Total number of tool calls made during the conversation |
| Tool Invocation Order Correct | Yes/No | Whether tools were called in the correct logical sequence |
| Tool Invocation Parameters Correct | Yes/No | Whether the correct parameters were passed to each tool call |
| Tool Invocation Result Correct | Yes/No | Whether the agent correctly interpreted and used the results returned by each tool |
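
To make the distinctions between these metrics concrete, here is a minimal Python sketch that computes rough analogues of three of them (count, order, and parameters) over a list of recorded tool calls. It is not Roark's implementation: the `ToolCall` record, the function names, the `lookupCustomer` tool, and the subsequence rule used for ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation (not a Roark type)."""
    name: str
    parameters: dict[str, Any] = field(default_factory=dict)


def invocation_count(calls: list[ToolCall]) -> int:
    """Total number of tool calls made during the conversation."""
    return len(calls)


def order_correct(calls: list[ToolCall], expected_order: list[str]) -> bool:
    """True if expected_order appears as a subsequence of the called tool names."""
    names = iter(call.name for call in calls)
    # `expected in names` advances the iterator, so matches must occur in order.
    return all(expected in names for expected in expected_order)


def parameters_correct(calls: list[ToolCall], tool: str, expected: dict[str, Any]) -> bool:
    """True if the tool was called and every call to it carries the expected values."""
    scoped = [c for c in calls if c.name == tool]
    return bool(scoped) and all(
        all(c.parameters.get(key) == value for key, value in expected.items())
        for c in scoped
    )


# Illustrative data: tool names and parameters are made up for this example.
example_calls = [
    ToolCall("lookupCustomer", {"phone": "+15550100"}),
    ToolCall("fetchWeather", {"city": "Berlin"}),
]
```

Treating order as a subsequence check is one possible design choice: it tolerates extra calls between the expected ones. A stricter rule would compare the full list of names for equality.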
2. Choose and Configure Your Metrics
Select the metrics relevant to your testing goals. For example, if you want to verify that your agent calls the right tool when a customer asks about the weather:
- Click on Tool Invocation Correct in the library
- Under Tool Scoping, select the specific tool you want to evaluate (e.g., fetchWeather)
- Click Edit on the scoped tool to set the evaluation criteria: define when the tool should be called and what result is expected

Testing in Simulations
Simulation testing lets you proactively validate tool calling behavior across different scenarios and personas before changes reach production.

Tool call testing in simulations requires enriched simulations: your agent must send its call data to Roark so that tool invocation data is available for analysis. See Enriched Simulations to set this up.
Setup
Create a Run Plan
Follow the run plan guide to set up your simulation. Choose the scenarios that should trigger the tool calls you want to test.
Add Tool Invocation Metrics
In the Metrics section of the run plan, select the tool invocation metrics you configured in the prerequisites.
Enable Enriched Simulations
Make sure your agent sends its call data to Roark via an integration or the API/SDK. This is required for tool call data to be available — without it, Roark only has its simulation agent’s recording and cannot see your agent’s tool calls.
Testing in Production
Production testing lets you continuously monitor tool calling quality on live customer calls using metric policies.
Setup
Create a Metric Policy
Navigate to Metrics > Policies and create a new policy. See the policies guide for details.
Add Tool Invocation Metrics
In the policy, select the tool invocation metrics you configured in the prerequisites.
Set Conditions (Optional)
Use policy conditions to filter which calls should be evaluated. For example, you might only want to evaluate tool calls for a specific agent or call direction.
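
Conceptually, a policy condition acts as a predicate over call metadata. The sketch below expresses the example above (a specific agent and call direction) in Python. The `CallMeta` fields, the `matches_policy` helper, and the agent IDs are hypothetical and do not reflect how Roark stores or evaluates conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallMeta:
    """Hypothetical call metadata; field names are illustrative only."""
    agent_id: str
    direction: str  # assumed values: "inbound" or "outbound"


def matches_policy(call: CallMeta, agent_id: Optional[str] = None,
                   direction: Optional[str] = None) -> bool:
    """Return True if the call satisfies every condition that is set.

    A condition left as None is treated as "no filter".
    """
    if agent_id is not None and call.agent_id != agent_id:
        return False
    if direction is not None and call.direction != direction:
        return False
    return True


# Illustrative calls; only the first satisfies both conditions below.
sample_calls = [
    CallMeta("agent-support", "inbound"),
    CallMeta("agent-support", "outbound"),
    CallMeta("agent-sales", "inbound"),
]
selected = [
    c for c in sample_calls
    if matches_policy(c, agent_id="agent-support", direction="inbound")
]
```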
Your calls must include tool invocation data for these metrics to work. Tool calls are automatically captured for Vapi and Retell integrations. If you use the API directly or LiveKit, you must include tool invocations when creating calls.
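
For API or LiveKit setups, the general idea is to attach a list of tool invocations to the call record you send. The sketch below only assembles such a payload as a plain dictionary and serializes it; it makes no network request. Every field name in it (`recordingUrl`, `toolInvocations`, `name`, `parameters`, `result`) and both helper functions are assumptions for illustration, so check the Roark API reference for the actual schema.

```python
import json
from typing import Any


def build_tool_invocation(name: str, parameters: dict[str, Any], result: Any) -> dict[str, Any]:
    """Build one tool invocation entry (hypothetical shape)."""
    if not name:
        raise ValueError("tool name must be non-empty")
    return {"name": name, "parameters": parameters, "result": result}


def build_call_payload(recording_url: str, invocations: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble a call payload that carries its tool invocations (hypothetical shape)."""
    return {"recordingUrl": recording_url, "toolInvocations": list(invocations)}


payload = build_call_payload(
    "https://example.com/recordings/call-123.wav",  # placeholder URL
    [build_tool_invocation("fetchWeather", {"city": "Berlin"}, {"tempC": 18})],
)
body = json.dumps(payload)  # serialized request body, ready for an HTTP client
```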

