Overview

Tool call testing lets you verify that your voice AI agent is invoking the correct tools, with the correct parameters, at the right moments during a conversation. Roark provides built-in metrics for this through the Tool Invocation Analysis package. You can run tool call testing in two contexts:
  • Simulations — Proactively test tool calling behavior across scenarios before deploying changes
  • Production — Continuously monitor tool calling quality on live customer calls
Both approaches use the same metrics — you configure the metric once, then apply it wherever you need it.

Prerequisites

Before testing tool calls in either context, you need to set up the tool invocation metrics you want to evaluate.

1. Find the Tool Invocation Analysis Package

Navigate to Metrics > Library and look for the Tool Invocation Analysis package. This package contains five built-in metrics:
  • Tool Invocation Correct (Yes/No): Whether the agent invoked the correct tools at the appropriate times
  • Tool Invocation Count (Count): Total number of tool calls made during the conversation
  • Tool Invocation Order Correct (Yes/No): Whether tools were called in the correct logical sequence
  • Tool Invocation Parameters Correct (Yes/No): Whether the correct parameters were passed to each tool call
  • Tool Invocation Result Correct (Yes/No): Whether the agent correctly interpreted and used the results returned by each tool
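These metrics operate on the sequence of tool calls captured from a conversation. As a rough illustration of what the Count and Order Correct checks evaluate (a simplified sketch, not Roark's internal implementation, and the field names are hypothetical):

```python
# Illustrative only: a simplified model of the Tool Invocation Count
# and Tool Invocation Order Correct checks. The dict schema here is
# hypothetical, not Roark's actual call data format.

captured_calls = [
    {"tool": "lookupCustomer", "parameters": {"phone": "+15551234567"}},
    {"tool": "fetchWeather", "parameters": {"city": "Austin"}},
]

# The sequence of tools the scenario is expected to trigger
expected_order = ["lookupCustomer", "fetchWeather"]

tool_invocation_count = len(captured_calls)  # Count metric
order_correct = [c["tool"] for c in captured_calls] == expected_order  # Yes/No

print(tool_invocation_count)  # 2
print(order_correct)          # True
```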

2. Choose and Configure Your Metrics

Select the metrics relevant to your testing goals. For example, if you want to verify that your agent calls the right tool when a customer asks about the weather:
  1. Click on Tool Invocation Correct in the library
  2. Under Tool Scoping, select the specific tool you want to evaluate (e.g., fetchWeather)
  3. Click Edit on the scoped tool to set the evaluation criteria — define when the tool should be called and what result is expected
Tool Scoping Configuration
You can leave tool scoping empty to evaluate all tools, or scope to specific tools for targeted testing. Use the Additional Instructions field to add cross-tool rules like “Never call any tool before greeting the customer.”
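To make the scoped criteria concrete, here is a minimal sketch of the kind of rule a Tool Invocation Parameters Correct evaluation expresses for a single tool. The criteria format, function, and field names below are hypothetical illustrations, not Roark's configuration schema:

```python
# Hypothetical sketch of a parameters-correct rule for one scoped tool.
# Roark evaluates this from your configured criteria; this code only
# illustrates the idea.

def parameters_correct(call, required_params):
    """Yes/No: every required parameter is present and non-empty."""
    return all(call["parameters"].get(p) for p in required_params)

call = {"tool": "fetchWeather", "parameters": {"city": "Austin", "unit": "F"}}

print(parameters_correct(call, ["city"]))         # True
print(parameters_correct(call, ["city", "zip"]))  # False: "zip" missing
```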

Testing in Simulations

Simulation testing lets you proactively validate tool calling behavior across different scenarios and personas before changes reach production.
Tool call testing in simulations requires enriched simulations — your agent must send its call data to Roark so that tool invocation data is available for analysis. See Enriched Simulations to set this up.

Setup

1. Create a Run Plan

Follow the run plan guide to set up your simulation. Choose the scenarios that should trigger the tool calls you want to test.
2. Add Tool Invocation Metrics

In the Metrics section of the run plan, select the tool invocation metrics you configured in the prerequisites.
3. Enable Enriched Simulations

Make sure your agent sends its call data to Roark via an integration or the API/SDK. This is required for tool call data to be available — without it, Roark only has its simulation agent’s recording and cannot see your agent’s tool calls.
4. Run the Simulation

Execute the run plan. Once your agent’s call is matched and merged with the simulation call, the tool invocation metrics will be evaluated automatically.

Testing in Production

Production testing lets you continuously monitor tool calling quality on live customer calls using metric policies.

Setup

1. Create a Metric Policy

Navigate to Metrics > Policies and create a new policy. See the policies guide for details.
2. Add Tool Invocation Metrics

In the policy, select the tool invocation metrics you configured in the prerequisites.
3. Set Conditions (Optional)

Use policy conditions to filter which calls should be evaluated. For example, you might only want to evaluate tool calls for a specific agent or call direction.
4. Activate the Policy

Once active, the policy will automatically evaluate tool calling on every matching call that comes in.
Your calls must include tool invocation data for these metrics to work. Tool calls are automatically captured for Vapi and Retell integrations. If you use the API directly or LiveKit, you must include tool invocations when creating calls.
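For direct API usage, that means attaching the tool-call history to the payload you send when creating the call. The shape below is a hypothetical sketch of the idea only; the endpoint and field names are placeholders, so consult Roark's API reference for the real schema:

```python
# Hypothetical payload sketch: the field names below are placeholders,
# not Roark's actual API schema. Check the API reference for the real
# create-call format.
import json

call_payload = {
    "recordingUrl": "https://example.com/call.wav",
    "toolInvocations": [
        {
            "name": "fetchWeather",
            "arguments": {"city": "Austin"},
            "result": {"tempF": 74},
            "startedAt": "2024-01-01T12:00:05Z",
        }
    ],
}

# You would POST this JSON to the create-call endpoint with your HTTP client
body = json.dumps(call_payload)
print("toolInvocations" in body)  # True
```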

Next Steps