Overview

The Playground is an interactive environment in the Roark dashboard where you can test custom metrics against real calls before adding them to policies or running them at scale. It lets you iterate on metric prompts quickly without affecting production data.

What You Can Do

  • Test custom metrics — Write an LLM prompt and see how it evaluates against a real call
  • Preview metric output — See the exact value, confidence score, and reasoning before deploying
  • Iterate quickly — Adjust prompts and re-run instantly without creating metric definitions
  • Validate before deploying — Ensure your metrics produce the expected results on representative calls

Getting Started

1. Open the Playground

Navigate to Playground in your Roark dashboard sidebar.
2. Select a Call

Choose an existing call from your project to test against. Pick a call that represents the type of conversation your metric will evaluate.
3. Configure Your Metric

Set the output type (boolean, scale, classification, etc.) and write your LLM evaluation prompt. The prompt should clearly describe what the metric should measure.
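As a rough illustration of what this step captures, the sketch below pairs an output type with an evaluation prompt. The field names and structure are assumptions for this example, not Roark's actual schema:

```python
# Hypothetical metric configuration; field names are illustrative only,
# not Roark's actual schema.
metric = {
    "name": "caller_sentiment",
    "output_type": "scale",  # other supported types: boolean, classification, etc.
    "scale": {"min": 1, "max": 10},
    "prompt": (
        "Rate the caller's overall sentiment from 1 (very negative) "
        "to 10 (very positive), based on the full call transcript."
    ),
}

# The prompt should clearly describe what the metric measures and,
# ideally, what counts as a high or low score.
assert metric["output_type"] in {"boolean", "scale", "classification"}
```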
4. Run the Test

Click Run to evaluate the metric against the selected call. Review the output value, confidence score, and reasoning.
5. Iterate and Deploy

Adjust your prompt or configuration and re-run until the metric produces the results you expect. Once satisfied, you can save it as a metric definition and add it to a policy or run plan.

Testing Thresholds

The Playground also lets you test thresholds on your metrics. After running a metric, you can configure a pass/fail condition (e.g., >= 7) and instantly see whether the call would pass or fail — without creating a metric definition first. This is a quick way to validate that your threshold logic produces the expected results before adding it to a policy or run plan.
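The pass/fail logic described above can be sketched in a few lines. This is a minimal illustration of how a threshold condition like `>= 7` resolves against a metric value; the operator set and result shape are assumptions for this sketch, not Roark's API:

```python
import operator

# Map of comparison operator strings to their functions. The set of
# supported operators here is illustrative, not Roark's actual list.
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}

def passes(value: float, op: str, threshold: float) -> bool:
    """Return True if the metric value satisfies the threshold condition."""
    return OPS[op](value, threshold)

# A call scoring 8 passes a ">= 7" threshold; a call scoring 6 fails it.
print(passes(8, ">=", 7))
print(passes(6, ">=", 7))
```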

Thresholds Guide

Learn about operators, aggregation modes, and participant role filtering

Tips

  • Test against calls that reflect the variety of conversations your metric will encounter in production.
  • Clear, specific prompts produce more consistent results. Include examples of what constitutes a positive or negative result.
  • Try your metric against calls where the answer is ambiguous to see how it handles uncertainty.

What’s Next