Overview
Thresholds turn raw metric values into clear pass/fail outcomes. Setting a threshold on a metric creates a derived boolean metric that automatically evaluates whether each call meets your criteria. For example, you can set a threshold on Response Time of < 1000ms using the P95 aggregation across agent turns. Any call where the 95th percentile agent response time is 1 second or more is flagged as a failure.

How Thresholds Work
When you configure a threshold, Roark creates a derived metric behind the scenes:
- The source metric is collected as usual (e.g., a satisfaction score of 8)
- The threshold compares the value against your condition (e.g., >= 7)
- A boolean Pass or Fail result is produced automatically
- The result is stored alongside the original metric for reporting
Thresholds are available on these metric output types: Scale, Numeric, Count, Boolean, and Classification.
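The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Roark's implementation: `MetricRecord` and `derive_threshold` are hypothetical names, and the condition is passed as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MetricRecord:
    """Hypothetical record: the source value plus the derived pass/fail flag."""
    name: str
    value: float
    passed: Optional[bool] = None  # derived boolean, stored alongside the source value


def derive_threshold(record: MetricRecord, condition: Callable[[float], bool]) -> MetricRecord:
    """Evaluate the condition against the collected value and attach the result."""
    record.passed = condition(record.value)
    return record


# Source metric collected as usual: a satisfaction score of 8, threshold >= 7.
satisfaction = derive_threshold(MetricRecord("Customer Satisfaction", 8), lambda v: v >= 7)
print(satisfaction.value, "Pass" if satisfaction.passed else "Fail")  # 8 Pass
```

Keeping the original value and the derived flag on the same record mirrors the last step above: the Pass/Fail result sits next to the source metric rather than replacing it.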
Configuring a Threshold
Thresholds can be added from several places in Roark:
- Policies: add thresholds when selecting metrics for a policy
- Playground: test a threshold against a real call before deploying
- Simulation Run Plans: set pass/fail criteria for simulation testing
Threshold Fields
| Field | Description |
|---|---|
| Operator | The comparison to apply (see operators below) |
| Value | The threshold value to compare against |
| Aggregation Mode | How to handle metrics with multiple values per call (optional) |
| Participant Role | Filter to a specific speaker — Agent, Customer, or All (optional) |
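One way to picture these four fields is as a small configuration object. The sketch below is hypothetical (the class name, field names, and validation are illustrative, not Roark's API); the allowed role and mode values are taken from the tables on this page.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Illustrative sets of allowed values, taken from the tables on this page.
ROLES = {"All", "Agent", "Customer"}
MODES = {"Each", "Average", "Min", "Max", "Median", "Sum", "P95", "P99", "Count"}


@dataclass(frozen=True)
class ThresholdConfig:
    operator: str                            # comparison to apply, e.g. ">=" or "!="
    value: Union[int, float, bool, str]      # value to compare against
    aggregation_mode: Optional[str] = None   # only needed when a metric yields several values per call
    participant_role: str = "All"            # optional speaker filter

    def __post_init__(self) -> None:
        if self.aggregation_mode is not None and self.aggregation_mode not in MODES:
            raise ValueError(f"unknown aggregation mode: {self.aggregation_mode}")
        if self.participant_role not in ROLES:
            raise ValueError(f"unknown participant role: {self.participant_role}")


# The "Agent Response Time Under 1 Second" threshold expressed with these fields.
response_time = ThresholdConfig("<", 1000, aggregation_mode="P95", participant_role="Agent")
```

The two optional fields default to "not set" and "All", so a simple single-value threshold such as `ThresholdConfig(">=", 7)` needs only an operator and a value.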
Operators
The available operators depend on the metric’s output type.
Numeric Types (Scale, Numeric, Count)
| Operator | Symbol | Example |
|---|---|---|
| Greater than | > | Score > 5 |
| Greater than or equals | >= | Score >= 7 |
| Less than | < | Response time < 3000ms |
| Less than or equals | <= | Response time <= 2000ms |
| Equals | = | Count = 0 |
| Not equals | != | Count != 0 |
Categorical Types (Boolean, Classification)
| Operator | Symbol | Example |
|---|---|---|
| Equals | = | Compliance check = true |
| Not equals | != | Call outcome != “escalated” |
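The two operator tables can be read as a lookup from output type to permitted comparisons. A minimal sketch, assuming a hypothetical `compare` helper built on Python's standard `operator` module; it rejects ordering operators on Boolean and Classification metrics.

```python
import operator

NUMERIC_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}
CATEGORICAL_OPS = {"=": operator.eq, "!=": operator.ne}

# Which operators each metric output type accepts.
OPS_BY_OUTPUT_TYPE = {
    "Scale": NUMERIC_OPS,
    "Numeric": NUMERIC_OPS,
    "Count": NUMERIC_OPS,
    "Boolean": CATEGORICAL_OPS,
    "Classification": CATEGORICAL_OPS,
}


def compare(output_type: str, value, op: str, threshold) -> bool:
    """Apply an operator, rejecting ones the output type does not support."""
    ops = OPS_BY_OUTPUT_TYPE[output_type]
    if op not in ops:
        raise ValueError(f"operator {op!r} is not available for {output_type} metrics")
    return ops[op](value, threshold)


print(compare("Numeric", 2500, "<", 3000))                       # True
print(compare("Classification", "resolved", "!=", "escalated"))  # True
```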
Aggregation Modes
When a metric produces multiple values per call (e.g., a per-segment metric that fires on every turn), the aggregation mode determines how those values are combined before applying the threshold.
| Mode | Description | Example |
|---|---|---|
| Each | Compare every value individually — fails if any single value fails | Each response time < 5000ms |
| Average | Average all values, then apply the threshold | Average sentiment >= 6 |
| Min | Use the minimum value | Min confidence >= 0.8 |
| Max | Use the maximum value | Max response time < 10000ms |
| Median | Use the median value | Median score >= 7 |
| Sum | Sum all values, then apply the threshold | Total talk time <= 300000ms |
| P95 | Use the 95th percentile value | P95 response time < 5000ms |
| P99 | Use the 99th percentile value | P99 latency < 8000ms |
| Count | Count how many values fail, then compare against a max allowed failures threshold | No more than 2 segments below threshold |
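The sketch below shows how each mode could combine a call's values before the comparison. It is a hypothetical illustration: the function names and the `max_failures` parameter are assumptions, and P95/P99 use the nearest-rank percentile, which may differ from the definition Roark uses.

```python
import math
import statistics
from typing import Callable, Dict, List


def percentile_nearest_rank(values: List[float], p: float) -> float:
    """Nearest-rank percentile (assumed method; Roark's exact definition may differ)."""
    ordered = sorted(values)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]


# Modes that reduce all values to a single number before comparing.
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "Average": statistics.mean,
    "Min": min,
    "Max": max,
    "Median": statistics.median,
    "Sum": sum,
    "P95": lambda vs: percentile_nearest_rank(vs, 95),
    "P99": lambda vs: percentile_nearest_rank(vs, 99),
}


def evaluate(values: List[float], mode: str, passes: Callable[[float], bool],
             max_failures: int = 0) -> bool:
    """Combine per-call values according to the aggregation mode, then apply the threshold."""
    if not values:
        raise ValueError("no values to evaluate")
    if mode == "Each":
        # Every value must pass on its own.
        return all(passes(v) for v in values)
    if mode == "Count":
        # Tolerate up to max_failures individual failures.
        return sum(1 for v in values if not passes(v)) <= max_failures
    return passes(REDUCERS[mode](values))


def under_1s(ms: float) -> bool:
    return ms < 1000


response_times = [800, 950, 1200, 700]  # milliseconds, one per agent turn
print(evaluate(response_times, "Average", under_1s))                # True  (912.5)
print(evaluate(response_times, "P95", under_1s))                    # False (1200)
print(evaluate(response_times, "Count", under_1s, max_failures=2))  # True  (1 failure)
```

The same four values pass or fail depending on the mode, which is why choosing the aggregation matters as much as choosing the threshold value.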
Participant Role Filtering
For metrics with PER_PARTICIPANT scope, you can narrow the threshold to a specific speaker:
| Role | Description |
|---|---|
| All | Apply to all participant values (default) |
| Agent | Only evaluate the agent’s values |
| Customer | Only evaluate the customer’s values |
This is useful when, for example, sentiment is tracked for both speakers but you only want to set a threshold on the agent’s performance.
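A minimal sketch of role filtering, assuming (hypothetically) that each value of a PER_PARTICIPANT metric arrives tagged with the speaker's role. It shows how the same sentiment data can pass for the agent alone but fail when both speakers are included.

```python
import statistics
from typing import List, Tuple

# Hypothetical shape: each value of a PER_PARTICIPANT metric tagged with the speaker's role.
RoleValue = Tuple[str, float]


def filter_by_role(values: List[RoleValue], role: str = "All") -> List[float]:
    """Keep only the values that belong to the selected participant role."""
    if role == "All":
        return [v for _, v in values]
    return [v for r, v in values if r == role]


sentiment = [("Agent", 7), ("Customer", 3), ("Agent", 8), ("Customer", 4)]
agent_avg = statistics.mean(filter_by_role(sentiment, "Agent"))  # 7.5
all_avg = statistics.mean(filter_by_role(sentiment))             # 5.5
print(agent_avg >= 6, all_avg >= 6)                              # True False
```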
Examples
Customer Satisfaction ≥ 7
Source metric: Customer Satisfaction (Scale 1-10)
Operator: Greater than or equals
Value: 7
Result: Pass if satisfaction is 7 or above, Fail otherwise
Agent Response Time Under 1 Second
Source metric: Response Time (Numeric, per-segment)
Operator: Less than
Value: 1000 (milliseconds)
Aggregation: P95
Participant Role: Agent
Result: Pass if 95th percentile agent response time is under 1 second
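Putting this example's settings together, a hypothetical end-to-end check could look like the sketch below. The segment format and function names are assumptions, and P95 again uses the nearest-rank method. Customer segments are dropped before the percentile is taken, so a slow customer turn does not affect the result.

```python
import math
from typing import List, Tuple


def p95(values: List[float]) -> float:
    """95th percentile by nearest rank (an assumption; other definitions interpolate)."""
    if not values:
        raise ValueError("no values to evaluate")
    ordered = sorted(values)
    return ordered[max(math.ceil(95 * len(ordered) / 100), 1) - 1]


def agent_response_time_under_1s(segments: List[Tuple[str, float]]) -> bool:
    """Participant Role: Agent, Aggregation: P95, Operator: <, Value: 1000 ms."""
    agent_ms = [ms for role, ms in segments if role == "Agent"]
    return p95(agent_ms) < 1000


fast_call = [("Agent", 620), ("Customer", 2400), ("Agent", 880), ("Agent", 940)]
slow_call = [("Agent", 620), ("Agent", 1500)]
print(agent_response_time_under_1s(fast_call))  # True  (P95 = 940 ms)
print(agent_response_time_under_1s(slow_call))  # False (P95 = 1500 ms)
```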
Identity Verified Equals True
Source metric: Identity Verified (Boolean)
Operator: Equals
Value: true
Result: Pass if the agent verified the caller’s identity
No More Than 2 Failed Segments (Count Mode)
Source metric: Tone Appropriate (Boolean, per-segment)
Operator: Equals
Value: true
Aggregation: Count (max failures: 2)
Result: Pass if no more than 2 segments have an inappropriate tone
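The Count-mode example reduces to counting segments that do not equal the expected value. A minimal sketch with a hypothetical helper name, using the settings above (Equals true, at most 2 failures):

```python
from typing import List


def count_mode_passes(segment_values: List[bool], expected: bool = True,
                      max_failures: int = 2) -> bool:
    """A segment fails when its value does not equal `expected`; pass if failures <= max_failures."""
    failures = sum(1 for v in segment_values if v != expected)
    return failures <= max_failures


# Tone Appropriate per segment for two hypothetical calls.
print(count_mode_passes([True, False, True, False, True]))  # True  (2 failures)
print(count_mode_passes([False, True, False, False]))       # False (3 failures)
```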

