Thresholds

Overview

Thresholds turn raw metric values into clear pass/fail outcomes. By setting a threshold on a metric, you create a derived boolean metric that automatically evaluates whether each call meets your defined criteria. For example, you can set a threshold on Response Time of < 1000ms using the P95 aggregation across agent turns. Any call where the 95th percentile agent response time is 1 second or more will be flagged as a failure.

How Thresholds Work

When you configure a threshold, Roark creates a derived metric behind the scenes:

1. The source metric is collected as usual (e.g., a satisfaction score of 8).
2. The threshold compares the value against your condition (e.g., >= 7).
3. A boolean Pass or Fail result is produced automatically.
4. The result is stored alongside the original metric for reporting.

Thresholds are available on these metric output types: Scale, Numeric, Count, Boolean, and Classification.
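
The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Roark's implementation: the `OPERATORS` table, the `DerivedResult` class, and the `evaluate` function are hypothetical names introduced for this example.

```python
import operator
from dataclasses import dataclass

# Hypothetical mapping from operator symbols to comparison functions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


@dataclass
class DerivedResult:
    source_value: object  # the original metric value, kept for reporting
    passed: bool          # the derived Pass/Fail outcome


def evaluate(value, op_symbol, threshold_value):
    """Compare one metric value against a threshold condition."""
    compare = OPERATORS[op_symbol]
    return DerivedResult(source_value=value, passed=compare(value, threshold_value))


# A satisfaction score of 8 checked against the condition >= 7.
result = evaluate(8, ">=", 7)
print(result.source_value, "Pass" if result.passed else "Fail")
```

The result keeps the source value next to the boolean outcome, mirroring the description that the derived result is stored alongside the original metric.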

Configuring a Threshold

Thresholds can be added from several places in Roark:

Policies — Add thresholds when selecting metrics for a policy
Playground — Test a threshold against a real call before deploying
Simulation Run Plans — Set pass/fail criteria for simulation testing

In each case, the configuration options are the same.

Threshold Fields

Field	Description
Operator	The comparison to apply (see operators below)
Value	The threshold value to compare against
Aggregation Mode	How to handle metrics with multiple values per call (optional)
Participant Role	Filter to a specific speaker — Agent, Customer, or All (optional)
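
As a rough mental model, the four fields could be represented as a small record with two required and two optional values. The `Threshold` class below is a hypothetical sketch for illustration; its field names are assumptions and do not reflect Roark's API or SDK.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Threshold:
    # Required: the comparison to apply and the value to compare against.
    operator: str
    value: Union[float, bool, str]
    # Optional: only relevant for metrics with multiple values per call.
    aggregation_mode: Optional[str] = None
    # Optional: Agent, Customer, or All.
    participant_role: Optional[str] = None


# Response Time < 1000ms, P95 across agent turns.
response_time = Threshold(
    operator="<", value=1000, aggregation_mode="P95", participant_role="Agent"
)

# A call-level threshold needs only the operator and value.
satisfaction = Threshold(operator=">=", value=7)
print(response_time)
print(satisfaction)
```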

Operators

The available operators depend on the metric’s output type.

Numeric Types (Scale, Numeric, Count)

Operator	Symbol	Example
Greater than	`>`	Score `>` 5
Greater than or equals	`>=`	Score `>=` 7
Less than	`<`	Response time `<` 3000ms
Less than or equals	`<=`	Response time `<=` 2000ms
Equals	`=`	Count `=` 0
Not equals	`!=`	Count `!=` 0

Categorical Types (Boolean, Classification)

Operator	Symbol	Example
Equals	`=`	Compliance check `=` true
Not equals	`!=`	Call outcome `!=` “escalated”
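
The two tables above amount to a lookup from output type to permitted operators. A minimal sketch, assuming a hypothetical `allowed_operators` helper (not part of any Roark SDK):

```python
NUMERIC_TYPES = {"Scale", "Numeric", "Count"}
CATEGORICAL_TYPES = {"Boolean", "Classification"}


def allowed_operators(output_type):
    """Return the operator symbols valid for a metric output type."""
    if output_type in NUMERIC_TYPES:
        return {">", ">=", "<", "<=", "=", "!="}
    if output_type in CATEGORICAL_TYPES:
        # Ordering comparisons are not offered for categorical values.
        return {"=", "!="}
    raise ValueError(f"Thresholds are not available for output type: {output_type}")


print(sorted(allowed_operators("Classification")))
print("<" in allowed_operators("Scale"))
```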

Aggregation Modes

When a metric produces multiple values per call (e.g., a per-segment metric that fires on every turn), the aggregation mode determines how those values are combined before applying the threshold.

Mode	Description	Example
Each	Compare every value individually — fails if any single value fails	Each response time `<` 5000ms
Average	Average all values, then apply the threshold	Average sentiment `>=` 6
Min	Use the minimum value	Min confidence `>=` 0.8
Max	Use the maximum value	Max response time `<` 10000ms
Median	Use the median value	Median score `>=` 7
Sum	Sum all values, then apply the threshold	Total talk time `<=` 300000ms
P95	Use the 95th percentile value	P95 response time `<` 5000ms
P99	Use the 99th percentile value	P99 latency `<` 8000ms
Count	Count how many values fail, then compare against a max allowed failures threshold	No more than 2 segments below threshold

For call-level metrics that produce a single value, aggregation is not needed — Each mode is used by default.
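
To make the modes concrete, here is a minimal sketch of how each one might reduce a list of values before the comparison is applied. Several details are assumptions: the function names are hypothetical, the percentile uses the nearest-rank method (the exact percentile method is not stated here), and the list of values is assumed to be non-empty.

```python
import math
import operator
import statistics

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq, "!=": operator.ne}


def percentile(values, pct):
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Modes that collapse all values into one number before comparing.
REDUCERS = {
    "Average": statistics.fmean,
    "Min": min,
    "Max": max,
    "Median": statistics.median,
    "Sum": sum,
    "P95": lambda v: percentile(v, 95),
    "P99": lambda v: percentile(v, 99),
}


def passes(values, mode, op, threshold, max_failures=0):
    compare = OPS[op]
    if mode == "Each":
        # Fails if any single value fails the comparison.
        return all(compare(v, threshold) for v in values)
    if mode == "Count":
        # Count failing values, then compare against the allowed maximum.
        failures = sum(1 for v in values if not compare(v, threshold))
        return failures <= max_failures
    return compare(REDUCERS[mode](values), threshold)


response_times_ms = [400, 650, 700, 900, 5200]
for mode in ("Each", "Median", "Max", "P95"):
    print(mode, passes(response_times_ms, mode, "<", 5000))
print("Count", passes(response_times_ms, "Count", "<", 5000, max_failures=1))
```

With these sample values, a single slow turn fails the Each, Max, and P95 checks, while Median and a Count check that tolerates one failure still pass. This shows why the choice of mode matters as much as the threshold value.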

Participant Role Filtering

For metrics with PER_PARTICIPANT scope, you can narrow the threshold to a specific speaker:

Role	Description
All	Apply to all participant values (default)
Agent	Only evaluate the agent’s values
Customer	Only evaluate the customer’s values

This is useful when a metric like sentiment is tracked for both speakers but you only want to set a threshold on the agent’s performance.
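
Conceptually, role filtering selects a subset of values before any aggregation or comparison takes place. The sketch below assumes each value is tagged with its speaker as a `(role, value)` pair; that data shape and the `filter_by_role` helper are hypothetical.

```python
def filter_by_role(tagged_values, role="All"):
    """Keep only the values for the given role; "All" keeps everything."""
    if role == "All":
        return [value for _, value in tagged_values]
    return [value for speaker, value in tagged_values if speaker == role]


# Per-participant sentiment scores from one call (illustrative data).
sentiment = [("Agent", 7), ("Customer", 3), ("Agent", 8), ("Customer", 4)]

print(filter_by_role(sentiment, "Agent"))
print(filter_by_role(sentiment, "Customer"))
print(filter_by_role(sentiment))
```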

Examples

Customer Satisfaction ≥ 7

Source metric: Customer Satisfaction (Scale 1-10)
Operator: Greater than or equals
Value: 7
Result: Pass if satisfaction is 7 or above, Fail otherwise

Agent Response Time Under 1 Second

Source metric: Response Time (Numeric, per-segment)
Operator: Less than
Value: 1000 (milliseconds)
Aggregation: P95
Participant Role: Agent
Result: Pass if 95th percentile agent response time is under 1 second
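
As a worked illustration of this example, the sketch below filters per-turn response times to the agent, takes the 95th percentile, and applies the `<` 1000 comparison. The sample data, the `(role, value)` shape, and the nearest-rank percentile method are all assumptions made for the example.

```python
import math

# Illustrative per-turn response times in milliseconds, tagged by speaker.
turns = [
    ("Agent", 620), ("Customer", 1500), ("Agent", 710),
    ("Agent", 880), ("Customer", 300), ("Agent", 940),
]


def p95(values):
    """Nearest-rank 95th percentile (assumed method)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


# Participant Role: Agent
agent_times = [ms for role, ms in turns if role == "Agent"]

# Aggregation: P95, then Operator: Less than, Value: 1000
agent_p95 = p95(agent_times)
passed = agent_p95 < 1000

print(agent_p95, "Pass" if passed else "Fail")
```

The customer's 1500ms turn does not affect the outcome because it is filtered out before the percentile is computed.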

Identity Verified Equals True

Source metric: Identity Verified (Boolean)
Operator: Equals
Value: true
Result: Pass if the agent verified the caller’s identity

No More Than 2 Failed Segments (Count Mode)

Source metric: Tone Appropriate (Boolean, per-segment)
Operator: Equals
Value: true
Aggregation: Count (max failures: 2)
Result: Pass if no more than 2 segments have an inappropriate tone

What’s Next

Policies

Add thresholds to automated metric collection policies

Playground

Test thresholds interactively before deploying

Run Plans

Set pass/fail criteria for simulation testing

Metric Definitions

Learn about metric types, scopes, and output formats
