Overview

Evaluators apply thresholds to your metrics and determine pass/fail criteria for your voice AI calls. They transform raw metrics into actionable insights by evaluating a single metric or a group of metrics against your defined standards.

How Evaluators Work

A single evaluator can be composed of multiple tests, which we call blocks. Each block can use either:
  • Deterministic logic - Exact calculations and thresholds
  • LLM as judge - AI-powered evaluation for complex criteria
The evaluator combines results from all blocks to produce a final pass/fail determination for each call.
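The flow above can be sketched as follows. This is an illustrative sketch only: the block functions, call fields, and `evaluate` helper are assumptions for demonstration, not Roark's actual API.

```python
# Hypothetical sketch: an evaluator as a set of block functions whose
# results combine into a final pass/fail for one call.

def latency_block(call: dict) -> bool:
    """Deterministic block: an exact threshold check."""
    return call["avg_response_ms"] < 2000

def politeness_block(call: dict) -> bool:
    """LLM-as-judge block, stubbed here with a pre-computed score."""
    return call["politeness_score"] > 0.8

def evaluate(call: dict, blocks) -> str:
    # With AND logic, every block must pass for the call to pass.
    return "PASS" if all(block(call) for block in blocks) else "FAIL"

call = {"avg_response_ms": 1450, "politeness_score": 0.92}
print(evaluate(call, [latency_block, politeness_block]))  # PASS
```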

9 Block Types Available

  • Prompt Based - LLM evaluation using custom prompts
  • Data Field Checks - Verify specific data fields and values
  • Tool Call - Validate function calls and parameters
  • Speech Sentiment - Analyze emotional tone of speech
  • Emotion Detection - Identify specific emotions in conversation
  • Vocal Cues - Detect sighs, pauses, and raised voices
  • Politeness - Measure courtesy and professionalism
  • Latency - Check response time thresholds
  • Toxicity - Flag inappropriate language or behavior
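Deterministic blocks such as Data Field Checks reduce to simple predicate functions over call data. A minimal sketch, where the `extracted_data` field name and call shape are assumptions for illustration:

```python
# Hypothetical data field check block: verify that a specific
# extracted field holds the expected value.

def data_field_check(call: dict, field: str, expected) -> bool:
    return call.get("extracted_data", {}).get(field) == expected

call = {"extracted_data": {"appointment_confirmed": True}}
print(data_field_check(call, "appointment_confirmed", True))  # True
```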

Three Ways to Create Evaluators

1. Build from Scratch

Start with a blank evaluator and add blocks based on your specific needs:
  1. Name Your Evaluator - Give it a descriptive name and purpose
  2. Add Blocks - Choose from the 9 block types and configure each one
  3. Set Pass/Fail Logic - Define how blocks combine (AND/OR logic)
  4. Configure Thresholds - Set specific values that determine success
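Putting the four steps together, an evaluator definition might look like the following. The field names and schema here are assumptions for illustration, not Roark's actual configuration format:

```python
# Hypothetical evaluator definition following the four steps above.
evaluator = {
    "name": "Support Call Quality",  # step 1: descriptive name and purpose
    "blocks": [                      # step 2: blocks chosen from the 9 types
        {"type": "politeness", "threshold": 0.8},
        {"type": "latency", "max_seconds": 2.0},
        {"type": "prompt_based", "prompt": "Did the agent resolve the issue?"},
    ],
    "logic": "AND",                  # step 3: how blocks combine
    # step 4: the thresholds live on each block above
}
print(evaluator["name"], len(evaluator["blocks"]), evaluator["logic"])
```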

2. Use Templates

Select from our hand-crafted templates designed for common use cases:

  • Customer Service - Pre-built blocks for support quality
  • Sales Calls - Conversion and objection handling checks
  • Appointment Booking - Verification of scheduling success
  • Compliance - Regulatory and script adherence

3. Generate from Agent Prompt

Automatically create evaluators based on your agent’s purpose:
  1. Select your agent
  2. Review the agent’s prompt and objectives
  3. Let Roark generate relevant evaluation blocks
  4. Customize the generated evaluator as needed

Combining Multiple Blocks

Evaluators become powerful when you combine multiple blocks. With AND logic, every block must pass for the evaluator to pass:
  Block 1: Politeness > 80% ✓
  Block 2: Latency < 2s ✓
  Block 3: Task Completed ✓
  Result: PASS
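The AND/OR combination logic maps directly onto Python's built-in `all` and `any`; a minimal sketch over the three block results shown above:

```python
# Block results for one call, mirroring the example above.
results = {"politeness": True, "latency": True, "task_completed": True}

and_pass = all(results.values())  # AND: every block must pass
or_pass = any(results.values())   # OR: at least one block must pass

print("PASS" if and_pass else "FAIL")  # PASS
```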

Use Cases

Quality Assurance

  • Ensure agents maintain professional standards
  • Verify script compliance
  • Check for complete information gathering

Performance Monitoring

  • Track task completion rates
  • Monitor response times
  • Measure customer satisfaction

Compliance & Risk

  • Validate regulatory requirements
  • Check for PII handling
  • Monitor for inappropriate content

Training & Improvement

  • Identify coaching opportunities
  • Compare agent performance
  • Track improvement over time

Integration with Metrics

Evaluators work seamlessly with your Metrics:
  • Metrics collect the data
  • Evaluators apply the thresholds
  • Reports show the results
This creates a complete quality assurance loop for your voice AI system.
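The loop above can be sketched end to end; the call fields and thresholds here are illustrative assumptions:

```python
# Sketch of the metrics -> evaluators -> reports loop.

# Metrics collect the data (one dict of metric values per call).
calls = [
    {"id": "c1", "latency_s": 1.2, "task_completed": True},
    {"id": "c2", "latency_s": 3.5, "task_completed": True},
]

def evaluate(call: dict) -> bool:
    # Evaluators apply the thresholds to the collected metrics.
    return call["latency_s"] < 2.0 and call["task_completed"]

# Reports show the results across calls.
report = {call["id"]: ("PASS" if evaluate(call) else "FAIL") for call in calls}
print(report)  # {'c1': 'PASS', 'c2': 'FAIL'}
```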