Overview

Evaluators apply thresholds to your metrics and determine pass/fail criteria for your voice AI calls. They transform raw metrics into actionable insights by evaluating a single metric or a group of metrics against your defined standards.

How Evaluators Work

A single evaluator can be composed of multiple tests, which we call blocks. Each block can use either:
  • Deterministic logic - Exact calculations and thresholds
  • LLM as judge - AI-powered evaluation for complex criteria
The evaluator combines results from all blocks to produce a final pass/fail determination for each call.
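The flow above can be sketched as follows. This is an illustrative sketch only: the block functions, call fields, and `evaluate` helper are assumptions for demonstration, not Roark's actual API.

```python
# Hypothetical sketch: an evaluator as a set of block functions whose
# results combine into a final pass/fail for one call.

def latency_block(call: dict) -> bool:
    """Deterministic block: an exact threshold check."""
    return call["avg_response_ms"] < 2000

def politeness_block(call: dict) -> bool:
    """LLM-as-judge block, stubbed here with a pre-computed score."""
    return call["politeness_score"] > 0.8

def evaluate(call: dict, blocks) -> str:
    # With AND logic, every block must pass for the call to pass.
    return "PASS" if all(block(call) for block in blocks) else "FAIL"

call = {"avg_response_ms": 1450, "politeness_score": 0.92}
print(evaluate(call, [latency_block, politeness_block]))  # PASS
```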

9 Block Types Available

  • Prompt Based - LLM evaluation using custom prompts
  • Data Field Checks - Verify specific data fields and values
  • Tool Call - Validate function calls and parameters
  • Speech Sentiment - Analyze emotional tone of speech
  • Emotion Detection - Identify specific emotions in conversation
  • Vocal Cues - Detect sighs, pauses, and raised voices
  • Politeness - Measure courtesy and professionalism
  • Latency - Check response time thresholds
  • Toxicity - Flag inappropriate language or behavior
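Deterministic blocks such as Data Field Checks reduce to simple predicate functions over call data. A minimal sketch, where the `extracted_data` field name and call shape are assumptions for illustration:

```python
# Hypothetical data field check block: verify that a specific
# extracted field holds the expected value.

def data_field_check(call: dict, field: str, expected) -> bool:
    return call.get("extracted_data", {}).get(field) == expected

call = {"extracted_data": {"appointment_confirmed": True}}
print(data_field_check(call, "appointment_confirmed", True))  # True
```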

Three Ways to Create Evaluators

1. Build from Scratch

Start with a blank evaluator and add blocks based on your specific needs:
  1. Name Your Evaluator - Give it a descriptive name and purpose
  2. Add Blocks - Choose from the 9 block types and configure each one
  3. Set Pass/Fail Logic - Define how blocks combine (AND/OR logic)
  4. Configure Thresholds - Set specific values that determine success
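Putting the four steps together, an evaluator definition might look like the following. The field names and schema here are assumptions for illustration, not Roark's actual configuration format:

```python
# Hypothetical evaluator definition following the four steps above.
evaluator = {
    "name": "Support Call Quality",  # step 1: descriptive name and purpose
    "blocks": [                      # step 2: blocks chosen from the 9 types
        {"type": "politeness", "threshold": 0.8},
        {"type": "latency", "max_seconds": 2.0},
        {"type": "prompt_based", "prompt": "Did the agent resolve the issue?"},
    ],
    "logic": "AND",                  # step 3: how blocks combine
    # step 4: the thresholds live on each block above
}
print(evaluator["name"], len(evaluator["blocks"]), evaluator["logic"])
```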

2. Use Templates

Select from our hand-crafted templates designed for common use cases:

  • Customer Service - Pre-built blocks for support quality
  • Sales Calls - Conversion and objection handling checks
  • Appointment Booking - Verification of scheduling success
  • Compliance - Regulatory and script adherence

3. Generate from Agent Prompt

Automatically create evaluators based on your agent’s purpose:
  1. Select your agent
  2. Review the agent’s prompt and objectives
  3. Let Roark generate relevant evaluation blocks
  4. Customize the generated evaluator as needed

Combining Multiple Blocks

Evaluators become powerful when you combine multiple blocks. With AND logic, every block must pass for the evaluator to pass:
  Block 1: Politeness > 80% ✓
  Block 2: Latency < 2s ✓
  Block 3: Task Completed ✓
  Result: PASS
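The AND/OR combination logic maps directly onto Python's built-in `all` and `any`; a minimal sketch over the three block results shown above:

```python
# Block results for one call, mirroring the example above.
results = {"politeness": True, "latency": True, "task_completed": True}

and_pass = all(results.values())  # AND: every block must pass
or_pass = any(results.values())   # OR: at least one block must pass

print("PASS" if and_pass else "FAIL")  # PASS
```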

Use Cases

Quality Assurance

  • Ensure agents maintain professional standards
  • Verify script compliance
  • Check for complete information gathering

Performance Monitoring

  • Track task completion rates
  • Monitor response times
  • Measure customer satisfaction

Compliance & Risk

  • Validate regulatory requirements
  • Check for PII handling
  • Monitor for inappropriate content

Training & Improvement

  • Identify coaching opportunities
  • Compare agent performance
  • Track improvement over time

Integration with Metrics

Evaluators work seamlessly with your Metrics:
  • Metrics collect the data
  • Evaluators apply the thresholds
  • Reports show the results
This creates a complete quality assurance loop for your voice AI system.
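The loop above can be sketched end to end; the call fields and thresholds here are illustrative assumptions:

```python
# Sketch of the metrics -> evaluators -> reports loop.

# Metrics collect the data (one dict of metric values per call).
calls = [
    {"id": "c1", "latency_s": 1.2, "task_completed": True},
    {"id": "c2", "latency_s": 3.5, "task_completed": True},
]

def evaluate(call: dict) -> bool:
    # Evaluators apply the thresholds to the collected metrics.
    return call["latency_s"] < 2.0 and call["task_completed"]

# Reports show the results across calls.
report = {call["id"]: ("PASS" if evaluate(call) else "FAIL") for call in calls}
print(report)  # {'c1': 'PASS', 'c2': 'FAIL'}
```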