Scenarios

Generic vs. Specific Steps

How you write customer step nodes directly affects how closely the simulated customer follows the script versus improvising naturally.

First-person (specific) — writing steps like a transcript tells the simulation agent to stick closely to that exact phrasing:
"Hello, can I make a booking for tomorrow at 2pm?"
This is useful when you need to test a precise utterance — for example, verifying that your agent correctly parses a specific date format or handles a particular phrasing.

Third-person (generic) — writing steps as descriptions gives the simulation agent room to adapt:
"Asks to book an appointment tomorrow at 2pm"
With generic steps, the simulation agent relies more on the persona’s backstory to determine tone, word choice, and delivery. This produces more varied and realistic conversations across runs, which is better for testing your agent’s ability to handle natural language variation.
Use specific steps when testing exact phrasing, keyword detection, or slot filling. Use generic steps when testing your agent’s ability to handle a range of natural language inputs for the same intent.

Scenario Structure and Organization

Roark scenarios are graph-based — each scenario is a tree of customer and agent nodes that models the customer’s journey through a conversation. Each unique path from root to leaf in the tree is treated as a distinct scenario path.

Start with a happy path

Begin with a single expected customer path — the “happy path” where everything goes as planned. Keep it high level at first; if you’re testing specific flows, you can expand each step with more detail later.
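A happy path can be as simple as one linear sequence of alternating customer and agent steps. Here is a minimal sketch for an appointment-booking flow — the step text and the tuple shape are purely illustrative, not Roark's actual data model:

```python
# Hypothetical happy path for an appointment-booking agent:
# a single linear sequence of customer and agent steps, no branches yet.
happy_path = [
    ("customer", "Asks to book an appointment tomorrow at 2pm"),
    ("agent", "Confirms availability and offers the 2pm slot"),
    ("customer", "Accepts the slot and provides their name"),
    ("agent", "Confirms the booking and ends the call"),
]

for role, step in happy_path:
    print(f"{role:>8}: {step}")
```

Once this single path runs cleanly end to end, it becomes the trunk you branch from in the next section.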

Add branches for edge cases

Once your happy path works, add branches at points where the conversation can diverge. The graph structure means you only define the shared steps once — branches inherit everything above them. Focus on branching at points where:
  • The agent asks the customer a question (customers can respond in unexpected ways)
  • The agent could go on a tangent or lose track of the conversation
  • Tool calls or lookups might fail or return unexpected results
This approach lets you cover many different paths without duplicating the shared parts of your scenario.

If you need to model hundreds or thousands of scenarios that share a common flow — such as an IVR menu — use Scenario Link nodes instead of duplicating branches. A Scenario Link node references another scenario graph and embeds it inline. This is particularly useful for IVR trees, authentication flows, or any shared entry sequence: keep your shared IVR tree in a single scenario and reference it from other scenarios via Link nodes. That way, if the IVR menu changes, you update it in one place.
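To see why branches inherit everything above them, it helps to picture each unique root-to-leaf walk through the tree as one scenario path. A small sketch, with a hypothetical nested-dict shape standing in for the actual graph:

```python
# Illustrative only: a scenario tree as nested dicts, where every
# root-to-leaf path is one distinct scenario path. Branches share the
# steps above their branch point, so the shared prefix is defined once.

def enumerate_paths(node, prefix=()):
    """Yield every root-to-leaf path through a scenario tree."""
    path = prefix + (node["step"],)
    children = node.get("children", [])
    if not children:
        yield path
        return
    for child in children:
        yield from enumerate_paths(child, path)

scenario = {
    "step": "Customer: asks to book an appointment",
    "children": [
        {"step": "Agent: offers a time slot", "children": [
            {"step": "Customer: accepts the slot"},
            {"step": "Customer: asks for a different day"},
        ]},
    ],
}

for p in enumerate_paths(scenario):
    print(" -> ".join(p))
```

Both paths share the first two steps; only the leaves differ — which is exactly the duplication the graph structure saves you from.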

Templating with Variables

Scenarios support template variables using {{variableName}} syntax. Variables are replaced at runtime, making it easy to reuse a single scenario across different contexts.
"Hi, my name is {{patientName}} and I need to reschedule
my {{appointmentType}} appointment"
You can also reference persona properties directly using the reserved {{persona.*}} prefix. These are automatically injected based on the selected persona at runtime:
"Hi, my name is {{persona.name}}"
This is useful when you have a generic scenario that works across many test cases, with only a few key details changing between runs. See the Variables guide for the full lifecycle of defining, pre-setting, and passing variables.
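Conceptually, the substitution works like a simple string-template pass over each step. The sketch below is an assumption about the mechanics — the `render` helper and the dict shapes are hypothetical, since Roark performs this replacement for you at runtime:

```python
import re

# Hypothetical sketch of {{variable}} resolution: plain names come from
# the run's variables, and the reserved persona.* prefix is resolved
# against the selected persona's properties.

def render(template, variables, persona):
    def resolve(match):
        name = match.group(1)
        if name.startswith("persona."):
            return str(persona[name[len("persona."):]])
        return str(variables[name])
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", resolve, template)

step = ("Hi, my name is {{persona.name}} and I need to reschedule "
        "my {{appointmentType}} appointment")
print(render(step, {"appointmentType": "dental"}, {"name": "Priya"}))
# -> Hi, my name is Priya and I need to reschedule my dental appointment
```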

Generating Scenarios with the AI Assistant

Roark’s AI assistant can generate all of the above for you. Our recommended workflow:
  1. Start from production calls — Generate scenarios from real calls to get representative conversation flows. This gives you a realistic baseline that reflects how customers actually interact with your agent.
  2. Extend with branches — Once you have a production-based scenario, add branches to cover paths that didn’t occur in the original call but could happen in production.
  3. Generate across multiple calls — Select several calls and let the assistant identify the unique and custom paths across them, automatically building a branching scenario graph.
Generating scenarios from real calls is the fastest way to get high-quality, representative test coverage. The AI assistant is available in the scenario builder — look for the Generate options when creating a new scenario.

Personas

Personas model the simulated customer throughout the call. A good persona strategy tests your agent across a range of realistic caller profiles.

Diversify Voice and Speech

Include personas that cover:
  • Different accents and languages — Battle-test your transcriber’s accuracy across accents (US, British, Indian, Spanish, etc.) and languages
  • Background noise — Verify your endpointing model works well in non-ideal audio conditions (office noise, etc.)
  • Different response times — Use varied speech paces (slow, normal, fast) to ensure your agent doesn’t interrupt the customer or time out prematurely

Test Difficult Customer Types

Set up personas that represent challenging interactions your agent needs to handle gracefully:

AI Skeptic

A customer who is suspicious they’re talking to an AI and tests the agent with trick questions

Offensive Caller

A rude or hostile customer — verify your agent always responds politely and professionally

Sensitive Situation

A customer going through a difficult time (bereavement, financial hardship) — ensure your agent is empathetic and considerate

Rapid Switcher

A customer who changes topics frequently and tests your agent’s ability to stay on track

Use Backstories

The backstory field is where personas come to life. It’s a prompt injected into the simulation agent that shapes the customer’s entire behavior. Good backstories provide context that drives realistic, nuanced interactions.

Example backstories:

James — Bereaved Customer
James recently lost his wife and is calling to cancel her phone line on
their shared plan. He is soft-spoken and may become emotional. He doesn't
fully understand the account details and may need things explained gently.
He is not tech-savvy.
Priya — Skeptical Professional
Priya is a software engineer who immediately suspects she's talking to an
AI. She will ask pointed questions like "Are you a real person?" and "Can
you transfer me to a human?" She values efficiency and will become
frustrated if the agent can't directly answer her questions about a billing
discrepancy on her enterprise account.
Carlos — Impatient Multitasker
Carlos is calling from a noisy construction site during his break. He has
5 minutes before he needs to get back to work. He'll give short answers,
may not hear things clearly the first time, and will ask the agent to
repeat or speak up. He needs to reschedule a delivery that requires a
signature.

Run Plan Configuration

A run plan creates a matrix of simulations — one for each combination of agent endpoint, scenario, and persona. How you configure that matrix depends on what you’re testing.
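The matrix expansion is a plain cross product. A sketch of what that works out to — the field names and counts here are illustrative, not the run plan schema:

```python
from itertools import product

# Hypothetical expansion of a run plan into its simulation matrix:
# one simulation per (endpoint, scenario, persona) combination,
# repeated once per iteration.
endpoints = ["staging-agent"]
scenarios = ["happy-path-booking", "reschedule-with-conflict"]
personas = ["james-bereaved", "priya-skeptic", "carlos-multitasker"]
iterations = 2

matrix = [
    {"endpoint": e, "scenario": s, "persona": p, "iteration": i}
    for e, s, p in product(endpoints, scenarios, personas)
    for i in range(iterations)
]
print(len(matrix))  # 1 x 2 x 3 x 2 = 12 simulations
```

Because the counts multiply, adding one more persona or scenario can grow a run plan quickly — worth keeping in mind for cost and concurrency.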

Common Patterns

Load testing: battle-test your agent against thousands of concurrent calls.
  • Pick any persona and a scenario matching your desired call duration
  • Set iterations to your target volume (e.g., 1000)
  • Set concurrency to the same value so all simulations hit the agent simultaneously
This generates 1000 simulations that call your agent at the same time, revealing how it performs under peak load.

Persona coverage: test how your agent handles different voices, accents, and personalities on the same flow.
  • Select a single scenario (typically your happy path)
  • Choose multiple personas with a wide variety of accents, languages, speech paces, and personality types
This isolates persona-driven variation from scenario complexity, making it easy to spot where specific caller profiles cause problems.

Instruction following: agents are non-deterministic, so it’s critical to verify they don’t go off-script or hit loopholes.
  • Create scenarios with multiple branches covering edge cases and recovery paths
  • Focus on points where the agent might go on a tangent or fail to recover
This validates that your agent follows its instructions consistently, even when the conversation takes unexpected turns.

Adversarial testing: test your agent’s resilience against adversarial inputs.
  • Set up scenarios covering prompt injection attempts, PII extraction, and social engineering
  • Use personas with adversarial backstories
This helps identify security vulnerabilities before they reach production.

Keeping Simulations Under Control

  • Max simulation duration: Set to ~110% of your average call duration. This prevents runaway calls if the agent goes on a tangent.
  • Silence timeout: Ensures calls hang up after a set period of silence, catching cases where either side stops responding.
  • End phrases: Specific phrases that indicate a call has gone off track. When matched, the simulation ends immediately.
  • End reasons: When you don’t have specific phrases, define end reasons instead. An LLM evaluates each turn and ends the call if the reason is met.
  • Concurrency: For non-load-testing runs, keep concurrency at 5 or below to avoid unnecessary load on your agent and to manage costs. Reserve high concurrency for dedicated load tests.
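Taken together, these guardrails might look something like the sketch below. The keys and the matching helper are hypothetical, not Roark's actual configuration schema:

```python
# Illustrative guardrail settings for a non-load-test run.
guardrails = {
    "max_simulation_duration_s": 330,  # ~110% of a 5-minute average call
    "silence_timeout_s": 15,
    "end_phrases": ["goodbye", "have a nice day"],
    "concurrency": 5,  # keep low outside dedicated load tests
}

def should_end(turn_text, config):
    """End the simulation as soon as a turn matches an end phrase."""
    text = turn_text.lower()
    return any(phrase in text for phrase in config["end_phrases"])

print(should_end("Thanks for calling, goodbye!", guardrails))  # True
```

End-phrase matching is cheap and immediate; reserve LLM-evaluated end reasons for cases where no fixed phrase reliably signals that the call is done.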

Next Steps

  • Scenarios: Build and manage scenario graphs
  • Personas: Create diverse customer profiles
  • Variables: Templatize scenarios with dynamic values
  • Run Plans: Configure and execute simulation matrices