PII Redaction

Roark can detect and mask personally identifiable information (PII), such as phone numbers, social security numbers, and credit card numbers, in your call and chat transcripts. When redaction is on, the original PII never leaves the redaction boundary: viewers see masked tokens, evaluator LLMs receive masked text by default, and search results are sanitized. Redaction is a per-project setting and applies to calls and chats created after it is enabled. Conversations that already exist are not redacted retroactively and keep their original transcripts.

How It Works

Transcript becomes available

Whether the transcript came from your integration, your own upload, or Roark’s transcription pipeline, redaction kicks in once the transcript is persisted.

Roark scans the transcript for PII

A redaction pass identifies the entity types you’ve enabled (phone numbers, account numbers, addresses, names, etc.) and stores them as redaction spans alongside the transcript.
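As a rough illustration of what a redaction pass produces, the sketch below scans transcript segments and records which segment each match was found in. The regular expressions, the `PATTERNS` table, and the `scan` function are assumptions made for this example only; this page does not describe how Roark's detectors work, and real detection is likely more sophisticated than pattern matching.

```python
import re

# Hypothetical patterns for illustration only; these are not Roark's detectors.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan(segments, enabled):
    """Return (segment_index, entity_type) pairs for every enabled match."""
    spans = []
    for index, text in enumerate(segments):
        for entity_type in sorted(enabled):
            pattern = PATTERNS.get(entity_type)
            if pattern is None:
                continue  # no detector defined for this type in the sketch
            for _ in pattern.finditer(text):
                spans.append((index, entity_type))
    return spans


segments = [
    "Hi, how can I help?",
    "My number is 415-555-0123.",
    "The SSN is 123-45-6789.",
]
spans = scan(segments, {"PHONE", "SSN"})
```

Only entity types passed in `enabled` produce spans, which mirrors the idea that the pass looks for the types you have turned on.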

Masking on display and analysis

When you view a conversation, the transcript renders [REDACTED:PHONE]-style pills in place of each segment that contains sensitive content (masking is currently segment-level; see Limitations Today). Evaluators that use LLMs (custom prompts, sentiment, toxicity, emotion, politeness) receive the masked transcript by default.

Enabling Redaction

Open Settings

Go to your project’s Settings page and find the PII Redaction section.

Toggle redaction on

Flip the master toggle. The entity-type list expands.

Choose entity types

Most entity types are on by default. Names and addresses are off by default — they have higher false-positive rates and customers usually opt in deliberately.

Supported Entity Types

Entity	Default	Notes
Social security numbers	On
Credit card number / CVV / expiration	On	Each is a separate toggle
Bank account numbers	On
Bank routing numbers	On
Phone numbers	On
Email addresses	On
Dates of birth	On
PINs	On
Passwords	On
Names	Off	Higher false-positive rate — opt-in
Addresses	Off	Higher false-positive rate — opt-in
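
The defaults in the table can be pictured as a simple policy map. The key names and the `resolve_policy` helper below are hypothetical and chosen for readability; they are not Roark setting identifiers. Only the on/off values come from the table.

```python
# Hypothetical keys; the default values mirror the table above.
DEFAULT_POLICY = {
    "ssn": True,
    "credit_card_number": True,
    "credit_card_cvv": True,
    "credit_card_expiration": True,
    "bank_account_number": True,
    "bank_routing_number": True,
    "phone_number": True,
    "email_address": True,
    "date_of_birth": True,
    "pin": True,
    "password": True,
    "name": False,     # opt-in: higher false-positive rate
    "address": False,  # opt-in: higher false-positive rate
}


def resolve_policy(overrides=None):
    """Merge per-project overrides onto the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_POLICY)
    if unknown:
        raise ValueError(f"unknown entity types: {sorted(unknown)}")
    return {**DEFAULT_POLICY, **overrides}


def enabled_entities(policy):
    """Return the sorted list of entity types that are switched on."""
    return sorted(name for name, on in policy.items() if on)


# A project that opts in to name redaction and keeps every other default.
project_policy = resolve_policy({"name": True})
```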

Evaluators and Redaction

Evaluator LLM calls (custom prompts, sentiment, toxicity, emotion, politeness) automatically receive the redacted transcript when redaction is enabled. This keeps PII from leaving your tenant for analysis. If your evaluators rely on the original PII to do their job, and you accept the trade-off, you can override this with the "Allow evaluators to see original PII" toggle in the same settings panel. We recommend leaving it off unless you have a specific reason.
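
The rule above reduces to a small decision, sketched here. The function and parameter names are illustrative assumptions, not part of Roark's API; `allow_original_pii` stands in for the "Allow evaluators to see original PII" toggle.

```python
def transcript_for_evaluator(original, masked, redaction_enabled,
                             allow_original_pii=False):
    """Pick the text an evaluator LLM should receive.

    Masked text is used whenever redaction is on, unless the project
    has explicitly allowed evaluators to see original PII.
    """
    if redaction_enabled and not allow_original_pii:
        return masked
    return original


original = "Call me at 415-555-0123"
masked = "[REDACTED:PHONE]"
default_input = transcript_for_evaluator(original, masked, redaction_enabled=True)
```

Defaulting `allow_original_pii` to `False` keeps the safe behavior as the path of least resistance, in line with the recommendation to leave the toggle off.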

What’s Stored

The original transcript is preserved unchanged — storage is non-destructive.
Redaction spans (entity type, segment, confidence, source) are stored as metadata so masking is consistent across every read path and reproducible if you change the policy later.
Audio recordings are stored as-is — audio-level redaction (beep/silence at PII timestamps) is on the roadmap.
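
To show why storing spans as metadata makes masking reproducible, here is a minimal sketch. The `RedactionSpan` class and its field types are assumptions based on the four fields named above (entity type, segment, confidence, source); the actual storage schema is not documented on this page. Because the original transcript is untouched, narrowing the policy later only changes which stored spans are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionSpan:
    """Hypothetical shape of a stored span; field types are assumed."""
    entity_type: str
    segment: int       # index of the transcript segment
    confidence: float  # detector confidence, assumed to be in [0, 1]
    source: str        # which detector produced the span


def segments_to_mask(spans, enabled_types):
    """Indices of segments that must be masked under the given policy."""
    return {span.segment for span in spans if span.entity_type in enabled_types}


stored = [
    RedactionSpan("PHONE", segment=1, confidence=0.98, source="detector-a"),
    RedactionSpan("EMAIL", segment=3, confidence=0.91, source="detector-a"),
]

before = segments_to_mask(stored, {"PHONE", "EMAIL"})
# Later the project turns off email redaction: same spans, new result.
after = segments_to_mask(stored, {"PHONE"})
```

Note that this sketch only covers turning a type off. Enabling a type that was never scanned for would presumably need a new redaction pass, since no spans for it would exist yet.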

Limitations Today

Masking is applied at the segment level: when any PII is detected in a segment, the whole segment is masked. A turn containing one phone number therefore renders as a single redaction pill rather than with inline word-level redaction. Word-precise inline masking is on the roadmap.
Audio recordings are not yet redacted. If you need redacted audio (for sharing recordings outside your team), reach out — it’s planned and we’re prioritizing based on demand.
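
Segment-level masking can be sketched as follows. The `render` function and the `(segment_index, entity_type)` span shape are assumptions for illustration. In particular, how a pill is labeled when one segment contains several entity types is not specified here; this sketch simply uses the first detected type.

```python
def render(segments, spans):
    """Replace every segment that has at least one span with a single pill.

    spans: iterable of (segment_index, entity_type) pairs.
    """
    types_by_segment = {}
    for segment_index, entity_type in spans:
        types_by_segment.setdefault(segment_index, []).append(entity_type)

    rendered = []
    for index, text in enumerate(segments):
        types = types_by_segment.get(index)
        if types:
            # Assumption: label the pill with the first detected type.
            rendered.append(f"[REDACTED:{types[0]}]")
        else:
            rendered.append(text)
    return rendered


segments = ["Sure, what is the best number?", "Call me at 415-555-0123 tomorrow."]
view = render(segments, [(1, "PHONE")])
```

The second segment loses the words "Call me at" and "tomorrow" along with the number, which is the trade-off of segment-level masking compared with word-precise inline masking.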

Compliance and Boundaries

When redaction is enabled:

The frontend renders masked tokens in transcripts and exports.
The GraphQL API returns masked text by default for every segment-returning query (calls and chats).
LLM evaluators receive masked text unless you explicitly opt out per project.
OpenSearch event indexing is unaffected (event properties are structured, not transcript-derived).

If you have specific compliance requirements (HIPAA BAA, PCI scope, destructive retention of original transcripts), reach out to support — we can scope what’s needed for your environment.
