Roark can detect and mask personally identifiable information (PII), such as phone numbers, social security numbers, and credit card numbers, in your call and chat transcripts. When redaction is on, the original PII stays inside the redaction boundary by default: viewers see masked tokens, evaluator LLMs receive masked text, and search results are sanitized. Redaction is a per-project setting. It applies to calls and chats going forward; existing conversations keep their original transcripts.

How It Works

1. Transcript becomes available

Whether the transcript came from your integration, your own upload, or Roark’s transcription pipeline, redaction kicks in once the transcript is persisted.

2. Roark scans the transcript for PII

A redaction pass identifies the entity types you’ve enabled (phone numbers, account numbers, addresses, names, etc.) and stores them as redaction spans alongside the transcript.

3. Masking on display and analysis

When you view a conversation, the transcript renders [REDACTED:PHONE]-style pills in place of the sensitive content. Evaluators that use LLMs (custom prompts, sentiment, toxicity, emotion, politeness) receive the masked transcript by default.
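
As a rough illustration of steps 1 and 2, the scan can be pictured as a pass that reads persisted segments and emits span records without touching the text. This is only a sketch, not Roark's detector: the regex patterns, the `Span` type, and `scan_segments` are hypothetical stand-ins.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for a real PII detector (assumption).
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class Span:
    segment: int      # index of the transcript segment that contains PII
    entity_type: str  # e.g. "PHONE"


def scan_segments(segments, enabled_types):
    """Return one span per (segment, enabled entity type) that matches.

    The segments themselves are only read, never modified.
    """
    spans = []
    for index, text in enumerate(segments):
        for entity_type in sorted(enabled_types):
            pattern = PATTERNS.get(entity_type)
            if pattern is not None and pattern.search(text):
                spans.append(Span(index, entity_type))
    return spans


segments = ["Hi, how can I help?", "My number is 555-123-4567."]
spans = scan_segments(segments, {"PHONE"})
```

Here `spans` holds a single `PHONE` span pointing at the second segment, and `segments` is unchanged.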

Enabling Redaction

1. Open Settings

Go to your project’s Settings page and find the PII Redaction section.

2. Toggle redaction on

Flip the master toggle. The entity-type list expands.

3. Choose entity types

Most entity types are on by default. Names and addresses are off by default: they have higher false-positive rates, and customers usually opt in deliberately.

Supported Entity Types

| Entity | Default | Notes |
| --- | --- | --- |
| Social security numbers | On | |
| Credit card number / CVV / expiration | On | Each is a separate toggle |
| Bank account numbers | On | |
| Bank routing numbers | On | |
| Phone numbers | On | |
| Email addresses | On | |
| Dates of birth | On | |
| PINs | On | |
| Passwords | On | |
| Names | Off | Higher false-positive rate; opt-in |
| Addresses | Off | Higher false-positive rate; opt-in |
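
For readers who track these settings in their own tooling, the defaults above can be mirrored as a simple mapping. The key names and the `enabled_types` helper are hypothetical and chosen for illustration; they are not identifiers from Roark's API.

```python
# Hypothetical keys; the on/off values mirror the defaults listed above.
DEFAULT_ENTITY_TYPES = {
    "ssn": True,
    "credit_card_number": True,
    "credit_card_cvv": True,
    "credit_card_expiration": True,
    "bank_account_number": True,
    "bank_routing_number": True,
    "phone_number": True,
    "email_address": True,
    "date_of_birth": True,
    "pin": True,
    "password": True,
    "name": False,     # off by default: higher false-positive rate
    "address": False,  # off by default: higher false-positive rate
}


def enabled_types(overrides=None):
    """Merge per-project overrides onto the defaults and return the enabled set."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_ENTITY_TYPES)
    if unknown:
        raise ValueError(f"unknown entity types: {sorted(unknown)}")
    merged = {**DEFAULT_ENTITY_TYPES, **overrides}
    return {name for name, is_on in merged.items() if is_on}
```

Calling `enabled_types({"name": True})` models a project that has deliberately opted in to name redaction.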

Evaluators and Redaction

Evaluator LLM calls (custom prompts, sentiment, toxicity, emotion, politeness) automatically receive the redacted transcript when redaction is enabled. This keeps PII from leaving your tenant for analysis. If your evaluators rely on the original PII to do their job, and you accept the trade-off, you can override this behavior with the "Allow evaluators to see original PII" toggle in the same settings panel. We recommend leaving it off unless you have a specific reason.
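
The rule described above reduces to a small decision. The sketch below assumes a hypothetical `ProjectSettings` object whose field names are invented for illustration; only the behavior (masked by default, original text only with the explicit override) comes from this page.

```python
from dataclasses import dataclass


@dataclass
class ProjectSettings:
    redaction_enabled: bool = False
    # Mirrors the "Allow evaluators to see original PII" toggle (hypothetical name).
    allow_evaluators_original_pii: bool = False


def transcript_for_evaluator(settings, original, masked):
    """Pick the text an LLM evaluator should receive for this project."""
    if settings.redaction_enabled and not settings.allow_evaluators_original_pii:
        return masked
    return original
```

With redaction on and the override off, the evaluator only ever sees the masked text.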

What’s Stored

  • The original transcript is preserved unchanged — storage is non-destructive.
  • Redaction spans (entity type, segment, confidence, source) are stored as metadata so masking is consistent across every read path and reproducible if you change the policy later.
  • Audio recordings are stored as-is — audio-level redaction (beep/silence at PII timestamps) is on the roadmap.
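
To make the non-destructive model concrete, here is a minimal sketch in which masking is computed at read time from stored spans and the current policy. The `RedactionSpan` fields follow the metadata listed above, but the class, the `masked_view` function, and the sample values are assumptions, not Roark's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionSpan:
    entity_type: str   # e.g. "PHONE"
    segment: int       # index of the affected transcript segment
    confidence: float  # detector confidence
    source: str        # which detector produced the span (value is hypothetical)


def masked_view(segments, spans, enabled_types):
    """Build the masked transcript at read time; stored segments are untouched.

    Assumption: a masked segment shows one pill, labeled with the first
    enabled entity type found in it.
    """
    pill_for = {}
    for span in spans:
        if span.entity_type in enabled_types:
            pill_for.setdefault(span.segment, f"[REDACTED:{span.entity_type}]")
    return [pill_for.get(i, text) for i, text in enumerate(segments)]


stored = ["This is Dana.", "Call 555-123-4567.", "Thanks!"]
spans = [
    RedactionSpan("NAME", 0, 0.71, "example-detector"),
    RedactionSpan("PHONE", 1, 0.98, "example-detector"),
]
strict = masked_view(stored, spans, {"NAME", "PHONE"})
relaxed = masked_view(stored, spans, {"PHONE"})
```

Because the original text and the spans are both kept, turning an entity type off later changes what is displayed without rewriting anything in storage: `strict` masks the first two segments, while `relaxed` masks only the phone number.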

Limitations Today

  • Masking is applied at the segment level: when any PII is detected in a segment, the whole segment is masked. A turn containing one phone number therefore renders as a single redaction pill, not as inline word-level redaction. Word-precise inline masking is on the roadmap.
  • Audio recordings are not yet redacted. If you need redacted audio (for sharing recordings outside your team), reach out. It’s planned, and we’re prioritizing based on demand.
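
The difference between today's segment-level masking and the planned word-level masking can be shown with a small comparison. Both functions are hypothetical sketches, and the character offsets used by the word-level variant are an assumption about how such a feature might work, not a description of Roark's roadmap design.

```python
def mask_segment_level(text, matches):
    """Current behavior as described above: any match masks the whole segment."""
    if not matches:
        return text
    _, _, entity_type = matches[0]
    return f"[REDACTED:{entity_type}]"


def mask_word_level(text, matches):
    """Hypothetical inline masking: replace only the matched character ranges."""
    # Work from the end of the string so earlier offsets stay valid.
    for start, end, entity_type in sorted(matches, reverse=True):
        text = text[:start] + f"[REDACTED:{entity_type}]" + text[end:]
    return text


turn = "My number is 555-123-4567, call me."
matches = [(13, 25, "PHONE")]  # (start, end, entity type), end exclusive

segment_masked = mask_segment_level(turn, matches)  # "[REDACTED:PHONE]"
word_masked = mask_word_level(turn, matches)  # "My number is [REDACTED:PHONE], call me."
```

The segment-level result hides the surrounding words along with the phone number, which is the trade-off this limitation describes.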

Compliance and Boundaries

When redaction is enabled:
  • The frontend renders masked tokens in transcripts and exports.
  • The GraphQL API returns masked text by default for every segment-returning query (calls and chats).
  • LLM evaluators receive masked text unless you explicitly opt out per project.
  • OpenSearch event indexing is unaffected (event properties are structured, not transcript-derived).

If you have specific compliance requirements (HIPAA BAA, PCI scope, destructive retention of original transcripts), reach out to support. We can scope what’s needed for your environment.