Roark can detect and mask personally identifiable information (PII), such as phone numbers, social security numbers, and credit cards, in your call and chat transcripts. When redaction is on, the original PII never leaves the redaction boundary: viewers see masked tokens, evaluator LLMs receive masked text, and search results are sanitized. Redaction is a per-project setting and applies to both calls and chats going forward. Existing conversations keep their original transcripts.
How It Works
Transcript becomes available
Whether the transcript came from your integration, your own upload, or Roark’s transcription pipeline, redaction kicks in once the transcript is persisted.
Roark scans the transcript for PII
A redaction pass identifies the entity types you’ve enabled (phone numbers, account numbers, addresses, names, etc.) and stores them as redaction spans alongside the transcript.
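As a rough illustration of the scan step, the sketch below uses two simple regular expressions as stand-ins for a detector. Roark's actual detection method is not described on this page, so the patterns, the `RedactionSpan` class, its `start`/`end` offsets, and `scan_segments` are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; a production detector
# would need to handle many more formats than these two.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone_number": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


@dataclass(frozen=True)
class RedactionSpan:
    segment_index: int  # which transcript segment the match is in
    entity_type: str    # key from PATTERNS
    start: int          # character offsets (an assumption of this sketch)
    end: int


def scan_segments(segments, enabled):
    """Return one span per match, for enabled entity types only."""
    spans = []
    for index, text in enumerate(segments):
        for entity_type in enabled:
            for match in PATTERNS[entity_type].finditer(text):
                spans.append(
                    RedactionSpan(index, entity_type, match.start(), match.end())
                )
    return spans


segments = ["Hi, how can I help?", "My number is 555-867-5309."]
spans = scan_segments(segments, enabled=["phone_number", "ssn"])
```

The scan only records where PII was found; the transcript text itself is left untouched.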
Enabling Redaction
Supported Entity Types
| Entity | Default | Notes |
|---|---|---|
| Social security numbers | On | |
| Credit card number / CVV / expiration | On | Each is a separate toggle |
| Bank account numbers | On | |
| Bank routing numbers | On | |
| Phone numbers | On | |
| Email addresses | On | |
| Dates of birth | On | |
| PINs | On | |
| Passwords | On | |
| Names | Off | Higher false-positive rate — opt-in |
| Addresses | Off | Higher false-positive rate — opt-in |
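The defaults in the table can be thought of as a set of per-entity toggles with project-level overrides layered on top. The sketch below models that idea. The key names and the `resolve_policy` helper are hypothetical and are not Roark's configuration format.

```python
# Defaults mirror the table above; key names are invented for this sketch.
DEFAULTS = {
    "ssn": True,
    "credit_card_number": True,
    "credit_card_cvv": True,
    "credit_card_expiration": True,
    "bank_account_number": True,
    "bank_routing_number": True,
    "phone_number": True,
    "email_address": True,
    "date_of_birth": True,
    "pin": True,
    "password": True,
    "name": False,     # opt-in: higher false-positive rate
    "address": False,  # opt-in: higher false-positive rate
}


def resolve_policy(overrides=None):
    """Merge project overrides onto the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown entity types: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Example: a project that opts in to name redaction.
policy = resolve_policy({"name": True})
enabled = sorted(entity for entity, on in policy.items() if on)
```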
Evaluators and Redaction
Evaluator LLM calls (custom prompts, sentiment, toxicity, emotion, politeness) automatically receive the redacted transcript when redaction is enabled. This keeps PII from leaving your tenant for analysis. If your evaluators rely on the original PII to do their job, and you accept the trade-off, you can override this with the “Allow evaluators to see original PII” toggle in the same settings panel. We recommend leaving it off unless you have a specific reason.
What’s Stored
- The original transcript is preserved unchanged — storage is non-destructive.
- Redaction spans (entity type, segment, confidence, source) are stored as metadata so masking is consistent across every read path and reproducible if you change the policy later.
- Audio recordings are stored as-is — audio-level redaction (beep/silence at PII timestamps) is on the roadmap.
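To make the non-destructive model concrete, here is a minimal sketch in which spans are kept as metadata and the set of masked segments is derived from them at read time. The field names follow the list above (entity type, segment, confidence, source), but the `SpanRecord` class, the `source` labels, and `segments_to_mask` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanRecord:
    entity_type: str   # e.g. "phone_number"
    segment: int       # index of the transcript segment containing the PII
    confidence: float  # detector confidence
    source: str        # which detector produced the span (labels are hypothetical)


def segments_to_mask(spans, enabled_types):
    """Derive the masked segment indices from stored spans and the current policy."""
    return {span.segment for span in spans if span.entity_type in enabled_types}


# The original transcript is stored as-is and never rewritten.
transcript = ("Hello, this is Dana.", "My number is 555-867-5309.")
spans = [
    SpanRecord("name", 0, 0.71, "model"),
    SpanRecord("phone_number", 1, 0.98, "pattern"),
]

before = segments_to_mask(spans, {"phone_number"})         # names not enabled
after = segments_to_mask(spans, {"phone_number", "name"})  # policy changed later
```

Because masking is computed from the spans rather than baked into the text, changing the policy changes what is masked without touching the stored transcript.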
Limitations Today
- Masking is applied at the segment level: when any PII is detected within a segment, the whole segment is masked. A turn containing one phone number therefore renders as a single redaction pill, not inline word-level redaction. Word-precise inline masking is on the roadmap.
- Audio recordings are not yet redacted. If you need redacted audio (for sharing recordings outside your team), reach out — it’s planned and we’re prioritizing based on demand.
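The difference between segment-level masking and word-precise masking can be sketched as follows. The `[REDACTED]` label, both function names, and the character offsets are hypothetical; the word-level variant is shown only for contrast and does not describe how Roark will implement it.

```python
PILL = "[REDACTED]"  # placeholder label chosen for this sketch


def render_segment_level(segments, flagged):
    """Replace every flagged segment with a single pill."""
    return [PILL if i in flagged else text for i, text in enumerate(segments)]


def render_word_level(segments, spans):
    """Replace only the matched characters; spans are (segment, start, end)."""
    out = list(segments)
    # Apply spans right to left so earlier offsets in a segment stay valid.
    for seg, start, end in sorted(spans, key=lambda s: (s[0], s[1]), reverse=True):
        out[seg] = out[seg][:start] + PILL + out[seg][end:]
    return out


segments = ["Sure, one moment.", "My number is 555-867-5309."]
segment_view = render_segment_level(segments, flagged={1})
word_view = render_word_level(segments, [(1, 13, 25)])
```

With segment-level masking the whole second turn collapses to one pill, while the word-level variant keeps the surrounding words readable.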
Compliance and Boundaries
When redaction is enabled:
- The frontend renders masked tokens in transcripts and exports.
- The GraphQL API returns masked text by default for every segment-returning query (calls and chats).
- LLM evaluators receive masked text unless you explicitly opt out per project.
- OpenSearch event indexing is unaffected (event properties are structured, not transcript-derived).
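The default behavior of the read paths listed above can be summarized as one decision function. This is a simplified model under stated assumptions: the consumer labels, the `text_for` function, and its parameters are invented for the example and are not part of Roark's API.

```python
PILL = "[REDACTED]"  # placeholder label chosen for this sketch
CONSUMERS = {"frontend", "api", "evaluator"}


def text_for(consumer, segments, flagged, redaction_enabled,
             allow_evaluator_pii=False):
    """Return the segments a given consumer would see by default."""
    if consumer not in CONSUMERS:
        raise ValueError(f"unknown consumer: {consumer}")
    if not redaction_enabled:
        return list(segments)
    # Only evaluators can be opted out, and only per project.
    if consumer == "evaluator" and allow_evaluator_pii:
        return list(segments)
    return [PILL if i in flagged else text for i, text in enumerate(segments)]


segments = ["Sure, one moment.", "My number is 555-867-5309."]
masked = text_for("api", segments, {1}, redaction_enabled=True)
evaluator_default = text_for("evaluator", segments, {1}, redaction_enabled=True)
evaluator_opt_in = text_for("evaluator", segments, {1}, redaction_enabled=True,
                            allow_evaluator_pii=True)
```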

