Data Masking for AI
Before any data is sent to a large language model, all personally identifiable information is masked. Names, email addresses, and other identifiers are replaced with anonymised placeholders. The original data is restored only within our own secure infrastructure after processing, so the AI itself never sees a real name or email.
How the masking pipeline works
Incoming text is scanned for personal data: names, emails, phone numbers, and other identifiers.
Each identifier is replaced with an anonymised placeholder, e.g. names become tokens like [PERSON_1].
Only the masked text is sent to the language model, which runs on Azure OpenAI in an EU region.
Placeholders in the model's output are replaced with the original identifiers, using the mapping recorded during masking, inside our own secure infrastructure. The restored output is then returned to you.
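The four steps above can be sketched as a mask, call, restore round trip. This is a minimal illustration under stated assumptions, not Sally's actual implementation: the email and phone regexes are deliberately simple, names are matched from a caller-supplied list (`known_names`, a hypothetical parameter) because real name detection needs a named-entity recognizer, and `model` is a stand-in for the Azure OpenAI call.

```python
import re
from typing import Callable, Dict, Iterable, Tuple

# Deliberately simple illustrative patterns; real PII detection is far more thorough.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d ()-]{7,}\d")
PLACEHOLDER_RE = re.compile(r"\[[A-Z]+_\d+\]")


def mask(text: str, known_names: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Replace identifiers with typed placeholders; return masked text and mapping."""
    mapping: Dict[str, str] = {}  # placeholder -> original value
    seen: Dict[str, str] = {}     # original value -> placeholder, so repeats reuse a token
    counters: Dict[str, int] = {}

    def placeholder(kind: str, value: str) -> str:
        if value not in seen:
            counters[kind] = counters.get(kind, 0) + 1
            token = f"[{kind}_{counters[kind]}]"
            seen[value] = token
            mapping[token] = value
        return seen[value]

    # Emails and phone numbers first, so a name inside an address is not split.
    text = EMAIL_RE.sub(lambda m: placeholder("EMAIL", m.group(0)), text)
    text = PHONE_RE.sub(lambda m: placeholder("PHONE", m.group(0)), text)
    # Longest names first, so "Anna Smith" is masked before a bare "Anna".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = rf"\b{re.escape(name)}\b"
        text = re.sub(pattern, lambda m: placeholder("PERSON", m.group(0)), text)
    return text, mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Restore placeholders we issued; unknown placeholders are left unchanged."""
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def process(text: str, known_names: Iterable[str], model: Callable[[str], str]) -> str:
    """Mask, send only the masked text to the model, then restore its output."""
    masked, mapping = mask(text, known_names)
    output = model(masked)  # the mapping itself never reaches the model
    return unmask(output, mapping)


if __name__ == "__main__":
    seen_by_model = []

    def echo_model(prompt: str) -> str:
        # Stand-in for the LLM: records its input and echoes it back.
        seen_by_model.append(prompt)
        return "Summary: " + prompt

    original = "Anna Smith (anna.smith@example.com, +49 30 1234567) asked about an invoice."
    print(process(original, ["Anna Smith"], echo_model))
    print(seen_by_model[0])
```

Because the mapping stays inside `process`, only the masked string crosses the boundary to the model. A placeholder the model invents (for example `[PERSON_9]`) has no mapping entry and is returned unchanged rather than guessed.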
What this means in practice
The LLM never sees real names, emails, or other identifiers.
AI processing happens inside the EU. Microsoft contractually confirms that submitted data is not used to train OpenAI models.
Optionally, Sally can be configured to use your organisation's own language models, so no data leaves your infrastructure for AI processing.
See also: Hosting & Subprocessors for where AI processing happens, and the TOMs (technical and organisational measures) PDF for the technical control reference.