Data Masking for AI

Before any data is sent to a large language model, all personally identifiable information is masked. Names, email addresses, and other identifiers are replaced with anonymised placeholders. The original data is restored only within our own secure infrastructure after processing, so the AI itself never sees a real name or email.

How the masking pipeline works

1. Detection

Incoming text is scanned for personal data: names, emails, phone numbers, and other identifiers.

2. Masking

Each identifier is replaced with an anonymised placeholder; for example, names become tokens such as [PERSON_1].

3. LLM call

Only the masked text is sent to the language model, which is hosted on Azure OpenAI in an EU region.

4. Restoration

Inside our own secure infrastructure, the placeholders in the model output are replaced with the original identifiers, using the mapping recorded during masking. The restored output is then returned to you.
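
The four steps above can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not the actual implementation: the function names (mask, restore, process), the two regular expressions, and the known_names list standing in for real name detection are all invented for the example, and llm_call is a hypothetical callable representing the request to the model.

```python
import re
from typing import Callable, Dict, List, Tuple

# Simplified patterns for illustration only; real detection would be broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{6,}\d")


def mask(text: str, known_names: List[str]) -> Tuple[str, Dict[str, str]]:
    """Steps 1 and 2: detect identifiers and replace them with placeholders.

    Returns the masked text and a placeholder -> original mapping.
    known_names is an assumption standing in for proper name detection.
    """
    mapping: Dict[str, str] = {}
    counters: Dict[str, int] = {}

    def placeholder_for(kind: str, value: str) -> str:
        # Reuse the same placeholder when a value appears more than once.
        for token, original in mapping.items():
            if original == value:
                return token
        counters[kind] = counters.get(kind, 0) + 1
        token = f"[{kind}_{counters[kind]}]"
        mapping[token] = value
        return token

    # Emails first, so a name embedded in an address is not masked separately.
    masked = EMAIL_RE.sub(lambda m: placeholder_for("EMAIL", m.group()), text)
    masked = PHONE_RE.sub(lambda m: placeholder_for("PHONE", m.group()), masked)
    for name in known_names:
        masked = re.sub(
            re.escape(name),
            lambda m: placeholder_for("PERSON", m.group()),
            masked,
        )
    return masked, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Step 4: put the original identifiers back into the model output."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def process(text: str, known_names: List[str],
            llm_call: Callable[[str], str]) -> str:
    """Run the whole pipeline; only the masked text reaches llm_call."""
    masked, mapping = mask(text, known_names)
    output = llm_call(masked)          # Step 3: the model sees placeholders only
    return restore(output, mapping)    # The mapping never leaves this process


if __name__ == "__main__":
    # A stand-in for the model: it simply echoes the prompt it was given.
    def fake_llm(prompt: str) -> str:
        return "Summary: " + prompt

    source = "Ada Lovelace <ada@example.com> called about a refund."
    print(mask(source, ["Ada Lovelace"])[0])
    print(process(source, ["Ada Lovelace"], fake_llm))
```

In this sketch the mapping is held only in local memory and is never passed to llm_call, which mirrors the point that restoration happens after the model call rather than inside it.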

What this means in practice

🔐
No personal data in LLM processing

The LLM never sees real names, emails, or other identifiers.

🇪🇺
EU-region Azure OpenAI only

AI processing happens inside the EU. Microsoft contractually confirms that submitted data is not used to train OpenAI models.

🧠
Bring Your Own LLM

You can optionally configure Sally to use your organisation's own language models, so that no data leaves your infrastructure for AI processing.


See also: Hosting & Subprocessors for where AI processing happens, and the TOMs PDF for the technical control reference.