    PII & PHI Redaction: Protect Sensitive Data in Chat and PDF Documents

    AI·Collab automatically removes names, email addresses, phone numbers, IBANs, and medical identifiers — before your data ever reaches an AI model.

    Your Data. Your Rules.

    Every day, professionals share sensitive information with AI tools without thinking twice — employee names in a support ticket, patient data in a medical summary, client IBANs in a financial report. AI·Collab PII Guard stops that. It automatically detects and redacts Personally Identifiable Information (PII) and Protected Health Information (PHI) before your content reaches any AI model — for both live chat and uploaded PDF documents. Powered by Microsoft Presidio on European infrastructure, with Zero Data Retention.

    Watch: PII Redaction in Chat & PDF

    See PII Guard in action: chat with and without the toggle enabled, then upload a PDF with sensitive data and watch the redaction in real time.

    What is PII Guard?

    PII Guard is a privacy layer built directly into AI·Collab. It sits between you and the AI model — intercepting your input, running entity recognition on the text, and replacing sensitive values with neutral placeholders before the request is forwarded. The AI model never sees the raw data. Redacted values are never stored. You get the same quality of AI response — without the privacy risk.

    Why It Matters

    Built for teams where privacy is non-negotiable.

    Chat PII Guard

    Enable a per-model toggle in Account Settings. Every message you send is redacted by Presidio before it reaches the AI.

    PDF PII Redaction

    Upload a PDF to your knowledge base with PII filtering enabled. Marker (self-hosted OCR) extracts the text on-premise, Presidio redacts PII — only clean text enters the knowledge base. No raw document content reaches any external service.

    Named Entity Recognition

    Presidio uses spaCy NLP models to detect names, locations, organisations, and more — not just regex patterns.

    Transparent Billing

    PII-protected requests are marked separately in your usage dashboard. Chat: +30% credit uplift. PDF: +2 credits per page.

    GDPR & ZDRP Compliant

    Presidio runs on our European servers. PII analysis never leaves the EU. Covered by our Zero Data Retention Policy.

    Per-Model Control

    Enable PII Guard only for the models where you need it. No blanket settings — granular, per-model control per user.

    Layer 1: Chat PII Guard

    Redact sensitive data from every prompt before it reaches the model.

    Enable the PII Guard toggle in your Account Settings for any model in your library. Once active, every message you send is processed by Presidio before being forwarded to the AI. Names, email addresses, phone numbers, IBANs, and other identifiers are replaced with [REDACTED] placeholders. The model responds to the cleaned prompt — and you get a full, useful answer without the privacy risk.

    How Chat PII Guard works:

    1. You type a message
    2. Presidio detects PII entities
    3. PII replaced with [REDACTED]
    4. Clean prompt sent to AI model
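
    The four steps above can be sketched as a small function with a per-model toggle. This is an illustration only, not AI·Collab's implementation: `send_message`, `redact_emails`, the settings dictionary, and the model names are hypothetical, and a single email regex stands in for Presidio so the example runs with the standard library alone.

```python
import re
from typing import Callable, Dict

# Stand-in detector (assumption): one email regex instead of Presidio's recognizers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text: str) -> str:
    """Replace every email address with the [REDACTED] placeholder."""
    return EMAIL_RE.sub("[REDACTED]", text)


def send_message(
    message: str,
    model: str,
    pii_guard: Dict[str, bool],          # hypothetical per-model toggle state
    redact: Callable[[str], str],        # steps 2 and 3: detect and replace
    forward: Callable[[str, str], str],  # step 4: send the prompt to the model
) -> str:
    """Redact the message only when the toggle for this model is enabled."""
    enabled = pii_guard.get(model, False)
    prompt = redact(message) if enabled else message
    return forward(model, prompt)


if __name__ == "__main__":
    def echo(model: str, prompt: str) -> str:
        # Hypothetical forwarder that just shows what the model would receive.
        return f"{model} received: {prompt}"

    settings = {"model-a": True}  # PII Guard on for model-a only
    print(send_message("Contact jane.doe@example.com today", "model-a",
                       settings, redact_emails, echo))
```

    Passing the redactor and the forwarder in as callables keeps the toggle logic independent of the detection engine, so the regex stand-in could be swapped for a Presidio-backed function without touching `send_message`.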

    Layer 2: PDF PII Redaction

    Clean text-only ingestion — PII never enters your knowledge base.

    When you upload a PDF to a knowledge base with PII filtering enabled, AI·Collab intercepts the file before it reaches OpenWebUI. Instead of sending the document to an external OCR API, AI·Collab uses Marker, a self-hosted, GPU-accelerated OCR engine running on our own infrastructure. The raw document text never leaves our network. Presidio then scans the extracted text and removes all detected PII. Only the clean, redacted text is forwarded for chunking and embedding into your knowledge base.

    This is the key difference from the standard upload path. In the normal flow, Mistral OCR processes the document (EU-hosted, but external). In the PII-protected path, Marker handles OCR entirely on-premise, so no raw document content ever reaches an external service.

    How PDF PII Redaction works:

    1. Upload PDF to knowledge base (PII filter enabled)
    2. Middleware intercepts the file
    3. Marker (self-hosted GPU OCR) extracts text; no external API call
    4. Presidio scans and redacts all PII from extracted text
    5. Clean text forwarded to knowledge base; original PDF discarded
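
    The ordering of these steps matters: redaction runs before chunking, so no chunk can contain raw PII. A minimal sketch of that ordering follows, assuming hypothetical names throughout (`ingest_pdf`, `IngestResult`, the `ocr` and `redact` callables, and the fixed-size chunking are illustrative and do not reflect Marker's or AI·Collab's actual interfaces).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IngestResult:
    pages: int         # page count, e.g. for per-page billing
    chunks: List[str]  # redacted text chunks ready for embedding


def ingest_pdf(
    pdf_bytes: bytes,
    ocr: Callable[[bytes], List[str]],  # stand-in for the self-hosted OCR step
    redact: Callable[[str], str],       # stand-in for the Presidio step
    chunk_size: int = 500,
) -> IngestResult:
    """Extract text per page, redact it, then chunk only the redacted text."""
    page_texts = ocr(pdf_bytes)                        # step 3: local OCR
    clean_pages = [redact(text) for text in page_texts]  # step 4: redact
    clean_text = "\n".join(clean_pages)
    chunks = [clean_text[i:i + chunk_size]
              for i in range(0, len(clean_text), chunk_size)]
    # Step 5: only redacted chunks are returned; pdf_bytes is not stored here.
    return IngestResult(pages=len(page_texts), chunks=chunks)


if __name__ == "__main__":
    def fake_ocr(data: bytes) -> List[str]:
        # Hypothetical OCR output for a two-page document.
        return ["Patient: Jane Roe", "Diagnosis: stable"]

    def fake_redact(text: str) -> str:
        return text.replace("Jane Roe", "[REDACTED]")

    result = ingest_pdf(b"%PDF-stub", fake_ocr, fake_redact, chunk_size=20)
    print(result.pages, result.chunks)
```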

    What PII is Detected

    Presidio recognises a broad range of entity types in German and English text.

    Full names
    Email addresses
    Phone numbers
    IBAN / credit card numbers
    Dates of birth
    Passport / ID numbers
    IP addresses
    Medical record numbers
    Locations & addresses
    Organisation names
    Social security numbers
    URLs
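
    For readers who want to see what detection and replacement look like in code, here is a minimal sketch. `replace_spans` is a hypothetical, dependency-free helper that swaps detected character ranges for `[REDACTED]`. `detect_spans` uses Presidio's `AnalyzerEngine` and needs the `presidio-analyzer` package plus a spaCy model installed. It uses Presidio's default English setup, so German text would need additional configuration, and nothing here is claimed to match AI·Collab's production settings.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def replace_spans(text: str, spans: Iterable[Span],
                  placeholder: str = "[REDACTED]") -> str:
    """Replace each span with the placeholder, merging overlapping or touching spans."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    parts: List[str] = []
    cursor = 0
    for start, end in merged:
        parts.append(text[cursor:start])  # untouched text before the span
        parts.append(placeholder)
        cursor = end
    parts.append(text[cursor:])           # untouched tail
    return "".join(parts)


def detect_spans(text: str, language: str = "en") -> List[Span]:
    """Detect PII with Presidio; requires presidio-analyzer and a spaCy model."""
    from presidio_analyzer import AnalyzerEngine  # lazy import: optional dependency

    results = AnalyzerEngine().analyze(text=text, language=language)
    return [(r.start, r.end) for r in results]


if __name__ == "__main__":
    sample = "Mail jane@example.com or call 555-0100"
    # Hand-written spans so the demo runs without Presidio installed.
    email_start = sample.index("jane@example.com")
    phone_start = sample.index("555-0100")
    spans = [(email_start, email_start + len("jane@example.com")),
             (phone_start, phone_start + len("555-0100"))]
    print(replace_spans(sample, spans))
```

    Presidio also ships its own anonymizer package for the replacement step; the helper above only makes the span arithmetic visible.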

    Credits & Cost

    PII Guard has a small credit overhead to cover the Presidio NLP processing:

    • Chat PII Guard: a 30% credit uplift applies to each protected request (e.g. a 10-credit response costs 13 credits with PII Guard enabled).
    • PDF PII Redaction: an additional 2 credits per page on top of the standard OCR cost.

    All PII-protected usage is clearly labelled in your account dashboard under Usage Statistics.
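
    The pricing rules reduce to two small formulas. The sketch below uses hypothetical function names and leaves results unrounded, because the article does not say how fractional credits are handled.

```python
from decimal import Decimal

CHAT_UPLIFT = Decimal("0.30")  # 30% uplift on protected chat requests
PDF_CREDITS_PER_PAGE = 2       # extra credits per page, on top of OCR cost


def chat_cost_with_pii_guard(base_credits) -> Decimal:
    """Credits charged for a chat request when PII Guard is enabled."""
    return Decimal(str(base_credits)) * (1 + CHAT_UPLIFT)


def pdf_pii_surcharge(pages: int) -> int:
    """Additional credits for PII redaction of a PDF with the given page count."""
    return PDF_CREDITS_PER_PAGE * pages


if __name__ == "__main__":
    print(chat_cost_with_pii_guard(10))  # the article's example: 10 credits become 13
    print(pdf_pii_surcharge(12))         # surcharge for a 12-page PDF
```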

    Security & Compliance

    Presidio runs on AI·Collab's own European infrastructure in Germany, not on a third-party cloud. PII analysis happens on-premise, inside the same network as the rest of the platform. No personal data leaves our servers unredacted, and redacted values are never stored or logged.

    All processing is covered by our Zero Data Retention Policy (ZDRP) and GDPR compliance framework. PII Guard is explicitly documented in our Data Processing Agreement (DPA / AVV under Art. 28 GDPR) as a technical and organisational measure. If your legal or compliance team requires a signed DPA, you can download it and request a countersignature at aicollab.app/dpa/.

    For organisations: admins can enforce PII Guard centrally so that all team members process sensitive documents with redaction enabled, with a full audit trail in centralised billing.
