Security Overview
An Overview of Data Security in AI & LLM Operations
This document outlines the principles of data redaction and security, providing a guide to balancing deep operational insight with the critical need to protect sensitive information.
The Shared Responsibility Model for AI Security
To ensure the highest standards of security and compliance, we operate under a Shared Responsibility Model. This framework clearly delineates the security obligations between our platform and you, the customer.
- Our Responsibility (Security OF the Platform): We are responsible for protecting the infrastructure that runs our services. This includes securing the log ingestion endpoints, ensuring robust authentication, encrypting all data in transit (TLS 1.2+) and at rest (AES-256), and securing the physical data centers where the service operates.
- Your Responsibility (Security IN the Platform): You are responsible for the security and compliance of the data you send to and use in the platform. This includes securely managing API keys, implementing proper access controls within your organization, identifying the appropriate level of redaction, if any, and implementing the redaction process.
This model is built on a Zero-Trust Data Ingestion principle. We assume any data may be sensitive. The most secure architectural pattern is one where you process and sanitize data within your own trusted environment before it ever leaves your security perimeter.
Balancing Insight and Privacy: The Redaction Trade-Off
Data valuable for making an LLM application performant and safe—the text of prompts and responses—is also the data that makes the system most vulnerable. This creates a fundamental conflict between observability and data protection.
To navigate this, we use the principle of Customer-Controlled Logging. You, the customer, determine and implement the appropriate level of data sanitization within your own environment. The "level" of redaction is not an API setting on our platform but a description of the architectural pattern you adopt.
Five Levels of Data Redaction
Feature Compatibility Matrix for request.prompt and response.completion
Netra collects data from multiple sources, including logs and associated metadata. The most sensitive data is usually found in the request.prompt and response.completion fields, so we have created a feature compatibility matrix for these fields. All features are Fully Functional when no redaction is used.
| Feature | Level 1: PII Redaction | Level 2: Entity & Keyword Redaction | Level 3: Full Content Redaction |
|---|---|---|---|
| Core Analysis & Recommendations | Fully Functional | Fully Functional | Functional |
| Prompt & Completion Analysis | Fully Functional | Functional | Non-Functional |
| Anomaly Detection | Fully Functional | Functional | Functional |
| Actionable AI Optimization | Fully Functional | Fully Functional | Functional |
| Elastic Control | Fully Functional | Fully Functional | Functional |
For each combination of feature and redaction level, we indicate whether the feature family is "Fully Functional," "Functional" (the feature works, but its analysis may be less accurate or comprehensive), or "Non-Functional."
The Case for Level 1 (PII Redaction): Maximizing Insight While Ensuring Privacy
Choosing the right data redaction strategy is a critical decision that directly impacts both your security posture and the return on investment you get from our platform. While we provide a spectrum of five redaction levels to fit any need, we recommend Level 1: PII Redaction for most production applications. For customers in a pilot phase, we also offer an option for contractually guaranteed data deletion without requiring an immediate redaction pipeline.
This approach represents the optimal balance, allowing you to leverage the full analytical power of the platform without compromising on the foundational principle of protecting user privacy.
You may also already be performing PII redaction in your own environment, so this process may be familiar to you.
1. Preserve Full Analytical Functionality
Useful insights for optimizing cost, latency, and quality come from understanding the semantic structure and intent of your prompts and responses. More aggressive redaction levels (Level 2 and above) limit this crucial context.
As shown in our Feature Compatibility Matrix, Level 1 is the only redaction level where every platform feature remains fully functional.
- Semantic Analysis: Features like Prompt Verbosity Analysis, Semantic Cache ROI Analysis, and Data-Driven Model Selection require the linguistic content of the prompt to function. Level 1 redaction preserves the core message while removing the PII, allowing these features to work effectively.
- Debugging and Quality Control: When a model produces a poor response, you need the surrounding context to understand why. With Level 4 (Value Redaction), the interaction becomes a "black box," making it nearly impossible to debug semantic errors or trace the root cause of hallucinations. Level 1 gives you the context you need without exposing sensitive data.
2. Adhere to a Strong, Defensible Security Posture
Recommending Level 1 is not a recommendation to be lax with security. On the contrary, it aligns perfectly with our Shared Responsibility Model and the principle of Zero-Trust Data Ingestion.
- Surgical, Not Total, Removal: Level 1 redaction is not about sending raw data. It is a surgical process where you identify and remove specific, high-risk PII (names, emails, addresses, credit card numbers, etc.) within your own environment.
- You Control the Sanitization: The most sensitive data never leaves your security perimeter. You are simply replacing specific entities with generic placeholders, which is a robust and widely accepted method for data anonymization.
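Replacing entities with generic placeholders can also be done consistently, so that repeated entities map to the same placeholder and the entity-to-placeholder map never leaves your environment. The sketch below is illustrative only: the toy name pattern and placeholder format are assumptions, not a platform requirement.

```python
import re

# Toy name detector for illustration -- real pipelines would use an
# entity-recognition library instead of a fixed regex.
NAME_PATTERN = re.compile(r"\b(?:Alice|Bob) [A-Z][a-z]+\b")

def pseudonymize(text: str, mapping: dict) -> str:
    """Replace each entity with a stable placeholder; the mapping stays
    inside your perimeter so you can correlate records locally."""
    def repl(match):
        entity = match.group(0)
        if entity not in mapping:
            mapping[entity] = f"<PERSON_{len(mapping) + 1}>"
        return mapping[entity]
    return NAME_PATTERN.sub(repl, text)

mapping = {}
out = pseudonymize("Alice Smith emailed Bob Jones. Alice Smith replied.", mapping)
print(out)  # prints "<PERSON_1> emailed <PERSON_2>. <PERSON_1> replied."
```

Because the same entity always yields the same placeholder, downstream analysis can still see that two log lines involve the same (anonymized) party.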
3. The Pragmatic Sweet Spot
Level 1 redaction is the pragmatic "sweet spot" on the privacy-insight spectrum.
- It avoids the "All or Nothing" trap: You don't have to choose between losing most analytical value (Level 3 and above) and taking on unnecessary risk.
- It maximizes your ROI: By enabling all platform features, Level 1 ensures you can identify the maximum number of opportunities for cost savings and performance improvements.
For these reasons, we strongly recommend implementing a Level 1: PII Redaction strategy. It is the most effective way to build a secure, verifiable, and auditable logging pipeline that delivers the full value of the platform.
Implementing and Verifying Sanitization
There are two primary methods for implementing this sanitization process, each offering different trade-offs between ease of implementation and accuracy:
- SQL-Based PII Redaction (Materialized View): This approach uses ClickHouse's native capabilities to create a materialized view that automatically filters, cleans, and copies relevant logs from your source table into a new, secure table. It is efficient and keeps the entire process within your database.
- For a complete guide, see: Creating a Cleaned LLM Log Table In ClickHouse
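To give a rough sense of the materialized-view shape, a sketch might look like the following. The table names, columns, and regular expression here are assumptions for illustration, not the schema from the linked guide; `replaceRegexpAll` is ClickHouse's built-in regex-replacement function and uses RE2 pattern syntax.

```sql
-- Illustrative sketch only: table and column names are assumptions.
CREATE MATERIALIZED VIEW llm_logs_clean_mv
TO llm_logs_clean
AS
SELECT
    timestamp,
    model,
    -- Replace every email-like substring with a generic placeholder.
    replaceRegexpAll(prompt, '[\\w.+-]+@[\\w-]+\\.[\\w.]+', '<EMAIL>') AS prompt,
    replaceRegexpAll(completion, '[\\w.+-]+@[\\w-]+\\.[\\w.]+', '<EMAIL>') AS completion
FROM llm_logs;
```

Rows inserted into the source table are cleaned and copied into the secure destination table automatically, so the raw fields never need to leave your database.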
- High-Accuracy Python PII Redaction Script: For more complex requirements, a dedicated Python script leveraging powerful open-source libraries like Microsoft's Presidio offers higher accuracy for PII detection. This script runs as a continuous process, fetching new data, cleaning it, and inserting it into a secure destination table.
- For a complete guide, see: Customer Guide: Python PII Redaction Script
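The fetch-clean-insert loop described above can be sketched with pluggable callables. This is a stdlib-only skeleton, not the script from the linked guide: in a real pipeline, `fetch_batch` and `insert_clean` would talk to ClickHouse, and `redact` would call a PII library such as Presidio.

```python
import time

def run_redaction_loop(fetch_batch, redact, insert_clean,
                       poll_seconds=30, max_cycles=None):
    """Continuously fetch new rows, clean the sensitive fields, and
    insert them into the secure destination table."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        batch = fetch_batch()
        if batch:
            cleaned = [
                {**row,
                 "prompt": redact(row["prompt"]),
                 "completion": redact(row["completion"])}
                for row in batch
            ]
            insert_clean(cleaned)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(poll_seconds)

# In-memory stand-ins so the sketch is runnable end to end.
source = [{"prompt": "Email bob@example.com", "completion": "Done"}]
dest = []
run_redaction_loop(
    fetch_batch=lambda: [source.pop()] if source else [],
    redact=lambda text: text.replace("bob@example.com", "<EMAIL>"),
    insert_clean=dest.extend,
    poll_seconds=0,
    max_cycles=2,
)
print(dest)  # prints [{'prompt': 'Email <EMAIL>', 'completion': 'Done'}]
```

Running the loop as a continuous process (no `max_cycles`) turns this into the long-lived worker the guide describes.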
Both methods empower you to create a secure, verifiable, and auditable logging pipeline that aligns with your organization's security and compliance requirements.
Deletion of Data Mistakes
We understand that even with robust processes, mistakes can happen. A bug in your redaction pipeline could lead to sensitive, unredacted data being sent to our platform.
Our Shared Responsibility Model extends to providing a clear process for handling these data spillage incidents.
If you discover that unredacted data has been sent to our systems, please notify us immediately.
We have a defined data spillage process that includes:
- Acknowledgement: We acknowledge your report and initiate our internal incident response.
- Secure Purging: A process for the secure and verifiable purging of the specified sensitive data from all of our systems, including our managed ClickHouse instance and any backups.
- Compliance Support: Working with you to address any contractual or regulatory obligations that may arise from the incident.
This process underscores the importance of implementing local verification and auditing tools, as they provide the necessary records to identify and manage such incidents effectively.
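A local verification tool can be as simple as a periodic scan of the cleaned table for PII-like content that survived redaction. The sketch below is illustrative (the email regex and field names are assumptions); its output gives you the local record needed to report and scope a spillage incident.

```python
import re

# Simple example detector; a production audit would check every PII
# category your redaction pipeline is supposed to remove.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def audit_batch(rows, fields=("prompt", "completion")):
    """Return (row_index, field) pairs where PII-like content survived
    redaction, for use in local audit logs and spillage reports."""
    findings = []
    for i, row in enumerate(rows):
        for field in fields:
            if EMAIL.search(row.get(field, "")):
                findings.append((i, field))
    return findings

rows = [
    {"prompt": "Summarize for <EMAIL>", "completion": "OK"},
    {"prompt": "Reach me at leak@example.com", "completion": "Sure"},
]
print(audit_batch(rows))  # prints [(1, 'prompt')]
```

An empty result on every batch is the evidence you want; any finding tells you exactly which rows and fields to include in a spillage report.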
Optional: Contractually Guaranteed Deletion for Pilots
For customers in a pilot phase who wish to evaluate the platform without implementing a redaction pipeline, we offer an alternative.
We can provide a contractual guarantee to securely purge all submitted data from our systems at the conclusion of the pilot period.
This allows your team to send raw, full-fidelity data to experience the maximum analytical value, with the assurance that the data will be verifiably deleted, satisfying security and compliance requirements for short-term evaluations.