Data Redaction

Five Levels of Data Redaction

The five levels below progress from the least to the most severe, and each includes an explanation of the method used. Some Netra Apex functions work with all levels of redaction, while others are compatible with only one specific level.

Original Data

This is the baseline data with no redaction applied. It serves as our starting point.

{
  "role": "user",
  "content": "What is the capital of France?"
}

Level 1: PII Redaction - Recommended

This level focuses on surgically removing specific Personally Identifiable Information (PII) while leaving the overall structure and intent of the message intact. This preserves the most analytical value without exposing user-specific or confidential information.

Method: Use a PII detection tool (such as Microsoft's Presidio or a custom regular expression) to automatically identify sensitive entities like names, email addresses, phone numbers, and credit card numbers. Each identified entity is then replaced with a generic placeholder or a tag indicating the type of redacted information.

Original:

{
  "role": "user",
  "content": "Hi, my name is Jane Doe and my email is [email protected]. Can you help me with my order?"
}

Result:

{
  "role": "user",
  "content": "Hi, my name is [PERSON] and my email is [EMAIL_ADDRESS]. Can you help me with my order?"
}
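
The regular-expression variant of this method can be sketched in Python as follows. This is a minimal illustration, not the Netra Apex implementation: the function name redact_pii and the two patterns are assumptions made for the example, and they cover only email addresses and phone numbers. Detecting names such as "Jane Doe" generally requires a named-entity recognizer (for example, a tool like Presidio) rather than a regular expression.

```python
import re

# Hypothetical patterns for illustration only; production PII detection
# needs far broader coverage (names, credit cards, addresses, and so on).
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(message: dict) -> dict:
    """Return a copy of the message with matched PII replaced by type tags."""
    redacted = dict(message)  # shallow copy so the caller's object is untouched
    content = redacted.get("content", "")
    for tag, pattern in PII_PATTERNS.items():
        content = pattern.sub(f"[{tag}]", content)
    redacted["content"] = content
    return redacted


message = {
    "role": "user",
    "content": "Email me at jane@example.com or call 555-123-4567.",
}
print(redact_pii(message)["content"])
```

Emails are replaced before phone numbers so that digits inside an email address are not partially matched by the phone pattern.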

Level 2: Entity & Keyword Redaction - Optional

This level removes more context by redacting key nouns, verbs, or entire phrases that reveal the specific topic of the query. The general sentence structure is often preserved.

Method: Use pattern matching or keyword detection to identify and remove the core subject of the question. In the Original Data example above ("What is the capital of France?"), "capital" and "France" are the key terms that define the query's specific nature.

Result:

{
  "role": "user",
  "content": "What is the [REDACTED] of [REDACTED]?"
}
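
A keyword-based version of this method might look like the sketch below. The function name redact_keywords and the idea of passing an explicit keyword list are assumptions for illustration; how the key terms are chosen (a fixed list, a topic model, or something else) is not specified here.

```python
import re


def redact_keywords(message: dict, keywords: list[str]) -> dict:
    """Return a copy of the message with each keyword replaced by [REDACTED]."""
    if not keywords:
        return dict(message)  # nothing to redact
    # Longest keywords first so multi-word phrases win over their substrings.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    redacted = dict(message)
    redacted["content"] = pattern.sub("[REDACTED]", message.get("content", ""))
    return redacted


original = {"role": "user", "content": "What is the capital of France?"}
print(redact_keywords(original, ["capital", "France"]))
```

The word-boundary anchors keep the pattern from redacting fragments of longer words, for example "capital" inside "capitalism".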

Level 3: Full Content Redaction - Optional for extreme security cases

This is a very common approach in which the entire content of a specific field is removed while the associated metadata is preserved. It shows that an interaction took place but completely hides what was said.

Method: The entire value associated with the "content" key is replaced with a single redaction marker. The metadata ("role": "user") is kept for structural or statistical analysis.

Result:

{
  "role": "user",
  "content": "[REDACTED]"
}
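
As a rough sketch, full content redaction amounts to overwriting a single field. The function name redact_content and its parameters are hypothetical, chosen only for this example.

```python
def redact_content(message: dict, field: str = "content",
                   marker: str = "[REDACTED]") -> dict:
    """Return a copy with the given field replaced by a marker; metadata is kept."""
    redacted = dict(message)
    if field in redacted:
        redacted[field] = marker
    return redacted


print(redact_content({"role": "user", "content": "What is the capital of France?"}))
```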

Level 4: Value Redaction - Not Recommended

This is a more severe form of redaction that removes all values from the object, leaving only the keys. This preserves the data's structure (or schema) but removes all specific information, including metadata.

Method: All values for all keys in the JSON object are systematically replaced with redaction markers. This confirms a message with this specific structure was sent, but reveals nothing more.

Result:

{
  "role": "[REDACTED]",
  "content": "[REDACTED]"
}
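
One way to sketch value redaction is a small recursive function that keeps keys and replaces every leaf value. The name redact_values is hypothetical, and the handling of nested objects and lists is an assumption; the example above shows only a flat object.

```python
def redact_values(obj, marker: str = "[REDACTED]"):
    """Replace every leaf value with a marker while preserving keys and nesting."""
    if isinstance(obj, dict):
        return {key: redact_values(value, marker) for key, value in obj.items()}
    if isinstance(obj, list):
        return [redact_values(item, marker) for item in obj]
    return marker  # any scalar (string, number, bool, None) becomes the marker


print(redact_values({"role": "user", "content": "What is the capital of France?"}))
```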

Level 5: Total Object Redaction - Not Recommended

This is the most extreme level of redaction: the entire object is either removed completely or replaced by a single marker. Removing it completely eliminates any trace that this specific data object ever existed, while a marker records only that an object was removed.

Method: The entire JSON object is replaced with a null value, an empty object, or a single string indicating that an object was removed. This is often used when the mere existence of the data itself is sensitive or when it must be completely purged for compliance reasons.

Result:

[REDACTED_OBJECT]
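
Applied to a list of messages, total object redaction could be sketched as below. The function name redact_objects, the is_sensitive predicate, and the drop flag are all assumptions made for illustration; they simply model the two options described above (replace with a marker, or remove entirely).

```python
from typing import Any, Callable


def redact_objects(messages: list, is_sensitive: Callable[[Any], bool],
                   marker: Any = "[REDACTED_OBJECT]", drop: bool = False) -> list:
    """Replace sensitive objects with a marker, or remove them when drop=True."""
    result = []
    for message in messages:
        if not is_sensitive(message):
            result.append(message)
        elif not drop:
            result.append(marker)  # marker may also be None or {} if preferred
        # when drop=True the object is omitted, leaving no trace
    return result


conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]
print(redact_objects(conversation, lambda m: m["role"] == "user"))
```

Passing drop=True suits cases where the mere existence of the object is sensitive, since the output then gives no indication that anything was removed.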