Compliance Redaction: What Works and What Doesn’t
This guide explains how to use compliance presets effectively and avoid common pitfalls that lead to data exposure.
Key insight: Fapilog redacts based on field names, not field content. Understanding this is critical for compliance.
The Golden Rule
Redaction works on structured data with predictable field names.
from fapilog import LoggerBuilder
logger = LoggerBuilder().with_redaction(preset="GDPR_PII").build()
# ✅ WORKS: Named fields are redacted
logger.info("User signup", email="john@example.com", phone="+1-555-1234")
# Output: {"data": {"email": "***", "phone": "***"}}
# ❌ FAILS: PII buried in a string is NOT redacted
logger.info(f"User signed up: john@example.com, phone: +1-555-1234")
# Output: {"message": "User signed up: john@example.com, phone: +1-555-1234"}
What Gets Redacted
Redaction matches field names (and paths) against preset definitions:
| Scenario | Redacted? | Why |
|---|---|---|
| | ✅ Yes | Field name |
| | ✅ Yes | Pattern |
| | ✅ Yes | Nested path |
| | ❌ No | Content scanning not performed |
| | ❌ No | Field name |
| | ❌ No | |
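To make the name-versus-content distinction concrete, here is a minimal sketch of name-based redaction in plain Python. It is not Fapilog's implementation: `SENSITIVE_NAMES`, `MASK`, and `redact_by_name` are hypothetical names chosen for illustration, and real presets define their own field lists and patterns.

```python
from typing import Any

# Hypothetical set of sensitive field names; real presets define their own lists.
SENSITIVE_NAMES = {"email", "phone", "ssn"}
MASK = "***"


def redact_by_name(value: Any, sensitive: set[str] = SENSITIVE_NAMES) -> Any:
    """Mask values whose *key* is sensitive; never inspect string content."""
    if isinstance(value, dict):
        return {
            key: MASK if key in sensitive else redact_by_name(item, sensitive)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_by_name(item, sensitive) for item in value]
    return value  # strings and scalars pass through untouched


event = {
    "message": "User signed up: john@example.com",
    "data": {"email": "john@example.com", "user": {"phone": "+1-555-1234"}},
}
redacted = redact_by_name(event)
```

The named `email` and nested `phone` values are masked, while the address embedded in `message` survives, because nothing in this approach ever looks inside a string.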
What Does NOT Get Redacted
1. PII in Message Strings
# ❌ BAD: PII in the message string
logger.info(f"Processing order for {user.email}")
# Output: {"message": "Processing order for john@example.com"}
# ✅ GOOD: PII in named fields
logger.info("Processing order", email=user.email)
# Output: {"message": "Processing order", "data": {"email": "***"}}
2. PII in Arbitrarily-Named Fields
# ❌ BAD: Field name doesn't match any preset pattern
logger.info("Support ticket", customer_contact="john@example.com")
# Output: {"data": {"customer_contact": "john@example.com"}}
# ✅ GOOD: Use recognized field names
logger.info("Support ticket", email="john@example.com")
# Output: {"data": {"email": "***"}}
# ✅ ALSO GOOD: Add custom fields to cover your domain
logger = (
    LoggerBuilder()
    .with_redaction(preset="GDPR_PII")
    .with_redaction(fields=["customer_contact"])
    .build()
)
3. PII in Serialized Objects
# ❌ BAD: Serialized JSON string
user_json = '{"email": "john@example.com", "ssn": "123-45-6789"}'
logger.info("User data", payload=user_json)
# Output: {"data": {"payload": "{\"email\": \"john@example.com\", ...}"}}
# ✅ GOOD: Pass as dict, not string
user_data = {"email": "john@example.com", "ssn": "123-45-6789"}
logger.info("User data", **user_data)
# Output: {"data": {"email": "***", "ssn": "***"}}
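When a payload arrives already serialized (for example from a queue or an HTTP body), one option is to parse it back into a dict before logging. The sketch below uses only the standard library; `as_log_fields` and the `unparsed_payload_type` fallback key are hypothetical names, and dropping unparseable payloads is one possible policy, not a Fapilog feature.

```python
import json
from typing import Any


def as_log_fields(payload: Any) -> dict[str, Any]:
    """Return a dict of named fields from a dict or a JSON-object string."""
    if isinstance(payload, dict):
        return payload
    if isinstance(payload, (str, bytes)):
        try:
            parsed = json.loads(payload)
        except ValueError:  # covers JSONDecodeError and bad byte encodings
            parsed = None
        if isinstance(parsed, dict):
            return parsed
    # Not structured: log only the type so raw content never reaches the sink.
    return {"unparsed_payload_type": type(payload).__name__}


user_json = '{"email": "john@example.com", "ssn": "123-45-6789"}'
fields = as_log_fields(user_json)
# fields can now be unpacked into a log call: logger.info("User data", **fields)
```

Parsing first restores the field names, which is the only thing name-based redaction can act on.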
4. PII in Exception Messages
# ❌ BAD: PII in exception message
try:
    process_user(email)
except Exception as e:
    logger.error(f"Failed for user {email}: {e}")
    # Output: {"message": "Failed for user john@example.com: ..."}
# ✅ GOOD: PII in structured field
try:
    process_user(email)
except Exception as e:
    logger.error("User processing failed", email=email, error=str(e))
    # Output: {"data": {"email": "***", "error": "..."}}
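One caveat with `error=str(e)`: if the exception text itself embeds PII, a field named `error` is unlikely to match a preset, so the value passes through. A cautious alternative is to log only the exception's type. This is a sketch under that assumption; `safe_error_fields` is a hypothetical helper, not part of Fapilog.

```python
def safe_error_fields(exc: BaseException) -> dict[str, str]:
    """Describe an exception without including its (possibly PII-laden) message."""
    return {
        "error_type": type(exc).__name__,
        "error_module": type(exc).__module__,
    }


exc = ValueError("no account for john@example.com")
fields = safe_error_fields(exc)
# Usage sketch: logger.error("User processing failed", email=email, **fields)
```

The trade-off is less detail in the log line, so this suits code paths where exception messages are known to echo user input.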
Structuring Logs for Compliance
Use Named Fields for All PII
# Instead of this:
logger.info(f"User {name} ({email}) logged in from {ip}")
# Do this:
logger.info("User logged in", name=name, email=email, ip_address=ip)
Use Context for Request-Scoped PII
from fapilog import LoggerBuilder
logger = LoggerBuilder().with_redaction(preset="GDPR_PII").build()
# Bind user context once
request_logger = logger.bind(
    email=request.user.email,
    ip_address=request.client.host,
)
# All subsequent logs have PII in named fields (and redacted)
request_logger.info("Viewing dashboard")
request_logger.info("Updated settings", setting="theme")
request_logger.warning("Rate limit approaching")
Pass Objects, Not Strings
# ❌ Avoid string interpolation
logger.info(f"Order {order.id} for {order.customer_email}")
# ✅ Pass structured data
logger.info("Order placed", order_id=order.id, email=order.customer_email)
# ✅ Or unpack relevant fields
logger.info("Order placed", **order.to_log_dict())
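The `to_log_dict()` call above is left undefined in this guide. A minimal sketch of what such a method might look like follows; the `Order` class, its attributes, and the chosen output keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Order:
    id: str
    customer_email: str
    total_cents: int
    internal_notes: str = ""  # free text; deliberately never logged

    def to_log_dict(self) -> dict[str, Any]:
        """Expose only log-safe fields, using names a preset can match."""
        return {
            "order_id": self.id,
            "email": self.customer_email,  # renamed so name-based redaction applies
            "total_cents": self.total_cents,
        }


order = Order(id="A-1001", customer_email="john@example.com", total_cents=4999)
fields = order.to_log_dict()
```

Keeping the mapping in one method means the rename from `customer_email` to `email` happens in a single place, and free-text attributes stay out of logs by default.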
Testing Your Redaction
Before deploying, verify PII is actually redacted:
import pytest
from fapilog import LoggerBuilder
from fapilog.testing import capture_logs
@pytest.mark.asyncio
async def test_email_redacted_in_named_field():
    """Email in named field should be redacted."""
    async with capture_logs() as logs:
        logger = await (
            LoggerBuilder()
            .with_redaction(preset="GDPR_PII")
            .build_async()
        )
        await logger.info("signup", email="test@example.com")
    assert "test@example.com" not in logs.text
    assert "***" in logs.text
@pytest.mark.asyncio
async def test_email_in_message_NOT_redacted():
    """Email in message string is NOT redacted - this is expected behavior."""
    async with capture_logs() as logs:
        logger = await (
            LoggerBuilder()
            .with_redaction(preset="GDPR_PII")
            .build_async()
        )
        # This is the WRONG way to log PII
        await logger.info("User email: test@example.com")
    # PII IS exposed - this test documents the limitation
    assert "test@example.com" in logs.text
@pytest.mark.asyncio
async def test_custom_field_requires_explicit_config():
    """Custom field names need explicit configuration."""
    async with capture_logs() as logs:
        logger = await (
            LoggerBuilder()
            .with_redaction(preset="GDPR_PII")
            .with_redaction(fields=["customer_contact"])  # Add custom field
            .build_async()
        )
        await logger.info("ticket", customer_contact="test@example.com")
    assert "test@example.com" not in logs.text
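Checking for one known test value only proves that value was masked. As a broader smoke check, captured log text can be scanned for anything that looks like PII. The sketch below is standard-library only; `PII_PATTERNS` and `find_pii_leaks` are hypothetical names, and the two patterns are deliberately simple rather than exhaustive.

```python
import re

# Deliberately simple patterns for a test-time smoke check; they are not exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii_leaks(log_text: str) -> dict[str, list[str]]:
    """Return every pattern name that matched, with the matching substrings."""
    leaks = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(log_text)
        if matches:
            leaks[name] = matches
    return leaks


clean = '{"message": "signup", "data": {"email": "***"}}'
leaky = '{"message": "User email: test@example.com"}'
```

In a test like those above, `assert find_pii_leaks(logs.text) == {}` would fail on the message-string case, which makes that limitation visible in CI instead of in production.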
Compliance Checklist
Before going to production with compliance redaction:
- Audit your logging calls - Search the codebase for f-strings and .format() in log calls
- Use structured fields - All PII should be in named fields, not message strings
- Add domain-specific fields - Extend presets with your custom field names
- Test redaction - Write tests that verify PII is masked
- Review preset coverage - Check Presets Reference for what’s covered
- Document gaps - Note any PII that can’t be redacted (e.g., user-generated content)
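The audit step can be partly automated. The sketch below uses Python's `ast` module to flag log calls whose first argument is an f-string, a `.format()` call, or `%` formatting. It is a heuristic: `find_interpolated_log_calls` and `LOG_METHODS` are hypothetical names, and any method called `info`, `error`, and so on is treated as a log call regardless of the object it belongs to.

```python
import ast

LOG_METHODS = {"debug", "info", "warning", "error", "critical", "exception"}


def find_interpolated_log_calls(source: str) -> list[int]:
    """Return line numbers of log calls whose message is built by interpolation."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in LOG_METHODS or not node.args:
            continue
        message = node.args[0]
        is_fstring = isinstance(message, ast.JoinedStr)
        is_format_call = (
            isinstance(message, ast.Call)
            and isinstance(message.func, ast.Attribute)
            and message.func.attr == "format"
        )
        is_percent = isinstance(message, ast.BinOp) and isinstance(message.op, ast.Mod)
        if is_fstring or is_format_call or is_percent:
            flagged.append(node.lineno)
    return sorted(flagged)


sample = '''
logger.info(f"Processing order for {user.email}")
logger.info("Processing order", email=user.email)
logger.error("Failed for {}".format(email))
logger.warning("Quota at %d%%" % pct)
'''
flagged_lines = find_interpolated_log_calls(sample)
```

Not every flagged call contains PII, so the result is a review list rather than a verdict. Running it over each source file in CI keeps new interpolated messages from slipping in unnoticed.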
Common Patterns by Regulation
GDPR (EU)
logger = (
    LoggerBuilder()
    .with_redaction(preset="GDPR_PII")
    .with_redaction(fields=["customer_id", "account_ref"])  # Your domain fields
    .build()
)
# Always use named fields for Article 4 personal data
logger.info(
    "Data subject request",
    email=user.email,  # Redacted
    name=user.full_name,  # Redacted
    ip_address=request.ip,  # Redacted
    request_type="erasure",  # Not PII, preserved
)
HIPAA (US Healthcare)
logger = (
    LoggerBuilder()
    .with_redaction(preset="HIPAA_PHI")
    .with_redaction(fields=["chart_number", "room_number"])
    .build()
)
# All 18 PHI identifiers should be in named fields
logger.info(
    "Patient admission",
    mrn=patient.medical_record_number,  # Redacted
    dob=patient.date_of_birth,  # Redacted
    ssn=patient.ssn,  # Redacted
    admission_type="emergency",  # Not PHI, preserved
)
PCI-DSS (Payment Cards)
logger = (
    LoggerBuilder()
    .with_redaction(preset="PCI_DSS")
    .build()
)
# Never log full card numbers, but if you must log payment events:
logger.info(
    "Payment processed",
    card_number=card.pan,  # Redacted (but don't log this!)
    cardholder=card.name,  # Redacted
    amount=transaction.amount,  # Not cardholder data, preserved
    last_four=card.pan[-4:],  # Consider if this is acceptable
)
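A bare `card.pan[-4:]` slice returns the whole value when the input is shorter than four characters, and it can pick up separators or trailing whitespace. If you do log the last four digits, a small guard helps. This is a sketch: `last_four` is a hypothetical helper, and the 13-digit minimum is an assumption about typical payment card numbers that you should adjust for the card types you accept.

```python
def last_four(pan: str) -> str:
    """Return only the final four digits of a card number, ignoring separators."""
    digits = [ch for ch in pan if ch.isdigit()]
    if len(digits) < 13:  # assumed minimum PAN length; adjust for your card types
        raise ValueError("card number too short to truncate safely")
    return "".join(digits[-4:])


tail = last_four("4111 1111 1111 1111")
```

Failing loudly on malformed input is a deliberate choice here: a truncation helper that silently returns most of a short number defeats its own purpose.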
Summary
| Do | Don’t |
|---|---|
| Use named fields for PII | Embed PII in message strings |
| Pass dicts, not JSON strings | Serialize objects before logging |
| Bind PII to context | Interpolate PII in f-strings |
| Extend presets with your fields | Assume all field names are covered |
| Test redaction in CI | Deploy without verification |