Adaptive Sampling for High-Volume Services

When your service handles thousands of requests per second, logging everything is expensive and often unnecessary, yet during incidents you need full visibility. Adaptive sampling adjusts the sample rate automatically as traffic changes: it samples down during spikes and returns to full logging when traffic is quiet.

The Problem

A flash sale or viral moment creates a cost explosion:

Normal:     100 req/s × 86,400 sec = 8.6M logs/day    ($4.30/day)
Flash sale: 10,000 req/s × 3,600 sec = 36M logs/hour  ($18/hour)

At $0.50/GB ingested (typical cloud pricing) and roughly 1 KB per log line, a 4-hour sale event costs about $72, as much as 16 to 17 days of normal operation. Worse, the flood makes it harder to find actual problems.
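
The figures above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a billing calculator: it assumes a 1 KB average log size, decimal gigabytes, and a flat $0.50/GB, and the function name ingest_cost is purely illustrative.

```python
# Assumptions (not measured values): 1 KB average log size, decimal gigabytes,
# and a flat $0.50 per GB ingested.
BYTES_PER_EVENT = 1_000
PRICE_PER_GB = 0.50


def ingest_cost(events_per_sec: float, seconds: float) -> float:
    """Return the ingestion cost in dollars for a steady event rate."""
    gigabytes = events_per_sec * seconds * BYTES_PER_EVENT / 1e9
    return gigabytes * PRICE_PER_GB


normal_day = ingest_cost(100, 86_400)     # one day at 100 req/s
flash_hour = ingest_cost(10_000, 3_600)   # one hour at 10,000 req/s

print(f"normal day:  ${normal_day:.2f}")
print(f"flash hour:  ${flash_hour:.2f}")
print(f"4-hour sale: ${4 * flash_hour:.2f}")
print(f"days of normal logging per 4-hour sale: {4 * flash_hour / normal_day:.1f}")
```

Changing BYTES_PER_EVENT or PRICE_PER_GB to match your own log size and provider shows how sensitive the estimate is to those two inputs.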

What you need:

Cost-effective logging during normal operation
Full visibility during incidents
Errors never dropped, regardless of volume

The Solution: adaptive Preset + Adaptive Sampling

Combine the adaptive preset with adaptive sampling to hold log volume near a target rate without losing errors:

from fapilog import LoggerBuilder

logger = (
    LoggerBuilder()
    .with_preset("adaptive")
    .with_adaptive_sampling(target_events_per_sec=100)
    .build()
)

# During normal traffic: logs ~100 events/sec
# During spikes: automatically samples down, never below 1%
# Errors: always logged via priority queue protection

Why Two Layers of Protection?

The adaptive preset provides queue-level protection via protected_levels:

ERROR/CRITICAL/FATAL events survive queue pressure
Under extreme load, unprotected events may be evicted to make room for protected ones
This is a last-resort safety net

Adaptive sampling provides pre-queue cost control:

Samples events before they enter the queue
Automatically adjusts rate based on throughput
Gives more predictable cost control than queue eviction alone

Together they provide both cost efficiency and guaranteed error visibility.
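
To make the queue-level safety net concrete, here is a minimal sketch of a bounded queue that evicts unprotected events to admit protected ones. It is not fapilog's implementation: the class ProtectedQueue, its put method, and the choice to evict the oldest unprotected event are assumptions made for illustration. The sketch also assumes that a protected event is rejected when the queue holds only protected events, a case the text above does not cover.

```python
PROTECTED_LEVELS = {"ERROR", "CRITICAL", "FATAL"}


class ProtectedQueue:
    """Bounded queue that evicts unprotected events to admit protected ones."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.events = []   # list of (level, message) tuples, oldest first
        self.dropped = 0   # events rejected or evicted

    def put(self, level: str, message: str) -> bool:
        """Try to enqueue an event; return True if it was accepted."""
        if len(self.events) < self.capacity:
            self.events.append((level, message))
            return True
        if level in PROTECTED_LEVELS:
            # Queue is full: evict the oldest unprotected event, if any.
            for i, (queued_level, _) in enumerate(self.events):
                if queued_level not in PROTECTED_LEVELS:
                    del self.events[i]
                    self.events.append((level, message))
                    self.dropped += 1
                    return True
        # Full, and the event is unprotected or nothing can be evicted.
        self.dropped += 1
        return False


queue = ProtectedQueue(capacity=2)
queue.put("INFO", "request processed")
queue.put("INFO", "request processed")
queue.put("INFO", "one more request")    # rejected: queue is full
queue.put("ERROR", "database timeout")   # accepted: evicts the oldest INFO
print(queue.events, queue.dropped)
```

Because eviction only happens once the queue is already full, this layer acts as a backstop. Sampling before the queue keeps it from filling in the first place.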

Configuration

from fapilog import LoggerBuilder

# Full configuration with adaptive sampling:
logger = (
    LoggerBuilder()
    .with_adaptive_sampling(
        target_events_per_sec=100,  # Target throughput
        min_rate=0.01,              # Never below 1%
        max_rate=1.0,               # Full logging when quiet
        window_seconds=10.0,        # 10-second rolling window
    )
    .with_protected_levels(["ERROR", "CRITICAL", "FATAL"])  # Queue protection
    .with_workers(2)        # Throughput optimization
    .with_drop_on_full()    # Protect latency under pressure
    .add_stdout_json()
    .build()
)

How Adaptive Sampling Works

Unlike fixed-rate sampling, adaptive sampling responds to actual throughput:

Traffic	Sample Rate	Events Logged
50/sec	100%	50/sec (full visibility)
100/sec	100%	100/sec (at target)
1,000/sec	10%	~100/sec (cost controlled)
10,000/sec	1%	~100/sec (minimum rate)

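The sampler measures throughput over a rolling 10-second window (window_seconds) and applies exponential smoothing, so the rate does not thrash when traffic fluctuates briefly. The table assumes a 1% minimum rate: above 10,000 events/sec the floor applies and logged volume rises past the target.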
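
The exact algorithm is not spelled out here, but the table is consistent with a rate of target / observed throughput, clamped between min_rate and max_rate. The sketch below works under that assumption and adds a simple exponential moving average for the throughput estimate. The names target_rate, SmoothedThroughput, and alpha are illustrative and not part of fapilog.

```python
def target_rate(observed_per_sec, target_per_sec=100.0, min_rate=0.01, max_rate=1.0):
    """Sample rate that would bring observed throughput down to the target."""
    if observed_per_sec <= 0:
        return max_rate
    return max(min_rate, min(max_rate, target_per_sec / observed_per_sec))


class SmoothedThroughput:
    """Exponential moving average of observed events per second."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha   # weight given to the newest observation
        self.value = None    # no estimate until the first update

    def update(self, observed_per_sec: float) -> float:
        if self.value is None:
            self.value = float(observed_per_sec)
        else:
            self.value = self.alpha * observed_per_sec + (1 - self.alpha) * self.value
        return self.value


# Reproduce the rows of the table above.
for traffic in (50, 100, 1_000, 10_000):
    rate = target_rate(traffic)
    print(f"{traffic:>6}/sec -> rate {rate:.0%}, ~{traffic * rate:.0f} events/sec")

# A sudden spike moves the smoothed estimate only part of the way.
estimate = SmoothedThroughput(alpha=0.2)
estimate.update(100)
print(target_rate(estimate.update(10_000)))
```

With smoothing, a one-off burst lowers the rate gradually rather than slamming it to the floor. A smaller alpha reacts more slowly but is steadier.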

Errors Always Pass Through

The most important feature: errors are never dropped. The protected_levels setting ensures ERROR, CRITICAL, and FATAL messages survive queue pressure:

# Even during a 10,000 req/sec spike with 1% sampling:
logger.info("request processed")   # May be sampled out or dropped under pressure
logger.error("database timeout")   # Protected: survives queue pressure
logger.critical("service down")    # Protected: survives queue pressure

This is production-safe because you’ll always see:

Unhandled exceptions
Database failures
Service degradations
Security events logged at ERROR+

The protection works at two levels:

Adaptive sampling filter: always_pass_levels bypasses sampling for protected levels
Priority queue: If the queue is full, unprotected events are evicted to make room for protected ones
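
The first of these levels, the sampling bypass, can be sketched as a small decision function. This is an illustration only: should_log and ALWAYS_PASS_LEVELS are hypothetical names that mirror the always_pass_levels idea, not fapilog's actual filter.

```python
import random

ALWAYS_PASS_LEVELS = {"ERROR", "CRITICAL", "FATAL"}


def should_log(level: str, sample_rate: float, rng: random.Random) -> bool:
    """Return True if the event should be kept at the current sample rate."""
    if level in ALWAYS_PASS_LEVELS:
        return True                      # protected levels skip sampling entirely
    return rng.random() < sample_rate    # keep roughly sample_rate of the rest


rng = random.Random(42)  # seeded so repeated runs behave the same
kept_info = sum(should_log("INFO", 0.01, rng) for _ in range(10_000))
kept_error = sum(should_log("ERROR", 0.01, rng) for _ in range(10_000))
print(f"INFO kept: {kept_info} of 10000, ERROR kept: {kept_error} of 10000")
```

Even at a 1% rate, every ERROR event is kept, while only about one INFO event in a hundred gets through.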

Real-World Example: E-commerce Flash Sale

Before: Fixed Sampling

from fapilog import LoggerBuilder

# Fixed 10% sampling - problematic
logger = LoggerBuilder().with_sampling(rate=0.1).add_stdout().build()

# Problem 1: During quiet periods, you're missing 90% of data
# Problem 2: During flash sale, you might still be over budget
# Problem 3: No automatic adjustment

After: Adaptive Sampling with the adaptive Preset

from fapilog import LoggerBuilder

# Adaptive sampling - responds to actual traffic
logger = (
    LoggerBuilder()
    .with_preset("adaptive")
    .with_adaptive_sampling(target_events_per_sec=100)
    .build()
)

# Quiet period (50 req/s): 100% sampled
# Normal load (500 req/s): ~20% sampled, hitting 100/sec target
# Flash sale (5000 req/s): ~2% sampled, hitting 100/sec target
# All errors: 100% captured (bypass sampling, protected in the queue)

Cost Comparison

Scenario	Fixed 10%	adaptive + sampling
Quiet (50/sec)	5/sec	50/sec (full visibility)
Normal (500/sec)	50/sec	100/sec
Flash sale (5000/sec)	500/sec	100/sec
Daily cost*	~$10	~$5

*Estimated at $0.50/GB with a 1 KB average log size; the actual cost depends on how long each traffic level lasts.
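
The events-per-second rows of this comparison follow directly from the two sampling rules. The sketch below reproduces them. It assumes the adaptive rate tracks target / traffic with a 1% floor, and it ignores error events that bypass sampling. Both function names are illustrative.

```python
def fixed_logged(traffic_per_sec: float, rate: float = 0.1) -> float:
    """Events logged per second under fixed-rate sampling."""
    return traffic_per_sec * rate


def adaptive_logged(traffic_per_sec, target_per_sec=100.0, min_rate=0.01):
    """Events logged per second if the rate tracks target / traffic."""
    if traffic_per_sec <= 0:
        return 0.0
    rate = max(min_rate, min(1.0, target_per_sec / traffic_per_sec))
    return traffic_per_sec * rate


for name, traffic in [("Quiet", 50), ("Normal", 500), ("Flash sale", 5000)]:
    print(f"{name:<10} fixed={fixed_logged(traffic):6.0f}/sec  "
          f"adaptive={adaptive_logged(traffic):6.0f}/sec")
```

Fixed sampling scales linearly with traffic, so its cost is unbounded. The adaptive rule holds output near the target until the minimum rate is reached.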

Customizing the Preset

Override specific settings while keeping the preset’s base configuration:

from fapilog import LoggerBuilder

# Higher target for services that need more visibility
logger = (
    LoggerBuilder()
    .with_preset("adaptive")
    .with_adaptive_sampling(target_events_per_sec=500)  # Override target
    .build()
)

# Lower minimum rate for extremely high-volume services
logger = (
    LoggerBuilder()
    .with_preset("adaptive")
    .with_adaptive_sampling(min_rate=0.001)  # 0.1% minimum
    .build()
)

# Add a cloud sink while keeping preset configuration
logger = (
    LoggerBuilder()
    .with_preset("adaptive")
    .add_cloudwatch(log_group="/app/production")
    .build()
)
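
When choosing min_rate, it helps to know the traffic level at which the floor starts to bind. Assuming the rate follows target / traffic, the floor binds once traffic exceeds target / min_rate. The helper below is a hypothetical illustration of that arithmetic, not a fapilog function.

```python
def floor_binds_above(target_per_sec: float, min_rate: float) -> float:
    """Traffic (events/sec) beyond which min_rate caps the sampler."""
    return target_per_sec / min_rate


for target, min_rate in [(100, 0.01), (100, 0.001), (500, 0.01)]:
    limit = floor_binds_above(target, min_rate)
    print(f"target={target}/sec, min_rate={min_rate}: floor binds above ~{limit:,.0f}/sec")
```

Beyond that traffic level, logged volume grows in proportion to traffic again, so a service that regularly exceeds it may need a lower min_rate or a higher target.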

When to Use adaptive vs Other Presets

Preset	Use When
`adaptive`	Traffic varies widely, cost is a concern, errors must never be missed
`production`	Moderate traffic, durability matters more than cost
`serverless`	Lambda/Cloud Functions with short execution time

Decision Guide

Choose adaptive when:

Your service handles 100+ requests/second regularly
Traffic is spiky or unpredictable
Log storage/ingestion costs are a concern
You need errors to always be captured

Choose production instead when:

You need every log for compliance/audit
Traffic is predictable and moderate
File-based logging is required

Monitoring Adaptive Sampling

Enable metrics to track the current sample rate and the number of events filtered by sampling:

from fapilog import LoggerBuilder

logger = (
    LoggerBuilder()
    .with_preset("adaptive")
    .with_metrics(enabled=True)
    .build()
)

# Exposed metrics:
# - fapilog_adaptive_sample_rate: Current rate (0.0-1.0)
# - fapilog_events_filtered: Events dropped by sampling
# - fapilog_events_always_passed: High-priority events that bypassed sampling
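
One way to use the sample-rate metric is to flag periods when the rate sits at its floor, which suggests traffic is beyond what the sampler can hold to target. The sketch below works on a plain list of readings; how you scrape fapilog_adaptive_sample_rate from your metrics backend is not shown, and pinned_at_floor is a hypothetical helper.

```python
def pinned_at_floor(rates, min_rate=0.01, consecutive=3, tolerance=1e-9):
    """Return True if the rate stayed at or below min_rate for N readings in a row."""
    run = 0
    for rate in rates:
        # Extend the current run if this reading is at the floor, else reset it.
        run = run + 1 if rate <= min_rate + tolerance else 0
        if run >= consecutive:
            return True
    return False


readings = [1.0, 0.2, 0.01, 0.01, 0.01]   # example values, one per scrape interval
if pinned_at_floor(readings):
    print("sample rate pinned at minimum: consider lowering min_rate or raising the target")
```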

Going Deeper

Log Sampling and Rate Limiting - Fixed-rate sampling and token bucket rate limiting
Configuration Guide - Complete settings reference
Performance Tuning - Optimization strategies