Adaptive Sampling for High-Volume Services
When your service handles thousands of requests per second, logging everything is expensive and often unnecessary. But during incidents, you need full visibility. Adaptive sampling automatically adjusts the sample rate based on traffic.
The Problem
A flash sale or viral moment creates a cost explosion:
Normal: 100 req/s × 86,400 sec = 8.6M logs/day ($4.30/day)
Flash sale: 10,000 req/s × 3,600 sec = 36M logs/hour ($18/hour)
At $0.50/GB ingested (typical cloud pricing), a 4-hour sale event costs more than a month of normal operation. Worse, the flood makes it harder to find actual problems.
What you need:
Cost-effective logging during normal operation
Full visibility during incidents
Errors never dropped, regardless of volume
The Solution: adaptive Preset + Adaptive Sampling
Combine the adaptive preset with adaptive sampling for intelligent cost control:
from fapilog import LoggerBuilder
logger = (
LoggerBuilder()
.with_preset("adaptive")
.with_adaptive_sampling(target_events_per_sec=100)
.build()
)
# During normal traffic: logs ~100 events/sec
# During spikes: automatically samples down, never below 1%
# Errors: always logged via priority queue protection
Why Two Layers of Protection?
The adaptive preset provides queue-level protection via protected_levels:
ERROR/CRITICAL/FATAL events survive queue pressure
Under extreme load, unprotected events may be evicted to make room for protected ones
This is a last-resort safety net
Adaptive sampling provides pre-queue cost control:
Samples events before they enter the queue
Automatically adjusts rate based on throughput
More predictable cost control
Together they provide both cost efficiency and guaranteed error visibility.
Configuration
# Full configuration with adaptive sampling:
logger = (
LoggerBuilder()
.with_adaptive_sampling(
target_events_per_sec=100, # Target throughput
min_rate=0.01, # Never below 1%
max_rate=1.0, # Full logging when quiet
window_seconds=10.0, # 10-second rolling window
)
.with_protected_levels(["ERROR", "CRITICAL", "FATAL"]) # Queue protection
.with_workers(2) # Throughput optimization
.with_drop_on_full() # Protect latency under pressure
.add_stdout_json()
.build()
)
How Adaptive Sampling Works
Unlike fixed-rate sampling, adaptive sampling responds to actual throughput:
Traffic |
Sample Rate |
Events Logged |
|---|---|---|
50/sec |
100% |
50/sec (full visibility) |
100/sec |
100% |
100/sec (at target) |
1,000/sec |
10% |
~100/sec (cost controlled) |
10,000/sec |
1% |
~100/sec (minimum rate) |
The algorithm uses exponential smoothing over a 10-second window to avoid thrashing.
Errors Always Pass Through
The most important feature: errors are never dropped. The protected_levels setting ensures ERROR, CRITICAL, and FATAL messages survive queue pressure:
# Even during a 10,000 req/sec spike with 1% sampling:
logger.info("request processed") # May be sampled out or dropped under pressure
logger.error("database timeout") # Protected: survives queue pressure
logger.critical("service down") # Protected: survives queue pressure
This is production-safe because you’ll always see:
Unhandled exceptions
Database failures
Service degradations
Security events logged at ERROR+
The protection works at two levels:
Adaptive sampling filter:
always_pass_levelsbypasses sampling for protected levelsPriority queue: If queue is full, unprotected events are evicted to make room for protected ones
Real-World Example: E-commerce Flash Sale
Before: Fixed Sampling
# Fixed 10% sampling - problematic
logger = LoggerBuilder().with_sampling(rate=0.1).add_stdout().build()
# Problem 1: During quiet periods, you're missing 90% of data
# Problem 2: During flash sale, you might still be over budget
# Problem 3: No automatic adjustment
After: Adaptive Sampling with adaptive
from fapilog import LoggerBuilder
# Adaptive sampling - responds to actual traffic
logger = (
LoggerBuilder()
.with_preset("adaptive")
.with_adaptive_sampling(target_events_per_sec=100)
.build()
)
# Quiet period (50 req/s): 100% sampled
# Normal load (500 req/s): ~20% sampled, hitting 100/sec target
# Flash sale (5000 req/s): ~2% sampled, hitting 100/sec target
# All errors: 100% captured via queue protection
Cost Comparison
Scenario |
Fixed 10% |
adaptive + sampling |
|---|---|---|
Quiet (50/sec) |
5/sec |
50/sec (full visibility) |
Normal (500/sec) |
50/sec |
100/sec |
Flash sale (5000/sec) |
500/sec |
100/sec |
Daily cost* |
~$10 |
~$5 |
*Estimated at $0.50/GB, 1KB average log size
Customizing the Preset
Override specific settings while keeping the preset’s base configuration:
from fapilog import LoggerBuilder
# Higher target for services that need more visibility
logger = (
LoggerBuilder()
.with_preset("adaptive")
.with_adaptive_sampling(target_events_per_sec=500) # Override target
.build()
)
# Lower minimum rate for extremely high-volume services
logger = (
LoggerBuilder()
.with_preset("adaptive")
.with_adaptive_sampling(min_rate=0.001) # 0.1% minimum
.build()
)
# Add a cloud sink while keeping preset configuration
logger = (
LoggerBuilder()
.with_preset("adaptive")
.add_cloudwatch(log_group="/app/production")
.build()
)
When to Use adaptive vs Other Presets
Preset |
Use When |
|---|---|
|
Traffic varies widely, cost is a concern, errors must never be missed |
|
Moderate traffic, durability matters more than cost |
|
Lambda/Cloud Functions with short execution time |
Decision Guide
Choose adaptive when:
Your service handles 100+ requests/second regularly
Traffic is spiky or unpredictable
Log storage/ingestion costs are a concern
You need errors to always be captured
Choose production instead when:
You need every log for compliance/audit
Traffic is predictable and moderate
File-based logging is required
Monitoring Adaptive Sampling
Track the current sample rate and dropped events:
from fapilog import LoggerBuilder
logger = (
LoggerBuilder()
.with_preset("adaptive")
.with_metrics(enabled=True)
.build()
)
# Exposed metrics:
# - fapilog_adaptive_sample_rate: Current rate (0.0-1.0)
# - fapilog_events_filtered: Events dropped by sampling
# - fapilog_events_always_passed: High-priority events that bypassed sampling
Going Deeper
Log Sampling and Rate Limiting - Fixed-rate sampling and token bucket rate limiting
Configuration Guide - Complete settings reference
Performance Tuning - Optimization strategies