Sampling Strategies
Sampling reduces log volume by selectively keeping or dropping log events. Fapilog provides two sampling strategies optimized for different use cases.
Quick Start (Recommended)
Use the builder API for the simplest setup:
from fapilog import LoggerBuilder
# Probabilistic sampling - keep ~10% of logs
logger = LoggerBuilder().with_sampling(rate=0.1).build()
# Trace-based sampling - consistent per trace
logger = LoggerBuilder().with_trace_sampling(rate=0.1).build()
When to Use Sampling
Sampling is useful when:
High-volume services generate more logs than your storage/budget allows
Development/staging environments don’t need every log
Cost optimization for cloud logging services that charge per volume
Sampling is not recommended when:
Compliance requirements mandate complete audit trails
Debugging production issues where you need full context
Low-volume services where storage cost is minimal
Probabilistic Sampling (SamplingFilter)
Randomly samples events at a configured rate. Each log event has an independent probability of being kept.
from fapilog import LoggerBuilder
logger = (
LoggerBuilder()
.with_sampling(rate=0.1) # Keep ~10% of logs
.build()
)
Configuration Options
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
float |
1.0 |
Probability of keeping each event (0.0-1.0) |
|
int |
None |
RNG seed for reproducible sampling |
Characteristics
Independent decisions: Each event is evaluated independently
Statistically representative: Over time, sampled logs reflect the overall distribution
Simple and fast: Minimal overhead per event
Non-deterministic: The same trace may have some events kept and others dropped
When to Use
General log volume reduction
Development/test environments
When you don’t need complete traces
Seed Behavior and Determinism
When seed is provided, sampling becomes reproducible:
# Two loggers with the same seed produce identical sampling decisions
logger1 = LoggerBuilder().with_sampling(rate=0.5, seed=42).build()
logger2 = LoggerBuilder().with_sampling(rate=0.5, seed=42).build()
# Same sequence of keep/drop decisions
Each SamplingFilter instance maintains its own RNG state. Creating multiple filters with the same seed produces identical sequences independently—they don’t interfere with each other.
Without a seed, sampling uses system randomness and is non-reproducible across runs.
Trace-Based Sampling (TraceSamplingFilter)
Samples consistently by trace ID, ensuring all events in a trace are either kept or dropped together.
from fapilog import LoggerBuilder
logger = (
LoggerBuilder()
.with_trace_sampling(rate=0.1, trace_id_field="trace_id")
.build()
)
Configuration Options
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
float |
0.1 |
Probability of keeping each trace (0.0-1.0) |
|
str |
“trace_id” |
Field name containing the trace identifier |
|
list |
[“ERROR”, “CRITICAL”] |
Levels that bypass sampling |
Characteristics
Trace-consistent: All events with the same trace ID get the same decision
Deterministic per trace: Given the same trace ID, the decision is always the same
Error preservation: ERROR and CRITICAL logs are always kept by default
Hash-based: Uses MD5 hash of trace ID for consistent distribution
When to Use
Distributed tracing environments
When you need complete request context
Debugging scenarios where partial traces are useless
Always-Pass Levels
By default, ERROR and CRITICAL logs bypass sampling entirely:
# ERROR logs always pass, even if trace is sampled out
logger.error("Database connection failed", trace_id="abc123")
# Customize which levels always pass
logger = (
LoggerBuilder()
.with_trace_sampling(
rate=0.1,
always_pass_levels=["ERROR", "CRITICAL", "WARNING"]
)
.build()
)
Events Without Trace ID
When an event lacks a trace ID, TraceSamplingFilter falls back to probabilistic sampling for that event.
Comparison: Which Strategy to Use?
Aspect |
|
|
|---|---|---|
Decision unit |
Per event |
Per trace |
Determinism |
Optional (with seed) |
Always (by trace ID) |
Trace completeness |
No guarantee |
All or nothing |
Error handling |
Sampled like other events |
Always kept (configurable) |
Use case |
Volume reduction |
Distributed tracing |
Overhead |
Minimal |
Hash computation |
Configuration via Settings
For Settings-based configuration instead of the builder:
from fapilog import Settings
from fapilog.core import CoreSettings
from fapilog.core.filters import FilterConfigSettings, SamplingFilterSettings
settings = Settings(
core=CoreSettings(filters=["sampling"]),
filter_config=FilterConfigSettings(
sampling=SamplingFilterSettings(rate=0.1)
)
)
Configuration via Environment Variables
# Enable sampling filter
export FAPILOG_CORE__FILTERS='["sampling"]'
# Set sampling rate
export FAPILOG_FILTER_CONFIG__SAMPLING__RATE=0.1
# For trace-based sampling
export FAPILOG_CORE__FILTERS='["trace_sampling"]'
export FAPILOG_FILTER_CONFIG__TRACE_SAMPLING__RATE=0.1
export FAPILOG_FILTER_CONFIG__TRACE_SAMPLING__TRACE_ID_FIELD="trace_id"
Operational Considerations
Monitoring Sampled Volume
Enable metrics to track sampling behavior:
logger = (
LoggerBuilder()
.with_sampling(rate=0.1)
.with_metrics(enabled=True)
.build()
)
Monitor fapilog_events_filtered_total to verify sampling is working as expected.
Combining with Other Filters
Sampling filters can be combined with other filters in the pipeline:
logger = (
LoggerBuilder()
.with_level("INFO") # First: drop DEBUG
.with_sampling(rate=0.5) # Then: sample remaining
.build()
)
Filter order matters—events are processed sequentially through the filter pipeline.
Production Recommendations
Start conservative: Begin with higher sample rates (0.5+) and decrease based on observed volume
Monitor dropped events: Use metrics to track filter behavior
Keep errors: Use
TraceSamplingFilteror configure level-based exceptionsDocument your rate: Team members debugging production need to know sampling is active
Consider per-service rates: High-traffic services may need lower rates than low-traffic ones
Deprecated: observability.logging.sampling_rate
Deprecated since v0.6.0: The
observability.logging.sampling_ratesetting is deprecated. Use filter-based sampling instead.
The legacy observability.logging.sampling_rate setting still works but emits a deprecation warning at runtime. Migrate to filter-based sampling for the recommended approach.
Migration Guide
Before (deprecated):
from fapilog import Settings, get_logger
from fapilog.core.observability import ObservabilitySettings, LoggingSettings
# Deprecated - will emit warning
settings = Settings(
observability=ObservabilitySettings(
logging=LoggingSettings(sampling_rate=0.1)
)
)
logger = get_logger(settings=settings)
After (recommended):
from fapilog import LoggerBuilder
# Using builder API
logger = LoggerBuilder().with_sampling(rate=0.1).build()
Or via Settings:
from fapilog import Settings
from fapilog.core import CoreSettings
from fapilog.core.filters import FilterConfigSettings, SamplingFilterSettings
settings = Settings(
core=CoreSettings(filters=["sampling"]),
filter_config=FilterConfigSettings(
sampling=SamplingFilterSettings(rate=0.1)
)
)
Behavioral Differences
Aspect |
Legacy ( |
Filter-based |
|---|---|---|
Application point |
Inline during log call |
In filter pipeline |
Efficiency |
Less efficient |
More efficient |
Level overrides |
Not supported |
Supported via |
Configurability |
Single rate only |
Full filter options |
Future support |
Will be removed |
Long-term supported |
Why Migrate?
Performance: Filter-based sampling is more efficient
Flexibility: Access to trace-based sampling, level overrides, and seeds
Consistency: Unified configuration with other filters
Future-proof: Legacy setting will be removed in a future major version