Performance Benchmarks

Indicative benchmark results comparing fapilog to Python stdlib logging. These results help contextualize performance tuning recommendations.

Methodology

Parameter	Value
Baseline	Python stdlib `logging` to file
Test	fapilog with rotating file sink
Metrics	Throughput (logs/sec), latency (μs), peak memory (bytes)
Warmup	1,000 calls before measurement
Iterations	20,000 (throughput/memory), 5,000 (latency)
Payload	~256 bytes JSON

Two scenarios are measured:

Standard benchmark - Raw log call rate with fast file I/O
Slow sink benchmark - Application-side latency when sink I/O is constrained (2ms simulated delay)
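The benchmark script's internals are not shown here. As a minimal sketch, assuming each call is timed individually with `time.perf_counter_ns` (the real script may measure differently), the latency columns (avg, median, P95) can be produced by an untimed warmup followed by per-call timing. `measure_latency` is a hypothetical helper. Stdlib `logging` to a temporary file stands in for the logger under test, and a plain 256-character string stands in for the JSON payload.

```python
import logging
import statistics
import tempfile
import time
from pathlib import Path


def measure_latency(log_call, warmup=1_000, iterations=5_000):
    """Time each call to log_call; return avg, median and P95 in microseconds."""
    for _ in range(warmup):  # untimed warmup calls, as in the methodology table
        log_call()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        log_call()
        samples.append((time.perf_counter_ns() - start) / 1_000)  # ns -> μs
    samples.sort()
    return {
        "avg_us": statistics.fmean(samples),
        "median_us": statistics.median(samples),
        # Simple index-based P95 on the sorted samples
        "p95_us": samples[int(0.95 * (len(samples) - 1))],
    }


with tempfile.TemporaryDirectory() as tmp:
    handler = logging.FileHandler(Path(tmp) / "bench.log")
    logger = logging.getLogger("bench")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger
    logger.addHandler(handler)
    payload = "x" * 256  # ~256-byte message standing in for the JSON payload
    stats = measure_latency(lambda: logger.info(payload))
    logger.removeHandler(handler)
    handler.close()  # close before the temp directory is deleted

print({k: round(v, 1) for k, v in stats.items()})
```

Timing each call separately adds a little overhead of its own, which is one reason per-call averages need not match 1 / throughput exactly.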

Environment

Component	Value
Python	3.13.7
OS	macOS (Darwin 24.6.0)
CPU	Apple M1 Max
Memory	32 GB
fapilog	0.3.6

Results

Standard Throughput

Raw log call throughput with fast file I/O:

Logger	Logs/sec	Relative
stdlib	90,393	1.0x
fapilog	3,295	0.04x

Interpretation: For raw throughput to a fast local file, stdlib logging is roughly 27x faster. fapilog’s async machinery (queue, batching, background flush) adds overhead that doesn’t pay off when the sink is already fast.

Standard Latency

Per-call latency with fast file I/O:

Logger	Avg (μs)	Median (μs)	P95 (μs)
stdlib	24	12	91
fapilog	279	261	523

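Interpretation: As with throughput, fapilog has higher per-call latency when the sink is fast, averaging about 11x stdlib’s (279 μs vs 24 μs). The async infrastructure has fixed per-call costs regardless of sink speed; compare fapilog’s nearly identical 286 μs average in the slow-sink test below.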

Slow Sink Latency (Enterprise Scenario)

Application-side latency when sink I/O is constrained (2ms simulated delay):

Logger	Avg (μs)	Median (μs)	P95 (μs)
stdlib	2,037	2,014	2,040
fapilog	286	274	483

Average latency reduction: 86% (2,037 μs → 286 μs)
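The 86% figure is 1 − fapilog/stdlib applied to the average latencies. Applying the same arithmetic to the other columns (with `latency_reduction`, a small helper introduced here for illustration) shows that the P95 reduction is somewhat smaller:

```python
def latency_reduction(baseline_us: float, candidate_us: float) -> float:
    """Fraction of baseline latency removed by the candidate (1 - candidate/baseline)."""
    return 1 - candidate_us / baseline_us


# Slow-sink latencies in μs, copied from the table above: (stdlib, fapilog)
rows = {"avg": (2_037, 286), "median": (2_014, 274), "p95": (2_040, 483)}

for stat, (stdlib_us, fapilog_us) in rows.items():
    print(f"{stat}: {latency_reduction(stdlib_us, fapilog_us):.1%}")
# avg: 86.0%
# median: 86.4%
# p95: 76.3%
```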

Interpretation: When sink I/O is slow (network sinks, constrained disk, external services), fapilog’s non-blocking design prevents the application from stalling. The log call returns as soon as the event is enqueued, while the async backend handles I/O in the background. This is where fapilog’s architecture provides value.
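fapilog’s own queue and workers are asyncio-based and are not reproduced here. The same decoupling idea can be sketched with the standard library’s `logging.handlers.QueueHandler` and `QueueListener`, which move handler I/O onto a background thread. This is an analogy, not fapilog’s implementation. `SlowHandler` is a hypothetical sink with a 2 ms delay that mirrors the slow-sink scenario.

```python
import logging
import logging.handlers
import queue
import time


class SlowHandler(logging.Handler):
    """Hypothetical sink that sleeps to simulate constrained I/O (2 ms per record)."""

    def __init__(self, delay_s: float = 0.002):
        super().__init__()
        self.delay_s = delay_s
        self.handled = 0

    def emit(self, record: logging.LogRecord) -> None:
        time.sleep(self.delay_s)  # stand-in for a slow network or disk write
        self.handled += 1


def make_logger(name: str, handler: logging.Handler) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    return logger


def avg_call_us(logger: logging.Logger, n: int = 50) -> float:
    """Average time spent inside logger.info, as seen by the caller."""
    start = time.perf_counter()
    for i in range(n):
        logger.info("event %d", i)
    return (time.perf_counter() - start) / n * 1e6


# Blocking: the caller waits for every slow write.
direct_sink = SlowHandler()
direct_us = avg_call_us(make_logger("direct", direct_sink))

# Non-blocking: the caller only enqueues; a listener thread drains the queue.
queued_sink = SlowHandler()
q: queue.Queue = queue.Queue(-1)  # unbounded
listener = logging.handlers.QueueListener(q, queued_sink)
listener.start()
queued_us = avg_call_us(make_logger("queued", logging.handlers.QueueHandler(q)))
listener.stop()  # waits for the listener thread to handle the queued records

print(f"direct: {direct_us:.0f} μs/call, queued: {queued_us:.0f} μs/call")
```

Because this sketch uses an unbounded queue, it never drops records. The burst-absorption results concern the bounded case.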

Burst Absorption

Ability to absorb traffic bursts without blocking (20,000 log calls in rapid succession with 2ms sink delay):

Metric	Value
Submitted	22,000
Processed	12,362
Dropped	1,712
Queue high-water mark	10,000
Flush latency	5.0s

Interpretation: With `drop_on_full=True`, fapilog absorbs bursts up to queue capacity, then gracefully drops overflow rather than blocking the application. Configure queue size based on expected burst patterns.
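The following is a minimal sketch of drop-on-full semantics, not fapilog’s code; `submit_burst` and `BurstStats` are hypothetical names. A bounded queue accepts events until it reaches capacity, and the producer counts the remainder as drops instead of blocking. No consumer is running here, so the outcome depends only on capacity.

```python
import queue
from dataclasses import dataclass


@dataclass
class BurstStats:
    submitted: int = 0
    enqueued: int = 0
    dropped: int = 0
    high_water_mark: int = 0


def submit_burst(q: queue.Queue, events, stats: BurstStats) -> None:
    """Enqueue without blocking; count events that do not fit (drop-on-full)."""
    for event in events:
        stats.submitted += 1
        try:
            q.put_nowait(event)  # never blocks the producer
            stats.enqueued += 1
        except queue.Full:
            stats.dropped += 1  # overflow is discarded instead of stalling the caller
        stats.high_water_mark = max(stats.high_water_mark, q.qsize())


q = queue.Queue(maxsize=10)  # hypothetical capacity; nothing drains this queue
stats = BurstStats()
submit_burst(q, range(25), stats)
print(stats)  # → BurstStats(submitted=25, enqueued=10, dropped=15, high_water_mark=10)
```

In the actual benchmark, background workers drain the queue while the burst arrives, so the processed and dropped counts also depend on sink speed, not just capacity.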

Memory

Peak memory during throughput test:

Logger	Peak (bytes)
stdlib	85,719
fapilog	10,670,043

Interpretation: fapilog’s peak memory is about 124x stdlib’s (roughly 10.7 MB vs 86 KB), due to its queue, batching buffers, and async infrastructure. This is a deliberate trade-off for non-blocking behavior. Configure `max_queue_size` based on available memory.
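The methodology does not say how peak memory was captured. Assuming a `tracemalloc`-style measurement, the sketch below shows peak traced memory growing with the number of buffered events, which is why queue size should track available memory. A `deque` of ~256-byte JSON strings is a hypothetical stand-in for fapilog’s queued events, and `peak_bytes_for_queue` is an illustrative helper.

```python
import json
import tracemalloc
from collections import deque


def peak_bytes_for_queue(n_events: int, payload_bytes: int = 256) -> int:
    """Peak traced allocation while holding n_events JSON payloads in memory."""
    tracemalloc.start()
    try:
        buffered = deque()
        for i in range(n_events):
            buffered.append(json.dumps({"i": i, "msg": "x" * payload_bytes}))
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()  # also clears the collected traces
    return peak


small = peak_bytes_for_queue(100)
large = peak_bytes_for_queue(10_000)
print(f"100 events: {small:,} bytes; 10,000 events: {large:,} bytes")
```

`tracemalloc` only sees allocations made through Python’s memory allocators, so it can understate process-level memory use.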

Worker Count Impact

The `worker_count` setting controls parallel flush processing and has the largest impact on fapilog throughput:

Workers	Throughput	Relative
1 (default)	~3,500/sec	1.0x
2	~105,000/sec	30x
2 + redaction	~89,000/sec	26x

Key findings:

With `worker_count=1`, the single worker is the bottleneck because it serializes all processing
2 workers are optimal in these tests; more workers show diminishing returns due to asyncio scheduler overhead (not OS context switching, since workers are asyncio tasks, not threads)
Queue size has minimal impact; larger queues slightly reduce throughput due to memory overhead
Redaction costs about 15% of throughput (~105,000/sec → ~89,000/sec) with 2 workers
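fapilog’s worker internals are not described here. The following hypothetical sketch assumes workers are asyncio tasks that pull batches from a shared `asyncio.Queue` and await a sink write. It shows why a second worker helps when flushes await I/O: while one task is suspended, the other can make progress on the same event-loop thread. It does not model the scheduler overhead behind the diminishing returns, and it does not reproduce the measured 30x, which depends on fapilog internals.

```python
import asyncio
import time


async def flush_batch(batch_id: int, delay_s: float) -> None:
    """Hypothetical sink write; the await lets other worker tasks run."""
    await asyncio.sleep(delay_s)


async def worker(q: asyncio.Queue, delay_s: float, done: list) -> None:
    while True:
        batch_id = await q.get()
        try:
            await flush_batch(batch_id, delay_s)
            done.append(batch_id)
        finally:
            q.task_done()


async def drain(n_batches: int, worker_count: int, delay_s: float = 0.01) -> tuple[float, int]:
    """Time how long worker_count asyncio tasks take to flush n_batches."""
    q: asyncio.Queue = asyncio.Queue()
    for i in range(n_batches):
        q.put_nowait(i)
    done: list = []
    start = time.perf_counter()
    workers = [asyncio.create_task(worker(q, delay_s, done)) for _ in range(worker_count)]
    await q.join()  # wait until every batch has been flushed
    elapsed = time.perf_counter() - start
    for w in workers:
        w.cancel()  # workers loop forever, so cancel them once the queue is drained
    await asyncio.gather(*workers, return_exceptions=True)
    return elapsed, len(done)


one_worker, n1 = asyncio.run(drain(20, worker_count=1))
two_workers, n2 = asyncio.run(drain(20, worker_count=2))
print(f"1 worker: {one_worker:.2f}s, 2 workers: {two_workers:.2f}s")
```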

Recommendation: Use 2 workers for production. Production-oriented presets (production, fastapi, serverless, hardened) default to 2 workers automatically.

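```python
# Import path assumed to be the package top level
from fapilog import LoggerBuilder, get_logger

# Option 1: Use a production preset (recommended)
logger = get_logger(preset="production")

# Option 2: Explicitly set worker count
logger = LoggerBuilder().with_workers(2).build()
```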

See Performance Tuning for detailed configuration guidance.

When to Use fapilog

Based on these benchmarks:

Scenario	Recommendation
Fast local file, low volume	stdlib may suffice
Network sinks (HTTP, cloud services)	fapilog recommended
High-volume with slow I/O	fapilog recommended
Latency-sensitive applications with slow or variable sink I/O	fapilog recommended
Burst traffic patterns	fapilog with `drop_on_full=True`

Limitations

These results are indicative, not definitive:

Single machine - Development laptop, not production hardware
Front-end measurement - Measures log call latency, not end-to-end delivery
Environment-dependent - Results vary with CPU, disk, Python version, workload
Not a substitute for load testing - Test in your actual environment before deployment

Reproducing These Results

```shell
python scripts/benchmarking.py --iterations 20000 --latency-iterations 5000
```

Options:

Flag	Default	Description
`--iterations`	20000	Throughput/memory test iterations
`--latency-iterations`	5000	Latency test iterations
`--payload-bytes`	256	Approximate payload size
`--slow-sink-ms`	2.0	Simulated sink delay (ms) for slow-sink (enterprise) tests
`--burst`	20000	Burst size for absorption test