Performance Benchmarks
Indicative benchmark results comparing fapilog to Python stdlib logging. These results help contextualize performance tuning recommendations.
Methodology
| Parameter | Value |
|---|---|
| Baseline | Python stdlib |
| Test | fapilog with rotating file sink |
| Metrics | Throughput (logs/sec), latency (μs), peak memory (bytes) |
| Warmup | 1,000 calls before measurement |
| Iterations | 20,000 (throughput/memory), 5,000 (latency) |
| Payload | ~256 bytes JSON |
Two scenarios are measured:
Standard benchmark - Raw log call rate with fast file I/O
Slow sink benchmark - Application-side latency when sink I/O is constrained (2ms simulated delay)
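As a rough illustration of how per-call latency figures like these can be collected, the sketch below times individual stdlib `logging` calls against a handler that sleeps for 2 ms. It is not the project's benchmark script: `SlowHandler` and `measure_latency_us` are hypothetical names introduced here, and the iteration counts are reduced so the example runs quickly.

```python
import logging
import statistics
import time


class SlowHandler(logging.Handler):
    """Hypothetical stand-in for a constrained sink: sleeps on every record."""

    def __init__(self, delay_s: float) -> None:
        super().__init__()
        self.delay_s = delay_s

    def emit(self, record: logging.LogRecord) -> None:
        time.sleep(self.delay_s)  # simulated sink I/O


def measure_latency_us(log_call, iterations: int, warmup: int = 0) -> dict:
    """Time each call individually and summarize the samples in microseconds."""
    for _ in range(warmup):
        log_call()  # warmup calls are not recorded
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        log_call()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "avg": statistics.fmean(samples),
        "median": statistics.median(samples),
        "p95": samples[p95_index],
    }


logger = logging.getLogger("latency_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep records away from the root logger
logger.addHandler(SlowHandler(delay_s=0.002))

stats = measure_latency_us(lambda: logger.info("payload"), iterations=50, warmup=5)
print(stats)
```

Because the handler runs synchronously inside the log call, every sample includes the full sink delay, which is the behavior the slow sink scenario is designed to expose.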
Environment
| Component | Value |
|---|---|
| Python | 3.13.7 |
| OS | macOS (Darwin 24.6.0) |
| CPU | Apple M1 Max |
| Memory | 32 GB |
| fapilog | 0.3.6 |
Results
Standard Throughput
Raw log call throughput with fast file I/O:
| Logger | Logs/sec | Relative |
|---|---|---|
| stdlib | 90,393 | 1.0x |
| fapilog | 3,295 | 0.04x |
Interpretation: For raw throughput to a fast local file, stdlib logging is faster. fapilog’s async machinery (queue, batching, background flush) adds overhead that doesn’t pay off when the sink is already fast.
Standard Latency
Per-call latency with fast file I/O:
| Logger | Avg (μs) | Median (μs) | P95 (μs) |
|---|---|---|---|
| stdlib | 24 | 12 | 91 |
| fapilog | 279 | 261 | 523 |
Interpretation: As with throughput, fapilog has higher per-call latency when sinks are fast, because the async infrastructure has fixed costs regardless of sink speed.
Slow Sink Latency (Enterprise Scenario)
Application-side latency when sink I/O is constrained (2ms simulated delay):
| Logger | Avg (μs) | Median (μs) | P95 (μs) |
|---|---|---|---|
| stdlib | 2,037 | 2,014 | 2,040 |
| fapilog | 286 | 274 | 483 |
Average latency reduction: 86% (2,037 μs down to 286 μs)
Interpretation: When sink I/O is slow (network sinks, constrained disk, external services), fapilog’s non-blocking design prevents the application from stalling. The log call returns immediately while the async backend handles I/O in the background. This is where fapilog’s architecture provides value.
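The same principle can be demonstrated with the standard library alone. The sketch below uses `logging.handlers.QueueHandler` and `QueueListener` to move a slow handler off the calling thread. This is a stand-in technique, not fapilog's implementation, and `SlowHandler` is a hypothetical class introduced for the example.

```python
import logging
import logging.handlers
import queue
import time


class SlowHandler(logging.Handler):
    """Hypothetical slow sink that counts the records it has handled."""

    def __init__(self, delay_s: float) -> None:
        super().__init__()
        self.delay_s = delay_s
        self.handled = 0

    def emit(self, record: logging.LogRecord) -> None:
        time.sleep(self.delay_s)  # simulated sink I/O
        self.handled += 1


slow_sink = SlowHandler(0.002)
log_queue: queue.Queue = queue.Queue()  # unbounded for this sketch
listener = logging.handlers.QueueListener(log_queue, slow_sink)
listener.start()  # background thread drains the queue into slow_sink

app_logger = logging.getLogger("nonblocking_sketch")
app_logger.setLevel(logging.INFO)
app_logger.propagate = False
app_logger.addHandler(logging.handlers.QueueHandler(log_queue))

start = time.perf_counter()
for i in range(50):
    app_logger.info("event %d", i)  # only enqueues; does not wait for the sink
enqueue_seconds = time.perf_counter() - start

listener.stop()  # blocks until every queued record has been handled
total_seconds = time.perf_counter() - start

print(f"enqueue: {enqueue_seconds:.4f}s, total incl. drain: {total_seconds:.4f}s")
```

The caller pays only the enqueue cost, while the sink delay is absorbed by the background thread. Delivery still takes the full time, which is why the benchmark reports application-side latency rather than end-to-end delivery.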
Burst Absorption
Ability to absorb traffic bursts without blocking (20,000 log calls in rapid succession with 2ms sink delay):
| Metric | Value |
|---|---|
| Submitted | 22,000 |
| Processed | 12,362 |
| Dropped | 1,712 |
| Queue high-water mark | 10,000 |
| Flush latency | 5.0s |
Interpretation: With drop_on_full=True, fapilog absorbs bursts up to queue capacity, then gracefully drops overflow rather than blocking the application. Configure queue size based on expected burst patterns.
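The drop-on-full policy described above can be sketched with a bounded `queue.Queue` from the standard library. `DropOnFullBuffer` and `BurstStats` are hypothetical names used only for illustration; fapilog's internal queue may be built differently.

```python
import queue
from dataclasses import dataclass


@dataclass
class BurstStats:
    submitted: int = 0
    accepted: int = 0
    dropped: int = 0


class DropOnFullBuffer:
    """Hypothetical bounded buffer that drops instead of blocking when full."""

    def __init__(self, capacity: int) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
        self.stats = BurstStats()

    def submit(self, event: dict) -> bool:
        """Try to enqueue without blocking; return False if the event was dropped."""
        self.stats.submitted += 1
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self.stats.dropped += 1
            return False
        self.stats.accepted += 1
        return True

    def drain(self, max_items: int) -> list:
        """Remove up to max_items events, as a background flusher would."""
        drained = []
        while len(drained) < max_items:
            try:
                drained.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return drained


buffer = DropOnFullBuffer(capacity=100)

# A burst of 150 events with no consumer running: 100 fit, 50 are dropped.
for i in range(150):
    buffer.submit({"seq": i})
print(buffer.stats)

# After the flusher frees 40 slots, 40 more events are accepted.
flushed = buffer.drain(40)
for i in range(150, 210):
    buffer.submit({"seq": i})
print(buffer.stats)
```

The caller never waits in `submit`, so the cost of a burst larger than the queue is lost events rather than added latency.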
Memory
Peak memory during throughput test:
| Logger | Peak (bytes) |
|---|---|
| stdlib | 85,719 |
| fapilog | 10,670,043 |
Interpretation: fapilog uses more memory due to its queue, batching buffers, and async infrastructure. This is a deliberate trade-off for non-blocking behavior. Configure max_queue_size based on available memory.
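To get a feel for why queued events dominate peak memory, the sketch below compares a workload that discards each payload with one that buffers them, using `tracemalloc` from the standard library. The source does not say how the benchmark measures peak memory, so `tracemalloc` is an assumption here, and `peak_bytes` is a hypothetical helper.

```python
import tracemalloc


def peak_bytes(workload) -> int:
    """Return the peak traced allocation size while workload() runs."""
    tracemalloc.start()
    try:
        tracemalloc.reset_peak()
        workload()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def unbuffered() -> None:
    # Each ~256-character payload is created and immediately discarded.
    for i in range(10_000):
        payload = f"{i:0256d}"
        del payload


def buffered() -> None:
    # All payloads stay alive at once, as they would in a full queue.
    pending = [f"{i:0256d}" for i in range(10_000)]
    del pending


unbuffered_peak = peak_bytes(unbuffered)
buffered_peak = peak_bytes(buffered)
print(f"unbuffered: {unbuffered_peak:,} bytes, buffered: {buffered_peak:,} bytes")
```

Holding 10,000 payloads of about 256 bytes already accounts for several megabytes, which is the same order of magnitude as a queue sized for 10,000 events.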
Worker Count Impact
The worker_count setting controls parallel flush processing and has the largest impact on fapilog throughput:
| Workers | Throughput | Relative |
|---|---|---|
| 1 (default) | ~3,500/sec | 1.0x |
| 2 | ~105,000/sec | 30x |
| 2 + redaction | ~89,000/sec | 26x |
Key findings:
Workers are the bottleneck with worker_count=1 (serializes all processing)
2 workers is optimal - more shows diminishing returns due to asyncio scheduler overhead (not OS context switching; workers are asyncio tasks, not threads)
Queue size has minimal impact - larger queues slightly hurt due to memory overhead
Redaction cost is minimal (~15%) with proper worker count
Recommendation: Use 2 workers for production. Production-oriented presets (production, fastapi, serverless, hardened) default to 2 workers automatically.
# Option 1: Use a production preset (recommended)
logger = get_logger(preset="production")
# Option 2: Explicitly set worker count
logger = LoggerBuilder().with_workers(2).build()
See Performance Tuning for detailed configuration guidance.
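To illustrate the point that workers are asyncio tasks rather than threads, the sketch below drains a queue with one and then two worker tasks on a single event loop. It models the sink as a pure `asyncio.sleep` delay, so it only shows how a second task overlaps waiting time; it is not expected to reproduce the measured 30x figure. `worker` and `drain_with_workers` are hypothetical names, not fapilog APIs.

```python
import asyncio
import time


async def worker(q: asyncio.Queue, sink_delay_s: float, counts: list, idx: int) -> None:
    """Pull events forever; each one costs a simulated sink delay."""
    while True:
        await q.get()
        try:
            await asyncio.sleep(sink_delay_s)  # simulated sink I/O
            counts[idx] += 1
        finally:
            q.task_done()


async def drain_with_workers(
    worker_count: int, events: int, sink_delay_s: float
) -> tuple[float, int]:
    """Drain a pre-filled queue and return (elapsed seconds, events processed)."""
    q: asyncio.Queue = asyncio.Queue()
    for i in range(events):
        q.put_nowait(i)
    counts = [0] * worker_count
    start = time.perf_counter()
    tasks = [
        asyncio.create_task(worker(q, sink_delay_s, counts, i))
        for i in range(worker_count)
    ]
    await q.join()  # returns once every event has been marked done
    elapsed = time.perf_counter() - start
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return elapsed, sum(counts)


one_elapsed, one_processed = asyncio.run(drain_with_workers(1, 40, 0.005))
two_elapsed, two_processed = asyncio.run(drain_with_workers(2, 40, 0.005))
print(f"1 worker:  {one_elapsed:.3f}s for {one_processed} events")
print(f"2 workers: {two_elapsed:.3f}s for {two_processed} events")
```

Both runs use a single thread. The second worker helps only because it can make progress while the first is awaiting I/O, which is consistent with the diminishing returns noted above once scheduler overhead outweighs the overlapped waiting.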
When to Use fapilog
Based on these benchmarks:
| Scenario | Recommendation |
|---|---|
| Fast local file, low volume | stdlib may suffice |
| Network sinks (HTTP, cloud services) | fapilog recommended |
| High-volume with slow I/O | fapilog recommended |
| Latency-sensitive applications | fapilog recommended |
| Burst traffic patterns | fapilog with |
Limitations
These results are indicative, not definitive:
Single machine - Development laptop, not production hardware
Front-end measurement - Measures log call latency, not end-to-end delivery
Environment-dependent - Results vary with CPU, disk, Python version, workload
Not a substitute for load testing - Test in your actual environment before deployment
Reproducing These Results
python scripts/benchmarking.py --iterations 20000 --latency-iterations 5000
Options:
| Flag | Default | Description |
|---|---|---|
| | 20000 | Throughput/memory test iterations |
| | 5000 | Latency test iterations |
| | 256 | Approximate payload size |
| | 2.0 | Simulated sink delay for enterprise tests |
| | 20000 | Burst size for absorption test |