# Performance Benchmarks

Indicative benchmark results comparing fapilog to Python stdlib logging. These results help contextualize [performance tuning](performance-tuning.md) recommendations.

## Methodology

| Parameter | Value |
|-----------|-------|
| Baseline | Python stdlib `logging` to file |
| Test | fapilog with rotating file sink |
| Metrics | Throughput (logs/sec), latency (μs), peak memory (bytes) |
| Warmup | 1,000 calls before measurement |
| Iterations | 20,000 (throughput/memory), 5,000 (latency) |
| Payload | ~256 bytes JSON |

Two scenarios are measured:

1. **Standard benchmark** - Raw log call rate with fast file I/O
2. **Slow sink benchmark** - Application-side latency when sink I/O is constrained (2ms simulated delay)

## Environment

| Component | Value |
|-----------|-------|
| Python | 3.13.7 |
| OS | macOS 24.6.0 (Darwin) |
| CPU | Apple M1 Max |
| Memory | 32 GB |
| fapilog | 0.3.6 |

## Results

### Standard Throughput

Raw log call throughput with fast file I/O:

| Logger | Logs/sec | Relative |
|--------|----------|----------|
| stdlib | 90,393 | 1.0x |
| fapilog | 3,295 | 0.04x |

**Interpretation:** For raw throughput to a fast local file, stdlib logging is faster. fapilog's async machinery (queue, batching, background flush) adds overhead that doesn't pay off when the sink is already fast.

### Standard Latency

Per-call latency with fast file I/O:

| Logger | Avg (μs) | Median (μs) | P95 (μs) |
|--------|----------|-------------|----------|
| stdlib | 24 | 12 | 91 |
| fapilog | 279 | 261 | 523 |

**Interpretation:** Similar to throughput, fapilog has higher per-call latency when sinks are fast. The async infrastructure has fixed costs regardless of sink speed.

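The latency figures above (average, median, P95 after a warmup phase) can be approximated with a small harness. The following is a minimal sketch for the stdlib baseline only, not the project's benchmark script: the helper names `measure_latency_us` and `stdlib_baseline` are hypothetical, and the payload is a plain 256-character string rather than JSON.

```python
import logging
import statistics
import tempfile
import time
from pathlib import Path


def measure_latency_us(log_call, iterations=5_000, warmup=1_000):
    """Return avg, median, and P95 per-call latency in microseconds."""
    # Warmup calls are executed but not timed.
    for i in range(warmup):
        log_call(i)

    samples = []
    for i in range(iterations):
        start = time.perf_counter_ns()
        log_call(i)
        samples.append((time.perf_counter_ns() - start) / 1_000)

    samples.sort()
    # Simple nearest-rank style P95 over the sorted samples.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "avg": statistics.fmean(samples),
        "median": statistics.median(samples),
        "p95": samples[p95_index],
    }


def stdlib_baseline(iterations=5_000, warmup=1_000):
    """Time stdlib logging to a temporary file (hypothetical helper)."""
    payload = "x" * 256  # stand-in for the ~256 byte payload
    with tempfile.TemporaryDirectory() as tmp:
        logger = logging.getLogger("latency_sketch")
        logger.setLevel(logging.INFO)
        logger.propagate = False
        handler = logging.FileHandler(Path(tmp) / "bench.log")
        logger.addHandler(handler)
        try:
            return measure_latency_us(
                lambda i: logger.info("event %d %s", i, payload),
                iterations,
                warmup,
            )
        finally:
            logger.removeHandler(handler)
            handler.close()


if __name__ == "__main__":
    print(stdlib_baseline())
```

Absolute numbers from a harness like this depend heavily on the machine and filesystem, so they are best compared against each other rather than against the tables on this page.
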
### Slow Sink Latency (Enterprise Scenario)

Application-side latency when sink I/O is constrained (2ms simulated delay):

| Logger | Avg (μs) | Median (μs) | P95 (μs) |
|--------|----------|-------------|----------|
| stdlib | 2,037 | 2,014 | 2,040 |
| fapilog | 286 | 274 | 483 |

**Latency reduction: 86%**

**Interpretation:** When sink I/O is slow (network sinks, constrained disk, external services), fapilog's non-blocking design prevents the application from stalling. The log call returns immediately while the async backend handles I/O in the background. This is where fapilog's architecture provides value.

### Burst Absorption

Ability to absorb traffic bursts without blocking (20,000 log calls in rapid succession with 2ms sink delay):

| Metric | Value |
|--------|-------|
| Submitted | 22,000 |
| Processed | 12,362 |
| Dropped | 1,712 |
| Queue high-water mark | 10,000 |
| Flush latency | 5.0s |

**Interpretation:** With `drop_on_full=True`, fapilog absorbs bursts up to queue capacity, then gracefully drops overflow rather than blocking the application. Configure queue size based on expected burst patterns.

### Memory

Peak memory during throughput test:

| Logger | Peak (bytes) |
|--------|--------------|
| stdlib | 85,719 |
| fapilog | 10,670,043 |

**Interpretation:** fapilog uses more memory due to its queue, batching buffers, and async infrastructure. This is a deliberate trade-off for non-blocking behavior. Configure `max_queue_size` based on available memory.

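This page does not state how peak memory was captured. One common approach, assumed here purely for illustration, is the stdlib `tracemalloc` module, which reports the peak size of Python-level allocations during a workload. The sketch below measures a stand-in workload that buffers records in a list, roughly mimicking a filled in-memory queue; `measure_peak_bytes` and `buffer_records` are hypothetical names, not part of fapilog.

```python
import tracemalloc


def measure_peak_bytes(workload):
    """Run workload() and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        workload()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def buffer_records(count=20_000, payload_bytes=256):
    # Stand-in for an in-memory queue holding `count` pending records.
    return [b"x" * payload_bytes for _ in range(count)]


if __name__ == "__main__":
    print(f"peak bytes: {measure_peak_bytes(buffer_records):,}")
```

Because `tracemalloc` only sees allocations made through Python's allocators, figures obtained this way can differ from process-level measurements such as resident set size.
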
### Worker Count Impact

The `worker_count` setting controls parallel flush processing and has the largest impact on fapilog throughput:

| Workers | Throughput | Relative |
|---------|------------|----------|
| 1 (default) | ~3,500/sec | 1.0x |
| 2 | ~105,000/sec | **30x** |
| 2 + redaction | ~89,000/sec | 26x |

**Key findings:**

- With `worker_count=1`, the single worker is the bottleneck because it serializes all processing
- 2 workers is optimal - more workers show diminishing returns due to asyncio scheduler overhead (not OS context switching; workers are asyncio tasks, not threads)
- Queue size has minimal impact - larger queues slightly hurt due to memory overhead
- Redaction cost is minimal (~15%) with proper worker count

**Recommendation:** Use 2 workers for production. Production-oriented presets (`production`, `fastapi`, `serverless`, `hardened`) default to 2 workers automatically.

```python
from fapilog import LoggerBuilder, get_logger

# Option 1: Use a production preset (recommended)
logger = get_logger(preset="production")

# Option 2: Explicitly set worker count
logger = LoggerBuilder().with_workers(2).build()
```

See [Performance Tuning](performance-tuning.md) for detailed configuration guidance.

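To make the "workers are asyncio tasks, not threads" point concrete, here is a minimal sketch of several asyncio tasks draining one shared queue while each write awaits a simulated sink delay. It is an illustration under stated assumptions, not fapilog's internal implementation: the `drain` function and its parameters are hypothetical, and the sketch only shows how awaits overlap across tasks. It does not attempt to reproduce the throughput ratios in the table above.

```python
import asyncio
import time


async def drain(items, workers, sink_delay_s=0.002):
    """Drain `items` queued records with `workers` tasks; return the count processed."""
    queue = asyncio.Queue()
    for i in range(items):
        queue.put_nowait(i)

    processed = 0

    async def worker():
        nonlocal processed
        while True:
            try:
                queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to flush
            # Simulated sink write; other workers run while this one waits.
            await asyncio.sleep(sink_delay_s)
            processed += 1

    await asyncio.gather(*(worker() for _ in range(workers)))
    return processed


if __name__ == "__main__":
    for n in (1, 2):
        start = time.perf_counter()
        done = asyncio.run(drain(items=200, workers=n))
        elapsed = time.perf_counter() - start
        print(f"workers={n} processed={done} elapsed={elapsed:.3f}s")
```

All tasks share one event loop and one thread, so adding workers only helps while there is awaitable I/O to overlap; beyond that point extra tasks mostly add scheduling work.
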
## When to Use fapilog

Based on these benchmarks:

| Scenario | Recommendation |
|----------|----------------|
| Fast local file, low volume | stdlib may suffice |
| Network sinks (HTTP, cloud services) | fapilog recommended |
| High-volume with slow I/O | fapilog recommended |
| Latency-sensitive applications | fapilog recommended |
| Burst traffic patterns | fapilog with `drop_on_full=True` |

## Limitations

These results are indicative, not definitive:

- **Single machine** - Development laptop, not production hardware
- **Front-end measurement** - Measures log call latency, not end-to-end delivery
- **Environment-dependent** - Results vary with CPU, disk, Python version, workload
- **Not a substitute for load testing** - Test in your actual environment before deployment

## Reproducing These Results

```bash
python scripts/benchmarking.py --iterations 20000 --latency-iterations 5000
```

Options:

| Flag | Default | Description |
|------|---------|-------------|
| `--iterations` | 20000 | Throughput/memory test iterations |
| `--latency-iterations` | 5000 | Latency test iterations |
| `--payload-bytes` | 256 | Approximate payload size |
| `--slow-sink-ms` | 2.0 | Simulated sink delay for enterprise tests |
| `--burst` | 20000 | Burst size for absorption test |

## Related

- [Performance Tuning](performance-tuning.md) - Configuration recommendations