Performance Benchmarks

Indicative benchmark results comparing fapilog to Python stdlib logging. These results help contextualize performance tuning recommendations.

Methodology

Parameter

Value

Baseline

Python stdlib logging to file

Test

fapilog with rotating file sink

Metrics

Throughput (logs/sec), latency (μs), peak memory (bytes)

Warmup

1,000 calls before measurement

Iterations

20,000 (throughput/memory), 5,000 (latency)

Payload

~256 bytes JSON

Two scenarios are measured:

  1. Standard benchmark - Raw log call rate with fast file I/O

  2. Slow sink benchmark - Application-side latency when sink I/O is constrained (2ms simulated delay)

Environment

Component

Value

Python

3.13.7

OS

macOS 24.6.0 (Darwin)

CPU

Apple M1 Max

Memory

32 GB

fapilog

0.3.6

Results

Standard Throughput

Raw log call throughput with fast file I/O:

Logger

Logs/sec

Relative

stdlib

90,393

1.0x

fapilog

3,295

0.04x

Interpretation: For raw throughput to a fast local file, stdlib logging is faster. fapilog’s async machinery (queue, batching, background flush) adds overhead that doesn’t pay off when the sink is already fast.

Standard Latency

Per-call latency with fast file I/O:

Logger

Avg (μs)

Median (μs)

P95 (μs)

stdlib

24

12

91

fapilog

279

261

523

Interpretation: Similar to throughput, fapilog has higher per-call latency when sinks are fast. The async infrastructure has fixed costs regardless of sink speed.

Slow Sink Latency (Enterprise Scenario)

Application-side latency when sink I/O is constrained (2ms simulated delay):

Logger

Avg (μs)

Median (μs)

P95 (μs)

stdlib

2,037

2,014

2,040

fapilog

286

274

483

Latency reduction: 86%

Interpretation: When sink I/O is slow (network sinks, constrained disk, external services), fapilog’s non-blocking design prevents the application from stalling. The log call returns immediately while the async backend handles I/O in the background. This is where fapilog’s architecture provides value.

Burst Absorption

Ability to absorb traffic bursts without blocking (20,000 log calls in rapid succession with 2ms sink delay):

Metric

Value

Submitted

22,000

Processed

12,362

Dropped

1,712

Queue high-water mark

10,000

Flush latency

5.0s

Interpretation: With drop_on_full=True, fapilog absorbs bursts up to queue capacity, then gracefully drops overflow rather than blocking the application. Configure queue size based on expected burst patterns.

Memory

Peak memory during throughput test:

Logger

Peak (bytes)

stdlib

85,719

fapilog

10,670,043

Interpretation: fapilog uses more memory due to its queue, batching buffers, and async infrastructure. This is a deliberate trade-off for non-blocking behavior. Configure max_queue_size based on available memory.

Worker Count Impact

The worker_count setting controls parallel flush processing and has the largest impact on fapilog throughput:

Workers

Throughput

Relative

1 (default)

~3,500/sec

1.0x

2

~105,000/sec

30x

2 + redaction

~89,000/sec

26x

Key findings:

  • Workers are the bottleneck with worker_count=1 (serializes all processing)

  • 2 workers is optimal - more shows diminishing returns due to asyncio scheduler overhead (not OS context switching—workers are asyncio tasks, not threads)

  • Queue size has minimal impact - larger queues slightly hurt due to memory overhead

  • Redaction cost is minimal (~15%) with proper worker count

Recommendation: Use 2 workers for production. Production-oriented presets (production, fastapi, serverless, hardened) default to 2 workers automatically.

# Option 1: Use a production preset (recommended)
logger = get_logger(preset="production")

# Option 2: Explicitly set worker count
logger = LoggerBuilder().with_workers(2).build()

See Performance Tuning for detailed configuration guidance.

When to Use fapilog

Based on these benchmarks:

Scenario

Recommendation

Fast local file, low volume

stdlib may suffice

Network sinks (HTTP, cloud services)

fapilog recommended

High-volume with slow I/O

fapilog recommended

Latency-sensitive applications

fapilog recommended

Burst traffic patterns

fapilog with drop_on_full=True

Limitations

These results are indicative, not definitive:

  • Single machine - Development laptop, not production hardware

  • Front-end measurement - Measures log call latency, not end-to-end delivery

  • Environment-dependent - Results vary with CPU, disk, Python version, workload

  • Not a substitute for load testing - Test in your actual environment before deployment

Reproducing These Results

python scripts/benchmarking.py --iterations 20000 --latency-iterations 5000

Options:

Flag

Default

Description

--iterations

20000

Throughput/memory test iterations

--latency-iterations

5000

Latency test iterations

--payload-bytes

256

Approximate payload size

--slow-sink-ms

2.0

Simulated sink delay for enterprise tests

--burst

20000

Burst size for absorption test