Graceful shutdown & flushing logs (don’t lose logs on deploy)

Your last log before a crash might be the most important one. When a container is killed during a deployment, buffered logs can be lost for good, taking critical debugging information with them.

The Problem: Lost Logs on Shutdown

Async loggers buffer events for performance. When Kubernetes sends SIGTERM, your app has limited time to flush before SIGKILL arrives:

App receives SIGTERM
├── Pending logs in queue: 47 events
├── Kubernetes grace period: 30 seconds
├── Time to flush: ~100ms
└── Result: Logs written ✓

vs.

App receives SIGKILL (no grace period)
├── Pending logs in queue: 47 events
├── Time to flush: 0ms
└── Result: Logs lost ✗
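
To make the two outcomes concrete, here is a minimal standard-library sketch of a buffered logger. `ToyBufferedLogger` and its `sink` list are hypothetical stand-ins, not fapilog's implementation. The only point is that queued events reach the sink when something gets the chance to flush them, and SIGKILL never gives the process that chance.

```python
from collections import deque


class ToyBufferedLogger:
    """Hypothetical stand-in for an async logger: events wait in memory."""

    def __init__(self):
        self.queue = deque()  # pending events, not yet written anywhere
        self.sink = []        # pretend this is stdout or a file

    def log(self, message):
        # Cheap for the caller: the event is only queued here.
        self.queue.append(message)

    def flush(self):
        """Write every queued event to the sink; return how many were written."""
        written = 0
        while self.queue:
            self.sink.append(self.queue.popleft())
            written += 1
        return written


logger = ToyBufferedLogger()
for i in range(47):
    logger.log(f"event {i}")

# SIGKILL case: the process would end here with these events still in memory.
pending_before_flush = len(logger.queue)

# SIGTERM case: a shutdown hook gets to run flush() before the process exits.
flushed = logger.flush()
```

Everything counted in `pending_before_flush` exists only in process memory until `flush()` runs, which is why the grace period matters.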

Common scenarios where logs get lost:

  • Deployment rollouts - Container replaced before buffer drains

  • Pod evictions - Memory pressure triggers immediate termination

  • Crash loops - App exits before async flush completes

  • Scale-down - Replicas removed during queue drain

The debugging pain is real: “I know there was an error log right before the crash, but I can’t find it.”

The Solution: Lifespan Integration

With fapilog’s FastAPI integration, logs are automatically flushed on graceful shutdown:

from fastapi import FastAPI
from fapilog.fastapi import FastAPIBuilder

app = FastAPI(
    lifespan=FastAPIBuilder()
        .with_preset("fastapi")
        .build()  # Automatic flush on shutdown
)

When your app receives SIGTERM, the lifespan ensures:

  1. No new requests are accepted

  2. In-flight requests complete

  3. Log buffer is flushed

  4. Logger workers are stopped

This is the recommended approach for FastAPI applications.
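
For intuition about steps 3 and 4, the sketch below shows the general shape of a lifespan-style context manager that flushes on exit. It uses only the standard library. `ToyQueueLogger`, this `lifespan` function, and the `SimpleNamespace` app are hypothetical illustrations, and nothing here reflects how FastAPIBuilder is actually implemented.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class ToyQueueLogger:
    """Hypothetical logger: a queue plus one background worker task."""

    def __init__(self):
        self.queue = asyncio.Queue()
        self.written = []
        self.worker = None

    async def _run(self):
        while True:
            event = await self.queue.get()
            self.written.append(event)
            self.queue.task_done()

    def start(self):
        self.worker = asyncio.create_task(self._run())

    def log(self, event):
        self.queue.put_nowait(event)

    async def drain(self):
        await self.queue.join()  # step 3: wait until the buffer is empty
        self.worker.cancel()     # step 4: stop the background worker
        try:
            await self.worker
        except asyncio.CancelledError:
            pass


@asynccontextmanager
async def lifespan(app):
    app.state.logger = ToyQueueLogger()
    app.state.logger.start()
    try:
        yield                            # the application serves requests here
    finally:
        await app.state.logger.drain()   # runs once shutdown begins


async def demo():
    app = SimpleNamespace(state=SimpleNamespace())  # stand-in for a real app
    async with lifespan(app):
        for i in range(5):
            app.state.logger.log(f"request {i}")
    return app


toy_app = asyncio.run(demo())
```

Putting the drain in a `finally` block is the design choice that matters: the flush runs whether the application body exits normally or raises.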

Manual Flush for Custom Scenarios

For non-FastAPI apps or custom shutdown handlers, use drain() directly:

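from fapilog import get_async_logger

async def shutdown(logger):
    """Custom shutdown handler."""
    result = await logger.drain()
    print(f"Flushed {result.processed} logs")

async def main():
    # await is only valid inside a coroutine, so acquire the logger here
    logger = await get_async_logger()
    try:
        ...  # run the application, logging through `logger`
    finally:
        await shutdown(logger)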

The drain() method:

  • Flushes all queued events

  • Stops background workers

  • Returns statistics about what was processed
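
In a plain asyncio service, something still has to call the shutdown handler when SIGTERM arrives. One common pattern, sketched here with the standard library only, is to let the signal set an `asyncio.Event` and drain once that event fires. `fake_drain` is a hypothetical placeholder for `logger.drain()`, and `loop.add_signal_handler` is available on Unix event loops only.

```python
import asyncio
import signal


async def fake_drain():
    """Hypothetical placeholder for `await logger.drain()`."""
    await asyncio.sleep(0)
    return "drained"


async def run_until_stopped(stop, drain):
    """Block until `stop` is set, then drain before returning."""
    await stop.wait()
    return await drain()


async def main():
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    # Unix only: SIGTERM now sets the event instead of ending the process.
    loop.add_signal_handler(signal.SIGTERM, stop.set)
    return await run_until_stopped(stop, fake_drain)


# Typical entry point (blocks until the process receives SIGTERM):
#     asyncio.run(main())
```

Keeping the signal handler down to `stop.set` means the drain itself runs as ordinary coroutine code rather than inside the handler.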

DrainResult Statistics

result = await logger.drain()

print(f"Submitted: {result.submitted}")      # Total events submitted
print(f"Processed: {result.processed}")      # Events successfully written
print(f"Dropped: {result.dropped}")          # Events dropped (backpressure)
print(f"Latency: {result.flush_latency_seconds:.3f}s")  # Time to flush

# Adaptive pipeline summary (when using preset="adaptive")
if result.adaptive is not None:
    print(f"Peak pressure: {result.adaptive.peak_pressure_level.value}")
    print(f"Escalations: {result.adaptive.escalation_count}")
    print(f"Peak workers: {result.adaptive.peak_workers}")

Use these stats to monitor shutdown health in your observability stack.
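
One way to act on these numbers is to reduce them to a single structured record and a pass/fail flag before the process exits. The helper below is a sketch: `summarize_drain`, its `max_latency_seconds` threshold, and the `SimpleNamespace` stand-in are hypothetical. Only the attribute names shown above (`submitted`, `processed`, `dropped`, `flush_latency_seconds`) come from DrainResult.

```python
from types import SimpleNamespace


def summarize_drain(result, max_latency_seconds=5.0):
    """Turn drain statistics into a dict suitable for a final log line."""
    summary = {
        "submitted": result.submitted,
        "processed": result.processed,
        "dropped": result.dropped,
        "flush_latency_seconds": round(result.flush_latency_seconds, 3),
    }
    # "Healthy" here means nothing was dropped and the flush stayed in budget.
    summary["healthy"] = (
        result.dropped == 0
        and result.flush_latency_seconds <= max_latency_seconds
    )
    return summary


# Stand-in object with the same attribute names as the stats above.
example = SimpleNamespace(
    submitted=50, processed=47, dropped=3, flush_latency_seconds=0.1042
)
report = summarize_drain(example)
```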

Timeout Handling

Default Behavior

The FastAPI lifespan uses a 5-second drain timeout. If flushing takes longer, a warning is emitted and the app continues shutting down:

[WARN] fapilog: logger drain timeout (timeout=5.0)

Configuring Timeout

With the manual approach, set an explicit timeout by wrapping drain() in asyncio.wait_for:

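import asyncio
from fapilog import get_async_logger

async def shutdown(logger):
    try:
        await asyncio.wait_for(logger.drain(), timeout=10.0)
    except asyncio.TimeoutError:
        print("Drain timed out - some logs may be lost")

async def main():
    # await is only valid inside a coroutine, so acquire the logger here
    logger = await get_async_logger()
    try:
        ...  # run the application, logging through `logger`
    finally:
        await shutdown(logger)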

Builder Configuration

Configure the default shutdown timeout via the builder:

from fapilog import LoggerBuilder

logger = (
    LoggerBuilder()
    .with_shutdown_timeout("5s")  # Maximum time to flush on shutdown
    .add_stdout()
    .build()
)

Or via environment variable:

export FAPILOG_CORE__SHUTDOWN_TIMEOUT_SECONDS=5.0

What Happens When the Timeout Is Exceeded

When drain times out:

  1. A diagnostic warning is emitted

  2. Remaining queued logs are abandoned

  3. Shutdown continues

This is a tradeoff: waiting indefinitely would hang shutdown, but timing out loses logs. Choose a timeout no larger than your Kubernetes terminationGracePeriodSeconds minus the time other shutdown tasks need.
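
The abandonment behavior can be illustrated with `asyncio.wait_for` and a deliberately slow drain. This is a toy model under stated assumptions: `slow_drain`, `pending`, and the timings are hypothetical, and fapilog's real drain is not shown here.

```python
import asyncio
from collections import deque


async def slow_drain(pending, written, seconds_per_event):
    """Hypothetical drain: writes one queued event per `seconds_per_event`."""
    while pending:
        await asyncio.sleep(seconds_per_event)  # simulate a slow sink
        written.append(pending.popleft())       # dequeue only after the write


async def shutdown(pending, written, timeout):
    try:
        await asyncio.wait_for(
            slow_drain(pending, written, seconds_per_event=0.05), timeout
        )
        return True
    except asyncio.TimeoutError:
        # wait_for cancelled slow_drain; whatever is still queued is abandoned.
        return False


pending = deque(f"event {i}" for i in range(100))
written = []
completed = asyncio.run(shutdown(pending, written, timeout=0.2))
abandoned = len(pending)
```

With these timings the drain cannot finish, so `completed` is false and `abandoned` counts the events that never reached the sink.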

Best Practices

Match Kubernetes Grace Period

# kubernetes deployment
spec:
  terminationGracePeriodSeconds: 30  # Total shutdown time

# app.py - leave headroom for other shutdown tasks
from fapilog import LoggerBuilder

logger = (
    LoggerBuilder()
    .with_shutdown_timeout("20s")  # 20s for logs, 10s buffer
    .add_stdout()
    .build()
)
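
The budget arithmetic behind these numbers is simple enough to encode and check at startup. `drain_timeout_budget` is a hypothetical helper, not part of fapilog. The `pre_stop_seconds` parameter is an added assumption, included because a Kubernetes preStop hook runs inside the same grace period.

```python
def drain_timeout_budget(
    grace_period_seconds, other_shutdown_seconds, pre_stop_seconds=0.0
):
    """Return how many seconds of the grace period are left for flushing logs."""
    budget = grace_period_seconds - pre_stop_seconds - other_shutdown_seconds
    if budget <= 0:
        raise ValueError(
            "no time left for log flushing; raise terminationGracePeriodSeconds"
        )
    return budget


# 30s grace period, 10s reserved for other shutdown work.
timeout = drain_timeout_budget(30, 10)

# The same deployment with a 5s preStop hook leaves less room.
timeout_with_hook = drain_timeout_budget(30, 10, pre_stop_seconds=5)
```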

Handle Crash Scenarios

For truly critical logs, consider sync sinks for ERROR level:

from fapilog import LoggerBuilder

logger = (
    LoggerBuilder()
    .with_routing(
        rules=[
            # Errors go to sync sink (immediate write)
            {"levels": ["ERROR", "CRITICAL"], "sinks": ["stderr"]},
            # Other levels go to async sink (buffered)
            {"levels": ["DEBUG", "INFO", "WARNING"], "sinks": ["stdout"]},
        ],
    )
    .add_stdout()
    .add_stderr()
    .build()
)
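
The same split can be seen in miniature with Python's standard `logging` module, which may help when reasoning about what "sync" and "buffered" mean here. This is an analogy, not fapilog's routing: `build_split_logger` is a hypothetical helper in which ERROR and above are written directly, while lower levels pass through a `QueueHandler` and a `QueueListener`.

```python
import io
import logging
import logging.handlers
import queue


def build_split_logger(sync_stream, buffered_stream):
    """Return (logger, listener): errors written directly, the rest queued."""
    log_queue = queue.Queue()

    # Buffered path: records below ERROR are only enqueued by the caller.
    queue_handler = logging.handlers.QueueHandler(log_queue)
    queue_handler.addFilter(lambda record: record.levelno < logging.ERROR)

    # Direct path: ERROR and CRITICAL are written before the log call returns.
    sync_handler = logging.StreamHandler(sync_stream)
    sync_handler.setLevel(logging.ERROR)

    # The listener's background thread moves queued records to the real sink.
    listener = logging.handlers.QueueListener(
        log_queue, logging.StreamHandler(buffered_stream)
    )

    logger = logging.getLogger("split_routing_sketch")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(queue_handler)
    logger.addHandler(sync_handler)
    return logger, listener


sync_out, buffered_out = io.StringIO(), io.StringIO()
log, listener = build_split_logger(sync_out, buffered_out)

log.error("payment failed")  # already in sync_out when this call returns
log.info("request served")   # still sitting in the queue
seen_before_start = buffered_out.getvalue()

listener.start()
listener.stop()              # processes everything queued, then joins the thread
```

If the process died before `listener.start()`, the error line would already be written and the info line would be lost.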

Monitor Drain Statistics

Export drain metrics on shutdown:

async def shutdown():
    result = await logger.drain()
    # Send to your metrics client before the app exits
    # (`metrics` is a placeholder for whatever client your app uses)
    metrics.record_gauge("log_drain_latency", result.flush_latency_seconds)
    metrics.record_gauge("log_drain_dropped", result.dropped)

Going Deeper