# Graceful shutdown & flushing logs (don't lose logs on deploy)

Your last log before a crash might be the most important one. When a container is killed during deployment, buffered logs can be lost forever, taking critical debugging information with them.

## The Problem: Lost Logs on Shutdown

Async loggers buffer events for performance. When Kubernetes sends SIGTERM, your app has limited time to flush before SIGKILL arrives:

```
App receives SIGTERM
├── Pending logs in queue: 47 events
├── Kubernetes grace period: 30 seconds
├── Time to flush: ~100ms
└── Result: Logs written ✓

vs.

App receives SIGKILL (no grace period)
├── Pending logs in queue: 47 events
├── Time to flush: 0ms
└── Result: Logs lost ✗
```

Common scenarios where logs get lost:

- **Deployment rollouts** - Container replaced before buffer drains
- **Pod evictions** - Memory pressure triggers immediate termination
- **Crash loops** - App exits before async flush completes
- **Scale-down** - Replicas removed during queue drain

The debugging pain is real: "I know there was an error log right before the crash, but I can't find it."

## The Solution: Lifespan Integration

With fapilog's FastAPI integration, logs are automatically flushed on graceful shutdown:

```python
from fastapi import FastAPI
from fapilog.fastapi import FastAPIBuilder

app = FastAPI(
    lifespan=FastAPIBuilder()
        .with_preset("fastapi")
        .build()  # Automatic flush on shutdown
)
```

When your app receives SIGTERM, the lifespan ensures:

1. No new requests are accepted
2. In-flight requests complete
3. Log buffer is flushed
4. Logger workers are stopped

This is the recommended approach for FastAPI applications.
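
To make the flush and stop steps concrete, here is a minimal standard-library sketch of a lifespan-style context manager that drains a queue on exit. `QueueLogger` and `lifespan` are hypothetical stand-ins written for illustration only; they are not fapilog's implementation, and a real integration would also handle timeouts and sink errors.

```python
import asyncio
from contextlib import asynccontextmanager, suppress
from typing import Optional


class QueueLogger:
    """Hypothetical stand-in for an async logger: one queue, one worker."""

    def __init__(self) -> None:
        self.queue = asyncio.Queue()
        self.written = []  # stands in for a real sink
        self._worker: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Must be called while an event loop is running.
        self._worker = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while True:
            event = await self.queue.get()
            self.written.append(event)
            self.queue.task_done()

    def info(self, message: str) -> None:
        # Buffer the event; the worker writes it later.
        self.queue.put_nowait(message)

    async def drain(self) -> int:
        # Wait until every queued event has been written...
        await self.queue.join()
        # ...then stop the background worker.
        if self._worker is not None:
            self._worker.cancel()
            with suppress(asyncio.CancelledError):
                await self._worker
        return len(self.written)


@asynccontextmanager
async def lifespan(app):
    """Start the logger on startup and drain it on shutdown."""
    logger = QueueLogger()
    logger.start()
    try:
        yield logger
    finally:
        await logger.drain()


async def demo() -> list:
    # `app=None` because this sketch does not depend on a web framework.
    async with lifespan(app=None) as logger:
        for i in range(47):
            logger.info(f"event {i}")
    return logger.written
```

Running `asyncio.run(demo())` buffers 47 events without writing any of them, and the `finally` block is what guarantees they reach the sink before the worker stops.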

## Manual Flush for Custom Scenarios

For non-FastAPI apps or custom shutdown handlers, use `drain()` directly:

```python
from fapilog import get_async_logger

# Run this inside a coroutine; `await` is not valid at module top level.
logger = await get_async_logger()

async def shutdown():
    """Custom shutdown handler."""
    result = await logger.drain()
    print(f"Flushed {result.processed} logs")
```

The `drain()` method:

- Flushes all queued events
- Stops background workers
- Returns statistics about what was processed

### DrainResult Statistics

```python
result = await logger.drain()

print(f"Submitted: {result.submitted}")  # Total events submitted
print(f"Processed: {result.processed}")  # Events successfully written
print(f"Dropped: {result.dropped}")  # Events dropped (backpressure)
print(f"Latency: {result.flush_latency_seconds:.3f}s")  # Time to flush

# Adaptive pipeline summary (when using preset="adaptive")
if result.adaptive is not None:
    print(f"Peak pressure: {result.adaptive.peak_pressure_level.value}")
    print(f"Escalations: {result.adaptive.escalation_count}")
    print(f"Peak workers: {result.adaptive.peak_workers}")
```

Use these stats to monitor shutdown health in your observability stack.

## Timeout Handling

### Default Behavior

The FastAPI lifespan uses a 5-second drain timeout.
If flushing takes longer, a warning is emitted but the app continues shutdown:

```
[WARN] fapilog: logger drain timeout (timeout=5.0)
```

### Configuring Timeout

For the manual approach with an explicit timeout:

```python
import asyncio
from fapilog import get_async_logger

logger = await get_async_logger()

async def shutdown():
    try:
        await asyncio.wait_for(logger.drain(), timeout=10.0)
    except asyncio.TimeoutError:
        print("Drain timed out - some logs may be lost")
```

### Builder Configuration

Configure the default shutdown timeout via the builder:

```python
from fapilog import LoggerBuilder

logger = (
    LoggerBuilder()
    .with_shutdown_timeout("5s")  # Maximum time to flush on shutdown
    .add_stdout()
    .build()
)
```

Or via environment variable:

```bash
export FAPILOG_CORE__SHUTDOWN_TIMEOUT_SECONDS=5.0
```

### What Happens When the Timeout Is Exceeded

When drain times out:

1. A diagnostic warning is emitted
2. Remaining queued logs are abandoned
3. Shutdown continues

This is a tradeoff: waiting indefinitely would hang shutdown, but timing out loses logs. Choose a timeout that matches your Kubernetes `terminationGracePeriodSeconds` minus time for other shutdown tasks.
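
One way to pick that timeout is to subtract the time reserved for other shutdown work, plus a small safety margin, from the grace period. The sketch below uses hypothetical helper names (`drain_timeout_budget`, `drain_with_timeout`) that are not part of fapilog, and it accepts any zero-argument coroutine function in place of `logger.drain`. The 2-second default margin is an assumption, not a recommendation from the library.

```python
import asyncio
from typing import Awaitable, Callable


def drain_timeout_budget(
    grace_period_s: float,
    other_shutdown_s: float,
    safety_margin_s: float = 2.0,  # assumed margin; tune for your workload
) -> float:
    """Seconds left for draining logs inside the pod's grace period."""
    budget = grace_period_s - other_shutdown_s - safety_margin_s
    if budget <= 0:
        raise ValueError("no time left for draining logs; raise the grace period")
    return budget


async def drain_with_timeout(
    drain: Callable[[], Awaitable[object]],
    timeout_s: float,
) -> bool:
    """Await drain(), giving up after timeout_s. Returns True if it finished."""
    try:
        await asyncio.wait_for(drain(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Anything still queued at this point is abandoned.
        return False
    return True
```

With a 30-second grace period and 8 seconds reserved for other shutdown tasks, `drain_timeout_budget(30, 8)` leaves 20 seconds for draining. The boolean from `drain_with_timeout` lets the shutdown handler record whether logs may have been lost.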

## Best Practices

### Match Kubernetes Grace Period

```yaml
# kubernetes deployment
spec:
  terminationGracePeriodSeconds: 30  # Total shutdown time
```

```python
# app.py - leave headroom for other shutdown tasks
from fapilog import LoggerBuilder

logger = (
    LoggerBuilder()
    .with_shutdown_timeout("20s")  # 20s for logs, 10s buffer
    .add_stdout()
    .build()
)
```

### Handle Crash Scenarios

For truly critical logs, consider sync sinks for ERROR level:

```python
from fapilog import LoggerBuilder

logger = (
    LoggerBuilder()
    .with_routing(
        rules=[
            # Errors go to sync sink (immediate write)
            {"levels": ["ERROR", "CRITICAL"], "sinks": ["stderr"]},
            # Other levels go to async sink (buffered)
            {"levels": ["DEBUG", "INFO", "WARNING"], "sinks": ["stdout"]},
        ],
    )
    .add_stdout()
    .add_stderr()
    .build()
)
```

### Monitor Drain Statistics

Export drain metrics on shutdown:

```python
# `metrics` is a placeholder for your application's metrics client.
async def shutdown():
    result = await logger.drain()

    # Send to metrics before app exits
    metrics.record_gauge("log_drain_latency", result.flush_latency_seconds)
    metrics.record_gauge("log_drain_dropped", result.dropped)
```

## Going Deeper

- [Non-blocking Async Logging](non-blocking-async-logging.md) - Backpressure and queue configuration
- [Log Sampling and Rate Limiting](log-sampling-rate-limiting.md) - Control volume before it hits the queue
- [Why Fapilog?](../why-fapilog.md) - How fapilog compares to other logging libraries