
Zero-Downtime Docker Compose: Bypassing the Kubernetes Cargo Cult

Why naive rolling updates fail under load, and how to achieve true zero-downtime deployments using HAProxy or docker-rollout.

Emeka Okafor
Security Editor · Jun 24, 2026 · 5 min read

The industry has a collective obsession with Kubernetes. We are told that if you want to run a serious production service with zero downtime, you must run a multi-node cluster, manage etcd, and write thousands of lines of YAML. It is a mass delusion. For many applications, Docker Compose is more than enough to handle production workloads without the operational overhead of a massive orchestrator.

But there is a catch. If you just run docker compose up -d, Compose stops the old container before it starts the new one. That leaves a window of dropped requests lasting as long as your application takes to stop and boot, typically 10 to 20 seconds. Scaling up and down to avoid this introduces subtle race conditions that drop traffic under load. To achieve true zero downtime without Kubernetes, you have to understand why naive proxy setups fail and how to configure your routing layer to handle the transition gracefully.
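To see that window for yourself, you can hammer the service while a deploy runs and tally what comes back. The sketch below is a hypothetical probe, not part of any tool mentioned here. It assumes the stack answers on a URL you pass in (for example http://localhost/), and it counts status codes and dropped connections.

```python
import http.client
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def fetch_status(url: str, timeout: float = 2.0) -> str:
    """Return the HTTP status as a string, or "conn-error" if no response came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        # The proxy answered, but with an error status such as 502 or 503.
        return str(exc.code)
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, reset or timed-out connection: the request was dropped.
        return "conn-error"


def probe(url: str, duration_s: float, interval_s: float = 0.05,
          fetch: Callable[[str], str] = fetch_status) -> Counter:
    """Request `url` repeatedly for `duration_s` seconds and tally the outcomes."""
    outcomes: Counter = Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        outcomes[fetch(url)] += 1
        time.sleep(interval_s)
    return outcomes
```

Run something like print(probe("http://localhost/", duration_s=30)) in one terminal while docker compose up -d recreates the service in another. Any 502 or conn-error entries in the result are requests your users would have lost.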

The Traefik Trap: Why Naive Rolling Deploys Fail

Many developers reach for Traefik because of its automatic Docker service discovery. You define some labels, run a rolling deploy, and expect it to work. It does not.

If you try to spin up a new service alongside the old one using the same labels, Traefik's Docker provider treats each service as a separate configuration source. It throws a "Service defined multiple times" error and returns a 404 on every request. There is no fallback or configuration merge; it simply refuses to route traffic.

To bypass this, you might try scaling the service instead (for example, running docker compose up -d --scale backend=4), then scaling back down to 2 once the new containers are healthy. This exposes a scale-down race. Traefik's internal routing table lags behind Docker's actual state, so when you scale down, Traefik keeps routing traffic to containers that are already shutting down, causing a flood of 502 errors.

The fatal flaw, however, is Traefik's retry behavior. When a container receives a SIGTERM, it begins a graceful shutdown. But there is a window where in-flight requests, or new requests arriving before the routing table updates, hit the dying container. The connection drops mid-stream. Traefik's retry middleware exists, but it retries the request on the same dying backend. It does not redispatch to a healthy container. The request is lost.

The HAProxy Solution: True Redispatching

To solve this, you need a load balancer that can dynamically re-route failed requests to a different, healthy backend mid-flight. This is where HAProxy shines.

The key is HAProxy's option redispatch, combined with its retries and retry-on settings. When a request fails, HAProxy does not blindly retry the same IP; it redispatches the request to another active container.
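The difference between the two retry policies is easy to see in a toy model. The sketch below is a hypothetical simulation, not HAProxy's or Traefik's actual code. The "sticky" policy models this article's description of Traefik (retry on the same backend), and the redispatch policy moves each retry to the next server in round-robin order.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    alive: bool = True

    def handle(self, request: str) -> str:
        # A container that received SIGTERM resets the connection instead of answering.
        if not self.alive:
            raise ConnectionResetError(f"{self.name} is shutting down")
        return f"200 OK from {self.name}"


class Proxy:
    """Toy round-robin proxy; `redispatch` decides where retries go."""

    def __init__(self, backends, retries: int = 3, redispatch: bool = True):
        self.backends = backends
        self.retries = retries
        self.redispatch = redispatch
        self._rr = itertools.cycle(backends)

    def forward(self, request: str) -> str:
        target = next(self._rr)
        for _ in range(self.retries + 1):  # first attempt plus `retries` retries
            try:
                return target.handle(request)
            except ConnectionResetError:
                if self.redispatch:
                    target = next(self._rr)  # retry on a different server
                # without redispatch, the loop keeps hitting the same dying backend
        return "502 Bad Gateway"
```

With one dying and one healthy container, the sticky proxy burns all four attempts on the dying one and returns 502. The redispatching proxy returns the healthy container's 200 on its second attempt.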

Here is a production-grade HAProxy configuration designed for Docker Compose rolling deploys:

global
    log stdout format raw local0 info
    maxconn 4096

defaults
    mode http
    log global                      # send proxy logs to the stdout target above
    timeout connect 3s
    timeout client 30s
    timeout server 30s
    retries 3                       # up to three retries per request
    option redispatch 1             # move every retry to a different server
    retry-on conn-failure empty-response response-timeout 502 503 504

resolvers docker_dns
    nameserver dns1 127.0.0.11:53   # Docker's embedded DNS server
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold valid 2s

frontend http_in
    bind *:80
    default_backend backends

backend backends
    balance roundrobin
    option httpchk                           # enable HTTP health checks
    http-check send meth GET uri /health     # HAProxy 2.2+ syntax
    http-check expect status 200
    default-server inter 1s fall 1 rise 1 check resolvers docker_dns init-addr none observe layer7 error-limit 3 on-error mark-down
    server-template backend 1-10 backend:8000 check

Let's break down why this works:

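  1. option redispatch 1 combined with retries 3 and retry-on: If a backend returns a 502, 503 or 504, sends an empty response, times out, or refuses the connection, HAProxy retries the request on a different container, up to three times. The 1 makes HAProxy redispatch on every retry instead of only the last one.
  2. resolvers docker_dns: It queries Docker's internal DNS resolver (at 127.0.0.11) and holds the IP addresses as valid for only 2 seconds (hold valid 2s). This keeps the routing table closely synchronized with Docker's state.
  3. server-template backend 1-10 backend:8000 check: Instead of hardcoding container IPs, this creates ten server slots (backend1 to backend10) and fills them with the addresses Docker DNS returns for the backend service. Slots without an address stay in maintenance until a replica appears, so the setup supports up to 10 replicas.
  4. default-server inter 1s fall 1 rise 1 ... observe layer7 error-limit 3 on-error mark-down: Active health checks run every second, and a single failed check takes a server out of rotation. observe layer7 also watches live traffic and marks a server down after three consecutive errors, even between checks. init-addr none lets HAProxy start before any backend container resolves.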
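The DNS side of server-template can be sketched in a few lines. This hypothetical helper is not HAProxy code. It resolves a Compose service name the way any container on the same network would, then maps the answers onto fixed slots named like HAProxy's. The slot order here is arbitrary.

```python
import socket
from typing import Dict, List, Optional


def resolve_replicas(service: str, port: int) -> List[str]:
    """Return every IPv4 address the resolver reports for `service`.

    Inside a Compose network, Docker's embedded DNS server (127.0.0.11)
    answers a service name with one A record per running replica.
    """
    infos = socket.getaddrinfo(service, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def fill_slots(prefix: str, count: int, addresses: List[str]) -> Dict[str, Optional[str]]:
    """Assign addresses to fixed slots prefix1..prefixN, in the spirit of server-template.

    Empty slots map to None (HAProxy keeps those servers in maintenance);
    addresses beyond `count` are ignored because there is no slot for them.
    """
    slots: Dict[str, Optional[str]] = {}
    for i in range(1, count + 1):
        slots[f"{prefix}{i}"] = addresses[i - 1] if i <= len(addresses) else None
    return slots
```

From a container attached to the same Compose network, fill_slots("backend", 10, resolve_replicas("backend", 8000)) approximates the server list HAProxy builds. Scaling the service changes how many slots receive an address.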

The Alternative: Orchestrated Rollouts with docker-rollout

If you prefer not to manage HAProxy configurations, another approach is to use docker-rollout, a lightweight Docker CLI plugin.

Instead of running docker compose up -d, you run docker rollout <service>. The plugin scales the service to twice its current instance count, waits for the new containers to pass their health checks, and then tears down the old containers. It does not reconfigure your proxy itself; it relies on a proxy with Docker service discovery to pick up the new containers and drop the old ones.
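The sequence can be sketched with plain Docker CLI calls. This is a hypothetical approximation of the steps described above, not docker-rollout's source. It assumes the service defines a healthcheck, and that any pre-stop hook command can run under sh inside the old containers.

```python
import subprocess
import time
from typing import List, Optional


def compose_ids(service: str) -> List[str]:
    """IDs of the running containers for a Compose service."""
    out = subprocess.run(["docker", "compose", "ps", "-q", service],
                         check=True, capture_output=True, text=True).stdout
    return out.split()


def health_status(container_id: str) -> str:
    """Docker's health state for a container: starting, healthy or unhealthy."""
    out = subprocess.run(["docker", "inspect", "--format", "{{.State.Health.Status}}", container_id],
                         check=True, capture_output=True, text=True).stdout
    return out.strip()


def scale_up_cmd(service: str, current: int) -> List[str]:
    # --no-recreate keeps the old containers running while the new ones start.
    return ["docker", "compose", "up", "-d", "--no-recreate",
            "--scale", f"{service}={current * 2}", service]


def retire_cmds(old_ids: List[str], pre_stop_hook: Optional[str] = None) -> List[List[str]]:
    cmds: List[List[str]] = []
    if pre_stop_hook:
        # Every old container runs the hook (one after another here) before any is stopped.
        cmds += [["docker", "exec", cid, "sh", "-c", pre_stop_hook] for cid in old_ids]
    for cid in old_ids:
        cmds += [["docker", "stop", cid], ["docker", "rm", cid]]
    return cmds


def rollout(service: str, pre_stop_hook: Optional[str] = None, timeout_s: float = 60.0) -> None:
    old = compose_ids(service)
    subprocess.run(scale_up_cmd(service, len(old)), check=True)
    new = [cid for cid in compose_ids(service) if cid not in old]
    deadline = time.monotonic() + timeout_s
    while True:
        states = [health_status(cid) for cid in new]
        if all(s == "healthy" for s in states):
            break
        if "unhealthy" in states or time.monotonic() > deadline:
            raise RuntimeError(f"new {service} containers are not healthy: {states}")
        time.sleep(1)
    for cmd in retire_cmds(old, pre_stop_hook):
        subprocess.run(cmd, check=True)
```

The ordering is what matters: the old containers keep serving until every new one reports healthy, and nothing is stopped before draining has had its chance.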

To prevent dropped requests during the teardown, you must implement container draining. This is done by configuring a pre-stop hook that marks the old container as unhealthy before it is killed, giving the proxy time to stop sending traffic.

First, install the plugin:

mkdir -p ~/.docker/cli-plugins
curl -L https://raw.githubusercontent.com/wowu/docker-rollout/master/docker-rollout -o ~/.docker/cli-plugins/docker-rollout
chmod +x ~/.docker/cli-plugins/docker-rollout

Next, configure your service in docker-compose.yml to support draining:

services:
  web:
    image: myapp:latest
    healthcheck:
      test: ["CMD-SHELL", "test ! -f /tmp/drain && curl -f http://localhost:3000/healthcheck"]
      interval: 5s
      retries: 1
    labels:
      docker-rollout.pre-stop-hook: "touch /tmp/drain && sleep 10"

When you trigger docker rollout web, the plugin executes the pre-stop hook on the old container. The hook touches /tmp/drain, causing the health check to fail. The proxy detects this and stops routing new traffic to it. The container then sleeps for 10 seconds, allowing any in-flight requests to finish processing before the container is stopped.
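If your image ships Python but not curl, the same drain-aware check can be written without a shell. This is a hypothetical script rather than anything docker-rollout provides. It reuses the /tmp/drain marker and the http://localhost:3000/healthcheck endpoint from the Compose example.

```python
import http.client
import os
import urllib.error
import urllib.request

# Same marker file and endpoint as the Compose example.
DRAIN_FILE = "/tmp/drain"
HEALTH_URL = "http://localhost:3000/healthcheck"


def is_healthy(drain_file: str = DRAIN_FILE, url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Python equivalent of `test ! -f /tmp/drain && curl -f <url>`."""
    # The drain marker wins: report unhealthy even if the app still answers.
    if os.path.exists(drain_file):
        return False
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Like curl -f, accept any non-error status (urllib follows redirects).
            return resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # HTTPError (4xx/5xx) subclasses URLError, so error statuses land here too.
        return False


def main() -> int:
    """Exit code for Docker: 0 means healthy, 1 means unhealthy."""
    return 0 if is_healthy() else 1
```

To use it, wire main() into the healthcheck test, for example through a __main__ guard that calls sys.exit(main()).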

The Developer Angle: Architectural Trade-offs

Here is how HAProxy handles a request that lands on a container in the middle of shutting down during a deployment:

sequenceDiagram
    autonumber
    actor Client
    participant Proxy as HAProxy
    participant Old as Old Container (Dying)
    participant New as New Container (Healthy)

    Note over Old: SIGTERM received, shutting down
    Client->>Proxy: GET /api/data
    Proxy->>Old: Forward Request
    Old-->>Proxy: Connection reset / 502
    Note over Proxy: option redispatch active
    Proxy->>New: Retry Request
    New-->>Proxy: 200 OK
    Proxy-->>Client: 200 OK

Both the HAProxy and the docker-rollout approaches work, but they suit different workflows.

The HAProxy approach is entirely configuration-driven and does not require custom CLI tools on your deployment server. It is ideal for teams that want a standard, battle-tested load balancer and are comfortable writing HAProxy configurations.

The docker-rollout approach is highly developer-friendly and integrates easily into CI/CD pipelines. However, it introduces a few strict constraints:

  • No hardcoded ports: You cannot define container_name or publish host ports directly on your application service in docker-compose.yml. Both must be unique per container, so Docker Compose cannot run the doubled set of replicas during a rollout. You must route all traffic through a proxy.
  • Mandatory health checks: Your application must have a reliable health check. Without it, the rollout tool cannot know when the new container is ready, defeating the purpose of the rolling update.
  • Tuned draining times: You must carefully tune your pre-stop sleep duration. The sleep time must be longer than your health check interval multiplied by the retry count, plus the time required to finish processing open requests.
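The draining rule above is simple arithmetic, sketched here as a hypothetical helper. The proxy_lag_s term is an assumption added for proxies that need extra time to notice a health change; set it to zero if yours reacts immediately.

```python
def min_drain_sleep(check_interval_s: float, check_retries: int,
                    max_request_s: float, proxy_lag_s: float = 0.0) -> float:
    """Lower bound for the pre-stop sleep.

    Docker flags the container unhealthy after `check_retries` failed checks
    spaced `check_interval_s` apart; in-flight requests then need up to
    `max_request_s` to finish. `proxy_lag_s` is an assumed extra delay for
    the proxy to notice the new health state.
    """
    return check_interval_s * check_retries + proxy_lag_s + max_request_s


def drain_is_safe(sleep_s: float, check_interval_s: float, check_retries: int,
                  max_request_s: float, proxy_lag_s: float = 0.0) -> bool:
    """True when the configured sleep strictly exceeds the lower bound."""
    return sleep_s > min_drain_sleep(check_interval_s, check_retries, max_request_s, proxy_lag_s)
```

With the example's interval: 5s and retries: 1, and a hypothetical slowest request of 2 seconds, the bound is 5 × 1 + 2 = 7 seconds, so sleep 10 leaves 3 seconds of headroom. If your slowest request takes 6 seconds, the bound rises to 11 and the same sleep is no longer enough.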

Kubernetes is a powerful tool for massive, multi-node deployments, but it is often an expensive, complex solution to a simple problem. By leveraging HAProxy's redispatch capabilities or using a targeted tool like docker-rollout, you can achieve resilient, zero-downtime deployments on a single VPS with plain Docker Compose. Keep your stack simple, understand your proxy's retry behavior, and stop babysitting clusters you do not need.

Sources & further reading

  1. Zero-Downtime Deployments with Docker Compose – No Kubernetes Required — statusdude.com
  2. Easy Zero-downtime Docker Compose deployment — supun.io
  3. GitHub - wowu/docker-rollout: 🚀 Zero Downtime Deployment for Docker Compose — github.com
Written by
Emeka Okafor · Security Editor

Emeka has spent over a decade tracking threat actors, vulnerability disclosures, and the evolving landscape of application security, bringing a sharp continent-spanning perspective to his reporting. He's known for translating dense CVE advisories into clear, actionable context that developers and security teams alike actually read.
