Anant Jain

Design patterns for container-based distributed systems

Paper Review

Introduction

What is this paper about?

This 2016 paper by Brendan Burns and David Oppenheimer from Google introduces design patterns for building distributed systems using containers. If you're not familiar with containers, think of them as lightweight packages that bundle an application with everything it needs to run (code, dependencies, libraries) into a single unit. Docker is the most popular tool for creating containers, and Kubernetes is a system for managing many containers at scale.

Just as object-oriented programming in the 1990s led to well-known design patterns (like Factory, Singleton, Observer), this paper identifies similar reusable patterns for container-based systems. These patterns help developers build more reliable distributed systems by following proven best practices.

The paper describes three categories of patterns:

  1. Single-container patterns - How individual containers should expose management interfaces
  2. Single-node patterns - How multiple containers work together on one machine
  3. Multi-node patterns - How containers coordinate across multiple machines for distributed algorithms

Why does this matter? These patterns make distributed systems easier to build, test, and maintain by breaking complex problems into reusable components.

Why Containers Are Perfect for Design Patterns

The Language Problem

Previous distributed system frameworks like MapReduce (a programming model for processing large datasets) had a significant limitation: they were tied to specific programming languages. For example, the Apache Hadoop ecosystem is primarily written in and designed for Java. This meant developers couldn't easily use other languages or mix technologies. To create truly reusable design patterns, we need a language-neutral building block.

Containers as the Solution

Containers solve this problem beautifully. While they initially became popular simply as a better way to deploy software (by packaging applications with all their dependencies in isolated, self-contained units), they have much greater potential.

Think of containers as analogous to objects in object-oriented programming:

  • They're hermetically sealed - isolated from other containers at the process and filesystem level (though they share the host's kernel)
  • They carry their dependencies - everything needed to run is included
  • They provide clear success/failure signals - you know immediately if deployment worked
  • They're language-agnostic - a container can run code written in any language

Just as objects became the foundation for design patterns in traditional programming, containers are becoming the foundation for design patterns in distributed systems. This enables developers to build reusable, composable components that work together regardless of the underlying programming language.

Single-Container Management Patterns

Beyond Basic Container Operations

Traditionally, containers have a very simple interface with just three basic operations: run(), pause(), and stop(). While this simplicity is appealing, modern applications need richer management capabilities.

Just as objects in programming have clear boundaries and interfaces, containers provide a natural boundary for defining management interfaces. These interfaces work in two directions:

Upward APIs: What Containers Tell You

Containers can expose rich information about what's happening inside them by hosting a simple web server with specific endpoints. This "upward" communication can include:

  • Monitoring metrics - requests per second (QPS), application health status, error rates
  • Profiling data - active threads, stack traces, lock contention issues, network statistics
  • Configuration info - current settings and parameters
  • Application logs - what the application is doing and any errors it encounters

Think of this as the container saying "Here's what's happening inside me" to monitoring and management systems.
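As a rough sketch of this idea, here is a minimal "upward" management server in Python. The endpoint names (/healthz, /metrics) follow common convention rather than anything prescribed by the paper, and the metrics themselves are placeholders:

```python
# Minimal sketch of an upward management API: the container hosts a
# tiny HTTP server exposing health and metrics endpoints.
# Endpoint names (/healthz, /metrics) are conventional, not from the paper.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

METRICS = {"qps": 0, "errors": 0}  # would be updated by the real application

class ManagementHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"                              # liveness signal
        elif self.path == "/metrics":
            body = json.dumps(METRICS).encode()       # monitoring data
        else:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

def serve(port=0):
    """Start the management server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ManagementHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A monitoring system can then scrape these endpoints on every container without knowing anything about the application running inside it.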

Downward APIs: What You Tell Containers

The "downward" direction is about lifecycle management - how external systems control containers. This makes it easier to write applications that play nicely with orchestration systems like Kubernetes.

A concrete example: Kubernetes uses Docker's "graceful deletion" feature. When Kubernetes needs to shut down a container, it doesn't just kill it immediately. Instead:

  1. It sends a SIGTERM signal (a polite "please shut down" message)
  2. The container gets a configurable grace period (e.g., 30 seconds) to finish what it's doing
  3. The container can complete in-flight operations, save state to disk, etc.
  4. Only after the grace period does Kubernetes send SIGKILL (a forceful shutdown)

This pattern enables clean shutdowns and can be extended to support more advanced features like state serialization (saving the container's state) and recovery.

Other downward APIs might include commands like "replicate yourself" to scale up a service when traffic increases.

Single-Node, Multi-Container Patterns

Working Together on One Machine

While single containers are useful, many applications benefit from multiple containers working closely together on the same machine. Think of these as "symbiotic" containers - they need each other and share resources like disk storage and network interfaces.

To make this work, container orchestration systems need to schedule multiple containers as a single atomic unit - meaning they're always deployed together on the same host machine. Kubernetes calls this unit a "Pod" (the smallest deployable unit in Kubernetes), while other systems like Nomad call them "task groups."

Why run multiple containers instead of just one big container? The benefits of separation become clear with these three patterns:

1. Sidecar Pattern: Helper Containers

The sidecar pattern is the most common multi-container pattern. A sidecar container extends and enhances a main container by providing supporting functionality. They work together by sharing a local disk volume on the same machine.

Example: Imagine a web server that needs to save its logs. Instead of building log-saving code directly into the web server, you run two containers together:

  • Main container: The web server handling requests
  • Sidecar container: A log-saving service that watches the shared disk and uploads logs

Why separate them? You could build everything into one container, but separation provides powerful benefits:

  1. Resource management - The web server can be configured for low-latency responses (guaranteed CPU), while the log saver runs on spare CPU cycles when the server isn't busy. Each container has its own resource allocation (using cgroups).

  2. Team organization - Different teams can own different containers. The web server team and logging team can develop and test independently, then combine them.

  3. Reusability - The same log-saving sidecar can be paired with many different main containers (web servers, databases, APIs, etc.). Write once, reuse everywhere.

  4. Graceful degradation - If the log saver crashes, it doesn't take down the web server. Containers provide failure containment boundaries.

  5. Independent deployment - You can upgrade the web server without touching the log saver, or vice versa. You can also quickly roll back if an upgrade fails.
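To make the web-server example concrete, here is a rough sketch of the sidecar's side of the arrangement: it tails log files on the shared volume and hands new lines to some uploader. The function name and directory layout are assumptions for illustration:

```python
# Sketch of a log-saving sidecar: it watches files on a disk volume
# shared with the web-server container and collects newly appended
# lines for upload. A real sidecar would pass these to a storage client.
import os

def collect_new_lines(log_dir, offsets):
    """Read lines appended since the last scan; `offsets` maps filename -> position."""
    new_lines = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        with open(path) as f:
            f.seek(offsets.get(name, 0))       # resume where we left off
            new_lines.extend(f.read().splitlines())
            offsets[name] = f.tell()           # remember progress per file
    return new_lines
```

Because the sidecar only touches the shared volume, the web server needs no knowledge of where, how, or whether logs are uploaded.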

2. Ambassador Pattern: Proxy for External Communication

An ambassador container acts as a proxy, handling all communication between the main container and the outside world. This works because containers in the same pod share a network namespace, so they can reach each other over localhost.

Example: Imagine your application needs to connect to a distributed cache like Memcache. Instead of connecting directly to multiple cache servers, you use an ambassador:

  • Your application connects to localhost:11211 (simple!)
  • The ambassador container (running something like twemproxy) receives the connection
  • The ambassador handles the complexity of routing to multiple cache servers, load balancing, failover, etc.

Why is this useful?

  1. Simplified application code - Your app only needs to know about one server on localhost, not a complex distributed system

  2. Easy testing - During development, you can run a real Memcache instance locally instead of the ambassador. Same interface, simpler setup.

  3. Reusability across languages - The same twemproxy ambassador works with applications written in Python, Java, Go, or any language. No need to reimplement proxy logic in each language.
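To illustrate the kind of complexity the ambassador absorbs, here is a toy version of key-to-server routing. The hash scheme below is purely illustrative - it is not twemproxy's actual algorithm - but it shows the decision the application never has to make:

```python
# Toy sketch of ambassador-style routing: the application talks to one
# localhost address; the proxy shards cache keys across many backends.
# This modulo-hash scheme is illustrative, not twemproxy's algorithm.
import hashlib

BACKENDS = ["cache-1:11211", "cache-2:11211", "cache-3:11211"]

def route(key, backends=BACKENDS):
    """Deterministically map a cache key to one backend server."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]
```

The application never sees BACKENDS; it connects to localhost:11211 and the ambassador applies logic like this on its behalf.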

3. Adapter Pattern: Standardizing Interfaces

While the ambassador simplifies the outside world for your application, the adapter does the opposite: it presents a standardized view of your application to the outside world.

Example: Different applications export monitoring metrics in different formats:

  • Java apps might use JMX
  • Node.js apps might use StatsD
  • Python apps might use Prometheus format

Instead of building monitoring tools that understand every format, use adapter containers. Each adapter:

  • Reads metrics from the main container in whatever format it uses
  • Converts and exposes them in a standard format
  • Allows monitoring systems to work with any application

This means you can build monitoring infrastructure once and use it with all applications, regardless of their native metric format.
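As a sketch of the adapter's core job, here is a tiny converter that re-emits an app-specific metrics dictionary in Prometheus's text exposition format. The input shape and metric prefix are assumptions for illustration:

```python
# Sketch of an adapter container's job: read metrics in whatever form
# the main container produces and re-emit them in a standard format
# (here, Prometheus-style "name value" text lines).
def to_prometheus(metrics, prefix="app"):
    """Render a {name: value} metrics dict as sorted Prometheus text lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"{prefix}_{name} {value}")
    return "\n".join(lines)
```

One such adapter per metric format, and the monitoring system only ever sees Prometheus text, no matter what the application speaks natively.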

Multi-Node Patterns: Coordinating Across Machines

The previous patterns focused on containers working together on a single machine. These multi-node patterns address how containers coordinate across multiple machines to implement distributed algorithms.

1. Leader Election Pattern

The Problem: Many distributed systems need to elect a leader - one node that coordinates the others. Leader election is complex to implement correctly (handling network partitions, split-brain scenarios, etc.).

The Container Solution: Instead of linking a complicated leader election library into every application, use a leader election sidecar container.

How it works:

  • Each instance of your application runs with a leader election container
  • These leader election containers talk to each other across machines to elect a leader
  • They expose a simple HTTP API on localhost to your application:
    • becomeLeader() - called when this instance becomes the leader
    • renewLeadership() - keep the leadership lease alive
    • isLeader() - check if you're currently the leader

Why this is powerful: Distributed systems experts build the leader election container once, handling all the edge cases correctly. Then any application developer can use it, regardless of programming language. This is abstraction and encapsulation at its best - complex logic hidden behind a simple interface.
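Here is a rough sketch of the application's side of this arrangement. The port, endpoint path, and JSON response shape are assumptions, and the fetch function is injected so the logic can run without a real sidecar:

```python
# Sketch of how an application might consume a leader-election sidecar
# over localhost. The port, path, and response shape are assumptions;
# `fetch` is injectable so the logic is testable without a sidecar.
import json
import urllib.request

def default_fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()

def is_leader(fetch=default_fetch, base="http://localhost:4040"):
    """Ask the sidecar whether this replica currently holds the leadership lease."""
    return json.loads(fetch(base + "/isLeader"))["leader"]

def run_if_leader(task, fetch=default_fetch):
    """Only the elected leader performs the coordination task."""
    return task() if is_leader(fetch) else None
```

Notice that the application contains no election logic at all - no lease renewal, no split-brain handling - just a localhost HTTP call.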

2. Work Queue Pattern

The Problem: You have lots of data to process (images to resize, videos to transcode, documents to analyze) and want to distribute the work across many machines.

The Container Solution: Build a generic work queue framework that works with any processing logic packaged as a container.

How it works:

  • Developers create a container with their processing logic: read input file → process → write output file
  • The work queue framework handles everything else:
    • Distributing work to available workers
    • Managing the queue of pending jobs
    • Handling failures and retries
    • Collecting results

The developer only writes the business logic container. The framework is completely reusable for any kind of batch processing job.
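A toy version of this framework/business-logic split might look like the following, with process() standing in for the developer's "read input → process → write output" container:

```python
# Minimal sketch of the work-queue split: the "framework" owns the
# queue, dispatch, and bounded retries; the developer supplies only
# process(), a stand-in for their containerized business logic.
from queue import Queue

def run_queue(items, process, max_retries=2):
    """Drive items through `process`, retrying failures up to max_retries times."""
    pending = Queue()
    for item in items:
        pending.put((item, 0))                 # (work item, attempts so far)
    results, failed = [], []
    while not pending.empty():
        item, attempts = pending.get()
        try:
            results.append(process(item))
        except Exception:
            if attempts + 1 <= max_retries:
                pending.put((item, attempts + 1))  # retry later
            else:
                failed.append(item)                # give up after retries
    return results, failed
```

Swap in a different process() and the same loop handles image resizing, transcoding, or document analysis unchanged.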

3. Scatter/Gather Pattern

The Problem: You need to query multiple data sources in parallel and combine the results. Think of a search engine querying multiple index shards, or a dashboard fetching data from multiple services.

The Container Solution: Implement the pattern with two reusable container types.

How it works:

  1. A client sends a request to a root/parent node
  2. The root scatters the request to many "leaf" containers in parallel
  3. Each leaf performs its partial computation on its data shard
  4. Each leaf returns its partial result
  5. A "merge" container gathers all partial results and combines them into a single response

What developers provide:

  • Leaf container - performs computation on one data shard
  • Merge container - combines partial results into the final answer

What the framework provides: The scatter/gather orchestration, parallel execution, timeout handling, partial failure handling, etc.

This pattern is widely used in systems like Google Search (scatter query to index shards), time-series databases (scatter to time ranges), and distributed analytics systems.
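The five steps above reduce to a small amount of orchestration code. In this toy sketch, leaf and merge stand in for the developer-supplied containers, while the function itself plays the framework's role (minus timeouts and partial-failure handling):

```python
# Toy sketch of scatter/gather: fan a request out to every shard in
# parallel (the "scatter"), then combine partial results (the "gather").
# leaf() and merge() stand in for the developer-supplied containers.
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(request, shards, leaf, merge):
    """Run leaf(request, shard) on every shard in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: leaf(request, s), shards))
    return merge(partials)
```

A prefix search over three toy index shards, for instance, needs only a two-line leaf and a one-line merge.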

Conclusion

A New Era of Design Patterns

Just as object-oriented programming led to the emergence and codification of design patterns in the 1990s (Gang of Four patterns like Factory, Observer, Strategy), container architectures are now leading to design patterns for distributed systems.

This paper identified three categories of emerging patterns:

  1. Single-container patterns - How containers should expose rich management interfaces beyond just run/pause/stop
  2. Single-node patterns - Three key patterns (Sidecar, Ambassador, Adapter) for containers cooperating on one machine
  3. Multi-node patterns - Three patterns (Leader Election, Work Queue, Scatter/Gather) for distributed algorithms across machines

Why These Patterns Matter

These patterns provide benefits that are especially powerful for distributed systems:

  • Language independence - Mix Python, Java, Go, Rust, or any language in the same system
  • Independent upgrades - Update one component without touching others, reducing deployment risk
  • Graceful degradation - When one container fails, it doesn't necessarily take down the whole system
  • Reusability - Build complex functionality once (like leader election) and reuse it across many applications
  • Separation of concerns - Let experts build framework containers while application developers focus on business logic

The Future

These patterns represent just the beginning. As containers and orchestration systems like Kubernetes mature, we'll likely see more patterns emerge and become standardized, just as happened with object-oriented design patterns. Understanding these foundational patterns helps developers build more reliable, maintainable distributed systems.

PDF


Over the next few Saturdays, I'll be going through some of the foundational papers in Computer Science, and publishing my notes here. This is #26 in this series.