The Mathematical Journey to Flat Datacenter Networks
From 1970s expander graphs to Penrose tiling, cloud architects are rethinking how we route packets at scale.
For decades, building a datacenter network meant embracing hierarchy. The industry fell in love with the fat-tree topology—a folded Clos architecture of stacked switch layers—and never really looked back. But as hyper-scale cloud computing pushes physical and logical limits, the rigid, complex cabling of hierarchical networks is starting to look like a bottleneck.
To solve this, network architects are digging back into mathematical concepts from the 1970s, chasing the elusive promise of a "flat" datacenter network.
The Mathematical Promise of Expanders
The quest for the optimal routing network isn't new. In the late 1970s, mathematicians began defining "expanders"—highly connected graphs with strong connectivity properties that guarantee no subset of vertices can be isolated from the rest. In 1976, Leslie Valiant published some of the earliest discussions on these graphs.
Following the Alon-Boppana bound, which established theoretical limits on these networks, mathematicians like Alexander Lubotzky, Ralph Phillips, and Peter Sarnak created explicit constructions of optimal expanders. The catch? These designs were incredibly intricate, relied on advanced number theory, and only worked for very specific network sizes and degrees. They weren't practical for a rapidly growing datacenter.
A breakthrough came in 1991 when Joel Friedman proved that a randomly wired network is, with high probability, nearly as good an expander as the most complex explicit construction. A recent mathematical proof in 2023 went a step further, showing that random graphs actually match the optimal bound. The implication for systems engineers was massive: if you want an optimal routing network, you don't need a PhD in number theory—you just need to wire your switches at random.
The Fat-Tree Era and the Limits of Structure
Instead of embracing randomness, the networking industry took a highly structured path. From the mid-1980s onward, communication networks relied on fat-tree topologies with two, three, or more layers of switches.
As cloud computing exploded in the late 2000s, engineers scaled these fat-trees to massive proportions. In 2009, a team of researchers led by Albert Greenberg published "VL2: A Scalable and Flexible Data Center Network." The VL2 architecture pushed fat-trees to their limit by introducing flat addressing and randomized Valiant Load Balancing to spread traffic uniformly across paths.
While VL2 (which won the SIGCOMM Test of Time award in 2019) proved that randomizing traffic improved performance, the underlying physical network remained hierarchical, rigid, and incredibly complex to cable.
The Jellyfish Proposal and Its Real-World Hurdles
In 2012, researchers at the University of Illinois proposed a radical alternative called "Jellyfish." Jellyfish suggested building datacenter networks using random graphs, finally bringing Friedman's random-wiring theory to the server room.
While Jellyfish sparked significant research, it remained largely theoretical because it left three massive operational challenges unsolved:
- Routing: Routing in random graphs is incredibly tricky because data can take a massive variety of non-standard paths.
- Cabling: Physically running cables between randomly selected endpoints is a logistical nightmare on a datacenter floor.
- Operations: Managing, troubleshooting, and predicting the behavior of an unstructured, randomized network is highly unpredictable.
A Geometric Alternative: Penrose Tiling
With random graphs proving operationally difficult, researchers began looking for structured, non-hierarchical alternatives. In 2023, Giacomo Bernardi, a principal scientist at AWS, began investigating whether datacenter routers could be arranged in a flat network based on Penrose tiling.
Penrose tiling is a geometric construction where a small set of shapes tessellates an infinite plane without ever repeating. This aperiodic structure offers a fascinating middle ground: it is highly connected and flat, yet possesses a predictable mathematical structure that random graphs lack.
To tackle the immense routing and operational challenges of this approach, Bernardi partnered with Ratul Mahajan, an Amazon Scholar and Professor at the University of Washington. While the practical implementation of Penrose-tiled networks is still an active area of research, it represents a bold step away from the rigid hierarchies that have dominated networking for forty years.
Sources & further reading
- Flat Datacenter Networks at Scale at Amazon — perspectives.mvdirona.com
Lenn writes about cloud platforms, Kubernetes internals, and the infrastructure decisions that quietly make or break engineering organizations. Based in Berlin's vibrant tech scene, they have a talent for turning dense platform-engineering topics into prose that people actually finish reading.
Discussion 0
No comments yet
Be the first to weigh in.