Co-Packaged Optics: Promises and Challenges

Technological advances in data-intensive areas such as artificial intelligence, autonomous vehicles, and digital twins are driving an increasing need for data centers to scale their data-handling capabilities. The shift to work-from-home roles, distribution of higher-bitrate content, adoption of cloud-based computing, scaling of the metaverse, and the transition of virtual reality into the mainstream all exacerbate this need.

New computing technologies like machine-learning-enabled processors help hyperscalers process this deluge of data more efficiently. However, the sheer throughput of the data itself presents new issues. Managing this barrage of bits requires high-bandwidth Ethernet switches to provide connections between vast networks of servers.
As data generation and processing grow with emerging technologies, hyperscalers face the task of keeping up while optimizing for physical space, power consumption, cost, reliability, and scalability. Keeping pace requires innovations in computing, cabling, connections, and switch architectures.

Co-packaged optics is one such innovation, a revolution in a long-unchanged approach to data center switch design. While many herald co-packaged optics as the bright new path forward, it brings its own set of challenges: balancing power and cost savings, standardizing for interoperability, ensuring reliability and repairability, and implementing new methods for test and validation. In this article we'll dive into these considerations while also providing a primer on the evolution of switch connectivity technology.

Switching Silicon

As you walk down the hallways of a data center, you can find a variety of switches at different levels of the infrastructure. When the vast majority of data center traffic consisted of data moving into and out of the data center (north-south traffic), the three-tier aggregation architecture of switching prevailed. With the shift to distributed computing and increasing server-to-server traffic (east-west traffic), data centers have adopted the more robust “spine-leaf” setup shown in Figure 1. As east-west traffic ramps exponentially, unprecedented levels of data center traffic threaten to outpace the development of new switches that are up to the task.

Figure 1. A spine-leaf architecture proves superior to the traditional three-tier architecture to accommodate growing east-west data traffic. Image courtesy of FS.
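
To make the comparison concrete, the short Python sketch below counts the leaf-to-spine links in a simple two-tier fabric where every leaf switch connects to every spine switch. The switch counts and uplink speed are hypothetical placeholders, not figures from any particular deployment.

    # Minimal sketch of leaf-spine fabric sizing (hypothetical values).
    def leaf_spine_fabric(num_leaves: int, num_spines: int, uplink_gbps: int):
        """Every leaf connects to every spine, so any two servers sit at most
        two switch hops apart and east-west traffic spreads across all spines."""
        links = num_leaves * num_spines                   # full mesh between the two tiers
        uplink_capacity_gbps = num_spines * uplink_gbps   # aggregate uplink per leaf
        return links, uplink_capacity_gbps

    links, per_leaf_uplink = leaf_spine_fabric(num_leaves=32, num_spines=8, uplink_gbps=400)
    print(f"{links} leaf-spine links, {per_leaf_uplink} Gb/s of uplink capacity per leaf")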

At the heart of a switch lies a specialized application-specific integrated circuit (ASIC) capable of terabits-per-second throughput. At one point, most of these ASICs were developed in-house by the switch manufacturers. That paradigm has shifted with the rise of merchant silicon: ASICs developed by third-party silicon vendors and sold to switch manufacturers for final product integration.

With data center traffic doubling every two to three years, companies like Intel, Nvidia, and Broadcom have kept pace by developing switching silicon that can manage the throughput. However, as data moves from switch to switch it hits another bottleneck before reaching the switch’s ASIC: the electrical path between the optics and the ASIC.

Figure 2. Experts project switch silicon to continue doubling throughput capacity every two years.

Data Center Optics

In traditional switches the switching ASIC drives the data over multiple channels across the printed circuit board to ports on the front panel of the switch chassis. The ports and their pluggable modules have evolved alongside the switching silicon through increases in the speed and number of channels per link. Throughput per port has grown exponentially, from the original Small Form-factor Pluggable (SFP) 1 Gb/s links to the latest Quad SFP Double Density form factor (QSFP-DD800), which supports up to 800 Gb/s. Modules with copper cabling, otherwise known as direct-attach copper (DAC), can connect switches to one another. However, copper as a medium cannot handle the speeds and distances necessary for most modern data center communication. Instead, data centers leverage fiber-based optical interconnects between switches because they preserve signal integrity over long distances, with the added benefits of lower power consumption and better noise immunity compared to copper cabling.
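
The per-port numbers follow directly from the lane count and per-lane rate of each form factor. The following Python sketch tabulates the commonly cited lane configurations; treat the list as illustrative rather than an exhaustive standards reference.

    # Per-port throughput = number of electrical lanes x per-lane data rate.
    pluggables = {
        "SFP":        (1, 1),    # 1 lane  x 1 Gb/s   = 1G
        "SFP+":       (1, 10),   # 1 lane  x 10 Gb/s  = 10G
        "QSFP+":      (4, 10),   # 4 lanes x 10 Gb/s  = 40G
        "QSFP28":     (4, 25),   # 4 lanes x 25 Gb/s  = 100G
        "QSFP56":     (4, 50),   # 4 lanes x 50 Gb/s  = 200G (PAM4)
        "QSFP-DD":    (8, 50),   # 8 lanes x 50 Gb/s  = 400G (PAM4)
        "QSFP-DD800": (8, 100),  # 8 lanes x 100 Gb/s = 800G (PAM4)
    }
    for name, (lanes, gbps) in pluggables.items():
        print(f"{name:<11} {lanes} x {gbps:>3} Gb/s = {lanes * gbps} Gb/s per port")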

Fiber cabling requires transceiver modules in the switch ports to convert signals from the electrical domain of the switching silicon to the optical domain of the cabling and vice versa. Figure 3 shows a conventional transceiver with two key components: the transmit optical subassembly (TOSA) manages the electrical-to-optical conversion, while the receive optical subassembly (ROSA) manages conversion in the opposite direction. The copper fingers of the transceiver plug into the switch, while an optical connector plugs into the other end. The optical connectors themselves come in a separate variety of form factors and variants. Multi-source agreement (MSA) groups work to ensure standardization and interoperability between vendors as new transceiver and cable technologies enter the market.

Figure 3. This QSFP28 LR4 transceiver includes a TOSA and ROSA converting between the electrical and optical domains. Image courtesy of InnoLight.

SerDes

The path between the pluggable transceiver and the ASIC consists of copper-based serializer/deserializer (SerDes) circuitry. As the switching silicon scales, the copper interconnects must scale with it, which switch vendors achieve by increasing either the number or the speed of SerDes channels. The highest-bandwidth switch silicon today supports 51.2 Tb/s, which manufacturers accomplished by doubling the number of 100 Gb/s PAM4-modulated SerDes lanes from 256 to 512.
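
As a quick sanity check on those figures, the Python sketch below derives aggregate ASIC capacity from the lane count and per-lane rate.

    # Aggregate switching capacity = SerDes lane count x per-lane rate.
    def asic_capacity_tbps(serdes_lanes: int, lane_gbps: int) -> float:
        return serdes_lanes * lane_gbps / 1000  # convert Gb/s to Tb/s

    print(asic_capacity_tbps(256, 100))  # 25.6 Tb/s previous generation
    print(asic_capacity_tbps(512, 100))  # 51.2 Tb/s after doubling the lane count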

If a 51.2 Tb/s ASIC serves a front panel of 16 ports, the switch requires a 3.2T link at each port to fully utilize the available switching capacity. While today’s highest-bandwidth pluggable implementations provide 800 Gb/s per port, standards groups are actively working to expand the capacity of these links through greater channel density and speed (e.g., 16 channels at 200 Gb/s to reach 3.2T).
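
Putting the front-panel arithmetic in one place, the sketch below divides the ASIC capacity across the ports and checks which channel-count and lane-rate combinations reach the 3.2T-per-port target.

    # Required per-port bandwidth = ASIC capacity / number of front-panel ports.
    asic_tbps, ports = 51.2, 16
    target_gbps = asic_tbps * 1000 / ports  # 3200 Gb/s per port

    # Candidate link configurations: (channels, Gb/s per channel).
    for channels, lane_gbps in [(8, 100), (8, 200), (16, 200)]:
        port_gbps = channels * lane_gbps
        verdict = "meets" if port_gbps >= target_gbps else "falls short of"
        print(f"{channels} x {lane_gbps}G = {port_gbps}G {verdict} the {target_gbps:.0f}G target")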

Table 1. Switch silicon vendors’ year-over-year capacity growth

Research is underway into 224 Gb/s SerDes technology, which would enable the use of 1.6 Tb/s interfaces at the front panel. With increased speed, however, comes the added challenge of more complex signal transmission methods and higher power consumption per bit. Evolutions like the move to 224 Gb/s have helped avoid the ominous predictions made in the early 2010s, which forecast power consumption skyrocketing in step with data center traffic. Figure 4 shows how competing factors have kept data center power consumption relatively steady.

Figure 4. While data demand has increased, advances in energy efficiency have curbed the skyrocketing energy usage predicted a decade earlier. Source: Masanet et al., 2020.
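
The power-per-bit concern can be illustrated with a rough calculation. In the Python sketch below, the picojoule-per-bit figures are hypothetical placeholders chosen only to show how total link power scales with both throughput and energy per bit; they are not measurements of any specific SerDes generation.

    # Link power (W) = bits per second x energy per bit (J).
    def link_power_watts(throughput_gbps: float, picojoules_per_bit: float) -> float:
        bits_per_second = throughput_gbps * 1e9
        return bits_per_second * picojoules_per_bit * 1e-12  # pJ -> J

    print(link_power_watts(800, 5.0))   # hypothetical 800G link at 5 pJ/bit -> 4.0 W
    print(link_power_watts(1600, 6.0))  # hypothetical 1.6T link at 6 pJ/bit -> 9.6 W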

Along with updates to network layouts and cooling systems, the increased data rate of a single switch reduced the number of devices required, shrinking the footprint and overall power consumption. However, technology experts suggest that we're approaching a physical limit on copper channel data rates within the existing server form factor. While breakthroughs in interconnect technology have supported scaling to 800 Gb/s and 1.6 Tb/s links, driving beyond these data rates will require a fundamental change in switch designs.

In the next article we'll explore new switch architectures, including the incorporation of co-packaged optics.
