Why data center network operators aren’t flipping coins to pick their Ethernet switch fabrics
By James Low
At Keysight, we just wrapped up a project in which we benchmarked several Ethernet switches for a carrier hyperscale deployment. Going in, I was concerned that there would be little to no difference in the performance numbers. Switches are a commodity now, right? I expected testing to simply prove that “flipping a coin” to pick a vendor is just as risk-free as a full qualification. The results, however, showed that this could not be further from the truth. We will publish the methodology and results shortly, and they should make for interesting reading!
Because switch vendors perform their own exhaustive testing of Ethernet frame forwarding performance, traditional test methodologies such as RFC 2544 and RFC 2889 were not included here. Since the switches under test were going to be deployed in a hyperscale data center, the testing focused on port aggregation performance while simulating a simplified IP Clos architecture as the fabric. It is not uncommon to deploy the switch fabric oversubscribed on the access side, meaning the combined bandwidth of the access ports exceeds the bandwidth toward the fabric. In that situation, short bursts arriving on several ports at once can exceed what an egress port can drain, so if there were a performance difference, it would show up in how each switch handles microburst traffic patterns.
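To make the oversubscription idea concrete, here is a minimal Python sketch that computes the ratio of access-facing to fabric-facing capacity on a leaf switch. The function name and the 48x25G / 8x100G port counts are hypothetical, chosen only for illustration; they are not the configuration used in our tests.

```python
def oversubscription_ratio(down_ports: int, down_gbps: float,
                           up_ports: int, up_gbps: float) -> float:
    """Access-side (downlink) capacity divided by fabric-side (uplink) capacity.

    A result above 1.0 means the leaf is oversubscribed: if every server
    transmits toward the fabric at once, the uplinks cannot carry it all.
    """
    downlink_gbps = down_ports * down_gbps
    uplink_gbps = up_ports * up_gbps
    return downlink_gbps / uplink_gbps


# Hypothetical leaf: 48 x 25G server-facing ports, 8 x 100G spine uplinks
ratio = oversubscription_ratio(48, 25, 8, 100)
print(f"{ratio:.1f}:1")  # 1.5:1
```

The 2x100G-into-100G configuration used in the burst test later in this article corresponds to a 2:1 ratio.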
Figure 1. Microbursts cause frame drops due to buffer overflow
The first revelation came with the initial baseline tests. These tests simply confirm that, port to port, the switch can handle line-rate Ethernet forwarding with consistent forwarding latency. I virtually yawned as we set up the test on the first switch. The results came in as expected: lossless forwarding, with minimum, maximum, and average latency all at around 1.4 microseconds. We moved on to the next switch, and the results again showed lossless forwarding, but the latency was significantly different. It was lower, but with some jitter: there was a fifteen percent difference between the minimum and maximum latency, with an average of 1.16 microseconds.
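One way to summarize a baseline latency run is sketched below in Python. It assumes you have per-frame latency samples in microseconds and defines spread as (max - min) / min; the article does not state exactly how the fifteen percent figure was computed, so treat that definition as an assumption. The sample values are hypothetical, not our measured data.

```python
from statistics import mean


def latency_summary(samples_us: list[float]) -> dict[str, float]:
    """Summarize per-frame latency samples given in microseconds."""
    lo, hi = min(samples_us), max(samples_us)
    return {
        "min_us": lo,
        "max_us": hi,
        "avg_us": mean(samples_us),
        # Assumed definition: spread relative to the minimum latency
        "spread_pct": (hi - lo) / lo * 100,
    }


# Hypothetical samples: a flat switch vs. a faster but jittery one
flat = [1.40, 1.40, 1.40, 1.40]
jittery = [1.08, 1.12, 1.16, 1.20, 1.24]
for name, samples in (("flat", flat), ("jittery", jittery)):
    s = latency_summary(samples)
    print(f"{name}: avg={s['avg_us']:.2f} us, spread={s['spread_pct']:.1f}%")
```

A spread near zero matches the first switch's behavior, while a lower average with a nonzero spread matches the second.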
The result reinvigorated my efforts, and we excitedly moved on to the next test. The second test determines the maximum burst above line rate that a switch can absorb when an egress port is oversubscribed.
The purpose of this test is to identify the threshold up to which the switch can absorb microbursts with no traffic engineering mechanisms applied. It reveals how much memory is allocated to the buffer and how well the buffer handles contention. The first switch, configured with 2x100G ports feeding a single 100G port, could absorb 200G of traffic for 1 msec before frame loss. When we moved on to the next vendor, the result showed that its switch could absorb 3.25 msec of burst before frame loss. That is more than a 3x difference! So much for the idea that there are no performance differences when it comes to switching.
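As a rough cross-check, the burst duration can be converted into the amount of data the switch had to queue. The Python sketch below assumes traffic arrives at a constant 200G and drains at a constant 100G, so the buffer fills at the 100G difference. It ignores Ethernet preamble and inter-frame gap overhead and any dynamic buffer sharing across ports, so it is a back-of-the-envelope estimate rather than a measurement of either switch's actual buffer size. The function name is hypothetical.

```python
def buffered_bytes(ingress_gbps: float, egress_gbps: float, burst_ms: float) -> float:
    """Data that must be queued during a burst, assuming constant rates and no drops.

    The queue grows at (ingress - egress) for the duration of the burst.
    Preamble/IFG overhead and shared-buffer behavior are ignored.
    """
    excess_bits_per_s = (ingress_gbps - egress_gbps) * 1e9
    return excess_bits_per_s * (burst_ms / 1e3) / 8


# 2x100G offered into one 100G egress port, using the two burst durations measured
for burst_ms in (1.0, 3.25):
    mb = buffered_bytes(200, 100, burst_ms) / 1e6
    print(f"{burst_ms} ms burst -> about {mb:.1f} MB queued")
```

Under these assumptions, 1 msec corresponds to roughly 12.5 MB of queued data and 3.25 msec to roughly 40.6 MB.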
In addition to burst tolerance, here are a few other test considerations as you validate your switch fabric:
- Does the fabric ensure that no traffic drops during failover? Does the failover convergence meet your workload expectations?
- Does the data center overlay network (VXLAN, EVPN, ECMP) scale and perform well enough to support multihoming and traffic load balancing?
- How do the congestion control mechanisms handle increasing RoCEv2 traffic from storage workloads?
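For the failover question, a common way to quantify convergence in traffic-generator tests is the loss-derived method: count the frames lost during the failover event and divide by the offered frame rate. The Python sketch below shows that arithmetic, plus the standard line-rate frame-rate calculation that adds 20 bytes of preamble and inter-frame gap per frame. The function names and the loss count are hypothetical.

```python
PREAMBLE_AND_IFG_BYTES = 20  # 8-byte preamble/SFD + 12-byte minimum inter-frame gap


def line_rate_fps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum Ethernet frames per second at line rate for a given frame size."""
    bits_per_frame = (frame_bytes + PREAMBLE_AND_IFG_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame


def convergence_time_ms(frames_lost: int, offered_fps: float) -> float:
    """Loss-derived convergence time: lost frames / offered frame rate."""
    return frames_lost / offered_fps * 1e3


# Hypothetical run: 100G of 512-byte frames, 1,000,000 frames lost during failover
fps = line_rate_fps(100, 512)
print(f"offered: {fps:,.0f} frames/s")
print(f"convergence: {convergence_time_ms(1_000_000, fps):.1f} ms")
```

The offered rate should be the rate actually configured on the traffic generator; if traffic is sent below line rate, use that lower frame rate instead.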
All of these are critical factors to consider when assessing the fabric performance.
Clearly, higher-speed Ethernet switching is hard. Switching at 100G, and even more so at 400G and beyond, requires multi-pipeline architectures to deliver the required performance. There are many combinations, permutations, and subcomponents that need to be validated together as a switch fabric system. This is especially true in this era of disaggregation and open networking, where there are:
- Multiple ODM vendors offering white-box designs with varying CPU, memory, and ASIC choices
- Merchant silicon vendors offering ASICs with different buffer sizes and forwarding table capacities
- Open network operating systems (NOS) such as SONiC, along with the need to harden both open and vendor NOSes
Operators must understand the architecture of these subcomponents and plan accordingly for the different application workloads seen in their data center customer environments. Each vendor’s engineering team will choose a slightly different way to solve these challenges, creating advantages or disadvantages relative to competing solutions. Testing determines not only the best switch but also the best way to deploy and architect the switch fabric, so that applications are oblivious to the switching fabric and services are delivered reliably.