Application Notes
GPUs are costly, and every minute they sit idle is money down the drain. In AI/ML computation, the fabric connecting the servers must be efficient: high throughput, low latency, and lossless, so that the network does not become the cause of GPU idleness. Our solution provides several ways to assess the health of the network before deployment, and there are multiple parameters to tune. This app note helps customers understand the features of the solution and guides them on what to measure, so that they can build a sound test strategy with it.
This document focuses on improving communication performance for machine learning and high-performance computing tasks. Our RoCEv2 emulation in IxNetwork exposes various configuration parameters for modifying network behavior, and this app note discusses them. It also describes the measurements available through statistics, along with some of the customer DUT (device under test) parameters that relate to congestion control.
What are you looking for?