Optimizing Retries in UDP-Based Applications

By Daniel Munteanu | UDP is, by definition, a connectionless transport-layer protocol that provides no mechanisms for reliable data transmission or congestion control. When such functionality is needed at the transport layer, it is provided by other protocols such as TCP, Multipath TCP, and SCTP.

UDP instead leaves transmission reliability to the layers above, to be implemented if and as required.

One option to add the missing reliability is to use an intermediary protocol that handles reliable transmission, such as Google's QUIC.

The other option is for the application itself to handle reliability, which includes, among other things, performing retries. But how exactly is this retry behavior driven? Broadly, it is either governed by an RFC or left to the individual application implementation.
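To make the application-level option concrete, here is a minimal sketch of a UDP request loop with retries, using Python's standard socket API. The function name, retry count, and timeout are illustrative assumptions, not anything prescribed by the article or by an RFC.

```python
import socket

def udp_request(payload, addr, retries=3, timeout=1.0):
    """Send a UDP datagram and wait for a reply, retrying on timeout.

    Returns the response bytes, or None if every attempt times out.
    Both the retry count and the timeout are application choices.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _attempt in range(retries + 1):
            sock.sendto(payload, addr)
            try:
                data, _peer = sock.recvfrom(4096)
                return data
            except socket.timeout:
                continue  # request or reply lost in transit; retry
    return None
```

Note that a timeout cannot distinguish a lost request from a lost reply, which is exactly why the retry policy (and its limits) must be chosen per application.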

A POOR “RETRIES” IMPLEMENTATION IMPACTS APPLICATION PERFORMANCE

Retries are critical to improving user experience and application behavior, but they must be handled carefully and in line with the application context. There are no universally applicable golden values, and falling toward either extreme is a risk: too few retries might prematurely fail a transaction, while too many could cause inappropriate wait/idle times. A reference standard is RFC 8085, which provides guidelines for designing applications running on top of UDP with respect to congestion control, reliability, message sizes, checksums, and more.

Another important document is RFC 1536, which attempts to catalog known DNS problems and potential fixes. As the RFC states, “Various DNS implementations and various versions of these implementations interact with each other, producing huge amounts of unnecessary traffic.”

Considering all the above, when we look at the traffic generated by various application mixes, we encounter different types of UDP retry behavior. Because DNS is ubiquitous, we will see the RFC-governed behaviors. We will also find UDP-based protocols such as the network time protocol (NTP), whose retry mechanism (which can even mean zero retries) is implemented differently depending on the specific application.

REAL-WORLD TEST TRAFFIC CRITICAL TO OPTIMIZE RETRIES

When facing such complex and diverse environments, modeling the actual traffic profiles, and even more importantly, properly assessing the impact of packet loss (in our case UDP packets, though this applies in the broader sense as well) on the application, can become an intractable task. Realistically modeling these scenarios is of paramount importance to properly assess the performance that different network elements (next-gen firewalls, application delivery controllers, deep-packet inspection devices, etc.) or even entire systems can reliably sustain without impacting the in-transit applications. Mimicking real production traffic dynamics is also important for reproducing or discovering network bugs and other malfunctions.
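As a back-of-the-envelope illustration of why the retry count matters under packet loss, consider a request/response exchange where each datagram independently survives the network with probability 1 - loss_rate. This is a simplified model of my own, not a BreakingPoint calculation:

```python
def transaction_failure_prob(loss_rate, retries):
    """Probability that a UDP request/response transaction fails
    on every attempt, assuming independent per-datagram loss.

    An attempt succeeds only if both the request and the response
    survive, i.e. with probability (1 - loss_rate) ** 2.
    """
    attempt_fail = 1.0 - (1.0 - loss_rate) ** 2
    return attempt_fail ** (retries + 1)
```

For example, at 5% per-datagram loss, a single attempt fails 9.75% of the time, while three retries bring the overall failure probability down to roughly 0.009%, which shows how quickly a modest retry budget pays off.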

A test traffic generator offering a high level of flexibility and application richness is a mandatory tool for such use cases. Ixia’s BreakingPoint test solution offers such unmatched flexibility, and in the following paragraphs we will walk through an example of how to configure and emulate UDP retry behavior for applications whose retries are not necessarily RFC-regulated.

TEST EXAMPLE: VALIDATING NTP UDP RETRIES

As an example, we will use NTP, which provides clock synchronization between different systems. The major BreakingPoint functionality we will leverage to emulate UDP retries is the Conditional Request. In a nutshell, a Conditional Request makes the BreakingPoint test tool generate one of a series of responses, based on the received request.

Therefore, in the next paragraphs, we will use a sample Superflow to generate an NTP exchange between BreakingPoint-emulated client and server endpoints, with the UDP retry logic built in.

For our UDP retry example, the Superflow configuration would look as follows:

  1. First, we would have the initial NTP message from the BreakingPoint client side.
  2. The second defined action would be a server-side Conditional Request configured as follows:
    a. In the case of a Match (i.e., the server received the client-generated NTP request), the server will respond with the proper NTP message.


    b. In the case of a Mismatch (i.e., the server did not receive the client-generated NTP request), the configuration will jump back and re-send the initial client NTP request. The Iteration parameter defines the number of times the request can be retried before it is considered failed.


  3. The third defined action would be another Conditional Request, but this time on the client side, to check whether the client received the server message:
    a. In the case of a Match (i.e., the client received the server-generated NTP response), the flow would successfully close.


    b. In the case of a Mismatch (i.e., the client did not receive the server NTP response), the configuration will jump back and re-send the initial client NTP request (because the client cannot know in which direction the NTP packet was lost). Again, the Iteration parameter defines the number of times the request can be retried before it is considered failed.


  4. The fourth defined action is a simple Close statement to finish the flow gracefully.
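The four steps above can be sketched as a toy simulation. This is a hypothetical Python model of the Superflow's jump-back logic, not BreakingPoint configuration; the function and parameter names are my own:

```python
import random

def run_flow(loss_rate, iterations=3, rng=random):
    """Toy model of the Superflow above: a lost request (server-side
    Mismatch) or a lost response (client-side Mismatch) both jump back
    to re-sending the initial request, up to `iterations` retries.
    """
    def delivered():
        # One datagram crossing the network survives with
        # probability 1 - loss_rate.
        return rng.random() >= loss_rate

    for _ in range(iterations + 1):
        if not delivered():      # server Mismatch: request lost, re-send
            continue
        if delivered():          # server matched and the response arrived
            return True          # client Match: the flow closes cleanly
        # client Mismatch: response lost, jump back and re-send
    return False
```

This mirrors the key design choice in the Superflow: the client always restarts from the initial request, since a timeout alone cannot tell it which direction the packet was lost in.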

The exercise above is just one example of the flexibility that BreakingPoint offers when building application flows. This flexibility empowers power users to build their own flows based on custom use cases and complex logic requirements.

Nevertheless, for ease of use, we’ve added canned Superflows with the UDP retry logic built in. Check out our latest ATI Updates (2018-04) that include such canned Superflows for protocols like NFS, SNMP, and others.

LEVERAGE SUBSCRIPTION SERVICE TO GET THE LATEST APPLICATIONS

The Ixia BreakingPoint Application and Threat Intelligence (ATI) Subscription provides bi-weekly updates of the latest application protocols and attacks for use with Ixia test platforms.
