What's in an Application Signature?

Have you ever wondered how those fancy Next Generation Firewalls (NGFW), Intrusion Detection and Prevention Systems (IDPS) or network visibility appliances are able to identify which internet application you are accessing, just by sniffing the network traffic? How they know if you’re using the mobile version or the desktop version of the same app? Or how they can tell apart Google Drive from Google Maps or Gmail, even though they’re all encrypted?

Most of that is done through static signatures. Of course, there are newer methods, such as fingerprinting (the most popular being the JA3 fingerprint) or Artificial Intelligence / Machine Learning (AI/ML), but those are only applied in conjunction with and to complement static signatures, not alone. We might talk about these new methods in a future blog post, but for now let’s stick to the main engine: signatures.

Signatures are generic in nature and not standardized: some follow popular encoding languages or formats, such as XML, YAML or Snort rules, while others use proprietary file formats. Static means they are pre-built and do not change often, as opposed to dynamic signatures, which are learned and created on the go. A signature contains a set of criteria, expressions or patterns which are compared against the data packets; packets that satisfy them are marked as matched. The criteria used for matching application signatures vary between vendors and products, but what most have in common is that they rely on values from packet headers and payloads.
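To make the idea of "criteria compared against packets" more concrete, here is a minimal Python sketch. It is not any vendor's real signature format: the `Signature` class, its field names, the packet dictionary layout and the `example-video` app are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Signature:
    """Hypothetical static signature: every criterion that is set must match."""
    app: str
    protocol: Optional[str] = None            # e.g. "tcp"
    dst_port: Optional[int] = None
    header_patterns: Dict[str, str] = field(default_factory=dict)  # header name -> regex
    payload_pattern: Optional[bytes] = None   # regex over the raw payload


def matches(sig: Signature, packet: dict) -> bool:
    """Compare one signature against one already-parsed packet (a plain dict)."""
    if sig.protocol is not None and packet.get("protocol") != sig.protocol:
        return False
    if sig.dst_port is not None and packet.get("dst_port") != sig.dst_port:
        return False
    headers = packet.get("headers", {})
    for name, pattern in sig.header_patterns.items():
        if re.search(pattern, headers.get(name, "")) is None:
            return False
    if sig.payload_pattern is not None:
        if re.search(sig.payload_pattern, packet.get("payload", b"")) is None:
            return False
    return True


def classify(packet: dict, signatures: List[Signature]) -> Optional[str]:
    """Return the app name of the first matching signature, or None."""
    for sig in signatures:
        if matches(sig, packet):
            return sig.app
    return None
```

A signature here is simply a bundle of optional checks, which mirrors the point above: the more header and payload values a signature can test, the more specific the match.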

Web applications that run on the TCP/IP stack leave traces in header fields or payloads at almost every upper layer of the OSI model.

By now, you may be asking: with so much info at every layer, what’s the big deal in identifying apps?

The reality is more complicated. By itself, each layer only provides a piece of the puzzle but not the complete answer.

With one exception: Layer 7 cleartext apps. This is the easiest case you can dream of, but the least common in today’s networks. Various estimates and statistics (Google, Let’s Encrypt) place today’s web traffic encryption ratio between 80% and 95%, which leaves a very small 5-20% fraction of the web apps unencrypted. That means Layer 7 content can only help to reliably identify the app in about 5-20% of the cases. As the saying goes, encryption is both very fortunate and very unfortunate for security.

So, what happens with the remaining 80-95% of the traffic? How is that identified? We already said that, with encrypted apps such as HTTPS, security tools are blind at Layer 7 unless they or a 3rd party decrypt the traffic. But decryption is resource-intensive, and it’s usually done AFTER identifying the apps which need to be decrypted, so we’re back where we started. We now must rely on the other layers and pick the most accurate detection method available.

But which one IS the most accurate? Common knowledge tells us that the lower you go in the OSI layers, the more they focus on the network rather than the application, and the higher you go, the more application-oriented they become. Basic networking 101, right? So, the obvious conclusion is to base application detection decisions on the highest-layer information available.

For encrypted traffic, the highest layer with readable information is layer 5/6, where the encryption handshake happens, because everything above that is encrypted. TLS is currently the most used encryption protocol on the web, but others such as QUIC are gaining ground. The TLS handshake consists of several messages between client and server, beginning with the Client Hello, Server Hello and Server Certificate. The Client Hello contains an extension called SNI (Server Name Indication), which is extremely useful because it indicates the name of the web server that the client is trying to reach. This name often differs from the host name returned by a reverse DNS lookup of the server IP, and it is far more accurate. Just to give an example, here’s a capture of the SNI for a connection to www.youtube.com versus a sample of the reverse IP lookup for the same IP address:

[Figure: SNI from a YouTube TLS session]

[Figure: Reverse IP lookup from the same YouTube session - host name is obfuscated, youtube.com is not mentioned anywhere, only Google is mentioned as the owner]
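For readers who want to see where the SNI sits in the bytes, the following Python sketch walks a raw Client Hello and pulls out the server name. It assumes the whole Client Hello arrives in a single, unfragmented TLS record and returns None on anything unexpected; `extract_sni` is a name chosen for this example, not part of any product or library.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from one TLS record holding a Client Hello."""
    try:
        if record[0] != 0x16:          # record content type 0x16 = handshake
            return None
        pos = 5                        # skip record header: type, version, length
        if record[pos] != 0x01:        # handshake type 0x01 = Client Hello
            return None
        pos += 4                       # handshake type (1) + length (3)
        pos += 2 + 32                  # client version + random
        pos += 1 + record[pos]         # session id (length-prefixed)
        (cipher_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cipher_len          # cipher suites
        pos += 1 + record[pos]         # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:     # extension type 0 = server_name
                # layout: list length (2), name type (1), name length (2), name
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5:pos + 5 + name_len]
                if name_type != 0 or len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len             # not SNI: skip to the next extension
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None
```

A production parser would also need to handle TCP reassembly and Client Hellos split across records, which this sketch deliberately ignores.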

The Server Certificate is another piece of valuable information, as it contains the Subject field, also called Issued To or Owner, and that identifies the entity/organization name that is presenting this certificate. That is often the name of the web server, but not always, as the same Subject could be used on multiple web servers belonging to a larger entity. Let’s take the same example of a YouTube session and check one of its certificates:

Again, this is issued to Google instead of YouTube – it’s legit because YouTube is a subsidiary of Google. And somewhere in its body, the certificate has a field called Subject Alternative Names, which lists all the possible domain names under which this certificate applies, and eventually youtube.com is included:

But we can’t tell which name on the list the client is accessing in this exact session, so the name of the actual application remains unclear. Therefore, the certificate alone cannot accurately narrow down the app, and the SNI must be used to pinpoint it.
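The relationship between the two can be sketched in a few lines of Python: the certificate supplies a list of candidate names, and the SNI picks the one in use for this session. The helper names and the sample name list below are assumptions for illustration (not the contents of a real certificate), and the wildcard rule is simplified to "a `*.` prefix covers exactly one left-most label".

```python
from typing import List, Optional


def san_matches(pattern: str, host: str) -> bool:
    """Check one Subject Alternative Name entry against a host name."""
    pattern = pattern.lower().rstrip(".")
    host = host.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]                  # e.g. ".youtube.com"
        if not host.endswith(suffix):
            return False
        left = host[:-len(suffix)]            # the part the wildcard stands for
        return left != "" and "." not in left
    return pattern == host


def pick_name(san_list: List[str], sni: Optional[str]) -> Optional[str]:
    """Return the SNI if the certificate covers it; None if it cannot be decided."""
    if sni is None:
        return None                           # certificate alone is ambiguous
    for san in san_list:
        if san_matches(san, sni):
            return sni
    return None


# Hypothetical name list, loosely modeled on a shared multi-service certificate.
sans = ["*.google.com", "*.youtube.com", "youtube.com"]
print(pick_name(sans, "www.youtube.com"))
print(pick_name(sans, None))
```

Without the SNI, every entry in the list is an equally plausible answer, which is exactly the ambiguity described above.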

There are pitfalls, however, in relying on the SNI and certificate for application intelligence. TLS 1.3 made encryption of the server certificate mandatory, and a companion extension adds optional encryption of the SNI. That extension, Encrypted SNI (ESNI), since succeeded by Encrypted Client Hello (ECH), is not used on a large scale yet, but browser and server support is growing, and it’s likely that application detection rates will drop severely as adoption increases. More on that in a future blog post.

If the SNI and Server Certificate are not available for reading, whether because of TLS 1.3 and ESNI or because another encrypted handshake protocol such as QUIC is in use, then we descend to OSI Layer 4. This is where the transport protocol and port numbers reveal some generic info about the application, but nothing specific. Port assignments are conventions, not requirements: anyone could build an application over TCP port 443 which is not HTTPS. Conversely, anyone could build an HTTPS application over port 44443 instead of 443. Using transport protocol and port numbers is accurate in most cases, as long as the applications adhere to standard practices. But as soon as a non-standard application enters the network, detection using layer 4 has a high chance of failing. This is why protocol:port detection is used as a complement to other methods or as a last resort, and it’s not a reliable app and malware intelligence method on its own.
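The "descend a layer when the better evidence is missing" logic can be sketched as a simple fallback chain in Python. The port table, the function names and the returned labels are assumptions for this example only; real products use far richer tables and combine more evidence.

```python
from typing import Optional, Tuple

# Illustrative subset of conventional (protocol, port) pairs; not exhaustive.
WELL_KNOWN_PORTS = {
    ("tcp", 80): "http",
    ("tcp", 443): "https",
    ("udp", 443): "quic",
    ("udp", 53): "dns",
}


def guess_by_port(protocol: str, port: int) -> Optional[str]:
    """Layer 4 guess: only as good as the app's adherence to convention."""
    return WELL_KNOWN_PORTS.get((protocol.lower(), port))


def identify(sni: Optional[str], protocol: str, port: int) -> Tuple[str, str]:
    """Return (label, evidence), preferring the highest-layer evidence available."""
    if sni:
        return sni, "sni"
    guess = guess_by_port(protocol, port)
    if guess:
        return guess, "port"
    return "unknown", "none"


print(identify("www.youtube.com", "tcp", 443))
print(identify(None, "tcp", 443))
print(identify(None, "tcp", 44443))
```

Note that the port-based answer is only a generic protocol label, never an application name, and an HTTPS service on port 44443 falls straight through to "unknown".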

IP information is not really an application detection method, but merely a way to complement the higher-layer info. There is little record of the real web app in the IP layer and, as shown above, the reverse DNS lookup only reveals the top-level owner of that IP address. But even that can be useful in tracing single-IP web services or unknown apps. Server IP geolocation information can be added to the feed, to enhance the info so that we know whether this is the US localized version of the app, or the Korean one, for example.

Conclusion and next blog preview

Application signatures are distinctive pattern-based detection methods which use expressions or marks for identifying application traffic. Security devices, L7 networking devices, application monitoring appliances, and Keysight’s AppStack, CloudLens vAppStack and TrafficREWIND, all use application signatures. In the next blog, we’re going to talk about the specifics of how AppStack applies highly accurate detection techniques using the most advanced signature databases.