Hiding in Plain Sight: Decoding Malware on Pastebin
By Adrian Hada
In staying on top of the latest threats, my fellow researchers and I discover new and creative ways malware authors try to keep one step ahead. In this respect, discovering samples on a publicly viewable service such as Pastebin might come as a surprise. In fact, some researchers have been monitoring Pastebin for samples for a while now. A shout-out goes to @ScumBots, who has been publishing his findings on Twitter since last year.
The ATI team continuously monitors fresh “pastes” for interesting threat intelligence. Over time, a great deal of notable material has turned up on Pastebin: malware sources, malicious binaries, credit card data, e-mailing lists, Perl bots, and many others. Some of these probably belong to the threat actors themselves, while others (such as lists of phishing sites) are used to share information between researchers.
Of course, the samples themselves are not simply pasted; they go through different forms of encoding, most likely in an attempt to hamper identification. Part of the reason is practical: Pastebin accepts printable text, and an executable file contains many non-printable characters. There might also be some attempt to evade security products, but that is not easy to demonstrate. In this blog post, you will see some of these methods in action.
METHOD 1: BASE64 ENCODING
Base64 encoding seems to be the most prevalent choice. You might be acquainted with it from e-mail transfers, but in the field of security it is often used when printable text characters are your only option, or when you want to evade detection with a very simple method. Here’s an example of an njRAT sample in the wild:
Sometimes, capturing the text and decoding it isn’t that simple. Some pastes contain Visual Basic code that decodes the binary and executes it, as in this other example:
In this case, the byte string would be decoded on the targeted machine and then executed. The encoding used here is also Base64.
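For analysis, the same Base64 step can be reproduced offline without running anything. The following is a minimal Python sketch, not the Visual Basic code from the paste: the function names are my own, and the input is a harmless stand-in made of the first bytes of a typical Windows executable header rather than a real sample.

```python
import base64

MZ_MAGIC = b"MZ"  # 0x4D 0x5A, the signature at the start of Windows executables


def decode_base64_paste(text: str) -> bytes:
    """Decode a Base64 paste, ignoring spaces and line breaks."""
    compact = "".join(text.split())
    # validate=True rejects characters outside the Base64 alphabet
    return base64.b64decode(compact, validate=True)


def looks_like_pe(data: bytes) -> bool:
    """Return True if the decoded data starts with the MZ signature."""
    return data.startswith(MZ_MAGIC)


# Harmless stand-in for a paste body (hypothetical, not taken from a sample).
sample_paste = "TVqQAAMA\nAAAEAAAA"
decoded = decode_base64_paste(sample_paste)
print(decoded.hex())
print(looks_like_pe(decoded))
```

The decoder only returns bytes; writing them to disk or executing them is deliberately left out.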
This practice of using Visual Basic code has also given rise to more complicated schemes, such as the following:
Here, the author of the code first subtracts the key value (3, in this case) from each byte in the array. The result is then Base64-decoded into a working binary, which is executed.
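The subtract-then-decode scheme can be sketched in a few lines of Python. This is an illustration under assumptions, not the author's Visual Basic: I assume the subtraction wraps modulo 256, and `shift_after_base64` is a hypothetical helper that exists only to build a test input.

```python
import base64


def unshift_then_base64(encoded: bytes, key: int = 3) -> bytes:
    """Subtract `key` from every byte, then Base64-decode the result."""
    unshifted = bytes((b - key) % 256 for b in encoded)
    return base64.b64decode(unshifted)


def shift_after_base64(payload: bytes, key: int = 3) -> bytes:
    """Inverse operation, used here only to produce a test input."""
    return bytes((b + key) % 256 for b in base64.b64encode(payload))


# Harmless stand-in payload: the MZ signature plus two more header bytes.
obfuscated = shift_after_base64(b"MZ\x90\x00")
print(obfuscated)
print(unshift_then_base64(obfuscated).hex())
```

After the shift, the text no longer contains the familiar Base64 prefix of an executable, which is why a plain string match on the paste misses it.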
METHOD 2: HEX-ENCODING
Another means of obfuscation uses hex encoding of the binary, basically transforming the value of each byte into its base-16 equivalent and typing it out. Here’s an example of a Revenge RAT binary encoded this way:
The leading 4D5A values are a dead giveaway of a Windows binary—the so-called "magic number" for a Windows executable file. One might also remove the spaces, relying on the fact that all hex-encoded byte values fit in exactly two characters. Another method seen in the wild is to add some random bytes around the hex-encoded data and split the string using ‘H’ characters to separate the binary start and end:
You can clearly see the giveaway 4d5a after the leading H character in there.
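Both hex variants are easy to undo. The sketch below is a minimal Python illustration with names of my own choosing. For the ‘H’-delimited form it assumes that the payload sits between the first and last ‘H’ in the text and that the surrounding junk contains no ‘H’ of its own; real pastes may be laid out differently.

```python
def decode_hex_paste(text: str) -> bytes:
    """Decode hex text, with or without spaces between byte values."""
    return bytes.fromhex("".join(text.split()))


def decode_h_delimited(text: str) -> bytes:
    """Decode the hex run between the first and last 'H' markers.

    Assumption: everything outside the markers is junk and the junk
    itself contains no 'H'.
    """
    start = text.index("H") + 1
    end = text.rindex("H")
    if end < start:
        raise ValueError("expected two 'H' markers around the payload")
    return decode_hex_paste(text[start:end])


# Harmless stand-ins, not taken from real samples.
print(decode_hex_paste("4D 5A 90 00"))
print(decode_hex_paste("4d5a9000"))
print(decode_h_delimited("a91fH4d5a9000H77c0"))
```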
Another hex-based encoding strategy applies hex encoding after Base64-encoding the contents. So, to obtain the final malicious binary, a researcher needs to first decode the hex string and then the Base64-encoded string. Here is one such example, once again an njRAT sample:
This is the part that is hex-encoded. After hex-decoding this becomes:
This is similar to the other Base64 strings you might remember from earlier in this post. Decoding leads to:
I re-encoded the completely decoded data back to hex format, so you can easily see the leading ‘4d5a’ pattern at the beginning.
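The two layers can be peeled off in order with a short Python sketch. The function name and the input string are mine; the input is the hex form of the Base64 text `TVqQAA==`, a harmless stand-in rather than the njRAT paste.

```python
import base64


def decode_hex_then_base64(text: str) -> bytes:
    """Undo the outer hex layer, then the inner Base64 layer."""
    base64_text = bytes.fromhex("".join(text.split()))
    return base64.b64decode(base64_text)


layered = "5456715141413d3d"  # hex of the ASCII text "TVqQAA=="
inner = bytes.fromhex(layered)
print(inner)  # the intermediate Base64 text
print(decode_hex_then_base64(layered).hex())  # re-encoded to hex for reading
```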
METHOD 3: ASCII-ENCODING
Method three writes out each byte of the binary as its decimal ASCII code. Since these values can be one to three digits wide, spaces around each encoded value become mandatory:
Using an ASCII table, you can look up “77” and “90” and discover that their hex equivalents are “4D” and “5A”, the Windows executable signature that prints as “MZ”. Of course, people in tech always trust code more than manual lookups, so here’s a snippet that decodes these:
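A minimal Python sketch of such a decoder, not necessarily the one used during the original analysis, could look like this. It assumes the values are separated by whitespace and each lies between 0 and 255; the function name is my own.

```python
def decode_decimal_paste(text: str) -> bytes:
    """Decode whitespace-separated decimal byte values such as '77 90'."""
    # bytes() raises ValueError if any value falls outside 0..255
    return bytes(int(token) for token in text.split())


# Harmless stand-in: the first four bytes of a typical executable header.
print(decode_decimal_paste("77 90 144 0"))
```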
METHOD 4: BINARY ENCODING
Last, but not least (I see it more and more often), is a simple binary encoding scheme. The byte values are transformed to their binary equivalents and the resulting 0/1 sequence is output. One such example, again of njRAT:
And here’s the sample being decoded:
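The decoding step can be sketched in Python as follows. This is a minimal illustration rather than the tooling used on the sample. It assumes every byte is written as exactly eight bits and that any whitespace in the paste is only there for layout; the function name is my own.

```python
def decode_binary_paste(text: str) -> bytes:
    """Decode a 0/1 string into bytes, eight bits per byte."""
    bits = "".join(text.split())
    if len(bits) % 8 != 0:
        raise ValueError("bit string length is not a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Harmless stand-in: 0x4D 0x5A written out as bits.
print(decode_binary_paste("01001101 01011010"))
```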
CONCLUSION
There are a lot of simple but effective methods of obfuscating binaries. By combining two such encoding schemes, as in the case of Base64 combined with an addition to the byte value, an author can easily bypass simple string-matching detection methods. This makes the job of a threat analyst more difficult, but also much more entertaining.
These investigations allow us to better understand the tools of our adversaries and create the necessary infrastructure to trace and block their attacks. As a result, we’re able to protect our customers better and better.
Customers of Ixia’s ThreatARMOR benefit from this type of research through our identification of the command and control infrastructure used by the attackers. BreakingPoint subscribers can emulate encoding and obfuscation schemes similar to those found in the wild using the evasion options.