Clik here to view.

Efficient Detection of XOR-Encoded Traffic, Part 1 of 2
More tips for the savvy malware analyst
The exclusive-or (XOR) logical operation[1] is commonly used by malware to obfuscate exploit payloads and sometimes hide implanted command and control traffic. One of the primary reasons hackers choose the XOR function is that it is extremely light weight: An XOR encode function can typically be written in a handful of machine instructions. Hackers choose XOR mainly to avoid detection rather than protect the information. The latter would require that the hacker use an encryption algorithm requiring much more processing overhead.
Finding XOR encoded content
A common solution to identify the XOR encoded content is to brute force the entire key space and search for patterns of interest. One example could be to use the string “This program cannot be run in DOS mode.” to identify executables. For single byte XOR keys this is trivial since there are only 254 combinations. When four byte keys are used, there are over four billion combinations. With eight byte keys, brute forcing the entire key space is impractical.
Using some of the fundamental properties of the XOR operation[2], the number of operations can be reduced to a single scan for each key length. When a number is XOR’d with itself the result is the XOR identity value 0. This property can be used to remove the key from adjacent encoded bytes resulting in the difference between the plaintexts. To demonstrate this, take the sub-string “This program” (the substring is being used for brevity; in practice the longer string should be used) and single byte XOR encoded using the key 0×41:
T | h | i | s | p | r | o | g | r | a | m | ||
Plain Text | 54 | 68 | 69 | 73 | 20 | 70 | 72 | 6f | 67 | 72 | 61 | 6d |
XOR Key | 41 | 41 | 41 | 41 | 41 | 41 | 41 | 41 | 41 | 41 | 41 | 41 |
Obfuscated | 15 | 29 | 28 | 32 | 61 | 31 | 33 | 2E | 26 | 33 | 20 | 2C |
Table 1: Single byte XOR Key
Take the first two characters in the example string ‘T’ (0×54) and ‘h’ (0×68). Each character is XOR encoded with the same key 0×41. If the result was XOR’d together we’d get the following equation:
(0×54 XOR 0×41) XOR (0×68 XOR 0×41)
Using the associative and commutative properties of XOR, rewrite the equation as:
0×54 XOR 0×68 XOR (0×41 XOR 0×41)
Since any number XOR with itself is 0, we are essentially canceling out the unknown key from the equation:
0×54 XOR 0×68 XOR 0×00
The result is the difference between the plaintext values regardless of the XOR key being used:
0×54 XOR 0×68 = 0x3C
To efficiently scan a document for the XOR encoded string “This program”, we would first calculate the XOR differences in the entire string. This is accomplished by XOR’ing each byte at position N with the byte at position N-1:
Step 1:
T | h | i | s | p | r | o | g | r | a | m | ||
Plain Text | 54 | 68 | 69 | 73 | 20 | 70 | 72 | 6f | 67 | 72 | 61 | 6d |
Delta | 3C |
Step 2:
T | h | i | s | p | r | o | g | r | a | m | ||
Plain Text | 54 | 68 | 69 | 73 | 20 | 70 | 72 | 6f | 67 | 72 | 61 | 6d |
Delta | 3C | 01 |
…
Final Step:
T | h | i | s | p | r | o | g | r | a | m | ||
Plain Text | 54 | 68 | 69 | 73 | 20 | 70 | 72 | 6f | 67 | 72 | 61 | 6d |
Delta | 3C | 01 | 1A | 53 | 50 | 02 | 1D | 08 | 15 | 13 | 0C |
Sample Python code to calculate the XOR delta:
def xor_delta(s, key_len = 1): delta = array.array('B', s) for x in xrange(key_len, len(s)): delta[x - key_len] ^= delta[x] return delta.tostring()[:-key_len]
The next step is to calculate the XOR delta for the document suspected to contain obfuscated content. This is the exact same process as calculating the delta for the search string. The resulting XOR delta can then be search for the sample string delta (“\x3c\x01\x1A\x53…”). Any instances of the sample string, regardless of the XOR key used, can be identified in the document delta by searching for the delta of the sample string.
Expanding for longer keys
This process can be expanded to work with keys of arbitrary length provided the string being searched for is longer than the XOR key length. Take the same string “This program” but XOR encode it using the 4 byte key 0×41424344. To calculate the delta in this situation, XOR the bytes starting at position 4 with the byte at the position N-4. The following table shows the XOR delta for the sample string using a 4 byte XOR key:
T | h | i | s | p | r | o | g | r | a | m | ||
Plain Text | 54 | 68 | 69 | 73 | 20 | 70 | 72 | 6f | 67 | 72 | 61 | 6d |
Delta | 74 | 18 | 1B | 1C | 47 | 02 | 13 | 02 | ||||
XOR Key | 41 | 42 | 43 | 44 | 41 | 42 | 43 | 44 | 41 | 42 | 43 | 44 |
Obfuscated | 15 | 2A | 2A | 37 | 61 | 32 | 31 | 2B | 26 | 30 | 22 | 29 |
Delta | 74 | 18 | 1B | 1C | 47 | 02 | 13 | 02 |
Table 2: Four byte XOR Key
Sample code
The following Python script is included to demonstrate the method described above. It was tested using Python version 2.7. It defaults to testing keys of lengths 1, 2, 4 and 8 while searching for the string “This program cannot be run in DOS mode.”
Sample usage: python xor_poc.py –file malicious.doc Or: python xor_poc.py –keys 1,2,3,4,5,6,7,8 –string “Try and catch me if you can!” –file malicious.doc import array import sys import binascii import getopt def usage(): print "usage: %s [option] --file [filename]" % sys.argv[0] print "-v :verbose" print "-k|--keys 1,2,4,8 :comma seperated list of key lengths" print "-s|--string string :string to search for" def xor_delta(s, key_len = 1): delta = array.array('B', s) for x in xrange(key_len, len(s)): delta[x - key_len] ^= delta[x] #return the delta as a string return delta.tostring()[:-key_len] if __name__=="__main__": search_file = None key_lengths=[1,2,4,8] search_string = "This program cannot be run in DOS mode." verbose = False try: opts, args = getopt.getopt(sys.argv[1:], "k:s:fvh", ["keys=", "string=", "file="]) except getopt.GetoptError as err: print str(err) usage() sys.exit(2) for o, a in opts: if o in ("-k", "--keys"): key_lengths = [int(x) for x in a.split(',')] elif o in ("-s", "--string"): search_string = a elif o in ("-f", "--file"): search_file = open(a, "r").read() elif o == "-v": verbose = True elif o == "-h": usage() sys.exit(1) else: assert False, "unhandled option: %s" % o if(search_file == None): print "Missing filename" usage() sys.exit(1) for l in key_lengths: key_delta = xor_delta(search_string, l) if(verbose): print "%d:%s" % (l, binascii.hexlify(key_delta)) doc_delta = xor_delta(search_file, l) offset = -1 while(True): offset += 1 offset = doc_delta.find(key_delta, offset) if(offset > 0): print ("Key length: %d offset: %08X" % (l, offset)) else: break
Final thoughts
Using the fundamental properties of the logical XOR operation it is possible to drastically reduce the work required to search for XOR encoded content. Using this method, it is possible to search for XOR encoded content using large keys provided the search string is longer than the key length. This method can also be expanded to scan incrementing XOR keys; which will be discussed in a future blog entry.
1: http://en.wikipedia.org/wiki/Exclusive_or
2: http://en.wikipedia.org/wiki/Exclusive_or#Properties
Image: Fotolia.com, adam121
Clik here to view.