Efficient Detection of XOR-Encoded Traffic, Part 1 of 2

More tips for the savvy malware analyst

The exclusive-or (XOR) logical operation[1] is commonly used by malware to obfuscate exploit payloads and sometimes to hide implanted command and control traffic. One of the primary reasons hackers choose the XOR function is that it is extremely lightweight: an XOR encode function can typically be written in a handful of machine instructions. Hackers choose XOR mainly to avoid detection rather than to protect the information. The latter would require an encryption algorithm with much more processing overhead.

Finding XOR encoded content

A common way to identify XOR encoded content is to brute force the entire key space and search for patterns of interest. For example, the string “This program cannot be run in DOS mode.” can be used to identify executables. For single byte XOR keys this is trivial since there are only 255 non-zero keys (a key of 0 leaves the data unchanged). When four byte keys are used, there are over four billion combinations. With eight byte keys, brute forcing the entire key space is impractical.
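As a point of comparison, the brute-force approach for single byte keys might look like the sketch below. It is written for Python 3, and the function names `xor_single_byte` and `brute_force_single_byte` as well as the sample buffer are made up for illustration.

```python
def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)


def brute_force_single_byte(data: bytes, needle: bytes):
    """Try every non-zero one-byte key; return (key, offset) for each hit."""
    hits = []
    for key in range(1, 256):  # key 0 would leave the data unchanged
        offset = xor_single_byte(data, key).find(needle)
        if offset >= 0:
            hits.append((key, offset))
    return hits


needle = b"This program cannot be run in DOS mode."
# Hypothetical sample: 6 bytes of filler, the encoded string, more filler
sample = b"\x00\x01junk" + xor_single_byte(needle, 0x41) + b"more junk"

print(brute_force_single_byte(sample, needle))  # [(65, 6)]
```

Each candidate key requires decoding and scanning the whole buffer again, which is why this approach stops being practical as the key grows.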

Using some of the fundamental properties of the XOR operation[2], the work can be reduced to a single scan for each key length. When a number is XOR’d with itself, the result is 0, the XOR identity value. This property can be used to remove the key from adjacent encoded bytes, leaving only the difference between the plaintext bytes. To demonstrate this, take the substring “This program” (used here for brevity; in practice the longer string should be used), XOR encoded with the single byte key 0x41:

              T  h  i  s  sp p  r  o  g  r  a  m
Plain Text    54 68 69 73 20 70 72 6F 67 72 61 6D
XOR Key       41 41 41 41 41 41 41 41 41 41 41 41
Obfuscated    15 29 28 32 61 31 33 2E 26 33 20 2C

(sp = space character, 0x20)

Table 1: Single byte XOR Key

Take the first two characters in the example string, ‘T’ (0x54) and ‘h’ (0x68). Each character is XOR encoded with the same key, 0x41. If the two encoded bytes are XOR’d together, we get the following expression:

(0x54 XOR 0x41) XOR (0x68 XOR 0x41)

Using the associative and commutative properties of XOR, rewrite the expression as:

0x54 XOR 0x68 XOR (0x41 XOR 0x41)

Since any number XOR’d with itself is 0, the unknown key cancels out of the expression:

0x54 XOR 0x68 XOR 0x00

The result is the difference between the plaintext values, regardless of the XOR key being used:

0x54 XOR 0x68 = 0x3C
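The cancellation can be checked exhaustively for a single byte key. The short Python 3 sketch below (the variable names are illustrative only) encodes ‘T’ and ‘h’ with every possible key and collects the XOR of the two encoded bytes:

```python
plain_a, plain_b = 0x54, 0x68  # 'T' and 'h'

# XOR the two encoded bytes together for every possible one-byte key.
results = {(plain_a ^ key) ^ (plain_b ^ key) for key in range(256)}

print([hex(r) for r in results])  # ['0x3c']
```

The set contains a single value, so the difference between adjacent bytes carries no trace of the key.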

To efficiently scan a document for the XOR encoded string “This program”, we would first calculate the XOR differences in the entire string. This is accomplished by XOR’ing each byte at position N with the byte at position N-1:

Step 1:

              T  h  i  s  sp p  r  o  g  r  a  m
Plain Text    54 68 69 73 20 70 72 6F 67 72 61 6D
Delta            3C

Step 2:

              T  h  i  s  sp p  r  o  g  r  a  m
Plain Text    54 68 69 73 20 70 72 6F 67 72 61 6D
Delta            3C 01

Final Step:

              T  h  i  s  sp p  r  o  g  r  a  m
Plain Text    54 68 69 73 20 70 72 6F 67 72 61 6D
Delta            3C 01 1A 53 50 02 1D 08 15 13 0C

Sample Python code to calculate the XOR delta:


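import array

def xor_delta(s, key_len = 1):
    # work on a mutable array of unsigned bytes
    delta = array.array('B', s)

    # XOR each byte with the byte key_len positions ahead of it
    for x in xrange(key_len, len(s)):
        delta[x - key_len] ^= delta[x]

    # the last key_len bytes have no partner, so drop them
    return delta.tostring()[:-key_len]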

The next step is to calculate the XOR delta for the document suspected to contain obfuscated content, using exactly the same process as for the search string. The document delta can then be searched for the sample string delta (“\x3C\x01\x1A\x53…”). Any instance of the sample string, regardless of the XOR key used, shows up in the document delta as the delta of the sample string.
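The whole search can be sketched in a few lines. The version below is a Python 3 rendering of the same idea, operating on `bytes`. The helper name `find_encoded`, the key 0x9C, and the sample document are assumptions made for the demonstration.

```python
def xor_delta(data: bytes, key_len: int = 1) -> bytes:
    """XOR each byte with the byte key_len positions later; the key cancels."""
    return bytes(a ^ b for a, b in zip(data, data[key_len:]))


def find_encoded(document: bytes, needle: bytes, key_len: int = 1):
    """Return offsets where needle appears XOR encoded with a key_len-byte key."""
    if len(needle) <= key_len:
        raise ValueError("search string must be longer than the key")
    needle_delta = xor_delta(needle, key_len)
    doc_delta = xor_delta(document, key_len)
    offsets = []
    offset = doc_delta.find(needle_delta)
    while offset >= 0:
        offsets.append(offset)
        offset = doc_delta.find(needle_delta, offset + 1)
    return offsets


needle = b"This program"
encoded = bytes(b ^ 0x9C for b in needle)      # arbitrary key for the demo
document = b"header--" + encoded + b"--trailer"  # hypothetical document

print(xor_delta(needle).hex())         # 3c011a5350021d0815130c
print(find_encoded(document, needle))  # [8]
```

Because the delta at index i is built from bytes i and i + key_len, a match offset in the delta is also the offset of the encoded string in the document. An unencoded copy of the string (a key of 0) matches as well.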

Expanding for longer keys

This process can be expanded to work with keys of arbitrary length, provided the string being searched for is longer than the XOR key. Take the same string “This program” but XOR encode it using the 4 byte key 0x41424344. To calculate the delta in this situation, XOR each byte at position N, starting at position 4, with the byte at position N-4. The following table shows the XOR delta for the sample string using a 4 byte XOR key:

                     T  h  i  s  sp p  r  o  g  r  a  m
Plain Text           54 68 69 73 20 70 72 6F 67 72 61 6D
Delta (plain)                    74 18 1B 1C 47 02 13 02
XOR Key              41 42 43 44 41 42 43 44 41 42 43 44
Obfuscated           15 2A 2A 37 61 32 31 2B 26 30 22 29
Delta (obfuscated)               74 18 1B 1C 47 02 13 02

Table 2: Four byte XOR Key
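The values in Table 2 can be reproduced with a short Python 3 sketch. The helper names `xor_encode` and `xor_delta` are illustrative; the string and key are the ones used in the table.

```python
from itertools import cycle


def xor_encode(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating multi-byte key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def xor_delta(data: bytes, key_len: int = 1) -> bytes:
    """XOR each byte with the byte key_len positions later."""
    return bytes(a ^ b for a, b in zip(data, data[key_len:]))


plain = b"This program"
key = bytes.fromhex("41424344")
obfuscated = xor_encode(plain, key)

print(obfuscated.hex())                # 152a2a376132312b26302229
print(xor_delta(plain, 4).hex())       # 74181b1c47021302
print(xor_delta(obfuscated, 4).hex())  # 74181b1c47021302
```

Bytes that sit four positions apart are always encoded with the same key byte, so the key cancels no matter where in the key cycle the string happens to start.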

Sample code

The following Python script is included to demonstrate the method described above. It was tested using Python version 2.7. It defaults to testing keys of lengths 1, 2, 4 and 8 while searching for the string “This program cannot be run in DOS mode.”


Sample usage: python xor_poc.py --file malicious.doc
Or: python xor_poc.py --keys 1,2,3,4,5,6,7,8 --string "Try and catch me if you can!"
    --file malicious.doc

import array
import sys
import binascii
import getopt

def usage():
    print "usage: %s [option] --file [filename]" % sys.argv[0]
    print "-f|--file filename   :file to scan"
    print "-v                   :verbose"
    print "-k|--keys 1,2,4,8    :comma separated list of key lengths"
    print "-s|--string string   :string to search for"
    print "-h                   :show this help"

def xor_delta(s, key_len = 1):
    delta = array.array('B', s)

    # XOR each byte with the byte key_len positions ahead of it
    for x in xrange(key_len, len(s)):
        delta[x - key_len] ^= delta[x]

    # return the delta as a string, dropping the unpaired trailing bytes
    return delta.tostring()[:-key_len]

if __name__ == "__main__":
    search_file = None
    key_lengths = [1, 2, 4, 8]
    search_string = "This program cannot be run in DOS mode."
    verbose = False

    try:
        # "f:" takes an argument, just like "k:" and "s:"
        opts, args = getopt.getopt(sys.argv[1:], "k:s:f:vh",
                                   ["keys=", "string=", "file="])
    except getopt.GetoptError as err:
        print str(err)
        usage()
        sys.exit(2)

    for o, a in opts:
        if o in ("-k", "--keys"):
            key_lengths = [int(x) for x in a.split(',')]
        elif o in ("-s", "--string"):
            search_string = a
        elif o in ("-f", "--file"):
            # read in binary mode so no bytes are translated or truncated
            with open(a, "rb") as f:
                search_file = f.read()
        elif o == "-v":
            verbose = True
        elif o == "-h":
            usage()
            sys.exit(1)
        else:
            assert False, "unhandled option: %s" % o
    if search_file is None:
        print "Missing filename"
        usage()
        sys.exit(1)

    for l in key_lengths:
        # the search string must be longer than the key,
        # otherwise its delta is empty and matches everywhere
        if l < 1 or len(search_string) <= l:
            print "Skipping key length %d" % l
            continue

        key_delta = xor_delta(search_string, l)

        if verbose:
            print "%d:%s" % (l, binascii.hexlify(key_delta))

        doc_delta = xor_delta(search_file, l)

        # report every match, including one at offset 0
        offset = doc_delta.find(key_delta)
        while offset >= 0:
            print "Key length: %d offset: %08X" % (l, offset)
            offset = doc_delta.find(key_delta, offset + 1)

Final thoughts

Using the fundamental properties of the logical XOR operation, it is possible to drastically reduce the work required to search for XOR encoded content. With this method, content encoded with large keys can be found as long as the search string is longer than the key. The method can also be expanded to scan for incrementing XOR keys, which will be discussed in a future blog entry.
