The Case for Learning Python® for Malware Analysis

Why Python is the best language for fighting malware

If you are a malware analyst, you can no longer put off learning and using Python. Many of the best tools are either written in Python or expose it for scripting, including Volatility, Scapy, IDAPython, Immunity Debugger, PyDbg, Hopper, and more.

At the beginning of my cybersecurity career, the hot language was Perl. CPAN offered a vast collection of ready-to-use modules, which made it easy to churn out code for all kinds of security, network, and system administration tasks. Back in 2005, I attended a one-week course taught by Mark Lutz, author of Learning Python. While I came away with excellent instruction on how to use Python, the language was not yet useful enough to me to dislodge my dependence on Perl. One statistic that re-ignited my interest was the TIOBE Programming Community Index for August 2013, which showed that Python had held 8th place since August 2012 while Perl had slid from 9th to 11th.

Python is ready for primetime malware analysis

Python has come a long way since 2005. Projects such as PyPy and Stackless now aim to improve its performance, and the Python Package Index (PyPI) has steadily grown to house over 34,000 packages.

Python has a clear and concise syntax and is object-oriented (OO) by default. These traits make it well suited to team-based development, which is especially useful when working to combat advanced malware. In a team development environment, it is necessary to establish coding standards that address API designs, protocols, formats, languages, and style. Many cybersecurity teams have found both value and increased productivity by standardizing their development efforts on Python. As a result, Python is a key enabler for building tools that fight malware.

It’s easy to overlook the advantage and importance of Python’s built-in object-oriented nature. Perl also supports object-oriented programming, but the language doesn’t lend itself to it as readily. Granted, native Perl object-oriented programming can be “fixed” to some degree by using a module like Moose, but that means that anyone who uses your code also has to install Moose. In Python you get OO for free, and OO is a fundamental part of making development both rapid and resilient. Python’s cleanliness and adaptability enable it to model complex, real-world scenarios such as securing enterprise networks and performing malware analysis.
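
As a small illustration of what built-in object orientation looks like in practice, the sketch below models two kinds of indicators of compromise as classes. The class names (Indicator, HashIndicator, DomainIndicator), the check() helper, and the sample values are all hypothetical and are not taken from any particular tool.

```python
class Indicator(object):
    """Base class: an observable value that may signal malware."""

    def __init__(self, value, description=''):
        self.value = value.lower()
        self.description = description

    def matches(self, observed):
        # Default behaviour: exact, case-insensitive comparison
        return observed.lower() == self.value


class HashIndicator(Indicator):
    """A file hash; the inherited exact comparison is all it needs."""


class DomainIndicator(Indicator):
    """A domain name that also matches any of its subdomains."""

    def matches(self, observed):
        observed = observed.lower()
        return observed == self.value or observed.endswith('.' + self.value)


# Hypothetical sample data (the MD5 value is the digest of an empty file)
indicators = [
    HashIndicator('D41D8CD98F00B204E9800998ECF8427E', 'empty file (MD5)'),
    DomainIndicator('example.com', 'placeholder domain'),
]


def check(observed):
    """Return the descriptions of all indicators matching the observed value."""
    return [i.description for i in indicators if i.matches(observed)]


print(check('mail.example.com'))                  # ['placeholder domain']
print(check('d41d8cd98f00b204e9800998ecf8427e'))  # ['empty file (MD5)']
```

Each indicator type carries its own matching rule, so adding a new type means adding a class rather than touching check().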

Python code can improve processing time – find threats faster

Another reason to choose Python is that, in my experience, it is a bit faster, providing incremental efficiency when performing analyses. For example, I implemented a file hashing tool that reads each file's contents once and calculates several cryptographic hash values (MD5, SHA1, SHA256, and SHA512) from the same data, as opposed to reading each file once per hash, which means four times as much reading and can take up to roughly four times longer when disk I/O dominates. Another benefit of coding this myself is that the output can be formatted in any desired way, without having to parse the output of other commands.

The Python code below is very similar to the Perl implementation that follows it. In both cases, I use standard modules to perform the hashing algorithms, and files are processed in 1 MB chunks. Although the two are similar in size, it was interesting to find that the Python version was approximately 0.1 seconds faster per 100 MB. Apply that difference to a 1 TB drive and we’re talking about a saving of roughly 17 minutes of processing time.

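#!/usr/bin/env python
# Hash every file named on the command line in a single pass; print CSV.

import sys
import hashlib

bs = 1024 * 1024  # read files in 1 MB chunks
hashes = ['md5', 'sha1', 'sha256', 'sha512']

# CSV header: filename followed by one column per algorithm
print(','.join(['filename'] + hashes))

for path in sys.argv[1:]:
    # One hash object per algorithm, all fed from the same read
    h = {}
    for name in hashes:
        h[name] = hashlib.new(name)
    # The with block closes the file even if a read fails
    with open(path, 'rb') as fh:
        while True:
            b = fh.read(bs)
            if not b:
                break
            for name in hashes:
                h[name].update(b)
    print(','.join([path] + [h[name].hexdigest() for name in hashes]))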

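#!/usr/bin/env perl
# Hash every file named on the command line in a single pass; print CSV.

use strict;
use warnings;
use Digest::MD5;
use Digest::SHA;

my $bs = 1024 * 1024;    # read files in 1 MB chunks
my @hashes = qw/md5 sha1 sha256 sha512/;

# CSV header: filename followed by one column per algorithm
my $l = 'filename';
$l .= ",$_" for @hashes;
print "$l\n";

for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode $fh;
    # One digest object per algorithm, all fed from the same read
    my %h;
    $h{md5} = Digest::MD5->new();
    $h{$_} = Digest::SHA->new($_) for grep { /^sha/ } @hashes;
    my $buf;
    while (my $read = sysread($fh, $buf, $bs)) {
        $h{$_}->add($buf) for @hashes;
    }
    close($fh) or die "Cannot close $file: $!";
    my $row = $file;
    $row .= ',' . $h{$_}->hexdigest for @hashes;
    print "$row\n";
}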
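
As a rough check on the 17-minute figure quoted before the two listings, the arithmetic can be sketched as follows. The 0.1 seconds per 100 MB is the measured difference reported above; treating 1 TB as 1,000,000 MB (decimal units) is an assumption made here, and binary units would give a slightly larger result.

```python
# Measured difference reported in the article: Python was ~0.1 s faster per 100 MB
saving_per_100mb = 0.1      # seconds

# Assumption: 1 TB = 1,000,000 MB (decimal units)
drive_mb = 1000 * 1000

chunks = drive_mb / 100.0                    # number of 100 MB units on the drive
saving_seconds = chunks * saving_per_100mb   # about 1000 seconds
saving_minutes = saving_seconds / 60

print('%.1f minutes' % saving_minutes)       # 16.7 minutes
```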

How to begin using Python for malware analysis

  1. Find it on your system: If you’re using Linux or Mac, you should find Python available on your system, although it may not be the latest version. You can open a terminal window and type “python” to get started. When finished, use either Ctrl+D (Ctrl+Z followed by Enter on Windows) or the quit() function to exit the Python interpreter. Windows users, as well as any Linux or Mac users who want the latest Python release, can download installers from the official download page.
  2. Start with Python 2.x: Please note that there are two major versions of Python: 2 and 3. There are some compatibility issues between them. According to the wiki, “Python 2.x is the status quo, Python 3.x is the present and future of the language.” Many existing malware analysis tools still depend on version 2, so my recommendation would be to start with the latest 2.x release.
  3. Review Python alternative implementations: For more serious Python development efforts, you may want to check out the alternative implementations section on the official download page. Malware analysis tools tend to require specific versions of Python and modules, but the pyenv tool should help to alleviate some of the headaches of managing multiple versions of Python and modules and keep them separate from your system’s Python.
  4. Check out the documentation: Python’s documentation can be accessed in a couple of different ways. The built-in documentation can be accessed by typing “help()” at Python’s >>> prompt. The same documentation is also available through the “pydoc” system command. You can also browse or download the documentation in PDF, HTML, TXT, EPUB, and CHM formats at http://docs.python.org/.
  5. Bookmark these references: Anyone reading this article should check out Gray Hat Python by Justin Seitz and Violent Python by T.J. O’Connor, which focus on the specific application of Python to malware analysis and other information security topics. There are also a few options for professional training and certifications, including the SecurityTube Python Scripting Expert (SPSE) certification, and Mark Lutz’s Python Training is still available.
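
The version check and documentation lookup from steps 1 and 4 can also be done from a script. The following is a minimal sketch that should run on both Python 2.7 and 3.x; looking up the hashlib module is simply an example choice.

```python
import sys
import pydoc

# Which interpreter is this? Many analysis tools expect a specific version.
major, minor = sys.version_info[:2]
print('Running Python %d.%d' % (major, minor))

# render_doc() builds the same text that help() and the pydoc command display;
# plain() strips the terminal bold formatting from it.
doc_text = pydoc.plain(pydoc.render_doc('hashlib'))

# Show just the first few lines of the module documentation
print('\n'.join(doc_text.splitlines()[:5]))
```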

As always, “Google is your friend.” If you run into issues, don’t be shy about posting your question. You’ll likely find someone who has had the exact same issue and solved it already. Once you learn a bit of Python and apply it towards analytical tasks and in your use of malware analysis tools, you’ll wonder how you ever got along without it.

What do you think?

If you are using or thinking about using Python for malware analysis, let me know your experience in the comments below. I’d like to hear what you think.

UPDATE (09/17/2013): The original example code has since been improved to incorporate an object-oriented module interface and to use threads to process multiple files at once. You can find it on GitHub at: https://github.com/csnj/hasher
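
The repository code is not reproduced here. As a rough sketch of what an object-oriented, threaded version of the hashing tool might look like, consider the following. The class name FileHasher and its methods are invented for illustration and should not be read as the repository's actual interface.

```python
import sys
import hashlib
import threading


class FileHasher(object):
    """Hypothetical sketch: hash files with several algorithms in one pass."""

    def __init__(self, algorithms=('md5', 'sha1', 'sha256', 'sha512'),
                 block_size=1024 * 1024):
        self.algorithms = list(algorithms)
        self.block_size = block_size

    def hash_file(self, path):
        """Read the file once and return {algorithm: hex digest}."""
        digests = dict((name, hashlib.new(name)) for name in self.algorithms)
        with open(path, 'rb') as fh:
            while True:
                block = fh.read(self.block_size)
                if not block:
                    break
                for digest in digests.values():
                    digest.update(block)
        return dict((name, d.hexdigest()) for name, d in digests.items())

    def hash_files(self, paths):
        """Hash several files at once, one thread per file."""
        results = {}
        lock = threading.Lock()

        def worker(path):
            digests = self.hash_file(path)
            with lock:  # guard the shared results dictionary
                results[path] = digests

        threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results


if __name__ == '__main__':
    hasher = FileHasher()
    print(','.join(['filename'] + hasher.algorithms))
    for path, digests in sorted(hasher.hash_files(sys.argv[1:]).items()):
        print(','.join([path] + [digests[a] for a in hasher.algorithms]))
```

One thread per file keeps the sketch short; for a large number of files, a fixed-size pool of worker threads would be the safer choice.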

Image: Fotolia.com, Gunnar Assmy
Python is a registered trademark of Python Software Foundation