I’ve recently finished my MSc dissertation, titled “Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities”. A PDF copy is available here, should you feel the need to trawl through 110 or so pages of prose, algorithms, diagrams and general ramblings. The abstract is as follows:
Software bugs that result in memory corruption are a common and dangerous feature of systems developed in certain programming languages. Such bugs are security vulnerabilities if they can be leveraged by an attacker to trigger the execution of malicious code. Determining if such a possibility exists is a time-consuming process and requires technical expertise in a number of areas. Often the only way to be sure that a bug is in fact exploitable by an attacker is to build a complete exploit. It is this process that we seek to automate. We present a novel algorithm that integrates data-flow analysis and a decision procedure with the aim of automatically building exploits. The exploits we generate are constructed to hijack the control flow of an application and redirect it to malicious code.
Our algorithm is designed to build exploits for three common classes of security vulnerability: stack-based buffer overflows that corrupt a stored instruction pointer, buffer overflows that corrupt a function pointer, and buffer overflows that corrupt the destination address used by instructions that write to memory. For these vulnerability classes we present a system capable of generating functional exploits in the presence of complex arithmetic modification of inputs and arbitrary constraints. Exploits are generated using dynamic data-flow analysis in combination with a decision procedure. To the best of our knowledge the resulting implementation is the first to demonstrate exploit generation using such techniques. We illustrate its effectiveness on a number of benchmarks including a vulnerability in a large, real-world server application.
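To give a rough feel for what “arithmetic modification of inputs and arbitrary constraints” means in practice, here is a toy sketch. It is not the algorithm from the dissertation: the per-byte transform, the constraint and the names `solve_byte` and `solve_input` are all made up for illustration, and an exhaustive search over the 256 possible byte values stands in for a real decision procedure. The idea is only that, if the program rewrites each input byte before it lands in the corrupted pointer, we have to solve for an input that produces the bytes we want while still passing the program’s checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical model: how the program modifies one input byte before it is
// written over the target (e.g. a stored instruction pointer).
using Transform = std::function<std::uint8_t(std::uint8_t)>;
// Hypothetical model: a condition every input byte must satisfy to get that far.
using Constraint = std::function<bool(std::uint8_t)>;

// Find an input byte b with ok(b) and f(b) == target.
// Brute force stands in for the decision procedure here.
bool solve_byte(std::uint8_t target, const Transform& f, const Constraint& ok,
                std::uint8_t& out) {
    assert(f && ok);  // both callbacks must be set
    for (int b = 0; b < 256; ++b) {
        const std::uint8_t candidate = static_cast<std::uint8_t>(b);
        if (ok(candidate) && f(candidate) == target) {
            out = candidate;
            return true;
        }
    }
    return false;  // no byte satisfies both conditions
}

// Solve for every byte of the value we want written; fail if any byte is
// unreachable under the constraint.
bool solve_input(const std::vector<std::uint8_t>& target, const Transform& f,
                 const Constraint& ok, std::vector<std::uint8_t>& input) {
    input.clear();
    for (std::size_t i = 0; i < target.size(); ++i) {
        std::uint8_t b = 0;
        if (!solve_byte(target[i], f, ok, b)) {
            return false;
        }
        input.push_back(b);
    }
    return true;
}
```

Calling `solve_input` with the bytes of the desired address, a transform such as “XOR with 0x20, then add 3”, and a constraint such as “no NUL or newline bytes” either yields the input bytes to send or reports that no such input exists. Real programs mix bytes together and impose constraints across many of them, which is where a proper solver earns its keep.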
The implementation of the described system is approximately 7,000 lines of C++. I probably won’t be releasing the code as I’m fairly sure I signed over my soul (and anything I might create) to the University earlier in the year. The two core components are a data-flow/taint analysis library and a higher-level library that uses the former’s API to perform data-flow/taint analysis over x86 instructions (as provided by Pin). Both of these components are useful in their own right, so I think I’m going to do a full rewrite (with an added GUI and database) and open source the code in the next couple of months. Hopefully they’ll prove useful to others working on dynamic analysis problems.
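For anyone unfamiliar with taint analysis, the core idea fits in a few lines. The sketch below is purely illustrative: it is not the code described above and it does not use Pin’s API. The toy instruction set and the names `Instr`, `Op` and `TaintTracker` are invented for this example. Each register carries the set of input byte indices that influenced its current value, and every instruction updates those sets.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Indices of the input bytes that influence a value.
using TaintSet = std::set<std::size_t>;

// Hypothetical toy instruction set (a stand-in for real x86 semantics).
enum class Op { LoadInput, LoadConst, Mov, Add };

struct Instr {
    Op op;
    std::string dst;          // destination register
    std::string src;          // source register (Mov, Add only)
    std::size_t input_index;  // which input byte is read (LoadInput only)
};

class TaintTracker {
public:
    // Update register taint for one executed instruction.
    void step(const Instr& i) {
        assert(!i.dst.empty());  // every toy instruction writes a register
        switch (i.op) {
        case Op::LoadInput:  // dst now depends on exactly one input byte
            regs_[i.dst] = TaintSet{i.input_index};
            break;
        case Op::LoadConst:  // constants are attacker-independent
            regs_.erase(i.dst);
            break;
        case Op::Mov:        // dst takes over the taint of src
            regs_[i.dst] = lookup(i.src);
            break;
        case Op::Add: {      // dst depends on both operands
            const TaintSet s = lookup(i.src);
            regs_[i.dst].insert(s.begin(), s.end());
            break;
        }
        }
    }

    // Taint of a register; untracked registers are untainted.
    TaintSet lookup(const std::string& reg) const {
        const auto it = regs_.find(reg);
        return it == regs_.end() ? TaintSet{} : it->second;
    }

private:
    std::map<std::string, TaintSet> regs_;
};
```

Feeding the tracker a trace such as “load input byte 0 into `eax`, load input byte 1 into `ebx`, add `ebx` to `eax`” leaves `eax` tainted by bytes 0 and 1. A real tool has to do the same bookkeeping for memory, flags and sub-registers, and record how the value was computed as well as where it came from.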