Automatic exploit generation: Lessons learned so far

Here are a couple of thoughts that are bouncing around in my head as I come to the concluding stages of my v1 prototype. I’ve made a number of design decision in the initial implementation that I now see issues with, but hopefully by documenting them v2 will be better!

Using in-process analysis tools might not be such a good idea: Early on I decided to use dynamic analysis to gather information about taint data propagation and path conditions. Due to previous work (such as catchconv) using dynamic binary instrumentation frameworks, like Valgrind, I pretty much immediately decided I would do the same. After writing a couple of basic apps for Valgrind, Pin and DynamoRio, I eventually settled on Pin due to its cross platform support and C++ codebase. One critical factor I ignored is that these tools really aren’t designed with malicious code in mind. So, when you do things like trash a stored instruction pointer it can really confuse the DBI tool.

Other problems can occur if the vulnerability ends up writing over several hundred megabytes of junk in the application address space. This can lead to difficult to debug problems, where the memory in use by the injected analysis client is being corrupted, as well as that of the application under test.

More basic, but just as time consuming, problems stem from the fact that these in-process analysis clients are rather difficult to debug once something goes wrong. The three frameworks mentioned vary in their support for debugging and error logging, but in general it is exceedingly annoying to debug anything non-trivial. Simple segfaults have eaten hours of my time and often you are left resorting to printf based ‘debugging’.

The final issue I’ve come across is obvious, but should still be mentioned. Complex runtime instrumentation, such as dataflow analysis, really effect the responsiveness and runtime of the application. My current code, which is admittedly incredibly unoptimised, can increase the runtime of ls from milliseconds to about 20 seconds. This isn’t much of an issue if you don’t need to interact with the application to trigger the vulnerability, but in a case where some buttons need to be clicked or commands entered, it can become a significant inconvenience.

Assisted may be better than automated: The idea of this project is to investigate what vulnerability classes are automatically exploitable, and to develop a prototype that can show the results. I’ve achieved this goal for a sufficient variety of vulnerabilities and shown that automation is in fact possible.

There is a but here though; to continue this project would require constant attention and coding to replace the human effort in exploit generation each time a new class of vulnerability comes along, or a change in exploit technique. As I’ve moved from basic stack overflows to considering more complicated scenarios, the differences in exploit types become more time consuming to encode. Because of this, I intend (when I implement v2 of the tool in the coming months) to move away from complete automation. By putting the effort into providing a decent user interface, it will be possible to inform an exploit writer of the results of data flow and constraint analysis and have them make an educated judgement on the type of exploit to attempt, and specify some parameters. Working from this point of view should make the entire tool much less effort to port between operating systems also.

Information on memory management is very important: This is an obvious point for anyone that has had to write heap exploits in the last 5 years or so. It is near impossible to automatically generate a Linux heap exploit without having some information on the relationship between user input and the structure of the processes heap. When manually writing an exploit we will often want to force the program to allocate large amounts of memory, and the usual way to do this involves jumping into the code/disassembly and poking around for a while until you find a memory allocation dependent on the size of some user supplied field, or a loop doing memory allocation with a user influenceable bound.

Essentially, to have a solution that takes care of both scenarios we need a way to infer relationships between counters and program input. The first paper to discuss how to do this using symbolic execution was published in March of this year, and is a good read for anyone considering implementing this kind of tool.

As I hadn’t added loop detection, or other required functionality described in that paper, my current tool is unable to do the analysis described. I consider this a rather annoying drawback, and it will be among my highest priorities for v2. Hacky solutions are possible, such as modelling the result of strlen type functions on user input, but this would miss a number of scenarios and is in general quite an ugly approach.