Gollum: Modular and Greybox Exploit Generation for Heap Overflows in Interpreters

At the upcoming ACM Conference on Computer and Communications Security (CCS) I’ll be presenting a paper on Automatic Exploit Generation (AEG), with the same title as this blog post. You can find the paper here. In the paper I discuss a system for automatically discovering primitives and constructing exploits using heap overflows in interpreters. The approach taken in the paper is a bit different from most other AEG solutions in that it is entirely greybox, relying on lightweight instrumentation and various kinds of fuzzing-esque input generation. The following diagram shows the stages of the system, and each is explained in detail in the paper. 

Workflow diagram for Gollum
Workflow diagram showing how Gollum produces exploits and primitives

In terms of evaluation, I used 10 vulnerabilities in the PHP and Python interpreters as tests, and provided these as input to Gollum for it to use in its search for primitives and to build exploits.

Exploit generation and primitive search results
Exploit generation and primitive search results

There are three main takeaways from the paper that I think worth highlighting (see paper for details!):

1. AEG is a multi-stage process, and by breaking the problem into distinct phases it becomes reasonable to attack a number of those phases using a fuzzing-esque combination of lightweight instrumentation and relatively dumb input generation. Traditionally, AEG systems used symbolic execution as their main driver and, while there are some positives to this, it also encounters all of the scalability issues that one expects with symbolic execution. In a paper last year at USENIX Security, I showed how with lightweight instrumentation one could use the existing tests of an application, combined with a fuzzer, to discover language fragments that could be used to perform heap layout manipulation, as well as to allocate interesting objects to corrupt. In the upcoming CCS paper, I show how one can use a similar approach to also discover exploitation primitives, and in certain situations to even build exploits. It’s worth noting that in their paper on FUZE, Wu et al. take a similar approach, and you should check out their paper for another example system. My guess is that in the next couple of years fuzzing-driven exploit generation is likely to be the predominant flavour, with symbolic execution being leveraged in scenarios where its bit-precise reasoning is required and state-space explosion can be limited.

2. When automatically constructing exploits for heap overflows one needs a solution for achieving the desired heap layout and then another solution for building the rest of the exploit. In the CCS paper I introduce the idea of lazy resolution of tasks in exploit generation. Essentially, this is an approach for solving the problems of heap layout manipulation and the rest of the exploit generation in the reverse order. The reason one might want to do this is simple: in an engine where the process for achieving the heap layout can potentially take much longer than the one to create the rest of the exploit (as is the case in Gollum), it makes sense to check if it is feasible to produce an exploit under the assumption that the more difficult problem is solvable, and then only bother to solve it if it enables an exploit. Specifically, in the case of Gollum, I constructed a heap allocator that allows you to request a particular layout, then you can attempt to generate the exploit under the assumption the layout holds, and only later figure out how to achieve it once you know it enables the exploit.

I think this sort of idea might be more generally useful in exploit generation for other vulnerability types as well, e.g. in an exploit generation system for race conditions, one could have a solution that allows one to request a particular scheduling, check if that scheduling enables an exploit, and only then search for the input required to provide the scheduling. Such an approach also allows for a hybrid of manual and automatic components. For example, in our case we assume a heap layout holds, then generate an exploit from it, and finally try and automatically solve the heap layout problem. Our solution for the heap layout problem has a number of preconditions though, so in cases where those conditions are not met we can still automatically generate the rest of the exploit under the assumption that the layout problem is solved, and eventually the exploit developer can manually solve it themselves.

3. In last year’s USENIX Security paper I discussed a random search algorithm for doing heap layout manipulation. As you might expect, a genetic algorithm, optimising for distance between the two chunks you want to place adjacent to each other, can perform this task far better, albeit at the cost of a more complicated and time consuming implementation.

Random search versus GA graph
% of heap layout benchmarks solved by random search (rand) versus the genetic algorithm (evo)

 

Automation in Exploit Generation with Exploit Templates

At last year’s USENIX Security conference I presented a paper titled “Automatic Heap Layout Manipulation for Exploitation” [paper][talk][code]. The main idea of the paper is that we can isolate heap layout manipulation from much of the rest of the work involved in producing an exploit, and solve it automatically using blackbox search. There’s another idea in the paper though which I wanted to draw attention to, as I think it might be generally useful in scaling automatic exploit generation systems to more real world problems. That idea is exploit templates.

An exploit template is a simply a partially completed exploit where the incomplete parts are to be filled in by some sort of automated reasoning engine. In the case of the above paper, the parts filled in automatically are the inputs required to place the heap into a particular layout. Here’s an example template, showing part of an exploit for the PHP interpreter. The exploit developer wants to position an allocation made by imagecreate adjacent to an allocation made by quoted_printable_encode.

$quote_str = str_repeat("\xf4", 123);

#X-SHRIKE HEAP-MANIP 384
#X-SHRIKE RECORD-ALLOC 0 1
$image = imagecreate(1, 2); 

#X-SHRIKE HEAP-MANIP 384 
#X-SHRIKE RECORD-ALLOC 0 2 
quoted_printable_encode($quote_str); 

#X-SHRIKE REQUIRE-DISTANCE 1 2 384

SHRIKE (the engine that parses the template and searches for solutions to heap layout problems) takes as input a .php file containing a partially completed exploit, and searches for problems it should solve automatically. Directives used to communicate with the engine begin with the string X-SHRIKE. They are explained in full in the above paper, but are fairly straightforward: HEAP-MANIP tells the engine it can insert heap manipulating code at this location, RECORD-ALLOC tells the engine it should record the nth allocation that takes place from this point onwards, and REQUIRE-DISTANCE tells the engine that at this point in the execution of the PHP program the allocations associated with the specified IDs must be at the specified distance from each other. The engine takes this input and then starts searching for ways to put the heap into the desired layout. The above snippet is from an exploit for CVE-2013-2110 and this video shows SHRIKE solving it, and the resulting exploit running with the heap layout problem solved. For a more detailed description of what is going on in the video, view its description on YouTube.

So, what are the benefits of this approach? The search is black-box, doesn’t require the exploit developer to analyse the target application or the allocator, and, if successful, outputs a new PHP file that achieves the desired layout and can then be worked on to complete the exploit. This has the knock-on effect of making it easier for the exploit developer to explore different exploitation strategies for a particular heap overflow. In ‘normal’ software development it is accepted that things like long build cycles are bad, while REPLs are generally good. The reason is that the latter supports a tight loop of forming a hypothesis, testing it, refining and repeating, while the former breaks this process. Exploit writing has a similar hypothesis refinement loop and any technology that can make this loop tighter will make the process more efficient.

There’s lots of interesting work to be done still on how exploit templates can be leveraged to add automation to exploit development. In automatic exploit generation research there has been a trend to focus exclusively on full automation and, because that is hard for almost all problems, we haven’t explored in any depth what aspects can be partially automated. As such, there’s a lot of ground still to be broken. The sooner we start investigating these problems the better, because if the more general program synthesis field is anything to go by, the future of automatic exploit generation is going to look more like template-based approaches than end-to-end solutions.

Some Cool Projects from a Dagstuhl Seminar on SAT, SMT and CP

I was lucky enough to attend a Dagstuhl seminar titled “Bringing CP, SAT & SMT Together” earlier this week, and learned about some really cool work I hadn’t previously heard of, especially in the realm of constraint satisfaction and optimization. There were plenty of other of great talks and discussions, but below are the projects I made a note to play around with.

MiniZinc

MiniZinc is a free and open-source constraint modeling language.

You can use MiniZinc to model constraint satisfaction and optimization problems in a high-level, solver-independent way, taking advantage of a large library of pre-defined constraints. Your model is then compiled into FlatZinc, a solver input language that is understood by a wide range of solvers.

There are even a couple of Coursera courses on the topic.

https://www.minizinc.org/
https://www.coursera.org/learn/basic-modeling
https://www.coursera.org/learn/advanced-modeling

Unison

Unison is a simple, flexible, and potentially optimal tool that performs integrated register allocation and instruction scheduling using constraint programming as a modern method for combinatorial optimization.

https://unison-code.github.io/

Approximate Model Counting

Blog post by Mate Soos (author of cryptominisat), with code:

Approximate model counting allows to count the number of solutions (or “models”) to propositional satisfiability problems. This problem seems trivial at first given a propositional solver that can find a single solution: find one solution, ban it, ask for another one, ban it, etc. until all solutions are counted. The issue is that sometimes, the number of solutions is 2^50 and so counting this way is too slow. There are about 2^266 atoms in the universe, so counting anywhere near that is impossible using this method.

Slightly chilled attendees