Gollum: Modular and Greybox Exploit Generation for Heap Overflows in Interpreters

At the upcoming ACM Conference on Computer and Communications Security (CCS) I’ll be presenting a paper on Automatic Exploit Generation (AEG), with the same title as this blog post. You can find the paper here. In the paper I discuss a system for automatically discovering primitives and constructing exploits using heap overflows in interpreters. The approach taken in the paper is a bit different from most other AEG solutions in that it is entirely greybox, relying on lightweight instrumentation and various kinds of fuzzing-esque input generation. The following diagram shows the stages of the system, and each is explained in detail in the paper. 

Workflow diagram showing how Gollum produces exploits and primitives

In terms of evaluation, I used 10 vulnerabilities in the PHP and Python interpreters as tests, and provided these as input to Gollum for it to use in its search for primitives and to build exploits.

Exploit generation and primitive search results

There are three main takeaways from the paper that I think are worth highlighting (see the paper for details!):

1. AEG is a multi-stage process, and by breaking the problem into distinct phases it becomes reasonable to attack a number of those phases using a fuzzing-esque combination of lightweight instrumentation and relatively dumb input generation. Traditionally, AEG systems used symbolic execution as their main driver and, while there are some positives to this, it also encounters all of the scalability issues that one expects with symbolic execution. In a paper last year at USENIX Security, I showed how with lightweight instrumentation one could use the existing tests of an application, combined with a fuzzer, to discover language fragments that could be used to perform heap layout manipulation, as well as to allocate interesting objects to corrupt. In the upcoming CCS paper, I show how one can use a similar approach to also discover exploitation primitives, and in certain situations to even build exploits. It’s worth noting that in their paper on FUZE, Wu et al. take a similar approach, and you should check out their paper for another example system. My guess is that in the next couple of years fuzzing-driven exploit generation is likely to be the predominant flavour, with symbolic execution being leveraged in scenarios where its bit-precise reasoning is required and state-space explosion can be limited.

2. When automatically constructing exploits for heap overflows one needs a solution for achieving the desired heap layout and then another solution for building the rest of the exploit. In the CCS paper I introduce the idea of lazy resolution of tasks in exploit generation. Essentially, this is an approach for solving the problems of heap layout manipulation and the rest of the exploit generation in the reverse order. The reason one might want to do this is simple: in an engine where the process for achieving the heap layout can potentially take much longer than the one to create the rest of the exploit (as is the case in Gollum), it makes sense to check if it is feasible to produce an exploit under the assumption that the more difficult problem is solvable, and then only bother to solve it if it enables an exploit. Specifically, in the case of Gollum, I constructed a heap allocator that allows you to request a particular layout, then you can attempt to generate the exploit under the assumption the layout holds, and only later figure out how to achieve it once you know it enables the exploit.

I think this sort of idea might be more generally useful in exploit generation for other vulnerability types as well, e.g. in an exploit generation system for race conditions, one could have a solution that allows one to request a particular scheduling, check if that scheduling enables an exploit, and only then search for the input required to provide the scheduling. Such an approach also allows for a hybrid of manual and automatic components. For example, in our case we assume a heap layout holds, then generate an exploit from it, and finally try and automatically solve the heap layout problem. Our solution for the heap layout problem has a number of preconditions though, so in cases where those conditions are not met we can still automatically generate the rest of the exploit under the assumption that the layout problem is solved, and eventually the exploit developer can manually solve it themselves.

3. In last year’s USENIX Security paper I discussed a random search algorithm for doing heap layout manipulation. As you might expect, a genetic algorithm, optimising for distance between the two chunks you want to place adjacent to each other, can perform this task far better, albeit at the cost of a more complicated and time consuming implementation.

% of heap layout benchmarks solved by random search (rand) versus the genetic algorithm (evo)
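To make the lazy resolution idea from the second takeaway a bit more concrete, here is a minimal sketch of the control flow in Python. It is not Gollum's code: the function names, the string encoding of a layout, and the two callbacks are all hypothetical stand-ins. The only point is the ordering, i.e. the cheap "can an exploit be built if this layout holds?" check runs first, and the expensive layout search runs only for layouts that passed it.

```python
from typing import Callable, List, Optional, Tuple

# Every name in this block is hypothetical; none of it is taken from Gollum.
def lazy_exploit_search(
    candidate_layouts: List[str],
    build_exploit: Callable[[str], Optional[str]],
    solve_layout: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (layout_manipulation, exploit) for the first layout that works."""
    for layout in candidate_layouts:
        # Cheap step first: assume the layout holds and try to build the exploit.
        exploit = build_exploit(layout)
        if exploit is None:
            continue  # This layout enables nothing, so never pay to solve it.
        # Expensive step, run only for layouts known to enable an exploit.
        manipulation = solve_layout(layout)
        if manipulation is not None:
            return manipulation, exploit
    return None


# Toy callbacks that record how often the expensive solver is invoked.
solver_calls: List[str] = []

def demo_build(layout: str) -> Optional[str]:
    # Pretend only a corruptible function pointer leads to an exploit.
    if layout == "overflow_src|func_ptr_obj":
        return "exploit-for-" + layout
    return None

def demo_solve(layout: str) -> Optional[str]:
    solver_calls.append(layout)
    return "allocs-for-" + layout

layouts = ["overflow_src|string_obj", "overflow_src|func_ptr_obj"]
result = lazy_exploit_search(layouts, demo_build, demo_solve)
```

In the toy run the solver is called once, for the second layout only. If `solve_layout` returns `None` because its preconditions are not met, a variant of this loop could still hand back the exploit with the layout left for the exploit developer to solve by hand, which is the hybrid mode described above.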

 

Automation in Exploit Generation with Exploit Templates

At last year’s USENIX Security conference I presented a paper titled “Automatic Heap Layout Manipulation for Exploitation” [paper][talk][code]. The main idea of the paper is that we can isolate heap layout manipulation from much of the rest of the work involved in producing an exploit, and solve it automatically using blackbox search. There’s another idea in the paper though which I wanted to draw attention to, as I think it might be generally useful in scaling automatic exploit generation systems to more real world problems. That idea is exploit templates.

An exploit template is simply a partially completed exploit where the incomplete parts are to be filled in by some sort of automated reasoning engine. In the case of the above paper, the parts filled in automatically are the inputs required to place the heap into a particular layout. Here’s an example template, showing part of an exploit for the PHP interpreter. The exploit developer wants to position an allocation made by imagecreate adjacent to an allocation made by quoted_printable_encode.

$quote_str = str_repeat("\xf4", 123);

#X-SHRIKE HEAP-MANIP 384
#X-SHRIKE RECORD-ALLOC 0 1
$image = imagecreate(1, 2); 

#X-SHRIKE HEAP-MANIP 384 
#X-SHRIKE RECORD-ALLOC 0 2 
quoted_printable_encode($quote_str); 

#X-SHRIKE REQUIRE-DISTANCE 1 2 384

SHRIKE (the engine that parses the template and searches for solutions to heap layout problems) takes as input a .php file containing a partially completed exploit, and searches for problems it should solve automatically. Directives used to communicate with the engine begin with the string X-SHRIKE. They are explained in full in the above paper, but are fairly straightforward: HEAP-MANIP tells the engine it can insert heap manipulating code at this location, RECORD-ALLOC tells the engine it should record the nth allocation that takes place from this point onwards, and REQUIRE-DISTANCE tells the engine that at this point in the execution of the PHP program the allocations associated with the specified IDs must be at the specified distance from each other. The engine takes this input and then starts searching for ways to put the heap into the desired layout. The above snippet is from an exploit for CVE-2013-2110 and this video shows SHRIKE solving it, and the resulting exploit running with the heap layout problem solved. For a more detailed description of what is going on in the video, view its description on YouTube.
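For a feel of what the first step of processing such a template might involve, here is a small sketch in Python that pulls the directives out of the snippet above. This is not SHRIKE's parser. The regular expression, the `Directive` type, and the assumption that every argument is a decimal integer are mine, inferred only from the example; the real directive grammar is described in the paper.

```python
import re
from typing import List, NamedTuple, Tuple

class Directive(NamedTuple):
    line_no: int            # 1-based line number inside the template
    name: str               # e.g. "HEAP-MANIP"
    args: Tuple[int, ...]   # integer arguments, kept uninterpreted here

# Assumed shape: "#X-SHRIKE NAME int int ..." on a line of its own.
DIRECTIVE_RE = re.compile(r"^\s*#X-SHRIKE\s+([A-Z-]+)((?:\s+\d+)*)\s*$")

def parse_directives(template: str) -> List[Directive]:
    """Collect every X-SHRIKE directive, in order of appearance."""
    found = []
    for line_no, line in enumerate(template.splitlines(), start=1):
        match = DIRECTIVE_RE.match(line)
        if match:
            args = tuple(int(a) for a in match.group(2).split())
            found.append(Directive(line_no, match.group(1), args))
    return found

# The template from the post, as a raw string so "\xf4" stays literal.
TEMPLATE = r'''$quote_str = str_repeat("\xf4", 123);

#X-SHRIKE HEAP-MANIP 384
#X-SHRIKE RECORD-ALLOC 0 1
$image = imagecreate(1, 2);

#X-SHRIKE HEAP-MANIP 384
#X-SHRIKE RECORD-ALLOC 0 2
quoted_printable_encode($quote_str);

#X-SHRIKE REQUIRE-DISTANCE 1 2 384
'''

directives = parse_directives(TEMPLATE)
for d in directives:
    print(d.line_no, d.name, d.args)
```

The line numbers matter because HEAP-MANIP marks a position: whatever heap manipulating code a search comes up with has to be spliced in at exactly that point in the file.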

So, what are the benefits of this approach? The search is black-box, doesn’t require the exploit developer to analyse the target application or the allocator, and, if successful, outputs a new PHP file that achieves the desired layout and can then be worked on to complete the exploit. This has the knock-on effect of making it easier for the exploit developer to explore different exploitation strategies for a particular heap overflow. In ‘normal’ software development it is accepted that things like long build cycles are bad, while REPLs are generally good. The reason is that the latter supports a tight loop of forming a hypothesis, testing it, refining and repeating, while the former breaks this process. Exploit writing has a similar hypothesis refinement loop and any technology that can make this loop tighter will make the process more efficient.

There’s lots of interesting work to be done still on how exploit templates can be leveraged to add automation to exploit development. In automatic exploit generation research there has been a trend to focus exclusively on full automation and, because that is hard for almost all problems, we haven’t explored in any depth what aspects can be partially automated. As such, there’s a lot of ground still to be broken. The sooner we start investigating these problems the better, because if the more general program synthesis field is anything to go by, the future of automatic exploit generation is going to look more like template-based approaches than end-to-end solutions.

SMT Solvers for Software Security (USENIX WOOT’12)

At WOOT’12 a paper co-written by Julien Vanegue, Rolf Rolles and me will be presented under the title “SMT Solvers for Software Security”. An up-to-date version can be found in the Articles/Presentation section of this site.

In short, the message of this paper is “SMT solvers are well capable of handling decision problems from security properties. However, specific problem domains usually require domain specific modeling approaches. Important limitations, challenges, and research opportunities remain in developing appropriate models for the three areas we discuss – vulnerability discovery, exploit development, and bypassing of copy protection”. The motivation for writing this paper is to discuss these limitations, why they exist, and hopefully encourage more work on the modeling and constraint generation sides of these problems.

A quick review of the publication lists from major academic conferences focused on software security will show a massive number of papers discussing solutions based on SMT technology. There is good reason for this: 1) SMT-backed approaches such as symbolic/concolic execution have proved powerful tools on certain problems, and 2) there are an increasing number of freely available frameworks.

The primary domain where SMT solvers have shone, in my opinion, is in the discovery of bugs related to unsafe integer arithmetic using symbolic/concolic execution. There’s a fairly obvious reason why this is the case; the quantifier free, fixed size, bitvector logic supported by SMT solvers provides direct support for the precise representation of arithmetic at the assembly level. In other words, one does not have to do an excessive amount of work when modeling the semantics of a program to produce a representation suitable for the detection of unsafe arithmetic. It suffices to perform a near direct translation from the executed instructions to the primitives provided by SMT solvers.
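As a small illustration of why the translation is so direct, the sketch below models a 32-bit unsigned multiply in plain Python, the kind of operation that sits behind a size computation such as count * elem_size. No SMT solver is involved here; the masking simply mirrors the semantics that a 32-bit `bvmul` has in the bitvector theory, and the helper names and the element size of 16 are made up for the example. A solver would be asked to find a count satisfying the overflow condition; for a single multiplication it can be computed directly.

```python
MASK32 = 0xFFFFFFFF

def mul32(a: int, b: int) -> int:
    """Multiply as a 32-bit unsigned machine multiply would, dropping overflow."""
    return (a * b) & MASK32

def smallest_overflowing_count(elem_size: int) -> int:
    """Smallest count for which count * elem_size no longer fits in 32 bits."""
    return (MASK32 // elem_size) + 1

elem_size = 16                                   # hypothetical element size
count = smallest_overflowing_count(elem_size)
wrapped = mul32(count, elem_size)                # what the program would compute
exact = count * elem_size                        # what the programmer intended

print(hex(count), hex(exact), hex(wrapped))
```

The gap between `exact` and `wrapped` is the bug: the program allocates based on the wrapped value and then writes based on the count.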

The exploit generation part of the paper deals with what happens when one takes the technology for solving the above problem and applies it to a new problem domain. In particular, a new domain in which the model produced simply by tracking transformations and constraints on input data no longer contains enough data to inform a solution. For example, in the case of exploit generation, consider models that do not account for things like the relationship between user input and memory layout. Obviously enough, when reasoning about a formula produced from such a model a solver cannot account for information not present. Thus, no amount of computational capacity or solver improvement can produce an effective solution.

SMT solvers are powerful tools and symbolic/concolic execution can be an effective technique. However, one thing I’ve learned over the past few years is that they don’t remove the obligation and effort required to accurately model the problem you’re trying to solve. You can throw generic symbolic execution frameworks at a problem but if you’re interested in anything more complex than low level arithmetic relationships you’ve got work to do!

Misleading the Public for Fun and Profit

Sometimes I read a research paper, usually in the area where computer science meets application, and it’s obvious that the authors are far overstating the practical impact of the work. This can be due to the researchers simply not having any exposure to the practical side of the field in which they are investigating and thus accidentally (through ignorance) overstate their claims. Alternatively it can be a purposeful and deliberate attempt to mislead and posture in front of a readership that hopefully won’t know any better.

The first case is presumably simple ignorance but is still lamentable. The obvious solution here is to avoid making such claims at all. If the research cannot stand on its own then perhaps it is not worthwhile? Researchers (both academic and industrial) have a habit of jumping on problems they underestimate, throwing a variety of techniques at them, hoping one sticks and then calling the problem solved. This typically occurs when they are not actually required to solve the problem correctly and robustly but merely as a ‘prototype’. They then get pilloried by anyone who actually has to solve the problem properly and almost always because of a disparity between claims made and the real impact rather than issues with methodology, recording or technical aspects.

The second case is far more insidious and unfortunately I think not uncommon. In academic research it can be easy to impress by combining cutting edge, but not necessarily original, research with a practical problem, sort-of solving parts of it and like before declaring it solved. Often followed quickly by phrases involving ‘game changing’, ‘paradigm shifting’ and so forth. Personally, I think this is a serious problem in the research areas that are less theoretical and more practical. Often the investigators refuse to accept they aren’t actually aware of the true nature of the problem they are dealing with or how it occurs in the real world. Egotistically this is difficult as they are often lauded by their academic peers and therefore surely must grasp the trivialities of the practical world, no? At this point a mixture of ego, need to impress and lack of ethics combine to give us papers that are at best deluded and at worst downright wrong.

Regardless of whether a paper ends up making such claims mistakenly for the first or the second reason the result is the same. It cheapens the actual value of the research, results in a general loss of respect for the capabilities of academia, deludes the researchers further and causes general confusion as to where research efforts should be focused. Worse still is when attempts to overstate the impact are believed by both the media and other researchers, resulting in a complete distortion between the actual practical and theoretical value of the research and its perceived impact.

Now, on to the paper that has reminded me of this most recently: The latest paper from David Brumley’s group at CMU titled AEG: Automatic Exploit Generation. I was looking forward to reading this paper as it was the area I worked on during my thesis but quite honestly it’s incredibly disappointing at best and has serious factual issues at worst. For now let’s focus on the topic at hand: ‘overstating the impact of academic research cheapens it and spreads misinformation’. With the original Patch-Based Exploit Generation paper we had all sorts of stories about how it would change the way in which patches had to be distributed, how attackers would be pushing buttons to generate their exploits in no time at all and in general how the world was about to end. Naturally none of this happened and people continued to use PatchDiff. Unfortunately this is more of the same.

Near the beginning of the most recent paper we have the following claim: “Our automatic exploit generation techniques have several immediate security implications. First, practical AEG fundamentally changes the perceived capabilities of attackers”. This statement is fundamentally flawed. It assumes that practical AEG is currently possible on bugs that people actually care about. This is patently false. I’ve written one of these systems. Did it generate exploits? Yes it did. Is it going to pop any program running on a modern operating system with the kinds of vulnerabilities we typically see? Nope. That would require at a minimum another 2 years of development and at that point I would expect a system that is usable by a skilled exploit writer as an augmentation of his skillset rather than a replacement. The few times I did use the tool I built for real exploits it was in this context rather than full blown exploit generation. The system discussed in the mentioned paper has more bells and whistles in some areas and is more primitive in others and it is still an unfathomable distance from having any impact on a realistic threat model.

Moving on, “For example, previously it has been believed that it is relatively difficult for untrained attackers to find novel vulnerabilities and create zero-day exploits. Our research shows this assumption is unfounded”. It’s at this point the distance between the authors of this paper and the realities of industrial/government/military vulnerability detection and exploit development can be seen. Who are the people we are to believe have this view? I would assume the authors themselves do and have then extrapolated it to the general exploit creating/consuming community. This is an egotistical flaw that has been displayed in many forays by academia into the vulnerability detection/exploit generation world.

Let’s discuss this in two parts. Firstly, in the context of the exploits discussed in this paper and secondly in the context of exploits seen in the real world.

In the case of the bug classes considered in the paper this view is entirely incorrect. Anyone who looks at Full Disclosure can regularly see low hanging bugs being fuzzed and exploited in a cookie cutter style. Fuzz the bug, overwrite the SEH chain, find your trampoline, jump to your shellcode bla bla bla rinse and repeat, start a leet h4x0r group and flood Exploit DB. All good fun, no useful research dollars wasted. The bugs found and exploited by the system described are of that quality. Low hanging, fuzzable fruit. The ‘training’ involved here is no more than would be required to set up, install and debug whatever issues come up in the course of running the AEG tool. In our most basic class at Immunity I’ve seen people who’ve never seen a debugger before writing exploits of this quality in a couple of days.

For more complex vulnerabilities and exploits that require a skilled attacker this AEG system doesn’t change the threat model. It simply doesn’t apply. A fully functional AEG tool that I can point at Firefox and press the ‘hack’ button (or any tool that had some sort of impact on real threats. I’d be happy with exploit assistance rather than exploit generation as long as it works) would of course, but we are a long, long way from that. This is not to say we won’t get there or that this paper isn’t a step in the right direction but making the claim now is simply laughable. To me it just reeks of a research group desperate to shout ‘FIRST!’ and ignoring the real issues.

A few more choice phrases for your viewing pleasure:

“Automated exploit generation can be fed into signature generation algorithms by defenders without requiring real-life attacks” – Fiction again. This would be possible *if* one had a usable AEG system. The word I presume they are looking for is *could*, “could be fed into”.

“In order to extend AEG to handle heap-based overflows we would need to also consider heap management structures, which is a straight-forward extension” – Again, this displays a fundamental ignorance of what has been required to write a heap exploit for the past six or so years. I presume they heard about the unlink() technique and investigated no further. Automatic exploit generation of heap exploits requires one to be able to discover and trigger heap manipulation primitives as well as whatever else must be done. This is a difficult problem to solve automatically and one that is completely ignored.

In reference to overflows that smash local variables and arguments that are dereferenced before the function returns and therefore must be valid – “If there is not enough space to place the payload before the return address, AEG can still generate an exploit by applying stack restoration, where the local variables and function arguments are overwritten, but we impose constraints that their values should remain unchanged. To do so, AEG again relies on our dynamic analysis component to retrieve the runtime values of the local variables and arguments” – It’s at this point that I start to wonder if anyone even reviewed this thing. In any program with some amount of heap non-determinism, through normal behaviour or heap base randomisation, this statement makes no sense. Any pointers to heap allocated data passed as arguments or stored as local variables will be entirely different. You may be lucky and end up with that pointer being in an allocated heap region but the chances of it pointing to the same object are rather slim in general. Even in the context of local exploits where you have much more information on heap bases etc. this statement trivialises many problems that will be encountered.

Conclusion

With the above paper I have two main issues. One is with the correctness of some of the technical statements made and the other is with distortion between reality and the stated impact and generality of the work. For the technical issues I think the simple observation that they are there is enough to highlight the problem. The flawed statements on impact and generality are more problematic as they display a fundamental corruption of what a scientific paper should be.

I have a deep respect for scientific research and the ideals that I believe it should embody. Much of this research is done by university research groups and some of the papers produced in the last century are among humanity’s greatest intellectual achievements. Not all papers can be revolutionary of course but even those that aren’t should aim to uphold a level of scientific decorum so that they may contribute to the sum of our knowledge. In my opinion this single idea should be at the heart of any university researcher being funded to perform scientific investigation. A researcher is not a journalist nor a politician and their papers should not be opinion pieces or designed to promote themselves at the expense of facts. There is nothing wrong with discussing perceived impact of a paper within the paper itself but these statements should be subjected to the same scientific rigour that the theoretical content of the paper is. If one finds themselves unqualified (as in the above paper) to make such statements then they should be excluded. Facts are all that matter in a scientific paper; distorting them through ignorance is incompetence, distorting them on purpose is unethical and corrupt.

Game Over! Thank you for playing Academia

I’ve recently finished my MSc dissertation, titled “Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities”. A PDF copy of it is available here should you feel the need to trawl through 110 or so pages of prose, algorithms, diagrams and general ramblings. The abstract is the following:

Software bugs that result in memory corruption are a common and dangerous feature of systems developed in certain programming languages. Such bugs are security vulnerabilities if they can be leveraged by an attacker to trigger the execution of malicious code. Determining if such a possibility exists is a time consuming process and requires technical expertise in a number of areas. Often the only way to be sure that a bug is in fact exploitable by an attacker is to build a complete exploit. It is this process that we seek to automate. We present a novel algorithm that integrates data-flow analysis and a decision procedure with the aim of automatically building exploits. The exploits we generate are constructed to hijack the control flow of an application and redirect it to malicious code.

Our algorithm is designed to build exploits for three common classes of security vulnerability; stack-based buffer overflows that corrupt a stored instruction pointer, buffer overflows that corrupt a function pointer, and buffer overflows that corrupt the destination address used by instructions that write to memory. For these vulnerability classes we present a system capable of generating functional exploits in the presence of complex arithmetic modification of inputs and arbitrary constraints. Exploits are generated using dynamic data-flow analysis in combination with a decision procedure. To the best of our knowledge the resulting implementation is the first to demonstrate exploit generation using such techniques. We illustrate its effectiveness on a number of benchmarks including a vulnerability in a large, real-world server application.

The implementation of the described system is approx. 7000 lines of C++. I probably won’t be releasing the code as I’m fairly sure I signed over my soul (and anything I might create) to the University earlier in the year. The two core components are a data-flow/taint analysis library and a higher level library that uses the former’s API to perform data-flow/taint analysis over x86 instructions (as given to us by Pin). Both of these components are useful in their own right so I think I’m going to do a full rewrite (with added GUI + DB) and open source the code in the next couple of months. Hopefully they’ll prove useful for others working on dynamic analysis problems.

Exploit generation, a specialisation of testing?

It sounds like a silly question, doesn’t it? Nobody would consider exploit development to be a special case of vulnerability detection. That said, all research on exploit generation that relies on program analysis/verification theory (From now on assume these are the projects I’m discussing. Other approaches exist based on pattern matching over program memory but they are riddled with their own problems.) has essentially ridden on the coat-tails of research and tools developed for test-case generation. The almost standard approach to test-case generation consists of data flow analysis in combination with some sort of decision procedure. We then generate formulae over the paths executed to create inputs that exercise new paths. This is also the exact approach taken by all exploit generation projects.

There are pros and cons to this relationship. For instance, some activities are crucial to both test-case generation and exploit generation, e.g., data flow and taint analysis. Algorithms for these activities are almost standardised at this stage and when we work on exploit generation we can basically lift code from test generation projects. Even for these activities though there are sufficient differences and opportunities presented by exploit generation that it is worth doing some re-engineering. For example, during my research I extended the taint analysis to reflect the complexity of the instructions involved in tainting a location. When building a formula to constrain a buffer to shellcode we can then use this information to pick the locations that result in the least complex formulae. An exploit only needs a single successful formula (usually) so we can pick and choose the locations we want to use; testing on the other hand typically requires exhaustive generation and thus this optimisation hasn’t been previously applied because the benefits are less evident (but still might be a decent way of increasing the number of test cases generated in a set time frame).
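A rough sketch of the location-picking idea, in Python rather than the C++ of the actual implementation. The data structure is an assumption made for illustration: a map from each attacker-tainted address to a complexity score (here, a count of the instructions that transformed input on its way to that byte). The real analysis tracks far more than a single integer per byte, and the addresses below are invented.

```python
from typing import Dict, Optional, Tuple

def pick_shellcode_window(
    complexity: Dict[int, int], length: int
) -> Optional[Tuple[int, int]]:
    """Return (start_address, total_cost) of the cheapest fully tainted window.

    `complexity` maps each tainted address to a hypothetical complexity score.
    Addresses missing from the map are treated as not attacker controlled.
    """
    best = None
    for start in sorted(complexity):
        window = range(start, start + length)
        if not all(addr in complexity for addr in window):
            continue  # window includes a byte the attacker does not control
        cost = sum(complexity[addr] for addr in window)
        if best is None or cost < best[1]:
            best = (start, cost)
    return best

# Two tainted buffers: one reached via arithmetic, one a straight copy of input.
taint_map = {0x1000 + i: 3 for i in range(8)}
taint_map.update({0x2000 + i: 0 for i in range(8)})

choice = pick_shellcode_window(taint_map, 4)
print(choice)
```

Both buffers could hold a four byte payload, but constraining the straight copy yields trivial equalities while the other requires the solver to invert three instructions per byte, so the former is the better target.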

The two problems share other similarities as well. In both cases we find ourselves often dissatisfied with the results of single path analysis. When generating an exploit the initial path we analyse might not be exploitable but there may be another path to the same vulnerability point that is. Again in this case we can look to test case generation research for answers. It is a common problem to want to focus on testing different sub-paths to a given point in a program and so there are algorithms that use cut points and iterative back-tracking to find relevant paths. So with such research available one might begin to think that exploit generation is a problem that will be inadvertently ‘solved’ as we get better at test case generation.

Wrong.

With test case generation all test cases are essentially direct derivatives from the analysis of a previous test case. We build a formula that describes a run of the program, negate a few constraints or add on some new ones, and generate a new input. Continue until boredom (or some slightly more scientific measure). What I am getting at is that all the required information for the next test is contained within the path executed by a previous test. Now consider an overflow on Windows where we can corrupt the most significant byte of a function pointer that is eventually used. If you decide to go down the ‘heap spray’ route to exploit this vulnerability you immediately hit a crucial divergence from test case generation. In order to successfully manipulate the structure of a program’s heap(s) we will almost always require information that is not contained in the path executed to trigger the vulnerability initially. Discovering heap manipulation primitives is a problem that requires an entirely different approach to the test case generation approach of data flow analysis + decision procedure over a single path. It is also not a problem that will likely ever be solved by test case generation research as it really isn’t an issue in that domain. Whole classes of vulnerabilities relating to memory initialisation present similar difficulties.
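The “negate a constraint, generate a new input” loop described at the start of the paragraph above can be sketched in a few lines of Python. Everything here is a simplification chosen for the example: the input is a single byte, a constraint is just a Python predicate, and the decision procedure is replaced by brute force over 256 values. The thing to notice is that `next_input` needs nothing beyond the recorded path, which is exactly the property that heap manipulation lacks.

```python
from typing import Callable, List, Optional

Constraint = Callable[[int], bool]

def negate(c: Constraint) -> Constraint:
    """Return a constraint that holds exactly when `c` does not."""
    return lambda x: not c(x)

def solve(constraints: List[Constraint], domain=range(256)) -> Optional[int]:
    """Stand-in for a decision procedure: brute force over a one byte input."""
    for value in domain:
        if all(c(value) for c in constraints):
            return value
    return None

def next_input(path: List[Constraint]) -> Optional[int]:
    """Keep the path prefix, flip the final branch, and ask for an input."""
    return solve(path[:-1] + [negate(path[-1])])

# Path condition recorded while running on the input 10:
# the branch `x > 5` was taken, then the branch `x < 100` was taken.
recorded_path = [lambda x: x > 5, lambda x: x < 100]

new_input = next_input(recorded_path)
print(new_input)
```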

What about vulnerability classes that fit slightly better into the mould carved out by test generation research? One of the classes I considered during my thesis was write-4-bytes-anywhere style vulnerabilities. Presuming we have a list of targets to overwrite in such cases (e.g. the .dtors address) this is a solvable problem. But what if we only control the least significant byte (or word) and can’t modify the address to equal one of the standard targets? Manually one would usually see what interesting variables fall within the controllable range, looking for those that will be at a static offset from the pointer base. But what is an ‘interesting variable’? Let’s assume there are function pointers within that range. How do we automatically detect them? Well we’d need to monitor the usage of all byte sequences within the range we can corrupt. It’s a problem we can approach using data flow/taint analysis but once you start to consider that solution it starts to look a lot like a multi-path analysis problem but over a single path. We are no longer considering just data that is definitely tainted by user input, we are considering data that might be, and as we can only control a single write we have different ‘paths’ depending on what bytes we choose to modify….. and we’re doing this analysis over a single concrete path? Fun.
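The easy half of that problem, working out which candidate targets fall inside the controllable range, looks something like the Python sketch below. It assumes the hard half is already done, i.e. that we somehow have a map of addresses known to hold interesting values such as function pointers. The function name, the addresses, and the target descriptions are all invented for the example.

```python
from typing import Dict, List, Tuple

def reachable_targets(
    pointer: int, known_targets: Dict[int, str], controlled_bytes: int = 1
) -> List[Tuple[int, str]]:
    """List targets reachable by rewriting only the low bytes of `pointer`."""
    bits = 8 * controlled_bytes
    base = pointer & ~((1 << bits) - 1)   # the part the attacker cannot change
    limit = base + (1 << bits)            # exclusive upper bound of the range
    return sorted(
        (addr, name) for addr, name in known_targets.items()
        if base <= addr < limit
    )

# Hypothetical addresses; a real list would come from analysing the target.
targets = {
    0x0804A010: ".dtors entry",
    0x0804B1F8: "handler function pointer",
    0x0804B3C0: "other function pointer",
}

corrupted_pointer = 0x0804B120
hits_byte = reachable_targets(corrupted_pointer, targets, controlled_bytes=1)
hits_word = reachable_targets(corrupted_pointer, targets, controlled_bytes=2)
print(hits_byte)
print(hits_word)
```

With one controlled byte only the handler pointer is in range and the standard .dtors target is not, which is the situation described above. Populating `targets` automatically is where the “might be tainted” analysis comes in.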

I guess the core issue is that test-case generation and exploit generation are close enough that we can get adequate results by applying the algorithms developed for the former to the latter. To get consistently good results though we need to consider the quirks and edge cases presented by exploit generation as a separate problem. Obviously there are many useful algorithms from test case generation research that can be applied to exploit generation but to apply these blindly misses opportunities for optimisations and improvements (e.g. the formula complexity issue mentioned). On the other hand there are problems that will likely never be considered by individuals working on test case generation; these problems will require focused attention in their own right if we are to begin to generate exploits for certain vulnerability classes.

Automatic exploit generation: Lessons learned so far

Here are a couple of thoughts that are bouncing around in my head as I come to the concluding stages of my v1 prototype. I’ve made a number of design decisions in the initial implementation that I now see issues with, but hopefully by documenting them v2 will be better!

Using in-process analysis tools might not be such a good idea: Early on I decided to use dynamic analysis to gather information about tainted data propagation and path conditions. Due to previous work (such as catchconv) using dynamic binary instrumentation (DBI) frameworks, like Valgrind, I pretty much immediately decided I would do the same. After writing a couple of basic apps for Valgrind, Pin and DynamoRIO, I eventually settled on Pin due to its cross-platform support and C++ codebase. One critical factor I ignored is that these tools really aren’t designed with malicious code in mind. So, when you do things like trash a stored instruction pointer it can really confuse the DBI tool.

Other problems can occur if the vulnerability ends up writing several hundred megabytes of junk over the application address space. This can lead to difficult-to-debug problems, where the memory in use by the injected analysis client is being corrupted, as well as that of the application under test.

More basic, but just as time consuming, problems stem from the fact that these in-process analysis clients are rather difficult to debug once something goes wrong. The three frameworks mentioned vary in their support for debugging and error logging, but in general it is exceedingly annoying to debug anything non-trivial. Simple segfaults have eaten hours of my time and often you are left resorting to printf-based ‘debugging’.

The final issue I’ve come across is obvious, but should still be mentioned. Complex runtime instrumentation, such as dataflow analysis, really affects the responsiveness and runtime of the application. My current code, which is admittedly incredibly unoptimised, can increase the runtime of ls from milliseconds to about 20 seconds. This isn’t much of an issue if you don’t need to interact with the application to trigger the vulnerability, but in a case where some buttons need to be clicked or commands entered, it can become a significant inconvenience.

Assisted may be better than automated: The idea of this project is to investigate what vulnerability classes are automatically exploitable, and to develop a prototype that can show the results. I’ve achieved this goal for a sufficient variety of vulnerabilities and shown that automation is in fact possible.

There is a but here though; to continue this project would require constant attention and coding to replace the human effort in exploit generation each time a new class of vulnerability, or a change in exploit technique, comes along. As I’ve moved from basic stack overflows to considering more complicated scenarios, the differences in exploit types have become more time consuming to encode. Because of this, I intend (when I implement v2 of the tool in the coming months) to move away from complete automation. By putting the effort into providing a decent user interface, it will be possible to inform an exploit writer of the results of data flow and constraint analysis and have them make an educated judgement on the type of exploit to attempt, and specify some parameters. Working from this point of view should also make the entire tool much less effort to port between operating systems.

Information on memory management is very important: This is an obvious point for anyone that has had to write heap exploits in the last 5 years or so. It is near impossible to automatically generate a Linux heap exploit without having some information on the relationship between user input and the structure of the process’s heap. When manually writing an exploit we will often want to force the program to allocate large amounts of memory, and the usual way to do this involves jumping into the code/disassembly and poking around for a while until you find a memory allocation dependent on the size of some user-supplied field, or a loop doing memory allocation with a user-influenceable bound.

Essentially, to have a solution that takes care of both scenarios we need a way to infer relationships between counters and program input. The first paper to discuss how to do this using symbolic execution was published in March of this year, and is a good read for anyone considering implementing this kind of tool.
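
To give a feel for what ‘a relationship between a counter and program input’ looks like, here is a rough Python sketch. It is not the symbolic execution approach from that paper, and it is not what my tool does; it swaps in simple black-box probing, where we vary one input field, observe a value such as an allocation size, and check whether the observations fit a linear relation. The `probe` callback and `toy_alloc_size` are hypothetical stand-ins for running the real target under instrumentation.

```python
from typing import Callable, Optional, Sequence, Tuple


def infer_linear_relation(probe: Callable[[int], int],
                          samples: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Try to fit observed == a * field + b; return (a, b), or None if no fit."""
    if len(samples) < 3 or samples[0] == samples[1]:
        raise ValueError("need at least three samples, the first two distinct")
    x0, x1 = samples[0], samples[1]
    y0, y1 = probe(x0), probe(x1)
    # Derive slope and intercept from the first two observations.
    if (y1 - y0) % (x1 - x0) != 0:
        return None  # no integer slope explains these two points
    a = (y1 - y0) // (x1 - x0)
    b = y0 - a * x0
    # Use the remaining samples to confirm or reject the hypothesis.
    for x in samples[2:]:
        if probe(x) != a * x + b:
            return None
    return a, b


def toy_alloc_size(length_field: int) -> int:
    """Hypothetical target: a 16 byte header plus 4 bytes per element."""
    return 16 + 4 * length_field


if __name__ == "__main__":
    print(infer_linear_relation(toy_alloc_size, [1, 2, 10, 100]))
```

A slope of zero means the observed value does not depend on the field at all, and a `None` result only means that no linear fit was found, not that there is no relationship. Loop bounds, rounding to allocator size classes and checks on the field would all need more than this.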

As I hadn’t added loop detection, or the other required functionality described in that paper, my current tool is unable to do the analysis described. I consider this a rather annoying drawback, and it will be among my highest priorities for v2. Hacky solutions are possible, such as modelling the result of strlen-type functions on user input, but this would miss a number of scenarios and is in general quite an ugly approach.
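
For what it’s worth, the hacky strlen-style approach can be sketched in a few lines of Python. This is an illustration under assumed names (`TaintedInt`, `model_strlen` and `model_malloc` are all hypothetical), not code from my tool: the length of user input is wrapped in a value that remembers where it came from, and a modelled allocator reports which inputs influenced the requested size.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Union


@dataclass(frozen=True)
class TaintedInt:
    """An integer plus the labels of the user inputs that influenced it."""
    value: int
    sources: FrozenSet[str] = frozenset()

    def _combine(self, other: Union["TaintedInt", int],
                 op: Callable[[int, int], int]) -> "TaintedInt":
        if isinstance(other, TaintedInt):
            return TaintedInt(op(self.value, other.value),
                              self.sources | other.sources)
        return TaintedInt(op(self.value, other), self.sources)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    __radd__ = __add__
    __rmul__ = __mul__


def model_strlen(data: bytes, label: str) -> TaintedInt:
    """Model of a strlen-style call: the result is tainted by the input's label."""
    return TaintedInt(len(data), frozenset({label}))


def model_malloc(size: Union[TaintedInt, int]) -> FrozenSet[str]:
    """Return the input labels that influence this allocation size."""
    return size.sources if isinstance(size, TaintedInt) else frozenset()


if __name__ == "__main__":
    body = b"AAAA"
    print(model_malloc(model_strlen(body, "request.body") * 2 + 8))
    # A hand-rolled counting loop loses the taint entirely.
    count = 0
    for _ in body:
        count += 1
    print(model_malloc(count))
```

The second case in the demo is exactly the kind of scenario this misses: a length computed by a loop rather than by a modelled function comes out as a plain integer, so the allocation looks independent of user input when it is not.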