Better Interpreter Fuzzing with Clang

Last weekend I decided to finally put some effort into investigating Clang, the C/C++/Objective-C frontend for LLVM. Clang is interesting as it is not only designed to provide efficient parsing and processing during compilation, but also as a platform for program analysis tools. Other options for static analysis of C/C++ exist e.g. GCC extensions, cscope hackery, and the various things built on top of CIL (e.g. Frama-C). However, Clang is attractive to me as it supports both C and C++, has a very permissive license, includes a well designed, extensive and documented API and is itself written in C++ (which might be a negative point depending on your view of the world but I like it :P). It also appears to have the backing of both Apple and Google, with a number of their engineers contributing regularly to both LLVM and Clang. All in all, it feels like a maturing project with a skilled and active community behind it.

Clang supports three approaches to writing analysis tools. Using libclang is the encouraged method for most tasks and is probably what you’re looking for if all you want to do is iterate over ASTs and pull out data. The next option is to write a plugin for Clang to be run during the build phase of a project. This is slightly more involved and doesn’t have the benefit of allowing you to use a Python/whatever wrapper around the API. However, if you want to do things like automated source rewriting, or more involved static analysis as part of a build, then this is where you want to be. The final approach is using libtooling. I haven’t looked into writing anything based on this yet but it appears to offer the benefits of writing a standalone tool provided by libclang with the power of full AST control offered by a plugin. The only potential downsides, in comparison to libclang, that I can see are that you cannot write your code in a higher level language and it does not have access to as convenient an API.

After a bit of digging around in documentation and source, I think libclang offers the most gentle introduction to Clang as a program analysis platform. It’s not as powerful as the other two methods but has a stable API (check out include/clang-c/Index.h for the C interface and bindings/python/clang/cindex.py for the Python bindings) and supports a nice visitor based approach for writing tools that traverse an AST. From reading over the above code, and following Eli Bendersky’s introduction to using the Python bindings, you should be able to get up and running fairly easily. As a side note, libclang has progressed somewhat since that blog post so the limitations related to missing cursor kinds are now largely alleviated. There are still some missing features but we’ll get to that later.

A relevant question about now is probably “What the hell has all of that got to do with interpreter fuzzing?”. In general, not a whole lot =) However, when trying to think of some small problem I could tackle to get a feel for libclang I recalled an issue I previously encountered when fuzzing PHP. When fuzzing an interpreter there are typically two high level stages that are somewhat unique to interpreter fuzzing; 1) Discover the functions you can call and 2) Discover how to correctly call them e.g. the number and types of their arguments.

In this regard, PHP is a slightly easier target than say, Python or Ruby, because on the first point it has a lot of global functions that can be called without instantiating a class or figuring out what module to import. These functions can be easily discovered using get_defined_functions. Using this approach it is simple to enumerate a fairly large attack surface that calls directly into the C backend (somewhere upwards of 500 functions in a default configuration) and get fuzzing.

Point 2 remains a problem however. Each PHP function implemented in the C backend takes a number of typed parameters. Some type conversion can and does take place, which means you can specify a boolean when an int is required or other such things. Naturally, there are some types that are completely incompatible, e.g. a function callback is required and you provide an integer type. If you get this wrong the function will bail out without performing any further processing. As well as incorrect argument types, specifying an incorrect number of arguments will also often result in the function execution terminating before anything interesting happens. Precise implementation details can be found in the zend_parse_va_args and zend_parse_arg_impl functions of Zend/zend_API.c.

When fuzzing PHP we are then left with a bit of a conundrum. We can easily discover a substantial attack surface but we still need to figure out the types and number of arguments that each function expects. One solution is to just play a numbers game. Fuzzing an interpreter is fast… really fast. It is also easily distributed and instrumented. So, if we just guess at the types and parameters we’ll probably be correct enough of the time [1]. This has a slight issue, in that there is going to be a bias towards functions that take a smaller number of arguments for purely probabilistic reasons. An improvement is to just parse the error messages from PHP which give an indication as to why the function bailed out e.g. incorrect types or incorrect number of arguments. This is all kind of ugly however and it would be nice if we could just know up front the correct argument types and the exact number required.

Enter Clang! An efficient solution for this problem is simply to parse every C file included in the PHP build, find all calls to zend_parse_parameters, extract the format string that specifies the arguments and their types and then relate this information back to the function names discovered via get_all_functions. It’s a fairly straightforward task on the PHP code base and could probably be achieved via regular expressions and some scripting. That said, using libclang we can come up with a much tidier and less fragile solution. Before I delve into the solution I should mention that I’ve uploaded the code to GitHub under the name InterParser, minus the fuzzer, should you want to try it out for yourself, extend it or whatever.

I’ll explain the code using questions for which the answers were not immediately obvious from the Clang documentation or are things that were interesting/surprising to me.

How do I run my libclang based tool on all files in a build?

The solution to this is also a solution to the next problem so I’ll deal with it there!

How do I know the correct compiler arguments to use when parsing a file?

The clang_parseTranslationUnit of Index.h allows you to specify the arguments that would be passed to the compiler when parsing a particular source file. In some cases it is necessary that you specify these arguments correctly or the code that is parsed will not accurately reflect the code that is included when you run make. For example, if a chunk of code is enclosed within #if BLA / #endif tags and BLA is supposed to be defined, or not, via the compiler arguments then unless you provide the same argument when creating the translation unit the code will not be parsed.

This problem, and the previous one, could easily be solved if we could tell 1) every file processed when make is run and 2) the command line argument passed to the compiler on each invocation. After browsing the Clang documentation for a solution a friend of mine suggested the obvious answer of just wrapping the compiler in a script that logged the relevant information during the build. The script creplace.py provides this functionality. By pointing the CC and CXX environment variables at it you should get a log of all files processed by the compiler when make is ran, as well as the arguments passed.

The ccargparse.py module provides the load_project_data function which you can then use to build a Python dictionary of the above information. If there’s an inbuilt Clang API to do all this I didn’t find it [2], but if not you should be able to just drop in my code and work from there! See the main function of parse_php.py for exact usage details.

Once the compiler arguments have been extracted and converted to a list they can just be provided to the index.parse method when constructing the translation unit for a given file. e.g.

comp_args = load_project_data(cc_file)
args = comp_args[file_path]
index = clang.Index.create()
tu = index.parse(file_path, compiler_args)

How do I find all functions within a file?

Our first task is to find all calls to zend_parse_parameters and note the name of the top level backend API function so that we can later relate this to a user-space PHP function call. To do this we first need to find all function declarations within a translation unit for a source file. e.g the process_all_functions method:

for c in tu.cursor.get_children():
    if c.kind == clang.CursorKind.FUNCTION_DECL:
        f = c.location.file
        if file_filter and f.name != file_filter:
            continue
        fmt_str = process_function(c)

This is a fairly simple task — we iterate over all children of the top level translation unit for the given source file and compare the cursor kind against FUNCTION_DECL, for a function declaration. It’s worth noting that the children of a translation unit can include various things from files that are #include‘d. As a result, you may want to check the the file name associated with the location for the child is the name of the file passed to index.parse.

How do I find all calls to a particular function?

For each function declaration we can then iterate over all of its children and search for those with the kind CALL_EXPR. This is performed in the process_function method. Firstly, we need to check if the function being called has the correct name. The first child of a CALL_EXPR AST node will represent the function itself, with the rest representing the arguments to this function.

The last statement isn’t quite correct. The C and C++ standards specify a number of conversions that can take place on a function call, so the direct children of a CALL_EXPR AST node will represent these conversions and then the direct child of these conversion nodes will contain the details we require. Clang does have a mode to dump the AST for a given source file so if you’re curious what the AST actually looks like try passing -ast-dump or -ast-dump-xml as parameters to the clang compiler.

As an example, on a function call we encounter a CALL_EXPR node and its first child represents the decay of the function to a pointer as specified by the C/C++ standards. This node will have the kind UNEXPOSED_EXPR. It will have a single child of type DECL_REF_EXPR from which we can retrieve the function name via the displayname attribute. The following code in process_function performs the function name extraction and comparison.

if n.kind == clang.CursorKind.CALL_EXPR:
    unexposed_exprs = list(n.get_children())
    func_name_node = get_child(unexposed_exprs[0], 0)
    if func_name_node.displayname == "zend_parse_parameters":
        ...

How do I extract the parameters to a function call?

Once we know we have the correct function call we then need to extract the format string parameter that provides the type specification for the user-space API call. As with the node representing the function, arguments may also undergo conversions. The second argument to zend_parse_parameters is the one we are interested in, and it is usually specified as a string literal. However, string literals undergo a conversion from an rvalue to an lvalue. To account for this, we again go through an UNEXPOSED_EXPR node representing the conversion to access the node for the string literal itself.

Things get a little inconvenient at this point. Once we have the node representing the string literal we ideally want some way to access the actual value specified in the source code. It turns out that this functionality hasn’t been added to the Python bindings around libclang yet. Fortunately, a guy by the name of Gregory Szorc has been working on a branch of libclang and the associated Python bindings that adds the get_tokens method to AST nodes. His code is great but also contains a bug when get_tokens is called on an empty string literal. Until that gets fixed I’ve mirrored his cindex.py and enumerations.py files. Drop both into your clang/bindings/python/clang directory and you should be good to go!

The actual code to extract the string literal given the node representing it is then straightforward.

if tkn_container.kind != clang.CursorKind.STRING_LITERAL:
    raise VariableArgumentError()

tokens = list(tkn_container.get_tokens())
if tokens[0] is None:
    return ""

The check on the kind of the node is required as in a miniscule number of cases the format string parameter is passed via a variable rather than directly specified. This situation is so rare that it isn’t worth handling. The only other slightly odd thing is that when get_tokens is called on an AST node representing a string literal, like the first parameter in func("xyz", a), it will return a generator for two elements. The first will be the string literal itself, including the double quotes, while the second will represent the comma separator. I’m not sure if this is a bug or not but it’s not a big deal. If the string literal is empty then the generator will return None.

Conclusion

At this point we can associate the functions calling zend_parse_parameters with the format string passed to that call, and thus the types and number of arguments that the function expects. In a fairly small amount of Python we have a robust solution to problem 2, discussed earlier. A question you may have is How do you then associate the function name/argument specification from the C code to the corresponding user-space PHP function?. Well, PHP is nice in that it doesn’t do a massive amount of name mangling and consistently if we have a user-space function called func_one then the backend C function will be called zif_func_one (in the code it will probably look like PHP_FUNCTION(func_one) but Clang will expand the macro for you).

When parse_php.py terminates it will output something like the following:

INFO:main:API info for 518 functions in 68 files written to php_func_args.txt

Inspecting the output file we’ll see function to type specification mappings:

# /Users/sean/Documents/git/php-src/ext/standard/levenshtein.c
zif_levenshtein ss sss sslll
# /Users/sean/Documents/git/php-src/main/main.c
zif_set_time_limit l
# /Users/sean/Documents/git/php-src/ext/standard/versioning.c
zif_version_compare ss|s
...

This tells us that a user-space PHP script can correctly call levenshtein with two strings, three strings or two strings and three longs. It can call set_time_limit with a single long only. version_compare can be called with two or three strings and so on and so forth. So, with this information at hand we can construct a PHP fuzzer that no longer has to guess at the number of arguments to a function or their types.

Extensions

The most obvious extension of this code that comes to mind is in pulling out numeric values and string literals from comparison operators. For example, say we’re analyzing a function with the following code:

if (x == 0xc0)
    ...
else
   ....

It might be an idea to pull out the constant 0xc0 and add it to a list of possible values for numeric input types when fuzzing the function in question. Naturally, you’ll end up with extra constants that are not compared with the function inputs but I would guess that this approach would lead to some gain in code coverage during fuzzing.

Resources

Introduction to writing Clang tools
Eli Bendersky’s blog post on parsing C with libclang
Unofficial Clang tutorials

[1] This is what I did when I first fuzzed PHP a long time ago. You hit a lot of incorrect type/argument count errors but you can get so many executions per second that in general it works out just fine =)

[2] Two days after I wrote this it seems that support for the compilation database functionality was added to the Python bindings. You can read about this feature here. Using this feature is probably a more portable way of gathering compiler arguments and pumping them into whatever APIs require them.

Anatomy of a Symbolic Emulator, Part 3: Processing Symbolic Data & Generating New Inputs

In this final video in the series we go over how to generate new inputs for a program once we detect a user-influenced conditional branch. At the end there is also an example of the type of condition/resulting formula that we get from working on a real file parser, in this case libwebp.

(You probably want to click the “Watch on YouTube” option on the bottom right of the video and set the quality to 720p)

Conclusion
This type of emulation, input generation and formula checking does not need to be limited to conditional jumps. As I discussed in a previous post you can use a similar approach to discover variable ranges, check for variable relationships and assist in figuring out complex algorithms. For example, one could generate a query to the solver every time an argument to malloc is found to be influenced by the user, or export a list of all functions that operate on user-influenced data to IDA for manual review. (In fact, a light-weight version of this approach in combination with fuzzing and an IDA importer is possibly more generally useful to an individual auditor than going down the route of full on whitebox fuzzing. More on that later =))

Anyway, I hope these videos provide some insight into how a whitebox fuzzer might work as well as one approach to building a symbolic emulator. To give an idea of the effort involved – the combined whitebox fuzzing, trace parsing and emulation code (along with supporting libraries) comes to around 10,000 lines of Python. Of this, the emulator itself is only 3000 lines or so. The PIN tracer is just under 1000 lines of C++.

Tracing is currently fairly unoptimised and parsing something like a video or image while tracing can result in a factor of 10-100 increase in running time. This usually means a wait of 30 seconds, which isn’t too bad for whitebox fuzzing as tracing is not performed too often but for other uses of a symbolic emulator (like tracing while fuzzing normally) this will require some work. The emulator itself is Python based and as such is not lightning fast. In the default run-mode it emulates ~5000 instructions per second. What this translates to is about 30-40 minutes per trace of an average file parser. This isn’t as bad as you might think however as the tests cases generated tend to be much more effective at hitting new code than what you would get from dumb fuzzing. Despite this we still need performance gains and I’m working on a few different solutions for that. Somewhere around 30,000+ instructions per second would be what I would consider approaching acceptable =)

To preempt the inevitable questions – for now JESTER is not publicly available but that may change in the future. It’s very much a research prototype at the moment where we’re testing out several approaches to improving performance and general usefulness. However, if you are interested in working on this type of research post a comment with a contact address (it won’t appear publicly) as I’m fairly sure we are currently hiring.

Anatomy of a Symbolic Emulator, Part 2: Introducing Symbolic Data

In the previous post I discussed one way to go about gathering a trace for emulation. In this I’m going to talk about how we go about emulating such a trace, how and why we hook functions as they are emulated and how symbolic operations are performed.

As before, this post is accompanied by a video which demonstrates the code in action. Unlike the previous post I’ve decided to skip the paragraphs of rambling and instead most of the info is in the actual video itself =)

Topics covered:
– Introducing symbolic data via function hooks
– Performing computations on symbolic data

(You probably want to click the “Watch on YouTube” option on the bottom right of the video and set the quality to 720p. Btw, near the end of the video I said something along the lines of “one of the advantages of whitebox fuzzing over symbolic emulation”. That makes no sense =) What I meant to say was “one of the advantages of whitebox fuzzing over normal symbolic execution”.)

Anatomy of a Symbolic Emulator, Part 1: Trace Generation

A couple of months ago there was an ACM article on the SAGE whitebox fuzzing system from Microsoft Research. SAGE is one of the most interesting products of research on automated program testing in recent years and, according to Microsoft, has been used to find a massive amount of bugs in their various file parsers.

At its core, SAGE contains a symbolic emulator for executing instruction traces over symbolic data. As well as whitebox fuzzing, symbolic emulators are fairly useful things for a variety of reverse engineering, vulnerability discovery and program analysis tasks. Essentially, a symbolic emulator is a CPU emulator that not only supports operations on concrete numeric values but also on abstract values that may represent a range of concrete values.

In this series of posts I’m going to give an overview of a Python-based symbolic emulator for x86 that I’ve been working on (called JESTER) and show how it can be applied to the problem of whitebox fuzzing. Hopefully this will give an idea of how symbolic emulation works (it’s fairly simple) and also provide some insight into how systems like SAGE operate. Part 2 in this series covers how to introduce symbolic data into the system, and part 3 covers how the system can generate new inputs for the target program that take new paths.

Consider the x86 instruction add eax, ebx. Operating over concrete values an emulator will do the obvious thing of taking the value in EAX, adding it to EBX and then storing the result back in EAX. It will also update the various flags that are affected. Over symbolic values however the result is a bit more interesting. Lets assume that EAX contains the abstract value V1 which represents an unconstrained 32-bit variable, and EBX contains the concrete value 0x10. In this case the emulator will create a new abstract value V2 which represents the addition of V1 and 0x10 and store that back in EAX. Diagrammatically, we can see that EAX now contains something that is a function rather than a single value.

      v1   10
       \   /
    EAX: +    

A slightly more complex diagram shows what the Zero Flag would hold after the above instruction.

      v1   10
       \   /
         +    0
          \  /
           ==   1    0
            \   |   /
        ZF: if-then-else

I purposefully used the word ‘function’ because what we end up with, in registers and memory, are expression trees that map from a given set of inputs to an output. As more instructions are emulated these trees get bigger and more difficult to reason about so people usually take the approach of exporting them to a SMT solver and querying their models that way. The obvious applications being input crafting, tracking user-influenced data and checking security properties. This is fairly well documented in previous posts and in a decade worth of academic literature so I won’t delve into the details.

The point of this post is instead to look at the overall architecture of a symbolic emulator with the aim of illuminating some of the components involved, more directly than is typically done in formal descriptions. I also want to give people an idea of how much or how little effort is involved in building these tools. In order to demonstrate the use of a symbolic emulator I’ll apply it to the problem of whitebox fuzzing i.e. using a symbolic emulator in combination with a SMT solver to generate inputs for a program guaranteed to force execution down a new path.

While writing this series of posts Rolf Rolles posted a great video/blog entry on the topic of input crafting using an SMT solver. Taking a leaf out of his book I’ve decided to accompany these with a video that demonstrates the tools described in operation and should ideally give some insight into their construction. The video is linked at the end but the following wall of text will give some context and might be worth glancing over. This isn’t the most entertaining of entries in the series and is mostly for completeness so if you’re bored out of your mind I accept minimal responsibility =)

1. Trace generation

An emulator needs some way to know what instructions execute and it also needs a starting memory and thread context. There are a few different approaches to getting such information. The Bitblaze/BAP research groups modified Qemu and hook in directly there, the guys working on S2E do something similar and I previously wrote a C++ library that was used as part of a Pintool at run time. There are a couple of problems with tying your emulation directly into the run time environment of the tool however. Firstly, it’s a lot more annoying to debug an extension to Qemu or PIN than n separate emulator and secondly, it prevents you from doing the emulation on a separate machine to the tracing. The second issue is probably the most important in the long run as to really scale whitebox fuzzing to the point where it is useful requires parallelism.

The approach I took this time around is directly inspired by the work of MSR on their Nirvana/iDNA tool, but much more simplistic. Instead of using the Pintool to do the emulation I use a lightweight one to just trace the instructions executed and other interesting events, like image loads/unloads, system calls and debugging info. If you’ve used PIN before then most of what I’m about to describe will be obvious and fairly boring so you might want to skip on to part 2 of this series of entries.

The trace format is uncompressed and unoptimised and to date I’ve not had any problems with that. A typical segment just looks as follows (L denotes an image load, I an instruction execution and C provides debugging information as discussed below):

L;4;/lib32/libc.so.6;f5c54000;157244
L;5;/lib32/libm.so.6;f5c2e000;24790
C;0;EAX:ffb292a4;EBX:f5da9ff4;ECX:53f78923;EDX:5;ESP:ffb291f8;EBP:ffb291f8 ... 
I;0;8048fc5
C;0;EAX:ffb292a4;EBX:f5da9ff4;ECX:53f78923;EDX:5;ESP:ffb291f0;EBP:ffb291f8 ... 
I;0;8048fc8
C;0;EAX:ffb292a4;EBX:f5da9ff4;ECX:53f78923;EDX:5;ESP:ffb291ec;EBP:ffb291f8 ... 

In the early stages of the project I worried about this and thought I’d have to come up with some compression method but that hasn’t been the case. Most file parsers generate traces that can be measured in 10s of millions of instructions and things of that scale easily fit in a few gigabytes of storage.

1.1 Debugging Assistance

Writing an emulator of any kind can be tedious work. It’s easy to make mistakes and get the semantics of an instruction slightly wrong or set a flag incorrectly. Initially I tried to counter this by writing unit-tests but it quickly became obvious that 1) These were never going to be exhaustive and 2) They were as likely to have mistakes as the emulator code itself. Instead, I added a debug mode to the tracer that logs the register values after each instruction (The lines starting with a “C” above). This then allows the emulator to compare its register values to the ones we know it should have and highlight any discrepancies. Tracing everything in /usr/bin/ and checking these values is a hell of a lot more exhaustive than any unit-testing I would have done! The only reason I’m mentioning this is that I’d recommend it to anyone writing something of this nature. The best tests are by far those you can extract from real binaries.

1.2 Handling system calls

One of the disadvantages of using a user-land tracer is that you miss out on any updates to memory that happens within the kernel. The only real way to handle this correctly is to define per-system-call handlers that know which memory addresses a system call will update based on its arguments or return value. In PIN this is fairly straightforward, you register a syscall entry and exit handler, get the syscall args or return value and then log whatever you need on exit.

int main(int argc, char *argv[])
{
        ...

        PIN_AddSyscallEntryFunction(SyscallEntry, 0);

        ...
}

VOID SyscallEntry(THREADID tid, CONTEXT *ctxt, SYSCALL_STANDARD std, VOID *v)
{
        ADDRINT syscall_num = PIN_GetSyscallNumber(ctxt, std);
        ...
#ifdef LINUX
        // Handle Linux syscalls
        switch (syscall_num) {
        case SYS_READ:
        {
                ADDRINT read_buf_ptr = PIN_GetSyscallArgument(ctxt, std, 1);
                size_t read_sz = PIN_GetSyscallArgument(ctxt, std, 2);

                t_data->log_addrs[read_buf_ptr] = read_sz;
                break;
        }
        ...
}

Handling each and every system call might seem like an onerous task but if you’re working on particular types of software (e.g. file parsers) then you can get away with a minimal subset e.g. open, read, lseek, mmap and a few others. My general approach is to just add them as necessary. You’ll encounter many more along the way but it turns out not a whole lot end up having any interaction with the user controlled data you’re interested in.

In the trace log format I included support for events other than those shown in the above snippet.). For syscalls as just discussed there is the M event which looks like as follows and tells the emulator to update the memory address given with the contents of a file.

M;0;f5f97000:syscall_c0_f5f97000_1000_1

There is also the ‘R’ event which tells the emulator to update a register with a particular value. This is useful for instructions you can’t handle for whatever reason. Other than that there isn’t really anything to capturing a trace. The only thing I haven’t mentioned is that on starting tracing, either at a given address or the programs entry point, you also need to log the programs memory and thread contexts at that point in order to give your emulator starting values. This is fairly straightforward though and PIN provides all the required API calls.

(You probably want to click the “Watch on YouTube” option on the bottom right of the video and set the quality to 720p. The tools discussed are not publicly available but that may change in the future.)

Parts 2 and 3 of this series can be found here and here.

SAT/SMT Summer School 2011 Summary (Days 5 & 6)

Day 5

Sketching: Program Synthesis using SAT Solvers (Armando Solar-Lezama)

Armando started his talk by demonstrating the automatic synthesis of a program for swapping two integer variables without using a third. It’s a standard algorithm and quite small but was still cool to see. He then demonstrated a few more algorithms involving bit-level arithmetic. The implementation of this tool, called Sketch, can be found here. The demonstrations given were for a C-like language and apparently synthesis works quite well for algorithms based around bit-twiddling.

These programs were generated from program ‘sketches’, essentially algorithmic skeletons, and a test harness, similar to unit tests, that described the desired semantics of the program. The sketches express the high level structure of the program and then the details are synthesized using a SAT solver and a refinement loop driven by the tests. The idea of the sketches is to make the problem tractable. The intuition for this was given by the example of curve fitting. That can be a difficult problem if you have nothing to go on but data points whereas if you are told the curve is Gaussian, for example, the problem becomes much more feasible.

The synthesis algorithm first uses the sketched fragment to generate a candidate program and then a SAT solver is invoked to see if the conjunction of this program and the semantics described by the tests are valid. If not, a counter-example is generated and used to refine the next iteration of program generation. This is incredibly simplified and the full details can be found in Armando’s thesis.

This was yet another talk where there was an emphasis on pre-processing formulae before they get to the solver. The phrase ‘aggressive simplification’ was used over and over throughout the conference and for synthesis this involved dataflow analysis and expression reduction (e.g. y AND 1 reduces to y) as well as more standard common sub-expression elimination.

Day 6

Harnessing SMT power using the verification engine Boogie (Rustan Leino)

This talk began with some coding demonstrations in a language called Dafny that has support for function pre-conditions, post-conditions and loop invariants. As these features are added to a code-base they are checked in real time. Dafny is translated into an intermediate verification language (IVL) called Boogie (the verification system for it is open source under the MS public license) which can be converted into SMT form and then checked using the Z3 SMT solver. While this fun to watch, most languages don’t have these inbuilt constructs for pre/post-conditions and invariants. Fortunately, Boogie is designed to be a generic IVL and translation tools exist for C, C++, C#, x86 and a variety of other languages (although from what I gather only some of these are publicly available and none are open source). As such, Boogie is designed to separate the verification of programs in a given language from the effort of converting them into a form that is amenable to checking.

The high level, take-away message from this talk was “Don’t go directly to the SMT solver”. It relates to the separation of concerns I just mentioned. This lets you share infrastructure and code for verification tasks that will be common between many languages and also means you have an intermediate form to perform simplification on before passing any formulae to a solver.

HAVOC: SMT solvers for precise and scalable reasoning of programs (Shuvendu Lahiri & Shaz Qadeer)

HAVOC is one such verification tool for C that makes use of Boogie. It adds support for user-defined contracts on C code that can then be checked. Based on the Houdini algorithm HAVOC can also perform contract inference with the aim of alleviating much of the burden on the user.

I really wish we had something similar to HAVOC for code auditing (this was actually one of the use cases mentioned during the talk). I’m not sure about others but essentially how I audit source code involves manually coming up with pre-conditions, post-conditions and invariants and then trying to verify these across the entire code-base by hand. This is fine, but with a tool-set of vim, ctags and cscope it’s also incredibly manual and seems like something that could at least be partially automated. It was mentioned that a more up-to-date version of HAVOC might be released soon so maybe this will be a possibility.

Non-DPLL Approaches to Boolean SAT Solving (Bart Selman & Carla Gomes)

This talk was on probabilistic approaches to SAT solving. These techniques still lag far behind DPLL based algorithms on industrial benchmarks but are apparently quite good on random instances with large numbers of variables.

Symbolic Execution and Automated Exploit Generation (David Brumley)

While I previously ranted about the paper this talk was based on, this talk was a far better portrayal of the research. Effectively, we’re in the very, very early stages of exploit generation research; while there have been some cool demos of how solvers might come into play we’re still targeting the most basic of vulnerabilities and in toy environments. All research has to start somewhere though, my own thesis was no more advanced, so it was good to see this presented with an honest reflection on how much work is left.

One interesting feature of the CMU work is preconditioned symbolic execution which adds preconditions to paths that must be satisfied for the path to be explored. This is a feature missing from KLEE but would be just as useful in symbolic execution for bug finding as well as exploit generation. Something that remains to be researched and discussed is efficient ways to come up with these pre-conditions.

Conclusion

The summer school was a great event and renewed my enthusiasm for formal methods as a feasible and cost effective basis for bug finding and exploit development. The best talks were those that presented an idea, gave extensive, concrete data to back it up and explained the core concepts and limitations with real world examples. I hope to see more papers and talks like this in the future.

A generic conclusion for the six days would be difficult, so instead the following were the reoccurring themes that stood out to me across the talks that may be relevant to someone implementing these systems:

Focus on one thing and do it well. For example, separate instrumentation from symbolic execution from solving formulae.
Aggressively simplify before invoking a solver. Simplification strategies varied from domain specific e.g. data-flow analysis, to generic logical reductions but all of them greatly reduced the complexity of the problems that solvers had to deal with and thus increased the problems the tools could handle.
Abstract, refine and repeat. The concept of a counter-example guided abstraction refinement loop seemed to be core to algorithms from hardware model checking, to program synthesis, to bug finding. In each, CEGAR was used to scale algorithms to more complex and more numerous problems by abstracting complexity and then reintroducing it as necessary.
Nothing beats hard data for justifying conclusions and driving new research. This point was made in the earliest talks on comparing SAT solver algorithms and reiterated through the SAGAN information collection/organisation system of SAGE. Designing up front to gather data lets you know where things are going wrong, keeps a record of improvements and makes for some pretty cool slides when you need to convince other people you’re not insane =)

SAT/SMT Summer School 2011 Summary (Days 3 & 4)

The slides for the summer school have started to go online so for the remaining days I’ll just give a quick summary of parts I thought were particularly interesting or comments that were made but not in the slides.

Day 3

BitBlaze & WebBlaze: Tools for computer security using SMT Solvers (Dawn Song & Prateek Saxena)

The first thing of note from this talk was a brief discussion on selecting paths for analysis during symbolic/concrete execution. Anyone who has used KLEE knows the lack of sane path selection mechanisms is a significant drawback so it was good to see one of the talks on this type of system discuss it. The methods used were dataflow and control flow distances to functions of interest.

An interesting problem tackled later in the talk was how to distinguish due from undue influence of tainted data over control flow. For example, it might be acceptable for user data to taint a value used in a switch statement that selects a function to run but it’s unlikely to be very good if it can taint a function pointer directly. Four different methods were presented from distinguishing these cases, the simplest being point by point exhaustion using a solver of the number of possible target addresses in the address space. More complex probabilistic and information theoretic approaches were also discussed and are elaborated on in their paper. It would be nice to see some more experimental data with these more advanced methods though as it is limited to 3 vulnerabilities and 3 benign cases.

SAT-based Model-Checking (Armin Biere)

Armin is the developer of one of the best SAT solvers, Lingeling, and his talk discussed advances in using SAT technology for model checking. During the talk he mentioned a paper by Aaron Bradley on SAT based model checking without unrolling which might be worth checking out but I haven’t had a chance to yet

CryptoMiniSat — A Rough Guide (Mate Soos)

This was a great talk by Mate Soos on CryptoMiniSat, which won last years SAT Race and is open source, and SAT solver design. Mate started with a discussion of the software design philosophy behind the project and put forward that it’s better to have less optimised and complex code if you can more easily implement better ideas. Given that his solver is faster than Lingeling, which is far more difficult to comprehend, it seems that he is correct. He had some other interest things to say on SAT solver features, emphasising regular simplification of expressions and maintaining a cache of results from unit propagation even if they are not currently useful.

SAGE: Automated Whitebox Fuzzing using SMT solvers (Patrice Godefroid & David Molnar)

In my opinion, this was the best talk of the summer school so far. Patrice and David discussed SAGE and presented a lot of data to encourage the development of tools for this kind of testing. SAGE is built on top of previously developed MS tools detecting crashes (AppVerifier), recording traces (Nirvana), generating constraints (TruScan) and solving constraints (Z3).

Unlike KLEE and the Bitblaze tools, the symbolic execution part of SAGE only accounts for a small fraction of the time cost. Only 1/4 of the total time is spent on symbolic execution with the remainder of their 3 week fuzzing runs spent on tracing, generating constraints and running the application under test on the fuzz files.

One interesting thing mentioned was that while most queries to the solver only take 1/10th of a second all queries are capped at 5 seconds and after that the solver is killed and the result is presumed UNSAT. This is based on the observation that they get more code coverage by this method than waiting for hours for a single query to return. They backed this up with some statistics that showing that longer run times only very rarely led to more bugs.

Some other points of note were:
– From the start SAGE was engineered to provide enough information and statistics on every part of its system that determining what it is doing and where it is succeeding/failing is possible. This is facilitated through a system called SAGAN that allows them to focus on areas needing work.
– SAGE is primarily deployed against file parsers. This is a use case where the majority of non-determinism is from the input. In other environments with different sources of non-determinism it might be more difficult to direct the application through constraint solving.
– Most OOM conditions are a result of trying to store the constraints in memory while analysing the trace, not in the solver as I would have expected. As a result, simplification and expression elimination can be necessary even before that staged.
– Most crashes seemed to be concentrated within the first 6 generations of constructed fuzz files but crashes were seen in all generations up to the mid to late teens. I don’t think they’ve ran for any longer than that.
– SAGE was responsible for 30% of bugs found in a certain class of file parsers on Windows 7. These were bugs missed by all other testing mechanisms. I wonder how long it will be before those of us interested in bug finding will have to start looking at tools like SAGE from the point of view of discovering where they are weak as a starting point for auditing.

All in all, this presentation had hard data to support some very exciting conclusions.

Day 4

Approaches to Parallel SAT Solving (Youssef Hamadi)
I had recently been wondering what the state of the art in parallel solving is so this was good to see. Youssef first started by proposing that we are unlikely to see order of magnitude speed ups in SAT solving based on advances in the current CDCL architecture of SAT solvers. I guess there are two ways to deal with this, one is to look at different approaches to sequential SAT solving and the other is to look at parallelism.

From 1996 to 2008 most of the approaches to parallel SAT proceeded by splitting the problem space into chunks and solving these instances in parallel with clause sharing of clauses under a certain size. Another approach is the portfolio approach used by parallel Z3, PLingeling and ManySAT. In this approach the same problem is attacked by several different solver instances each using a different configuration. The solver configuration is parameterized on the restart policy, polarity selection, clause learning and branching heuristics. This lead to super-linear speed-up with combinations of these solvers performing better than the sum of their parts.

One particular issue for this strategy though is how best to do clause sharing between solvers. Sharing of large clauses can be costly so one needs a strategy to figure out the upper limit on the size of clauses to share. This is complicated by the fact that the size of learned clauses progresses as the solver advances in the problem. Two algorithms were discussed in this context. The first was based on TCP bandwidth calculation algorithms that slowly increase the size of shared clauses and then quickly back off once a problem is detected and a heuristic based on variable activity to set a quality threshold on clauses to accept from other solvers.

This portfolio mechanism works well for between 4 and 8 cores but after that the effects of added cores is greatly diminished.

SAT Solving and Complexity Theory (Ryan Williams)

This was a theoretical talk that discussed some of the difficulties that people have ran into in not only proving P != NP but on many problems on the relations between complexity spaces. Much like the talk by Shai Ben David I’m pretty sure I’d butcher the mathematical details were I to summarise but the take away message was similar: the logics for reasoning and proof techniques that we have today for problems like this are quite insufficient; as a result people have struggled to even do far weaker reasoning like establishing lower bounds on NP-complete problems. It was a really interesting talk but I’ll definitely need to rewatch the video when it comes out =D

SAT/SMT Summer School 2011 Summary (Day 2)

Independence Results for the P vs. NP Question (Shai Ben David)

This was a really fun talk that was very much on the theoretical side of logic and satisfiability but with potentially very important implications. Ever since learning about early 20th century work by Hilbert, Godel, Turing etc. on foundations of logical systems and proofs I’ve been fascinated by anything that discusses the universal limitations and capabilities of logical systems. This was the first talk I’ve seen where this kind of purely theoretical work was linked to an implication for solving technologies. The fundamental question approached in the talk was whether P != NP is an irresolvable question give the logics we have available. That is, can we prove that it is unprovable.

I would do it injustice to try and summarise the talk (and I’d get it wrong!) but the main result was that if it were true that P is nearly equal to NP then we would not be able to prove P != NP using current lines of reasoning and tools. The interesting result for SAT solvers is that if this were the case then many of the problems we want to solve may be solvable in almost-polynomial time. The downside is that even if we could prove this the proof probably wouldn’t help at all in building a solver than can exploit this closeness.

I’ve totally butchered the details of this talk but you can find a earlier/shorter version of it here and a paper here.

HAMPI: A Solver for String Theories (Vijay Ganesh)

Vijay’s talk was on the HAMPI solver. HAMPI contains a theory for bounded (that’s important) character strings that allows it to reason about things like whether a regular expression matches against a particular string or not. From what I gathered, it operates by converting a regular expression into a context-free-grammar and then converting that context-free-grammar, along with any constraints we may wish to check, into a formula over bitvectors and checking the satisfiability of this with STP. The main target application was detecting oversights in regexs designed to catch SQL injection attempts but Vijay also mentioned they got a 2-5x speed-up when using this solver with KLEE on applications that do a lot of string manipulation. KLEE tends to perform quite poorly on things like XML parsers so I’d love to see if specialised solvers can help out here.

Modern SMT Solver Implementation (Leonardo De Moura& Nikolaj Bjorner)

This was a good talk by some of the best guys building SMT solvers, Leonardo De Moura and Nikolaj Bjorner. Both of their publication pages are worth checking out for details on building SMT solvers as well as the theoretical aspects.

They first highlighted some of the core problems in SMT solvers that affect performance: combining engines, unfairness between theory solvers and quantifiers. The most interesting part of the talk for me was on the use of abstraction/relaxing and then refinement when dealing with complex problems. For example, abstracting problems problems using uninterpreted functions and then checking satisfiability may reduce the complexity of the original problem. If it turns out that is UNSAT then the original is UNSAT and if you get a SAT result you can then refine the abstraction if necessary and check again. This idea of abstraction/refinement (CEGAR I guess) loops came up a lot in many different talks.

Also interesting was the mention of their verifying compiler projects that do function level verification and use contracts for called functions in the analysis rather than analysing down into them. I know the idea of contracts is used in HAVOC and discussed extensively in Thomas Ball’s publications but I’m not sure if this was the project they were referring too.

Scalable Testing/Reverse Engineering/Performance Profiling with Parallel and Selective Symbolic Execution (George Candea & Stefan Bucur)

The next talk was on the guys behind S2E and Cloud9. Cloud9 is cool in that it’s a big cluster of nodes each exploring different parts of a tree in symbolic execution. They found run times for gaining a particular code coverage percentile to drop dramatically when going from 1 to 8 nodes and then drop even further as they went up to 48 nodes. The total effect being a drop from 6 hours to minutes for the particular example.

S2E caught my attention a few weeks ago after reading their paper as it is designed to be a platform for writing analysis tools that leverage symbolic execution. To my knowledge it is the first system of this kind that allows a callback/event based mechanism for analysis tools and can target an entire operating system stack (it’s built on QEMU). They have some good documentation as well which is crucial for getting users involved. When I grabbed the code a few weeks back I did notice some dramatic slowdown in execution times even when not doing symbolic execution so that’s an issue that will have to be addressed but this looks like it could be a great project. With the combination of docs and well thought out design I’m hoping for the PIN of symbolic execution tools.

In the later part of their talk they gave some feedback to the SMT developer community with suggestions for improvements. For example, 30% of the time spent within their solver (STP) was spent in memory allocation routines. It’s something I haven’t seen a whole lot written on but the type of work that SMT engines is probably specific enough to require carefully crafted memory allocation algorithms. It’ll be interesting to see what comes of this in the future.

CVC3 and Applications (Clark Barrett)

Clark Barrett has been involved in SMT solver development for probably as long as anyone else and as CVC3 is the solver used internally in Immunity Debugger this talk was of particular interest. Clark mentioned that CVC4 is in development and should be seeing a release sometime this year so that’s good news. We’ve had some issues with CVC3 dealing with large array constraints and as this is being redone it should hopefully fare a bit better.

Unrelated to CVC3 really but one of the comments at the end was kind of striking in that the person said they often found using the theory of linear integer arithmetic with constraints to represent the bounded nature of 32-bit values faster than the theory of bitvectors. I guess that has something to do with their application area and the kinds of constraints if they’re not heavy on bit-level operations but it was something I’ve never thought to do before.

CEGAR+SMT: Formal Verification of Control Logic in the Reveal System (Karem Sakallah)

Karem Sakallah was one of the most entertaining speakers of the day and also presented some interesting ideas behind a verification system based on model checking and Counter Example Guided Abstraction Refinement (CEGAR) that is currently being used to verify real hardware. This was the second talk of the day in which abstraction and refinement using uninterpreted functions were discussed to make difficult problems more tractable (the first being the one by the MSR guys). In this talk Karem also mentioned that naive refinement was not sufficient. So, typically what happens is that when a SAT result turns out to be a false positive a constraint is generated that will block that result from being given again and this is added to the global state. To alleviate this some post-processing is done on the generated formula. They weaken the conditions so that it entails an entire family of states. For example, if the condition was (a == 5 AND b == 6) they weaken it to (a < b). I have no idea how they prevent this weakening from excluding valid states so I’ll need to follow up on that tomorrow =D

A final point made was that throughout their development process they built a number of optimisations but discovering the best combination of these optimisations was a trial/error process. The graph shown for this varied from combinations of optimisations that had no effect at all to some that cut the execution time to minuscule fractions of the base case.

SAT/SMT Summer School 2011 Summary (Day 1)

This week I’m attending the first SAT/SMT Summer School, organised by Vijay Ganesh and hosted at MIT. There are plenty of interesting talks, organised into three categories, so I figured it might be useful to do a brief summary of each day with some links to relevant material. I’ll update this post as the week progresses.

Introduction to Satisfiability Solving with Practical Applications (Niklas Een)

The first day of talks was a preliminary day, providing introductions to the foundations of the SAT and SMT problems and a quick history of how the research areas have progressed. Niklas Een, one of the MiniSAT developers, opened the technical part of the conference discussing the history of automated approaches to SAT. He then moved on to an overview of the algorithms that form the core of most SAT solvers. In Niklas’ opinion the most important algorithms in terms of their impact on the ability of SAT solvers have been those for conflict clause analysis and variable activity tracking. Perhaps an obvious point, but one worth making was that often industrial SAT problems are trivial once you know what part of the graph to explore and hence why algorithms for locating this are quite important. One idea that was reiterated multiple times was that the development of SAT solvers is largely an experimental science and that even though we can give intuitions as to why certain algorithms are useful, in many cases nobody has a clue what the true reason is, not even the algorithms inventors.

SMT Theory and DPLL(T) (Albert Oliveras)

The second talk was by Albert Oliveras and provided an introduction to SMT solving. After a quick discussion on the motivation for SMT solvers Albert gave an overview of both the lazy and eager approaches to SMT solving. The eager approach, embodied in solvers like STP, involves directly encoding the expressions over different theories into a SAT problem and then using a SAT solver on that problem. The lazy approach, found in Z3 among others, has two distinct systems – a SAT solver and one or more theory specific solvers, and proceeds by finding SAT solutions to a propositional skeleton and using theory specific solvers to check for consistency of the returned model in their domains. Albert also provided a good high level summary of how these theory specific solvers share information. The slides from this talk can be found here and are definitely worth a look for an introduction to the topic.

SAT Solvers for Formal Verification (Ed Clarke)

After lunch Ed Clarke continued the introductions with a basic summary of bounded model checking (BMC) and linear temporal logic (LTL). BMC has progressed over the years from initially using basic data structures to explicitly represent states to the use of binary decision diagrams and on to SAT encodings. Its an interesting topic and there are many parallels and cross overs between modern versions of these algorithms (and CEGAR) and symbolic execution based approaches to verification. The second part of Ed’s talk was on current research his students are doing into the use of bounded model on systems that have properties modelled by differential equations. I apparently didn’t pay enough attention in calculus classes because most of this went over my head =) The work is being led by Sicun Gao so if you’re interested his research page is probably helpful.

SMT-LIB Initiative (Cesare Tinelli)

Following this, Cesare Tinelli talked about the SMT-LIB and SMT-COMP initiatives. It was interesting to hear the story behind how it started. One thing of note that Cesare said is that back when this was started it was incredibly hard to tell what algorithms and extensions to SAT solvers were truly useful. This was because of unstandardised reporting and also because the tendency of developers to report on the benchmarks that their tools performed particularly well on. This is relevant I think because we are at a similar stage with symbolic execution and security focused tools. It’s hard to tell what really works as sometimes the tools aren’t released and when they are it can be quite difficult to recreate the authors results. It might be useful for some sort of standardised benchmark/testing suite with a focus on modern problems, especially as people move into other areas like automatic exploit generation.

An interesting discussion broke out near the end on the usefulness of the SMT-LIB standard as an input mechanism for tools given that it is not designed for efficient storage and so writing large problems to disk to read into a solver isn’t feasible for many cases in symbolic execution. The solution here seems to be to simply embed a solver and use its C/OCaml/etc API but that does somewhat nullify the usefulness of the SMT-LIB language as anything more than a standard and teaching tool. It might be interesting to see a version of the language developed specifically with the goal of efficient storage and parsing though.

Constraint Solving Challenges in Dynamic Symbolic Execution (Cristian Cadar)

The final talk of the day was by Christian Cadar on the technologies behind EXE and KLEE, with a focus on the problems of solving the types of constraints generated. Using the example of constraints over arrays, Christian made the point that it can be useful to modify a solvers algorithms for a particular theory for domain specific cases. The graph presented showed a reduction from something near exponential slowdown to linear slowdown by removing the array axioms typically added for every array based operation (and one other thing that escapes me right now!) and thus checking an under-approximation of the original formula.

These tools are essentially mixed symbolic/concrete execution tools with a focus on bug finding, equivalence checking and so forth. KLEE is open source, which is a definite plus and worth checking out. I’d love to hear some feedback from people on its performance as I’ve ran it a few times and haven’t really gotten results that are as useful as fuzzing or manual auditing for bug finding. I think for tools such as this a large set of standardised benchmarks in modern applications could be very useful for gauging progress and focusing research efforts. Apparently Microsoft Research have found this approach very useful in their SAGE framework but it’s hard to tell if it can be used generally when you don’t have a data-center and a dedicated research group.

And that was it! Fun day and tomorrow looks awesome =)

Infiltrate 2011 Slides

The slides for most of the talks from this years Infiltrate have gone online! Among them you can find the slide deck for Attacking the WebKit Heap, which Agustin and I gave.

The talks were awesome and I’d recommend grabbing them all. Halvar’s, titled State Spaces and Exploitation, was one of my favourites. I generally believe that the reason we in industry, as well as university based research groups, sometimes fail to build better tools is because we don’t spend enough time reflecting on the nature of exploitation and bug finding and as a result end up solving the wrong problems. Halvar spent about half of his talk addressing one way to think about exploits, which is programming a ‘weird machine’ that lives inside a program and is unlocked by a bug trigger. It’s an interesting way to look at things and, as he mentioned, similar to how many of us think of exploit development when it comes down to it.

As far as I know, we’ll only be releasing audio for one of the talks, Nico’s keynote on Strategic Surprise, which can be found here. Also worth checking out, educational, funny and just a little bit troll-y … what more can you ask =D

“We don’t care about nulls because this ain’t no strcpy shit” – Ryan Austin, by consensus the best quote of the conference.

Finding Optimal Solutions to Arithmetic Constraints

This post is a follow on to my previous ones on automatically determining variable ranges and on uses for solvers in code auditing sessions. In the first of those posts I showed how we can use the symbolic execution engine of ID to automatically model code and then add extra constraints to determine how it restricts the state space for certain variables. In the second I looked at one use case for manual modelling of code and proving properties about it as part of C++ auditing.

In this post I’m going to talk about a problem that lies between the previous two cases. That is, manually modelling code, but using Python classes provided by ID in a much more natural way than with the SMT-LIB language, and looking for optimal solutions to a problem rather than a single one or all possible solutions.

Consider the following code, produced by HexRays decompiler from an x86 binary. It was used frequently throughout the binary in question to limit the ranges allowed by particular variables. The first task is to verify that it does restrict the ranges of width and height as it is designed to. Its purpose is to ensure that v3 * height is less than 0x7300000 where v3 is derived from width.

 

int __usercall check_ovf(int width, int height,
    int res_struct)
{
  int v3; // ecx@1

  v3 = ((img_width + 31) >> 3) & 0xFFFFFFFC;
  *(_DWORD *)(res_struct + 12) = width;
  *(_DWORD *)(res_struct + 16) = height;
  *(_DWORD *)(res_struct + 20) = v3;
  if ( width <= 0 || height <= 0 ) // 1
  {
    *(_DWORD *)(res_struct + 24) = 0;
    *(_DWORD *)(res_struct + 28) = 0;
  }
  else 
  {
    if ( height * v3 <= 0 || 120586240 / v3 <= height ) // 2
      *(_DWORD *)(res_struct + 24) = 0;
    else
      *(_DWORD *)(res_struct + 24) = malloc_wrapper(res_struct,
                                       120586240 % v3,
                                       height * v3); // 3
    *(_DWORD *)(res_struct + 28) = 1;
  }
  return res_struct;

 

If the above code reaches the line marked as 3 a malloc call will occur with height * v3 as the size argument. Can this overflow? Given the checks at 1 and 2 it’s relatively clear that this cannot occur but for the purposes of later tasks we will model and verify the code.

One of the things that becomes clear when using the SMT-LIB language (even version 2 which is considerably nicer than version 1) is that using it directly is still quite cumbersome. This is why in recent versions of Immunity Debugger we have added wrappers around the CVC3 solver that allow one to build a model of code using Python expressions (credit for this goes to Pablo who did an awesome job). This was one of the things we covered during the recent Master Class at Infiltrate and people found it far easier than using the SMT-LIB language directly.

Essentially, we have Expression objects that represent variables or concrete values and the operators on these expressions (+, -, %, >> etc) are over-ridden so that they make assertions on the solvers state. For example, if x and y are Expression objects then x + y is also an Expression object representing the addition of x and y in the current solver context. Using the assertIt() function of any Expression object then asserts that condition to hold.

With this in mind, we can model the decompiled code in Python as follows:

 

import sys
import time

sys.path.append('C:\\Program Files\\Immunity Inc\\Immunity Debugger\\Libs\\x86smt')

from prettysolver import Expression
from smtlib2exporter import SmtLib2Exporter

def check_sat():
    img_width = Expression("img_width", signed=True)
    img_height = Expression("img_height", signed=True)
    tmp_var = Expression("tmp_var", signed=True)

    const = Expression("const_val")
    (const == 0x7300000).assertIt()
    
    (img_width > 0).assertIt()
    (img_height > 0).assertIt()
    
    tmp_var = ((img_width + 31) >> 3) & 0xfffffffc
    (img_height * tmp_var > 0).assertIt()
    (const / tmp_var > img_height).assertIt()

    expr = (((tmp_var * img_height) &
            0xffffffff000000000) != 0)  # 1
    expr.assertIt()

    s = SmtLib2Exporter()
    s.dump_to_file(expr, 'test.smt2') # 2
    
    # After this we can check with z3 /smt2 /m test.smt2
    # Alternatively we can use expr.isSAT which calls CVC3 but it
    # is a much slower solver

    start_time = time.time()
    if expr.isSAT():
        print 'SAT'
        print expr.getConcreteModel()
    else:
        print 'UNSAT'

    print 'Total run time: %d seconds' % (time.time() - start_time)

if __name__ == '__main__':
    check_sat()

 

The above code (which can be run from the command-line completely independently of Immunity Debugger) models the parts of the decompiled version that we care about. The added condition, marked as 1 checks for integer overflow by performing a 64-bit multiplication and then checking if the upper 32 bits are 0 or not. The first thing to note about this code is that it models the decompiled version quite naturally and is far easier to write and understand than the SMT-LIB alternative. This makes this kind of approach to analysing code much more tractable and means that once you are familiar with the API you can model quite large functions in very little time. For example, asserting that the condition if ( height * v3 <= 0 || 120586240 / v3 <= height ) must be false translates to the following, which is syntactically quite close to the C code:

 

tmp_var = ((img_width + 31) >> 3) & 0xfffffffc
(img_height * tmp_var > 0).assertIt()
(const / tmp_var > img_height).assertIt()

 

Checking if the function does in fact prevent integer overflow is then simple.

Using the solver to check if an overflow is possible on the argument to malloc

So, modulo modelling errors on our behalf, the check is safe and prevents an overflow on the size argument to malloc*. So what now? Well, in the case of this particular code-base an interesting behaviour appeared later in the code if the product of width and height is sufficiently large and the above function succeeded in allocating memory. That is, the height and width were small enough such that height * v3 was less than 0x7300000 but due to multiplication with other non-constants later in the code may then overflow. The question we then want to answer is, what is the maximum value of image * height that can be achieved that also passes the above check?

Solving Optimisation Problems with Universal Quantification**

This problem is essentially one of optimisation. There are many assignments to the input variables that will pass the overflow check but we are interested in those that maximise the resulting product image and height. Naturally this problem can be solved on paper with relative ease for small code fragments but with longer, more complex code this approach quickly becomes an more attractive.

The first thing to note is that at the line marked as 2 in the above Python code we used a useful new feature of ID, the SmtLib2Exporter***, to dump the model constructed in CVC3 out to a file in SMT-LIB v2 syntax. This is useful for two reasons, firstly we can use a solver other than CVC3, e.g. Z3 which is much faster for most problems, and secondly we can manually modify the formula to include things that our Python wrapper currently doesn’t have, such as universal quantification.

Universal quantification, normally denoted by the symbol ∀ and the dual to existential quantification, is used to apply a predicate to all members of a set. e.g. ∀x ∈ N.P(x) states that for all elements x of the natural numbers some predicate P holds. Assume that the conditions of the integer overflow check are embodied in a function called sat_inputs and M is the set of natural numbers module 2^32 then the formula that we want to check is (sat_inputs(x, y) => (∀ a, b ∈ M | sat_inputs(a, b), x * y >= a * b)), that is that we consider x and y to be solutions if x and y satisfy the conditions of sat_inputs implies that the product x * y is greater or equal to the product of any other two values a and b that also satisfy sat_inputs. This property is encoded in the function is_largest in the following SMT-LIB v2 code. The rest of the code is dumped by the previous Python script so checking this extra condition was less than 5 lines of work for us. The details of sat_inputs has been excluded for brevity. It simply encodes the semantics the integer overflow checking code.

 

(declare-funs ((img_width BitVec[32])(img_height BitVec[32])))

(define-fun sat_inputs ((img_width BitVec[32])(img_height BitVec[32])) Bool
    (and
         ; Model of the code goes here
    )
)

(define-fun is_largest ((i BitVec[32])(j BitVec[32])) Bool
    (forall ((a BitVec[32]) (b BitVec[32]))
        (implies (sat_inputs a b)
            (bvsge (bvmul i j) (bvmul a b))
        )
    )
)

(assert (and
    (sat_inputs img_width img_height)
    (is_largest img_width img_height)
    )
)

(check-sat)
(get-info model)
Finding the maximum product of height and width

Running this through Z3 takes 270 seconds (using universal quantification results in a significant increase in the problem size) and we are provided with an assignment to the height and width variables that not only pass the checks in the code but are guaranteed to provide a maximal product. The end result is that with the above two inputs height * width is 0x397fffe0, which is guaranteed to be the maximal product, and height * (((width + 31) >> 3) & 0xfffffffc) is 0x72ffffc, as you would expect, which is less than 0x7300000 and therefore satisfies the conditions imposed by the code. Maximising or minimising other variables or products is similarly trivial, although for such a small code snippet not particularly interesting (Even maximising the product of height and width can be done without a solver in your head pretty easily but instructive examples aren’t meant to be rocket science).

This capability becomes far more interesting on larger or more complex functions and code paths. In such cases the ability to use a solver as a vehicle for precisely exploring the state space of a program can mean the difference between spotting a subtle bug and missing out.

Conclusion
By its nature code auditing is about tracking state spaces. The task is to discover those states implied by the code but not considered by the developer. In the same way that one may look at a painting and discover meaning not intended by the artist, an exploit developer will look at a program and discover a shadow-program, not designed or purposefully created, but in existence nonetheless. In places this shadow-program is thick, it is easily discovered, has many entry points and can be easily leveraged to provide exploit primitives. In other places, this shadow-program clings to the intended program and is barely noticeable. An accidental decrement here, an off-by-one bound there. Easy to miss and perhaps adding few states to the true program. It is from these cases that some of the most entertaining exploits derive. From state spaces that are barren and lacking in easily leveragable primitives. Discovering such gateway states, those that move us from the intended program to its more enjoyable twin, is an exercise in precision. This is why it continues to surprise me that we have such little tool support for truly extending our capacity to deal with massive state spaces in a precise fashion.

Of course we have some very useful features for making a program easier to manually analyse, among them HexRays decompiler and IDA’s various features for annotating and shaping a disassembly, as well as plugin architectures for writing your own tools with Immunity Debugger, IDA and others. What we lack is real, machine driven, assistance in determining the state space of a program and, dually, providing reverse engineers with function and basic block level information on how a given chunk of code effects this state space.

While efforts still need to be made to develop and integrate automation technologies into our workflows I hope this post, and the others, have provided some motivation to build tools that not only let us analyse code but that help us deal with the large state spaces underneath.


* As a side note, Z3 solves these constraints in about half a second. We hope to make it our solving backend pretty soon for obvious reasons.
** True optimisation problems in the domain of satisfiability are different and usually fall under the heading of MaxSAT and OptSAT. The former deals with maximising the number of satisfied clauses while the latter assigns weights to clauses and looks for solutions that minimise or maximise the sum of these weights. We are instead dealing with optimisation within the variables of the problem domain. OptSAT might provide interesting solutions for automatic gadget chaining though and is a fun research area.
*** This will be in the next release which should be out soon. If you want it now just drop me a mail.

Thanks to Rolf Rolles for initially pointing out the usefulness of Z3’s support for universal/existential quantification for similar problems.