Pin problem solved!

After some discussion on the Pin mailing list, I came to the conclusion that the best approach to my previous issue with Pin is simply to check the integrity of all values that get put in the EIP register from a potentially tainted source. This is arguably what I should have been doing anyway to detect the potential for an exploit. It means instrumenting ret, call, loop, jmp and its conditional variants, and checking the integrity of the value being used. In this context, that means checking that the value isn’t tainted by user input, which is a fairly trivial task. For example, this code is called from my instruction instrumentation routine, registered with Pin:

VOID
processRET(INS ins)
{
    INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(checkRETIntegrity),
            IARG_REG_VALUE, LEVEL_BASE::REG_ESP,
            IARG_CONTEXT,
            IARG_INST_PTR,
            IARG_END);
}

This results in the following function being called before any ret instruction is executed, with the value currently stored in the ESP register passed as an argument:

(tmgr is a custom object that maintains taint propagation information)

VOID
checkRETIntegrity(ADDRINT espVal, CONTEXT *ctx, ADDRINT pc)
{
    if (tmgr.isMemLocTainted(espVal, DWORD_SIZE)) {
        postRunAnalysis(true, ctx, TAINTED_RET);
        exit(0);
    }
}

Using this same technique I can also check for corruption of the stored frame pointer. I should also extend these checks to reads from and writes to memory. The problem gets a little more complex there, as it is quite common for user input to taint the offset of a read/write. The only practical approach that springs to mind is to check that the location being read or written is mapped, as attempting to detect whether the tainted index is within a certain bound involves solving a whole other array of associated problems.
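As a rough illustration of the mapped-location check, the following sketch (plain C++, nothing Pin-specific; isMapped is a hypothetical helper, not part of my taint manager) uses the fact that on Linux msync fails with ENOMEM when the target page is not mapped:

```cpp
// Sketch: cheap "is this address mapped?" check via msync(), which fails
// with ENOMEM on an unmapped page. isMapped() is a hypothetical helper.
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

bool isMapped(const void *addr)
{
    uintptr_t pageSize = (uintptr_t)sysconf(_SC_PAGESIZE);
    // msync requires a page-aligned address, so round down to the page base
    void *page = (void *)((uintptr_t)addr & ~(pageSize - 1));
    return msync(page, pageSize, MS_ASYNC) == 0;
}
```

The taint tool could then flag a read/write through a tainted index only when the target is unmapped, with all the caveats on bounds detection discussed above.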

The romance is over…

…with Pin, that is. Pin has been my DBI framework of choice for all of 3 weeks now and I hadn’t had a single problem, until today. It would appear that smashing the stored EIP causes all sorts of problems for the Pin analysis code. What appears to happen is that at the return from the function, Pin assumes the stored EIP is a valid instruction pointer and passes it to its analysis engine. If the smashed EIP points to the heap, for example, Pin will start to disassemble the data located there in an attempt to find the end of the basic block. It then keeps progressing through memory until it is eventually killed by the operating system (presumably for trying to disassemble an unreadable address).

The following code illustrates the issue:

#include <stdlib.h>

void smashySmashy(char *heapArr)
{
    asm("movl %0, 4(%%ebp)\n\t"
        :
        : "D"(heapArr)
        : "memory");  /* we write memory behind the compiler's back */
}

int main(int argc, char *argv[])
{
    char *heapArr = NULL; 
    heapArr = malloc(256*sizeof(char));
    
    smashySmashy(heapArr);

    return 0;
}

Running the above code with any of the Pin instrumentation tools will result in the Pin process eventually aborting.

This is quite the showstopper for me, in terms of my usage of Pin, as functions like this are exactly the kind I need to analyse for exploit generation. I’ll update this post when I get a response from the developers.

Not all shellcode locations are made equal

When automatically generating an exploit, one of the core tasks is to find a hospitable buffer in which to store our shellcode, so that at the vulnerability point we have somewhere to jump to. Some tools (like Byakugan) do this by scanning process memory and searching for sequences of bytes from the input (or slight variations thereof, such as the upper/lower case conversion of said input). There are a number of problems with this. First of all, you can easily miss perfectly usable buffers: if all user input is passed through a function that adds 4 to every byte then such a buffer will be missed, but it is still usable for shellcode. The second major problem is that if you do find a suitable buffer this way you have no guarantee that the values you’re changing weren’t used in a path condition on the way to the vulnerability point. Replacing this input with your shellcode could result in the vulnerability not being triggered.

I’m addressing the first problem by using dynamic data flow analysis to track all user input as it moves through registers and process memory, and the second problem by noting all path conditions that operate on such ‘tainted’ data. Already we can see a division between potential shellcode buffers. There are bytes that are used in path conditions, there are those that are not and in both cases there are those that are tainted by simple assignment instructions like mov, versus those that are tainted as a result of arithmetic operations like add.

In the case of a buffer consisting only of bytes that are not used in path conditions and are tainted via simple assignment, we can simply swap out those bytes for our shellcode, but what about the other cases? The approach I am taking is to record all tainting operations on a byte, as well as all path conditions, so that at a given program location we can express the potential values of a given byte via an algebraic formula. For example, assume that in the following eax is tainted by the 10th byte of user input:

mov ecx, eax
add ecx, 4
mov [1234], ecx

At the end of this block we can represent the constraints on location ‘1234’ by the formula t1 := in10 & t2 := t1 + 4 & mem1234 := t2. If we then want to discover whether there exists an assignment to the 10th input byte that would result in location ‘1234’ being equal to 0x41, we can include this additional constraint in the previous formula to give t1 := in10 & t2 := t1 + 4 & mem1234 := t2 & mem1234 == 0x41. A SAT/SMT solver will take this and return an assignment to the literal ‘in10’ that satisfies the formula, i.e. 0x3D.
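For a formula this small we don’t actually need a solver; a sketch that brute-forces the single input byte against the recorded operations (function names here are illustrative, not from any real tool) shows what finding a satisfying assignment amounts to:

```cpp
// The recorded taint operations on location '1234':
// t1 := in10; t2 := t1 + 4; mem1234 := t2
#include <cstdint>

uint8_t applyTaintOps(uint8_t in10)
{
    uint8_t t1 = in10;
    uint8_t t2 = t1 + 4;
    return t2;                        // value stored at '1234'
}

// Find an input byte satisfying mem1234 == target, or return -1 if none
int solveByte(uint8_t target)
{
    for (int v = 0; v < 256; ++v)
        if (applyTaintOps((uint8_t)v) == target)
            return v;
    return -1;
}
```

Here solveByte(0x41) returns 0x3D; a real formula with hundreds of thousands of constraints is exactly where the SAT/SMT solver earns its keep.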

Now obviously real world constraint formulae will be much more complex than this; they could have hundreds of thousands of literals and conditions. A modern SAT/SMT solver will be able to handle such an input, but it could take a number of hours depending on the complexity. Because of this, it would be an advantage to be able to select those buffers that have the least complex constraint formulae (preferably one no more complex than a sequence of assignments, in which case a solver will not be needed). The basic classification scheme I’m using is represented by the following lattice:

[Figure: constraint complexity lattice]

Formulae are associated with a single point in the lattice, with the more complex formulae towards the top and the less complex towards the bottom. We can classify based on the complexity of arithmetic in the formula, as well as whether it contains a path condition, like a jump conditioned on user input, or not.

When an object is created, representing the constraints on a given byte, it starts at ‘DIRECT COPY’ and its position in the lattice is updated as operations are performed on it. A byte can only move up in the lattice (i.e. there is only a join function, no meet), with the lattice position of a byte resulting from the interaction of two tainted bytes being the join of those two source bytes. The join function is defined as the least upper bound of its two inputs. We can then use this lattice to represent the constraint complexity of a single byte, or of a sequence of bytes by taking the join of all bytes in the sequence. Currently, I have classified a path-constrained direct copy and non-linear arithmetic as being of equivalent complexity but, with the arithmetic support in modern SMT solvers, it might turn out that a formula made up entirely of arithmetic operations, linear or non-linear, is easier to solve than one containing predicates that must be satisfied.
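One way to sketch such a lattice is as a product of two components: the arithmetic class of the tainting operations, and whether a path condition constrains the byte. The join is then the least upper bound of each component. The enum and struct names below are my own illustration, not taken from any tool:

```cpp
// Sketch of the complexity lattice as two orthogonal components.
// Joining two bytes takes the least upper bound of each component.
#include <algorithm>

enum ArithClass { DIRECT_COPY = 0, LINEAR_ARITH = 1, NONLINEAR_ARITH = 2 };

struct Complexity {
    ArithClass arith;        // most complex arithmetic seen so far
    bool pathConstrained;    // has a path condition touched this byte?
};

// Least upper bound: max arithmetic class, OR of the path-condition flags
Complexity join(const Complexity &a, const Complexity &b)
{
    Complexity r;
    r.arith = std::max(a.arith, b.arith);
    r.pathConstrained = a.pathConstrained || b.pathConstrained;
    return r;
}
```

A sequence of bytes is then classified by folding join over the sequence, and a buffer whose result is {DIRECT_COPY, false} needs no solver at all.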

The advantage of the above scheme is that we now have a way to classify potential shellcode buffers by the amount of effort it will be to use them as such. By doing this we can potentially avoid calling a SAT/SMT solver altogether, and in cases where we must we can select the easier formulae to solve.

Difficulties in taint data propagation without an IR

I’m currently building a tool that propagates taint information through a program as it executes. Essentially this consists of marking the data read by certain syscalls (e.g. read and recv) as ‘tainted’, or user controllable, followed by tracking the flow of this data through the program. The ‘propagation’ occurs when tainted data is used as a source in an instruction, thus tainting the result. For example, if the data at address 0x100 is tainted and the instruction mov ebx, [0x100] is executed then ebx is now tainted.

To track all possible taint propagation is a fairly involved task. The mov instruction, for instance, has 4 potential configurations (memory to memory, register to memory etc.), and many instructions (arithmetic ones for instance) do more than simply copy memory. Considering that most binaries in the /bin directory on Linux have between 70 and 100 unique instructions, and that many of these instructions have unique semantics in terms of how they spread data from source operands into destination operands, modelling these effects for each instruction is a significant undertaking.

It’s also a task I’m having to deal with at the moment due to my choice of dynamic binary instrumentation (DBI) framework. I am using Pin, a free but closed source DBI. One of Pin’s primary goals is speed, and thus it can’t afford something like converting each basic block to an intermediate representation (IR) as Valgrind does. This means we are left with convenience functions to access operators and operands, and are required to perform a case by case analysis on each instruction. Luckily, the functions provided by Pin are quite well thought out. Let us consider how to process an instruction like mov dword ptr ds:[esi+edx*4+0x2a0], ecx, where ecx is tainted. First of all, we must register a function with Pin to instrument each instruction. This is about as simple as you’d expect.

INS_AddInstrumentFunction(instruction, 0);

This function will then be passed each instruction and will have to decide the correct way to handle it.

VOID
instruction(INS ins, VOID *v)
{
    UINT32 cat = INS_Category(ins);

    switch (cat) {
        case XED_CATEGORY_DATAXFER:
            switch (INS_Opcode(ins)) {
                case XED_ICLASS_XCHG:
                    processXCHG(ins);
                    break;
                case XED_ICLASS_MOV:
                    processMov(ins);
                    break;
etc etc

As you can see, we can work at two levels of granularity. The instruction categories group together operators with similar semantics, e.g. there are separate categories for string operations like rep movsb, arithmetic operations, interrupts etc. The operators in certain categories all have the same taint semantics and we need not go any deeper, but other categories contain instructions that must be handled on a case by case basis. DATAXFER, for instance, contains xchg, mov and bswap, among others, all of which require special attention.

In our example we will end up in the processMov function. The purpose of this function is basically to figure out which of the 4 mov cases we are considering. Pin provides the INS_IsMemoryRead/Write functions, and others, that allow us to figure this out. Quite usefully, it also provides a mechanism to determine the exact address written/read and the number of bytes written/read at run time. This saves us having to calculate the effective address ourselves. For our example we would use these functions to determine that we are writing to memory, from a register, and then use the following call to insert a run time hook to our taint propagation function:

INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(simMov_RM),
    IARG_MEMORYWRITE_EA,
    IARG_MEMORYWRITE_SIZE,
    IARG_UINT32, INS_RegR(ins, INS_MaxNumRRegs(ins)-1),
    IARG_END);

You can see here the use of IARG_MEMORYWRITE_EA to get the effective address of the write and the use of INS_RegR to get the last read register (ecx in this case). At runtime this will result in the simMov_RM (simulate mov register to memory) function being called with the specified arguments. It is this function that will finally do the taint propagation.
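What the taint propagation inside a simMov_RM-style routine might look like, sketched against stand-in shadow-state containers rather than my actual taint manager (tmgr):

```cpp
// Sketch: shadow maps recording which memory bytes and registers are
// tainted. simMov_RM copies the taint status of the source register
// onto each written byte. The containers are stand-ins, not real Pin state.
#include <cstdint>
#include <set>

typedef uint32_t Addr;

std::set<Addr> taintedMem;   // tainted memory bytes
std::set<int>  taintedReg;   // tainted register ids

void simMov_RM(Addr writeEA, uint32_t writeSize, int srcReg)
{
    for (uint32_t i = 0; i < writeSize; ++i) {
        if (taintedReg.count(srcReg))
            taintedMem.insert(writeEA + i);   // taint propagates to memory
        else
            taintedMem.erase(writeEA + i);    // an untainted write clears taint
    }
}
```

Note that an untainted source must clear any taint previously recorded on the destination bytes, otherwise stale taint accumulates.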

As you can see, there are many cases that must be considered, even for a seemingly trivial operator like mov. Even the above explanation doesn’t deal with the case where a register used in calculating an effective address is itself tainted, which requires more code to process correctly. A potential solution, suggested by Silvio, might be to handle the common instructions case by case and fall back to a default case for instructions not explicitly handled. A solution like this, which over-approximates the tainted data, could lead to extensive false positives though, and is not really suitable for the kind of work I’m doing. This leaves one other obvious option for a ‘default case’: untaint all destination operands in instructions I do not handle. This will lead to an under-approximation of the amount of tainted data until I encode the semantics of all instructions, but will eliminate false positives and allow me to work on other parts of the algorithm once I have encoded the most common instructions.

Granular instrumentation with Pin

Of the DBI frameworks I’ve used, Pin has the best support for instrumentation beyond the basic block/instruction level. The designers seem to have recognised that not all instrumentation needs to be done at the basic block or instruction level, and thus you are provided with ways to instrument events at the routine level, as groups of basic blocks (traces), on thread creation/deletion and on image loading/unloading; more importantly, you are provided with convenience functions to access important data when these events occur. I required this level of instrumentation earlier today when I ran into the following problem:

I have a Pin tool that performs data flow analysis at run time by marking data from certain function calls (e.g. read and recv) as tainted and then tracking this data through the program. We can do this relatively easily in Pin by registering a hook on all system calls as follows:

PIN_AddSyscallEntryFunction(syscallEntry, 0);
PIN_AddSyscallExitFunction(syscallExit, 0);

These functions can then access syscall arguments and return values. I noticed an issue earlier though where the number of tainted bytes was far higher than it should have been. The reason for this is that those syscall hooks also catch syscalls that take place while the executable is being loaded. I considered using a counter to denote a certain number of calls to skip over but then Silvio suggested a much cleaner alternative; to disable syscall hooking until the entry point has been executed. This turns out to be incredibly easy using Pin’s image level instrumentation.

We first register a function to be called when an image is loaded:

IMG_AddInstrumentFunction(image, 0);

Then using some functions provided by Pin we can extract the entry point of the main executable, and store it in a global variable entryPoint, as follows:

VOID
image(IMG img, VOID *v)
{
    if (IMG_IsMainExecutable(img))
        entryPoint = IMG_Entry(img);
}

And finally within our instruction level instrumentation function we can simply check the address of the instruction against this entry point value and set a global flag, passedEntryPoint, when the entry point is executed:

VOID
instruction(INS ins, VOID *v)
{
    if (!passedEntryPoint && INS_Address(ins) == entryPoint)
        passedEntryPoint = true;
...

A check on this flag within the syscallExit function, before we mark any data as tainted, then allows us to avoid spurious tainting.

VOID 
syscallExit(THREADID tid, CONTEXT *ctx, SYSCALL_STANDARD std, VOID *v)
{
    int bufLen = 0;
    if (readData) {
        bufLen = PIN_GetSyscallReturn(ctx, std);
        if (passedEntryPoint && bufLen > 0) {
            tmgr.createTaintSourceM((unsigned)readBuf, bufLen, readCount++);
            totalBytesRead += bufLen;
        }
    }
}

Blackhat USA paper

I submitted an abstract etc. for a Blackhat talk a few days ago. The title is “Automatic exploit generation for complex programs” and the following is the abstract:

The topic of this presentation is the automatic generation of control flow hijacking exploits. I will explain how we can generate functional exploits that execute shellcode when provided with a known ’bad’ input, such as the crashing input from a fuzzing session, and sample shellcode. The theories presented are derived from software verification and I will explain their relevance to the problem at hand and the benefits of using them compared to approaches based on ad-hoc pattern matching in memory.

The novel aspect of this approach is the combination of techniques from data flow analysis and symbolic execution for the purpose of exploit generation. We track input data as it is passed through a running program and taints other variables; in parallel we also track all constraints and modifications imposed on such data. As a result, we can precisely locate all memory regions influenced by the tainted input. We can then apply a constraint solver to generate an exploit.

This technique is effective in environments where the input data is subjected to complex, low level manipulations that may be difficult and time consuming for a human to unravel. I will demonstrate that this approach can be used in the presence of ASLR, non-executable regions and other protections for which known work-arounds exist.

During the presentation I will show functioning exploits generated by this technique and describe their creation in detail. I will also discuss a number of auxiliary benefits of the tool and possible extensions. These include the ability to denote sections of a given input used in determining the path taken, in memory allocation routines and in length constraints. Possible uses of this information are in generating more reliable versions of known exploits and in guiding a fuzzer.

 
So, in a nutshell I’m using dynamic data flow analysis in combination with path constraint gathering and SAT/SMT solving to generate an input for a program that will result in shellcode execution…. assuming it works 😉 I should know by June 1st if it was accepted or not.

Update: The talk was rejected. Success!… or not.

ISSA Ireland seminar

I got back from the latest ISSA Ireland seminar today. The event was held in Dublin and consisted of a number of talks and a panel discussion. There was an excellent crowd and some really interesting people with conversation varying from program analysis to governmental cyber-security policy.

I gave a presentation titled ‘VoIP Security: Implementation and Protocol Problems’, which was a relatively high-level talk about finding bugs in VoIP applications and deployments. It consisted of an overview of finding vulnerabilities in VoIP stack implementations and auxiliary services, and introduced some of the common tools and methods for discovering, enumerating and attacking VoIP deployments.

Hart Rossman, of SAIC, gave an excellent talk which touched on a number of different issues around developing and implementing cyber-defence policies. Aidan Lynch, of Ernst and Young, discussed security issues in deploying VoIP in a corporate environment. The panel discussion focused on securing national infrastructure (or so I’m told, because I managed to miss it). And finally there were a number of lightning talks; of particular interest was one on the application security process in Dell, which introduced me to the concept of threat modelling and Microsoft’s TAM tool. (There is an MSDN blog here which contains a lot of good information on the topic in general.)

It was an educational day all round and I’d like to thank the organisers for inviting me to present and being such excellent hosts.