Granular instrumentation with Pin

Of the DBI frameworks I’ve used, Pin has the best support for instrumentation beyond the basic block/instruction level. The designers seem to have recognised that not all instrumentation needs to happen at those levels, so you can also instrument events at the routine level, over groups of basic blocks (traces), on thread creation/deletion and on image loading/unloading. More importantly, you are given convenience functions to access the relevant data when these events occur. I needed this level of instrumentation earlier today when I ran into the following problem:

I have a Pin tool that performs data flow analysis at run time by marking data from certain function calls (e.g. read and recv) as tainted and then tracking this data through the program. We can do this relatively easily in Pin by registering a hook on all system calls as follows:

PIN_AddSyscallEntryFunction(syscallEntry, 0);
PIN_AddSyscallExitFunction(syscallExit, 0);
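
For illustration, the entry hook can identify read calls and record where the data will land. The following is a minimal sketch (Linux only, and simplified relative to the actual tool’s bookkeeping); readData and readBuf are the globals consulted by the exit hook shown further down:

#include <sys/syscall.h>

static bool readData = false;   // is the current syscall a read?
static ADDRINT readBuf = 0;     // destination buffer of that read

VOID
syscallEntry(THREADID tid, CONTEXT *ctx, SYSCALL_STANDARD std, VOID *v)
{
    // Record whether this syscall brings in external data and, if so,
    // where it will place it, so that syscallExit can taint that region
    ADDRINT num = PIN_GetSyscallNumber(ctx, std);
    if (num == SYS_read) {              // recv etc. would be handled similarly
        readData = true;
        readBuf = PIN_GetSyscallArgument(ctx, std, 1);  // arg 1 is the buffer
    } else {
        readData = false;
    }
}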

The exit hook can similarly retrieve the syscall’s return value, which for a read is the number of bytes placed in the buffer. I noticed an issue earlier, though: the number of tainted bytes was far higher than it should have been. The reason is that these syscall hooks also catch syscalls that take place while the executable is being loaded. I considered using a counter to skip a fixed number of calls, but Silvio suggested a much cleaner alternative: disable syscall hooking until the program’s entry point has been executed. This turns out to be incredibly easy using Pin’s image-level instrumentation.

We first register a function to be called when an image is loaded:

IMG_AddInstrumentFunction(image, 0);

Then, using Pin’s image inspection API, we can extract the entry point of the main executable and store it in a global variable, entryPoint, as follows:

VOID
image(IMG img, VOID *v)
{
    if (IMG_IsMainExecutable(img))
        entryPoint = IMG_Entry(img);
}

And finally, within our instruction-level instrumentation function, we can check each instruction’s address against this entry point and set a global flag, passedEntryPoint, once the entry point is reached:

VOID
instruction(INS ins, VOID *v)
{
    if (!passedEntryPoint && INS_Address(ins) == entryPoint)
        passedEntryPoint = true;
...

A check on this flag in the syscallExit function, before we mark any data as tainted, then allows us to avoid the spurious tainting:

VOID
syscallExit(THREADID tid, CONTEXT *ctx, SYSCALL_STANDARD std, VOID *v)
{
    int bufLen = 0;
    if (readData) {
        // the entry hook flagged this syscall as a read; its return
        // value is the number of bytes placed in the buffer
        bufLen = PIN_GetSyscallReturn(ctx, std);
        if (passedEntryPoint && bufLen > 0) {
            // only taint data read after the entry point has been hit
            tmgr.createTaintSourceM((unsigned)readBuf, bufLen, readCount++);
            totalBytesRead += bufLen;
        }
    }
}

Blackhat USA paper

I submitted an abstract etc. for a Blackhat talk a few days ago. The title is “Automatic exploit generation for complex programs” and the following is the abstract:

The topic of this presentation is the automatic generation of control flow hijacking exploits. I will explain how we can generate functional exploits that execute shellcode when provided with a known ‘bad’ input, such as the crashing input from a fuzzing session, and sample shellcode. The theories presented are derived from software verification and I will explain their relevance to the problem at hand and the benefits of using them compared to approaches based on ad-hoc pattern matching in memory.

The novel aspect of this approach is the combination of techniques from data flow analysis and symbolic execution for the purpose of exploit generation. We track input data as it passes through a running program and taints other variables; in parallel, we also track all constraints and modifications imposed on such data. As a result, we can precisely locate all memory regions influenced by the tainted input. We can then apply a constraint solver to generate an exploit.

This technique is effective in environments where the input data is subjected to complex, low level manipulations that may be difficult and time consuming for a human to unravel. I will demonstrate that this approach can be used in the presence of ASLR, non-executable regions and other protections for which known work-arounds exist.

During the presentation I will show functioning exploits generated by this technique and describe their creation in detail. I will also discuss a number of auxiliary benefits of the tool and possible extensions. These include the ability to denote sections of a given input used in determining the path taken, in memory allocation routines and in length constraints. Possible uses of this information are in generating more reliable versions of known exploits and in guiding a fuzzer.

 
So, in a nutshell I’m using dynamic data flow analysis in combination with path constraint gathering and SAT/SMT solving to generate an input for a program that will result in shellcode execution…. assuming it works 😉 I should know by June 1st if it was accepted or not.
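
For anyone wondering what the solving step looks like in practice: once the path constraints over the tainted input are gathered, they can be handed to an off-the-shelf SMT solver. The following toy example (not the tool itself, using an invented constraint and Z3’s C++ API, built with -lz3) supposes taint analysis has told us a single input byte only reaches the vulnerable code after being XORed with 0x5A and compared against 0x41; the solver hands back the byte our exploit input needs at that position.

#include <iostream>
#include <z3++.h>

int main()
{
    z3::context c;
    z3::expr input = c.bv_const("input_byte", 8);   // one symbolic input byte

    // Invented path constraint: the vulnerable code is only reached
    // when the input byte XORed with 0x5A equals 0x41
    z3::solver s(c);
    s.add((input ^ c.bv_val(0x5A, 8)) == c.bv_val(0x41, 8));

    if (s.check() == z3::sat)
        std::cout << "byte to place in the input: "
                  << s.get_model().eval(input) << std::endl;

    return 0;
}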

Update: The talk was rejected. Success!… or not.

ISSA Ireland seminar

I got back from the latest ISSA Ireland seminar today. The event was held in Dublin and consisted of a number of talks and a panel discussion. There was an excellent crowd and some really interesting people, with conversation ranging from program analysis to governmental cyber-security policy.

I gave a presentation titled ‘VoIP Security: Implementation and Protocol Problems’, which was a relatively high level talk about finding bugs in VoIP applications and deployments. It consisted of an overview of finding vulnerabilities in VoIP stack implementations and auxiliary services, and introduced some of the common tools and methods for discovering, enumerating and attacking VoIP deployments.

Hart Rossman, of SAIC, gave an excellent talk which touched on a number of different issues around developing and implementing cyber-defence policies. Aidan Lynch, of Ernst and Young, discussed security issues in deploying VoIP in a corporate environment. The panel discussion focused on securing national infrastructure (or so I’m told, because I managed to miss it). And finally there were a number of lightning talks; of particular interest was one on the application security process at Dell, which introduced me to the concept of threat modelling and Microsoft’s TAM tool. (There is an MSDN blog here which contains a lot of good information on the topic in general.)

It was an educational day all round, and I’d like to thank the organisers for inviting me to present and for being such excellent hosts.