Of the DBI frameworks I’ve used, Pin has the best support for instrumentation beyond the basic block/instruction level. The designers seem to have recognised that not all instrumentation needs to be done at that granularity, so you are given ways to instrument events at the routine level, on groups of basic blocks (traces), on thread creation/deletion and on image loading/unloading. More importantly, you are given convenience functions to access important data when these events occur. I required this level of instrumentation earlier today when I ran into the following problem:
I have a Pin tool that performs data flow analysis at run time by marking data from certain function calls (e.g. read and recv) as tainted and then tracking this data through the program. We can do this relatively easily in Pin by registering a hook on all system calls as follows:
    PIN_AddSyscallEntryFunction(syscallEntry, 0);
    PIN_AddSyscallExitFunction(syscallExit, 0);
These functions can then access syscall arguments and return values. I noticed an issue earlier, though: the number of tainted bytes was far higher than it should have been. The reason is that the syscall hooks also catch syscalls that take place while the executable is being loaded. I considered using a counter to skip over a fixed number of calls, but then Silvio suggested a much cleaner alternative: ignore the hooked syscalls until the entry point has been executed. This turns out to be incredibly easy using Pin’s image level instrumentation.
We first register a function to be called when an image is loaded:
IMG_AddInstrumentFunction(image, 0);
Then using some functions provided by Pin we can extract the entry point of the main executable, and store it in a global variable entryPoint, as follows:
    VOID image(IMG img, VOID *v)
    {
        if (IMG_IsMainExecutable(img))
            entryPoint = IMG_Entry(img);
    }
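And finally, within our instruction level instrumentation function, we can simply check the address of each instruction against this entry point value and set a global flag, passedEntryPoint, when the entry point is reached. Pin calls this function when it first instruments an instruction, which happens just before that code runs for the first time, so the flag is set as execution arrives at the entry point: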
    VOID instruction(INS ins, VOID *v)
    {
        if (!passedEntryPoint && INS_Address(ins) == entryPoint)
            passedEntryPoint = true;
        ...
A check on this flag within the syscallExit function, before we mark any data as tainted, then allows us to avoid spurious tainting:
    VOID syscallExit(THREADID tid, CONTEXT *ctx, SYSCALL_STANDARD std, VOID *v)
    {
        int bufLen = 0;

        if (readData) {
            bufLen = PIN_GetSyscallReturn(ctx, std);
            if (passedEntryPoint && bufLen > 0) {
                tmgr.createTaintSourceM((unsigned)readBuf, bufLen, readCount++);
                totalBytesRead += bufLen;
            }
        }
    }
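The gating logic itself does not depend on Pin, so it can be tried out in isolation. What follows is only a rough sketch in plain C++, not the tool's real code: the class EntryGatedTainter, the TaintRange struct, the method names and the addresses in demo() are all hypothetical stand-ins for the callbacks and globals above, and whether a syscall "is a read" is passed in as a flag rather than decoded from a syscall number.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one taint source; not a Pin type.
struct TaintRange {
    std::uint64_t addr;  // start of the buffer the syscall filled
    std::uint64_t len;   // number of bytes the syscall returned
};

// Plain C++ model of the post's globals (entryPoint, passedEntryPoint,
// readData, readBuf, totalBytesRead) so the gate can be run without Pin.
class EntryGatedTainter {
public:
    explicit EntryGatedTainter(std::uint64_t entryPoint)
        : entryPoint_(entryPoint) {}

    // Counterpart of instruction(): latch the flag the first time the
    // entry point address is seen. The flag is never cleared again.
    void onInstruction(std::uint64_t addr) {
        if (!passedEntryPoint_ && addr == entryPoint_)
            passedEntryPoint_ = true;
    }

    // Counterpart of syscallEntry(): remember whether this call reads
    // input and where its buffer lives.
    void onSyscallEntry(bool isRead, std::uint64_t buf) {
        readData_ = isRead;
        readBuf_ = buf;
    }

    // Counterpart of syscallExit(): taint only successful reads that
    // complete after the entry point has been reached.
    void onSyscallExit(std::int64_t ret) {
        if (readData_ && passedEntryPoint_ && ret > 0) {
            const std::uint64_t len = static_cast<std::uint64_t>(ret);
            sources_.push_back({readBuf_, len});
            totalBytesRead_ += len;
        }
        readData_ = false;  // each exit consumes the pending entry
    }

    std::uint64_t totalBytesRead() const { return totalBytesRead_; }
    const std::vector<TaintRange>& sources() const { return sources_; }

private:
    std::uint64_t entryPoint_;
    bool passedEntryPoint_ = false;
    bool readData_ = false;
    std::uint64_t readBuf_ = 0;
    std::uint64_t totalBytesRead_ = 0;
    std::vector<TaintRange> sources_;
};

// Usage example with made-up addresses: a read issued during loading is
// ignored, a read issued after the entry point is tainted.
inline void demo() {
    EntryGatedTainter t(0x401000);

    t.onSyscallEntry(true, 0x7000);  // loader-time read
    t.onSyscallExit(832);
    assert(t.totalBytesRead() == 0u);

    t.onInstruction(0x401000);       // entry point reached

    t.onSyscallEntry(true, 0x8000);  // the program's own read
    t.onSyscallExit(64);
    assert(t.totalBytesRead() == 64u);
}
```

Because the flag only ever goes from false to true, the check in onSyscallExit costs one comparison per syscall, and there is no need to guess how many loader-time calls to skip.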
So I was playing around with this and I was wondering how you know when the program exits, so I can get a more exact output of addresses?
Look at the docs for PIN_AddFiniFunction and PIN_AddThreadFiniFunction http://www.pintool.org/docs/45467/Pin/html/
cool thanks