Last week I spoke at REcon on the above topic. The slides can be found here. The abstract for this talk is as follows:
As reverse engineers and exploit writers we spend much of our time trying to illuminate the relationships between input data, executed paths and the values we see in memory/registers at a later point. This work can often be tedious, especially in the presence of extensive arithmetic/logical modification of input data and complex conditions.
Using recent (and not so recent) advances in run-time instrumentation we can go a long way towards automating the process of tracking input data and its effects on execution. In this talk we will discuss possible approaches to solving this problem through taint analysis. The solutions we will discuss are useful in many scenarios e.g. determining the set of conditional jumps under our control, discovering buffers in memory that are useful for injecting shellcode, tracking parameters to potentially insecure function calls, discovering ‘bad bytes’ for exploits and so on.
Building on this we will delve into the construction of logical formulae expressing the relationships between input and data in memory and ways in which these formulae can be manipulated and solved for interesting results. Depending on how we manipulate the initial formulae we can use theorem provers to automatically solve many problems e.g. ‘unraveling’ arithmetic/logical modifications on input, generating inputs that trigger specific paths, discovering the bounds on given variables and so forth.
I will update this post whenever the REcon videos etc go online. In the meantime I should probably finish at least one of the 5 or so blog posts in a semi-complete state 😛