A friend of mine is working on a project that involves building a metamorphic engine for x86 assembly. One of the core parts of such a project is a context free grammar describing valid mutations for a given instruction. To use one of his examples, an instruction to put `v` in `r` can be modelled in a BNF style grammar as follows:

put_v_in_r(v1,r1) := pushl_v(v1) popl_r(r1) | movl_v_r(v1,r1)

This could be quite useful in the process of automatic exploit generation. I expect the user to provide shellcode expressing what they want the exploit to do, but what if that shellcode is constrained by conditionals in the program? e.g. the shellcode is `\x41\x42\x43\x44` but there is a conditional in the program that causes it to abort if the byte `\x41` is detected in the input stream.

One solution in this situation, is to simply ask the user for different shellcode, but in the presence of complex modifications and conditions it quickly becomes impossible to provide any kind of meaningful feedback on what they should do to pass the filters. How about an automated solution? We have the conditions in propositional form and, due to the other project mentioned, we also have access to a grammar of potential shellcode mutations. If we can combine the two into a new propositional formula, describing the effect of mutating the shellcode using this grammar and the conditions from the program, we might be able to use the power of SMT/SAT solvers to brute force a solution! If a solution is found then we can preserve the effects of the original shellcode, as well as passing the filters.

The main difficulty in implementing this solution comes from encoding the grammar in propositional form. The typical solution to encoding a transition relation as a propositional formula involves iteratively applying the transition relation to existing states until a fixed point is reached. That is, starting with our original shellcode we would expand the first non-terminal to one or more potential shellcodes. We then expand these in the same fashion, and so on until no further new states are added; we then have the set of all potential shellcodes given the starting shellcode and the provided grammar. Obviously, there can be problems storing the states if the number grows exponentially on each expansion, but this is a common verification problem and there are a number of approaches to resolve it, varying from “hope it doesn’t happen” to using a compact storage mechanism like binary decision diagrams. We can then extract the states that contain no non-terminals, i.e. they are a valid x86 instruction stream, conjoin this with the path condition and ask our SAT/SMT solver to enumerate the shellcodes that satisfy these conditions.

There is a bit of a snag with this fixed point approach though. What if our grammar contains an expansion ` A ->* aB `, which states that A expands to, some terminal, `a` and, some non-terminal, B in one or more steps, and also the rule `B ->* A`? In this case we are no longer guaranteed to reach a fixed point, as we can enter an infinite loop. The obvious solution is to transform our grammar into one that doesn’t contain such rules. We can do this by setting an upper bound, `k`, on the number of such recursive iterations allowed and including these expansions as new non-terminals. This new grammar has lost some of the potential outcomes of the original but it can now be processed using our fixed point approach. It should be noted that this upper bound on the number of self-recursions isn’t quite as artificial or as limiting as it may seem. There will always be some upper bound on the size of the buffer used for shellcode and thus it may be practical, in some cases, to use quite a low value of `k`.

*(In fact, for a real world implementation it is well worth considering how to involve the length of the buffer we intend to use as we perform the fixed point computation. There is no reliable way to favour shorter shellcodes that also satisfy the program constraints, but one optimisation would be to exclude any shellcodes that have already exceeded the length limit from future expansions.)*