At WOOT’12 a paper co-written by Julien Vanegue, Rolf Rolles and I will be presented under the title “SMT Solvers for Sofware Security”. An up-to-date version can be found in the Articles/Presentation section of this site.

In short, the message of this paper is *“SMT solvers are well capable of handling decision problems from security properties. However, specific problem domains usually require domain specific modeling approaches. Important limitations, challenges, and research opportunities remain in developing appropriate models for the three areas we discuss – vulnerability discovery, exploit development, and bypassing of copy protection”*. The motivation for writing this paper is to discuss these limitations, why they exist, and hopefully encourage more work on the modeling and constraint generation sides of these problems.

A quick review of the publication lists from major academic conferences focused on software security will show a massive number of papers discussing solutions based on SMT technology. There is good reason for this 1) SMT-backed approaches such as symbolic/concolic execution have proved powerful tools on certain problems and 2) There are an increasing number of freely available frameworks.

The primary domain where SMT solvers have shone, in my opinion, is in the discovery of bugs related to unsafe integer arithmetic using symbolic/concolic execution. There’s a fairly obvious reason why this is the case; the quantifier free, fixed size, bitvector logic supported by SMT solvers provides direct support for the precise representation of arithmetic at the assembly level. In other words, one does not have to do an excessive amount of work when modeling the semantics of a program to produce a representation suitable for the detection of unsafe arithmetic. It suffices to perform a near direct translation from the executed instructions to the primitives provided by SMT solvers.

The exploit generation part of the paper deals with what happens when one takes the technology for solving the above problem and applies it to a new problem domain. In particular, a new domain in which the model produced simply by tracking transformations and constraints on input data no longer contains enough data to inform a solution. For example, in the case of exploit generation, models that do not account for things like the relationship between user input and memory layout. Obviously enough, when reasoning about a formula produced from such a model a solver cannot account for information not present. Thus, no amount of computational capacity or solver improvement can produce an effective solution.

SMT solvers are powerful tools and symbolic/concolic execution can be an effective technique. However, one thing I’ve learned over the past few years is that they don’t remove the obligation and effort required to accurately model the problem you’re trying to solve. You can throw generic symbolic execution frameworks at a problem but if you’re interested in anything more complex than low level arithmetic relationships you’ve got work to do!

Hi Sean, recently a read your paper, I’m particularly interested in vulnerability checking (second section) using SMT. I made some notes that I would like to share with you (and get some insight, if you have time). (i) How those variables inside the loop were selected? (ii) We can’t create a finite automaton using the code as it is (with the vulnerability). Then, we cannot build a logic invariant. If we don’t have an invariant we cannot use SMT solvers to determine if the code is vulnerabile or not.

I think the application showed inside the section 2 just proof that the invariant can represent the code. But fails to verify if the code is vulnerable.

Hey Marco,

So section 2 was written by Julien so I think he would be the best person to provide answers to your questions. I’ll pass him on your contact details.

Hello Marco,

I hope this message finds you well.

Section 2 has two invariants. One accurately describes the vulnerability (and can be proved), the other not. The invariant that doesnt work show how difficult it is to make the invariant more abstract, since just a little introduced abstraction already make the invariant unprovable (as explained in the paper, due to the introduction of spurious “symbolic” states that are infeasible in the loop, in which case the proof by induction fails on some newly introduced transitions on the automaton).

The reason to want to make the invariant more abstract is to simplify it (for example, use less variables) and attempt at generating the invariant automatically by using a pre-defined grammar on a small set of variables instead of needing an expert (the developer, usually) to manually write the invariant for the checker. You can read more on this precise example in another presentation: “Modern static security checking of C/C+ programs” that gives more context.

Let me know if you need any further explanation.

@jvanegue