Independence Results for the P vs. NP Question (Shai Ben David)
This was a really fun talk that was very much on the theoretical side of logic and satisfiability but with potentially very important implications. Ever since learning about early 20th century work by Hilbert, Godel, Turing etc. on foundations of logical systems and proofs I’ve been fascinated by anything that discusses the universal limitations and capabilities of logical systems. This was the first talk I’ve seen where this kind of purely theoretical work was linked to an implication for solving technologies. The fundamental question approached in the talk was whether P != NP is an irresolvable question give the logics we have available. That is, can we prove that it is unprovable.
I would do it injustice to try and summarise the talk (and I’d get it wrong!) but the main result was that if it were true that P is nearly equal to NP then we would not be able to prove P != NP using current lines of reasoning and tools. The interesting result for SAT solvers is that if this were the case then many of the problems we want to solve may be solvable in almost-polynomial time. The downside is that even if we could prove this the proof probably wouldn’t help at all in building a solver than can exploit this closeness.
HAMPI: A Solver for String Theories (Vijay Ganesh)
Vijay’s talk was on the HAMPI solver. HAMPI contains a theory for bounded (that’s important) character strings that allows it to reason about things like whether a regular expression matches against a particular string or not. From what I gathered, it operates by converting a regular expression into a context-free-grammar and then converting that context-free-grammar, along with any constraints we may wish to check, into a formula over bitvectors and checking the satisfiability of this with STP. The main target application was detecting oversights in regexs designed to catch SQL injection attempts but Vijay also mentioned they got a 2-5x speed-up when using this solver with KLEE on applications that do a lot of string manipulation. KLEE tends to perform quite poorly on things like XML parsers so I’d love to see if specialised solvers can help out here.
Modern SMT Solver Implementation (Leonardo De Moura& Nikolaj Bjorner)
This was a good talk by some of the best guys building SMT solvers, Leonardo De Moura and Nikolaj Bjorner. Both of their publication pages are worth checking out for details on building SMT solvers as well as the theoretical aspects.
They first highlighted some of the core problems in SMT solvers that affect performance: combining engines, unfairness between theory solvers and quantifiers. The most interesting part of the talk for me was on the use of abstraction/relaxing and then refinement when dealing with complex problems. For example, abstracting problems problems using uninterpreted functions and then checking satisfiability may reduce the complexity of the original problem. If it turns out that is UNSAT then the original is UNSAT and if you get a SAT result you can then refine the abstraction if necessary and check again. This idea of abstraction/refinement (CEGAR I guess) loops came up a lot in many different talks.
Also interesting was the mention of their verifying compiler projects that do function level verification and use contracts for called functions in the analysis rather than analysing down into them. I know the idea of contracts is used in HAVOC and discussed extensively in Thomas Ball’s publications but I’m not sure if this was the project they were referring too.
Scalable Testing/Reverse Engineering/Performance Profiling with Parallel and Selective Symbolic Execution (George Candea & Stefan Bucur)
The next talk was on the guys behind S2E and Cloud9. Cloud9 is cool in that it’s a big cluster of nodes each exploring different parts of a tree in symbolic execution. They found run times for gaining a particular code coverage percentile to drop dramatically when going from 1 to 8 nodes and then drop even further as they went up to 48 nodes. The total effect being a drop from 6 hours to minutes for the particular example.
S2E caught my attention a few weeks ago after reading their paper as it is designed to be a platform for writing analysis tools that leverage symbolic execution. To my knowledge it is the first system of this kind that allows a callback/event based mechanism for analysis tools and can target an entire operating system stack (it’s built on QEMU). They have some good documentation as well which is crucial for getting users involved. When I grabbed the code a few weeks back I did notice some dramatic slowdown in execution times even when not doing symbolic execution so that’s an issue that will have to be addressed but this looks like it could be a great project. With the combination of docs and well thought out design I’m hoping for the PIN of symbolic execution tools.
In the later part of their talk they gave some feedback to the SMT developer community with suggestions for improvements. For example, 30% of the time spent within their solver (STP) was spent in memory allocation routines. It’s something I haven’t seen a whole lot written on but the type of work that SMT engines is probably specific enough to require carefully crafted memory allocation algorithms. It’ll be interesting to see what comes of this in the future.
CVC3 and Applications (Clark Barrett)
Clark Barrett has been involved in SMT solver development for probably as long as anyone else and as CVC3 is the solver used internally in Immunity Debugger this talk was of particular interest. Clark mentioned that CVC4 is in development and should be seeing a release sometime this year so that’s good news. We’ve had some issues with CVC3 dealing with large array constraints and as this is being redone it should hopefully fare a bit better.
Unrelated to CVC3 really but one of the comments at the end was kind of striking in that the person said they often found using the theory of linear integer arithmetic with constraints to represent the bounded nature of 32-bit values faster than the theory of bitvectors. I guess that has something to do with their application area and the kinds of constraints if they’re not heavy on bit-level operations but it was something I’ve never thought to do before.
CEGAR+SMT: Formal Verification of Control Logic in the Reveal System (Karem Sakallah)
Karem Sakallah was one of the most entertaining speakers of the day and also presented some interesting ideas behind a verification system based on model checking and Counter Example Guided Abstraction Refinement (CEGAR) that is currently being used to verify real hardware. This was the second talk of the day in which abstraction and refinement using uninterpreted functions were discussed to make difficult problems more tractable (the first being the one by the MSR guys). In this talk Karem also mentioned that naive refinement was not sufficient. So, typically what happens is that when a SAT result turns out to be a false positive a constraint is generated that will block that result from being given again and this is added to the global state. To alleviate this some post-processing is done on the generated formula. They weaken the conditions so that it entails an entire family of states. For example, if the condition was (a == 5 AND b == 6) they weaken it to (a < b). I have no idea how they prevent this weakening from excluding valid states so I’ll need to follow up on that tomorrow =D
A final point made was that throughout their development process they built a number of optimisations but discovering the best combination of these optimisations was a trial/error process. The graph shown for this varied from combinations of optimisations that had no effect at all to some that cut the execution time to minuscule fractions of the base case.