Some Early-Stage Work on Statistical Crash Triage

Last week at Infiltrate I presented some early-stage work on crash triage under the title “Automated Root Cause Identification for Crashing Executions”. The slides can be found here, and the presentation notes contain further details in draft form.

A better title for this work would probably have been “Statistical Crash Triage”, or possibly something involving the term “predicate synthesis”, as those ideas are at the core of what was presented. The main algorithm can be summarised fairly quickly as follows:

  1. Run the target application on all the good inputs and all the bad inputs. Record coverage information.
  2. Use the coverage information to predict what functions are likely relevant to the crash.
  3. Rewrite those functions to insert instrumentation which will record all computed values and other properties.
  4. Rerun the instrumented version of the target application on all the good and bad inputs.
  5. From the recorded state and a set of predicate templates, synthesise and evaluate predicates over the program’s variables.
  6. Apply a lightweight statistical analysis to the evaluations of these predicates to shake out those relevant to the crash from those that are not.
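As a rough illustration of steps 5 and 6, here is a minimal sketch in Python. Everything in it is hypothetical (the variable names, the run data, the two-rate score); the real system instruments the target itself and uses richer templates and statistics, but the shape of “synthesise predicates from templates, then rank them by how well they separate good runs from bad” is the same:

```python
from collections import defaultdict

def synthesise_predicates(runs):
    """runs: list of (values, crashed) pairs, where `values` maps a variable
    name to the value recorded for it on that run. Candidate predicates are
    built from three simple templates: var == c, var <= c and var >= c,
    where c ranges over the values actually observed."""
    seen = defaultdict(set)
    for values, _ in runs:
        for var, val in values.items():
            seen[var].add(val)
    predicates = []
    for var, vals in seen.items():
        for c in sorted(vals):
            predicates.append((f"{var} == {c}",
                               lambda v, var=var, c=c: v.get(var) == c))
            predicates.append((f"{var} <= {c}",
                               lambda v, var=var, c=c: var in v and v[var] <= c))
            predicates.append((f"{var} >= {c}",
                               lambda v, var=var, c=c: var in v and v[var] >= c))
    return predicates

def score(pred, runs):
    """Lightweight statistical score: the fraction of crashing runs on which
    the predicate holds, minus the fraction of non-crashing runs on which it
    holds. A score near 1.0 means the predicate tracks the crash closely."""
    _, holds = pred
    bad = [holds(v) for v, crashed in runs if crashed]
    good = [holds(v) for v, crashed in runs if not crashed]
    bad_rate = sum(bad) / len(bad) if bad else 0.0
    good_rate = sum(good) / len(good) if good else 0.0
    return bad_rate - good_rate

# Toy recorded state: the program crashes whenever `len` is large.
runs = [
    ({"len": 4}, False),
    ({"len": 8}, False),
    ({"len": 64}, True),
    ({"len": 128}, True),
]
ranked = sorted(synthesise_predicates(runs),
                key=lambda p: score(p, runs), reverse=True)
print(ranked[0][0])  # prints "len >= 64"
```

Even on this toy data the ranking surfaces a simple predicate (`len >= 64`) that perfectly separates the crashing runs from the non-crashing ones, which is exactly the kind of hint the statistical stage is meant to hand to a human.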

The actual details are in the slides so I won’t go into them any further. The advantages of this approach are that it isn’t tuned towards one bug type or class of applications, and simply relies on large amounts of input data and lightweight analyses to discover ‘interesting’ things. Alongside that, the analyses at each stage are parallelisable, so it’s quite simple to scale up. On a reasonably sized target (PHP) the approach is fast enough to fit into a normal workflow.

The most significant downside is that this generality means a bug best described by a very complicated or precise predicate may not be discovered (although simpler predicates may be, which can help a user manually discover the precise predicate more easily). The work as a whole is also weakened by the limited evaluation so far, but that is something I’m currently in the process of alleviating. In particular, I’m working on testing with a wider variety of memory management issues, as those pose some unique challenges for the algorithm above.

Besides the analysis algorithm itself, I also want to quickly mention CrashCorpus*. This is a corpus I’m currently building for the evaluation of crash triage (and related) tools. A significant issue with the academic literature is that the algorithms tend to be tested almost exclusively on toy programs or synthetic bugs. The goal of CrashCorpus is to eliminate that by providing easy-to-build, easy-to-run, real-world targets. I’ll have more info on this once I get closer to a public release.

* I’m still building towards a public release of the corpus as it needs more targets before it is truly useful, but if you’d like to contribute drop me an email (sean _at_ All I require is a set of crashing inputs, a set of non-crashing inputs and the source code for the target. I’ll take care of the rest.


6 thoughts on “Some Early-Stage Work on Statistical Crash Triage”

  1. This seems really interesting, but I cannot access the slides; according to Google Docs there is some issue with cookies, but it doesn’t tell me what it is (I do allow cookies from them). It would be great if you could host the slides somewhere else, maybe as a PDF?

    • Weird. Not sure what the issue there is. Anyway, a PDF version is here. This version is missing the speaker notes though, so unfortunately you’ll be missing context/explanation on a few things. Might be worth trying a different browser.

  2. Hi, it is always good to see posts from you. I have yet to read your slides, but I have one early question. It seems that you require source code for instrumentation. Why is that a necessary condition? Can’t we use any DBI framework? I am thinking about the applicability of this tool to arbitrary software, as I am building a fuzzer that will give crashing inputs (and I assume that we have some valid inputs already). But I don’t have source code, as I am only focusing on binaries.

    • Hi Sanjay,

      I’m sure something similar is indeed possible in situations where one does not have access to the source. It does pose some unique challenges, however, and will require modifications to the algorithm I discuss in the slides. For example, with binary software you are likely to be dealing with far more reads/writes which need to be recorded and which will add noise. You will also be missing expressions that occur at the source level and are useful for forming predicates.

      Regarding using DBI: that is one option, but it may run into performance issues at the stage where every instruction in a subset of the executed functions needs to be instrumented. With ‘normal’ DBI this is going to require checking, before each basic block is executed, whether it is part of the set to be observed. That doesn’t seem optimal, and a solution based on binary rewriting to insert the required instrumentation seems like it would be better, if more complex to implement.

      As I said, however, I have yet to try applying this to binary software, so the above is speculation on my part. If you do, and get either positive or negative results, let me know!

    • Oh, and on the topic of good inputs: you want to use not only the good inputs you start with, but also the good inputs your fuzzer produces as it fuzzes, which are “similar” to the crashes. The reason is that you ideally want to sample, on the good runs, as many of the components executed on the bad runs as you can, in order to have the data from which to determine which components are relevant to the crash. If you don’t sample a component during any of the good runs, but you do on the bad runs, then you will have no data from which to decide whether it is relevant to the crash, and will simply have to conclude that it is.
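      That sampling point can be sketched in a few lines of Python. The component names and the helper below are purely hypothetical, just to show why diverse good inputs shrink the set of components that get flagged by default:

      ```python
      def flagged_as_relevant(good_runs, bad_runs):
          """good_runs/bad_runs: one set of executed components (e.g. function
          names) per run. A component seen on some bad run but on no good run
          cannot be ruled out, so it has to be flagged as crash-relevant."""
          good_seen = set().union(*good_runs) if good_runs else set()
          bad_seen = set().union(*bad_runs) if bad_runs else set()
          return bad_seen - good_seen

      bad = [{"parse", "alloc", "copy"}]

      # With only one dissimilar good input, two components cannot be ruled out.
      print(flagged_as_relevant([{"parse"}], bad))  # {'alloc', 'copy'} (order may vary)

      # A fuzzer-produced good input that also exercises `copy` narrows the set.
      print(flagged_as_relevant([{"parse"}, {"parse", "copy"}], bad))  # {'alloc'}
      ```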

Comments are closed.