Fuzzing PHP’s unserialize Function

Recently, the PHP development team have decided that they will no longer consider bugs in the implementation of the unserialize function to be security relevant. In this post I’d like to outline why I think this is a bad idea, and provide an easy set up for fuzzing/ongoing QA of unserialize, as the fact that it is a bottomless pit of bugs seems to be part of the motivation for this move.

The argument for the change in categorisation is twofold. Firstly, they state that unserialize is inherently an unsafe function to use on data from an untrusted source. Essentially, they are saying that there is no point in treating bugs in the implementation of unserialize as security-relevant, because the mere fact that an application passes untrusted data to it means that the security of that application has already been compromised. Secondly, they argue that treating unserialize bugs as security relevant encourages developers to use it on untrusted data. In my opinion this is quite a weak point. It is prominently documented that unserialize is unsafe, and if that warning doesn’t get the message across then I very much doubt that a change in categorisation in the PHP bug tracker will.

Lets briefly discuss the first point in a little more detail, as the reality is a bit more nuanced. Often, it is relatively easy to leverage an unserialize of untrusted data without resorting to attacking the unserialize implementation itself. There are some requirements for this though. In particular, the attacker must be able to find a class containing a PHP magic method in the target application or one of the libraries that it makes use of, e.g. a __destruct function. Also, if the allowed_classes argument to unserialize is provided then the class containing the magic method must be included in this list. These conditions don’t always hold. If the application is closed source then the attacker will have no direct way to figure out the name of such classes, and if allowed_classes is the empty list then they won’t be able to make use of them anyway.

The above are not merely hypothetical scenarios.  This rather nice write-up documents the exploitation of a vulnerability in PornHub which required the attackers to target the unserialize implementation. As mentioned in the write-up, the fact that the application was closed source prevented them from discovering a class which could be used for object injection.

I think most people will agree with the PHP development team that if an application contains a call to unserialize with untrusted data then it is a bad idea and should be fixed. However, it is also true that treating unserialize bugs like normal bugs unnecessarily exposes PHP application developers, and the users of their applications, to further risk.

The Fun Part

One reason the PHP development team are drowning in unserialize bugs appears to be that they either don’t fuzz the function at all after making changes, or that their mechanism for fuzzing it is ineffective. I fully recognize that the functionality that unserialize implements is complex and that adding to it, or modifying the existing code, is always going to come with the risk of introducing bugs. However, I also believe that with a small of effort it should be possible to significantly decrease the number of low hanging fruit that make it to release builds.

Anyway, in the interests of putting my shell scripts where my mouth is, I’ve uploaded a repository of scripts and auxiliary files for building PHP and fuzzing unserialize via AFL. There’s no secret sauce. If anything that’s kind of the point; ongoing sanity checking of the existing unserialize implementation and changes to it could be trivially automated.

The README file explains how to get everything up and running, but in short it should just be a matter of running ./get.sh && ./build.sh && ./fuzz.sh output N.  A GNU screen session will be running in the background containing the output of an AFL master instance and N slaves. which you can attach to via screen -r fuzz.

The scripts are straightforward, but there are a few things worth noting for the curious:

  • The driver PHP script loads a string representing the data to be unserialized from a file and passes it directly to unserialize. This should be sufficient to find a significant portion of the bugs that I know of that have previously been fixed in unserialize, e.g. things like this. However, if some extra PHP code is required to trigger the bug then it will not be found. e.g. things like this. It should be relatively easy to extend the fuzzing setup with ideas from other blog posts on the topic, e.g. there are some good ideas in this post by the people who wrote the PornHub exploit mentioned above.
  • One can either build all of PHP with AFL’s instrumentation, or just the components that one thinks will be relevant to the unserialize functionality. The advantage of the former is you’re guaranteed that AFL can intelligently explore all functionality which may be triggered via unserialize, with the disadvantage that it is slower. The build.sh script currently compiles everything with AFL’s instrumentation. If you would like a faster binary you can compile PHP without AFL/ASAN, delete the object files that relate to unserialize, reconfigure the build to use AFL/ASAN and then recompile those object files. I’ve used this is the past and it works quite well but you need to make sure you recompile all the object files which relate to code that unserialize might trigger, otherwise AFL won’t be able to track coverage there.
  • AFL can be provided with a dictionary of tokens which it will use when fuzzing. The dictionary in the repository contains tokens pulled from the unserialize parser and I think it is fairly complete, but if I’ve missed anything let me know and I’ll be sure to include it.
  • The seed files one provides to AFL for fuzzing can have a significant impact on it’s ability to find bugs, and how quickly one finds a particular bug. The seed files I have provided have been constructed from examples of valid unserialize usage I’ve found in the PHP documentation and triggers for previous vulnerabilities. Feel free to let me know if you find a seed file which would be useful to add.
  • I have created this repository by pulling files and configurations from my fuzzing machines as I write this blog post. The goal is to give a faithful representation of the setup I’ve used in the past to find multitudes of unserialize bugs. However, I may have forgotten something, or broken something in the process of pulling it all together for release. As I’m writing this I have started up the fuzz.sh script to ensure everything works, but it’s possible I’ve messed something up and won’t notice. If that does turn out to be the case, let me know! If you’d like to confirm everything is functional then compile PHP version 5.6.11 and you should find this bug within 40 million executions or so.

Happy hunting!

(Unrelated side note: I’ll be teaching a 4 day version of ‘Advanced Tool Development with SMT Solvers’ in London in early November. See here for details if you’re interested!)