I won’t delve into too many details in this post–it distracts from the crux of my argument–but I will take a few detours along the way. If you’re reading this, you have access to Google; thus, further research is left as an exercise to the reader.
Firstly, I want to make a point that’s been grating on me for a while. It’s 2014, and there are still some of you out there who insist on using MD5 (or some flavor of SHA) for password hashing. This is absolute insanity. Worse, this isn’t really a matter of insistence as much as it seems to be the fault of two things: Terrible tutorials that perpetuate a horribly broken practice followed by blissful developer’s ignorance. Yes, I know–you’re not a security expert–but neither am I! And I still know this isn’t something you should be doing. If someone as stupid as me can figure it out, then what the heck is going on out there? Burying your head in the sand doesn’t magically make fast hashing algorithms any better for password security.
Second, and most depressing, is the number of commercial products that have never used anything other than MD5 hashes for password storage and many continue using this archaic practice. vBulletin 3 (all versions), vBulletin 4 (all versions), and IP.Board up to at least version 3.4 (probably later) use some variation of salts plus MD5 to store password hashes. For all I know, it’s possible newer versions of the aforementioned software still do this for backwards compatibility. This means that if you’ve registered on almost any major forum on the Internet, there’s a very strong likelihood that if the site is compromised, your password is going to be discovered almost immediately. How immediate is “immediately?” Well, considering that as of 2012, a single Radeon GPU could generate something on the order of 8 billion MD5 hashes per second (assuming I read that correctly), most “strong” passwords are unlikely to survive a week or two. After a month, it’s unlikely any but the most absolutely convoluted password will be discovered. Even if your password is discovered, you probably won’t even know, because disclosure assumes the forum operator 1) knows they were compromised and 2) is forthcoming about the breach. Realistically, the best bet is to assume that any password you’ve ever entered on a message board has been compromised the moment you’ve entered it (they’re probably not using SSL anyway) and never reuse that password (using something like KeePass or KeePassX helps).
There are, of course, some exceptions to this rule. It’s my understanding that after the Ubuntu Forums were compromised some time ago they switched their authentication services to a home grown solution, so chances are your password is safer there than on traditional vBulletin message boards. Either way, the point remains.
The state of commercial PHP software is maddeningly pathetic. But there is a silver lining. XenForo, for instance, not only makes use of a well-tested popular framework (the Zend Framework to be specific), but they also use bcrypt for password hashing. This means that optimistically, there is hope we’ll eventually be rid of poor practices. (Good luck convincing the gaggle of misguided bloggers who still suggest MD5 + salts–it doesn’t matter how convincing they are, they’re wrong.) WordPress, phpBB, and many other open source packages also rely on bcrypt internally, so the inertial switcheroo is already nudging the pendulum in the right direction.
That said, there’s still one substantial problem with the ecosystem–well, make that two–PHP’s bcrypt support isn’t really feature complete in versions below PHP 5.5 and the use of Composer is still not as widespread as I’d like, thus limiting the use of libraries like password-compat to those of us who actually know about it (and it doesn’t work on versions below PHP 5.3.7). Thus, in versions of PHP less than 5.5 where the password_hash() function is unavailable, developers have to essentially roll their own salting algorithm. And we can easily guess where that leads us.
If you’re not already depressed enough, that’s okay. Anthony Ferrera wrote an interesting article a year and a half ago about how easy it is to screw up bcrypt implementations. When I first read the article and saw that he mentioned generating your own salts, I thought he was insane, because I knew that bcrypt implementations include salt generation as per the OpenBSD spec. Right? …right?
As it turns out, that’s not the case. In PHP versions less than 5.5, it’s up to you, the developer to supply the salt to crypt() so it can do the right thing. Worse, since almost no one ever bothers actually reading the spec (much less using one of several libraries already out there that implement it), there’s a non-zero probability that at least a half dozen implementations of bcrypt are either completely flawed (like this one–it uses microtime() piped into SHA1 for the salt–no, I’m not even joking) or so idiomatically incompatible with the official standards as to make the entire charade resemble a poorly organized three-ring circus (minus a few lions). And by “idiomatically incompatible,” I mean to say that such implementations probably work, and they might even validate with implementations that do follow the standard, but they are so counter to the intent and spirit of the standard that they may as well not even bother pretending to implement a bcrypt salt. Fortunately, there’s at least one GitHub gist that brings some sanity to the table, but it’s sadly not the only one discoverable via Google.
The fundamental problem with older versions of PHP is that the only way to follow the standard yourself is to either read from /dev/urandom (which might not be possible depending on the configuration of open_basedir) or use openssl_random_pseudo_bytes() and hope that a strong source of entropy is in use behind the scenes. For instance:
<?php // We'll see if we're using a strong implementation behind the scenes. $strong = false; // Generate 128 bits of entropy and base64 encode it. "+" characters will invalidate // the salt causing it to output the error/invalid salt indicator "*0". $salt = rtrim(strtr(base64_encode(openssl_random_pseudo_bytes(128 / 8, $strong)), '+', '.'), '=='); // Generates something like "$2y$10$qK8pFSIPOnEqIESCzkMc7uVldEi/zb2bgiyUSB2kc8iIcTlAKTO5K" crypt('This is a password.', '$2y$10$'.$salt); // Complain. if (!$strong) { echo 'Oh dear, the source of randomness might be a bit light-headed. Maybe it should sit down.' } |
Now, considering that most people writing code just want to get something working, it’s unlikely that they’ll do sufficient research to understand the limits of their tools, and even if they do understand the limits, they’re likely to make a mistake. Thus, the oft-repeated advice surfaces to chide anyone with such intent: “[use] a standard library.” Don’t roll your own. You’ll probably just screw it up.
Curiously, what started all this nonsense was when I searched earlier to determine what the maximum number of characters is that bcrypt will happily consume before discarding the rest. Although the spec suggests 72, it turns out that the question itself is a whole lot more complicated. Some implementations handle 72. Some assume the 72nd character is a null byte so they only do 71. Others, maybe 55. Or was that 56? Heck, maybe no one really knows. Or maybe some of these implementations will die off. So what do you do?
Before the thought crosses your mind, don’t even think about getting around the limit using a hashing algorithm. If a user enters a lengthy passphrase and you funnel their input through MD5 (for the sake of argument), you’ve probably stripped a substantial amount of entropy from their input. Perhaps that’s not a big deal (bcrypt is intentionally slow after all), but it introduces a potential source of incompatibilities with other implementations beyond those mentioned above, to say nothing of MD5’s well-known collisions, and who’s to say you’ll remember a year from now (or more) what changes you made to the password implementation when you’re suddenly tasked to add a new service or update the old one. Unless you’re migrating an old system and absolutely need compatibility with older software, you’re probably better off just resetting everyone’s passwords, sending an email explaining the situation (or displaying a notice at login), and using bcrypt behind the scenes.
Another plus side in all of this is that the PHP 5.5+ implementation of bcrypt appears to follow the standard. This means that a bcrypt password generated by PHP is interchangeable with one generated using Python (for instance). More importantly, the random salt generated by password_hash() is created by using /dev/urandom behind the scenes (for better or worse), and the appropriate system calls also appear to be made under Windows. Thus, upgrading to the latest PHP distribution is ideal. But you should be doing that anyway, shouldn’t you?
Now, I’ve already written about twice as much on the topic as I originally intended, so it’s probably time I cut this short. Heck, I wrote so much that I even forgot to proofread the entire thing, and here I am finally posting it a month later (!).
The take away from this is that you should be aware that everyone is probably doing password management wrong. It doesn’t even matter that the “bcrypt bandwagon” left the station two years ago, because there are thousands of legacy applications that are still alive and well with each one effectively an exploit away from having every single one of their user’s passwords released into the wild. The only thing you can do is to implement something stronger and hope for the best (but try not to invent it yourself). Otherwise, you may as well treat most of your passwords as compromised and don’t reuse them.
Can’t use bcrypt? Try PBKDF2 instead. It’s good enough for Apple’s iOS. Just please, stop (ab)using MD5.
Leave a comment