Password Hashing and PHP Insanity

I won’t delve into too many details in this post–it distracts from the crux of my argument–but I will take a few detours along the way. If you’re reading this, you have access to Google; thus, further research is left as an exercise to the reader.

Firstly, I want to make a point that’s been grating on me for a while. It’s 2014, and there are still some of you out there who insist on using MD5 (or some flavor of SHA) for password hashing. This is absolute insanity. Worse, this isn’t really a matter of insistence as much as it seems to be the fault of two things: terrible tutorials that perpetuate a horribly broken practice, followed by blissful developer ignorance. Yes, I know–you’re not a security expert–but neither am I! And I still know this isn’t something you should be doing. If someone as stupid as me can figure it out, then what the heck is going on out there? Burying your head in the sand doesn’t magically make fast hashing algorithms any better for password security.

Second, and most depressing, is the number of commercial products that have never used anything other than MD5 hashes for password storage, many of which continue this archaic practice. vBulletin 3 (all versions), vBulletin 4 (all versions), and IP.Board up to at least version 3.4 (probably later) use some variation of salts plus MD5 to store password hashes. For all I know, it’s possible newer versions of the aforementioned software still do this for backwards compatibility. This means that if you’ve registered on almost any major forum on the Internet, there’s a very strong likelihood that if the site is compromised, your password is going to be discovered almost immediately. How immediate is “immediately”? Well, considering that as of 2012, a single Radeon GPU could generate something on the order of 8 billion MD5 hashes per second (assuming I read that correctly), most “strong” passwords are unlikely to survive a week or two. After a month, it’s unlikely that any but the most absolutely convoluted passwords will remain undiscovered. Even if your password is discovered, you probably won’t even know, because disclosure assumes the forum operator 1) knows they were compromised and 2) is forthcoming about the breach. Realistically, the best bet is to assume that any password you’ve ever entered on a message board has been compromised the moment you’ve entered it (they’re probably not using SSL anyway) and never reuse that password (using something like KeePass or KeePassX helps).
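To put the 8 billion hashes per second figure in perspective, here is a back-of-the-envelope sketch. The rate is the one quoted above; the 95-character printable ASCII alphabet, the password lengths, and the function name are my own illustrative assumptions, and the result is the worst case for a truly random password (dictionary-based guessing gets there far sooner).

```php
<?php
// Rough estimate of the time needed to exhaust a password keyspace at a
// fixed guess rate. secondsToExhaust() is a hypothetical helper name.
function secondsToExhaust(int $alphabetSize, int $length, float $guessesPerSecond): float
{
    // Total candidates = alphabetSize ^ length (kept as float to avoid overflow).
    $keyspace = pow((float) $alphabetSize, $length);
    return $keyspace / $guessesPerSecond;
}

$rate = 8e9;           // MD5 hashes per second on one GPU (figure from the text)
$printableAscii = 95;  // assumption: passwords drawn from printable ASCII

foreach ([8, 9, 10] as $length) {
    $days = secondsToExhaust($printableAscii, $length, $rate) / 86400;
    printf("%2d characters: %.1f days\n", $length, $days);
}
```

For a random eight-character password this works out to a little under ten days on a single card, which lines up with the "week or two" estimate.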

There are, of course, some exceptions to this rule. It’s my understanding that after the Ubuntu Forums were compromised some time ago they switched their authentication services to a home grown solution, so chances are your password is safer there than on traditional vBulletin message boards. Either way, the point remains.

The state of commercial PHP software is maddeningly pathetic. But there is a silver lining. XenForo, for instance, not only makes use of a well-tested popular framework (the Zend Framework to be specific), but they also use bcrypt for password hashing. This means that optimistically, there is hope we’ll eventually be rid of poor practices. (Good luck convincing the gaggle of misguided bloggers who still suggest MD5 + salts–it doesn’t matter how convincing they are, they’re wrong.) WordPress, phpBB, and many other open source packages also rely on bcrypt internally, so the inertial switcheroo is already nudging the pendulum in the right direction.

That said, there’s still one substantial problem with the ecosystem–well, make that two–PHP’s bcrypt support isn’t really feature complete in versions below PHP 5.5 and the use of Composer is still not as widespread as I’d like, thus limiting the use of libraries like password-compat to those of us who actually know about it (and it doesn’t work on versions below PHP 5.3.7). Thus, in versions of PHP less than 5.5 where the password_hash() function is unavailable, developers have to essentially roll their own salting algorithm. And we can easily guess where that leads us.
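For comparison, this is roughly what the PHP 5.5+ path looks like. It is a minimal sketch rather than a complete login system; the sample password and the cost of 10 are illustrative choices.

```php
<?php
// Assumes PHP 5.5 or later, where password_hash() and password_verify() are built in.

// The salt is generated internally; no hand-rolled salting is involved.
$hash = password_hash('correct horse battery staple', PASSWORD_BCRYPT, ['cost' => 10]);

// The algorithm, cost, and salt are all encoded in the resulting 60-character
// string, so verification needs only the candidate password and the stored hash.
$ok  = password_verify('correct horse battery staple', $hash);
$bad = password_verify('wrong password', $hash);

var_dump($ok);   // bool(true)
var_dump($bad);  // bool(false)
```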

If you’re not already depressed enough, that’s okay. Anthony Ferrara wrote an interesting article a year and a half ago about how easy it is to screw up bcrypt implementations. When I first read the article and saw that he mentioned generating your own salts, I thought he was insane, because I knew that bcrypt implementations include salt generation as per the OpenBSD spec. Right? …right?

As it turns out, that’s not the case. In PHP versions less than 5.5, it’s up to you, the developer, to supply the salt to crypt() so it can do the right thing. Worse, since almost no one ever bothers actually reading the spec (much less using one of several libraries already out there that implement it), there’s a non-zero probability that at least a half dozen implementations of bcrypt are either completely flawed (like this one–it uses microtime() piped into SHA1 for the salt–no, I’m not even joking) or so idiomatically incompatible with the official standards as to make the entire charade resemble a poorly organized three-ring circus (minus a few lions). And by “idiomatically incompatible,” I mean to say that such implementations probably work, and they might even validate with implementations that do follow the standard, but they are so counter to the intent and spirit of the standard that they may as well not even bother pretending to implement a bcrypt salt. Fortunately, there’s at least one GitHub gist that brings some sanity to the table, but it’s sadly not the only one discoverable via Google.

The fundamental problem with older versions of PHP is that the only way to follow the standard yourself is to either read from /dev/urandom (which might not be possible depending on the configuration of open_basedir) or use openssl_random_pseudo_bytes() and hope that a strong source of entropy is in use behind the scenes. For instance:

<?php
// Set by openssl_random_pseudo_bytes() to report whether a strong source was used.
$strong = false;

// Generate 128 bits of entropy and base64 encode it. "+" characters will invalidate
// the salt causing it to output the error/invalid salt indicator "*0".
// Trimming the "=" padding leaves the 22 characters bcrypt expects.
$salt = rtrim(strtr(base64_encode(openssl_random_pseudo_bytes(128 / 8, $strong)), '+', '.'), '=');

// Generates something like "$2y$10$qK8pFSIPOnEqIESCzkMc7uVldEi/zb2bgiyUSB2kc8iIcTlAKTO5K"
$hash = crypt('This is a password.', '$2y$10$'.$salt);

// Complain.
if (!$strong) {
    echo 'Oh dear, the source of randomness might be a bit light-headed. Maybe it should sit down.';
}
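Generating the hash is only half the job; at login you also have to check it. Here is a sketch of that step for the crypt() route. The function names are hypothetical, and the fixed salt at the bottom exists only so the demonstration is repeatable (real salts must be random, as above). On PHP 5.6+ the hand-written comparison can be replaced by hash_equals(), and on 5.5+ the whole thing is just password_verify().

```php
<?php
// Compare two strings without bailing out at the first differing byte.
function constantTimeEquals($a, $b)
{
    if (strlen($a) !== strlen($b)) {
        return false;
    }
    $diff = 0;
    for ($i = 0; $i < strlen($a); $i++) {
        $diff |= ord($a[$i]) ^ ord($b[$i]);
    }
    return $diff === 0;
}

// Check a candidate password against a stored bcrypt hash produced by crypt().
function verifyBcrypt($password, $storedHash)
{
    // Passing the stored hash as the "salt" makes crypt() reuse its prefix, cost, and salt.
    $candidate = crypt($password, $storedHash);

    // crypt() signals failure with a short string such as "*0"; a bcrypt hash is 60 characters.
    if (!is_string($candidate) || strlen($candidate) !== 60) {
        return false;
    }
    return constantTimeEquals($storedHash, $candidate);
}

// Demonstration only: fixed salt and the minimum cost (04) so this runs quickly.
$stored = crypt('This is a password.', '$2y$04$abcdefghijklmnopqrstuv');
var_dump(verifyBcrypt('This is a password.', $stored));
var_dump(verifyBcrypt('This is not the password.', $stored));
```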

Now, considering that most people writing code just want to get something working, it’s unlikely that they’ll do sufficient research to understand the limits of their tools, and even if they do understand the limits, they’re likely to make a mistake. Thus, the oft-repeated advice surfaces to chide anyone with such intent: “[use] a standard library.” Don’t roll your own. You’ll probably just screw it up.

Curiously, what started all this nonsense was when I searched earlier to determine what the maximum number of characters is that bcrypt will happily consume before discarding the rest. Although the spec suggests 72, it turns out that the question itself is a whole lot more complicated. Some implementations handle 72. Some assume the 72nd character is a null byte so they only do 71. Others, maybe 55. Or was that 56? Heck, maybe no one really knows. Or maybe some of these implementations will die off. So what do you do?
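If you want to see the limit for yourself, a quick experiment along these lines should do it. This is a sketch assuming PHP 5.5+ and its built-in bcrypt; the padding string and suffixes are arbitrary, and other implementations may well behave differently (which is rather the point).

```php
<?php
// Cost 4 is the minimum and keeps the experiment fast; do not use it in production.
$prefix = str_repeat('a', 72);

$hash = password_hash($prefix . 'first-suffix', PASSWORD_BCRYPT, ['cost' => 4]);

// Same first 72 bytes, different tail: accepted if the input is cut off at 72 bytes.
$sameHead = password_verify($prefix . 'second-suffix', $hash);

// A difference inside the first 72 bytes is still detected.
$differentHead = password_verify('b' . str_repeat('a', 71) . 'first-suffix', $hash);

var_dump($sameHead);
var_dump($differentHead);
```

With PHP's implementation I would expect the first check to come back true and the second false, meaning everything past byte 72 is silently ignored.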

Before the thought crosses your mind, don’t even think about getting around the limit using a hashing algorithm. If a user enters a lengthy passphrase and you funnel their input through MD5 (for the sake of argument), you’ve probably stripped a substantial amount of entropy from their input. Perhaps that’s not a big deal (bcrypt is intentionally slow after all), but it introduces a potential source of incompatibilities with other implementations beyond those mentioned above, to say nothing of MD5’s well-known collisions, and who’s to say you’ll remember a year from now (or more) what changes you made to the password implementation when you’re suddenly tasked to add a new service or update the old one. Unless you’re migrating an old system and absolutely need compatibility with older software, you’re probably better off just resetting everyone’s passwords, sending an email explaining the situation (or displaying a notice at login), and using bcrypt behind the scenes.

Another plus side in all of this is that the PHP 5.5+ implementation of bcrypt appears to follow the standard. This means that a bcrypt password generated by PHP is interchangeable with one generated using Python (for instance). More importantly, the random salt generated by password_hash() is created by using /dev/urandom behind the scenes (for better or worse), and the appropriate system calls also appear to be made under Windows. Thus, upgrading to the latest PHP distribution is ideal. But you should be doing that anyway, shouldn’t you?
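What makes the hashes portable is that everything needed to verify them is packed into the string itself. A small sketch, assuming PHP 5.5+ (the sample password and cost values are illustrative):

```php
<?php
$hash = password_hash('example password', PASSWORD_BCRYPT, ['cost' => 10]);

// The prefix carries the variant ("2y") and the cost ("10"); the salt and digest follow.
$prefix = substr($hash, 0, 7);
echo $prefix, "\n";

// password_get_info() parses the same fields back out.
$info = password_get_info($hash);
echo $info['algoName'], "\n";
echo $info['options']['cost'], "\n";

// If the desired cost is raised later, existing hashes can be flagged
// and regenerated the next time the user logs in.
$needsUpgrade = password_needs_rehash($hash, PASSWORD_BCRYPT, ['cost' => 12]);
$upToDate     = password_needs_rehash($hash, PASSWORD_BCRYPT, ['cost' => 10]);
var_dump($needsUpgrade);
var_dump($upToDate);
```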

Now, I’ve already written about twice as much on the topic as I originally intended, so it’s probably time I cut this short. Heck, I wrote so much that I even forgot to proofread the entire thing, and here I am finally posting it a month later (!).

The takeaway from this is that you should be aware that everyone is probably doing password management wrong. It doesn’t even matter that the “bcrypt bandwagon” left the station two years ago, because there are thousands of legacy applications that are still alive and well, each one effectively an exploit away from having every single one of its users’ passwords released into the wild. The only thing you can do is to implement something stronger and hope for the best (but try not to invent it yourself). Otherwise, you may as well treat most of your passwords as compromised and never reuse them.

Can’t use bcrypt? Try PBKDF2 instead. It’s good enough for Apple’s iOS. Just please, stop (ab)using MD5.
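For the PBKDF2 route, PHP ships hash_pbkdf2() as of 5.5. The following is a sketch under stated assumptions: it needs PHP 7+ for random_bytes(), the function names and the "$"-separated storage format are made up for illustration, and the iteration count is a placeholder rather than a recommendation.

```php
<?php
// Derive a key with PBKDF2-HMAC-SHA256 and pack the parameters into one string.
function pbkdf2Hash($password, $iterations = 100000)
{
    $salt = random_bytes(16);
    // Final argument true => raw binary output, so the length (32) is in bytes.
    $derived = hash_pbkdf2('sha256', $password, $salt, $iterations, 32, true);
    // Hypothetical storage format: label$iterations$salt$derived (base64 never contains "$").
    return implode('$', ['pbkdf2-sha256', $iterations, base64_encode($salt), base64_encode($derived)]);
}

// Recompute the derivation with the stored parameters and compare in constant time.
function pbkdf2Verify($password, $stored)
{
    $parts = explode('$', $stored);
    if (count($parts) !== 4 || $parts[0] !== 'pbkdf2-sha256') {
        return false;
    }
    list(, $iterations, $salt, $expected) = $parts;
    $derived = hash_pbkdf2('sha256', $password, base64_decode($salt), (int) $iterations, 32, true);
    return hash_equals(base64_decode($expected), $derived);
}

$record = pbkdf2Hash('This is a password.');
var_dump(pbkdf2Verify('This is a password.', $record));
var_dump(pbkdf2Verify('Something else.', $record));
```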

***

A Lesson from Twitter

Today, I got a curious e-mail from Twitter:

Hi, zancarius

Twitter believes that your account may have been compromised by a website or service not associated with Twitter. We’ve reset your password to prevent others from accessing your account.

You’ll need to create a new password for your Twitter account. You can select a new password at this link: [redacted]

As always, you can also request a new password from our password-resend page: https://twitter.com/account/resend_password

Please don’t reuse your old password and be sure to choose a strong password (such as one with a combination of letters, numbers, and symbols).

In general, be sure to:

Always check that your browser’s address bar is on a https://twitter.com website before entering your password. Phishing sites often look just like Twitter, so check the URL before entering your login information!
Avoid using websites or services that promise to get you lots of followers. These sites have been known to send spam updates and damage user accounts.
Review your approved connections on your Applications page at https://twitter.com/settings/applications. If you see any applications that you don’t recognize, click the Revoke Access button.

For more information, visit our help page for hacked or compromised accounts.

(Before you ask, yes this did come from Twitter.)

It turns out that my Twitter account had been compromised. I hadn’t posted anything since 2011, and I seriously doubt I logged into Twitter any time recently on my browser (though I probably have it active on a mobile device–I just never check it). This was puzzling to me, as I thought I had used a random password on the account as per my usual habit.

Except that I hadn’t. Instead, I had used a simple throwaway that could’ve been relatively easy to brute force given sufficient time. This was entirely my fault, and while there’s no excuse for it, I admit that I hadn’t ever thought enough of Twitter to protect the account. Furthermore, the account was created circa 2009, when I used fairly simple passwords for throwaways and strong passwords for accounts I wanted to protect (my personal e-mail accounts use pass-phrases of 40 to 70 characters or more, for example). So, while it’s plausible that I may have given a third party access to tweet on my behalf, I suspect this isn’t the case; there were no apps listed in the authorized application list, and the Twitter e-mail strongly hints that such apps remain there until manually removed.

So, lesson learned I suppose.

However, this did present a unique opportunity to learn from one of the top social networking sites in the world. Rather than closing accounts or granting spammers free rein, Twitter resets the account password and sends a polite notice to the e-mail address registered for the account indicating what the problem is and how to rectify it. It’s a brilliant idea, I think, and I’d love it if more sites followed suit. After all, spammers are using similar tactics elsewhere (including YouTube) to exploit accounts that might otherwise hold good standing with the community to continue their nefarious activities. Plus, is it really fair to terminate someone’s account that’s been compromised, just because it was used to spam? I don’t think so–not anymore.

The other lesson in all of this is to use strong passwords even for accounts you don’t think you’ll use again. A compromise can affect your reputation, it can cause embarrassment, and it feels deeply violating to see spammy comments from an account with your picture on it. While my account was only used for two spam tweets before Twitter shut it down, that sense of violation cut deep.

For a couple of years, I’ve been using the excellent KeePass password storage application (more specifically, the KeePassX v2 port) to generate and store random passwords. The tactic of generating random passwords makes more and more sense as forum software (like vBulletin) continues to rely on MD5-hashed passwords, which are no longer strong enough to protect against attackers with even modest resources. By using randomly generated passwords, even if one is compromised, you don’t have to worry about an attacker gaining access to other accounts–or to the mental algorithm you use to generate passwords you can remember.
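If you’re curious what “generate a random password” amounts to, here is a minimal sketch. It is not how KeePass does it; the function name, the default length, and the printable ASCII alphabet are my own assumptions, and it needs PHP 7+ for random_int().

```php
<?php
// Build a password from characters chosen uniformly with random_int(),
// which draws from the operating system's CSPRNG.
function randomPassword($length = 24)
{
    $password = '';
    for ($i = 0; $i < $length; $i++) {
        // Printable ASCII from "!" (33) through "~" (126).
        $password .= chr(random_int(33, 126));
    }
    return $password;
}

echo randomPassword(), "\n";
```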

That said, for my most important accounts, I do use fairly lengthy pass-phrases. By mixing KeePass with pass-phrases, I can save my mental energies for remembering those passwords that are the most important, and offload the remainder of the work to the computer. So far, it’s worked fairly well. Twitter is the only account I’ve had compromised, and that happened because I forgot to replace an older throwaway password (a somewhat “cutesy” one, or so I thought) with something random, which serves as a good testament to this. It doesn’t mean I won’t have another account compromised, but it does dramatically reduce the probability. The fact that an account I seldom used was compromised helped push me into action to reset some of my more important passwords and to verify the ones that I have collected to ensure they meet my criteria of strong and random.

So, even if you have an account you never think you’ll use again, be absolutely certain you use a strong (preferably random) password or pass-phrase. After all of this nonsense, I think I might have to go back to using my Twitter account. At least I didn’t lose it; all I lost was some face (but I have hardly any followers whom I don’t personally know in real life… so does it really matter?).

The other moral in all of this is that such compromises can hit anyone. Even you.

***