Password Hashing and PHP Insanity

I won’t delve into too many details in this post–it distracts from the crux of my argument–but I will take a few detours along the way. If you’re reading this, you have access to Google; thus, further research is left as an exercise to the reader.

First, I want to make a point that’s been grating on me for a while. It’s 2014, and there are still some of you out there who insist on using MD5 (or some flavor of SHA) for password hashing. This is absolute insanity. Worse, this isn’t really a matter of insistence as much as it seems to be the fault of two things: terrible tutorials that perpetuate a horribly broken practice, followed by blissful developer ignorance. Yes, I know–you’re not a security expert–but neither am I! And I still know this isn’t something you should be doing. If someone as stupid as me can figure it out, then what the heck is going on out there? Burying your head in the sand doesn’t magically make fast hashing algorithms any better for password security.

Second, and most depressing, is the number of commercial products that have never used anything other than MD5 hashes for password storage, many of which continue this archaic practice. vBulletin 3 (all versions), vBulletin 4 (all versions), and IP.Board up to at least version 3.4 (probably later) use some variation of salts plus MD5 to store password hashes. For all I know, it’s possible newer versions of the aforementioned software still do this for backwards compatibility. This means that if you’ve registered on almost any major forum on the Internet, there’s a very strong likelihood that if the site is compromised, your password is going to be discovered almost immediately. How immediate is “immediately?” Well, considering that as of 2012, a single Radeon GPU could generate something on the order of 8 billion MD5 hashes per second (assuming I read that correctly), most “strong” passwords are unlikely to survive a week or two. After a month, it’s unlikely that any but the most absolutely convoluted password will remain undiscovered. Even if your password is discovered, you probably won’t even know, because disclosure assumes the forum operator 1) knows they were compromised and 2) is forthcoming about the breach. Realistically, the best bet is to assume that any password you’ve ever entered on a message board has been compromised the moment you’ve entered it (they’re probably not using SSL anyway) and never reuse that password (using something like KeePass or KeePassX helps).
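To put a rough number on "a week or two," here is a back-of-the-envelope sketch. It takes the 8 billion hashes per second figure quoted above at face value and assumes (my assumption, purely for illustration) an 8-character password drawn from the 95 printable ASCII characters, attacked as a single salted hash.

```python
# Rough brute-force estimate; the rate is the 2012 figure quoted in the post.
HASHES_PER_SECOND = 8_000_000_000

# Assumption: an 8-character password using the 95 printable ASCII characters.
ALPHABET_SIZE = 95
PASSWORD_LENGTH = 8

# Every possible password of that length and alphabet.
keyspace = ALPHABET_SIZE ** PASSWORD_LENGTH

# Worst case: the attacker has to try every single candidate.
seconds = keyspace / HASHES_PER_SECOND
days = seconds / 86_400

print(f"{keyspace:,} candidates, about {days:.1f} days to try them all")
```

That works out to a little under ten days for the worst case on one card, and on average the password falls in half that time. Dictionary attacks and multiple GPUs only make it quicker.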

There are, of course, some exceptions to this rule. It’s my understanding that after the Ubuntu Forums were compromised some time ago they switched their authentication services to a home grown solution, so chances are your password is safer there than on traditional vBulletin message boards. Either way, the point remains.

The state of commercial PHP software is maddeningly pathetic. But there is a silver lining. XenForo, for instance, not only makes use of a well-tested popular framework (the Zend Framework to be specific), but they also use bcrypt for password hashing. This means that optimistically, there is hope we’ll eventually be rid of poor practices. (Good luck convincing the gaggle of misguided bloggers who still suggest MD5 + salts–it doesn’t matter how convincing they are, they’re wrong.) WordPress, phpBB, and many other open source packages also rely on bcrypt internally, so the inertial switcheroo is already nudging the pendulum in the right direction.

That said, there’s still one substantial problem with the ecosystem–well, make that two–PHP’s bcrypt support isn’t really feature complete in versions below PHP 5.5 and the use of Composer is still not as widespread as I’d like, thus limiting the use of libraries like password-compat to those of us who actually know about it (and it doesn’t work on versions below PHP 5.3.7). Thus, in versions of PHP less than 5.5 where the password_hash() function is unavailable, developers have to essentially roll their own salting algorithm. And we can easily guess where that leads us.

If you’re not already depressed enough, that’s okay. Anthony Ferrara wrote an interesting article a year and a half ago about how easy it is to screw up bcrypt implementations. When I first read the article and saw that he mentioned generating your own salts, I thought he was insane, because I knew that bcrypt implementations include salt generation as per the OpenBSD spec. Right? …right?

As it turns out, that’s not the case. In PHP versions less than 5.5, it’s up to you, the developer, to supply the salt to crypt() so it can do the right thing. Worse, since almost no one ever bothers actually reading the spec (much less using one of several libraries already out there that implement it), there’s a non-zero probability that at least a half dozen implementations of bcrypt are either completely flawed (like this one–it uses microtime() piped into SHA1 for the salt–no, I’m not even joking) or so idiomatically incompatible with the official standards as to make the entire charade resemble a poorly organized three-ring circus (minus a few lions). By “idiomatically incompatible,” I mean that such implementations probably work, and they might even validate against implementations that do follow the standard, but they are so counter to the intent and spirit of the standard that they may as well not even bother pretending to implement a bcrypt salt. Fortunately, there’s at least one GitHub gist that brings some sanity to the table, but it’s sadly not the only one discoverable via Google.

The fundamental problem with older versions of PHP is that the only way to follow the standard yourself is to either read from /dev/urandom (which might not be possible depending on the configuration of open_basedir) or use openssl_random_pseudo_bytes() and hope that a strong source of entropy is in use behind the scenes. For instance:

<?php
// We'll see if we're using a strong implementation behind the scenes.
$strong = false;
 
// Generate 128 bits of entropy and base64 encode it. "+" characters will invalidate
// the salt causing it to output the error/invalid salt indicator "*0".
$salt = rtrim(strtr(base64_encode(openssl_random_pseudo_bytes(128 / 8, $strong)), '+', '.'), '==');
 
// Generates something like "$2y$10$qK8pFSIPOnEqIESCzkMc7uVldEi/zb2bgiyUSB2kc8iIcTlAKTO5K"
crypt('This is a password.', '$2y$10$'.$salt);
 
// Complain.
if (!$strong) {
    echo 'Oh dear, the source of randomness might be a bit light-headed. Maybe it should sit down.';
}
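If you want to convince yourself that the salt-building trick above really does produce something crypt() will accept, the same steps are easy to replay in Python. This is only a sketch of the string manipulation (16 random bytes, base64, swap "+" for ".", strip the padding); make_salt and BCRYPT_ALPHABET are names I made up for the illustration, and nothing here performs any bcrypt hashing.

```python
import base64
import secrets
import string

# The 64 characters that are valid inside a bcrypt salt.
BCRYPT_ALPHABET = set("./" + string.ascii_letters + string.digits)


def make_salt() -> str:
    """Mirror the PHP snippet: 16 random bytes, base64, '+' -> '.', strip '='."""
    raw = secrets.token_bytes(128 // 8)
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded.replace("+", ".").rstrip("=")


salt = make_salt()

# 16 bytes always encode to 24 base64 characters, two of which are padding.
print(len(salt), set(salt) <= BCRYPT_ALPHABET)
```

The result is always 22 characters long, which is the salt length bcrypt expects. Note that this is not bcrypt's own base64 ordering, just a string built from valid characters, which is all the PHP version produces as well.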

Now, considering that most people writing code just want to get something working, it’s unlikely that they’ll do sufficient research to understand the limits of their tools, and even if they do understand the limits, they’re likely to make a mistake. Thus, the oft-repeated advice surfaces to chide anyone with such intent: “[use] a standard library.” Don’t roll your own. You’ll probably just screw it up.

Curiously, what started all this nonsense was when I searched earlier to determine what the maximum number of characters is that bcrypt will happily consume before discarding the rest. Although the spec suggests 72, it turns out that the question itself is a whole lot more complicated. Some implementations handle 72. Some assume the 72nd character is a null byte so they only do 71. Others, maybe 55. Or was that 56? Heck, maybe no one really knows. Or maybe some of these implementations will die off. So what do you do?
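One wrinkle worth keeping in mind is that the limit is normally counted in bytes rather than characters, so anything outside plain ASCII eats into it faster than you might expect. Here is a small sketch that reports how much of a password would fall past the cutoff. It assumes the 72 figure the spec suggests, and bytes_beyond_limit is a hypothetical helper, not part of any bcrypt library.

```python
# Assumption: the 72-byte limit the spec suggests; some implementations use less.
BCRYPT_MAX_BYTES = 72


def bytes_beyond_limit(password: str, limit: int = BCRYPT_MAX_BYTES) -> int:
    """Return how many UTF-8 bytes of the password fall past the limit."""
    return max(0, len(password.encode("utf-8")) - limit)


short = "correct horse battery staple"
long_ascii = "a" * 80
long_accented = "\u00e9" * 40  # 40 accented characters, two bytes each in UTF-8

for candidate in (short, long_ascii, long_accented):
    # Character count first, then the number of bytes that would be ignored.
    print(len(candidate), bytes_beyond_limit(candidate))
```

The third password is only 40 characters long, yet it overshoots the limit by the same 8 bytes as the 80-character ASCII one.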

Before the thought crosses your mind, don’t even think about getting around the limit using a hashing algorithm. If a user enters a lengthy passphrase and you funnel their input through MD5 (for the sake of argument), you’ve probably stripped a substantial amount of entropy from their input. Perhaps that’s not a big deal (bcrypt is intentionally slow after all), but it introduces a potential source of incompatibilities with other implementations beyond those mentioned above, to say nothing of MD5’s well-known collisions. And who’s to say you’ll remember a year from now (or more) what changes you made to the password implementation when you’re suddenly tasked to add a new service or update the old one? Unless you’re migrating an old system and absolutely need compatibility with older software, you’re probably better off just resetting everyone’s passwords, sending an email explaining the situation (or displaying a notice at login), and using bcrypt behind the scenes.

Another plus side in all of this is that the PHP 5.5+ implementation of bcrypt appears to follow the standard. This means that a bcrypt password generated by PHP is interchangeable with one generated using Python (for instance). More importantly, the random salt generated by password_hash() is created by using /dev/urandom behind the scenes (for better or worse), and the appropriate system calls also appear to be made under Windows. Thus, upgrading to the latest PHP distribution is ideal. But you should be doing that anyway, shouldn’t you?

Now, I’ve already written about twice as much on the topic as I originally intended, so it’s probably time I cut this short. Heck, I wrote so much that I even forgot to proofread the entire thing, and here I am finally posting it a month later (!).

The takeaway from this is that you should be aware that everyone is probably doing password management wrong. It doesn’t even matter that the “bcrypt bandwagon” left the station two years ago, because there are thousands of legacy applications that are still alive and well, each one effectively an exploit away from having every single one of its users’ passwords released into the wild. The only thing you can do is to implement something stronger and hope for the best (but try not to invent it yourself). Otherwise, you may as well treat most of your passwords as compromised and never reuse them.

Can’t use bcrypt? Try PBKDF2 instead. It’s good enough for Apple’s iOS. Just please, stop (ab)using MD5.
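For the curious, here is roughly what PBKDF2 looks like when a standard library does the heavy lifting. This is a minimal sketch in Python using hashlib.pbkdf2_hmac; the choice of SHA-256, the 16-byte salt, and the 200,000 iterations are my assumptions for the example, not a recommendation from any spec, and hash_password and verify_password are hypothetical helper names.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Assumptions for illustration: SHA-256, a 16-byte salt, 200,000 iterations.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both alongside the user record."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("This is a password.")
print(verify_password("This is a password.", salt, digest))
print(verify_password("This is not the password.", salt, digest))
```

The iteration count is the knob that makes this slow on purpose. Store it with the hash so you can raise it later without locking anyone out.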

***

Updating PostgreSQL JSON fields via SQLAlchemy

If you’ve found this article, you may have discovered that as of PostgreSQL 9.3, there’s no immediately obvious way to easily (for some value of “easy”) update JSON columns (and their fields) in place like you sort of can with HSTORE when using SQLAlchemy. Supposedly, PostgreSQL 9.4 may be adding this feature, which means we’ll have to wait some time thereafter for support to appear in psycopg2 and, by extension, SQLAlchemy.

Update December 20th: PostgreSQL 9.5 will be adding support for incremental updates but you must use JSONB column types. The reason for this is explained in the manual but essentially boils down to this: JSON is stored as text, JSONB is pre-processed into a binary format and stored internally. Besides, you really ought to be using JSONB instead!

Anyway, as it turns out, updating JSON isn’t a big deal, and SQLAlchemy has some really great support for JSON columns. The only thing you need to do is update your JSON field as you would on any other instance if you’re using the ORM, but you still can’t update a specific subset of the JSON column (for that, it needs DBMS support). It’s that easy.

Here’s a contrived example using the Python shell:

# Fetch our object. Pretend there are no errors and it's guaranteed to exist.
>>> config = session.query(Config).filter(Config.id==1).one()
 
# Our object in this case is a configuration instance with some JSON data.
>>> config.values
{'do_something': True}
 
# Change it. (Python spells its booleans True and False.)
>>> config.values["do_something"] = False
>>> from sqlalchemy.orm.attributes import flag_modified
 
# flag_modified is necessary in this case, especially for complicated JSON structures.
# If you don't use it, you might discover that updated contents will never persist
# to the database because SQLAlchemy isn't aware that the field was changed.
>>> flag_modified(config, "values")
 
# Commit your changes.
>>> session.commit()

Or use the session directly:

 
# Here's what we want to save:
>>> values
{'do_something': True}
 
# Change it.
>>> values["do_something"] = False
 
# Update it in place, for instance using the ORM (or you could do the same thing
# using the expression language). You won't need flag_modified in this case,
# because you're overwriting the whole field.
>>> session.query(Config).filter(Config.id == 1).update({"values": values})
>>> session.commit()
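If remembering to call flag_modified sounds like a bug waiting to happen, SQLAlchemy's mutable extension can do the bookkeeping instead. The following is a self-contained sketch rather than a drop-in: it assumes SQLAlchemy 1.4 or newer, uses an in-memory SQLite database as a stand-in for PostgreSQL, and the Config model is my guess at a minimal version of the one in the examples above.

```python
from sqlalchemy import JSON, Column, Integer, create_engine
from sqlalchemy.ext.mutable import MutableDict
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Config(Base):
    # Hypothetical model; only the "values" column matters here.
    __tablename__ = "config"
    id = Column(Integer, primary_key=True)
    # MutableDict reports top-level key changes to the session for us.
    values = Column(MutableDict.as_mutable(JSON), nullable=False)


# In-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Config(id=1, values={"do_something": True}))
    session.commit()

    config = session.get(Config, 1)
    config.values["do_something"] = False  # no flag_modified call needed
    session.commit()

# A fresh session shows the change actually reached the database.
with Session(engine) as session:
    reloaded = session.get(Config, 1).values["do_something"]

print(reloaded)
```

One caveat: MutableDict only notices changes to top-level keys. If you mutate something nested deeper inside the structure, you're back to calling flag_modified yourself.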

The maddening bit about this was how long it took me to figure it out! But hey, I like to share. Hopefully this will save you some time.

Update October 16th, 2015: Included flag_modified usage for updating instances directly.

***

SQL and PHP

We have a few things to talk about regarding SQL and PHP. It’s short and sweet, so I’ll be brief.

I’ll start with saying that I’d like to put this tactfully, but unfortunately, I don’t know how else to say this:

It’s 2014 and we’re still not using parameterized statements.

Seriously.

I ran across a project that was linked from one of the various “discover GitHub” mailings I’m subscribed to that looks very promising. It’s called Lychee, it looks great, and the demo functions well. I can’t think of any other open source gallery systems at the moment that look as good as Lychee (or as simple). There’s just one problem: It’s an absolute haven for SQL injection attacks. Gosh, I don’t even know where to begin. Nearly every single query that accepts some form of external input is a potential landmine of shame, sadness, and roasted pandas. Worse, the last ticket addressing this particular issue was posted about a month ago. If there’s no activity this weekend, I might just go through and convert it to use parameterized statements myself and offer a pull request.

The other problem I have with it is the code formatting. It’s pretty ugly. It certainly wouldn’t pass PSR-2, and it reminds me vaguely of the ad hoc formatting found in most other popular PHP products (commercial and otherwise, so it’s not like Lychee is unique). But, that’s okay. Code formatting is mostly subjective, as long as you’re following some standard (even your own) and remain consistent. While some of us would rather split hairs over various stupid things, it’s mostly incidental because we care about appearance. Appearance does matter, because care and consideration for how the code looks probably reflects how the code operates. Pragmatically, though, appearance doesn’t matter that much. It can be fixed with automated tools. No biggy.

However, the SQL injections are unforgivable.

There are hundreds of resources on the matter, many targeting PHP. So it’s not like there’s much of an excuse. Heck, I’ve written about this in the past, and I’m not even a smart dude. If you haven’t heard of parameterized statements, either you’ve been living under a rock or you’re new at this.

The good news is that if you’re new to the big bad world of web software, don’t fret. The absolute best place to get started if you’re writing PHP is to read PHP the Right Way. It includes dozens of tips and tricks, information on established and emerging standards, and delves into a few advanced topics for larger applications. I would highly encourage you to read the entire thing. In particular, I would suggest reading the section on databases, because it covers PDO, SQL injections, and parameterized statements (often known as prepared statements). There’s nothing wrong with the MySQLi extension per se, but new PHP code should use PDO instead. PDO even has a lesser-known feature to fetch query results directly into objects, which can take care of some of the more mundane work (if you happen to swing that way).
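If you've never seen a parameterized statement in action, the idea fits in a dozen lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for PDO, since the principle is the same in any language; the albums table and the hostile input are made up for the example.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical albums table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO albums (title) VALUES ('Holiday')")

# The sort of input that wrecks a query built with string concatenation.
hostile = "x'; DROP TABLE albums; --"

# The ? placeholder keeps the value out of the SQL text entirely, so the
# database only ever treats it as data.
conn.execute("INSERT INTO albums (title) VALUES (?)", (hostile,))

# The same rule applies to lookups: bind the value, never splice it in.
match = conn.execute("SELECT id FROM albums WHERE title = ?", (hostile,)).fetchone()

titles = [row[0] for row in conn.execute("SELECT title FROM albums ORDER BY id")]
print(match, titles)
```

The hostile string ends up stored as an ordinary (if ugly) album title, and the table is still there afterward. PDO's prepare() and execute() give you the same separation in PHP.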

Don’t get me wrong. It’s great to see new projects popping up all the time challenging older, more established ones. It breathes new life into under-served markets and creates a greater body of useful tools for all of us. Sometimes a fresh look into old concepts helps us think outside the box, questioning established design principles, and jarring us back into the reality that there might actually be a better way of doing things. Please don’t feel this post is an outright criticism: It’s not. We all had to start somewhere. If you’ve never heard of SQL injections, well, now you have. Extra information in your arsenal of tools won’t hurt, but it certainly will help combat one of the most common design flaws in software today.

What you should take away from this post is that there are unsavory individuals out there who are deeply interested in easy-to-exploit flaws in software. These folks gain a great deal of enjoyment from imposing misery on others, either by writ of outright damage or embarrassment. Because of them, we have to develop for the web differently than we might have, say, 10 years ago.

So let’s start by teaching the newcomers not to repeat our mistakes!

***