To 404 or not to 404?

If you’ve been following my (admittedly) rare posts over the course of the last few years, you’re likely to have noticed a growing aggression toward PHP as a language. It’s not that I hate the language per se (although I sometimes do), it’s that there’s so much crap written in PHP that it’s almost impossible to find something well-written that’s pleasant to work with (major props to Fabien Potencier of SensioLabs for writing some of the best PHP I’ve ever had the pleasure to work with–see, I can be even-handed!). There are some positive developments, particularly in commercial PHP, but all things considered, we’ve a long way to go.

XenForo (as an example) is leaps and bounds better than vBulletin in terms of actual design. They use a well-tested framework (Zend), mostly keep to the philosophy of separation of concerns, and isolate components from each other without polluting the global namespace (hi, vBulletin!). However, (you were waiting for this, weren’t you?) the coding style still remains abysmal (what’s wrong the K&R style, guys? seriously!) and it’s almost infuriating how much magic is used to create classes on the fly for various tasks. Need an example? Just look through some of the deferred tasks: Nearly everything is generated by factory classes that accept a class to instantiate as a string argument, then return the instantiated class. This absolutely screws with a developer’s ability to rationalize about types being passed around, and if it weren’t so infuriating, I think most of us would break down in tears. Do you want to see a grown man cry?

Yes, I can understand why the XenForo developers have made their choices (no need for a long switch statement, object selection for instantiation becomes the concern of the calling code, etc.), but it’s absolutely a tremendous pain in the arse when one happens to be trawling through the sources to figure out why something isn’t working. Let’s not even get into the whole bazillion singleton methods splattered everywhere like the ground zero of a misplaced spittoon. How’s that for a visual? I think I just gave myself a fist-bump.

But alas, I distract myself with unnecessary things. This post is supposed to be about 404 errors, isn’t it? And here I was about to launch into an argument about how there’s so many libraries out there that support dependency injection, you’d have to be insane to use a hundred different singleton classes.

Today I encountered a relatively curious thing with XenForo. John (of Forum Foundry, Inc.) was having some difficulty with the nginx configuration on one of his sites and had convinced himself it was his fault because of something he had changed (it wasn’t). Something, somewhere was causing a 404 page to crop up every time users would attempt to reset their passwords. It was bugging him, and he asked if I’d take a look. So I did.

After examining his nginx configuration, I was convinced he did nothing wrong. But disconcertingly, I grew increasingly more convinced that it wasn’t nginx’s fault, either. Strangely, though, the 404 page was being spat out by nginx–not by XenForo–so it seemed impossible for the fault to lay elsewhere. It had to be nginx. Yet, I found myself battling with a sort of cognitive dissonance: Posting (using curl or similar) to the lost-password URL returned a 200 OK. Using GET (or other methods) returned a 405 Method Not Allowed (as expected). If it were the fault of nginx, the 404 error should be returned regardless of the method (GET, POST, or otherwise).

We tried various things, mostly involving the nginx debug log, but couldn’t quite get a handle on the source of the problem. I was pretty sure PHP was somehow at fault, but I couldn’t duplicate bubbling a 404 error up from PHP to nginx. Puzzled, I sat back and thought about the problem, and on a whim decided to do some digging around. It wasn’t long before I stumbled on the documentation for nginx’s fastcgi_intercept_errors directive, which seemed as if it might exacerbate upward error propagation, from application code. From the documentation:

[fastcgi_intercept_errors] [d]etermines whether FastCGI server responses with codes greater than or equal to 300 should be passed to a client or be redirected to nginx for processing with the error_page directive.

What does this mean? It means that, if fastcgi_intercept_errors is enabled, HTTP response codes with a value of 300 or greater (of which 404 is one) will trickle up and be intercepted by nginx regardless of the application (PHP) layer’s intent. Thus, even if the application displays something useful with the 404 message, nginx will intercept the response and display its own spartan 404 page.

Then a lightbulb came on.

Around line 63 or so of the XenForo lost-password handler, a 404 response is generated if a user’s name or email address doesn’t exist when a request is submitted to reset their password. This means that, in all likelihood, if someone mistypes their username (or email), they’ll receive a 404 message from nginx and no error message from XenForo. Rather than attempting to hit the back button, change their submission, and try again, the affected user is likely to believe the site is broken and they are therefore unable to reset their password. This couldn’t be further from the truth.

But it gets better. I didn’t stumble on fastcgi_intercept_errors initially, because I couldn’t get the 404 responses to bubble up to nginx for interception on my system (Arch Linux). The reason for this is that the /etc/nginx/fastcgi_params that ships with Arch is fairly sparse (the intent being that if you need more specific parameters, you add them yourself). But on this server, that wasn’t the case, because fastcgi_intercept_errors was enabled. Moreover, tagging along with it was a number of other parameters (I initially though it was the fault of Ubuntu), but it became clear that these were not the defaults that shipped with nginx (it doesn’t ship with fastcgi_intercept_errors enabled, for instance) or Ubuntu. Thus, any time a 404 error was dispatched from the application code, nginx intercepted it and displayed its 404 error page instead.

How many others might be affected by this? Who knows. If there’s a tutorial out there that recommends adding fastcgi_intercept_errors to /etc/nginx/fastcgi_params (it turns out there might be–read the update below), it goes without saying that the suggestion is a terrible idea if you expect your applications to generate and handle 404 errors (or greater). Of course, for high performance configurations, you probably want nginx to do as much of the heavy lifting as possible, but in this particular circumstance, we encountered an issue where using certain configuration options can yield unexpected results. In other words, don’t blindly copy and paste “suggested performance tweaks” from random Internet blogs without first consulting the documentation. The results may surprise you. (I don’t know if that’s what happened in this case, but I certainly can’t rule it out. For all I know, it may have been–and probably was–an accident.)

Is this the fault of XenForo? I’d argue no. While sending a 404 error in response to members that don’t exist who have attempted to reset their passwords is probably intended to be a RESTful response (not a member? respond with a 404) that does something unexpected (nginx 404 instead), it’s not a bug per se. However, it’s a potential privacy concern: Anyone who has access to a large corpus of email addresses can deduce based on the response returned by XenForo whether or not those email addresses are registered with the server. By doing so, a potential attacker can then target those accounts that are known to exist on the server for nefarious deeds (or violate their privacy).

But here is where we diverge into a matter of directed concerns: Do you provide users (some of whom might be terrible typists) with immediate feedback, indicating that the information they submitted was wrong, thereby increasing the usability of the site; or do you protect users’ privacy by generating the same response whether or not the information was submitted correctly? There is no right or wrong answer to this question, and generally it boils down purely to the matter of usability. Do you value ease of use or privacy? You can’t have both. Therefore, the answer rests on the shoulders of the site operator to decide. For sites that value users’ privacy, perhaps it’s better to sacrifice some usability for security.

For everyone else, it’s probably best to leave /etc/nginx/fastcgi_params at its default unless you absolutely must change the defaults. Better: Place the tweaks you need in a separate file and include that file separately.

Update I think I may have discovered one of the culprits for fastcgi_intercept_errors here, listed as the “Ultimate Speed Guide for WordPress on NGINX.” Hint: Don’t blindly copy everything someone on the Internet tells you is a good idea. If you don’t know what something does, always read the documentation. If you don’t, it will bite you.

No comments.
***

Password Hashing and PHP Insanity

I won’t delve into too many details in this post–it distracts from the crux of my argument–but I will take a few detours along the way. If you’re reading this, you have access to Google; thus, further research is left as an exercise to the reader.

Firstly, I want to make a point that’s been grating on me for a while. It’s 2014, and there are still some of you out there who insist on using MD5 (or some flavor of SHA) for password hashing. This is absolute insanity. Worse, this isn’t really a matter of insistence as much as it seems to be the fault of two things: Terrible tutorials that perpetuate a horribly broken practice followed by blissful developer’s ignorance. Yes, I know–you’re not a security expert–but neither am I! And I still know this isn’t something you should be doing. If someone as stupid as me can figure it out, then what the heck is going on out there? Burying your head in the sand doesn’t magically make fast hashing algorithms any better for password security.

Second, and most depressing, is the number of commercial products that have never used anything other than MD5 hashes for password storage and many continue using this archaic practice. vBulletin 3 (all versions), vBulletin 4 (all versions), and IP.Board up to at least version 3.4 (probably later) use some variation of salts plus MD5 to store password hashes. For all I know, it’s possible newer versions of the aforementioned software still do this for backwards compatibility. This means that if you’ve registered on almost any major forum on the Internet, there’s a very strong likelihood that if the site is compromised, your password is going to be discovered almost immediately. How immediate is “immediately?” Well, considering that as of 2012, a single Radeon GPU could generate something on the order of 8 billion MD5 hashes per second (assuming I read that correctly), most “strong” passwords are unlikely to survive a week or two. After a month, it’s unlikely any but the most absolutely convoluted password will be discovered. Even if your password is discovered, you probably won’t even know, because disclosure assumes the forum operator 1) knows they were compromised and 2) is forthcoming about the breach. Realistically, the best bet is to assume that any password you’ve ever entered on a message board has been compromised the moment you’ve entered it (they’re probably not using SSL anyway) and never reuse that password (using something like KeePass or KeePassX helps).

There are, of course, some exceptions to this rule. It’s my understanding that after the Ubuntu Forums were compromised some time ago they switched their authentication services to a home grown solution, so chances are your password is safer there than on traditional vBulletin message boards. Either way, the point remains.

The state of commercial PHP software is maddeningly pathetic. But there is a silver lining. XenForo, for instance, not only makes use of a well-tested popular framework (the Zend Framework to be specific), but they also use bcrypt for password hashing. This means that optimistically, there is hope we’ll eventually be rid of poor practices. (Good luck convincing the gaggle of misguided bloggers who still suggest MD5 + salts–it doesn’t matter how convincing they are, they’re wrong.) WordPress, phpBB, and many other open source packages also rely on bcrypt internally, so the inertial switcheroo is already nudging the pendulum in the right direction.

That said, there’s still one substantial problem with the ecosystem–well, make that two–PHP’s bcrypt support isn’t really feature complete in versions below PHP 5.5 and the use of Composer is still not as widespread as I’d like, thus limiting the use of libraries like password-compat to those of us who actually know about it (and it doesn’t work on versions below PHP 5.3.7). Thus, in versions of PHP less than 5.5 where the password_hash() function is unavailable, developers have to essentially roll their own salting algorithm. And we can easily guess where that leads us.

If you’re not already depressed enough, that’s okay. Anthony Ferrera wrote an interesting article a year and a half ago about how easy it is to screw up bcrypt implementations. When I first read the article and saw that he mentioned generating your own salts, I thought he was insane, because I knew that bcrypt implementations include salt generation as per the OpenBSD spec. Right? …right?

As it turns out, that’s not the case. In PHP versions less than 5.5, it’s up to you, the developer to supply the salt to crypt() so it can do the right thing. Worse, since almost no one ever bothers actually reading the spec (much less using one of several libraries already out there that implement it), there’s a non-zero probability that at least a half dozen implementations of bcrypt are either completely flawed (like this one–it uses microtime() piped into SHA1 for the salt–no, I’m not even joking) or so idiomatically incompatible with the official standards as to make the entire charade resemble a poorly organized three-ring circus (minus a few lions). And by “idiomatically incompatible,” I mean to say that such implementations probably work, and they might even validate with implementations that do follow the standard, but they are so counter to the intent and spirit of the standard that they may as well not even bother pretending to implement a bcrypt salt. Fortunately, there’s at least one GitHub gist that brings some sanity to the table, but it’s sadly not the only one discoverable via Google.

The fundamental problem with older versions of PHP is that the only way to follow the standard yourself is to either read from /dev/urandom (which might not be possible depending on the configuration of open_basedir) or use openssl_random_pseudo_bytes() and hope that a strong source of entropy is in use behind the scenes. For instance:

<?php
// We'll see if we're using a strong implementation behind the scenes.
$strong = false;
 
// Generate 128 bits of entropy and base64 encode it. "+" characters will invalidate
// the salt causing it to output the error/invalid salt indicator "*0".
$salt = rtrim(strtr(base64_encode(openssl_random_pseudo_bytes(128 / 8, $strong)), '+', '.'), '==');
 
// Generates something like "$2y$10$qK8pFSIPOnEqIESCzkMc7uVldEi/zb2bgiyUSB2kc8iIcTlAKTO5K"
crypt('This is a password.', '$2y$10$'.$salt);
 
// Complain.
if (!$strong) {
    echo 'Oh dear, the source of randomness might be a bit light-headed. Maybe it should sit down.'
}

Now, considering that most people writing code just want to get something working, it’s unlikely that they’ll do sufficient research to understand the limits of their tools, and even if they do understand the limits, they’re likely to make a mistake. Thus, the oft-repeated advice surfaces to chide anyone with such intent: “[use] a standard library.” Don’t roll your own. You’ll probably just screw it up.

Curiously, what started all this nonsense was when I searched earlier to determine what the maximum number of characters is that bcrypt will happily consume before discarding the rest. Although the spec suggests 72, it turns out that the question itself is a whole lot more complicated. Some implementations handle 72. Some assume the 72nd character is a null byte so they only do 71. Others, maybe 55. Or was that 56? Heck, maybe no one really knows. Or maybe some of these implementations will die off. So what do you do?

Before the thought crosses your mind, don’t even think about getting around the limit using a hashing algorithm. If a user enters a lengthy passphrase and you funnel their input through MD5 (for the sake of argument), you’ve probably stripped a substantial amount of entropy from their input. Perhaps that’s not a big deal (bcrypt is intentionally slow after all), but it introduces a potential source of incompatibilities with other implementations beyond those mentioned above, to say nothing of MD5’s well-known collisions, and who’s to say you’ll remember a year from now (or more) what changes you made to the password implementation when you’re suddenly tasked to add a new service or update the old one. Unless you’re migrating an old system and absolutely need compatibility with older software, you’re probably better off just resetting everyone’s passwords, sending an email explaining the situation (or displaying a notice at login), and using bcrypt behind the scenes.

Another plus side in all of this is that the PHP 5.5+ implementation of bcrypt appears to follow the standard. This means that a bcrypt password generated by PHP is interchangeable with one generated using Python (for instance). More importantly, the random salt generated by password_hash() is created by using /dev/urandom behind the scenes (for better or worse), and the appropriate system calls also appear to be made under Windows. Thus, upgrading to the latest PHP distribution is ideal. But you should be doing that anyway, shouldn’t you?

Now, I’ve already written about twice as much on the topic as I originally intended, so it’s probably time I cut this short. Heck, I wrote so much that I even forgot to proofread the entire thing, and here I am finally posting it a month later (!).

The take away from this is that you should be aware that everyone is probably doing password management wrong. It doesn’t even matter that the “bcrypt bandwagon” left the station two years ago, because there are thousands of legacy applications that are still alive and well with each one effectively an exploit away from having every single one of their user’s passwords released into the wild. The only thing you can do is to implement something stronger and hope for the best (but try not to invent it yourself). Otherwise, you may as well treat most of your passwords as compromised and don’t reuse them.

Can’t use bcrypt? Try PBKDF2 instead. It’s good enough for Apple’s iOS. Just please, stop (ab)using MD5.

No comments.
***

Updating PostgreSQL JSON fields via SQLAlchemy

If you’ve found this article, you may have discovered that as of PostgreSQL 9.3, there’s no immediately obvious way to easily (for some value of “easy”) update JSON columns (and their fields) in place like you sort of can with HSTORE when using SQLAlchemy. Supposedly, PostgreSQL 9.4 may be adding this feature, which means we’ll have to wait some time thereafter for support to appear in psycopg2 and, by extension, SQLAlchemy.

Update December 20th: PostgreSQL 9.5 will be adding support for incremental updates but you must use JSONB column types. The reason for this is explained in the manual but essentially boils down to this: JSON is stored as text, JSONB is pre-processed into a binary format and stored internally. Besides, you really ought to be using JSONB instead!

Anyway, as it turns out, updating JSON isn’t a big deal, and SQLAlchemy has some really great support for JSON columns. The only thing you need to do is update your JSON field as you would on any other instance if you’re using the ORM, but you still can’t update a specific subset of the JSON column (for that, it needs DBMS support). It’s that easy.

Here’s a contrived example using the Python shell:

# Fetch our object. Pretend there are no errors and it's guaranteed to exist.
>>> config = session.query(Config).filter(Config.id==1).one()
 
# Our object in this case is a configuration instance with some JSON data.
>>> config.values
{"do_something": true}
 
# Change it.
>>> config.values["do_something"] = false
>>> from sqlalchemy.orm.attributes import flag_modified
 
# flag_modified is necessary in this case, especially for complicated JSON structures.
# If you don't use it, you might discover that updated contents will never persist
# to the database because SQLAlchemy isn't aware that the field was changed.
>>> flag_modified(config, "values")
 
# Commit your changes.
>>> session.commit()

Or use the session directly

 
# Here's what we want to save:
>>> values
{"do_something": true}
 
# Change it.
>>> value["do_something"] = false
 
# Update it in place, for instance using the ORM (or you could the same thing using
# the expression language). You won't need flag_modified in this case, because
# you're overwriting the whole field.
>>> session.query(Config).filter(Config.id == 1).update({"values": values})
>>> session.commit()

The maddening bit about this was how long it took me to figure it out! But hey, I like to share. Hopefully this will save you some time.

Update October 16th, 2015: Included flag_modified usage for updating instances directly.

7 comments.
***