Annoyances: vBulletin Version Check

I hate vBulletin. I really, really, really hate vBulletin, and as much as I’d like to pin it on their apparent disdain for K&R coding style (curly braces should, in my opinion, always be on the same line as the if statement to which they are bound) or their awful code in general, I’m afraid I can’t. Mostly, my experiences have been with vBulletin 3 and 4, but while I have no experience with vBulletin 5, I can’t imagine it’s any better.

vBulletin represents everything that is wrong with the PHP development community as a whole. Or rather, everything from the earliest, darkest ages of PHP antiquity that I’d rather just fade away. It’s 2015, and there’s absolutely no reason to continue encouraging people to do stupid, idiotic things when there’s plenty of fantastic documentation out there that would be much better reading for new developers.

Sadly, though, we have to contend with what have rather than what we want.

(Aside: XenForo isn’t much better, but at least they’re using the Zend Framework and provide at least something of a façade that they might actually know what parameterized statements are.)

Today, I wasted about 15-20 minutes on a confounding problem with a vBulletin plugin. The version check URL wasn’t working–at all. Why? Well, there’s the rub, isn’t it? Here’s what I had done: I put an XML file containing the appropriate nonsense for plugin versioning (which would fail XML validation, but that’s besides the point) with nginx sitting in front, pointed vBulletin at the URL, and… nothing.

What the…?

After some fiddly rubbish, I finally ran Wireshark to see what was going on. Lo and behold, it was obviously much more work to host some simple version checking nonsense than to just throw in a static page:

POST /some/path/to/version.xml HTTP/1.0
Host: example.org
User-Agent: vBulletin Product Version Check
Content-Length: 0
 
HTTP/1.1 200 OK
Server: nginx/1.8.0
Date: Tue, 28 Jul 2015 22:27:25 GMT
Content-Type: text/xml
Content-Length: 52
Last-Modified: Tue, 28 Jul 2015 22:21:13 GMT
Connection: close
ETag: "55b80059-34"
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000
Accept-Ranges: bytes
 
<version productid="test_product">1.0.0</version>

Did one of their developers read somewhere that POST is somehow better than GET? Not only did this violate the principle of least surprise (for me at least), but it also flies in the face of HTTP verb standards. GET is generally expected to be used for repeatable requests with no side effects. POST is generally expected to be used for requests that create a resource or otherwise perform tasks that have side effects. A version check should not, in my opinion, have side effects since the only reason for doing such a thing is to check the damn version in the first place.

An even more egregious problem in this particular case is the lack of any post data. Seriously, look! Content-Length: 0. Nothing. Nada. Zero. Why would you do this? Do I have to write some kind of script just to handle a stupid version check that does nothing but look here, examine the number, and say “Oh, hey, guess I better not update. It’s still the same!”

No. Or rather, Hell, no.

I hate to abuse nginx’s config for something of this nature, but since I haven’t finished setting up a distribution framework for these plugins, all I wanted to do was push some static files indicating the current version somewhere. So, after a cursory search, here’s what I stumbled on:

error_page 405 =200 $uri;

If nginx would otherwise return a “405 Method Not Allowed,” we redirect it to the URL we were going to retrieve anyway, and return that. It’s a horrible workaround, and there’s probably better, but I shouldn’t have to do this in the first place.

I suppose I could understand it if you were passing along some sort of token with the version check, perhaps for client identification or similar, but in this case it’s just used to determine whether a plugin has outstanding updates ready. Let’s GET real, people. Stop abusing HTTP verbs.

Oh, and that tag (yes, you know which one) is an inside joke. I’m not allowed to repeat it in mixed company.

No comments.
***

Porting from PyQt to PySide

I recently put together a small project in PyQt, but the license is something that makes me a bit unsettled. I wanted to release the sources under the MIT or BSD licenses, but the requirements of the GPL prohibit carrying out the spirit of either of these, at least in terms of practicality (no downstream commercial use). Thus, I’ve been somewhat torn: I appreciate the GPL, but I more deeply appreciate the MIT/BSD license family for the greater extent of freedom they afford. I disagree that allowing commercial use somehow precludes “freedom” because developers have to eat–and for that matter, I’ll probably purchase a PyQt license at some point in the future, if needed, because it is actively maintained by a small business.

Since starting the project, I rediscovered PySide, an LGPL-licensed alternative to PyQt that acts mostly as a drop-in replacement for the latter. There are some differences that require minor changes to your sources before it can act as a complete replacement, but they’re so minimal as to be of no consequence. Of the ones that I’ve encountered, I’ll document them here since it may not be immediately obvious (although some of the easier cases are).

Signal() and Slot() declarations

PySide’s signal and slot declarations are identical (more or less) to PyQt’s with the exception that they lack a “pyqt” prefix. Thus, the following code:

class MyDialog (QtGui.Dialog):
    progressFinished = QtCore.pyqtSignal()

Must be rewritten as:

class MyDialog (QtGui.Dialog):
    progressFinished = QtCore.Signal()

Likewise, @QtCore.pyqtSlot() decorators are supported by PySide but must be rewritten as @QtCore.Slot(). To this extent, PySide eliminates a handful of unnecessary characters and provides more obvious translation from C++.

setCheckState doesn’t accept integers

PyQt is somewhat more forgiving when passed Python data types and will usually attempt to do the right thing. PySide expects the bindings to follow their C++ cousins more or less exactly, including setting a checkbox’s state via setCheckState(). Thus, checkbox states cannot simply be assigned with the integers 0, 1, or 2 as is possible in PyQt. Instead, the developer must use QtCore.Qt.CheckState.Checked, QtCore.Qt.CheckState.PartiallyChecked, or QtCore.Qt.CheckState.Unchecked. This code does not appear to be compatible between PyQt and PySide.

Python data types may behave unexpectedly in signals

Similar to CheckState, signal arguments are not automatically wrapped (or converted) from Python data types to C++ data types. In one particular case I encountered, larger integers may generate “overflow” errors when exceeding the boundaries of a signed 32-bit integer (2.1 billion, or thereabouts). The only solution in this case is to declare the signal arguments as a Python object as per this example. Other incompatibilities may exist in signal declarations, but this is the one that I encountered fairly early on. Lists and dictionaries are correctly wrapped and behave as expected.

Most other behaviors (threads included) are identical between PyQt and PySide, which is useful for more complicated applications that require offloading activities to one or more threads and error messages appear similar enough (for the most part). More importantly, PySide is a mature enough alternative to PyQt for most uses, although I’ve yet to try it under Windows. It appears binary-only installation is supported on that platform.

No comments.
***

Password Hashing and PHP Insanity

I won’t delve into too many details in this post–it distracts from the crux of my argument–but I will take a few detours along the way. If you’re reading this, you have access to Google; thus, further research is left as an exercise to the reader.

Firstly, I want to make a point that’s been grating on me for a while. It’s 2014, and there are still some of you out there who insist on using MD5 (or some flavor of SHA) for password hashing. This is absolute insanity. Worse, this isn’t really a matter of insistence as much as it seems to be the fault of two things: Terrible tutorials that perpetuate a horribly broken practice followed by blissful developer’s ignorance. Yes, I know–you’re not a security expert–but neither am I! And I still know this isn’t something you should be doing. If someone as stupid as me can figure it out, then what the heck is going on out there? Burying your head in the sand doesn’t magically make fast hashing algorithms any better for password security.

Second, and most depressing, is the number of commercial products that have never used anything other than MD5 hashes for password storage and many continue using this archaic practice. vBulletin 3 (all versions), vBulletin 4 (all versions), and IP.Board up to at least version 3.4 (probably later) use some variation of salts plus MD5 to store password hashes. For all I know, it’s possible newer versions of the aforementioned software still do this for backwards compatibility. This means that if you’ve registered on almost any major forum on the Internet, there’s a very strong likelihood that if the site is compromised, your password is going to be discovered almost immediately. How immediate is “immediately?” Well, considering that as of 2012, a single Radeon GPU could generate something on the order of 8 billion MD5 hashes per second (assuming I read that correctly), most “strong” passwords are unlikely to survive a week or two. After a month, it’s unlikely any but the most absolutely convoluted password will be discovered. Even if your password is discovered, you probably won’t even know, because disclosure assumes the forum operator 1) knows they were compromised and 2) is forthcoming about the breach. Realistically, the best bet is to assume that any password you’ve ever entered on a message board has been compromised the moment you’ve entered it (they’re probably not using SSL anyway) and never reuse that password (using something like KeePass or KeePassX helps).

There are, of course, some exceptions to this rule. It’s my understanding that after the Ubuntu Forums were compromised some time ago they switched their authentication services to a home grown solution, so chances are your password is safer there than on traditional vBulletin message boards. Either way, the point remains.

The state of commercial PHP software is maddeningly pathetic. But there is a silver lining. XenForo, for instance, not only makes use of a well-tested popular framework (the Zend Framework to be specific), but they also use bcrypt for password hashing. This means that optimistically, there is hope we’ll eventually be rid of poor practices. (Good luck convincing the gaggle of misguided bloggers who still suggest MD5 + salts–it doesn’t matter how convincing they are, they’re wrong.) WordPress, phpBB, and many other open source packages also rely on bcrypt internally, so the inertial switcheroo is already nudging the pendulum in the right direction.

That said, there’s still one substantial problem with the ecosystem–well, make that two–PHP’s bcrypt support isn’t really feature complete in versions below PHP 5.5 and the use of Composer is still not as widespread as I’d like, thus limiting the use of libraries like password-compat to those of us who actually know about it (and it doesn’t work on versions below PHP 5.3.7). Thus, in versions of PHP less than 5.5 where the password_hash() function is unavailable, developers have to essentially roll their own salting algorithm. And we can easily guess where that leads us.

If you’re not already depressed enough, that’s okay. Anthony Ferrera wrote an interesting article a year and a half ago about how easy it is to screw up bcrypt implementations. When I first read the article and saw that he mentioned generating your own salts, I thought he was insane, because I knew that bcrypt implementations include salt generation as per the OpenBSD spec. Right? …right?

As it turns out, that’s not the case. In PHP versions less than 5.5, it’s up to you, the developer to supply the salt to crypt() so it can do the right thing. Worse, since almost no one ever bothers actually reading the spec (much less using one of several libraries already out there that implement it), there’s a non-zero probability that at least a half dozen implementations of bcrypt are either completely flawed (like this one–it uses microtime() piped into SHA1 for the salt–no, I’m not even joking) or so idiomatically incompatible with the official standards as to make the entire charade resemble a poorly organized three-ring circus (minus a few lions). And by “idiomatically incompatible,” I mean to say that such implementations probably work, and they might even validate with implementations that do follow the standard, but they are so counter to the intent and spirit of the standard that they may as well not even bother pretending to implement a bcrypt salt. Fortunately, there’s at least one GitHub gist that brings some sanity to the table, but it’s sadly not the only one discoverable via Google.

The fundamental problem with older versions of PHP is that the only way to follow the standard yourself is to either read from /dev/urandom (which might not be possible depending on the configuration of open_basedir) or use openssl_random_pseudo_bytes() and hope that a strong source of entropy is in use behind the scenes. For instance:

<?php
// We'll see if we're using a strong implementation behind the scenes.
$strong = false;
 
// Generate 128 bits of entropy and base64 encode it. "+" characters will invalidate
// the salt causing it to output the error/invalid salt indicator "*0".
$salt = rtrim(strtr(base64_encode(openssl_random_pseudo_bytes(128 / 8, $strong)), '+', '.'), '==');
 
// Generates something like "$2y$10$qK8pFSIPOnEqIESCzkMc7uVldEi/zb2bgiyUSB2kc8iIcTlAKTO5K"
crypt('This is a password.', '$2y$10$'.$salt);
 
// Complain.
if (!$strong) {
    echo 'Oh dear, the source of randomness might be a bit light-headed. Maybe it should sit down.'
}

Now, considering that most people writing code just want to get something working, it’s unlikely that they’ll do sufficient research to understand the limits of their tools, and even if they do understand the limits, they’re likely to make a mistake. Thus, the oft-repeated advice surfaces to chide anyone with such intent: “[use] a standard library.” Don’t roll your own. You’ll probably just screw it up.

Curiously, what started all this nonsense was when I searched earlier to determine what the maximum number of characters is that bcrypt will happily consume before discarding the rest. Although the spec suggests 72, it turns out that the question itself is a whole lot more complicated. Some implementations handle 72. Some assume the 72nd character is a null byte so they only do 71. Others, maybe 55. Or was that 56? Heck, maybe no one really knows. Or maybe some of these implementations will die off. So what do you do?

Before the thought crosses your mind, don’t even think about getting around the limit using a hashing algorithm. If a user enters a lengthy passphrase and you funnel their input through MD5 (for the sake of argument), you’ve probably stripped a substantial amount of entropy from their input. Perhaps that’s not a big deal (bcrypt is intentionally slow after all), but it introduces a potential source of incompatibilities with other implementations beyond those mentioned above, to say nothing of MD5’s well-known collisions, and who’s to say you’ll remember a year from now (or more) what changes you made to the password implementation when you’re suddenly tasked to add a new service or update the old one. Unless you’re migrating an old system and absolutely need compatibility with older software, you’re probably better off just resetting everyone’s passwords, sending an email explaining the situation (or displaying a notice at login), and using bcrypt behind the scenes.

Another plus side in all of this is that the PHP 5.5+ implementation of bcrypt appears to follow the standard. This means that a bcrypt password generated by PHP is interchangeable with one generated using Python (for instance). More importantly, the random salt generated by password_hash() is created by using /dev/urandom behind the scenes (for better or worse), and the appropriate system calls also appear to be made under Windows. Thus, upgrading to the latest PHP distribution is ideal. But you should be doing that anyway, shouldn’t you?

Now, I’ve already written about twice as much on the topic as I originally intended, so it’s probably time I cut this short. Heck, I wrote so much that I even forgot to proofread the entire thing, and here I am finally posting it a month later (!).

The take away from this is that you should be aware that everyone is probably doing password management wrong. It doesn’t even matter that the “bcrypt bandwagon” left the station two years ago, because there are thousands of legacy applications that are still alive and well with each one effectively an exploit away from having every single one of their user’s passwords released into the wild. The only thing you can do is to implement something stronger and hope for the best (but try not to invent it yourself). Otherwise, you may as well treat most of your passwords as compromised and don’t reuse them.

Can’t use bcrypt? Try PBKDF2 instead. It’s good enough for Apple’s iOS. Just please, stop (ab)using MD5.

No comments.
***