Musing about… Prepared Statements

A spurt of curiosity this evening–more specifically, one of those circumstances we each have from time to time wherein a handful of unrelated thoughts flutter about the conscious mind like a pair of butterflies flitting from flower to flower–consumed me sufficiently that I decided to do a brief Google search on prepared statements. I’m unsure where such a motive originated, but I’m fairly convinced that it was at least tangentially related to some misinformation I’ve heard of late related to web programming advice and also possibly due to my surprise that few commercial PHP bulletin board packages actually use prepared statements.

Before I begin, let’s consider for a moment that last and most disconcerting statement: Few commercial PHP forums use prepared statements. To the uninitiated, this might seem to be a matter of nit-picking unimportant to the real world. To the rest of you, it may come as a sad commentary on the state of modern programming and commercial software (perhaps, fittingly, as a commentary on the average run-of-the-mill PHP programmer). Prepared statements certainly aren’t new, and while they’ve been a part of PHP for a number of years now, it’s infuriating that they hardly see common use.

PHP first shipped PDO with PHP 5.1 (available as a PECL extension for PHP 5.0, circa 2004-2005). Intriguingly, for systems that don’t provide PDO support (or the appropriate drivers for PDO), the MySQLi and PostgreSQL functions and classes have provided prepared statements for quite some time, and the SQLite 3 drivers have provided a prepare() method since PHP 5.3. Commercial bulletin boards, like vBulletin and IPB, have seen many revisions in the years since, and several free/open source packages, including phpBB, have undergone similarly major overhauls. Yet the overwhelming majority of them still make no use of prepared statements. Humorously, as of this writing, vBulletin does provide a misleadingly-named sql_prepare method in its database class, but it doesn’t emulate prepared statements–it simply provides an escape wrapper with data type introspection and casting.

PDO has been available for nearly 8 years and many RDBMS drivers for PHP have offered prepared statements for at least that long (longer in the case of PostgreSQL if memory serves correctly). Yet every year or two, new major versions of popular PHP message boards are released, and every major release sees the same legacy database code under the hood. Perhaps it’s intentional. Perhaps the developers still want to support PHP 4.x in spite of the fact that it went EOL in 2008. Perhaps they just don’t know any better. Who knows!

Why Prepared Statements?

A prepared statement (or parameterized statement, as it is occasionally known in DBA parlance) is a specially-formatted SQL string that uses placeholders–question marks (?), named parameters (such as “:name”), or other database-specific tokens–to indicate to the database or the driver where data is to be inserted. This has the benefit that, in theory at least, any data managed by a prepared statement is unlikely to serve as a vector for SQL injection attacks. This works because most drivers dispatch the prepared statement and its data separately on the wire and process them independently, providing a certain degree of isolation. But wait, there’s more! Because of how prepared statements are implemented on most platforms, the query planner can often optimize and partially compile the statement such that, if it runs again, much of the legwork has already been completed and the query can run faster. Software like forums or blogs often executes the same query multiple times–with different data–so one might think prepared statements would be a natural fit. If they’re such a good thing, why do so many popular packages forgo the benefit?
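To make the placeholder idea concrete, here is a minimal sketch of a parameterized query. The post's subject is PHP's PDO, but the concept is identical across drivers; I'm using Python's sqlite3 module so the example is self-contained, and the table and data are invented for illustration:

```python
import sqlite3

# Throwaway in-memory database; the schema and rows exist only for this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# The ? placeholder marks where the datum goes; the value travels to the
# driver separately and is never spliced into the SQL string itself.
row = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()
print(row[0])
```

The PDO equivalent is the same shape: `$stmt = $pdo->prepare('SELECT name FROM users WHERE id = ?');` followed by `$stmt->execute([1]);`.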

While I can’t answer for many developers, I think I know what at least part of the answer might be. First, for enormous code bases like vBulletin (and phpBB to a lesser extent), virtually no effort is made to separate the application logic from the underlying model. I’ll be fair in my distinction: The presentation layer is thankfully separated from the mess in the form of templates, but the remaining code is a bowl of spaghetti not unlike that of the very first PHP (and Perl!) applications that graced the Internet over a decade ago. Because the model (and, by extension, the SQL) is so deeply entrenched in the functional logic of the application, reworking it to use prepared statements–and consider, also, that many of these queries are generated programmatically–would be a tremendous undertaking of many man-hours. Cleaning up the code properly such that it is more of a structurally sound framework (think MVC) is most certainly out of the picture. It isn’t impossible, of course, but when you consider that some functions in many of these software packages have persisted since the dawn of time, such refactoring becomes the thing of fairy tales.

To illustrate some of my displeasure, vBulletin version 4.2 still provides an iif function which is little more than a wrapper for the ternary operator (?:) in PHP. The ternary operator has been around since at least PHP4, yet there it is, in all its glory, a legacy function still available from the early days of PHP3 when such a beast didn’t exist!
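Beyond being redundant, a function wrapper like iif has a real semantic drawback compared to the native ternary: function arguments are evaluated eagerly in PHP (and most languages), so both branches run no matter which one is returned. A small sketch in Python (used purely for illustration; this iif is a hypothetical stand-in mirroring vBulletin's wrapper):

```python
def iif(cond, if_true, if_false):
    # By the time this body runs, BOTH branch expressions have already been
    # evaluated by the caller -- unlike the native conditional operator.
    return if_true if cond else if_false

evaluated = []
def branch(label):
    evaluated.append(label)
    return label

# Wrapper: both arguments are computed, even though only one is returned.
iif(True, branch("then"), branch("else"))
assert evaluated == ["then", "else"]

# Native conditional: only the taken branch is ever evaluated.
evaluated.clear()
result = branch("then") if True else branch("else")
assert evaluated == ["then"]
```

If either branch is expensive or has side effects, the wrapper silently does twice the work.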

One might think that it would simply be a matter of adding some logging code to old function calls, tracing the source that called them, and then reworking the culprit code to use built-in language features. It might even take less than an afternoon.

The Drawbacks… Er, Programmer Mistakes

While prepared statements (parameterized queries for those of you who are embarrassingly excited by more elaborate verbiage) aren’t a panacea (I did it again) for all things SQL injection-like, they’re a good mitigation strategy, but it’s important to use them with caution. As Jason Lam states on the ISC Diary, “I still remember 4-5 years ago when SQL injection just started to become popular, the common mitigation suggested [was] to use prepared statement [sic] as if [they were] a magic bullet. As we [now] understand the SQL injection problem better, we realize that even prepared statement can be vulnerable to SQL injection as well.”

Well, yeah. This is where I smack my forehead. Maybe I’m being overly critical as I re-read a post from 5 years ago, because I’ve had the tremendously good fortune of witnessing some magnificently terrible code in my time as a web application developer.

Mr. Lam goes on to explain the insertion of unchecked user input, but I can’t shake the feeling that there is an implicit overtone in the article that it is somehow the fault of prepared statements. Perhaps more accurately, the article is faulting most of us for having championed prepared statements as a welcome solution to a very common and widespread problem. Realistically, though, it’s not an issue with prepared statements–they work just fine. It’s an issue with developers inappropriately using the tools at their disposal and doing so in a manner that simply transfers the vulnerability from query() to prepare() by forgetting to properly manage incoming data. That said, I’m inclined to suggest that programmatically assembling a prepared statement is somewhat counter-productive. More on this later.

Ironically, while doing some research for this article, I ran across a couple of posts on Stack Overflow that presented this problem of unchecked user input as one of the primary drawbacks of prepared statements. Really? Drawbacks? If you’re not using named parameters or placeholders for your query data, you’re probably not using prepared statements correctly! But drawbacks? Gee, maybe we were a little too vigilant in telling people to use prepared statements–so much so that they did a find/replace for query and swapped it with prepare. (I’m being facetious; so, to head off any comments to the contrary, it’s not at all possible to simply swap some text, because prepared statements do require a little more work.)

The problem I have with labeling unchecked user input as a drawback of prepared statements is that it is no longer a “real” prepared statement whenever such data is concatenated with the resulting query. Yes, it is still a prepared statement, insofar as calling prepare() on the driver’s end, but it’s no longer being used like a prepared statement. Here’s a hint to new developers, particularly PHP developers since a huge percentage of them are guilty of doing stupid things like this: Never concatenate unchecked input in any query–prepared or otherwise. If you’re using a prepared statement, use it like a damned prepared statement. The moment you start piping data into the query string itself, it’s no longer going to have the benefits of a prepared statement. (I’ll give you a special exception if you’re using LIMIT and concatenating integers with your queries since not all of you may be running MySQL 5.0.7 or later.)
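To see why concatenation forfeits the protection, here is a minimal sketch of the classic ' OR '1'='1 trick against a toy table, again using Python's sqlite3 for illustration (the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

malicious = "' OR '1'='1"

# Concatenated: the input becomes part of the SQL, and the injected OR
# clause matches every row in the table.
injected = conn.execute(
    "SELECT name FROM users WHERE name = '" + malicious + "'"
).fetchall()

# Parameterized: the same input is bound as an inert string value and,
# since no user is literally named "' OR '1'='1", matches nothing.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(injected), len(safe))
```

The same string, routed through a placeholder instead of the query text, goes from matching everything to matching nothing.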

Will the Real Prepared Statement Please Step Forward?

In my mind, and trust me, it’s a very strange place in here, a prepared statement is one that may contain parameters and is “prepared” ahead of time for reuse (that is, compiled) by the driver or the RDBMS (usually the RDBMS). Nothing more, nothing less. The instant some unfiltered data is slapped on to the end of the query, it’s no longer a pure prepared statement; instead, it becomes a mistake. Again: Prepared statements are parameterized queries that are usually compiled by the backend for a little extra speed. A query can contain anything else the programmer adds to it, but fundamentally, a prepared statement dictates a very specific structure. It certainly cannot overcome the mistakes of a naive developer who, believing that a prepared statement will magically fix all of their (singular they, sorry linguistic purists) security-related woes, uses such a tool alongside dangerous techniques like concatenating unchecked input. Another way to look at it is this: If prepared statements are prepared (that is, compiled) by the database for reuse, and the developer concatenates a dynamic value to the statement, the entire benefit of preparing (compiling) that statement is immediately lost, because the RDBMS has to re-compile the statement every single time it’s sent along the wire. Please, don’t do this.
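The prepare-once, execute-many pattern described above looks like this in practice: one fixed statement, compiled once, with only the bound values changing per execution. Python's sqlite3 again serves as a stand-in for PDO, and the table is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, body TEXT)")

# One statement, one compilation; executemany binds fresh values against
# the same compiled plan for every row. Splicing dynamic SQL into the
# string instead would force a re-parse on every call.
rows = [(i, "post #%d" % i) for i in range(100)]
conn.executemany("INSERT INTO posts (id, body) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(count)
```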

Of course, there may be reasons not to use prepared statements all the time. For one, prepared statements in MySQL versions prior to 5.1 cannot be served by the query cache, which may impact performance (High Performance MySQL, 2nd ed., p. 229). DBMSs that don’t support prepared statements can, in PDO at least, have them emulated by the PDO driver at the cost of some pre-processing performance, and older PHP functions like the popular-but-now-deprecated mysql_* ones outright don’t support anything but basic queries (they also don’t use the binary interface, making them somewhat slower). If you’re using only a single query with absolutely no intention of reusing it, prepared statements may incur some overhead since the query must be compiled. Furthermore, for MySQL at least, if you’re not using stored procedures, the database has no way to share compiled prepared statements among multiple connections. Yet, while a prepared statement is no substitute for caution–particularly with programmatically-generated queries–it is a useful tool in the developer’s arsenal to protect against attacks like SQL injection. Plus, if you make it a habit to use PDO (you should), not only do you get emulated prepared statements for databases that don’t support them, you also get the modern MySQL APIs under the hood and some consistency, which says a lot in the PHP world.

Tangentially, this is also why it boggles my mind that many sites (banks, news agencies, airlines, and even some game companies) limit what characters the user can enter for their password, and how so many companies with an online presence impose draconian limits of 16 characters or fewer. Seriously: If you’re correctly storing a secure hash of the password (HMAC, bcrypt, scrypt, or at least SHA256 or similar), you don’t need to store the password directly, nor does it matter if the password is 5 or 500 characters. The stored value is going to be a fixed-length string drawn from a limited character set, which can be stored without much fuss. The 1990s were over two decades ago. I think it’s time we stopped writing code like Y2K is still a viable problem.
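The fixed-length point is easy to demonstrate: a digest's size depends on the algorithm, not on the input. (SHA-256 alone is not a complete password-storage scheme, as the paragraph above notes; you would want a salted, slow hash like bcrypt or scrypt in practice, but the length argument is the same.)

```python
import hashlib

# A 5-character password and a 500-character password both hash to the
# same fixed-size value: 64 hex characters for SHA-256.
short_digest = hashlib.sha256(b"12345").hexdigest()
long_digest = hashlib.sha256(b"x" * 500).hexdigest()
print(len(short_digest), len(long_digest))
```

Either way, the storage column is a constant width regardless of what the user typed, so there is no engineering reason to cap passwords at 16 characters.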

Also, let’s start using prepared statements a little more often.

***

The Arch Linux Rant – Part Two

As I mentioned early on in part one (you may read it here), comments like this are somewhat frequent on the “newbie” forum over at the Arch Linux forums, although the one I linked is probably the mildest of the lot. While they follow the same motif, they range from veiled frustration that Arch doesn’t seem to “work” like the poster’s favorite distro, to complaints that it requires too much effort, to claims that Debian/Ubuntu/<insert distro here> is so much easier (for some value of easy). Fortunately, the majority of these posts are filtered out quickly by the moderators and dealt with accordingly (see the “Topics Going Nowhere” forum for samples). It’s one of the reasons I love the Arch community–the wheat is separated from the chaff early on, and consequently, most of the threads that live beyond a day or two tend to be useful, interesting, or informative.

Now, fair warning: I’ll be illustrating my point with a fair amount of allegory. There are also a few rather broad generalizations ahead, and I’m aware of this. I also don’t address related issues like self-sustenance or the benefits of having broad, practical knowledge (I’ve always wanted to make my own cheese, for instance). This is intentional.

So, let’s begin.

The problem, fundamentally, is not one that’s endemic to any one Linux distro in particular. It’s actually a consequence of society coddling the fools and bringing up the helpless. Before you feel offended, let me clarify that I don’t mean to sound brash. It isn’t the fault of those who are helpless per se. An efficient, productive society creates specialized niches, and sooner or later there’s a threshold that’s crossed where it’s no longer possible for any one individual to know everything they need to know in order to function without outsourcing some of their daily activities.

I realize that the word “outsourcing” has developed exceedingly negative connotations recently, particularly in the US, but it’s the only term that fits. Outsourcing isn’t inherently wrong, nor is it evil. In fact, you might be surprised to learn that you do it all the time. When you go to the grocery store, you’re buying products that have essentially been outsourced for you: Rather than baking your bread, you outsource the baking to the store and exchange some currency for the finished product. Rather than farming your own wheat or milling your own flour, you purchase bagged flour at the store. An efficient society is one where the individual can focus on other tasks (like programming) instead of being overly concerned about day-to-day needs (such as growing food). An efficient society therefore approaches a point where individual specialization is such that we’re all “helpless” to a point. For example, I’m not particularly mechanically inclined and wouldn’t know where to begin to look to fix a car, but I have family and friends who often look to me for help with technical matters (or building their own systems). We outsource to each other. Put another way, you and I outsource our issues in areas where we lack the expertise or the domain specific knowledge to those who have it.

As I alluded to earlier, this sort of helplessness isn’t a bad thing. It’s the hallmark of a productive, efficient society. However, it’s also important to know and understand these individual limitations before embarking into areas where one lacks the required knowledge. Ignorance is a refusal to learn; arrogance is a refusal to listen. Unfortunately, ignorance and arrogance often play a part in some of the discussions I’ve seen, and it’s all too common that the two traipse around together causing mischief where there otherwise should be none.

The problem then is that the efficiency of society–of outsourcing–has created a climate where a certain subset of the population develops the mistaken belief that their problems are the fault of others who are unwilling to help them. They fail to understand the difference between outsourcing a product, such as bread, and voluntary efforts like Arch Linux which require the consumer to be his or her own navigator in seas uncharted. Perhaps the most sinister thing about this lack of understanding is that it’s not primarily cognitive–the consumer in this scenario fully understands that open source projects are largely staffed by volunteers–but it is a form of devious dissonance that leads them to behave as though they’re still outsourcing. It therefore expresses itself as a deep seated inability to make the correlation between F/OSS (Free/Open Source Software) and volunteer labor versus outsourcing and paid-for products. I won’t cover it further here, but if you’re interested in a much more lengthy discussion along these lines, it would behoove you to read the excellent article Linux is not Windows.

I should point out that it’s one thing to say “Help! I don’t have a clue what I’m doing! Can someone point me in the right direction?” but it’s another thing entirely to say “Help! I can’t get my favorite feature to work. It worked fine on my previous platform. Can yours not do this too?” The former expresses ignorance combined with a willingness to learn–this is a good thing, because it’s easily correctable. The latter combines both ignorance and arrogance, culminating in a distinct unwillingness to do research while placing the blame on those who would otherwise offer their help.

But I digress.

To continue: Because of the efficiency and specialization that our society encourages–another good thing–there are some individuals who therefore think that their problem is not their own. The “problem ownership” is then shifted, in their minds, to the people trying to help. This is exacerbated by those who refuse to do any legwork on their own, and it is this personality type that often leaves with unresolved problems and a certain level of anger.

What this means for Arch–though it isn’t limited to Arch–is that there’s always going to be some small number of people who aren’t necessarily willing to help themselves. The clueful ones may eventually acknowledge the error of their ways, apologize, and possibly redeem themselves after some embarrassment. Others are beyond redemption, and when they find their repeated inquiries for help are no longer wanted on the Arch forums, they’ll eventually become a bother to someone else.

However, I think that at least a small part of the problem stems from the evangelism we Arch users periodically exhibit. (Yes, I’m guilty of this.) Because we tout the benefits of rolling-release distros and the simplicity of Arch, often without the appropriate warnings attached, newbie users hear these claims and immediately develop an unrealistic ideal of what Arch is (and can never be) and then project that into their pleas for help. On at least two occasions I’ve seen this develop, and the easiest way to spot these types is from remarks such as “Arch is supposed to be the best rolling-release distro, but I’ve never had these problems with ${other_distro} before!”

The best solution, of course, is to amend our evangelism with warnings like “Arch isn’t for everybody,” but I doubt that would work. We’re all aware that most people gloss over warnings (myself included) whenever a specific threshold of positive remarks is reached. If one’s mind is made up that the road is paved in gold, little attention will be paid to the sign that reads “there be dragons!”

The next best solution is to educate in addition to evangelizing. If you know someone who’s interested in Arch, point them to the forums and the wiki, but be sure to emphasize that Arch is a do-it-yourself distro. Educate interested souls on the merits of problem solving and figuring things out. (Bonus points if you link them to the late Richard Feynman’s book The Pleasure of Finding Things Out as a tongue-in-cheek gesture.) If your friend is someone who isn’t particularly interested in or adept at solving problems, Arch is very likely not the best match. That’s not to say they may not benefit from learning Arch, but you’ll likely save them (and yourself) from some frustration in the event of mismatched expectations.

I really like Arch Linux, and I’d love it if everyone I knew used it. I also think that’s an unrealistic expectation, because I know that not everyone finds enjoyment from the same things I do. If you’re considering using Arch, you need to be aware that you’re going to run into problems. The difficulty of the problems you’re likely to encounter will be determined by your relative skill; the more skilled you are, the easier a potential problem will be to solve. There will also come a time when an update may break the system, and you must be ready to spend an hour or more without access to a graphical environment (shell only). It also helps to familiarize yourself with chrooting Arch in the event you have to rescue your system from a live CD. You must also read the front page news articles prior to an update process, because important information about potentially breaking changes is posted there (if applicable). If this sounds like too much work, you may have to concede defeat. Arch may not be for you.

***

The Arch Linux Rant

This post is part one in a two part series (part two can be read here). In this post, I’ll explore some of my thoughts and feelings about Arch from a fairly high level vantage point–consider it a 50,000′ review. In the next post, I’ll explain why I feel comments like this one demonstrate a certain degree of naïvety, although it’s by far one of the more benign ones I’ve encountered in my short time partaking in the Arch community.

For those of you who know me or, at the very least, read my (mostly) pointless and fairly infrequent rants, you’ll probably recall my decision to leave Gentoo. Some eight or nine months later, I changed my mind and stuck with it for a while longer. That’s no longer the case, and as of January this year I permanently switched over to Arch Linux. I’ve been holding off on posting this rant primarily to take a “wait and see” approach to determine how well I’d do without Gentoo.

The short version is thus: I haven’t missed Gentoo one bit. I started about a year ago (maybe less), running Arch in a VM, tweaking settings, playing around with KDE 4.7-something, and spent quite some time familiarizing myself with the system. Coming from Gentoo and FreeBSD, the transition wasn’t that bad–I’m used to taking a greater hands-on approach with my systems than most people might be comfortable with–so ultimately, Arch felt like a natural fit.

Then I took the plunge.

For the interested, I should share a little bit of history. My initial foray into the world of Unix and Unix-like operating systems began around 2000ish (actually earlier, but it wasn’t until then that I started tinkering with them for my own purposes) with OpenBSD. I later transitioned to FreeBSD for a variety of reasons–mostly the improved performance and greater compatibility (at that time) with other software. It wasn’t until quite some time after that when I began using the ports collection, and prior to then I made a habit of configuring and compiling most of the software I used by hand. As my software library began to increase, so too did the amount of time I needed to invest in updating it. I eventually began looking for other solutions, although I kept a FreeBSD system at home for a number of years thereafter.

Sometime later, around late 2003 or early 2004 (possibly as late as 2005), I began experimenting with Gentoo. While I tried other Linux distros, I found most of them too alien in contrast with my beloved *BSDs. Their package managers were strange and sometimes convoluted, and I had little idea how to properly install custom configured software as I was prone to doing. The BSD way worked, but I knew it wasn’t optimal since /usr/local seemed to be mostly unused or remained distro-specific as to its preferred usage. Gentoo was a reasonable fit: Its well-documented nature, a ports-like collection (portage), and the tendency to keep mostly out of the user’s way beckoned me to take a closer look. Better yet, though the system was compiled mostly from scratch, configuring individual options and customizing packages was simple and very BSD-like.

I’ll keep this as short as I can, but I’ll just say this: I ran Gentoo on my desktop and as a file server for a number of years, probably from 2005-2006 until just last year (2011). Indeed, in 2006, I made the switch to use Gentoo almost exclusively, even while I was going to university. Later that same year, my FreeBSD install on my file server was gone, victim to the spread of Gentoo. Don’t get me wrong: I still love FreeBSD–and Gentoo, too–but for home use FreeBSD didn’t seem cut out for what I wanted to do. Granted, I’d still run it in circumstances where I needed a stable, long-running platform as a web service or similar, but at that time, Ports often lagged behind Gentoo and sometimes the only way you could install software was to download and compile it yourself. Not that there’s anything wrong with that, of course. (N.B.: In 2010, the situation somewhat reversed: Ports was updated with newer software versions while Portage stagnated. Strange what a few years will do.)

So, the transition to Gentoo was made because everything I wanted was just an emerge away. Things remained like this for a long time, too, because Gentoo was at the bleeding edge, and the convenience of a rolling-release plus fairly simple package management was seductive. I think that’s why I had a hard time adopting other distros. Frequent, new releases of software allowed me to continue sampling what was coming down the turnpike and forced me to stay up to date on current software.

Gentoo was not without its warts, of course. I won’t go into detail here as I’ve linked to my thoughts on the matter earlier in this post–and they should be mostly up to date–but the problems that began as a slow trickle eventually turned into a torrent of disaster. Occasional dependency conflicts requiring manual intervention, lengthy X server compiles (and let’s not get into glibc or a desktop environment like KDE), the lack of package maintainers, and a slow, downward spiral into the abyss led me to look for alternatives. I didn’t want to part with the rolling release model; indeed, once you’ve sampled such ambrosia, it’s difficult to whet your palate with meager release-based distros. Around this time (late 2010, early 2011), I was participating fairly regularly in various Slashdot discussions and made a similar offhanded remark about my concerns with Gentoo, and someone suggested trying Arch Linux. So I did.

The thing that struck me the most about Arch was its simplicity. There’s very little that it does for you. System configuration using the standard init scripts is exceptionally simple and straightforward (mostly in /etc/rc.conf), and with the exception of the package manager and build system, Arch makes Gentoo look like an impenetrable fortress of automation. Within a few days of using Arch, I fell in love with the brilliantly minimalistic design.

Truth be told, there’s a lot about Arch to love and enjoy. It does take some time to get the system configured to your liking, but the installation process is mostly painless and fairly simple (bugs in the installer notwithstanding). Using an AUR helper like yaourt isn’t necessary, but it is strongly recommended. It also helps to have at least a passing understanding of how makepkg and pacman work and interact. Of course, it isn’t necessary to use extras like the Arch Build System (ABS) and the Arch User Repository (AUR), but to ignore them is to do yourself a great disservice: There’s so much software available on the AUR that it makes Ubuntu’s impressive repositories and widespread support (through .debs) seem almost anemic.

However, I don’t think it’s possible to use Arch for very long before you’re drawn in by the allure of creating your own PKGBUILDs. I think that’s at least part of what I like most about Arch. Unlike Gentoo’s ebuilds, PKGBUILDs are simple. Armed with just a basic understanding of sh-like syntax (e.g. bash), a moderately skilled user otherwise unfamiliar with PKGBUILDs could put together a custom one in under an hour. More advanced users could piece together PKGBUILDs customizing their favorite software in an evening or two (mostly dependent upon how many packages they want). But here’s the trick: If your PKGBUILD fails for whatever reason, it’s unlikely to break your system save for inappropriate dependencies or “conflicts” statements. Since pacman (Arch’s package manager) examines the contents of packages, including those generated by custom PKGBUILDs, and determines where each file in the archive is to be placed, it takes an exceptionally stupid habit (usually using -f for force) to circumvent pacman’s all-seeing eye.

To illustrate: When Arch released GIMP 2.8, I was disappointed by some of the new features. The solution? Create a GIMP 2.6 package! (You can download the PKGBUILD here, but be sure to grab both the gegl and babl PKGBUILDs, too.) Since the Arch project provides all the appropriate PKGBUILDs for the official repos, it’s easy to find the one you want, download it, and then modify it to your liking. You don’t even have to deal with the headaches caused by portage overlays.

The astute reader may have noticed that I haven’t yet addressed the issue of binary package updates with regards to the core system and official repositories. There’s a reason for that. While having the binary packages available is nice, and it’s certainly better than spending two days compiling KDE, it wasn’t my primary reason for switching. Binary releases did impact my decision–and don’t get me wrong, I love being able to download just the updates I need without compiling anything but the few AUR packages I have–but it’s one of those conveniences that’s nice to have. I won’t begrudge compiling the system (or kernel) from scratch, because I spent so many years with stagnant Gentoo systems compiling off and on over a week or two. It would seem a tad bit hypocritical of me to complain about compile times. Besides, if you swing that way, you can do that with Arch, too.

I don’t mean to wax optimistic about the glory of Arch. It’s not without the occasional sharp edge, and there’s a plethora of things that will snag the naive user who hasn’t yet developed the healthy habit of thoroughly reading documentation. I certainly wouldn’t recommend it to someone who’s new to Linux in general (they should use Ubuntu or a similar distribution), but for others who are at least familiar with the shell and want a do-it-yourself distro, Arch is certainly worth a look.

Interestingly, and unexpectedly, Arch has made using Linux fun again. I’m reminded of the days when I first started getting into Gentoo after understanding its quirks and loving every minute of it. The difference, though, is that I spend so much time doing things in Arch and with Arch that I almost don’t do much else (not even games!). My Windows install is probably lonely; it might receive a boot once every three weeks for the occasional update or game that I can’t otherwise get working under Wine, but even then, there’s so much more to explore with Arch.

***