Annoyances: vBulletin 4 Template Hooks

I’ve been doing commercial software development for vBulletin 3.x (and, by extension, 4.x) off and on now for a couple of years. While there are some things that irritate the crap out of me about both of these products, vBulletin (both versions) have features that just aren’t found in other bulletin board packages. Admittedly, many of these exclusive features are provided by an extensive library of 3rd party software, but the point still stands–as much as I hate to admit it. Few other message boards have a plugin system that’s easy to develop for, and fewer still have the vast library of plugins available. phpBB doesn’t even come close. vBulletin still has its shortcomings for developers, but I’ll save my complaints for a later installment.

What I’m going to write about tonight is something that bit me, and I know it’s going to bite someone else out there: Template hooks have been the bane of my existence in vB4 for the majority of this weekend, and once you start adding a few yourself, you’ll grow to appreciate the manic schizophrenia that is the vBulletin 4.x template system in all its unadulterated glory. I hope to save you from the onset of severe insanity, so keep reading for my story and my solution.

Side note: You might also want to make this a summer project, because you’ll be bald by the time you’re finished, and I understand that bald heads get cold quite quickly. If you’re already bald, accept my apologies and tear something else out–like the upholstery stuffing in your desk chair. Don’t have a chair? Reach for the carpet. Don’t have carpet? Well, you’re on your own.

Template Hooks: They Work–But not When You Want Them To

I’ve written a couple of plugins that rely on the various forumhome_wgo_pos* template hooks for both vBulletin 3.x and 4.x. These hooks work perfectly for most use cases, regardless of when your plugin fires, and are almost foolproof. Don’t be lulled into a false sense of security, though. The moment you do anything unusual with template hooks in vBulletin 4.x, you’ll be bitten by the what-the-heck-happened-to-my-output surprise.

To reproduce the ailment that has been afflicting my sanity for the better part of this last Sunday, I direct you to a simple test:

  1. Create a new product, complete with its very own plugin.
  2. Set the plugin to fire on the global_start plugin hook.
  3. Add the following code to the plugin:
    $template_hook['footer_test_hook'] = '<b>Hi!</b>';
  4. Add the following code to your footer template:
    {vb:raw template_hook.footer_test_hook}
  5. Run it!

You should notice that you now have a nice, shiny string containing Hi! at the bottom of your page in the footer code. Now, let’s break it:

  1. Add a template, such as break_my_footer to your product XML (optional; you could use any other template if you like)
  2. Call this template from your plugin using something like:
    $tpl = vB_Template::create('mytemplate');

    Or, if you decided to use an existing (small) template:

    $tpl = vB_Template::create('option');
  3. Then modify your template hook code appropriately:
    $template_hook['footer_test_hook'] = htmlentities($tpl->render());
  4. Watch in horror as nothing appears in your footer.

Try as I might, I spent a good hour or two trawling various vBulletin support sites for answers. Rather than make a post somewhere and risk having one of their ill-tempered devs explain “Well, this is how it’s supposed to work, didn’t you use the search?” when the built in search generally sucks and Google doesn’t always pick up their help threads, I decided that this issue became personal. That is to say, this code insulted my mother, my father, my nonexistent siblings, and each of my ancestors going back 1,500 years.

After performing various blind tests I concluded that somehow the call to the vB_Template::create() factory method was effectively wiping the contents of $template_hook–or ignoring it, or purging it, or performing an exorcism on it with tremendous glee while I steamed with fury in front of my monitor. I then decided that I’d had enough, and so I searched for the footer template to determine where it was being called, prepared, and possibly rendered in the code. My hunch was that the footer was being generated separately from the forumhome cruft that so happily seemed to work no matter where I used it or what I did with it (and indeed it is generated separately). Yet my own template hook refused to work.

Then I came across this code in includes/class_bootstrap.php:

 $templater = vB_Template::create('footer');
                        $templater->register('admincpdir', $admincpdir);
                        $templater->register('ad_location', $ad_location);
                        $templater->register('cronimage', $cronimage);
                        $templater->register('languagechooserbits', $languagechooserbits);
                        $templater->register('modcpdir', $modcpdir);
                        $templater->register('quickchooserbits', $quickchooserbits);
                        $templater->register('template_hook', $template_hook);
                        $templater->register('facebook_footer', $facebook_footer);
                $footer = $templater->render();

Pay careful attention to the line $template->register('template_hook', $template_hook);. Clearly, the footer is processing the template hook here–so I thought to myself, perhaps there’s a nearby hook that I could attach my plugin to so I can guarantee I know that the content of $template_hook won’t be interfered with.

I scrolled up and found a hook that probably should have been fairly obvious to me from the start. But hey, it’s the weekend. What more can you expect?

($hook = vBulletinHook::fetch_hook('parse_templates')) ? eval($hook) : false;

Sheepishly, I changed my plugin to use the parse_templates plugin hook instead of global_start, and it worked! So the upshot is: If you’re going to try using custom template hooks and you discover that they won’t work the moment you load a template, try changing the plugin hook to parse_templates. It might just fix the problem.

Now, this was admittedly all my fault for not realizing that parse_templates may be the correct solution; I really should have examined the vBulletin sources more closely. Shame on me. In my defense, though, the vBulletin documentation is pretty poor, much of it is outdated, and even less of it focuses on issues specific to 4.x. However, I have one particular bone to pick: It’s puzzling to me that whatever is in $template_hook will work fine up until the moment you decide to call vB_Template::create(). There’s a comment under the create() method that indicates something to do with $template_hook and treating it as a special case for the purpose of various globals or some such, along with a reference to a bug tracker ID. I think that’s more coincidental than anything else, and certainly if I wanted to find out what was happening, I could run a trace with XDebug, but I’m not that desperate–or bored (yet). My guess is that, somehow, subsequent calls to vB_Template::create() clobber the contents of $template_hook by the time vBulletin gets around to rendering the footer; I may be wrong–I probably am–but this is an example of bizarre code suffering from manic schizophrenia.

Frankly, the vBulletin sources are so stupidly convoluted it’s a miracle the software works as well as it does. I’ll save that for another rant much later this week or next. In short, remember: If you’re toying with custom template hooks, you might just break your code. If you do, try changing the plugin you’re writing for template rendering purposes to hook into parse_templates. You’re almost guaranteed to have little to no interference with the contents of $template_hook and the parse_templates hook is nearest to the templates that are most likely to be affected.

Toodles for now. Expect to see a whiny rant soon!

3 comments.
***

Insecurity is Your Fault

Earlier this month, I stumbled across this article on Slashdot and my comments inflamed a few posters with an uncomfortable truth:

Insecurity is your fault.

No, really, it is. Now that the storm has mostly blown over and much of this has been forgotten (plus I’ve gotten some time to sit down and write!), I’d like to explain why.

First, though, a brief recap for those of you who may have missed out on all the excitement.

Memcached: Security Hole or Convenient Misconfiguration?

I won’t explain what Memcached is. If you’re curious, the Wikipedia article details its usage model, behavior, and other interesting tidbits for the curious. However, the important part is that Memcached provides absolutely no method or mechanism by which to authenticate attached clients. Neither does Memcached provide an interface to protect data integrity–that’s your job. The whole Slashdot meltdown occurred when some security researchers discovered that many well-known sites failed to protect their Memcached instances, exposing sensitive data to the outside world including passwords, password hashes, e-mail addresses, and other assorted data typically stored in a Memcached instance. The surprising discovery here isn’t so much that the problem existed but that it afflicted the sites of several well known names including PBS! Really, the system administrators of the sites showcased in this study should be ashamed of themselves. (Maybe; I’m sure many of them were too busy to notice the err of their ways. I’m also sure that a sizable chunk of them felt a little more uncomfortable puckering of anatomical places better left to the imagination than is otherwise healthy.)

I also hate to point out the obvious here, but a simple firewall would have circumvented the issue entirely. The authors of the article in question emphasized this point by appending in their conclusion several strings containing “FW”–and then some. It’s not like setting up a firewall is difficult either, so I’m not sure if this is equal parts ignorance, oversight, laziness, economics, and harsh schedules for the IT staff involved. I don’t think any of these would make for a particularly good excuse. After all, it doesn’t take long to set up ipfw, iptables, and kin. There’s also plenty of documentation and quick-start guides available just a Google search away.

Unfortunately, the Slashdot article was slightly sensationalist and came close to leading readers into believing that running Memcached was a security hole. If you were to take a few minutes (do so now) to browse through the comments, you’ll find that they fall into roughly two categories:

  • Security is the responsibility of the system administrator, network administrator, or whomever happens to be in charge of handling the servers, their configuration, and/or network topography.
  • Security is the responsibility of the developer to ensure that their application doesn’t ship in a potentially hazardous state such that the exposure of critical data could be made accidentally.

Which statement do you agree with?

Take your time. I’ll be here all day.

It’s Your Fault

If you think it’s the developer’s responsibility to ensure their software doesn’t ship in a potentially hazardous state, you’re wrong, and Memcached is a perfect example why the responsibility of service configuration rests squarely on your shoulders. Of course, there are many reasons why the developers are absolved of responsibility, and I’ll attempt to outline some of them here. Essentially, though, if you’re blaming Memcached for this particular deficit, you’re demonstrating great ignorance of the F/OSS ecosystem.

How dare I point out the obvious!

Package Maintainers Know Everything

Most open source projects, particularly those centered around popular operating system distributions like FreeBSD, Gentoo, or Debian often have specific individuals who, largely on a volunteer basis, maintain a specific software package or packages. In many projects, these maintainers operate rather loosely as a committee whose primary objectives are to: 1) make sure the software installs and runs correctly on their target platform(s) and 2) configure the package to ensure that it suits the specific goals of their distribution (or not, sometimes inaction is as much of a goal as action). This is why package maintainers know everything, and I certainly don’t mean it in a derogatory sense: These individuals, typically picked by the community or by merit, understand both the software they’re maintaining as well as the environment they’re maintaining it for. Package maintainers are an integral part of a particular distribution’s microculture, and they understand what defaults are likely to work best and what won’t. There’s also a certain degree of personal preferences involved but package inclusions and behaviors are almost exclusively influenced by the community as a whole.

When it comes to Memcached, Debian-based distributions tend to configure it to listen exclusively on localhost. FreeBSD and Gentoo both leave Memcached as it ships–not configured and not turned on by default. (Ubuntu enables Memcached as soon as you install it unless you’re running Ubuntu server, in which case it remains disabled.) Since the default–not configured–instructs Memcached to listen on all available interfaces, those distributions that happen to ship Memcached in its default state are therefore the same ones that will expose it to the Big Bad Interwebs the very instant it happens to be enabled. Sort of.

When Configurations… are Not

This is a minor point of contention, but I feel it’s worthwhile to mention because it bears some relevance to the topic of Memcached security. Plus, since several comments on Slashdot made reference to some magical Memcached configuration floating around somewhere in the sky, I confess that I feel this overwhelming desire to point out the obvious. It’s a lot like honking at the motorist in front of you whose blissful ignorance of the surrounding world is responsible for backing up traffic for miles while the unyielding stoplight changes from green to yellow to red and back again–you just can’t help but lay it on thick. I’m also convinced that some of you share this blissful dream-like trance with our hypothetical motorist, because I otherwise cannot fathom why you’d think we software developers are out to get you. (Maybe we are, but that’s another story.)

Okay, here’s the earth-shattering news: Memcached has no configuration file. That’s it. That’s the anticlimactic reality my last paragraph was building up to. I don’t believe I can hear you honking yet, so let me say that again.

Memcached has no configuration file. If it has one, it’s a lie. It’s a lot like that cake. The configuration file is a lie.

Several OS vendors (FreeBSD isn’t one of them–same for Gentoo) happen to ship with their particular Memcached package a file conveniently named “memcached.conf.” To most of us servermonkeys, it would appear that this file has something to do with Memcached–and we’d be right. It’s amazing how that works. Ah, but such assumptions only go so far. You see, “memcached.conf” is just a convenience wrapper. Memcached itself has no integral facilities to read external configuration files and relies instead on command line parameters. In fact, I would argue that vendors who provide a separate configuration file of the sort may be misleading impressionable individuals into believing that Memcached ships with its own configuration file, and that those someones who got caught with their firewall down happened to get the short end of the stick by starting up the service without actually checking that file.

Admittedly, it’s easy to be mislead. Here’s a sample of the Ubuntu (server) Memcached configuration file:

# memcached default config file
# 2003 - Jay Bonci

But, if you were to continue reading…

# This configuration file is read by the start-memcached script provided as
# part of the Debian GNU/Linux distribution.

Oh! Memcached doesn’t read it after all! Good thing I read passed the top two lines, otherwise I would’ve been convinced it shipped with Memcached!

Fortunately, both FreeBSD and Gentoo configure Memcached through their RC system and make it fairly clear that any configuration is done directly to the Memcached daemon. FreeBSD configures Memcached via /etc/rc.conf. Gentoo configures Memcached via /etc/conf.d/memcached. Both of these configurations therefore operate in Memcached’s default mode of listening on all available interfaces.

The question, then, becomes a matter of whether Memcached is insecure by default.

If a Cache Server Listens on all Interfaces in the Forest… is it really There?

Memcached gained its reputation as a light weight, fast, minimalistic memory-based caching server partially by not integrating high-latency features like authentication, data integrity checking, and encryption. If the end user’s environment requires any one of these features, it’s up to the end user to implement them on top of Memcached–and most likely, other solutions. (You can run Memcached over an encrypted connection using a variety of techniques, for example, but they require some effort on your behalf to implement.) Failure to secure a network-accessible service isn’t the fault of the developers of that package. After all, if the server were properly firewalled, it wouldn’t be network-accessible outside the network for which it was intended.

(This, of course, assumes that the attackers don’t have access to your internal network. Granted, if this were the case, I suspect Memcached would be the least of your concerns.)

The obvious point of this entire debate–specifically, the one on Slashdot–is that Memcached shouldn’t be shipped to listen on all available interfaces. Certainly there are great points on either side of this issue, but the fundamental truth is that Memcached’s default behavior isn’t up to us to decide. The developers decide what goes, and I’m sure they have a good reason for attaching to every interface under the sun. In this case, the obvious solution isn’t to argue with me about why it wasn’t your fault you just exposed 15,000 passwords to every 16 year old pimply hacker with an Internet connection. Nay, you should argue with the developers of Memcached and explain to them why their software should listen on localhost by default. Maybe they’ll even agree with you and release a new version that does exactly that. On the other hand, they may just give a hearty laugh in your direction and explain that such a drastic change in default behavior would very likely impact the bazillion or so installations of Memcached that exist out there on the Interwebs the next time some system administrator goes to upgrade.

Generally speaking, it’s better to piss off the people who haven’t done enough research to know how to actually use the software than it is to squirt a healthy dose of Tabasco sauce into the shorts of the very people who probably have several dozen large-scale installations each and enough disposable income to donate generously to your project.

Not surprisingly, that brings me to my next point.

If You can’t Configure that Package Securely… It’s Your Fault

I really hate to harp on this point, but since it seems that there’s so many individuals out there bellyaching over the simple fact that Memcached doesn’t listen exclusively on localhost by default, I really have to hammer it until I drive my point home (or break my hammer).

It’s your fault.

Don’t like it? Tough. Write that sentence 300 times on some college ruled notebook paper and get back to me in the morning. No cheating! I’ll be counting every single one of ’em, too.

The implication of my attitude–or attitude “problem,” depending on how you take this–is that configuring any software you install is your business. If you configure it incorrectly and lose your company a million dollars, it’s your fault. If you configure it incorrectly and drive traffic to your site, earning your company a million dollars, it’s your fault. Conversely, if I write a software package, hypothetically, that advertises itself as some must-have network-connected service that’ll make your servers hum, your teeth whiter, and maybe cure world hunger and you install it without bothering to look at the manual to find out that, holy bat nuggets it really is network-connected, you won’t find a lot of sympathy from me when the entire Internet discovers a few dozen things better left secret just because you couldn’t read a two paragraph instruction manual or secure your server. It’s not my fault that my hypothetical software might be used in an inadequately protected environment anymore than it is the Memcached developers’ fault that their server can be accessed remotely without authentication.

Now that We’ve Established Blame, What Next?

If you happened to read the article(s) the Slashdot summary linked to and the first thing that came to mind was “firewall!”, consider yourself dismissed. You’re free to go home, and you’re getting an A for the course. You needn’t trouble yourself with reading through my inane ramblings, because you probably know a lot more than I do. However, if you’re still fuming out the ears blaming the Memcached developers for defaults that clearly should have been better chosen, you’re staying after class.

I torqued off a number of posters on Slashdot when I uttered these words, so I’ll utter them again: It is your responsibility as a systems administrator to understand the security implications of installing any given software package. If it’s installed on your server–doubly so if you’re the one installing it–you damn well better know what it’s going to do when it starts running. If you don’t have a clue what might happen or how to configure it, then shame on you when someone sniffs critical data from a service you didn’t configure correctly. System administration isn’t intended to be easy. System administration isn’t opening a couple of menu-driven wizards, clicking “next,” and hoping for the best. System administration is about best, secure practices; vigilance; researching and reading; understanding, mitigating, and managing risks; understanding the software you’re using; understanding the operating system you’re using; understanding your network environment and topography; awareness; and always, always, always keeping in mind that anything connected to the Internet is at risk. Sounds like a lot of work, doesn’t it?

“But,” I hear you protest, “system administration is about taking care of the users, resetting their passwords when they forget, and making sure everything is hunky-dory when they pull up their TPS reports! You’re describing a consultant’s job.”

Nope, son, that’s the help desk. They’re the colleged-aged kids up front who daydream of murderous intentions every time someone calls up thinking capslock is cruise control for cool and just can’t seem to get their password right. Consultants are the guys you hire to design (or build) the system. System administrators (probably you) are the guys who are hired to understand the system, fix it when it breaks, and be on call when Godzilla comes gnawing through the company’s upstream. If you think system administration should be easy, there are places you can go to for help.

Why Does it Matter?

Part of the reason I like the minimalistic approach FreeBSD and Gentoo both take to configuring (and I use that term loosely) Memcached is because they make no assumptions–other than the defaults, of course. It doesn’t matter whether you’re running a Memcached instance for a farm of 50 webservers and need to set it up to listen on all interfaces–uh oh! defaults!–since the machine is only attached to a private network. It doesn’t matter whether you’re going to be setting Memcached to use 64 megs (the default!) of RAM or 16 gigs. What does matter is that you’ve looked at the configuration your package maintainer–the guy who knows everything–shipped with the software you installed to figure out what it’s going to do. Better yet, it helps to read the manual page. The world isn’t full of pop-out coloring books, and if you’re going to maintain systems for a living, you’re going to have to do some reading. Manpages are a fantastic place to start.

I’ve read the excuse on Slashdot more often than I care to admit that the reason Memcached should listen by default on localhost is because there are far more instances installed by mom and pop web sites that just expect to install it and have it working than there are large scale installations that listen on multiple network interfaces (well, 0.0.0.0:11211 or :::11211). Is that true? I don’t know. Maybe there’s a significant number of webserver admins out there who have heard about this Memcached business and know only enough to know they need to install it. Maybe there’s a lot of people who just run apt-get install memcached, emerge memcached, or whatever install script happens to float their collective fleet of rubber rafts because they like to watch a bunch of text fly across their screen. I have no idea.

I do know that there’s a huge problem with the assumption that most administrators installing Memcached aren’t aware of the security implications. Why? Simple:

  • Memcached is a semi-niche market solution for a large scale problem. By the time you start looking around for a solution to offloading some of your server load elsewhere, you’ve probably done enough research to know what Memcached is and why you need it.
  • If you’re administering a large site that needs Memcached (see previous point), you’re probably well aware of network security. You wouldn’t let the world connect to every port on your home computer just for fun, so why would you let them do that to your web server?
  • Most system administrators are competent, careful, methodical individuals who consider their server farm something like their own children. If we lesser beings so much as tried to touch their machines with our grubby mitts, they’d chase us out of the data center with a shotgun. This whole problem, and the article linked to by Slashdot, doesn’t mention these guys. Those same guys who spend sleepless nights making sure db-140.cluster.lax.internal-net.megacorp.com is running smoothly are the same guys who firewalled their Memcached servers and were not noticed in the report. They couldn’t be, because the report was about incorrectly configured servers. The world is full of surprises, and it may surprise you to learn that there are competent individuals out there.

The other excuse that drove me to the verge of insanity was the proposal that Memcached is a great object store that can increase the performance of a mom and pop webserver, so it’s just as likely to be installed on bobs-fancy-widgets.com as it is on a massive load-balanced site. Thus, since this target market likely doesn’t know the first thing about installing a webserver, Memcached should be set up safely by default. Fine–maybe this is true–but it also assumes that bobs-fancy-widgets.com isn’t firewalled. It should be, and if it’s not, then I’m afraid there’s not much any of us can do for him. Maybe Bob doesn’t know anything about firewalls. Maybe Bob is also receiving a gazillion SSH scans from China, too. Just wait ’til they find out his password is “password”.

I don’t have the data available that these security researchers do. Therefore, I have to base my assumptions on the sample data they released. The companies affected by this security oversight (it’s not a hole) weren’t small fly-by-night organizations. These were just about everything from trendy start-ups to major television network webservers! My guess? It was probably an honest mistake more often than not. Someone forgot to enable the firewall or configure Memcached. They got burned. It sucks, but it happens often enough to serve as a reminder to the rest of us.

Lesson Learned

There’s a valuable lesson to take away from this, and while I realize this essay is reaching almost epic proportions, it’s important enough to warrant just a few more sentences. Never make assumptions about your software’s configuration. If you’re running a business or being paid by a business to run their machines, you should always double-check (or triple-check) the configurations of software you’ve installed to ensure they’re going to run as you expect them to. Make certain to read the appropriate manpages for your software, too, because they often have valuable insight into what the software does or how to configure it. Search the Internet. Oh, and never forget–if you’re using a *nix-based system, there are tools to help you understand what’s running and what’s not. Use utilities like ps and netstat until you can read the output of these programs by heart. netstat is of particular importance; with it, you can understand at a glance what your system is reporting as listening on network interfaces. (-p helps, too, as does understanding similar non-Linux tools like FreeBSD’s sockstat.)

Remember: If some 16 year old hacker discovers the CEO’s password and sends out “Your Fired!” e-mails to the company, wipes the database server, and turns your web server into an archive that’d make the Pirate Bay blush, you’re the one who’s going to get that uncomfortable call at 3:00AM. The Memcached devs? Well, they’ll be sound asleep while you’re sweating bullets. After all, that’s what you’re paid for, right?

No comments.
***

Symfony 1.3/1.4 and Suhosin

I read about Symfony some time back when I was working on a project in CakePHP. Symfony struck my interests primarily because it uses Doctrine (by default) as its ORM, and I’ve used Doctrine in a couple of other projects. This weekend, I elected to give Symfony a try. After reading through the documentation–which I’ll usually do for a few hours before taking the final plunge–I was satisfied that Symfony would be a great framework to learn next.

Updated March 2nd, 2010. Click here to view my updated thoughts.

Following the initial guide was easy until I attempted to view the example application. Then I encountered this:

Fatal error: SUHOSIN – Use of preg_replace() with /e modifier is forbidden by configuration in /home/bshelton/phpapps/sf-test/lib/symfony/lib/response/sfWebResponse.class.php(409) : regexp code on line 409

Uh oh! This struck me as odd, considering Symfony prides itself on being a secure-by-default framework and it triggers a Suhosin warning from the stock install? Great. But don’t be too alarmed as the offending code isn’t a security hole (newlines were inserted by me to increase readability on the web):

return preg_replace('/\-(.)/e',
  "'-'.strtoupper('\\1')",
  strtr(ucfirst(strtolower($name)), '_', '-'));

The Fix

While the /e modifier does have the potential to execute PHP code, the offending line merely takes the first character immediately following a “-“ and uppercases it (the strtoupper call is the only eval’d code). However, rather than disable parts of Suhosin–which could be bad if third party code is included that also does something similar and requires auditing–I decided to fix the issue. According to the Suhosin docs, the fix is fairly simple. Simply change line 409 in symfony/lib/response/sfWebResponse.class.php to:

PHP 5.2 and below:

return preg_replace_callback('/\-(.)/',
        create_function('$matches', 'return \'-\'.strtoupper($matches[1]);'),
        strtr(ucfirst(strtolower($name)),
        '_', '-'));

PHP 5.3 and up:

return preg_replace_callback('/\-(.)/',
        function ($matches) { return '-'.strtoupper($matches[1]); },
        strtr(ucfirst(strtolower($name)),
        '_', '-'));

In this case, rather than evaluating PHP code as part of the replacement, a callback function is created that performs the same approximate test. Obviously, the solution for PHP 5.2 and below could be equally as dangerous, but if we’re careful, we can create a function exactly as we intended.

Performance Concerns

After I discovered this solution by browsing the PHP documentation, it occurred to me that callbacks in preg_replace_callback are slightly slower than performing a call to preg_replace without callbacks. I couldn’t recall precisely how slow, so I elected to perform a benchmark using several different ideas at tackling this particular problem. This section highlights these benchmarks along with each potential solution. Ironically, using preg_replace_callback is about 30% slower than performing character-by-character replacement.

The Methodology

The methodology I used to test various solutions consisted of the following steps:

  1. Generate a large data set to aid in characterizing performance differences and elevate measurable performance discrepancies above the noise floor
  2. Write each solution such that it consumes the data generated in step 1
  3. Compare the performance of each method against the stock Symfony solution

Since the stock Symfony code is designed to “normalize” (their words) headers generated and sent to the client, the source data is created by a script that picks a random number of characters (all lowercase) and joins them together using a dash (-) or an underscore (_) in order to emulate strings like “content-type” or “content_type.” The same data set is used for all subsequent tests. No more than one dash or underscore is used per string; although this is unlikely cause for much concern as most headers are unlikely to have more than one or two separators.

You can download the script that generates this data, along with the source data itself, in the archive toward the bottom of this post.

Note: These solutions were tested on PHP 5.2 only. You’ll likely discover differences when testing on other PHP versions.

The Results

Before we examine each solution, let’s take a look at the results:

Benchmark Plot

As you can tell from this chart, the Suhosin-recommended solution of using preg_replace_callback is the slowest. The stock Symfony code performs rather well but it still slower than simplified string replacement calls. We’ll examine each solution in the following section.

The Code

The code for each solution is provided below along with a brief description of what it does. The actual benchmarks (and benchmark data) is also provided. Remember that each of these solutions is modified to use our source data.

Solution 1: Stock Symfony Solution
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    preg_replace('/\-(.)/e', "'-'.strtoupper('\\1')", strtr(ucfirst(strtolower($s)), '_', '-'));
}

The stock Symfony solution to the normalization problem is to first convert the string to lowercase, capitalize the first character, and translate all underscores (_) into dashes (-). preg_replace is then run and evaluates the replacement, which converts the first character immediately following a dash to its uppercase equivalent.

Solution 2: preg_replace_callback
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    preg_replace_callback('/\-(.)/',
        create_function('$matches', 'return \'-\'.strtoupper($matches[1]);'),
        strtr(ucfirst(strtolower($name)),
        '_', '-'));
}

As hinted by the Suhosin documentation, preg_replace_callback is recommended over evaluating PCRE replacements. Unfortunately, this solution is also the slowest. In structure, it is most similar to the stock Symfony code but replaces the evaluated code with an anonymous function. I have not tested true anonymous as presented in PHP 5.3, however. Given that create_function likely has to create an additional interpreter instance in PHP 5.2, this solution might be slightly faster in later versions of PHP when using this code:

return preg_replace_callback('/\-(.)/',
        function ($matches) { return '-'.strtoupper($matches[1]); },
        strtr(ucfirst(strtolower($name)),
        '_', '-'));
Solution 3: Array and String Manipulation Only
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    $s = strtr(strtolower($s), '_', '-');
    $tmp = explode('-', $s);
    foreach ($tmp as &$t) {
        $t = ucfirst($t);
    }
    $buf = implode('-', $tmp);
}

This solution was among the fastest but its performance remains within the margin of error and may therefore be tied with Solution 4. As with solutions 1 and 2, this solution translates all characters to lowercase and replaces all underscores (_) with dashes (-). However, unlike the previous solution, this one splits each header along the dash boundary, loops through the remaining items, converts them in-place using ucfirst, and then joins them together again with a dash.

This solution is not optimal, and I expected it to be among the slowest. I was surprised to discover that this solution performed faster than the others, and I initially assumed the overhead of preg_replace was due in no small part to the requirement of loading the PCRE engine. It is also likely that increasing the number of split points in the header (by way of more dashes) will likewise increase the amount of time this solution requires to run.

My presumed explanations for the performance of this code segment were challenged with the next test.

Solution 4: Two Replacements
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    $s = strtr(ucfirst(strtolower($s)), '_', '-');
    preg_match('/\-./', $s, $matches);
    str_replace($matches[0], strtoupper($matches[0]), $s);
}

I confess that the file name for this test is slightly misleading, and I’ll correct it in the posted archive. The initial intention was to utilize two preg_replace statements, but I elected (at the last minute, no less), to use only one in conjunction with an str_replace. In this solution, a dash (-) followed by any single character is captured, capitalized, and then replaced into the final product. As with Solution 3, this is among the fastest of the 5 tested. This solution may also scale slightly better than Solution 3 for headers containing more than a single dash.

Note that the performance of this solution hints that the slightly slower behavior of solutions 1 and 2 are likely due to code evaluation rather than the overhead of loading PCRE.

Solution 5: Character-by-Character Replacement
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    $s = ucfirst(strtolower($s));
    $buf = '';
    for ($i = 0; $i < strlen($s); $i++) {
        if ($s[$i-1] == '-')
            $buf .= strtoupper($s[$i]);
        else
            $buf .= $s[$i];
    }
}

I initially assumed this would be the slowest performing benchmark of the 5. I was surprised to discover that it is the second slowest. Indeed, this solution performed approximately as well compared to preg_replace_callback as the stock Symfony code did in contrast with this solution.

Conclusion

The Symfony stock code is quite fast but still appears to create instances (at least in sfWebResponse.class.php) where partial evaluation of PHP code is necessary. eval‘d code isn’t necessarily evil, and while it most certainly is a vector for exploitation, it is the manner in which code is evaluated that makes it dangerous. Regardless, I think it is appropriate to evaluate (I made a pun!) circumstances in which code itself is eval‘d and question whether such calls are necessary. While these benchmarks are admittedly highly artificial, it is fairly obvious that in some situations, less code does not necessarily translate to better performance. More importantly, highly compressed code can be difficult for new maintainers to understand and therefore become more error prone or more likely to encounter breakage than simpler but more verbose statements.

The benchmarks listed in this article may be downloaded here, along with the spreadsheet used to chart the data points (apologies that it is in .xlsx format; I’ve been testing the Office 2010 beta–I’ll post a version with an .ods later!). Please recall that this benchmark isn’t scientific. The tests I conducted are highly artificial and are presented for only an exceedingly small subset of data variation. It is likely that other methods are faster, more efficient, and more appropriate for the given solution. Indeed, the entire purpose for this exercise was two-fold: 1) To avoid having to make any configurational changes to Suhosin and 2) surprise myself with how many unique solutions I could create for a single problem. I think I succeeded on both counts.

If you post commentary, please keep in mind that this was conducted for my own purposes and curiosity only. It is neither intended to be malicious to the Symfony project nor issued as a correction to their sources (though a patch that eliminates the use of /e in preg_replace would be nice!). I’ve been highly impressed by the Symfony sources. Unlike most PHP code, they’re clean, concise, easy to understand, and well-documented. It’s a breath of fresh air compared to a vast majority of PHP-based projects. My thanks to the Symfony project in general and Fabien Potencier in particular. Please feel free to post corrections to my methods but be polite about it!

Recommendations? I’d suggest using solution #4 if you’re encountering issues with Suhosin and Symfony.

Updates: March 2nd, 2010

I’m growing increasingly less impressed with Symfony and its internal design. It appears the developers have a strong affinity for preg_replace‘s /e modifier. Yes, I understand, you audit your code and it’s secure. But do you know what the most significant problem is with suggesting users disable Suhosin’s suhosin.executor.disable_emodifier flag is? Users are likely to do it globally–rather than in an .htaccess file somewhere–and as a consequence, they’ll likely open their systems up for (admittedly unlikely but possible) far worse things. I haven’t tried Symfony 2.0 yet, but I hope they’ve resolved this. It’s stupid. And it pisses me off.

Anyway, if you’re looking for some quick fixes and are going through the Symfony tutorial and do not want to disable any part of Suhosin, here’s what you’re going to need to do.

First, change line 409 in symfony/lib/response/sfWebResponse.class.php to:

    $name = strtr(ucfirst(strtolower($name)), '_', '-');
    if (preg_match('/\-./', $name, $matches))
        return str_replace($matches[0], strtoupper($matches[0]), $name);
    return $name;

Then change line 281 in symfony/lib/form/addon/sfFormObject.class.php to:

    if (preg_match_all('#(?:/|_|-)+.#', $text, $matches)) {
        foreach ($matches[0] as $match)
            $text = str_replace($match, strtoupper($match), $text);
        return ucfirst(strtr($text, array('/' => '::', '_' => '', '-' => '')));
    }
    return ucfirst($text);

Naturally, you’re probably better off listening to the advice of the developers directly, but this is my solution. Your mileage may vary.

Note to the developers: If you stumble upon this post in the near future (obviously this doesn’t apply to versions of Symfony beyond 1.4), feel free to add your commentary. Be mindful that some of us actually want to leave suhosin.executor.disable_emodifier set to on, including for your product. I will admit that the lines of code you’ve written look reasonably safe and do not appear to be accepting tainted input, but I’m not going to risk that. I’ve been running several versions of phpBB–and we all know the sorts of holes that software has–along with WordPress and a few other PHP-based web apps without ever having encountered this issue before. Seriously, it makes me kind of worried about the rest of your code! Let’s take a look at the comments in the Suhosin INI file:

; The /e modifier inside preg_replace() allows code execution. Often it is the
; cause for remote code execution exploits. It is wise to deactivate this
; feature and test where in the application it is used. The developer using the
; /e modifier should be made aware that he should use preg_replace_callback()
; instead.

‘Nuff said.


Here’s a good source on preg_replace, why you should always use single quotes, common mistakes, and why you should really just avoid using /e in the first place.

No comments.
***
Page 2 of 41234