Symfony 1.3/1.4 and Suhosin

I read about Symfony some time back when I was working on a project in CakePHP. Symfony caught my interest primarily because it uses Doctrine (by default) as its ORM, and I’ve used Doctrine in a couple of other projects. This weekend, I elected to give Symfony a try. After reading through the documentation (which I’ll usually do for a few hours before taking the final plunge), I was satisfied that Symfony would be a great framework to learn next.

Updated March 2nd, 2010. Click here to view my updated thoughts.

Following the initial guide was easy until I attempted to view the example application. Then I encountered this:

Fatal error: SUHOSIN – Use of preg_replace() with /e modifier is forbidden by configuration in /home/bshelton/phpapps/sf-test/lib/symfony/lib/response/sfWebResponse.class.php(409) : regexp code on line 409

Uh oh! This struck me as odd: Symfony prides itself on being a secure-by-default framework, yet a stock install triggers a fatal Suhosin error? Great. But don’t be too alarmed, as the offending code isn’t a security hole (newlines were inserted by me to increase readability on the web):

return preg_replace('/\-(.)/e',
  "'-'.strtoupper('\\1')",
  strtr(ucfirst(strtolower($name)), '_', '-'));

The Fix

While the /e modifier does have the potential to execute PHP code, the offending line merely takes the first character immediately following a “-” and uppercases it (the strtoupper call is the only eval’d code). However, rather than disable parts of Suhosin (which could be bad if third-party code that does something similar is later included and needs auditing), I decided to fix the issue. According to the Suhosin docs, the fix is fairly simple. Change line 409 in symfony/lib/response/sfWebResponse.class.php to:

PHP 5.2 and below:

return preg_replace_callback('/\-(.)/',
        create_function('$matches', 'return \'-\'.strtoupper($matches[1]);'),
        strtr(ucfirst(strtolower($name)),
        '_', '-'));

PHP 5.3 and up:

return preg_replace_callback('/\-(.)/',
        function ($matches) { return '-'.strtoupper($matches[1]); },
        strtr(ucfirst(strtolower($name)),
        '_', '-'));

In this case, rather than evaluating PHP code as part of the replacement, a callback function is created that performs approximately the same transformation. Obviously, the solution for PHP 5.2 and below could be just as dangerous, since create_function also evaluates a string as code, but if we’re careful, we can create a function exactly as we intended.
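To see what the callback version produces, here is a minimal, self-contained sketch. The function name normalize_header_name is my own placeholder (in Symfony the code lives inside a method of sfWebResponse), and the closure syntax assumes PHP 5.3 or later.

```php
<?php
// Hypothetical wrapper name; Symfony's real code sits in sfWebResponse.class.php.
function normalize_header_name($name)
{
    return preg_replace_callback(
        '/-(.)/',
        // Uppercase the single character that follows each dash.
        function ($matches) {
            return '-' . strtoupper($matches[1]);
        },
        // Lowercase everything, capitalize the first letter, turn "_" into "-".
        strtr(ucfirst(strtolower($name)), '_', '-')
    );
}

foreach (array('content_type', 'CONTENT-LENGTH', 'x-forwarded-for') as $raw) {
    echo $raw, ' => ', normalize_header_name($raw), "\n";
}
// content_type => Content-Type
// CONTENT-LENGTH => Content-Length
// x-forwarded-for => X-Forwarded-For
```

No string is ever evaluated as code here, which is why Suhosin has nothing to object to.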

Performance Concerns

After I discovered this solution by browsing the PHP documentation, it occurred to me that callbacks in preg_replace_callback are slightly slower than a call to preg_replace without callbacks. I couldn’t recall precisely how much slower, so I elected to benchmark several different approaches to this particular problem. This section highlights these benchmarks along with each potential solution. Ironically, using preg_replace_callback is about 30% slower than performing character-by-character replacement.

The Methodology

The methodology I used to test various solutions consisted of the following steps:

  1. Generate a large data set to aid in characterizing performance differences and elevate measurable performance discrepancies above the noise floor
  2. Write each solution such that it consumes the data generated in step 1
  3. Compare the performance of each method against the stock Symfony solution
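The steps above can be sketched as a small timing harness. This is only an illustration, not the script from the archive: time_solution is a hypothetical helper, the three-item data set is a stand-in for the much larger generated one, and closures assume PHP 5.3 or later (the stock /e code is left out because newer PHP versions no longer accept that modifier).

```php
<?php
// Hypothetical harness: time_solution() is not part of the original archive.
function time_solution($label, $solution, array $source, $runs = 5)
{
    $best = INF;
    for ($run = 0; $run < $runs; $run++) {
        $start = microtime(true);
        foreach ($source as $s) {
            $solution($s);
        }
        // Keep the fastest run to reduce noise from other processes.
        $best = min($best, microtime(true) - $start);
    }
    printf("%-10s %.6f s (best of %d runs)\n", $label, $best, $runs);
    return $best;
}

// Tiny stand-in data set; the real one is far larger.
$source = array('content-type', 'content_length', 'x_powered-by');

$callbackTime = time_solution('callback', function ($s) {
    return preg_replace_callback('/-(.)/', function ($m) {
        return '-' . strtoupper($m[1]);
    }, strtr(ucfirst(strtolower($s)), '_', '-'));
}, $source);

$splitTime = time_solution('split/join', function ($s) {
    $parts = explode('-', strtr(strtolower($s), '_', '-'));
    return implode('-', array_map('ucfirst', $parts));
}, $source);
```

Taking the best of several runs is one way to lift the signal above the noise floor; averaging over runs would be an equally reasonable choice.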

Since the stock Symfony code is designed to “normalize” (their words) headers generated and sent to the client, the source data is created by a script that picks a random number of characters (all lowercase) and joins them together using a dash (-) or an underscore (_) in order to emulate strings like “content-type” or “content_type.” The same data set is used for all subsequent tests. No more than one dash or underscore is used per string, although this is unlikely to be cause for much concern, as most headers have no more than one or two separators.
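A generator along those lines might look like the sketch below. It is not the script from the archive: generate_headers, the word lengths (3 to 10 characters), and the fixed seed are all assumptions of mine, and it always emits exactly one separator per string.

```php
<?php
// Hypothetical generator; the real script ships in the post's archive.
function generate_headers($count, $seed = 42)
{
    mt_srand($seed); // fixed seed so every benchmark sees the same data
    $letters = 'abcdefghijklmnopqrstuvwxyz';

    // Build one random lowercase word of 3 to 10 characters.
    $word = function () use ($letters) {
        $out = '';
        $len = mt_rand(3, 10);
        for ($j = 0; $j < $len; $j++) {
            $out .= $letters[mt_rand(0, 25)];
        }
        return $out;
    };

    $source = array();
    for ($i = 0; $i < $count; $i++) {
        // Join two words with either a dash or an underscore.
        $separator = mt_rand(0, 1) ? '-' : '_';
        $source[] = $word() . $separator . $word();
    }
    return $source;
}

$source = generate_headers(5);
print_r($source);
```

Writing the result out once (for example with var_export into source-data.php) keeps the data identical across every test file.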

You can download the script that generates this data, along with the source data itself, in the archive toward the bottom of this post.

Note: These solutions were tested on PHP 5.2 only. You’ll likely discover differences when testing on other PHP versions.

The Results

Before we examine each solution, let’s take a look at the results:

Benchmark Plot

As you can tell from this chart, the Suhosin-recommended solution of using preg_replace_callback is the slowest. The stock Symfony code performs rather well but is still slower than simplified string replacement calls. We’ll examine each solution in the following section.

The Code

The code for each solution is provided below along with a brief description of what it does. The actual benchmarks (and benchmark data) are also provided. Remember that each of these solutions is modified to use our source data.

Solution 1: Stock Symfony Solution
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    preg_replace('/\-(.)/e', "'-'.strtoupper('\\1')", strtr(ucfirst(strtolower($s)), '_', '-'));
}

The stock Symfony solution to the normalization problem is to first convert the string to lowercase, capitalize the first character, and translate all underscores (_) into dashes (-). preg_replace is then run and evaluates the replacement, which converts the first character immediately following a dash to its uppercase equivalent.

Solution 2: preg_replace_callback
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    preg_replace_callback('/\-(.)/',
        create_function('$matches', 'return \'-\'.strtoupper($matches[1]);'),
        strtr(ucfirst(strtolower($s)),
        '_', '-'));
}

As hinted by the Suhosin documentation, preg_replace_callback is recommended over evaluating PCRE replacements. Unfortunately, this solution is also the slowest. In structure, it is most similar to the stock Symfony code but replaces the evaluated code with an anonymous function. I have not tested true anonymous functions as introduced in PHP 5.3, however. Given that create_function has to compile its code string at run time on every call in PHP 5.2, this solution might be slightly faster in later versions of PHP when using this code:

return preg_replace_callback('/\-(.)/',
        function ($matches) { return '-'.strtoupper($matches[1]); },
        strtr(ucfirst(strtolower($name)),
        '_', '-'));

Solution 3: Array and String Manipulation Only
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    $s = strtr(strtolower($s), '_', '-');
    $tmp = explode('-', $s);
    foreach ($tmp as &$t) {
        $t = ucfirst($t);
    }
    $buf = implode('-', $tmp);
}

This solution was among the fastest, but its performance remains within the margin of error and may therefore be tied with Solution 4. As with solutions 1 and 2, this solution translates all characters to lowercase and replaces all underscores (_) with dashes (-). However, unlike the previous solutions, this one splits each header along the dash boundary, loops through the resulting items, converts them in-place using ucfirst, and then joins them together again with a dash.

This solution is not optimal, and I expected it to be among the slowest. I was surprised to discover that this solution performed faster than the others, and I initially assumed the overhead of preg_replace was due in no small part to the requirement of loading the PCRE engine. It is also likely that increasing the number of split points in the header (by way of more dashes) will likewise increase the amount of time this solution requires to run.
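As an aside, newer PHP versions can collapse the split, ucfirst, and join steps into a single call, because ucwords accepts a delimiters argument as of PHP 5.4.32 and 5.5.16. The sketch below was not part of these benchmarks (they ran on PHP 5.2, where it is unavailable), and normalize_with_ucwords is a name I made up for illustration.

```php
<?php
// Hypothetical helper; needs PHP 5.4.32 / 5.5.16 or later for the
// second ($delimiters) argument to ucwords().
function normalize_with_ucwords($name)
{
    // Lowercase, turn "_" into "-", then capitalize the first character
    // of the string and every character that follows a dash.
    return ucwords(strtr(strtolower($name), '_', '-'), '-');
}

echo normalize_with_ucwords('content_transfer-encoding'), "\n";
// Content-Transfer-Encoding
```

I make no claim about how this compares in speed; it would need to be measured against the same data set.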

My presumed explanations for the performance of this code segment were challenged with the next test.

Solution 4: Two Replacements
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    $s = strtr(ucfirst(strtolower($s)), '_', '-');
    preg_match('/\-./', $s, $matches);
    str_replace($matches[0], strtoupper($matches[0]), $s);
}

I confess that the file name for this test is slightly misleading, and I’ll correct it in the posted archive. The initial intention was to use two preg_replace statements, but I elected (at the last minute, no less) to use a single preg_match call in conjunction with str_replace. In this solution, a dash (-) followed by any single character is captured, capitalized, and then replaced into the final product. As with Solution 3, this is among the fastest of the 5 tested. This solution may also scale slightly better than Solution 3 for headers containing more than a single dash.

Note that the performance of this solution hints that the slightly slower behavior of solutions 1 and 2 is likely due to code evaluation rather than the overhead of loading PCRE.
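One caveat worth spelling out: preg_match captures only the first dash-plus-character pair, so a header with several different pairs (say, “content-transfer-encoding”) would be only partly capitalized. A sketch of how Solution 4 might be extended with preg_match_all follows; normalize_header_multi is a hypothetical name, and this variant was not benchmarked.

```php
<?php
// Hypothetical multi-dash variant of Solution 4; not part of the benchmarks.
function normalize_header_multi($name)
{
    $name = strtr(ucfirst(strtolower($name)), '_', '-');
    // Collect every "-x" pair instead of only the first one.
    if (preg_match_all('/-./', $name, $matches)) {
        foreach (array_unique($matches[0]) as $match) {
            $name = str_replace($match, strtoupper($match), $name);
        }
    }
    return $name;
}

echo normalize_header_multi('content_transfer_encoding'), "\n";
// Content-Transfer-Encoding
echo normalize_header_multi('etag'), "\n";
// Etag
```

The preg_match_all guard also covers strings with no separator at all, where the single preg_match version would read an undefined $matches[0].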

Solution 5: Character-by-Character Replacement
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
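    // Translate underscores to dashes, as the other solutions do
    $s = strtr(ucfirst(strtolower($s)), '_', '-');
    $buf = '';
    for ($i = 0; $i < strlen($s); $i++) {
        // Guard $i > 0 so the first pass never reads the invalid offset $s[-1]
        if ($i > 0 && $s[$i-1] == '-')
            $buf .= strtoupper($s[$i]);
        else
            $buf .= $s[$i];
    }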
}

I initially assumed this would be the slowest performing benchmark of the 5. I was surprised to discover that it is the second slowest. Indeed, the gap between this solution and preg_replace_callback is roughly the same as the gap between the stock Symfony code and this solution.

Conclusion

The Symfony stock code is quite fast but still appears to create instances (at least in sfWebResponse.class.php) where partial evaluation of PHP code is necessary. eval’d code isn’t necessarily evil, and while it most certainly is a vector for exploitation, it is the manner in which code is evaluated that makes it dangerous. Regardless, I think it is appropriate to evaluate (I made a pun!) circumstances in which code itself is eval’d and question whether such calls are necessary. While these benchmarks are admittedly highly artificial, it is fairly obvious that in some situations, less code does not necessarily translate to better performance. More importantly, highly compressed code can be difficult for new maintainers to understand and can therefore be more error-prone or more likely to break than simpler but more verbose statements.

The benchmarks listed in this article may be downloaded here, along with the spreadsheet used to chart the data points (apologies that it is in .xlsx format; I’ve been testing the Office 2010 beta, and I’ll post an .ods version later!). Please recall that this benchmark isn’t scientific. The tests I conducted are highly artificial and are presented for only an exceedingly small subset of data variation. It is likely that other methods are faster, more efficient, and more appropriate for the given problem. Indeed, the entire purpose of this exercise was two-fold: 1) to avoid having to make any configuration changes to Suhosin and 2) to surprise myself with how many unique solutions I could create for a single problem. I think I succeeded on both counts.

If you post commentary, please keep in mind that this was conducted for my own purposes and curiosity only. It is neither intended to be malicious to the Symfony project nor issued as a correction to their sources (though a patch that eliminates the use of /e in preg_replace would be nice!). I’ve been highly impressed by the Symfony sources. Unlike most PHP code, they’re clean, concise, easy to understand, and well-documented. It’s a breath of fresh air compared to a vast majority of PHP-based projects. My thanks to the Symfony project in general and Fabien Potencier in particular. Please feel free to post corrections to my methods but be polite about it!

Recommendations? I’d suggest using solution #4 if you’re encountering issues with Suhosin and Symfony.

Updates: March 2nd, 2010

I’m growing increasingly less impressed with Symfony and its internal design. It appears the developers have a strong affinity for preg_replace’s /e modifier. Yes, I understand, you audit your code and it’s secure. But do you know what the most significant problem is with suggesting that users turn off Suhosin’s suhosin.executor.disable_emodifier flag? Users are likely to do it globally, rather than in an .htaccess file somewhere, and as a consequence, they’ll likely open their systems up to (admittedly unlikely but possible) far worse things. I haven’t tried Symfony 2.0 yet, but I hope they’ve resolved this. It’s stupid. And it pisses me off.

Anyway, if you’re looking for some quick fixes and are going through the Symfony tutorial and do not want to disable any part of Suhosin, here’s what you’re going to need to do.

First, change line 409 in symfony/lib/response/sfWebResponse.class.php to:

    $name = strtr(ucfirst(strtolower($name)), '_', '-');
    if (preg_match('/\-./', $name, $matches))
        return str_replace($matches[0], strtoupper($matches[0]), $name);
    return $name;

Then change line 281 in symfony/lib/form/addon/sfFormObject.class.php to:

    if (preg_match_all('#(?:/|_|-)+.#', $text, $matches)) {
        foreach ($matches[0] as $match)
            $text = str_replace($match, strtoupper($match), $text);
        return ucfirst(strtr($text, array('/' => '::', '_' => '', '-' => '')));
    }
    return ucfirst($text);

Naturally, you’re probably better off listening to the advice of the developers directly, but this is my solution. Your mileage may vary.

Note to the developers: If you stumble upon this post in the near future (obviously this doesn’t apply to versions of Symfony beyond 1.4), feel free to add your commentary. Be mindful that some of us actually want to leave suhosin.executor.disable_emodifier set to on, including for your product. I will admit that the lines of code you’ve written look reasonably safe and do not appear to be accepting tainted input, but I’m not going to risk that. I’ve been running several versions of phpBB–and we all know the sorts of holes that software has–along with WordPress and a few other PHP-based web apps without ever having encountered this issue before. Seriously, it makes me kind of worried about the rest of your code! Let’s take a look at the comments in the Suhosin INI file:

; The /e modifier inside preg_replace() allows code execution. Often it is the
; cause for remote code execution exploits. It is wise to deactivate this
; feature and test where in the application it is used. The developer using the
; /e modifier should be made aware that he should use preg_replace_callback()
; instead.

‘Nuff said.


Here’s a good source on preg_replace, why you should always use single quotes, common mistakes, and why you should really just avoid using /e in the first place.

***

Routing v0.2 Released

I’ve released a new version of Routing, v0.2.1. This update resolves issues with the route preferences and correctly prioritizes static routes. The internal route decisions are now handled by a scoring system. There are still a few bugs, but Routing is fast moving toward something usable.

Head over here to grab the latest version. I also wrote some documentation this evening for you to examine!

***

PHP Routing

If you’ve ever written web applications in Python using Pylons or TurboGears (>=1.1), you’re probably familiar with an incredibly handy tool known as Routes. It’s modeled after the Ruby on Rails routes system and does quite a marvelous job at mapping SEO-friendly URLs to useful code. If you’re not aware of it, you really should check it out.

However, I was curious to see if it were possible to create a Routes-inspired PHP library that would perform essentially the same task. I’m sure there’s an already-written PHP routes system, and I’ve allowed myself to fall into the not invented here trap. But, that’s part of the fun in coding: If you don’t like what someone else wrote or you want to give it a hand yourself, no one is stopping you. It’s great.

Thus, I’ve written Routing, the PHP routes system. It’s similar in syntax and intent to Python Routes with the exception that it’s PHPized (in other words: bastardized). If you read the sources, you might be able to see a little Python influence here and there (along with my frustrations toward PHP as a language vented in silence). I wouldn’t recommend it as anything short of entertainment as yet; I haven’t cleaned up the sources, and I expect they’re going to be really messy for a little while. The route search mechanism is awful, and I plan on refactoring it in a future revision. For now, the system works, and that’s good enough. (That’s my INTJ side talking.)

So, if you’re curious enough to take a look, head over to the Routing Trac page. I have some downloads posted there in a variety of formats (.zip, .tar.gz, and .tar.bz2). There’s no API documentation yet. I’m going to add it soon. Since Routing was born on Wednesday, August 5th, 2009, it’s quite young. However, I have some preliminary unit tests written (also incomplete) that you can read to get a general understanding of how Routing works. If you’re more interested in the principle behind Routing, you might want to read the documentation for Python Routes. Routes and Routing work roughly the same with the exception that Routing, unlike Python Routes, is extraordinarily buggy. I hope I can evolve this into something of an acceptable sibling to Routes, but that will take time and a whole lot of effort. I have a few additional ideas, too.

***