Symfony 1.3/1.4 and Suhosin

I read about Symfony some time back when I was working on a project in CakePHP. Symfony piqued my interest primarily because it uses Doctrine (by default) as its ORM, and I’ve used Doctrine in a couple of other projects. This weekend, I elected to give Symfony a try. After reading through the documentation (which I’ll usually do for a few hours before taking the final plunge), I was satisfied that Symfony would be a great framework to learn next.

Updated March 2nd, 2010. Click here to view my updated thoughts.

Following the initial guide was easy until I attempted to view the example application. Then I encountered this:

Fatal error: SUHOSIN – Use of preg_replace() with /e modifier is forbidden by configuration in /home/bshelton/phpapps/sf-test/lib/symfony/lib/response/sfWebResponse.class.php(409) : regexp code on line 409

Uh oh! This struck me as odd: Symfony prides itself on being a secure-by-default framework, yet it triggers a Suhosin error from the stock install? Great. But don’t be too alarmed, as the offending code isn’t a security hole (newlines were inserted by me to increase readability on the web):

return preg_replace('/\-(.)/e',
  "'-'.strtoupper('\\1')",
  strtr(ucfirst(strtolower($name)), '_', '-'));

The Fix

While the /e modifier does have the potential to execute PHP code, the offending line merely takes the first character immediately following a “-” and uppercases it (the strtoupper call is the only eval’d code). However, rather than disable parts of Suhosin (which could be bad if third-party code that does something similar, and that would need auditing, is ever included), I decided to fix the issue. According to the Suhosin docs, the fix is fairly simple: change line 409 in symfony/lib/response/sfWebResponse.class.php to:

PHP 5.2 and below:

return preg_replace_callback('/\-(.)/',
        create_function('$matches', 'return \'-\'.strtoupper($matches[1]);'),
        strtr(ucfirst(strtolower($name)),
        '_', '-'));

PHP 5.3 and up:

return preg_replace_callback('/\-(.)/',
        function ($matches) { return '-'.strtoupper($matches[1]); },
        strtr(ucfirst(strtolower($name)),
        '_', '-'));

In this case, rather than evaluating PHP code as part of the replacement, a callback function is created that performs approximately the same operation. Obviously, the solution for PHP 5.2 and below could be just as dangerous, because create_function also builds a function from a string of code, but if we’re careful, we can create a function exactly as we intended.
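
To make the behavior concrete, here is a minimal sketch that wraps the PHP 5.3+ variant in a standalone function. The name normalizeHeaderName is made up for illustration and is not Symfony's method name; the sample inputs are assumptions as well.

```php
<?php

// Hypothetical wrapper (the name is an assumption, not Symfony's API).
function normalizeHeaderName($name)
{
    // Lowercase everything, capitalize the first letter, turn "_" into "-".
    $name = strtr(ucfirst(strtolower($name)), '_', '-');

    // Uppercase the single character that follows each dash.
    return preg_replace_callback(
        '/\-(.)/',
        function ($matches) { return '-' . strtoupper($matches[1]); },
        $name
    );
}

echo normalizeHeaderName('content_type'), "\n";     // Content-Type
echo normalizeHeaderName('X-FORWARDED-FOR'), "\n";  // X-Forwarded-For
```

Because the callback is ordinary compiled code, nothing in the replacement string is ever evaluated, which is the property Suhosin is checking for.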

Performance Concerns

After I discovered this solution by browsing the PHP documentation, it occurred to me that callbacks in preg_replace_callback are slightly slower than performing a call to preg_replace without callbacks. I couldn’t recall precisely how much slower, so I elected to run a benchmark of several different approaches to this particular problem. This section highlights these benchmarks along with each potential solution. Ironically, using preg_replace_callback is about 30% slower than performing character-by-character replacement.

The Methodology

The methodology I used to test various solutions consisted of the following steps:

  1. Generate a large data set to aid in characterizing performance differences and elevate measurable performance discrepancies above the noise floor
  2. Write each solution such that it consumes the data generated in step 1
  3. Compare the performance of each method against the stock Symfony solution
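
The timing side of these steps might be sketched roughly as follows. The post does not show how its benchmarks were timed, so the benchmark() helper, the run count, and the tiny stand-in data set are all assumptions of this sketch, and it uses PHP 5.3+ closures.

```php
<?php

// Hypothetical helper: time one candidate over the whole data set and
// report the fastest of several runs to dampen noise.
function benchmark($callback, array $source, $runs = 5)
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        foreach ($source as $s) {
            call_user_func($callback, $s);
        }
        $best = min($best, microtime(true) - $start);
    }
    return $best; // seconds taken by the fastest run
}

// Tiny stand-in data set; a real run would use a much larger generated one.
$source = array('content-type', 'content_length', 'cache-control');

// Candidate under test: only the cheap string preparation step.
$seconds = benchmark(function ($s) {
    return strtr(ucfirst(strtolower($s)), '_', '-');
}, $source);

printf("%.6f seconds\n", $seconds);
```

Every candidate pays the same call_user_func overhead here, so the relative ordering should still be meaningful even if the absolute numbers are inflated.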

Since the stock Symfony code is designed to “normalize” (their words) headers generated and sent to the client, the source data is created by a script that picks a random number of characters (all lowercase) and joins them together using a dash (-) or an underscore (_) in order to emulate strings like “content-type” or “content_type.” The same data set is used for all subsequent tests. No more than one dash or underscore is used per string, although this is unlikely to be cause for much concern, as most headers are unlikely to have more than one or two separators.

You can download the script that generates this data, along with the source data itself, in the archive toward the bottom of this post.
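
For readers without the archive, a generator along the lines described above might look like the following. This is not the original script: the function names, string lengths, entry count, and output location are all assumptions.

```php
<?php

// Hypothetical helper: a string of random lowercase letters.
function randomLowercase($length)
{
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= chr(mt_rand(ord('a'), ord('z')));
    }
    return $out;
}

// Hypothetical generator: two random words joined by one "-" or "_".
function generateSource($count)
{
    $source = array();
    for ($i = 0; $i < $count; $i++) {
        $separator = mt_rand(0, 1) ? '-' : '_';
        $source[] = randomLowercase(mt_rand(3, 10))
                  . $separator
                  . randomLowercase(mt_rand(3, 10));
    }
    return $source;
}

$source = generateSource(1000);

// Save the data as an includable PHP file that defines $source.
$path = sys_get_temp_dir() . '/source-data.php';
file_put_contents($path, '<?php $source = ' . var_export($source, true) . ';');
```

Writing the array out with var_export means every benchmark script can include the same file and therefore sees exactly the same data.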

Note: These solutions were tested on PHP 5.2 only. You’ll likely discover differences when testing on other PHP versions.

The Results

Before we examine each solution, let’s take a look at the results:

Benchmark Plot

As you can tell from this chart, the Suhosin-recommended solution of using preg_replace_callback is the slowest. The stock Symfony code performs rather well but is still slower than simplified string replacement calls. We’ll examine each solution in the following section.

The Code

The code for each solution is provided below along with a brief description of what it does. The actual benchmarks (and benchmark data) are also provided. Remember that each of these solutions is modified to use our source data.

Solution 1: Stock Symfony Solution
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    preg_replace('/\-(.)/e', "'-'.strtoupper('\\1')", strtr(ucfirst(strtolower($s)), '_', '-'));
}

The stock Symfony solution to the normalization problem is to first convert the string to lowercase, capitalize the first character, and translate all underscores (_) into dashes (-). preg_replace is then run and evaluates the replacement, which converts the first character immediately following a dash to its uppercase equivalent.

Solution 2: preg_replace_callback
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    preg_replace_callback('/\-(.)/',
        create_function('$matches', 'return \'-\'.strtoupper($matches[1]);'),
        strtr(ucfirst(strtolower($s)),
        '_', '-'));
}

As hinted by the Suhosin documentation, preg_replace_callback is recommended over evaluating PCRE replacements. Unfortunately, this solution is also the slowest. In structure, it is most similar to the stock Symfony code but replaces the evaluated code with an anonymous function. I have not tested true anonymous functions as introduced in PHP 5.3, however. Given that create_function likely has to create an additional interpreter instance in PHP 5.2, this solution might be slightly faster in later versions of PHP when using this code:

return preg_replace_callback('/\-(.)/',
        function ($matches) { return '-'.strtoupper($matches[1]); },
        strtr(ucfirst(strtolower($name)),
        '_', '-'));
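
One more thing worth separating out: the Solution 2 loop calls create_function on every iteration, so it defines a brand-new function for each header. A minimal sketch, assuming PHP 5.3+ closures and a small stand-in $source array, that builds the callback once outside the loop (not benchmarked here):

```php
<?php

// Stand-in data; the benchmarks used a generated $source array instead.
$source = array('content-type', 'content_length');

// Build the callback once, outside the loop.
$upperAfterDash = function ($matches) {
    return '-' . strtoupper($matches[1]);
};

$results = array();
foreach ($source as $s) {
    $results[] = preg_replace_callback(
        '/\-(.)/',
        $upperAfterDash,
        strtr(ucfirst(strtolower($s)), '_', '-')
    );
}

print_r($results); // Content-Type, Content-Length
```

Whether hoisting the callback closes the gap with the stock code is an open question; it simply removes one cost that the other solutions never pay.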

Solution 3: Array and String Manipulation Only
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    $s = strtr(strtolower($s), '_', '-');
    $tmp = explode('-', $s);
    foreach ($tmp as &$t) {
        $t = ucfirst($t);
    }
    $buf = implode('-', $tmp);
}

This solution was among the fastest, but its performance remains within the margin of error and may therefore be tied with Solution 4. As with solutions 1 and 2, this solution translates all characters to lowercase and replaces all underscores (_) with dashes (-). However, unlike the previous solutions, this one splits each header along the dash boundary, loops through the resulting pieces, converts them in-place using ucfirst, and then joins them together again with a dash.

This solution is not optimal, and I expected it to be among the slowest. I was surprised to discover that this solution performed faster than the others, and I initially assumed the overhead of preg_replace was due in no small part to the requirement of loading the PCRE engine. It is also likely that increasing the number of split points in the header (by way of more dashes) will likewise increase the amount of time this solution requires to run.
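
On PHP versions newer than the 5.2 tested here, ucwords accepts a second argument listing delimiter characters (added in PHP 5.4.32 and 5.5.16), which collapses the explode, loop, and implode steps into a single call. This is a different technique from the one benchmarked, and the function name below is made up for illustration.

```php
<?php

// Sketch only: relies on ucwords() supporting the delimiters argument.
function normalizeWithUcwords($name)
{
    // Lowercase, turn "_" into "-", then capitalize after every "-".
    return ucwords(strtr(strtolower($name), '_', '-'), '-');
}

echo normalizeWithUcwords('content_type'), "\n";              // Content-Type
echo normalizeWithUcwords('content-transfer-encoding'), "\n"; // Content-Transfer-Encoding
```

It also handles any number of dashes without extra code, though its speed relative to the five solutions here is untested.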

My presumed explanations for the performance of this code segment were challenged with the next test.

Solution 4: Two Replacements
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
    $s = strtr(ucfirst(strtolower($s)), '_', '-');
    preg_match('/\-./', $s, $matches);
    str_replace($matches[0], strtoupper($matches[0]), $s);
}

I confess that the file name for this test is slightly misleading, and I’ll correct it in the posted archive. The initial intention was to use two preg_replace statements, but I elected (at the last minute, no less) to use a single preg_match in conjunction with str_replace. In this solution, a dash (-) followed by any single character is captured, capitalized, and then replaced into the final product. As with Solution 3, this is among the fastest of the 5 tested. This solution may also scale slightly better than Solution 3 for headers containing more than a single dash, although as written preg_match captures only the first dash-and-character pair, so headers with several different pairs would need preg_match_all or a loop to be fully normalized.

Note that the performance of this solution hints that the slightly slower behavior of solutions 1 and 2 is likely due to code evaluation rather than the overhead of loading PCRE.
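
One caveat with Solution 4 shows up once a header has more than one distinct dash-and-character pair, which the benchmark data never contains. The sketch below wraps Solution 4 in a function and compares it with a preg_match_all variant; both function names and the variant itself are illustrative assumptions, not part of the benchmarks.

```php
<?php

// Solution 4 as written: only the first "-x" pair is captured.
function normalizeFirstPair($name)
{
    $name = strtr(ucfirst(strtolower($name)), '_', '-');
    if (preg_match('/\-./', $name, $matches)) {
        return str_replace($matches[0], strtoupper($matches[0]), $name);
    }
    return $name;
}

// Variant (an assumed extension): capture and replace every pair.
function normalizeAllPairs($name)
{
    $name = strtr(ucfirst(strtolower($name)), '_', '-');
    if (preg_match_all('/\-./', $name, $matches)) {
        foreach (array_unique($matches[0]) as $pair) {
            $name = str_replace($pair, strtoupper($pair), $name);
        }
    }
    return $name;
}

echo normalizeFirstPair('content-transfer-encoding'), "\n"; // Content-Transfer-encoding
echo normalizeAllPairs('content-transfer-encoding'), "\n";  // Content-Transfer-Encoding
```

For single-separator headers such as the generated test data, the two functions return the same result.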

Solution 5: Character-by-Character Replacement
<?php
 
include_once 'source-data.php';
 
foreach ($source as $s) {
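    // Note: unlike the other solutions, underscores are not translated to dashes here.
    $s = ucfirst(strtolower($s));
    $buf = '';
    for ($i = 0; $i < strlen($s); $i++) {
        // Guard $i > 0 so the first iteration never reads offset -1.
        if ($i > 0 && $s[$i-1] == '-')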
            $buf .= strtoupper($s[$i]);
        else
            $buf .= $s[$i];
    }
}

I initially assumed this would be the slowest performing benchmark of the 5. I was surprised to discover that it is only the second slowest. Indeed, the gap between this solution and preg_replace_callback is roughly the same as the gap between the stock Symfony code and this solution.

Conclusion

The Symfony stock code is quite fast but still appears to create instances (at least in sfWebResponse.class.php) where partial evaluation of PHP code is necessary. eval’d code isn’t necessarily evil, and while it most certainly is a vector for exploitation, it is the manner in which code is evaluated that makes it dangerous. Regardless, I think it is appropriate to evaluate (I made a pun!) circumstances in which code itself is eval’d and question whether such calls are necessary. While these benchmarks are admittedly highly artificial, it is fairly obvious that in some situations, less code does not necessarily translate to better performance. More importantly, highly compressed code can be difficult for new maintainers to understand and therefore become more error prone or more likely to encounter breakage than simpler but more verbose statements.

The benchmarks listed in this article may be downloaded here, along with the spreadsheet used to chart the data points (apologies that it is in .xlsx format; I’ve been testing the Office 2010 beta–I’ll post a version with an .ods later!). Please recall that this benchmark isn’t scientific. The tests I conducted are highly artificial and are presented for only an exceedingly small subset of data variation. It is likely that other methods are faster, more efficient, and more appropriate for the given solution. Indeed, the entire purpose for this exercise was two-fold: 1) To avoid having to make any configurational changes to Suhosin and 2) surprise myself with how many unique solutions I could create for a single problem. I think I succeeded on both counts.

If you post commentary, please keep in mind that this was conducted for my own purposes and curiosity only. It is neither intended to be malicious to the Symfony project nor issued as a correction to their sources (though a patch that eliminates the use of /e in preg_replace would be nice!). I’ve been highly impressed by the Symfony sources. Unlike most PHP code, they’re clean, concise, easy to understand, and well-documented. It’s a breath of fresh air compared to a vast majority of PHP-based projects. My thanks to the Symfony project in general and Fabien Potencier in particular. Please feel free to post corrections to my methods but be polite about it!

Recommendations? I’d suggest using solution #4 if you’re encountering issues with Suhosin and Symfony.

Updates: March 2nd, 2010

I’m growing increasingly less impressed with Symfony and its internal design. It appears the developers have a strong affinity for preg_replace’s /e modifier. Yes, I understand, you audit your code and it’s secure. But do you know what the most significant problem is with suggesting that users disable Suhosin’s suhosin.executor.disable_emodifier flag? Users are likely to do it globally (rather than in an .htaccess file somewhere), and as a consequence, they’ll likely open their systems up to (admittedly unlikely but possible) far worse things. I haven’t tried Symfony 2.0 yet, but I hope they’ve resolved this. It’s stupid. And it pisses me off.

Anyway, if you’re looking for some quick fixes and are going through the Symfony tutorial and do not want to disable any part of Suhosin, here’s what you’re going to need to do.

First, change line 409 in symfony/lib/response/sfWebResponse.class.php to:

    $name = strtr(ucfirst(strtolower($name)), '_', '-');
    if (preg_match('/\-./', $name, $matches))
        return str_replace($matches[0], strtoupper($matches[0]), $name);
    return $name;

Then change line 281 in symfony/lib/form/addon/sfFormObject.class.php to:

    if (preg_match_all('#(?:/|_|-)+.#', $text, $matches)) {
        foreach ($matches[0] as $match)
            $text = str_replace($match, strtoupper($match), $text);
        return ucfirst(strtr($text, array('/' => '::', '_' => '', '-' => '')));
    }
    return ucfirst($text);

Naturally, you’re probably better off listening to the advice of the developers directly, but this is my solution. Your mileage may vary.
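
For comparison, roughly the same transformation can be written with preg_replace_callback, as the Suhosin comments recommend. This is a sketch under assumptions: the function name is made up, it requires PHP 5.3+ closures, and edge cases such as a trailing separator may behave differently from the replacement above.

```php
<?php

// Hypothetical helper (not Symfony's method name).
function camelizeSketch($text)
{
    // Uppercase the character that follows each run of "/", "_" or "-".
    $text = preg_replace_callback(
        '#[/_-]+.#',
        function ($m) { return strtoupper($m[0]); },
        $text
    );

    // Turn "/" into "::", drop "_" and "-", capitalize the first letter.
    return ucfirst(strtr($text, array('/' => '::', '_' => '', '-' => '')));
}

echo camelizeSketch('article_author'), "\n"; // ArticleAuthor
echo camelizeSketch('foo/bar'), "\n";        // Foo::Bar
```

As with the header fix, the point is that no replacement string is ever evaluated as code, so suhosin.executor.disable_emodifier can stay on.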

Note to the developers: If you stumble upon this post in the near future (obviously this doesn’t apply to versions of Symfony beyond 1.4), feel free to add your commentary. Be mindful that some of us actually want to leave suhosin.executor.disable_emodifier set to on, including for your product. I will admit that the lines of code you’ve written look reasonably safe and do not appear to be accepting tainted input, but I’m not going to risk that. I’ve been running several versions of phpBB–and we all know the sorts of holes that software has–along with WordPress and a few other PHP-based web apps without ever having encountered this issue before. Seriously, it makes me kind of worried about the rest of your code! Let’s take a look at the comments in the Suhosin INI file:

; The /e modifier inside preg_replace() allows code execution. Often it is the
; cause for remote code execution exploits. It is wise to deactivate this
; feature and test where in the application it is used. The developer using the
; /e modifier should be made aware that he should use preg_replace_callback()
; instead.

‘Nuff said.


Here’s a good source on preg_replace, why you should always use single quotes, common mistakes, and why you should really just avoid using /e in the first place.

***

Scala: A Prologue (Or The Learning of a Language)

Borrowing an idea from Will, I’ve decided that I will be detailing the progress I make as I teach myself a new programming language: Scala. I’ve heard nothing but good things about it and have toyed around with a couple of (simple!) tutorials. I’m impressed enough to have ordered a book which I expect to receive sometime toward the end of this week. While this is simply a prologue to the actual play, I would like to discuss my motivation for sharing this.

First, I don’t have a particularly good recollection of my first steps taken with any programming language early on in my education or life. The first one I learned–contrary to popular belief–was not PHP; it was Perl. It’s almost embarrassing to admit, not so much because I find Perl to be a terrible language (although it is something of a write-once language), but because most hackers cut their teeth on languages closer to bare metal like C. I’m sure I still have some of my first Perl programs available, and I’d be hard pressed not to blush were I to browse them, all while wondering who the idiot was writing such awful code! I wish I had written a journal during those early days to record my first few steps. Though my memory of that particular time period isn’t great, I do recall some months of playing with Perl and CGI all to print random quotes to a web page. That’s what fascinated me most, and I think that was what spawned my love affair with web application development.

Second, I’d like to record mostly for my own future reference my steps through a language that differs significantly from the ones I currently know. I have quite a few under my belt (Python, PHP, Java, Perl, C#, and VB.NET–though the latter two are virtually identical thanks to the CLR with only syntax placing a wall between them) these days, and I think it’s generally a wonderful idea to have a history available for reference, either for personal fulfillment or as a utility available to others. Since hindsight is often viewed through rose-colored glasses, it should be interesting both to myself and anyone else reading this future work (even years from now) to peruse the sorts of “gotchas” I encounter–and what I find easiest to grasp. If nothing else, it might equip future Scala coders with some tools to combat the inevitable stumbling blocks they’re bound to encounter.

Third, I think this might be a great way to encapsulate my own thoughts on the learning process. Perhaps I could offer some insight into what happens inside a developer’s mind as they learn a new language, especially one that exists outside their comfort zone. Scala combines the worlds of OOP and functional programming, offering features I’m certain I have yet to understand. I’ve never delved into a functional language before (Python borrows some ideas from functional programming, mind you), and I expect to get caught by a hitch or two along the way. For current Java programmers, my work here may offer some clues as to the sorts of things you should expect to both enjoy and suffer along the road. Not that I wish to imply you’ll suffer through the language; rather, I’d like to hint that there are certain things that might not immediately seem intuitive to the Java developer. Nevertheless, being a JVM-hosted language, Scala presents us with some distinct advantages. There is also a .NET CLR-hosted port of Scala for those of you living in the kingdom of Microsoft. Suffice it to say that there’s a little something for everyone.

Finally, I’d like to offer this as an analog to Will’s project I mentioned earlier. He came up with this idea some months back (it’s still on his blog, in fact), and I commended him on the utility of the notion. It’s such a great idea, I’d like to present my own implementation of it, but from a different perspective. Indeed, this project in particular (this recording of my own experiences) might serve him as a guidebook should he choose to do any sort of programming and, more importantly, extend that programming into other languages. Programmers these days don’t get by with a single language alone. I’ve read from sources too numerous to list that the average programmer should take the time to learn at least one language per year during their career. No, they won’t jumble themselves together, though you might notice a couple of conflicts (indented formatting habits while switching from Python to PHP or terminating lines with semi-colons while switching from Java to Python).

Along these lines, it’s important to point out that once you learn a language, the general rule of thumb is that it gets easier to learn others. Perhaps this has something to do with the formation of synapses that make the learning process more streamlined rather than establishing a particular bias toward certain constructs, syntaxes, and idea abstraction. This holds true, I think, for both natural (spoken) language and programming. Those in my circle of peers who know at least one language other than their native tongue have a much easier time grasping new ones, and I certainly know from experience, personal and otherwise, that learning programming languages becomes easier as more and more make their way into one’s skill set.

Scala looks to be a great general purpose language. It even has a web framework called Lift that can operate on any fairly modern servlet container (like Tomcat), so there’s some utility to be had from it outside the domain of generic applications. Its allure to me lies in its management of higher concurrency, which is something that still seems slightly awkward in languages like Python and Java. (To be fair, C# has advantages over Java mostly because it is newer, but there are still other things I like most about Java’s model–many of which Python borrows.) If nothing else, perhaps this coming project of mine will make (or break) your decision to adopt Scala.

Don’t expect this writing style to be formative for my future narration on this topic. Academic-speak and structure is something I enjoy, but I also enjoy writing entertaining works of literature. Thus, when I actually write the Real Thing, I hope it’ll be fun. You may grow to love or hate it. It may even take a month or two–maybe longer–to write in its entirety, but the results will be worth it. I promise.

Until then, stay tuned for the Links of the Week coming Wednesday or later. I have a few things in the queue for this week, so LotW might be delayed. If so, I apologize ahead of time.

Update: Links of the Week will probably wait ’til Friday. I’ve been pretty busy, and I’m going to be putting together a tutorial on vBulletin tutorials for my own purposes. I know Will really finds the LotW entertaining; I apologize ahead of time, but I’ve really been a little busy.

Also, I’ve fixed some mistakes in this post and rephrased a few parts to make clear some of my original intent. It still isn’t perfect. I couldn’t finish my updates in Firefox due to persistent crashes, and if there’s something missing, I’ve noticed that Chrome didn’t open the original version of the document when I went to edit it. Weird.

***

Routing v0.2.2 Released

I’ve released Routing v0.2.2. This version fixes a few outstanding issues and reduces the number of notices encountered if error_reporting is set to E_ALL. You can also read the changelog here.

***