I have decided to separate the benchmarks into two phases. The first phase (this one) consists of the benchmarks I performed prior to writing my own network handler, in order to determine where Python’s primary deficiencies are. Remember: with two exceptions (Stackless + tasklet accept() with stacklesssocket.py, and Pylons), every test comparing Python’s performance in this phase uses classes available via the standard library.
The second phase will be posted Saturday, Feb. 14 (Valentine’s Day!). This phase will include benchmarks covering network code written from scratch using a simple select() server. I will post the source code as well, and I may also benchmark Pylons using the latest version, 0.9.7, in addition to the integrated server provided by Paster. Near as I can tell, Python performs nearly as well as PHP (a little slower) when using raw socket code that isn’t hidden by several layers of abstraction.
Tests Round One: 20 Concurrent Connections, 2000 Total
In this test, the best performers of the lot were Apache and Tomcat. Toward the middle of the pack were the PHP passthrough and Tomcat running behind the AJP connector (mod_jk). The performance penalty incurred by using mod_jk is certainly interesting to note; behind it, Tomcat’s performance appears to be on par with PHP. All of the Python-based services trailed the field, with Python + BaseHTTP slightly ahead of its peers. Stackless with BaseHTTP and Python’s select() server + mod_wsgi were quite close in these tests, with Stackless performing slightly better. The worst performers were my Stackless application server and the Pylons framework. I suspect the reason for Stackless’ poor performance is my network code; I’m using Richard Tew’s stacklesssocket.py reference implementation, which is not intended for production, and I suspect I’m not using it correctly within the context of Stackless’ tasklets.
During the first rounds of tests, Pylons was the only platform to fail outright. The first two attempts worked quite well, but in the third attempt nearly half of all connections failed. It appears that under heavy concurrent load, Pylons will simply fail to respond within a reasonable period of time. (In all fairness, this probably affects all of the Python frameworks, not just Pylons; I simply never noticed it with anything else.) Pylons was also the only tested platform in which I did not load the source document from the file system; instead, the document contents were included directly inside a Pylons controller. Even this advantage didn’t pay off.
Tomcat also caused some unusual issues when I was testing it as a bare HTTP server without mod_jk. After the six trials, I attempted to run a few more in order to verify that Tomcat’s first run (shown on the chart as below Base Apache) was simply a fluke. Tomcat froze for approximately one minute before its shutdown scripts would restart it. I suspect this might have something to do with the tests exhausting most available local sockets.
Tests Round Two: No Concurrency, 500 Total
The second round of tests was performed one connection at a time, as fast as possible, for 500 total connections. Since the connections-per-second figure is an estimate extrapolated beyond the number of connections actually performed, be aware that these numbers may be upward-biased: the connection limit is well below the estimated capacity of each service, so the rates will appear higher than sustained throughput would be. I may rerun these trials with a total of 3000 to 4000 connection attempts.
(Slight edit: According to trials with 4000 connection attempts and no concurrency, much of the variability seen on the graphs disappears. The estimates pictured here are rather close, and some data suggests the actual values may be higher. It would seem that my guess of an upward bias in the graphed results is incorrect. The graphs may actually be biased toward producing lower values!)
Similar to the first series of tests, Tomcat and Apache (with and without .htaccess) outperform everything else. However, it is interesting to note that Python + BaseHTTP performed slightly better than both PHP and Tomcat using the AJP connector. As in the previous tests, Stackless + BaseHTTP performs approximately as well as the standard Python distribution communicating with a WSGI front end via a select() server. Pylons performed relatively poorly, and my application server turned in the most abysmal rate of all. Again, I suspect this has to do with the asyncore backend used in stacklesssocket. I could be wrong, but my working theory is that the socket polling catches a new connection only about every tenth of a second, even though the poll timeout suggests it should happen twice as often. This may be due to processing overhead which, by the time it finishes, has consumed enough of a polling cycle that only every other poll is caught. In either case, it appears I’ll need to rewrite the network code for my application server!
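The every-other-poll theory can be made concrete with a toy simulation. The 50 ms poll timeout and 60 ms handler cost below are assumptions chosen for illustration, not values measured from stacklesssocket: the point is simply that when per-event processing exceeds the poll timeout, the effective interval between handled events stretches to roughly the sum of the two.

```python
import select
import time

POLL_TIMEOUT = 0.05   # the poll is asked to wake every 50 ms
HANDLER_COST = 0.06   # hypothetical per-connection processing time

start = time.monotonic()
events = 0
while time.monotonic() - start < 0.5:
    # With no file descriptors registered, select() simply waits out
    # the timeout, standing in for an idle poll.
    select.select([], [], [], POLL_TIMEOUT)
    time.sleep(HANDLER_COST)  # simulate the processing overhead
    events += 1

effective_interval = (time.monotonic() - start) / events
print(f"effective interval: {effective_interval:.3f} s")  # ~0.11 s, not 0.05 s
```

In other words, a 50 ms poll plus 60 ms of processing yields roughly one handled event per 0.11 s, which matches the “only every other poll is caught” symptom.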
Here is a graph of the same data plotted according to the number of seconds per trial on a logarithmic graph (lower is better). The actual mean values for each server are:
| Server Type | Trial Time (seconds) |
| --- | --- |
| Stackless + tasklet accept() | 50.136136 |
| Stackless + BaseHTTP | 0.926437 |
| Base Python + select() + WSGI Proxy | 1.346253 |
| Base Python + BaseHTTP | 0.547384 |
| Base Python + Pylons | 1.939826 |
| PHP Passthrough (fopen) | 0.592777 |
| Base Apache (.htaccess disabled) | 0.364311 |
| Tomcat 6 + AJP 1.3 (mod_jk) | 0.806039 |
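Converting those mean trial times into approximate connections-per-second rates (500 connections per trial, as described above) makes the spread easier to compare:

```python
# Mean trial times from the table above; each trial made 500 connections.
TOTAL_CONNECTIONS = 500

trial_times = {
    "Stackless + tasklet accept()": 50.136136,
    "Stackless + BaseHTTP": 0.926437,
    "Base Python + select() + WSGI Proxy": 1.346253,
    "Base Python + BaseHTTP": 0.547384,
    "Base Python + Pylons": 1.939826,
    "PHP Passthrough (fopen)": 0.592777,
    "Base Apache (.htaccess disabled)": 0.364311,
    "Tomcat 6 + AJP 1.3 (mod_jk)": 0.806039,
}

rates = {name: TOTAL_CONNECTIONS / t for name, t in trial_times.items()}
for name, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:.0f} conn/s")
```

Apache works out to roughly 1372 conn/s and the tasklet accept() server to roughly 10 conn/s, which shows just how wide the gap between the best and worst performers is.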
Nothing peculiar to record for this series.
A Stackless solution using stacklesssocket.py for processing incoming connections via WSGI appears to need a lot of work. I’m currently leaning toward forking connection handling off into a separate thread and using Stackless for the internal cooperative handling of data. Failing that, I may stick with standard Python and simply rewrite the entire system to use threads for data processing. Further benchmarks will be included once I commit to a specific design. Feel free to post any corrections or questions, but be mindful of the biases that exist in this benchmark. It is not scientific!
2 Responses to “Brief Comparison of Servers and Frameworks”
Wow. Glad you did that rather than me. It would have taken me forever to screw around with this.
You’d be surprised! I’ve been working on collecting data for this benchmark off and on for a while. I do need to change the graphs, though. With as many lines as there are, I’m afraid it’s getting cluttered. Displaying a comparison of mean values as bars might be easier on the eyes!
The results are still very interesting. It’s just a shame that some of the standard library stuff works so poorly. Though, it isn’t surprising; much of that is intended as example code.