PHP: Things that Annoy Me

It seems that nearly everyone who has dabbled in PHP to one degree or another has a few gripes about the language’s ad hoc design. It isn’t a very elegant language, nor is it well suited to much outside the world of web applications. What it does do, it does reasonably well, and its barrier to entry is low enough that even novices can have a working site up and running, with some dynamic components here and there, within a few days of playing with sample code. Unlike the more complicated frameworks of the Java, Python, or Ruby worlds, PHP requires little (if any) knowledge of common design patterns and practices. For better or for worse, it’s a language that grew out of a set of utilities, and it still retains much of that feel and design.

The volume of “PHP annoys me” articles has blossomed tremendously over the years. I think this is partly because those of us who essentially “grew up” with the language are now discovering, with increasing frustration, that the very things that made the language easy to use are what cause it (or, more appropriately, us) so much grief.

I’d like to contribute a little to beating this particular dead horse. Maybe it’s not quite dead yet.

Ahh, how I could make a Monty Python reference out of that!

Update, June 21, 2009: If you like this article, you might be interested in this site.

But it’s so easy…

Without a doubt, PHP is an easy language to learn. In the early days of the web, the competing technologies were largely various dialects of ASP (this was a bit before .NET grew into its own useful framework for the web), CGI using Perl or custom solutions written in some other language (including C), and the new kid on the block: PHP. PHP gave programmers of the day an incredibly easy way of interfacing with databases. Why, with only 10 or so lines of code, you could have a simple list of your favorite movies, foods, actors, or whatever else you fancied at the time pulled straight from a database. It sure beat having to deal with the intricacies of memory management and the like, for which C is well known.

But PHP’s ease of use came at a price. Certain useful things, like support for prepared statements in MySQL, were non-existent. Then there was the issue of magic quotes, GPC variables, escaping user input, addslashes, stripslashes, slashslashes, and a dozen other convoluted solutions to problems that arguably shouldn’t have surfaced in the first place. PHP became its own worst enemy, and it wasn’t until about 4.1 and 4.2 that most of the legacy rubbish was finally being swept under the rug.

Now, don’t get me wrong: Hindsight makes for an unfair analysis and that’s why I’m not going to complain about the obvious things that people tend not to like about the language. Well, maybe a little. It’s hard not to.

I’m also not going to touch the issue of it being a dynamically typed language, because I happen to like dynamically typed languages. Python is a great example of this (although it is strongly typed).

Lies, Damned Lies, and Databases

PHP’s popularity may only be matched on the LAMP stack by MySQL (coincidentally one of the member projects referred to in the acronym!), a DBMS with which it meshes well and to which it is often married. WordPress is one such application, so exclusively MySQL-centric that using alternative databases is almost impossible. It’s no surprise that, since MySQL was arguably one of the most popular free databases available at the time, PHP grew around support for it in its core. Of course, the astute reader will probably point out that this is no longer the case, and that integrated MySQL support in PHP has long been deprecated. Indeed it is.

The unfortunate part of this whole popularity thing that struck PHP is that much of the underlying code actually has to change. People want features. Perhaps more importantly, people want working features. Then there’s the problem with feature creep. But not only that, the libraries also evolve: Some are removed, some added, and many of them change. Sometimes the entire ABI compatibility layer gets blown up leaving messy bits all over the outskirts of a deeply excavated crater where someone’s web application once stood.

It’s a similar story for extension authors. Sooner or later, all those neat little libraries suddenly become the bane of their own existence. The introduction of MySQLi and, later, PDO serves as a somewhat indirect acknowledgment of why: The standard MySQL libraries just don’t cut it.

To understand why the MySQL libraries that shipped with PHP from the start were a problem, we have to examine the feature set of other database libraries like those that support PostgreSQL. Even the PDO class adopted certain features and went as far as to implement them internally if the backend driver lacked native support. I’m talking, of course, about prepared statements.

Let’s take a look at the following query:

SELECT * FROM users WHERE username = '$username'

This innocuous statement harbors a classic example of the dreaded SQL injection. The mysql_query() function does have a distinct advantage in that it only executes one statement at a time (I wouldn’t rely on it for security, however!), but what if this query were intended for authentication? Further, let’s assume for the sake of argument that to become an administrator, all we had to do was pass in the data:

$username = "admin' -- ";

So our query, complete with expanded variables, now looks like:

SELECT * FROM users WHERE username = 'admin' -- '

That’s why prepared statements are a good thing. With a prepared statement, we could rewrite the query as:

SELECT * FROM users WHERE username = ?

Then, when we pass in our variable, the driver (optimally) treats it as a separate data stream and we needn’t worry about SQL injection. Of course, you’d still need to test the data for validity and correctness, but that’s another issue entirely–the simplest but most common attack against web applications has been mostly thwarted.
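Here’s a rough sketch of what that looks like with PDO. To keep it self-contained, I’m assuming the pdo_sqlite extension is installed and using an in-memory SQLite database as a stand-in for MySQL; the users table, its columns, and the findUser() helper are all made up for illustration.

```php
<?php
// Assumption: pdo_sqlite is installed. SQLite stands in for MySQL here only
// so that the sketch runs without a database server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema, invented for this example.
$pdo->exec('CREATE TABLE users (username TEXT, is_admin INTEGER)');
$pdo->exec("INSERT INTO users (username, is_admin) VALUES ('admin', 1)");

// Hypothetical helper: the value travels separately from the SQL text,
// so it is never parsed as part of the statement.
function findUser(PDO $pdo, $username)
{
    $stmt = $pdo->prepare('SELECT username, is_admin FROM users WHERE username = ?');
    $stmt->execute(array($username));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$legit  = findUser($pdo, 'admin');
$attack = findUser($pdo, "admin' -- ");

var_dump(count($legit));  // int(1): the real account is found
var_dump(count($attack)); // int(0): the quote and the comment marker are just data
```

The injection string from earlier is compared, character for character, against the username column and matches nothing.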

Just remember the golden rule of web applications: All user-supplied data is tainted, and tainted data is bad.

While we’re on the subject of databases, I’d like to point out a personal gripe of mine:

Why must you explicitly pass a value to the mysql_connect() function indicating that you want a NEW connection rather than a recycled one?

Yeah, I get it: most people aren’t going to use more than one database, so it only makes sense that further calls to mysql_connect() with the same arguments hand back the same connection that the previous ones did! It’s stupid: If I make two calls to mysql_connect(), you can bet your pale geeky posterior that I did it intentionally. Most programmers don’t write code because they think it looks shiny (although that point is hotly contested in some circles); they write code that they think is going to do something for a particular purpose. This is where PHP violates the principle of least surprise: Don’t surprise the user, and if you have to, surprise them only in a manner that makes sense.

Oh, and the next person who tells me that it’s designed this way for the purpose of efficiency will find a fart targeted in their general direction.

One man’s false is another man’s… true?

I guess this is the PHP world’s way of incorporating the framework of relativism into the language. Regardless of why, it’s still something that puzzles me. Sure, I realize that PHP being a dynamically typed language means that certain things have no other choice but to evaluate to true or false, but c’mon!

===

Is
Exactly
Equal
To.

Those of you old-time PHP developers know exactly what I’m getting at here. For the rest, let’s put it into perspective.

Assume we have a script on our server and we’re accessing it by going to the URL: http://www.example.com/scripts/equality.php?test=0.

Now, in our equality.php file we have:

<?php
 
if ($_GET['test'] == 0) {
    print "Your value is {$_GET['test']}!";
}
 
?>

If you’re not familiar with this nuance of the language, I’ll bet I know what you’re thinking. It’s going to say “Your value is 0!” right? You bet. Guess what happens if you call it with: http://www.example.com/scripts/equality.php?test=

(Notice how there’s no 0 attached to the test variable.)

I’ll give you a moment.

It shouldn’t run anything, right? After all, it’s supposed to be checking to see if test is zero; if not, it doesn’t print anything at all.

Wrong.

This script will happily print away “Your value is !” with nothing where the value should be. Instead, you have to use the is exactly equal to operator. Since everything in $_GET arrives as a string, the value to compare against is the string '0':

<?php
 
if ($_GET['test'] === '0') {
    print "Your value is {$_GET['test']}!";
}
 
?>

It’s a very subtle but important difference. How many times data errors have crept into someone’s code (and database) because of this I’ll never know.

If that’s not enough, there’s more. PHP also has (wait for it) a not exactly equal to operator, or !==. So you run into the same thing. A value that fails a != 0 test isn’t necessarily zero. It could be an empty string. It could be false. You don’t ever know unless you use $value !== 0; then you’re set.
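As a small sketch of the difference, here is a hypothetical isZeroParam() helper (my own name, not anything built into PHP) that takes the query array as an argument so it can be tried without a web server. I’ve stuck to comparisons whose results are the same across PHP versions; as far as I know, the "" == 0 case in particular changed in PHP 8, which now treats it as false, so I’ve left it out.

```php
<?php
// Hypothetical helper: true only when the parameter is present and is
// exactly the string "0", which is what ?test=0 delivers.
function isZeroParam(array $query, $key)
{
    return isset($query[$key]) && $query[$key] === '0';
}

var_dump(isZeroParam(array('test' => '0'), 'test')); // bool(true)
var_dump(isZeroParam(array('test' => ''), 'test'));  // bool(false)
var_dump(isZeroParam(array(), 'test'));              // bool(false)

// Loose comparison converts types first; strict comparison does not.
var_dump('0' == 0);    // bool(true)
var_dump('0' === 0);   // bool(false)
var_dump(false == 0);  // bool(true)
var_dump(false !== 0); // bool(true)
```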

Now, I needn’t be too judgmental here: Other dynamically typed languages suffer from a similar problem. However, other dynamic languages tend to make up for it by either adding additional operators or Doing the Right Thing. Let’s take a look at the following Python typed into the interactive shell:

Python 2.5.2 (r252:60911, Apr  3 2009, 01:21:50)
[GCC 4.1.2 (Gentoo 4.1.2 p1.3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> False == 0
True
>>> "" == 0
False
>>> "" == False
False
>>> False == None
False
>>> False is 0
False
>>> False is ""
False
>>> False is False
True

Notice that the is operator determines whether two expressions refer to the very same object (and everything in Python is an object) rather than whether they merely compare as equal. This is a pretty good way to test if something is something else. Fancy.

How does PHP fare?

if (false == 0) // prints true, evaluates to true
    print 'true';
if ("" == 0) // prints true, evaluates to true
    print 'true';
if ("" == false) // prints true, evaluates to true
    print 'true';
if (false == null) // also prints true, evaluates to true
    print 'true';
if (false instanceof 0) // syntax error
    print 'true';

See how Python does what the novice PHP developer would expect? Zero is equivalent to false, but the empty string is equal to neither 0 nor false. The same goes for None, Python’s null: it isn’t equal to False either. Of course, Python still lets you evaluate these for truth like so:

>>> if not None:
...     print 'true'
...
true
>>> if not "":
...     print 'true'
...
true
>>> if not False:
...     print 'true'
...
true

What’s the purpose of this lengthy rant? Simple: False may not be what you think it is. In fact, false may not really be anything. It might be something else entirely–a zero, an empty string, a null–you never know in PHP. If you have to know for certain, the only way to tell is to use === or !==.
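One place where this tends to bite is any function that can legitimately return 0 and also returns false on failure. strpos() is the usual suspect, so here’s a minimal sketch with it; the strings are arbitrary.

```php
<?php
$found   = strpos('hello', 'h'); // match at offset 0
$missing = strpos('hello', 'z'); // no match: returns false

var_dump($found);             // int(0)
var_dump($found == false);    // bool(true): 0 loosely equals false
var_dump($found === false);   // bool(false): it really is a match
var_dump($missing === false); // bool(true)
```

With ==, a match at the very start of the string is indistinguishable from no match at all.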

I remember once when someone was looking over my PHP code. They asked me if === was a typo. That burns.

Namespaces? Nameschmaces!

I’m going to keep this rant short mostly because it’s been covered in detail elsewhere. Just look up the Slashdot article that covered PHP’s announcement that their new namespace resolution operator is slated to be the backslash (\). I wish I were kidding.

I think this guy writes it best. Scroll down to #4 where he discusses PHP’s current lack of namespace support. In fact, the entire article is an amazing gem (get it? an article about Ruby, and I’m talking about gems–oh, I kill me!)!

And yes, it’s a joke. It’s sarcasm. In fact, it’s a brilliant masterpiece. That sort of work gets bookmarked in stone.

Why do we have arguments?

Some of the PHP library’s more annoying issues involve function argument ordering. Take for example array_key_exists($key, $array) versus the array manipulation functions like array_push($array, $value). I will confess that PHP’s other array search functions are consistent among each other and accept argument order as ($needle, $haystack).

Some consistency is better than none, I suppose. However…

Contrast this with functions like strpos($haystack, $needle[, $offset]) and strstr($haystack, $needle). Which is it? ($needle, $haystack) or ($haystack, $needle)?
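Side by side, the flip-flop looks something like this. The $fruits array and the strings are just sample data of my own.

```php
<?php
$fruits = array('apple', 'banana');

// Array searches: needle first, haystack second.
var_dump(in_array('banana', $fruits));     // bool(true)
var_dump(array_search('banana', $fruits)); // int(1)
var_dump(array_key_exists(1, $fruits));    // bool(true)

// String searches: haystack first, needle second.
var_dump(strpos('banana', 'nan')); // int(2)
var_dump(strstr('banana', 'nan')); // string(4) "nana"
```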

And let’s not even start with the naming of PHP functions. There is no standard there that I’m aware of other than whatever_the_devs_feel_like().

Then there were aliases…

Oh, and not to be forgotten are the dozens of aliases that do the same thing as their sibling functions. But, more importantly, functions that you mistakenly think work in a very simple manner, like split(), don’t, because they rely on the regex engine. Not only that, but split() uses the horrifically slow POSIX regular expression functions; I probably wouldn’t have such heartburn over it if this were instead an alias to preg_split(). At least the Perl-compatible regex library is pretty fast.

Fortunately, more sensible languages don’t do anything overly fancy with split. There’s truth in advertising there!

N.B.: I’m aware that PHP has its roots in Perl and this is probably the reason why split() implements itself as a regex function in the first place. However, Perl is extremely good at what it does, and I think it’s a bit disingenuous to have a function that uses the slowest of PHP’s regex libraries under the guise that it does something Perl-ish.
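To show the kind of surprise a regex-based split can spring, here’s a sketch using explode() and preg_split(). I’ve avoided split() itself because, to the best of my knowledge, it was deprecated in PHP 5.3 and removed in PHP 7, so it won’t run on a current install; the unescaped-dot trap is the same idea, though.

```php
<?php
$input = 'a.b.c';

$literal   = explode('.', $input);       // plain string delimiter
$escaped   = preg_split('/\./', $input); // regex with the dot escaped
$unescaped = preg_split('/./', $input);  // bare dot matches ANY character

print_r($literal);            // a, b, c
print_r($escaped);            // a, b, c
var_dump(count($unescaped));  // int(6): every character was a delimiter
var_dump(implode('', $unescaped) === ''); // bool(true): nothing but empty strings
```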

What about the built-ins?

This is a silly gripe but an important one (didn’t I say that earlier?). In fact, you may not even be aware of this if you’re not a PHP coder, and even then only if you’ve bothered to dig around the manual a tad. Let’s take the differences between echo() (or just echo) and print() (or plainly print). Neither is a real function. They’re both language constructs, so they can be used with or without parentheses.

First, and I know this is probably a hugely pedantic complaint, but let’s look at the documentation for print:

Description

int print ( string $arg )

Outputs arg.

print() is not actually a real function so you are not required to use parentheses with its argument list.

If it’s not a function, why does it bother to return anything? Yeah, you could argue that something along the lines of $output = $a || $b “returns” something, but while we’re discussing inaccuracies, I’d like to point out that this little code segment doesn’t return anything as much as it evaluates to something. print and echo both do something but, like a function, print has the option to return a value. The ridiculous bit is that print always returns 1–according to the manual. What’s the point of returning a value if the value is meaningless? Maybe it’s a legacy thing: did print at one point indicate whether it was successful or not at printing?

This rant in particular is kinda pointless, isn’t it? Well, maybe. There’s a much more detailed discussion on the differences between echo and print. While minor, some of these differences are important and may bite the new programmer with unexpected behavior.
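For the curious, here’s a quick sketch of the two behaving differently. The $verbose flag is just a made-up variable for the demonstration.

```php
<?php
// print is an expression, so it can appear where a value is expected.
$result = print "hello\n";
var_dump($result); // int(1)

// That also lets it sit on the right-hand side of a logical operator.
$verbose = true;
$verbose and print "verbose mode\n";

// echo is a statement: it yields no value, but it accepts several arguments.
echo 'a', 'b', "\n";
```

Trying to write $x = echo 'hi'; instead is a parse error, which is about the only practical consequence of the distinction.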

There’s always a silver lining

As much as I gripe about the eccentricities of PHP, I should add that it does provide some useful features and advantages. Infrequently used functions like money_format() could be useful for further internationalization of a web application; though, this particular feature is of limited use when contrasted with the likes of currency formatting in .NET and Python’s locale module, both of which automate the process. As of PHP 5.3, it appears that the NumberFormatter class may provide a service similar to .NET’s formatter. Better late than never.
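A minimal sketch of NumberFormatter in action follows. It assumes the intl extension is loaded, and since the exact output depends on the ICU data that ships with it, I haven’t listed expected results; the amount and the locale-to-currency pairs are arbitrary picks of mine.

```php
<?php
// Assumption: the intl extension (PHP 5.3+) is loaded. Exact output depends
// on the bundled ICU data, so no expected output is shown here.
$amount = 1234.56;

$samples = array('en_US' => 'USD', 'de_DE' => 'EUR', 'ja_JP' => 'JPY');

foreach ($samples as $locale => $currency) {
    $formatter = new NumberFormatter($locale, NumberFormatter::CURRENCY);
    echo $locale, ': ', $formatter->formatCurrency($amount, $currency), "\n";
}
```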

PHP also has a massive library of built-in functions. Functions exist for nearly everything if you have the supporting modules built and building them isn’t very difficult. Generally, there isn’t much of a need to “roll your own” functionality and those few edge cases that require it can probably find a solution on PECL or in PEAR. This benefit isn’t free, and it’s accompanied by the drawback that most of PHP’s functionality is implemented via extensions written in C.

Another advantage to PHP is that it is wildly popular. It has supplanted many other languages to rise as the king of web applications. Though, as of 2009, it seems that PHP is losing ground mostly to Ruby, some Python, and maybe Java. (I don’t really subscribe to the notion that .NET is gaining significant ground because of certain questionable practices.)

PHP’s popularity does do a pretty decent community service for us: For one, it serves as a great benchmark for testing web hosting providers. Chances are, if a particular provider doesn’t offer a PHP package, you want to run away from them very quickly in the opposite direction. Also, it might help to scream loudly. Anyone worth their salt offers PHP, at least for now.

There are also hundreds upon hundreds of useful applications written in PHP. In fact, you’re reading this post via one. PHP has also spawned dozens of commercial source-provided applications like vBulletin and a few additional market sectors in its own right. Lots of us IT types make a living off of PHP, so I think that’s justification enough to praise the language for what it does.

I’d also like to argue against those who suggest PHP’s “lowering the bar of entry” into web application development is a bad thing. How? So there are thousands of novice developers who don’t know the first thing about security. Big deal. Everyone has to start somewhere and they may as well do something useful while they’re learning. I suppose they could start with C or C++, but it’s my understanding that these languages involve lots of rope and something to do with ships. Besides, telling the novice to start with a language that isn’t immediately useful to them is sort of like teaching someone to ride a bike when the nearest town is 50+ miles away. Sure, it’s handy, but it’s not immediately useful.

We’ve all written awful code in our lives. I know I have. Heck, whenever I have to write something really quickly, sometimes the quality suffers. It’s one of those things that you know you could do better, but if you’re writing a tool for a very specific purpose that you’re only going to write once, there’s not much point developing it so that you can add plugins, features, extend it, and so forth at a future date. Though, if the need to use the tool arises again, it might be a good idea to sincerely consider such options. Likewise, there’s a lot of PHP code in the world that is designed to serve one purpose. It may not be especially pretty, but it does the job well, and I think that sometimes that’s all that matters.

I hope you’ve enjoyed reading this rather lengthy rant on the world of PHP as much as I’ve enjoyed writing it! Stay tuned for future installments. I haven’t any idea what my next target will be, but rest assured I’ll think of something.

Be safe. Code for fun.

Post comments and corrections below.
