The Blessing and Curse of Standards

Pages: 1 2

Some months back, I was playing around with software I had written to automatically decompress World of Warcraft addons, record their contents, and place them in a single zipped archive for download via our guild site. I ran into some peculiar issues with a handful of WoWAce-based addons that I eventually traced to the developer’s (incorrect, I might add) use of Subversion externals. Oddly, the archive would behave correctly under Windows with both the built-in Windows zip utility and 7-Zip. That was when I started thinking about the blessing and curse of standards.

Obviously, standards are a necessity in this network-attached world. Without them, we would be unable to communicate as effectively as we do; the Internet certainly wouldn’t exist as it does, and data interchange between organizations would be nearly impossible. Standards are important for many reasons I won’t go into in this short blurb, but I think you can deduce them on your own even if you’re not familiar with their application to Internet technologies: everything from e-mail (SMTP) and the web (HTTP) to the underlying communications layers (TCP, UDP, and so forth). Let’s not forget the countless ad hoc and unofficial “standards” that have surfaced over the years, which may not have been fully documented and were certainly never guided by a proper consortium.

Unfortunately, there are times when the de facto implementation of a standard varies ever so slightly from the printed or official standard. ZIP files are a good example of this deviation between implementation and standard, and they have been the source of minor issues like the one I experienced. For some immediate background on this, Wikipedia has a really good write-up about the ZIP standard and its various implementations.

PKWare’s official documentation on the ZIP format indicates that directory paths within an archive MUST be denoted by a forward slash (/). However, very few implementations enforce this, and many, including PKWare’s PKZip and 7-Zip, have absolutely no difficulty opening archives that (incorrectly) use the backslash (\) as their path separator. Although it isn’t discussed, to my knowledge, in PKWare’s documentation on the ZIP standard, most applications that handle .zip archives silently do the Right Thing and transform all backslashes into forward slashes before processing the paths.
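
To check whether a given archive follows the forward-slash rule, it is enough to look at the stored entry names. The following is a minimal sketch in Python using the standard zipfile module; the helper name entries_with_backslashes is hypothetical and exists only for illustration. One caveat: on Windows, Python's zipfile converts backslashes in entry names to forward slashes on its own, so the check is only meaningful on a platform whose native separator is already the forward slash.

```python
import zipfile


def entries_with_backslashes(archive_path):
    """Return the stored entry names that contain a backslash.

    Hypothetical helper. On Windows, zipfile has already rewritten
    backslashes to "/" by the time the names are visible here, so the
    result there is always an empty list.
    """
    with zipfile.ZipFile(archive_path) as zf:
        return [name for name in zf.namelist() if "\\" in name]
```

An empty list means every entry name in the archive uses forward slashes only.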

Ordinarily, you wouldn’t think that getting a “\” mixed with a “/” would be such a bad thing; after all, no sensible file system would ever possibly support such characters in a file name. And certainly no one would ever do anything that might generate a ZIP file that deviates from the PKWare standard. Of course not!

The problem is they do. In fact, they have.

When it was possible to obtain WoWAce addons directly, there were at least a half-dozen archives I encountered which contained backslashes contrary to the design and intent of the standard. What’s more disturbing is that the ZIP utility they were using didn’t bother to transform backslashes into forward slashes.

Now enter error propagation. In error propagation, the general idea is that errors encountered early on in a system are not correctly caught and handled at their source. These errors eventually build until they are themselves encountered by another subsystem that is fault intolerant. Let’s examine the following situation:

  • A particular addon references other addons via Subversion externals. The path used to reference these externals uses a backslash to denote separators.*
  • When the addon is exported via an external ZIP utility, the path is injected into an archive without translating incorrect slash usage.**
  • The addon is downloaded manually or automatically on a non-Windows platform.
  • The utility unzip(1L) under most Linux distributions (and possibly *BSD) reads the path information as is and refuses to translate the invalid separators (\) to valid separators (/).***
  • The file is decompressed to an NTFS partition using ntfs-3g; the ntfs-3g FUSE driver does not trigger an error preventing the write. Instead, file names are written as their full path. The directory path Addons/addonname/somefile.lua suddenly becomes written as the file Addons\addonname\somefile.lua with those backslashes as part of the file name.****
  • WoW under Wine loads correctly but refuses to load the decompressed addon. After all, if there is no path structure supporting the addon and each *.lua file is instead named after the relative path it was intended to be decompressed into, the file technically doesn’t exist where WoW thinks it should.
  • Booting under Windows reveals something more sinister at play: the files containing backslashes in their names are displayed correctly under Windows Explorer but they cannot be deleted. The only way to remove files created in this manner is to reboot into *nix, remount the NTFS partition, and delete the offending file from there.

(*, **, ***, and **** denote the levels at which the error was propagated and passed on to the next stage where the mistake could have been caught.)
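
The heart of the chain above is that the same string means different things depending on which path rules are applied to it. Here is a small sketch using Python's pathlib, with the example path from the list; it only illustrates the two interpretations and is not taken from any of the tools involved.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The entry name as it was stored in the offending archives.
entry = "Addons\\addonname\\somefile.lua"

# Under POSIX rules a backslash is an ordinary character, so the whole
# string is a single file name with no directories at all.
posix_parts = PurePosixPath(entry).parts

# Under Windows rules a backslash is a separator, so the same string is
# two directories and a file.
windows_parts = PureWindowsPath(entry).parts

print(len(posix_parts), posix_parts)
print(len(windows_parts), windows_parts)
```

One component under POSIX rules and three under Windows rules: that mismatch is exactly what turns a directory tree into a flat pile of oddly named files.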

Ultimately, there is a question regarding who is to blame: Should Subversion be blamed for supporting backslashes in the externals reference? Maybe, maybe not. The Subversion guide makes clear that the code is intended to run on both Windows and *nix platforms alike; thus, both separators are supported. If Subversion allowed only one separator, this circumstance would have likely never surfaced. But is it really the fault of SVN? I don’t think so.

The ZIP utility WoWAce was using was certainly violating standards by compressing files into an archive using invalid separators. Obviously, this is another violation–but should the WoWAce developers (and their zip utility) be blamed? Once again, the issue isn’t terribly clear. Every other decompression utility that supports ZIPs except the one shipping from Info-Zip handles path references containing backslashes correctly. Thus, when we get to the third stage (***) in our example, the offending utility happily creates files containing backslashes (or attempts to).

(It is worth noting that extracting such archives onto any file system other than one mounted with ntfs-3g will correctly refuse to create files containing a backslash.)

ntfs-3g shouldn’t allow for the creation of files containing backslashes, of course, but if it refused as ext3, FFS, and others do–correctly, I might add–then the archive would never be extracted! In this circumstance, we’d have to appeal to the developers to recreate their archive using the correct path separators. But what if they refused (and at least one did, claiming that it was a non-issue)? You’re stuck, or you can write your own utility to handle unzipping invalid archives. I did the latter. You can download the script from the second page, too.
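
For illustration only (this is not the script offered on the second page), a tolerant extractor might look roughly like the Python sketch below. It assumes that a backslash in an entry name was always meant as a separator, and it refuses entries containing ".." so that a hostile archive cannot write outside the target directory. The names normalized_parts and extract_tolerant are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def normalized_parts(raw_name):
    """Split an entry name into components, treating "\\" like "/"."""
    name = raw_name.replace("\\", "/")
    # Dropping empty and "." components also discards any leading slash.
    parts = [p for p in name.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"refusing entry that escapes the target: {raw_name!r}")
    return parts


def extract_tolerant(archive_path, dest_dir):
    """Extract every entry, rebuilding directories from either separator."""
    dest = Path(dest_dir)
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            parts = normalized_parts(info.filename)
            if not parts:
                continue
            target = dest.joinpath(*parts)
            # A trailing separator of either kind marks a directory entry.
            if info.filename.endswith(("/", "\\")):
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

Calling extract_tolerant("addon.zip", "Addons") on one of the offending archives would then produce real subdirectories rather than files with backslashes in their names; both arguments here are placeholder paths.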

But again, why is this even a problem? Clearly, the error could have been stopped at any point during its upward cascade and it wasn’t. Yes, there are many links in the chain which were each at fault to one degree or another, but I think this exercise illustrates an incredibly important point:

Standards can suck.

“What was that?” you might ask.

Here, I’ll say it again:

Standards can suck.

The problem here is that following the letter of the law–as it were–in the form of standards creates a clear violation of Postel’s Law, also known as the Robustness Principle. Postel’s Law is best summarized as: “Be liberal (or relaxed) in what you receive, be conservative (or strict) in what you send.”

To put it another way, when generating data, be absolutely certain that the data you send is correct. When receiving data, be very generous and accept even loose interpretations of the standard to ensure interoperability. In our example, both the compression and decompression utilities are at fault: the former was not strict in adhering to the standard, and the latter was not liberal in what data it accepted.
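
The “conservative in what you send” half is the easier one to get right. As a rough sketch, assuming an archive is built from a directory tree in Python, the entry names can be derived with pathlib's as_posix() so that they use forward slashes regardless of the platform doing the packing. The function name archive_tree is hypothetical, and the archive is assumed to be written somewhere outside the tree being packed.

```python
import zipfile
from pathlib import Path


def archive_tree(root_dir, archive_path):
    """Zip every file under root_dir using forward-slash entry names."""
    root = Path(root_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w") as zf:
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            # as_posix() yields "/" separators on every platform.
            arcname = path.relative_to(root).as_posix()
            # Be strict about what is sent: a backslash that survives
            # this far is part of a file name, so refuse to store it.
            if "\\" in arcname:
                raise ValueError(f"backslash in entry name: {arcname!r}")
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

Raising an error instead of silently rewriting the name is a deliberate choice in this sketch: the sender is the one place where the mistake can be fixed at its source.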

This is where standards break down. If Info-Zip did what 80% of the other ZIP-capable archive software on the market does without modification, this would be a non-issue. Strict conformity with a standard often breaks interoperability when the standard is poorly written or incomplete, or when common implementations (i.e. what everyone else is doing) do things differently; in those cases, refusing to meet expectations can actually create problems.

I don’t mean to imply that standards are bad. Quite the contrary! I think it’s important to follow standards strictly wherever possible and appropriate. However, it’s also important to deviate from the standard where necessary to ensure the users’ expectations aren’t impaired and interoperability with other software doesn’t break. This doesn’t mean that popular is right; instead, it means that doing the Right Thing often means that you have to be wrong. For that matter, standards aren’t always right, and sometimes standards are broken. Never set aside your own intuition and common sense just because someone says that’s what you should do. There are many well-written and well thought out standards. There are also many poor standards, loose standards, and standards designed after the fact that haven’t a snowball’s chance in Hell of ever being implemented as written.

If you’re interested in observing just how heated a debate can get over something as simple as standards, take a look at this article on ext4 data loss over at Slashdot and read the comments. It’s amazing–to say the least–how some people can be so vehement about what the standard says, or what they think it says, that they throw all common sense out the window. Standards are written by people; until people are infallible, standards won’t be perfect either.

I’ll be writing another piece this weekend on developer arrogance as it ties very closely into the standards topic. It does illustrate an interesting concept, so I’ll leave you with some thoughts: Are developers always right? It’s their software, after all, but if their ideas break expected functionality, who should be blamed? The developers for making a poor decision or the users for wanting (or expecting) too much?
