Software - Benjamin Shelton's Musings

“I hate systemd” and other Ill-conceived Diatribes

Saturday October 24th, 2020

It’s a popular statement in a world where many distributions have standardized on systemd. “I hate systemd” comes the quip–a statement designed to evoke emotion rather than contemplation. Just a mere mention of Lennart Poettering provokes near-Pavlovian salivation in some persons afflicted by such a malady, and they often haven’t the foggiest notion why, beyond “it’s different.”

This post isn’t intended to be a deliberate defense of systemd, although it can certainly be construed as such. There are valid reasons to eschew its use, just as there are equally valid reasons to praise it for its novelty–and its belligerence to upend established convention surrounding sysvinit. What I hope to accomplish herein is to reduce the emotionally-charged nature of the anti-systemd response and convince proponents that opposition to such views isn’t antithetical to strongly held convictions of traditionalism in the context of UNIX design and philosophy. Whether they come away from this in continued opposition to systemd (or not) is largely uninteresting to me; I’d much rather someone walk away with a better understanding of their own views (and opinions) than be convinced otherwise.

It has been my experience that systemd opponents typically fall into three camps: The first, people who have a limited understanding of systemd but have read volumes of opinion pieces explaining how bad it is, accepting opinions as truth; the second, people who feel it violates traditional mores of the UNIX world; and the third (and smallest) group who disagree with its adoption strictly on technical grounds. The last group is unlikely to glean anything useful from this post as their opinions are likely founded on reason and experience (though I may disagree). The first two find themselves in opposition mostly by force of ignorance. Fortunately, ignorance is curable (albeit powerful), and worst case, we can lift those readers into the objectivity of the third category: Informed but still in opposition.

Readers who are mostly indifferent to systemd because they’re uninterested in learning it or are satisfied with the sysvinit (or sysvinit-alike, e.g. OpenRC) that was installed with their distribution are not considered the target audience. Though they may collect useful arguments, either for or against, I don’t expect them to find much else of interest. They may opt to skip this post entirely.

systemd Violates UNIX Principles

The concept that systemd grossly violates UNIX principles is an argument that usually establishes a few key points (erroneously): 1) systemd is monolithic; 2) systemd violates the principle of “everything should be text;” and 3) systemd is unnecessarily complex. On occasion, this may coincide with a fourth point claiming that systemd is an unnecessary replacement for some subsystems or that it forcibly requires the use of its own internal replacements (such as for syslog or DHCP).

Of these, the third and fourth (surrogate) arguments bear the most weight. As such, I will address them last.

Is systemd monolithic? Yes and no. Yes, the PID 1 replacement that fundamentally exists as “systemd” in its binary form comprises a lot of moving parts, but it’s helpful to understand that systemd as a service runner exposes a significant number of kernel internals: cgroups, capabilities(7), namespaces, read-only file system views, complex service dependency management, and more. systemd is complex but it’s not necessarily monolithic.

Indeed, browsing through the binaries in a typical systemd installation will expose a wide assortment of services that each perform one specific task. systemd-networkd, for instance, can manage network interfaces. Its companion, systemd-resolved, handles resolver configuration via resolv.conf (and honestly does a much better job of it than dhcpcd hooks or resolvconf).

Does systemd violate the principle that everything should be text? Not really. Whenever this gripe surfaces, it’s usually framed in the context of the systemd journal, which does store its output in a binary format. The journal can also be configured to forward its data to syslog, but I don’t think this argument matters. journalctl can transparently read the binary form just fine, thank you very much, and it offers filtering options that are arguably more powerful than your typical less/grep inspection can muster. In fact, to head off the argument that it doesn’t use “standard tooling,” I might argue that syslog doesn’t either–you have to use other user space tools to open and search through the logs; tools that have become a de facto standard through longevity. Nay, the difference exists mostly in the reality that systemd-journald’s output can’t be read by a tool the system administrator might author independently. Leastwise, not without some work.
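Part of that "some work" is already done for you: `journalctl -o json` prints one JSON object per journal entry, one per line, which any home-grown script can consume. A minimal sketch, assuming that output has been captured as lines of text; the function name `filter_entries` is hypothetical, while `_SYSTEMD_UNIT`, `PRIORITY` and `MESSAGE` are standard journal fields.

```python
import json
from typing import Iterable, Iterator


def filter_entries(lines: Iterable[str], unit: str, max_priority: int) -> Iterator[str]:
    """Yield MESSAGE fields for one unit at or above a given severity.

    Expects the line-delimited JSON produced by `journalctl -o json`.
    Syslog priorities run from 0 (emerg) to 7 (debug), so a lower
    number means more severe.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("_SYSTEMD_UNIT") != unit:
            continue
        # PRIORITY is serialized as a string, e.g. "3"; assume "6" (info) if absent
        priority = int(entry.get("PRIORITY", "6"))
        message = entry.get("MESSAGE")
        # Non-UTF-8 fields may be emitted as arrays of bytes; skip those here
        if priority <= max_priority and isinstance(message, str):
            yield message
```

For the common case, `journalctl -u sshd.service -p warning` already performs the same filtering without any script; the JSON route matters when you want your own tooling.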

There is a strength in what systemd does that isn’t easily replicated via syslog. As an example, it’s possible to configure remote hosts to pack up their binary logs and ship them to another location for centralized logging. Yes, you can do this with one or more syslog implementations, but it’s not easy. Compare this with systemd-journal-remote(8), systemd-journal-gatewayd(8) and journal-remote.conf(5), and you’ll learn it’s a simplified process that does little more than upload binary blobs to a remote HTTP API. Bonus: Because it’s a binary format log, you can selectively extract entries from a remote journald instance. Yes, really.

Aside: I recognize some astute readers will find it cleverly ironic that I’d reference three separate manual pages in the context of the claim that remote logging is easier in systemd, whereas their favorite solution requires reading one or two (plus ancillary material, web searches, and more). The facts aren’t quite so malleable: In systemd it’s a matter of enabling the correct services and applying the appropriate configuration changes. There isn’t much need to do anything else.

Returning to our original train of thought: The third dispute, “systemd is too complex,” is a matter of whether all this complexity is useful. Most arguments against systemd’s complexity tend to focus on everything it can do rather than what it does, and I think this disconnect stems from a misunderstanding of what systemd actually is. In particular, it’s apparent that the complaint isn’t really about the complexity of the entire corpus of what systemd can do. It’s a complaint following from the belief that it does all of this by default. This isn’t true; many of the systemd services (systemd-networkd, systemd-resolved, and the more recently notorious systemd-homed) are entirely opt-in. They aren’t necessary and they aren’t typically activated by default (though some distributions make that choice).

I would argue that this complaint ought to focus instead on systemd’s reliance on dbus for its internal messaging apparatus. dbus itself is fairly complex (it’s a message bus…), but it also allows systemd to do a lot of its work under the hood by using an existing message passing system rather than reinventing its own (a surprise to some!). Perhaps it could be argued that repurposing a desktop bus was something of an ambitious choice on Poettering’s part, but again, it’s not an illustration of a violation of UNIX principles. If anything, repurposing existing tools should be praised as an example of avoiding systemic problems common among Not-Invented-Here adherents!

At this point, I would presume this section has implicitly answered, by proxy, the question of whether systemd’s replacement of conventional tools is strictly necessary or desirable. If not, then I would posit that more competition is good. As an example, systemd-networkd is far easier to configure and start than dhcpcd or dhclient. systemd-networkd has supported DUIDs out of the box for quite some time, and if you examine the contents of /run/systemd/netif/leases/*, you can copy the DUID between installations to retain static IPv4 assignments leased via DHCP. I’ve done this. (Yes, you have to chop the identifier up a bit, but that’s beyond the scope of this essay.)

systemd Integrates Itself too Deeply

systemd is an init (PID 1) replacement process. Deep integration is its job.

Okay, I get it: systemd as a whole is replacing “too many” established packages. As an example, it contains replacements for one or more of the following: dhcpcd/dhclient, resolv.conf manipulation tools, ntpd, timezone management (more on this in a minute), syslog (we’ve touched on this), hostname management, cron, and probably a dozen other things that I haven’t thought about while writing this post.

I would argue this isn’t strictly a bad thing. Is competition against current DHCP clients particularly egregious? I’d think not–you can still use them if you like. systemd-resolved takes much of the guesswork out of configuring a DHCP client and its helpers to properly update resolv.conf. NTP clients are in the same ballpark–it’s entirely opt-in. syslogging, well, we’ve touched on that. And so on goes the list.

Of course, cron replacement seems to be a particularly touchy point. I don’t really know why, because many of the DIY distributions like Arch and Gentoo don’t actually ship cron by default. You have to install one yourself. Then you have to configure it. Although I’ve never made much use of systemd timers, I will say that shipping software targeting systemd means that if I also ship a systemd timer, I no longer have to worry about whether someone has a cron correctly set up and configured on their system at all. This means that if they deploy a bare container image (say LXD) containing a recent Debian or Ubuntu image, they can install my software and expect it to perform periodic tasks as required without any further intervention. systemd timers also do a bit more than cron. Suggested reading: systemd.timer(5).
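What "shipping a systemd timer" means in practice is a pair of small units: a oneshot service that does the work and a timer that activates it. A minimal sketch that renders both, assuming a hypothetical job name and command path; `Type=oneshot`, `OnCalendar=`, `Persistent=` and `WantedBy=timers.target` are standard settings documented in systemd.service(5), systemd.timer(5) and systemd.unit(5).

```python
def timer_units(name: str, command: str, on_calendar: str) -> dict[str, str]:
    """Return {filename: contents} for a oneshot service and the timer driving it."""
    service = (
        "[Unit]\n"
        f"Description={name} periodic job\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart={command}\n"
    )
    timer = (
        "[Unit]\n"
        f"Description=Run {name} on a schedule\n"
        "\n"
        "[Timer]\n"
        f"OnCalendar={on_calendar}\n"
        # Persistent=true runs a missed activation at next boot if the
        # machine was off when the timer should have fired
        "Persistent=true\n"
        "\n"
        "[Install]\n"
        "WantedBy=timers.target\n"
    )
    return {f"{name}.service": service, f"{name}.timer": timer}


if __name__ == "__main__":
    # Hypothetical job name and path, for illustration only
    for filename, text in timer_units("myapp-cleanup", "/usr/local/bin/myapp-cleanup", "daily").items():
        print(f"# {filename}\n{text}")
```

Placed under /etc/systemd/system/, the pair would be activated with `systemctl enable --now myapp-cleanup.timer`, and `systemctl list-timers` shows when it fires next.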

And, well, let’s be honest. The crontab syntax is just a little eccentric. Sure, it has its charm, and it’s very compact and easy to read (ahem–once you learn it), but are we defending cron on merit or by way of inertia? Contemplate that question for a while before you continue.
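To make the "eccentric" part concrete: each numeric crontab field accepts `*`, ranges (`1-5`), lists (`1,15`) and steps (`*/15`, `1-10/3`), and those forms combine. A minimal sketch that expands one field into the values it matches, assuming only those four forms; month and weekday names and the `@daily`-style shortcuts some crons accept are deliberately not handled, and the function name is hypothetical.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one numeric crontab field into the sorted values it matches."""
    values: set[int] = set()
    for part in field.split(","):
        # An optional "/step" suffix applies to the range before it
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # Keep it strict: a step needs a range or "*" in this sketch
            if step_text:
                raise ValueError(f"step needs a range or '*': {part!r}")
            start = end = int(base)
        if not (low <= start <= end <= high):
            raise ValueError(f"out of range: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For example, `expand_field("*/15", 0, 59)` gives the minutes 0, 15, 30 and 45. Compact, yes, but you do have to learn it.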

So: Timezones. What exactly does systemd need to manipulate the system timezone for? Well, the short answer is that it doesn’t. The long answer is that the traditional way to configure a timezone for the system was to either allow the installer to do it or to manually place a symlink at the location /etc/localtime pointing to the appropriate zoneinfo file in /usr/share/zoneinfo. systemd itself doesn’t actually manipulate this file, but it does include a tool (timedatectl) that does this for you. Admittedly, this tool does many other things, but it also configures the local timezone for you. Is it worth replacing manual invocation of ln(1)? Probably not, but it’s not exactly doing anything new or particularly troublesome either.
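The manual method really is just a symlink. A minimal sketch, assuming the usual zoneinfo tree under /usr/share/zoneinfo; the `root` parameter and both function names are hypothetical additions so the code can be pointed at a scratch directory rather than the live /etc. On a systemd machine, `timedatectl set-timezone` covers the same ground.

```python
import os
from pathlib import Path


def set_localtime(zone: str, root: Path = Path("/")) -> Path:
    """Point <root>/etc/localtime at the zoneinfo file for `zone`."""
    target = root / "usr/share/zoneinfo" / zone
    if not target.is_file():
        raise FileNotFoundError(target)
    link = root / "etc/localtime"
    link.parent.mkdir(parents=True, exist_ok=True)
    # Create the new link under a temporary name, then rename it over the
    # old one so /etc/localtime is never briefly missing
    tmp = link.with_name("localtime.tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(target)
    os.replace(tmp, link)
    return link


def current_zone(root: Path = Path("/")) -> str:
    """Recover the zone name from where /etc/localtime points."""
    link = root / "etc/localtime"
    zoneinfo = root / "usr/share/zoneinfo"
    raw = Path(os.readlink(link))
    # Many systems use a relative target such as ../usr/share/zoneinfo/...
    if not raw.is_absolute():
        raw = link.parent / raw
    return str(Path(os.path.normpath(raw)).relative_to(zoneinfo))
```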

systemd is Pushing Toward a Monoculture

I won’t deny this is a real risk. As of this writing, 8 of the top 10 Linux distributions according to DistroWatch.com use systemd as their sysvinit replacement.

(Ignoring FreeBSD, because I know someone will say “but there’s 11 on that list!”–even though it’s not Linux.)

This is both good and bad. I’ll start with the bad.

Any time there’s a software monoculture, particularly one that’s controlled by a comparatively small number of people, there is a real danger of locking ourselves into a specific mindset. The irony is not lost on me that this is the reason for one of the chief complaints against systemd: Traditionalists who cling to their archaic script-based sysvinit lock themselves into the mindset that init should only be done with scripts. systemd, by virtue of its market penetration, presents the risk that we may eventually conclude systemd is the only way to do things. This is bad, but the fallout from this is comparatively minor versus, say, macOS, which has no other system besides launchd (from which systemd drew substantial inspiration).

For one, I would imagine that there will always be distributions using alternatives to systemd, even if only as a token gesture to appease the naysayers. Gentoo, while it supports systemd as an alternative init system, uses OpenRC by default. Slackware has options to use a traditional sysvinit or OpenRC. Devuan formed as a consequence of Debian switching to systemd (although I believe users can choose among other inits). Void Linux, in what is probably one of the most novel approaches of a recent distribution, thumbs its nose at convention and elects to use runit instead. While it’s true that most distributions–arguably consuming the plurality of users–have standardized on systemd, I don’t believe there’s any real risk that systemd will become a true monoculture.

As a developer, I think systemd is a good thing for a couple of reasons. First, I can distribute unit files with my software that will initialize it exactly as I intended, and it’ll work across any system running systemd. Provided there’s a fairly recent kernel in place, I can have access to each of the features (see above) that can be used to harden running processes. I don’t even need to write distribution-specific initscripts to ensure the process(es) start up as I’d expect. Maintainers can be happier too, because they then only need to copy out whatever unit files shipped with the upstream release, patch them as they see fit (if necessary–usually not), and go about their business. Second, if I target a system with systemd, I don’t have to worry about specialized packages like supervisord. Instead, I can be reasonably assured that the process supervisor for the entire system will do exactly what I want. Process fails? It’ll restart. Complex dependency chain when distributing microservices? No big deal; systemd will manage that for me.
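At its simplest, "managing the dependency chain" means ordering: if `api.service` declares `After=db.service`, the database starts first, and units with no ordering between them can start in parallel. A toy model using Python's standard `graphlib`, assuming hypothetical unit names; systemd's real transaction engine also handles requirement dependencies (`Requires=`, `Wants=`), conflicts and failure propagation, none of which is modelled here.

```python
from graphlib import TopologicalSorter


def start_order(after: dict[str, set[str]]) -> list[str]:
    """Return one start order honouring After=-style edges.

    `after` maps each unit to the units it must start after. An ordering
    cycle has no valid order and raises graphlib.CycleError.
    """
    return list(TopologicalSorter(after).static_order())


def start_batches(after: dict[str, set[str]]) -> list[list[str]]:
    """Group units into waves that could start in parallel."""
    sorter = TopologicalSorter(after)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything whose predecessors have all "started" is ready now
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

With `api.service` ordered after `db.service` and `cache.service`, and `web.service` after `api.service`, `start_batches` returns `[['cache.service', 'db.service'], ['api.service'], ['web.service']]`.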

The best part? All of that comes for free.

Does this mean I don’t think we can do better? Of course not. I’ve heard it stated before that, paraphrasing, “systemd is a disaster, but whatever comes after systemd will be better than what we have now.”

There may be some truth to such a statement, but I don’t think the statement itself is necessarily a reflective (or is that reflexive?) truth either. systemd can be improved upon, sure, but it exposes a large feature set that once required special tooling (C wrappers, anyone?). More importantly, systemd can be used now. Not next year. Not in 5 years. Not in a decade. Now.

I think the push-back against systemd is potentially dangerous, because it risks frightening off people who might come into the field with new ideas. Seeing how Poettering has been attacked personally (he’s even received death threats–yes, really), they may decide it’s not worth the trouble. That would be criminal.

If any time a new idea surfaced, we immediately bowed to pressure and confessed the naysayers were right before so much as beginning the journey, we’d still be a tribalistic people wandering the plains.

systemd was Written by Lennart Poettering–Just like PulseAudio

You got me. I have nothing else to say.

No, really. I’ll be honest–I’ve seen this argument before, and I’ve seen it presented as a straight-faced counter to explain why systemd is so awful. This argument is anything but objective and seeks to paint systemd based on the personality (or personalities) behind the project and on his previous work. I’m not sure this is a particularly good argument, because PulseAudio does actually resolve some long-standing issues with Linux audio.

If you don’t know why, then it’s plausible you’ve never dug too deeply into PulseAudio before. But, to humor you: If I have multiple audio cards outputting to multiple devices, I can easily switch between them (think speakers and headsets) by changing the sink output from the mixer. That’s something you can’t easily do from Windows without opening the sound options and fiddling around with default output devices. In Pulse, it’s literally three clicks: Open the mixer, right-click the application, and select the output device.
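The same switch is scriptable from the command line with `pactl`, which ships with PulseAudio. A minimal sketch that only builds the argument lists; the sink-input index and sink name are hypothetical placeholders you would look up with `pactl list short sink-inputs` and `pactl list short sinks`.

```python
import subprocess


def move_stream_cmd(sink_input: int, sink: str) -> list[str]:
    """Command that moves one playing stream (a sink input) to another sink."""
    return ["pactl", "move-sink-input", str(sink_input), sink]


def default_sink_cmd(sink: str) -> list[str]:
    """Command that makes `sink` the default for newly started streams."""
    return ["pactl", "set-default-sink", sink]


def run(cmd: list[str]) -> None:
    # check=True raises CalledProcessError if pactl reports a failure
    subprocess.run(cmd, check=True)
```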

Let’s be honest: “I hate PulseAudio” is absolutely not a valid argument against systemd. It’s intellectually lazy. It’s an ad hominem.

Conclusion

systemd’s criticisms are certainly not without their merits, and I think it’s worth looking at its deficiencies in the context of what it does right as well as what it does wrong. systemd isn’t perfect–no software is–but I think there’s an argument to be made that sysvinit and sysvinit-compatible init systems are long in the tooth. It’s good to see that there are distributions exploring alternatives (again, Void Linux) and that many others have standardized on an init system that solves long-standing issues with process supervision and dependency resolution.

Once upon a time, I used to rely on supervisord to manage multiple processes and to provide some guarantees that if an application failed, it would be restarted. Before that, I relied on DJB’s daemontools (hello qmail!). Each of these solved the deficiencies that existed in traditional sysvinits–and did a darn good job of it. Having said that, I think it’s time that PID 1 finally take control over process life cycle management. Windows has had this for a long time. It’s time Linux did too.
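The core of what supervisord, daemontools and systemd's `Restart=` setting provide is a loop like the one below. A minimal sketch, assuming the supervised program runs in the foreground; the function name, restart limit and fixed delay are hypothetical simplifications (systemd, for instance, also rate-limits restarts via `StartLimitBurst=` and `StartLimitIntervalSec=`).

```python
import subprocess
import time


def supervise(argv: list[str], max_restarts: int = 5, delay: float = 1.0) -> int:
    """Run argv, restarting it whenever it exits non-zero.

    Returns the number of restarts performed. A clean exit (status 0)
    ends supervision, roughly like systemd's Restart=on-failure.
    """
    restarts = 0
    while True:
        result = subprocess.run(argv)
        if result.returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"{argv[0]} failed {restarts + 1} times; giving up")
        restarts += 1
        time.sleep(delay)
```

In a unit file, the equivalent is a single `Restart=on-failure` line, with `RestartSec=` setting the delay.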


***

My Journey with ZFS on Linux

Friday February 17th, 2017

Toward the latter half of 2016, I decided to convert my home file server’s file system to ZFS as an experiment on real hardware in my own test environment. I knew before embarking on this journey that ZFS on Linux (hereafter referenced as ZoL) was stable but not production ready. The experiment is nearing its end, and I’ll be switching to an mdadm-based array with ext4 for reasons I’ll be exploring in this post. If your storage needs require the redundancy and self-healing properties of ZFS, consider other operating systems where it’s better supported and more stable, like FreeBSD.

This post illustrates my thinking as of early 2017 and why I’m not going to continue with ZoL (for now).

Conclusions

I don’t wish to bore you with excess detail, so I figured I’d present my conclusions up front. You can then decide if the rest of this post is worth reading. Let’s begin exploring.

First, using ZoL in a home file server environment presents us with the following pros and cons:

Pros

Transparent compression can save a surprising amount of space with LZ4 (useful!)
Lower administrative overhead (a wash; not always true in practice)
Data integrity guarantee (always make backups)
Self-healing (always have backups)
It’s not Btrfs (you’ll probably use your backups)

Cons

May require more extensive deployment planning
Some applications may require dataset tweaking
Administrative overhead not always as advertised (see above), but this is mostly the fault of the current state of ZoL
Poor performance for some workloads (especially databases; versus ext4)
Lack of recovery tool support limits possible restoration options when things go wrong (backups!)

Noteworthy bullet points when using ZoL:

Stable but not production ready (an important distinction!)
Upgrades will be painful!
/boot on ZFS is possible but…
DKMS ZFS modules are a horrible idea; don’t do this–don’t ever do this
Always create bootable media with ZFS tools installed in case something goes wrong (it will–don’t forget backups!)

The Meat and Potatoes of ZoL

I would not recommend ZoL for anything outside a test environment at this time (I’ll explain in a moment). ZoL may be useful for long term storage or a backup NAS box if you’re particularly brave. However, if you plan on deploying ZFS permanently or at least deploying it in a long term installation, I’d recommend using an OS with which ZFS is tightly integrated; recommendations include FreeBSD and FreeBSD-based distributions (including FreeNAS) or OpenIndiana-based platforms. I’d also shy away from using ZFS on systems that are intended to be used as general purpose machines; in my experience, ZFS really shines in a heavily storage-centric configuration, such as NAS, where throughput isn’t as important as integrity. Outside this environment, ZFS may be a performance liability, and current benchmarks demonstrate underwhelming behavior when used as a backing store on Linux for databases. Working around this requires planning and tuning ZFS for use with RDBMSes, and ZFS may also impact write-heavy applications. Read-only loads are something of a wash–file system choice is better made with regards to workload requirements: Is raw performance or data integrity more important? Note that the latter–data integrity–can be solved via other means, depending on use case, but ZFS’s automated self-healing capabilities are hugely attractive in this arena.

As of early 2017, ZFS on Linux is still exceedingly painful to install and maintain for rolling release distributions. In particular, on Arch Linux, ZFS will cripple most upgrade paths available for the kernel, itself, and applications that require ZFS dataset tuning. It’s almost impossible to upgrade to more recent kernels if you value your sanity (and system stability) unless you’re willing to wait until the latest kernel has been tested, new ZFS PKGBUILDs are ready, and you’re open to an afternoon of potential headaches. Generally speaking, the upgrade process itself isn’t frustrating, but it should always be preceded by a fresh backup–and keep your rescue media handy!

Never use ZFS with DKMS in an effort to shortcut kernel versioning requirements, even if upstream ZoL appears to support the new version–the package you’re using may not be updated yet, and the AUR DKMS PKGBUILDs for ZFS are not as stable or well maintained as the kernel-pinned packages. With DKMS, if there’s a mismatch, even a slight one, you risk kernel panics and a long, painful recovery process. I was bitten by this early on and discovered that some kernel panics aren’t immediate; instead, they may occur after an indeterminate period of time depending on file system activity, and recompiling ZFS and SPL takes long enough that your system will likely panic before the build completes. Stick with the version-fixed builds; don’t use the DKMS modules. Of course, this introduces a new problem: Now you have to pin ZFS to a specific kernel version, leading to extra work on rolling release distributions like Arch…

It’s necessary to plan for a lot of extra work when deploying ZoL. Estimate how much time you’re likely (or willing) to spend and multiply it by two. I’m not even kidding. ZoL on distributions without completely pre-built packages (possibly others) requires all of the following: Building SPL and its utilities; building ZFS and its utilities; installing unstable versions of grub if you plan on keeping /boot on ZFS (with the implication that you’re one upgrade away from an unbootable system at all times); dataset tweaking per application; and potential bugs. Lots of them. When I first deployed ZFS on Linux, I was completely unaware of a deadlock condition which affected the arc_reclaim kernel process, and every time I’d run rsync, arc_reclaim would hang, CPU usage would spike, and it’d be necessary to manually intervene. To say nothing of the RAM usage…

ZFS performance under Linux is also relatively poor for my use case. While read/write speeds are acceptable, it’s nowhere near ext4, and it’s likely slower than ZFS on the same hardware running FreeBSD. Furthermore, the memory pressure due to the ZFS driver’s ARC failing to release RAM back to the system (it’s supposed to, but I’ve yet to see it in practice) under memory-intensive operations can cause an out-of-memory condition, swapping, and a potentially fatal invocation of the Linux OOM killer. For this reason alone, I could never recommend ZoL on a general purpose server. If your deployment is intended exclusively as a NAS and you have literally no other services besides NFS/Samba, ZFS’ memory usage would be perfectly fine, provided it’s on a system with 8+ GiB RAM (though you should have more). Then there’s the possibility of L2ARC devices, caching devices, and so forth. If you’re planning on running other services in addition to a ZFS-backed NFS server, such as GitLab or Minecraft, you’ll quickly find that striking a balance between RAM allocation for the ARC versus other applications becomes a somewhat tedious chore. In fact, I might even venture as far as to suggest that you shouldn’t consider running ZFS plus other applications on anything less than 16 GiB RAM–preferably 32 to hand off a nice big chunk to the ARC, particularly if you plan on expanding drive capacity, and you still shouldn’t run anything you don’t absolutely need (seriously: Make it a pure NAS).
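The usual lever for that balancing act is capping the ARC. ZoL reads the `zfs_arc_max` module parameter (a size in bytes), typically set with an `options zfs ...` line in a file under /etc/modprobe.d/. A minimal sketch that computes that line; the helper name and the idea of expressing the cap as a share of RAM are hypothetical conveniences, not a sizing recommendation.

```python
def arc_max_option(total_ram_gib: int, arc_share: float) -> str:
    """Return a modprobe.d line capping the ZFS ARC at a share of RAM."""
    if not 0 < arc_share < 1:
        raise ValueError("arc_share must be between 0 and 1")
    arc_bytes = int(total_ram_gib * arc_share * 1024**3)
    return f"options zfs zfs_arc_max={arc_bytes}"
```

`arc_max_option(16, 0.5)` yields an 8 GiB cap. The line would go in a file such as /etc/modprobe.d/zfs.conf, and the same value can be written to /sys/module/zfs/parameters/zfs_arc_max to change it on a running system.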

Tweaking ZFS for database loads doesn’t seem particularly troublesome–certainly not on the surface–until you encounter one or more upgrade cycles that require more than just a dump/load. If you follow the default Arch Linux upgrade process for PostgreSQL, you’ll quickly find ZFS less flattering than alternatives. Not only is it necessary to tweak the recordsize property in addition to a few other file system attributes for performance reasons (though you only do this at dataset creation), but suddenly, following the Arch upgrade guide by moving the old data directory and creating a new one is plagued with matters of shuffling around extra files, remounting previously tweaked datasets to the appropriate location, and more. In my case, I had to copy all of the original data to a new directory, wipe the old mount point for both the data directory and the write-ahead log, then create a new data directory in the old dataset mount point, copy the WAL into the WAL-specific mount point, mount it at pg_xlog, and only then could I complete the upgrade process. MySQL on ZFS is generally easier to upgrade, in my experience, but I also use it much less frequently. Be aware that MySQL still requires dataset tweaks, and the tweaks applied depend on whether you’re using primarily MyISAM or InnoDB. I’ve not experimented sufficiently to understand whether it’s possible to tweak datasets for individual storage engines.
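The creation-time tweaks can be captured in one place so they are applied identically every time. A minimal sketch that builds the `zfs create` command, assuming hypothetical pool and dataset names; `recordsize=8K` matches PostgreSQL's default 8 kB page and `16K` matches InnoDB's default 16 KB page, a common starting point that should be treated as an assumption to benchmark rather than settled tuning. MyISAM is left out on purpose.

```python
def db_dataset_cmd(dataset: str, engine: str) -> list[str]:
    """Build a `zfs create` command with a database-friendly recordsize."""
    # Align the ZFS record with the engine's default page size
    page_sizes = {"postgresql": "8K", "innodb": "16K"}
    try:
        recordsize = page_sizes[engine]
    except KeyError:
        raise ValueError(f"no recordsize preset for {engine!r}") from None
    return [
        "zfs", "create",
        "-o", f"recordsize={recordsize}",
        "-o", "compression=lz4",
        dataset,
    ]
```

Because a recordsize change only affects files written after it is set, getting it right at creation time matters, which is part of why the upgrade shuffle described above is so fiddly.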

Of course, there are a few other relatively minor niggles that depend on your specific case. For example, grub2 naturally has no understanding of ZFS (or XFS for that matter), so it’s necessary to install grub-git if your /boot partition isn’t on a separate ext3/4 partition. Under Arch, it’s also necessary to make certain your initrd is correctly configured via /etc/mkinitcpio.conf, and it’s almost always a good idea to re-run mkinitcpio after upgrading or installing the kernel just in case it didn’t pick up your ZFS modules (you may need the binaries, too). Otherwise, you’ll be resorting to your emergency boot media to fix the dilemma (you did create it, didn’t you?).
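A cheap guard before rebooting is to confirm the `zfs` hook is actually present in /etc/mkinitcpio.conf. A minimal sketch, assuming the Arch ZFS convention of placing `zfs` before `filesystems` in `HOOKS`; the parser handles both the older `HOOKS="..."` string and the `HOOKS=(...)` array form, but not every shell construct (trailing comments, multi-line arrays).

```python
import re


def zfs_hook_ok(conf_text: str) -> bool:
    """True if the effective HOOKS line lists `zfs` before `filesystems`."""
    hooks: list[str] = []
    for line in conf_text.splitlines():
        # Commented-out lines start with '#' and therefore never match
        match = re.match(r'\s*HOOKS=[("](.*)[)"]\s*$', line)
        if match:
            # A later assignment overrides an earlier one, as in the shell
            hooks = match.group(1).split()
    if "zfs" not in hooks:
        return False
    return "filesystems" not in hooks or hooks.index("zfs") < hooks.index("filesystems")
```

If the check fails, fix `HOOKS` and rebuild the images with `mkinitcpio -P` before rebooting.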

A Less Optimal Solution

I consider the experiment with ZFS on Linux a fantastic success even though I’m already planning to migrate away from it. For my needs, I’m reluctant to run FreeBSD even though I’ve used it for similar purposes in the past. Thus, I’ll be reinstalling the machine with a combination of ext4 + mdadm (actually copying it back over, but there’s no functional difference insofar as downtime). In retrospect, I’ll probably miss ZFS’ transparent compression the most. Given my relatively modest data size and the fact that it defaults to lz4 compression (speed optimized), it’s surprising that it’s saved close to 200 GiB of storage! No other file system, save for Btrfs, provides transparent compression, and in spite of the integrity guarantees ZFS provides, I think compression is a far more pragmatic upside since its impact is real and immediate rather than theoretical.
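ZFS exposes the numbers behind a figure like that directly: `logicalused` is roughly what the data would occupy without compression, and `used` is what it actually consumes. A minimal sketch, assuming the tab-separated output of `zfs get -Hp -o property,value logicalused,used <dataset>` has been captured as a string; the function name is hypothetical, and snapshots or the `copies` property can skew the comparison, so treat the result as approximate.

```python
def space_saved(zfs_get_output: str) -> tuple[int, float]:
    """Return (bytes saved, ratio) from `zfs get -Hp` logicalused/used output."""
    values: dict[str, int] = {}
    for line in zfs_get_output.strip().splitlines():
        # -H gives tab-separated fields; -p gives exact byte counts
        prop, value = line.split("\t")
        values[prop] = int(value)
    logical, physical = values["logicalused"], values["used"]
    ratio = logical / physical if physical else 1.0
    return logical - physical, ratio
```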

Although I’d like to wax philosophical about ZFS’ touted benefits, I still can’t help but think it’s solving a problem that is gratuitously overblown. Perhaps bitrot is a real, material risk, but I’ve rarely been affected by it (ancient backup CDROMs notwithstanding). Has it affected my archives? Almost certainly so, but if it has, it’s never had an impact on photos or media, much less other, more critical data; the few times I’ve performed checksum validation of archives versus physical disk contents, I haven’t encountered a mismatch. Indeed, although it’s a problem ZFS is almost tailor-made to fix, it still doesn’t beat regular, extensive backups. Of course, that assumes you have a mechanism in place that would prevent your backups from being adulterated or overwritten by later, corrupted snapshots (and that your backups aren’t subject to bitrot as well), but I think Google’s solution here is far more apropos: Keep no fewer than three copies of your most important data. Surely one of them, statistically, will survive.
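The checksum comparison mentioned above needs nothing ZFS-specific. A minimal sketch using SHA-256 from the standard library; the manifest format (a dict mapping relative path to hex digest) and the function names are hypothetical choices for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large media files don't fill RAM
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[str(path.relative_to(root))] = digest.hexdigest()
    return manifest


def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report files whose digests changed, disappeared, or appeared."""
    return {
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
    }
```

Keeping a manifest alongside each backup copy makes it possible to tell which copy has drifted before a corrupted one overwrites the others.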

You’ll notice that I haven’t mentioned ZFS snapshots (or send/receive), because I’ve yet to encounter a circumstance (besides upgrades, perhaps) where they’re useful to me. While I’d like to use them with containers, there’s still the very real problem of running software inside a container that requires dataset tweaking, and there’s also the specter of lingering problems with ZoL’s implementation of ZFS which has had problems as recently as last year with snapshots (mostly with send/receive if memory serves). In my case, I tend to avoid advanced features if there’s a risk of causing damage because they’re either not well-tested, buggy, or have had a recent history of inducing failures. But alas, I’m conservative at heart; I’d happily poke about with ZFS snapshots in a virtual machine to see how they work, but I’m much less happy about using them on a real server that’s doing real work where downtime would likely interfere with important activities when those same kernel drivers are of dubious stability. I also have no other ZFS systems where send/receive would benefit me.

There is an alternative file system some of the more astute readers among you may have noticed in my list of omissions: Btrfs. I considered Btrfs for my server, testing it briefly, but at the time (mid-2016), I encountered some evidence that suggested Btrfs may not be particularly stable in spite of it being among the list of default file systems for some distributions. Btrfs’ tools feel lacking, dampening my confidence further.

The Btrfs authors have as recently as August 2016 admitted to substantial, possibly unfixable problems with then-current Btrfs’ RAID5/6 implementations. Although I’m running a simple mirror, the fact that such a bug would be present in a file system some distributions have optimistically labeled as “stable” is worrisome (but just don’t use its RAID5/6 features–or whichever other features happen to be broken). I’ve seen comments from as early as 2014/2015 lauding the benefits of Btrfs as a stable, tested platform, but I have serious reservations about substituting optimism for caution, particularly when 1-2 years later in 2016, it would appear such optimism is horribly misplaced. Consequently, I don’t see Btrfs as a viable alternative (yet!), and that’s without addressing Btrfs’ performance history with regards to PostgreSQL and other databases. This may change, eventually, and Btrfs does look promising. There are some valid criticisms that Btrfs is simply reinventing ZFS (poorly), but since ZFS will likely never be included in the kernel due to licensing conflicts, Btrfs is better poised to reach parity with the likes of ext4 and company. I’m not optimistic about the Btrfs timeline, and while I’d be surprised if it attains feature completeness before 2020, I do believe many of its shortcomings will be resolved well before then.

Back to ZFS: Will I revisit it in the future? Absolutely. If I do, I may put together a FreeBSD NAS for additional storage or the likes. For now, however, ext4 is suitable enough for my needs. It’s certainly good enough for Backblaze, and while I don’t share their requirements, I take a more pragmatic approach. If ext4 is good enough for them, it’s good enough for me.

***

To 404 or not to 404?

Wednesday July 9th, 2014

If you’ve been following my (admittedly) rare posts over the last few years, you’ve likely noticed a growing aggression toward PHP as a language. It’s not that I hate the language per se (although I sometimes do); it’s that there’s so much crap written in PHP that it’s almost impossible to find something well-written that’s pleasant to work with (major props to Fabien Potencier of SensioLabs for writing some of the best PHP I’ve ever had the pleasure to work with–see, I can be even-handed!). There are some positive developments, particularly in commercial PHP, but all things considered, we’ve a long way to go.

XenForo (as an example) is leaps and bounds better than vBulletin in terms of actual design. They use a well-tested framework (Zend), mostly keep to the philosophy of separation of concerns, and isolate components from each other without polluting the global namespace (hi, vBulletin!). However (you were waiting for this, weren’t you?), the coding style remains abysmal (what’s wrong with K&R style, guys? seriously!), and it’s almost infuriating how much magic is used to create classes on the fly for various tasks. Need an example? Just look through some of the deferred tasks: Nearly everything is generated by factory classes that accept the name of the class to instantiate as a string argument, then return an instance of it. This absolutely screws with a developer’s ability to reason about the types being passed around, and if it weren’t so infuriating, I think most of us would break down in tears. Do you want to see a grown man cry?

Yes, I can understand why the XenForo developers have made their choices (no need for a long switch statement, object selection for instantiation becomes the concern of the calling code, etc.), but it’s absolutely a tremendous pain in the arse when one happens to be trawling through the sources to figure out why something isn’t working. Let’s not even get into the whole bazillion singleton methods splattered everywhere like the ground zero of a misplaced spittoon. How’s that for a visual? I think I just gave myself a fist-bump.

But alas, I distract myself with unnecessary things. This post is supposed to be about 404 errors, isn’t it? And here I was about to launch into an argument about how there are so many libraries out there that support dependency injection, you’d have to be insane to use a hundred different singleton classes.
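To make the complaint about string-keyed factories and singletons concrete, here is a minimal, hypothetical Python sketch. XenForo itself is PHP, and none of these names come from its code base: Mailer, SmtpMailer, create and PasswordResetTask are all invented for illustration. The first style hides the concrete type behind a string lookup and a shared instance; the second passes the dependency in through the constructor, where both readers and type checkers can see it.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


class Mailer(Protocol):
    """Hypothetical interface: anything that can send a message."""

    def send(self, to: str, body: str) -> None: ...


class SmtpMailer:
    """Hypothetical 'real' mailer; it only records what it would have sent."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


# Style 1: a string-keyed factory that also caches a shared (singleton) instance.
_REGISTRY = {"SmtpMailer": SmtpMailer}
_INSTANCES: Dict[str, object] = {}


def create(class_name: str) -> object:
    """Look up a class by name and return one shared instance of it."""
    if class_name not in _INSTANCES:
        _INSTANCES[class_name] = _REGISTRY[class_name]()
    return _INSTANCES[class_name]


def reset_password_via_factory(email: str) -> None:
    mailer = create("SmtpMailer")  # the concrete type is only known at runtime
    mailer.send(email, "reset link")  # type: ignore[attr-defined]


# Style 2: constructor injection; the dependency is explicit and typed.
@dataclass
class PasswordResetTask:
    mailer: Mailer

    def run(self, email: str) -> None:
        self.mailer.send(email, "reset link")
```

A test can hand PasswordResetTask a fake mailer directly; with the factory version, it has to reach into the global registry or the cached instance to substitute anything.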

Today I encountered a relatively curious thing with XenForo. John (of Forum Foundry, Inc.) was having some difficulty with the nginx configuration on one of his sites and had convinced himself it was his fault because of something he had changed (it wasn’t). Something, somewhere was causing a 404 page to crop up every time users would attempt to reset their passwords. It was bugging him, and he asked if I’d take a look. So I did.

After examining his nginx configuration, I was convinced he had done nothing wrong. Disconcertingly, though, I grew increasingly convinced that it wasn’t nginx’s fault, either. Strangely, the 404 page was being spat out by nginx–not by XenForo–so it seemed impossible for the fault to lie elsewhere. It had to be nginx. Yet I found myself battling a sort of cognitive dissonance: POSTing (using curl or similar) to the lost-password URL returned a 200 OK, and using GET (or other methods) returned a 405 Method Not Allowed (as expected). If nginx were at fault, the 404 error should have been returned regardless of the method (GET, POST, or otherwise).
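The method check described above can be reproduced without curl. The following is a minimal Python sketch using only the standard library: probe() returns the status code for a given method, and StubLostPassword is a hypothetical local stand-in that mimics the behavior we saw (200 for POST, 405 for GET), so the sketch runs without touching a real site. The /lost-password/ path is an assumption, not XenForo’s actual route.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# An opener with an empty ProxyHandler ignores proxy settings from the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, method: str, data: Optional[bytes] = None) -> int:
    """Send one request and return its HTTP status code (roughly curl -X METHOD)."""
    request = urllib.request.Request(url, data=data, method=method)
    try:
        with _OPENER.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status code is still available.
        return err.code


class StubLostPassword(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the lost-password endpoint: POST works, GET doesn't."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the form body before replying
        self._reply(200)

    def do_GET(self) -> None:
        self._reply(405)

    def _reply(self, status: int) -> None:
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo quiet


def start_stub() -> Tuple[HTTPServer, str]:
    """Serve the stub on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), StubLostPassword)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}/lost-password/"
```

Against a real site, you would pass its lost-password URL to probe() instead of starting the stub. The empty ProxyHandler keeps urllib from routing requests through any proxy configured in the environment.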

We tried various things, mostly involving the nginx debug log, but couldn’t quite get a handle on the source of the problem. I was pretty sure PHP was somehow at fault, but I couldn’t reproduce a 404 error bubbling up from PHP to nginx. Puzzled, I sat back and thought about the problem, and on a whim decided to do some digging. It wasn’t long before I stumbled on the documentation for nginx’s fastcgi_intercept_errors directive, which looked as though it might explain errors propagating upward from the application code. From the documentation:

[fastcgi_intercept_errors] [d]etermines whether FastCGI server responses with codes greater than or equal to 300 should be passed to a client or be redirected to nginx for processing with the error_page directive.

What does this mean? It means that if fastcgi_intercept_errors is enabled, responses with an HTTP status code of 300 or greater (404 among them) will be intercepted by nginx, regardless of the application (PHP) layer’s intent, whenever an error_page is configured for that code. Thus, even if the application displays something useful along with the 404, nginx will intercept the response and display its own spartan 404 page instead.
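The interception rule can be summarized as a small decision function. This is a simplified Python model, not nginx’s actual code: it assumes interception requires the directive to be on, a status of 300 or more, and an error_page entry for that exact status (the documentation quoted above refers to processing “with the error_page directive”). who_renders() is a hypothetical name.

```python
from typing import AbstractSet


def who_renders(status: int, intercept_errors: bool, error_pages: AbstractSet[int]) -> str:
    """Simplified model of whose page the client ends up seeing.

    Returns "nginx" when nginx replaces the upstream (FastCGI) response,
    otherwise "application".
    """
    if intercept_errors and status >= 300 and status in error_pages:
        return "nginx"
    return "application"
```

Under this model, the application’s own 404 page only survives when interception is off or no error_page covers 404; a 200 always passes through untouched.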

Then a lightbulb came on.

Around line 63 or so of the XenForo lost-password handler, a 404 response is generated if a user’s name or email address doesn’t exist when a request is submitted to reset their password. This means that, in all likelihood, if someone mistypes their username (or email), they’ll receive a 404 message from nginx and no error message from XenForo. Rather than attempting to hit the back button, change their submission, and try again, the affected user is likely to believe the site is broken and they are therefore unable to reset their password. This couldn’t be further from the truth.

But it gets better. I didn’t stumble on fastcgi_intercept_errors initially because I couldn’t get 404 responses to bubble up to nginx for interception on my system (Arch Linux). The reason is that the /etc/nginx/fastcgi_params that ships with Arch is fairly sparse (the intent being that if you need more specific parameters, you add them yourself). On this server, however, the file wasn’t sparse at all: fastcgi_intercept_errors was enabled in it. Moreover, tagging along with it were a number of other parameters. I initially thought Ubuntu was to blame, but it became clear that these were not the defaults shipped with nginx (which doesn’t enable fastcgi_intercept_errors, for instance) or with Ubuntu. Thus, any time a 404 error was dispatched from the application code, nginx intercepted it and displayed its own 404 error page instead.
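If you want to check whether your own fastcgi_params (or any other included file) switches the directive on, a quick scan will do. This is a rough Python sketch that assumes one simple file: it ignores everything after a # on each line and looks for an active fastcgi_intercept_errors on; directive. It doesn’t follow include directives or understand quoting, so treat a clean result as a hint rather than proof. find_intercept_errors() is a hypothetical helper name.

```python
import re
from typing import List

# An active "fastcgi_intercept_errors on;" at the start of a line or after a ";".
# The value is matched case-insensitively to be safe.
_INTERCEPT_ON = re.compile(r"(?:^|;)\s*fastcgi_intercept_errors\s+(?i:on)\s*;")


def find_intercept_errors(config_text: str) -> List[int]:
    """Return 1-based line numbers that switch fastcgi_intercept_errors on."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        active = line.split("#", 1)[0]  # drop comments (quoting is not handled)
        if _INTERCEPT_ON.search(active):
            hits.append(number)
    return hits
```

For example, find_intercept_errors(open("/etc/nginx/fastcgi_params").read()) returns the line numbers worth a look. The directive can just as easily live in a server or location block, so those files deserve the same check.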

How many others might be affected by this? Who knows. If there’s a tutorial out there that recommends adding fastcgi_intercept_errors to /etc/nginx/fastcgi_params (it turns out there might be–read the update below), it goes without saying that the suggestion is a terrible idea if you expect your applications to generate and handle 404 errors (or any other status of 300 or greater). Of course, for high-performance configurations you probably want nginx to do as much of the heavy lifting as possible, but in this particular circumstance, certain configuration options yielded unexpected results. In other words, don’t blindly copy and paste “suggested performance tweaks” from random Internet blogs without first consulting the documentation. The results may surprise you. (I don’t know if that’s what happened in this case, but I certainly can’t rule it out. For all I know, it may have been–and probably was–an accident.)

Is this the fault of XenForo? I’d argue no. Responding with a 404 when someone tries to reset the password of a member who doesn’t exist is probably intended as a RESTful response (not a member? respond with a 404), and while here it triggered something unexpected (nginx’s 404 page instead), it’s not a bug per se. However, it is a potential privacy concern: Anyone with access to a large corpus of email addresses can deduce, based on the response XenForo returns, whether or not each address is registered with the site. A potential attacker can then target the accounts known to exist for nefarious deeds (or simply violate their owners’ privacy).

But here is where we diverge into a matter of competing concerns: Do you provide users (some of whom might be terrible typists) with immediate feedback that the information they submitted was wrong, thereby increasing the usability of the site; or do you protect users’ privacy by generating the same response whether or not the information was submitted correctly? There is no right or wrong answer to this question; it generally boils down to how much usability you’re willing to trade for privacy. Do you value ease of use or privacy? You can’t have both. Therefore, the decision rests on the shoulders of the site operator. For sites that value users’ privacy, perhaps it’s better to sacrifice some usability for security.
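Here is a minimal Python sketch of both policies side by side. lost_password(), Response, GENERIC and the in-memory accounts mapping are hypothetical and are not XenForo’s API. With uniform=True, the handler returns the same status and text whether or not the account exists; with uniform=False, it gives immediate feedback with a 404, as XenForo does, and thereby reveals which accounts exist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

GENERIC = "If that account exists, a reset link has been sent to its email address."


@dataclass
class Response:
    status: int
    message: str


def lost_password(
    identifier: str,
    accounts: Dict[str, str],
    send_reset: Callable[[str], None],
    uniform: bool,
) -> Response:
    """Handle a hypothetical lost-password form submission.

    accounts maps a lower-cased username or email address to the account's email.
    """
    email: Optional[str] = accounts.get(identifier.strip().lower())
    if email is not None:
        send_reset(email)
    if uniform:
        # Privacy first: identical status and text whether or not the account exists.
        return Response(200, GENERIC)
    if email is None:
        # Usability first: immediate feedback, which also confirms non-existence.
        return Response(404, "No account matches that name or email address.")
    return Response(200, "A reset link has been sent to your email address.")
```

The only difference between the two policies is which response a mistyped name receives; the reset mail itself is sent the same way in both.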

For everyone else, it’s probably best to leave /etc/nginx/fastcgi_params at its default unless you absolutely must change it. Better yet: Place the tweaks you need in a separate file and include that file where it’s needed.
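To see whether a fastcgi_params file has drifted from its default, diff it against a stock copy. A minimal Python sketch using difflib follows. Where a pristine copy lives depends on your distribution’s packaging (the nginx source tree ships one as conf/fastcgi_params), so obtaining it is left as an assumption; local_changes() is a hypothetical helper name.

```python
import difflib
from typing import List


def local_changes(stock: str, current: str) -> List[str]:
    """Return unified-diff lines showing how `current` departs from `stock`."""
    return list(
        difflib.unified_diff(
            stock.splitlines(),
            current.splitlines(),
            fromfile="fastcgi_params (stock)",
            tofile="fastcgi_params (current)",
            lineterm="",
        )
    )
```

An empty result means the file is untouched; any line prefixed with + is a local addition and a candidate for moving into its own included file.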

Update: I think I may have discovered one of the culprits behind fastcgi_intercept_errors here, listed as the “Ultimate Speed Guide for WordPress on NGINX.” Hint: Don’t blindly copy everything someone on the Internet tells you is a good idea. If you don’t know what something does, always read the documentation. If you don’t, it will bite you.

***