VPNs are No Panacea

I sometimes encounter the question “should I use a VPN?” accompanied by the inevitable shower of comments along the lines of “yes, it’ll make you more secure!” or “it’ll protect your privacy!” Occasionally, I see VPNs recommended as a defense against doxxing, such as when someone comments about their profession or business, their competitors, or potential employers. Perhaps someone in the industry has quipped about sending “anonymous” emails criticizing a particular organization or has offered unsavory political opinions that would otherwise get them fired.

First, I should state that I am no security expert. I just happen to write software, and I have a vague curiosity about the world of information security. I enjoy reading the opinions of individuals who are considered experts in the field, and they almost uniformly warn of the same folly: VPNs are no panacea!

I think this advice is offered as therapy more than cure. In particular, it seems plausible that users attracted to VPNs place unwarranted trust in the software and provider, engaging in activities that suggest a degree of carelessness. Caution is nevertheless a desirable trait even under the warm embrace of cryptography. I’ll explain why.

A VPN may be useful to disguise your activity if you’re posting on Twitter and wish to avoid the danger of clicking on links that could collate information about your activities or track usage behavior. VPNs may also provide some limited protection if you’re prone to torrenting your entertainment (up to a certain dollar amount, after which legal recourse against you becomes economically viable–maybe even necessary). However, even in the latter case, a VPN is of dubious utility, and it may not always keep you anonymous. Just this month (April 2019), NordVPN has been the subject of increased scrutiny for sending information to a series of unusual domains (billed as an anti-censorship strategy). Three months ago, NordVPN was also accused of tracking its users. None of this is surprising. Using a VPN means surrendering your privacy to a single firm (in most cases) in the hope that it will protect you from others doing naughty things with your browsing habits while refraining from doing anything of the sort itself.

Nota bene: This behavior isn’t limited to NordVPN. They’re simply one of the most popular providers and are therefore examined by more people, with an equivalent increase in negative press. Regardless of intent, I find I can’t fault them for running analytics on their user base: There are cases where traffic analysis (latency, throughput, timeouts, etc.) may be useful for providing better quality of service and improving the customer experience. In the event of an endpoint failure, I’m sure such analysis can be incredibly helpful for re-routing packets toward other endpoints within a margin of acceptable latency. If the unusual domains their applications have directed traffic to this month are an anti-censorship countermeasure, I have to commend them for a bold strategy, even if it makes a few users nervous. To be clear: I neither endorse NordVPN nor am I overly critical of their decisions. I don’t care either way. I don’t use VPNs.

Now we can get to the meat of this discussion.

I believe the most important consideration as a user of a VPN service is to quantify your threat model. To illustrate, let’s take an example from earlier: For most people, doxxing isn’t a significant threat. Those at greatest risk often draw attention to themselves, either deliberately through their actions (whistle-blowers), or through online interactions (gaming, comments, etc.) that turn sour. Some may be victims of cyberstalking. In these cases, a VPN may be useless, because the victims often post sufficient information online to identify who they are, where they live, and numerous other details about their lives that a determined third party can piece together. Simply put: VPNs aren’t magic and they cannot protect you from yourself.

For most of us, our opinions and online interactions aren’t important or interesting enough to attract attention. If you think your opinions are interesting enough to be the target of a harassment campaign, then perhaps a VPN may be useful, but it isn’t the only tool you should rely on. To put it another way, if you’re afraid you might be identified online, you must firewall everything about your life that may be exposed through writing, your interactions with other people, and the media you post.

Ask yourself this question: What’s your threat model?

Most of the people I’ve spoken with who espouse the use of a VPN do so because they’re concerned about their identity being leaked, worry about employers identifying them online, don’t want to become targets of harassment, or simply wish to share politically unpopular opinions online that might draw the ire of one group or another (which may cross over into any of the prior points). As a free speech advocate, I can sympathize with their desire for greater anonymity; losing your job because you’re the subject of a targeted harassment campaign is the antithesis of a free society. Neither should people be subject to hecklers or harassment, especially of the sort that crosses over from the online world to their front doorstep. Unfortunately, this is the world we live in.

A VPN isn’t going to provide unlimited protection against adversaries, and neither will a VPN protect users from disseminating information about themselves to interested but malevolent third parties. They can provide an additional layer of security when using Internet connections in a public location (airports, hotels, coffee shops, etc.), and they may be able to circumvent regional restrictions on entertainment or information (Google, YouTube, Netflix) by the state or licensing institutions. You should not expect a VPN to keep you completely anonymous, but they may be useful as part of a defense-in-depth strategy.

However, cautious use of the Internet can take you 80% of the way toward a safer online presence. In particular: Don’t click links you don’t trust; avoid sharing sensitive information with services that aren’t offered via TLS (HTTPS); if you reply to an unknown third party via email, be cautious of submitting via SMTP from your own client (this may divulge your client IP in the message headers) and stick with your provider’s web or mobile interfaces; and don’t post information about yourself that you don’t want publicly accessible. You may not have a choice in some cases, depending on your line of work, so this advice may not always be applicable, but I do believe it is broadly useful for the majority of people. Take heed.
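On the TLS point, it’s trivial to script a sanity check if you’re ever unsure whether a service presents a valid certificate before handing it anything sensitive. Here’s a minimal sketch using only Python’s standard library; the hostname is a placeholder, and this is illustrative only, not a replacement for your browser’s own validation.

```python
import socket
import ssl

def check_tls(hostname: str, port: int = 443) -> None:
    """Connect to hostname:port and verify its certificate chain.

    Raises an ssl.SSLCertVerificationError (or similar) if verification fails.
    """
    context = ssl.create_default_context()  # verifies against the system CA store
    with socket.create_connection((hostname, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
            print(f"{hostname}: certificate verified, expires {cert['notAfter']}")

if __name__ == "__main__":
    check_tls("example.com")  # placeholder hostname
```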

There are limitations to VPNs that less technologically inclined consumers may not be aware of. Key to understanding these limits is the technology behind VPNs (typically IPsec or a TLS-based tunnel, with some authentication layer) and their history as a tool to extend company or school network boundaries off-premises, providing employees and students a means of connecting to internal services. The technology was never designed to prevent the controlling institution (in this case the VPN provider) from classifying or logging traffic. Partial anonymity is a useful side effect, but it wasn’t the design goal. Neither was complete security.

VPNs can have surprising utility if your adversary is an intermediary tracking your traffic, or if you don’t trust your ISP. Providers like Comcast have demonstrated the risk by injecting advertisements into sites their users visit, and others have been accused of using traffic analysis to track user behavior, possibly for targeted advertising. VPNs can protect against this threat by acting as an encrypted tunnel between your computer and your VPN provider’s endpoint: anything in between sees only the encrypted tunnel traffic.

Before I conclude this post, I should leave my readers with some particularly interesting tidbits of research that may be helpful in deciding whether your use case justifies paying for a commercial VPN. There is a 2016 paper titled “Characterization of Encrypted and VPN Traffic using Time-related Features.” It discusses traffic-analysis techniques for determining the protocol and type of traffic transmitted over encrypted connections, including VPNs, and shows that VoIP, browsing, and streaming behaviors can be differentiated. There are other related papers, including “Realtime Classification for Encrypted Traffic” (2010) and “Analyzing HTTPS Encrypted Traffic to Identify User’s Operating System, Browser, and Application” (2017); the latter describes attacks capable of defeating countermeasures intended to obfuscate payloads in transit. Although I cannot find it at this time, I also recall reading a paper that presented deep packet analysis techniques to defeat random noise injected into streams, successfully categorizing the encrypted traffic despite efforts to thwart would-be adversaries. This is an area of active research, and I expect that with advancements in deep learning and greater access to GPUs capable of training neural networks tuned toward traffic analysis, VPNs may not offer significantly more defense against such adversaries than can already be achieved via other forms of encryption, e.g. TLS. Yes, I am aware of SNI-related information leaks due to how TLS presently works.
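To make the idea concrete, here is a toy sketch of what “time-related features” look like in practice. It isn’t taken from any of the papers above; the packet timings and sizes are made up, and the point is only that inter-arrival times, packet sizes, and flow duration remain visible to an observer even when the payload is encrypted.

```python
import statistics
from typing import List, Tuple

# A flow is a list of (timestamp_seconds, packet_size_bytes) pairs.
Flow = List[Tuple[float, int]]

def time_features(flow: Flow) -> List[float]:
    """Extract simple time-related features from a packet flow.

    The payload stays opaque; timing and size patterns still hint at the
    kind of application generating the traffic.
    """
    times = [t for t, _ in flow]
    sizes = [s for _, s in flow]
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    duration = times[-1] - times[0] if len(times) > 1 else 0.0
    return [
        duration,                                   # flow duration
        statistics.mean(gaps),                      # mean inter-arrival time
        statistics.pstdev(gaps),                    # jitter
        statistics.mean(sizes),                     # mean packet size
        len(flow) / duration if duration else 0.0,  # packets per second
    ]

# Toy flows: steady small packets (VoIP-like) vs. sparse large packets (streaming-like).
voip_like = [(i * 0.02, 160) for i in range(200)]
stream_like = [(i * 0.5, 1400) for i in range(20)]

print("voip-like   ", time_features(voip_like))
print("stream-like ", time_features(stream_like))
```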

To put it more succinctly: You have to decide on your threat model.

***

Remediation Service: Windows 10’s Dirty Secret

I don’t use Windows often. Much of my time is spent in Arch Linux, except on the rare occasion I have an interest in doing something that requires Windows (typically gaming or Reason). Imagine my surprise when I booted into Windows a week or two ago and started noticing a series of processes consuming a significant amount of disk bandwidth and appearing to scan the entirety of a) my installed applications and b) everything in my user profile directory.

It turns out that sometime late last year (November 2018, possibly earlier), Microsoft released a series of patches for “reliability improvements” that include the “remediation service,” which performs a few interesting tasks. Notably, this includes a service that “may compress files in your user profile directory to help free up enough disk space to install important updates.” If you’ve seen sedlauncher.exe in Windows Resource Monitor, it belongs to the remediation service and is the tool designed to scan your user profile directory, presumably for files that may be candidates for compression.

sedlauncher.exe’s malware-like behavior stems from the fact that a) it isn’t strictly launched when Windows Update requires additional space and b) it performs a thorough scan of everything in the user profile directories (pidgin chat logs, pictures, media, desktop files–everything). I assume this is because it is collating a list of files it would compress in the event Windows Update runs out of space, based on some heuristic, but what perplexes me is that it is impossible to tell precisely how well a file will compress until the file is actually compressed. Yes, there are a few heuristics you could apply (e.g., the file type is known to compress well), but these don’t always hold true: Imagine a virtual machine image that contains a large number of compressed archives. VM images generally do compress well, but only because the contents of the image aren’t typically compressed. This also raises the question: Why scan for compression targets when there’s already plenty of disk space available to Windows Update? What exactly is this tool doing?
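To illustrate why compressibility is so hard to predict up front, here’s a sketch of the sampling heuristic I imagine such a tool would have to use: compress a few chunks and extrapolate. This is purely speculation on my part about how one could approximate it; it is not anything Microsoft has documented.

```python
import zlib
from pathlib import Path

def estimated_ratio(path: Path, samples: int = 4, chunk: int = 64 * 1024) -> float:
    """Estimate a file's compression ratio by compressing a few sampled chunks.

    Returns compressed_size / original_size for the sampled data (lower means
    more compressible). Already-compressed content (archives, media, VM images
    full of archives) will hover near 1.0 despite a "compressible" extension.
    """
    size = path.stat().st_size
    if size == 0:
        return 1.0
    original = 0
    compressed = 0
    with path.open("rb") as fh:
        for i in range(samples):
            fh.seek((size * i) // samples)   # jump to evenly spaced offsets
            data = fh.read(chunk)
            if not data:
                break
            original += len(data)
            compressed += len(zlib.compress(data, level=6))
    return compressed / original if original else 1.0

# Example usage (hypothetical path):
# print(estimated_ratio(Path(r"C:\Users\me\somefile.vdi")))
```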

Most guides online direct visitors to one of two solutions: Remove the applicable updates or disable the Windows Remediation Service. The former isn’t sustainable, because the updates will eventually be applied, and Windows’ stellar history of absolutely no security flaws (sarcasm) strongly suggests skipping updates isn’t wise anyway. Curiously, the latter option–that is, disabling the culprit service–appears to be a foolhardy solution as well, because sedlauncher.exe returns, diligently, to its previous state of scanning everything it can access. It’s likely the Windows Remediation Service scanners are launched via the task scheduler, but I’ve yet to find exactly where or how.
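If you’d like to hunt for the trigger yourself, enumerating scheduled tasks and filtering for anything mentioning rempl is a reasonable starting point. The sketch below just shells out to schtasks; I make no promises about what, if anything, it will turn up on your particular build.

```python
import csv
import io
import subprocess

def find_tasks(keyword: str = "rempl") -> None:
    """List scheduled tasks whose name or command line mentions the keyword."""
    out = subprocess.run(
        ["schtasks", "/query", "/fo", "CSV", "/v"],
        capture_output=True, text=True, check=True,
    ).stdout
    for row in csv.DictReader(io.StringIO(out)):
        line = " ".join(str(v) for v in row.values())
        if keyword.lower() in line.lower():
            print(row.get("TaskName"), "->", row.get("Task To Run"))

if __name__ == "__main__":
    find_tasks()
```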

There is one particular solution that might work. Unlike most other core Windows tools, sedlauncher.exe is not contained in the Windows root. Instead, it resides under C:\Program Files\rempl. This rather bizarre choice suggests Microsoft has a keen interest in packaging this tool separately from the operating system proper, or wishes to disguise it as an installed application to keep it from prying eyes. You decide.

I’ve found that renaming sedlauncher.exe to something else appears to work as a temporary solution (but only temporary), with the appropriate caveats applied (exercise caution, as this may break things). I expect it to be reinstalled with a future update, but for now it won’t be scanning my profile directory for files to assault. Whether this works in your case (or not) is left as an exercise for the reader, but be aware it may break other parts of Windows Update. I have no idea how deep the tendrils of this telemetry run into the dark recesses of Windows 10.

***

My Journey with ZFS on Linux

Toward the latter half of 2016, I decided to convert my home file server’s file system to ZFS as an experiment on real hardware in my own test environment. I knew before embarking on this journey that ZFS on Linux (hereafter referenced as ZoL) is stable but not production ready. The experiment is nearing its end, and I’ll be switching to an mdadm-based array with ext4 for reasons I’ll be exploring in this post. If your storage needs require the redundancy and self-healing properties of ZFS, consider other operating systems where it’s better supported and stable, like FreeBSD.

This post illustrates my thinking as of early 2017 and why I’m not going to continue with ZoL (for now).

Conclusions

I don’t wish to bore you with excess detail, so I figured I’d present my conclusions up front. You can then decide if the rest of this post is worth reading. Let’s begin exploring.

First, using ZoL in a home file server environment presents us with the following pros and cons:

Pros

  • Transparent compression can save a surprising amount of space with LZ4 (useful!)
  • Lower administrative overhead (a wash; not always true in practice)
  • Data integrity guarantee (always make backups)
  • Self-healing (always have backups)
  • It’s not Btrfs (you’ll probably use your backups)

Cons

  • May require more extensive deployment planning
  • Some applications may require dataset tweaking
  • Administrative overhead not always as advertised (see above), but this is mostly the fault of the current state of ZoL
  • Poor performance for some workloads (especially databases; versus ext4)
  • Lack of recovery tool support limits possible restoration options when things go wrong (backups!)

Noteworthy bullet points when using ZoL:

  • Stable but not production ready (an important distinction!)
  • Upgrades will be painful!
  • /boot on ZFS is possible but…
  • DKMS ZFS modules are a horrible idea; don’t do this–don’t ever do this
  • Always create bootable media with ZFS tools installed in case something goes wrong (it will–don’t forget backups!)

The Meat and Potatoes of ZoL

I would not recommend ZoL for anything outside a test environment at this time (I’ll explain in a moment). ZoL may be useful for long-term storage or a backup NAS box if you’re particularly brave. However, if you plan on deploying ZFS permanently, or at least in a long-term installation, I’d recommend using an OS with which ZFS is tightly integrated; recommendations include FreeBSD and FreeBSD-based distributions (including FreeNAS) or OpenIndiana-based platforms. I’d also shy away from using ZFS on systems intended as general purpose machines; in my experience, ZFS really shines in a heavily storage-centric configuration, such as a NAS, where throughput isn’t as important as integrity. Outside this environment, ZFS may be a performance liability, and current benchmarks demonstrate underwhelming behavior when it is used on Linux as a backing store for databases. To work around this, ZFS requires planning and tuning for use with RDBMSes, and it may also impact write-heavy applications. Read-only loads are something of a wash–file system choice is better made with regard to workload requirements: Is raw performance or data integrity more important? Note that the latter–data integrity–can be addressed by other means, depending on the use case, but ZFS’s automated self-healing capabilities are hugely attractive in this arena.

As of early 2017, ZFS on Linux is still exceedingly painful to install and maintain on rolling release distributions. In particular, on Arch Linux, ZFS will cripple most upgrade paths for the kernel, for ZFS itself, and for applications that require ZFS dataset tuning. It’s almost impossible to upgrade to more recent kernels if you value your sanity (and system stability): expect to wait until the latest kernel has been tested and new ZFS PKGBUILDs are ready, and be open to an afternoon of potential headaches. Generally speaking, the upgrade process itself isn’t frustrating, but it should always be preceded by a fresh backup–and keep your rescue media handy!

Never use ZFS with DKMS in an effort to shortcut kernel versioning requirements, even if upstream ZoL appears to support the new version–the package you’re using may not be updated yet, and the AUR DKMS PKGBUILDs for ZFS are not as stable or well maintained as the kernel-pinned packages. With DKMS, even a slight mismatch risks kernel panics and a long, painful recovery process. I was bitten by this early on and discovered that some kernel panics aren’t immediate; they may occur after an indeterminate period of time depending on file system activity, and recompiling ZFS and SPL takes long enough that your system will likely panic before the build completes. Stick with the version-pinned builds; don’t use the DKMS modules. Of course, this introduces a new problem: Now you have to pin ZFS to a specific kernel version, leading to extra work on rolling release distributions like Arch…

It’s necessary to plan for a lot of extra work when deploying ZoL. Estimate how much time you’re likely (or willing) to spend and multiply it by two. I’m not even kidding. ZoL on distributions without completely pre-built packages (and possibly others) requires all of the following: building SPL and its utilities; building ZFS and its utilities; installing unstable versions of grub if you plan on keeping /boot on ZFS (with the implication that you’re one upgrade away from an unbootable system at all times); dataset tweaking per application; and wading through potential bugs. Lots of them. When I first deployed ZFS on Linux, I was completely unaware of a deadlock condition affecting the arc_reclaim kernel process: every time I’d run rsync, arc_reclaim would hang, CPU usage would spike, and manual intervention was necessary. To say nothing of the RAM usage…

ZFS performance under Linux is also relatively poor for my use case. While read/write speeds are acceptable, they’re nowhere near ext4’s, and ZoL is likely slower than ZFS on the same hardware running FreeBSD. Furthermore, the memory pressure caused by the ZFS driver’s ARC failing to release RAM back to the system (it’s supposed to, but I’ve yet to see it in practice) under memory-intensive operations can cause an out-of-memory condition, swapping, and a potentially fatal invocation of the Linux OOM killer. For this reason alone, I could never recommend ZoL on a general purpose server. If your deployment is intended exclusively as a NAS and you run literally no other services besides NFS/Samba, ZFS’s memory usage would be perfectly fine, provided the system has 8+ GiB of RAM (though you should have more). Then there’s the possibility of adding L2ARC cache devices and so forth. If you’re planning on running other services in addition to a ZFS-backed NFS server, such as GitLab or Minecraft, you’ll quickly find that striking a balance between RAM allocated to the ARC and RAM left for other applications becomes a somewhat tedious chore. In fact, I might even venture as far as to suggest that you shouldn’t consider running ZFS plus other applications on anything less than 16 GiB of RAM–preferably 32, to hand off a nice big chunk to the ARC, particularly if you plan on expanding drive capacity–and you still shouldn’t run anything you don’t absolutely need (seriously: make it a pure NAS).
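The budgeting itself is only arithmetic; the tedium is in revisiting it whenever the workload changes. Here’s a sketch of the back-of-the-envelope calculation I mean. The zfs_arc_max module parameter (set via /etc/modprobe.d) is real; the headroom numbers are my own assumptions and should be adjusted to taste.

```python
GIB = 1024 ** 3

def suggest_arc_max(total_ram_gib: int, reserved_for_services_gib: int,
                    os_headroom_gib: int = 2) -> int:
    """Return a zfs_arc_max value in bytes, leaving room for other services.

    Whatever isn't reserved for the OS or other applications goes to the ARC.
    """
    arc_gib = total_ram_gib - reserved_for_services_gib - os_headroom_gib
    if arc_gib < 1:
        raise ValueError("Not enough RAM left for a useful ARC; add RAM or drop services.")
    return arc_gib * GIB

# Example: a 16 GiB box running NFS/Samba plus a small GitLab instance (~6 GiB).
arc_max = suggest_arc_max(total_ram_gib=16, reserved_for_services_gib=6)
print(f"options zfs zfs_arc_max={arc_max}  # e.g. in /etc/modprobe.d/zfs.conf")
```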

Tweaking ZFS for database loads doesn’t seem particularly onerous–certainly not on the surface–until you encounter one or more upgrade cycles that require more than just a dump/load. If you follow the default Arch Linux upgrade process for PostgreSQL, you’ll quickly find ZFS less flattering than the alternatives. Not only is it necessary to tweak the recordsize and a few other file system attributes for performance reasons (though you only do this at dataset creation), but suddenly the Arch upgrade guide’s advice of moving the old data directory and creating a new one is plagued with shuffling around extra files, remounting previously tweaked datasets to the appropriate locations, and more. In my case, I had to copy all of the original data to a new directory, wipe the old mount points for both the data directory and the write-ahead log, create a new data directory in the old dataset mount point, copy the WAL into the WAL-specific dataset, mount it at pg_xlog, and only then could I complete the upgrade. MySQL on ZFS is generally easier to upgrade, in my experience, but I also use it much less frequently. Be aware that MySQL still requires dataset tweaks, and the tweaks applied depend on whether you’re primarily using MyISAM or InnoDB. I haven’t experimented enough to know whether it’s possible to tweak datasets for individual storage engines.
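For reference, the tweaks in question mostly amount to dedicated datasets with an 8 KiB recordsize for the data directory (to match PostgreSQL’s page size) and a separate dataset for the WAL. The sketch below is wrapped in Python purely so it reads as a runnable example; the pool and dataset names are placeholders, and the property values are common starting points rather than gospel.

```python
import subprocess

def zfs_create(dataset: str, **props: str) -> None:
    """Create a ZFS dataset with the given properties (zfs create -p -o k=v ... dataset)."""
    cmd = ["zfs", "create", "-p"]
    for key, value in props.items():
        cmd += ["-o", f"{key}={value}"]
    cmd.append(dataset)
    subprocess.run(cmd, check=True)

# Placeholder pool/dataset names; adjust mountpoints to your layout.
zfs_create("tank/postgres/data",
           recordsize="8k",            # match PostgreSQL's 8 KiB page size
           compression="lz4",
           atime="off",
           mountpoint="/var/lib/postgres/data")
zfs_create("tank/postgres/wal",
           recordsize="8k",
           compression="lz4",
           atime="off",
           mountpoint="/var/lib/postgres/data/pg_xlog")
```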

Of course, there are a few other relatively minor niggles that depend on your specific case. For example, the stable grub2 packages have little useful understanding of ZFS, so it’s necessary to install grub-git if your /boot partition isn’t on a separate ext3/4 partition. Under Arch, it’s also necessary to make certain your initrd is correctly configured via /etc/mkinitcpio.conf, and it’s almost always a good idea to re-run mkinitcpio after upgrading or installing the kernel just in case it didn’t pick up your ZFS modules (you may need the userspace binaries in the image, too). Otherwise, you’ll be resorting to your emergency boot media to fix the dilemma (you did create it, didn’t you?).
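Because a missing hook is exactly the sort of thing you only discover at the next boot, I find it worth sanity-checking before rebooting. Here’s a trivial sketch of the kind of check I mean: it only looks for a zfs entry in the HOOKS line of /etc/mkinitcpio.conf and makes no attempt to validate anything else.

```python
import re
from pathlib import Path

def zfs_hook_present(conf: str = "/etc/mkinitcpio.conf") -> bool:
    """Return True if the HOOKS line in mkinitcpio.conf includes a zfs hook."""
    for line in Path(conf).read_text().splitlines():
        line = line.strip()
        if line.startswith("HOOKS"):
            # Handles both HOOKS="base udev ... zfs filesystems" and HOOKS=(...) syntax.
            return re.search(r"\bzfs\b", line) is not None
    return False

if __name__ == "__main__":
    if not zfs_hook_present():
        print("Warning: no 'zfs' hook in HOOKS; fix it and regenerate (e.g. mkinitcpio -p linux).")
```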

A Less Optimal Solution

I consider the experiment with ZFS on Linux a fantastic success even though I’m already planning to migrate away from it. For my needs, I’m reluctant to run FreeBSD, even though I’ve used it for similar purposes in the past. Thus, I’ll be reinstalling the machine with a combination of ext4 + mdadm (actually copying the data back over, but there’s no functional difference as far as downtime is concerned). In retrospect, I’ll probably miss ZFS’s transparent compression the most. Given my relatively modest data size and the speed-optimized lz4 default, it’s surprising that it saved close to 200 GiB of storage! No other mainstream Linux file system, save for Btrfs, provides transparent compression, and in spite of the integrity guarantees ZFS provides, I think compression is the more pragmatic upside, since its impact is real and immediate rather than theoretical.

Although I’d like to wax philosophical about ZFS’ touted benefits, I still can’t help but think it’s solving a problem that is gratuitously overblown. Perhaps bitrot is a real, material risk, but I’ve rarely been affected by it (ancient backup CDROMs notwithstanding). Has it affected my archives? Almost certainly so, but if it has, it’s never had an impact on photos or media, much less other, more critical data; the few times I’ve performed checksum validation of archives versus physical disk contents, I haven’t encountered a mismatch. Indeed, although it’s a problem ZFS is almost tailor-made to fix, it still doesn’t beat regular, extensive backups. Of course, that assumes you have a mechanism in place that would prevent your backups from being adulterated or overwritten by later, corrupted snapshots (and that your backups aren’t subject to bitrot as well), but I think Google’s solution here is far more apropos: Keep no fewer than three copies of your most important data. Surely one of them, statistically, will survive.
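For what it’s worth, the checksum validation I mentioned doesn’t require anything fancy. Here’s a minimal sketch, assuming a hypothetical manifest of “sha256  relative/path” lines produced on an earlier run; it won’t heal anything, but it will tell you whether your copies still agree.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large archives don't blow up memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        while block := fh.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify(manifest: Path, root: Path) -> int:
    """Compare files under root against a manifest of 'digest  relative/path' lines."""
    mismatches = 0
    for line in manifest.read_text().splitlines():
        digest, rel = line.split(maxsplit=1)
        target = root / rel
        if not target.exists() or sha256_of(target) != digest:
            print(f"MISMATCH: {rel}")
            mismatches += 1
    return mismatches

# Hypothetical usage:
# verify(Path("/backups/manifest.sha256"), Path("/srv/archive"))
```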

You’ll notice that I haven’t mentioned ZFS snapshots (or send/receive), because I’ve yet to encounter a circumstance (besides upgrades, perhaps) where they’re useful to me. While I’d like to use them with containers, there’s still the very real problem of running software inside a container that requires dataset tweaking, and there’s also the specter of lingering issues in ZoL’s implementation, which had snapshot-related problems as recently as last year (mostly with send/receive, if memory serves). I tend to avoid advanced features when there’s a risk of causing damage because they’re not well tested, buggy, or have a recent history of inducing failures. But alas, I’m conservative at heart; I’d happily poke about with ZFS snapshots in a virtual machine to see how they work, but I’m much less happy about using them on a real server doing real work, where downtime would likely interfere with important activities, when those same kernel drivers are of dubious stability. I also have no other ZFS systems where send/receive would benefit me.

There is an alternative file system some of the more astute readers among you may have noticed in my list of omissions: Btrfs. I considered Btrfs for my server and tested it briefly, but at the time (mid-2016), I encountered evidence suggesting Btrfs may not be particularly stable in spite of being among the default file systems for some distributions. Btrfs’ tools also felt lacking, which dampened my confidence further.

The Btrfs authors admitted as recently as August 2016 to substantial, possibly unfixable problems with the then-current Btrfs RAID5/6 implementation. Although I’m running a simple mirror, the fact that such a bug would be present in a file system some distributions have optimistically labeled as “stable” is worrisome (but just don’t use its RAID5/6 features–or whichever other features happen to be broken). I’ve seen comments from as early as 2014/2015 lauding Btrfs as a stable, tested platform, but I have serious reservations about substituting optimism for caution, particularly when, 1-2 years later in 2016, such optimism appears horribly misplaced. Consequently, I don’t see Btrfs as a viable alternative (yet!), and that’s without addressing Btrfs’ performance history with PostgreSQL and other databases. This may change eventually, and Btrfs does look promising. There are valid criticisms that Btrfs is simply reinventing ZFS (poorly), but since ZFS will likely never be included in the kernel due to licensing conflicts, Btrfs is better poised to reach parity with the likes of ext4 and company. I’m not optimistic about the Btrfs timeline, and while I’d be surprised if it attains feature completeness before 2020, I do believe many of its shortcomings will be resolved well before then.

Back to ZFS: Will I revisit it in the future? Absolutely. If I do, I may put together a FreeBSD NAS for additional storage or the like. For now, however, ext4 is suitable enough for my needs. It’s certainly good enough for Backblaze, and while I don’t share their requirements, I take a more pragmatic approach: If ext4 is good enough for them, it’s good enough for me.

***