Another use for the JCR mail computer

Abstract

In which Daniel suggests another use for the computer in the JCR, and other file-related thoughts

I have this problem: sometimes I forget to pick up files, or don’t even know I will need them later. When the computer in my room is turned on, things are easy, because I can grab my files from it anywhere in Cambridge. (In fact, judging by the way JANET routing was described in yesterday’s very interesting CUS tour, Norwich and Anglia seem to have their internet routed through our JANET–Internet gateway, so it’s possible our Cambridge block of one million addresses extends a little further; I only saw the network diagrams from a distance.) In any case, the real problem is when my computer is not switched on.

I can wake it up from anywhere in Peterhouse, but Daniel confirmed that broadcasting only works within our part of the Peterhouse subnet (there are probably two or three parts, since I am in a /24, which holds at most 254 usable addresses, with about thirty active when I check). If and when the JCR computer ever gets used again, probably for its original task of checking Hermes, it could have a guest account we could use to get access to the Peterhouse subnet when we are not inside it.
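For anyone curious, the wake-up itself is just a Wake-on-LAN ‘magic packet’: a broadcast UDP datagram containing six 0xFF bytes followed by the target’s MAC address repeated sixteen times. A minimal sketch in Python follows; the MAC address and broadcast address are made up, and the directed broadcast only reaches hosts on that subnet, which is exactly the limitation above.

    import socket

    def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
        """Send a Wake-on-LAN magic packet for the given MAC address."""
        mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
        if len(mac_bytes) != 6:
            raise ValueError("expected a 48-bit MAC address")
        # Magic packet: 6 x 0xFF, then the MAC address repeated 16 times.
        payload = b"\xff" * 6 + mac_bytes * 16
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
            sock.sendto(payload, (broadcast, port))

    # Hypothetical MAC and subnet; a directed broadcast like this is
    # normally dropped by routers, so it only works from inside the subnet.
    wake("00:11:22:33:44:55", broadcast="192.168.1.255")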

Another thought I have had is to create a guest account on my own computer, which is after all turned on quite a bit of the time, along the same philosophy as keeping world-readable things on the SRCF. I think the security implications are manageable, but it would raise questions.

1. Another line of thought: filesystems

1.1. Preamble

I keep intending to replace my hard drives, a long-deferred task. They only have about 10,000 hours each on them now, but there are rather a lot of dodgy sectors. The main motivation, though, is that I have been running at 95% capacity for about a year, which is awkward when /tmp fills up. The concern for safety is secondary, but I am now more careful about watching for problems. I had to destroy a bad sector’s worth of data last week, which slightly worried me at the time because I couldn’t identify which file I was destroying data from. I tried using debugfs, which is meant to tell you which inode (and hence which file) owns a given block, or else that the journal is using it, but in this case it did neither. The infoweb informs me it could be superblock corruption, but no files seemed to vanish when I eventually had to zero the sector out. Perhaps the block was part of an ext4 extent and my version of debugfs doesn’t search extents properly. In any case, I have beefed up my backups with a more frequent rsync cron job.
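For reference, the usual dance (as I understand it) for tracing a bad sector back to a file is: take the LBA that smartctl reports, convert it to a filesystem block number using the partition’s start sector and the filesystem’s block size, then ask debugfs which inode owns that block and what the inode is called. A rough sketch, with every number and device name invented for illustration:

    import subprocess

    # Illustrative values; get the real ones from `smartctl -a`,
    # `fdisk -l` and `tune2fs -l` for the disk in question.
    bad_lba = 123456789        # LBA reported by smartctl (512-byte sectors)
    partition_start = 2048     # partition's first sector, from fdisk -l
    fs_block_size = 4096       # filesystem block size, from tune2fs -l

    # Convert the absolute LBA into a block number within the filesystem.
    fs_block = (bad_lba - partition_start) * 512 // fs_block_size

    # Ask debugfs which inode owns that block, then resolve the inode to a path.
    icheck = subprocess.run(
        ["debugfs", "-R", f"icheck {fs_block}", "/dev/sdXN"],
        capture_output=True, text=True)
    print(icheck.stdout)   # second column is the inode number, if any

    inode = 54321  # taken from the icheck output above (illustrative)
    ncheck = subprocess.run(
        ["debugfs", "-R", f"ncheck {inode}", "/dev/sdXN"],
        capture_output=True, text=True)
    print(ncheck.stdout)   # prints the path(s) owned by that inode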

The trouble with buying hard drives is that you can never get just one: the right price point this year would quadruple my capacity, forcing me to buy another drive for backup. Getting good hard drives (brand name, three or five year warranty, low energy use) forces me to go to at least 1 TB if I want good value, and two of those are a nuisance to buy.

1.2. Filesystems

The point is, I have some major re-partitioning coming up, I expect. It has been three years since I re-partitioned, and I want to pick the right filesystem for the job.

My first shock is that I have actually been on the wrong filesystem all my years on Linux. The best, and really the only, filesystem I would recommend at the moment is JFS. Produced by IBM, it is an enterprise-quality filesystem with good performance, the best data protection available on Linux and, crucially, snapshots. Snapshots are basically the right way of doing online backups at the storage level, and the lack of them in XFS and friends is a problem. I realise you can take LVM snapshots, but those are implemented rather inefficiently, by holding a large table of copy-on-write mappings. They are fine for freezing the filesystem for an hour while you copy off a consistent backup image, but not for holding a backup for weeks on a RAID array, as the table of mappings grows quickly and can degrade seek performance if you are unlucky. Filesystems which handle snapshots natively can do it rather more efficiently.
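For what it’s worth, the ‘freeze and copy’ use of LVM snapshots looks roughly like the sketch below. The volume group, logical volume and mount points are invented, and the snapshot size is simply the copy-on-write space reserved for changes made while the snapshot exists:

    import subprocess

    def run(*cmd):
        """Run a command, raising if it fails."""
        subprocess.run(cmd, check=True)

    # Names below (vg0, home, mount points) are illustrative.
    run("lvcreate", "--snapshot", "--size", "5G",
        "--name", "home-snap", "/dev/vg0/home")      # reserve 5G for changes
    try:
        run("mount", "-o", "ro", "/dev/vg0/home-snap", "/mnt/snap")
        try:
            # Copy a consistent image of the frozen filesystem.
            run("rsync", "-a", "--delete", "/mnt/snap/", "/backup/home/")
        finally:
            run("umount", "/mnt/snap")
    finally:
        # Drop the snapshot as soon as the copy is done, before the
        # copy-on-write table grows and starts hurting performance.
        run("lvremove", "-f", "/dev/vg0/home-snap")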

Secondly, and less of a shock, all the hype about ZFS is really justified. It features basically everything in one sweet bundle:

  • Checksums to both verify and correct the integrity of data from RAM to disk and back again. With typical RAID, the data is written across the disks in a stripe, with one disk’s portion holding the XOR parity of the others. If a disk or sector fails outright, you can recover the data; but if there is a hardware error and the wrong data is silently returned (which does happen, perhaps due to a flaky power supply, for example), you are cooked. With ZFS, the checksum gives enough information to do some quick fiddling: for each disk in turn, check whether ignoring its data and reconstructing it from the parity gives the correct checksum. Memory to memory, the data is preserved intact, or else a fatal error can be raised. (A toy demonstration of this reconstruction trick follows the list.)

  • Atomic-write RAID. This is a big one: think about what happens when only part of a stripe is written out in normal RAID. You have to read the rest of that stripe from all the other disks before you can generate the parity, which is a big hit. Even worse, you can’t guarantee that the writes to each disk happen at the same time, so if the power goes when the parity portion of the stripe has been written but the data has not, or vice versa, you are cooked again. Hardware RAID controllers can work around these problems, but that is pricey. ZFS does it the ‘right way’: it sizes stripes to match filesystem blocks so there are no redundant reads, writes the new data non-atomically into free space, and only once the write has finished does it atomically update the filesystem pointers to refer to the new location of the data.

  • Storage pools, with the option of fast separate devices for the intent log and for caching. I’m tired of repeating the many excellent articles.

  • Much, much more: no practical limits on file or filesystem size, deduplication, transparent encryption and compression, lightweight snapshots and volume management, and so on.
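To make the first bullet concrete, here is a toy, single-stripe model of the reconstruction trick (nothing to do with ZFS’s actual on-disk format): XOR parity alone cannot tell you which disk returned bad data, but a checksum of the whole block can, by testing each disk’s reconstruction in turn.

    import hashlib
    from functools import reduce

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    # A toy stripe: three data chunks plus an XOR parity chunk, with a
    # checksum of the whole block stored alongside as metadata.
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = reduce(xor, data)
    checksum = hashlib.sha256(b"".join(data)).digest()

    # Simulate a disk silently returning garbage for chunk 1.
    read = [data[0], b"BXBB", data[2]]

    if hashlib.sha256(b"".join(read)).digest() != checksum:
        # Checksum mismatch: try ignoring each chunk in turn, rebuild it
        # from the parity, and keep the combination that checksums correctly.
        for i in range(len(read)):
            others = [c for j, c in enumerate(read) if j != i]
            rebuilt = reduce(xor, others + [parity])
            candidate = read[:i] + [rebuilt] + read[i + 1:]
            if hashlib.sha256(b"".join(candidate)).digest() == checksum:
                print(f"chunk {i} was bad; recovered:", b"".join(candidate))
                break
        else:
            raise IOError("more than one chunk is bad; cannot recover")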

Without a doubt, ZFS is the best filesystem to use, and it is a no-brainer if you have a terabyte or more of storage and no really industrial requirements. The big issue at the moment is Linux support. It may arrive, perhaps soon if the beta drivers make it to a release, or it may not; in the meantime Solaris is looking good, with various GNU/Solaris distributions picking up.