All posts for the month January, 2014

While migrating data from my ReiserFS-formatted disks over to ext4 volumes, I ran into a weird issue with a Seagate drive. It’s a Barracuda 7200.14, model ST3000DM001, with the latest firmware. It’s been running fine, and I just copied all of its data off with no problems. When I copy new data onto it, though, it slows way down a short bit into the transfer, to below 1MB/s, and eventually drops off of the SATA link entirely. Upon a reboot, it’s all back, and SMART diagnostics show no errors ever detected by the drive. Doing a diagnostic test on the drive shows nothing wrong. Reading the data works fine. I’ve tried the drive on 3 different drive controllers so far, disabled Native Command Queuing (NCQ), and replaced cables, with no difference. At this point I can just power up the system (which contains multiple drives of the same model that don’t exhibit this problem), and start writing information to that drive without ever reading it, and it starts to slow down within 30 seconds. It drops offline a few minutes later. When I turned off NCQ, it didn’t drop offline during the time I tested it, but it did slow way down, then speed back up, then slow way down again, repeatedly.

It’s not just that this is not how drives are supposed to behave. This isn’t how drives are supposed to fail, either. If there’s a defect on the media, it’s detected when the drive tries to read that section, then reported as a failure and put on a list of sectors pending relocation to a spare area on the disk. The relocation doesn’t happen until that section is overwritten, because the drive then knows that it’s safe to give up on ever reading the old data. None of this explains the behavior of reading being fine, and writing hosing everything without logging a problem on the drive.

I’ve seen 2 or 3 posts online from people clearly describing the exact same problem with this model of drive, but never with a solution; the thread either never went anywhere, or the poster RMA’d the drive. Mine isn’t under warranty according to Seagate’s web page.

At this point, the easy options seem to be exhausted. The next things I can think of to try are:

  • Downgrade the firmware to an older version, if it will let me.
  • Connect a TTL RS232 adapter to the diagnostic port on the drive’s board and see what it says during powerup, and during failure. I haven’t delved into Seagate’s diagnostic commands before, so maybe there’s something there to help.
  • Pull out my new hot air rework station, swap the drive’s BIOS chip with a spare board from a head-crashed drive, and see if that’s any better.

I am vexed by this drive.

As mentioned in the last post, I’ve been using the unRAID Linux distribution on my home server for a few years now. I’m a big fan of it, and I heartily recommend it, but my recent experience made me wonder if I’d outgrown it.

Partly this was because of the single-drive redundancy that unRAID is limited to, but it’s also because unRAID is designed to boot off of a flash drive, loading the OS into a RAM disk. This is great for setting up a storage appliance, but the more services you want the machine to run, the clunkier it gets to have everything loaded up and patched into the OS at every boot. Also, unRAID uses only ReiserFS for all of its drives (presumably because it was the only choice at the time for growing a mounted filesystem), which doesn’t have TRIM support for SSDs. Because unRAID’s write performance is sluggish, I was using a cache drive on it, where new files were placed until a nightly cronjob moved them to the protected array. I used an SSD for this, so TRIM support was a big deal.

In the past, some people have documented the process for putting the unRAID-specific components on a full Slackware install (unRAID is based on Slackware), but not as of the latest version. There has also been talk of supporting ext4 (and therefore TRIM) on unRAID’s cache drives, but nothing solid yet.

So, I went looking for potential replacements. The features I was looking for were:

  • Ability to calculate parity across an array of separate filesystems, with the ability to expand the array dynamically. Ideally with multi-drive redundancy.
  • The ability to present a merged view of the filesystems. Historically union filesystems haven’t merged subdirectory contents, so this was potentially tricky.
  • Ideally, it would be a supported platform for Plex Media Server, so I wouldn’t have to go screwing around making it work on a different distribution.

I looked briefly at Arch Linux, which looked like a great learning experience, but the full-manual installation process turned me off. Yes, I know how to do those things, but I’d sure like to not have to do them when I’m in a time crunch to get a replacement server running.

I ended up with CentOS as the base OS; it’s a supported platform for Plex, and I’ve used it on our Asterisk server at work with good experiences.

For the parity calculation, the best bet looked to be SnapRAID. SnapRAID calculates parity across groups of files, not block devices. This means it doesn’t care what the underlying filesystem format is, but it also doesn’t do live parity calculation; it’s updated via a cronjob, so files added since the last update aren’t protected. This didn’t scare me off, since the same thing is true of unRAID when using a cache disk. SnapRAID also supports multiple-drive redundancy, which is a plus.

For the merged filesystem view, I liked aufs. However, it needs support to be compiled into the kernel, so I wasn’t going to be able to use the stock CentOS kernel. I found a packaged aufs-included kernel for CentOS, but it was v3.10 instead of 2.6, which meant that other kernel modules for CentOS wouldn’t work on it. This was problematic, because I would need a kmod to install support for ReiserFS in order to read my existing array disks. I ended up just rebuilding the kernel myself with both features included.

Once that was figured out, the next trick would be to migrate the data disks from ReiserFS to ext4. The plan for this was to set up one new blank ext4 disk, use SnapRAID to fill it with parity from the rest of the (read-only) volumes, and once that was done, reformat the unRAID parity disk as ext4 and start copying data to it. Every time I’d finish cloning a disk’s files, I’d remount the new ext4 volume in that disk’s place, make sure SnapRAID was still happy with everything, and repeat. This worked fine, until I ran into a very strange disk problem, explained later.

(side note: I decided to try actually using my blog for stuff like this; expect more.)

Background: I have a large home media server, previously housed in a Norco 4U rackmount case; in the interests of being able to move it, I rebuilt it a while ago into an NZXT H2 tower case. I was very pleased with the outcome; the machine is reasonably compact, extremely quiet, and housed 14 drives with no problem. All of the SATA cables were purchased as close to the right length as possible, and I custom-made all of the drive power cables to eliminate clutter and maximize airflow.

When it came time to move the whole thing up to Seattle, I had the drives packed separately from the case, but both sets of things were damaged. The case itself is dented by the power supply, but is otherwise fine. One of the drives sounds like it had a head crash, and another one was banged around enough that part of its circuit board was smashed up. Replacing the only visibly smashed component (an SMT inductor) on the board didn’t fix things up.

Other background: I was running unRAID on the server, a commercial distribution of Linux designed for home media servers. It uses a modified form of RAID-4, where it has a dedicated drive for parity, but it doesn’t stripe the filesystems across the data drives. This means the write performance is about 25% of a single drive’s throughput, but it can spin down drives that aren’t in use. It also means that, while it has single-drive redundancy like RAID-4 or 5, losing two drives doesn’t mean you lose everything; just (at most) two drives’ worth.
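For anyone who hasn't seen how a dedicated parity drive lets you rebuild a lost disk, here's a minimal sketch of the XOR idea in Python. It is only an illustration of RAID-4-style parity in general, not unRAID's actual code; the three "drives" are made-up 8-byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data drives. Nothing is striped: each holds its own data.
data_drives = [b"movie-01", b"photo-02", b"music-03"]

# The dedicated parity drive stores the XOR of all the data drives.
parity = xor_blocks(data_drives)

# Pretend drive 1 died. XOR the surviving drives with parity to rebuild it.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_drives[1])
```

With only one parity drive there is one equation per byte position, so it can solve for one missing drive. Lose two and the parity can't help, but the surviving drives still hold their own complete filesystems.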

Well, I wasn’t interested in losing two drives’ worth of stuff. The head-crash drive (3TB Seagate) was clearly a lost cause; at best I’d be able to use it for spare parts for fixing other drives of the same model in the future. The smashed drive, however, had hope. I had another of the same model (Samsung 2TB), and swapping the circuit board between them meant that the smashed drive was about 80% working. (This trick normally requires swapping the drive’s 8-pin BIOS chip, but Samsung drives are more forgiving.)

So, I grabbed whatever spare drives I could, and set about cloning the 80% of the 2TB drive that I could. I used ddrescue for this, which is great — it copies whatever data it can, with whatever retry settings you give it, and keeps a log of what it’s accomplished, so it can resume, or retry later, or retry from a clone (great for optical media). I used it to clone what could be read off of the Samsung drive onto a replacement, and then used its “fill” mode to write “BADSECTOR” over every part of the replacement drive that hadn’t been copied successfully. I then brought up the system in maintenance mode, with the replacement 2TB clone and blank 3TB replacement for the head-crash drive. I had to recreate the array settings (unRAID won’t let you replace two drives at once), but then let the system rebuild the 3TB drive from parity. (Mid-process, one of the other drives threw a few bad sectors. I used ddrescue to copy that disk to /dev/null, and kept the log of the bad sectors. I then used fill-mode to write “BADSECTOR” over the failed sections, forcing them to be reallocated.)
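The ddrescue log (it calls it a mapfile in newer versions) is plain text, which is what makes the later steps scriptable. Here's a rough Python sketch of pulling the unrecovered regions out of one. It assumes the layout as I understand it: `#` comment lines, one status line, then `pos size status` rows in hex, with `+` meaning the block was copied successfully. The sample text is invented for the example, not from my actual logs.

```python
def parse_mapfile(text):
    """Return (pos, size, status) tuples from ddrescue mapfile text."""
    rows = []
    status_line_seen = False
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if not status_line_seen:
            # The first non-comment line is the current position/status line.
            status_line_seen = True
            continue
        pos, size, status = line.split()[:3]
        rows.append((int(pos, 0), int(size, 0), status))
    return rows


def bad_regions(rows):
    """Treat anything not marked '+' (finished) as unrecovered."""
    return [(pos, size) for pos, size, status in rows if status != "+"]


# Invented sample: one 4 KiB bad region at the 1 MiB mark.
SAMPLE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00120000     +               1
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00001000  -
0x00101000  0x0001F000  +
"""

regions = bad_regions(parse_mapfile(SAMPLE))
print(regions)
```

Each entry is a byte offset and length on the source device, which is exactly what you need for working out what to overwrite or distrust on the other drives.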

Once the 3TB drive was rebuilt, I then used the ddrescue log files to write “BADSECTOR” on the just-rebuilt drive as well, because areas that were rebuilt off of failed sectors on other drives weren’t to be trusted. (This involved scripting some sector-math, since the partition offsets of the drives weren’t the same, and unRAID calculated parity across partitions, not drives.) After that, I fsck’d the 3 drives involved, and then grepped through all files on all of them looking for BADSECTOR, thereby identifying whichever files could no longer be trusted.
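The sector-math boils down to this: because parity lines up by position within the partition, a bad region on one drive maps to the same partition-relative offset on another, and you then add back the other drive's partition start. Here's a sketch in Python rather than the script I actually used. The 512-byte sector size and the start sectors 63 and 64 are assumptions for the example; read the real values off each drive's partition table.

```python
SECTOR = 512  # assumed logical sector size in bytes


def translate_region(pos, size, src_part_start, dst_part_start):
    """Map a byte range on the source device onto the destination device.

    pos and size are in bytes (as in a ddrescue log); the partition
    starts are in sectors (as a partition table reports them).
    """
    offset_in_partition = pos - src_part_start * SECTOR
    if offset_in_partition < 0:
        raise ValueError("region starts before the source partition")
    return dst_part_start * SECTOR + offset_in_partition, size


# Hypothetical case: source partition starts at sector 63, rebuilt drive's
# partition starts at sector 64. The bad region is 4 KiB into the partition.
src_pos = 63 * SECTOR + 4096
dst_pos, dst_size = translate_region(src_pos, 4096, 63, 64)
print(dst_pos, dst_size)
```

Feeding every bad region from the other drives' logs through something like this gives the list of ranges on the rebuilt drive that came from garbage input.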

This didn’t include files that were just outright missing; I didn’t have a complete list of files, but for the video files at least, I was able to determine what was missing by loading up the sqlite database used by Plex Media Server, which indexed all of those.
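The database check is simple enough to sketch in Python. One loud caveat: the table and column names here (`media_parts` and `file`) are my assumption about Plex's schema, so check the real database with `.schema` before trusting them. The example builds a tiny in-memory database instead of touching a real Plex library, and the paths are made up.

```python
import os
import sqlite3


def missing_files(conn, exists=os.path.exists):
    """List indexed file paths that no longer exist on disk.

    Assumes a table `media_parts` with a `file` column holding full paths.
    """
    rows = conn.execute(
        "SELECT file FROM media_parts WHERE file IS NOT NULL AND file != ''"
    )
    return sorted(path for (path,) in rows if not exists(path))


# Stand-in database with two indexed files (hypothetical paths).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media_parts (id INTEGER PRIMARY KEY, file TEXT)")
conn.executemany(
    "INSERT INTO media_parts (file) VALUES (?)",
    [("/mnt/user/video/a.mkv",), ("/mnt/user/video/b.mkv",)],
)

# Pretend only a.mkv survived; a set lookup stands in for the real disk check.
on_disk = {"/mnt/user/video/a.mkv"}
missing = missing_files(conn, exists=on_disk.__contains__)
print(missing)
```

Against a real library you would open a copy of the database file and leave `exists` at its default so it checks the actual filesystem.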

In the end, everything was working again, with the lost data reduced down to about 10% of what it would have been. It did get me thinking about changing out the server software, though; but that’s another post.