On recovering from disk failure

July 28, 2005

Sometimes, it pays to do things right. Reformatting a disk is one of those things.

I am on my third disk crash in about a year. The first one was a total loss; the drive would no longer spin up. All my knowledgeable friends recommended writing the data off if I did not want to spend hundreds or thousands of dollars on drive recovery services. (I tried both replacing the outboard PCB and freezing the disk. Neither worked, and the old PCB is actually still in use on another disk.)

Crash number two happened under peculiar circumstances (kids playing games in OS9 compatibility mode, and behaving badly), so we assumed it was just software. I had a day-old backup, so we just reformatted and carried on. However, I made a crucial mistake when I reformatted: I did not take the time to “write zeros” to the new drive. It turns out that this is a good thing to do, because it allows the formatter to detect bad sectors and replace (remap) them with spares.

Crash number three happened a month later. Now that the dust is clearing, and after much web-searching and data recovery, I have managed to figure out roughly what happened. A very small number of blocks on a 250GB drive were bad; I think about 1 block in 5,000,000, and they appeared to be clustered. That means that if the initial restore did not use any of those bad blocks, the machine would appear to work just great, and it would continue to work great as long as nobody wrote vital data to one of those blocks. One in 5 million is pretty good odds, too, so it’s no surprise that things looked good for a while.
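As a rough sketch of what that ratio means in absolute terms, here is the arithmetic in Python. The 512-byte block size and the decimal reading of "250GB" are assumptions for illustration, not measured values from the drive.

```python
# Back-of-the-envelope estimate of how many bad blocks "1 in 5,000,000" implies.
DRIVE_BYTES = 250 * 10**9      # assumption: 250 GB counted in decimal, as drive makers do
BLOCK_BYTES = 512              # assumption: 512-byte blocks
GOOD_BLOCKS_PER_BAD = 5_000_000  # the "about 1 block in 5,000,000" estimate

total_blocks = DRIVE_BYTES // BLOCK_BYTES
expected_bad = total_blocks / GOOD_BLOCKS_PER_BAD

print(f"{total_blocks:,} blocks in total, roughly {expected_bad:.0f} of them bad")
```

Under those assumptions the drive has a little under half a billion blocks, so the estimate works out to something on the order of a hundred bad ones: few enough to dodge for a month, but too many to dodge forever.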

Until, of course, we got unlucky. Our backup was several days old, and we had unloaded some birthday party pictures, so we really didn’t want to just scratch it. I tried DiskWarrior first, but after about 15 hours (actually, more like 25, but a power failure set me back) I decided to give Data Rescue a try. Data Rescue did not repair the disk, but it did obtain the valuable data for me. My understanding of the problem is that DiskWarrior would eventually have succeeded, but it gave no feedback about its progress. Data Rescue had a “thorough” option that I tried first; however, after hitting a few bad blocks it started estimating completion times in the 12,000-minute range (i.e., 200 hours), so I tried a quick scan instead, which took about 30 minutes and found all the missing files.

Right now, I am reformatting and writing zeros, in hopes that the result will be a fully functional disk. If it weren’t for the hassle, I would have played the AppleCare card and just gotten a replacement drive.
