I seem to attract hardware failures like rotten meat attracts flies. Maybe it is because I sometimes run slightly “bleeding edge” gear, or perhaps it is something environmental (*/em looks accusingly at seven cats shedding hair into computer intakes*). Whatever the cause, I take steps to make sure I have reasonably current backups of my systems. Unfortunately, that rarely seems to save me from frustration…
The failure begins, and plans are laid
A few weeks ago, my RAID 0 array (striping for performance, not reliability) of two 10k RPM 150 GB WD Raptor drives started reporting errors. Frustratingly, Western Digital’s drive diagnostics only work on drives that are *not* in an array. To bring technical folks up to speed: I’m running Vista Ultimate 64-bit on an Intel-based Asus Maximus motherboard, which means RAID is handled by Intel’s X38/ICH9R chipset. Intel’s RAID BIOS didn’t help much: it reported an error code for one of the two drives in the array, and there was nothing else available to give me a clue about the failure condition.
I pondered breaking the RAID array, running diagnostics on the individual drives, and getting to the root of the problem. Or I could just assume it was a physical drive problem and buy new hardware. Given the problems I’d had in the past trying to run a RAID desktop, I decided to cut my losses: buy a new drive, and give up on RAID, at least for now. I ran a thorough chkdsk (including a bad-sector scan) on the array, which found and corrected some problems. Then I made a Windows image backup and placed my order for the new drive.
Although this was my second Raptor failure in a year (my old 75 GB Raptors also failed), I decided to give Western Digital the benefit of the doubt and assume the drives themselves were basically sound. Bad drives happen, or maybe the RAID hardware was causing a problem that had nothing to do with the drives themselves. Since I didn’t want to invest in deep diagnostics, and since WD still makes the fastest drives out there, I wanted to give them one more chance. The new drive I ordered was a single 300 GB WD Velociraptor. Note: two 150 GB Raptors in a RAID 0 configuration provide 279.5 GB of formatted capacity, and a formatted 300 GB Velociraptor also provides 279.5 GB. My thinking was that this made the new drive a match for restoring my Vista image backup.
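The capacity match is easy to check with a little arithmetic. Drive makers count a gigabyte as 10^9 bytes, Windows divides by 2^30, and a RAID 0 array's usable size is the number of members times the size of the smallest one. The Python sketch below assumes nominal capacities of exactly 150 × 10^9 and 300 × 10^9 bytes; with those assumptions both configurations come out at about 279.4 GB, and the 279.5 GB Windows actually reported presumably reflects real sector counts slightly above the nominal figures.

```python
# Sketch: why a "300 GB" drive (or two 150 GB drives in RAID 0) shows up as
# roughly 279 GB in Windows. Assumes nominal decimal capacities of exactly
# 150e9 and 300e9 bytes; real drives expose a slightly different sector count.

GB = 10**9    # drive makers' gigabyte (decimal)
GIB = 2**30   # the binary unit Windows labels "GB"


def raid0_capacity(member_bytes: list[int]) -> int:
    """RAID 0 stripes across every member, so usable space is
    (number of members) x (size of the smallest member)."""
    return len(member_bytes) * min(member_bytes)


def as_windows_gb(n_bytes: int) -> float:
    """Convert a raw byte count to the binary 'GB' figure Windows displays."""
    return n_bytes / GIB


raptor_array = raid0_capacity([150 * GB, 150 * GB])
velociraptor = 300 * GB

print(f"RAID 0 array of two Raptors: {as_windows_gb(raptor_array):.1f} GB")
print(f"Single Velociraptor:         {as_windows_gb(velociraptor):.1f} GB")
```

Note that rounding to one decimal place hides any difference smaller than roughly 50 MB, so "the same size" on screen does not guarantee the same size in bytes.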
My order was placed just after Christmas and took a couple of weeks to arrive. I also ordered two other parts:
- an updated chassis: an Antec P182 to replace my nearly identical, two-year-old P180 case. The P182 has a new feature that lets you route cables underneath the motherboard, and I have been wanting to clear up the airflow in my case for a long time. Yeah, it’s silly to spend $150 just for that, but the cable mess bugs me.
- a temperature display/fan controller: I wanted more detailed, real-time data on temperatures at various locations in the chassis. Because of the cable re-routing, I also wanted more confidence about how airflow was working in my case. My system is moderately overclocked, and I’ve never been 100% sure I’m always running at comfortable temperatures. This display would answer that question.
While I was waiting for delivery, the second drive in my array started reporting the same error. Intel’s BIOS still said the array status was normal, but both drives in the array were reporting errors. More worrisome, my machine started crashing, and more bad sectors and file corruption appeared during normal operation. My replacement hard drive couldn’t have arrived at a better time.
The wheels begin to fall off
I was prepared to do some messing around to move to my new drive, but hey, I had a perfectly good image backup, so I was golden, right? My objective seemed simple enough: completely duplicate an existing bootable drive configuration onto a new drive with identical capacity. To speed things up, I decided to use Paragon’s Partition Manager 9.0 to copy the contents of the original drive to the new one. I had bought and used Partition Manager previously, when the 75 GB RAID array in my wife’s computer failed, and it had worked perfectly. If it gave me problems, I still had that image backup. How could things go wrong?
- Problem #1: Partition Manager 9.0 doesn’t work properly with 64-bit Vista. Paragon’s site says it does work with 64-bit Vista, but when I tried, I got a “This action can not be performed with 64bit Vista: run using recovery disk” error message. Of course, I had to make sure by downloading the latest update: when I tried to install it on the original drive array, the system crashed due to file system corruption, which isn’t too surprising. The updated version also failed.
- Problem #2: Partition Manager 9 recovery disk IOT error (maybe?). I then created the bootable Partition Manager 9 recovery disk and tried to perform the partition copy with that. When I asked it to copy the partition, it warned me that the destination seemed to contain a bootable partition. This made no sense to me (the brand-new drive had never been formatted and clearly showed up as unformatted and unpartitioned, even in Partition Manager 9’s menus), so I double-checked and clicked Proceed. The resulting partition display appeared a fraction of a second later (i.e., with no copy progress at all) and showed that my original drive had suddenly gained an empty partition. Argh: despite double-checking, had I somehow overwritten my original drive with the contents of the blank drive? I couldn’t see how, but I must have made a human error. A classic IOT (idiot on terminal) situation. Ah well, I have that backup, so I’ll use it to restore to my new, identically sized drive.
- Problem #3: Backup drive enclosure fails. After removing the original drive array and reconfiguring the system BIOS to work with just the new drive, I boot the Windows Vista install CD. Rather than installing, I choose the recovery options and then Restore from image backup. There is a strange grinding sound from my external backup drive, and my heart goes “thump” when the list of available backups comes up empty. I open the recovery command line and confirm that the formerly perfectly functional external SATA backup drive is not showing up. I jiggle some cables, power everything off and on, and check the BIOS: the backup drive isn’t showing up there either. Brief panic, then a sense of fatalistic calm. Well, I could be lucky: maybe it’s just the drive enclosure, and the drive itself is fine? Fifteen minutes later, after prying open the “easy to open” Cooler Master drive enclosure (sometimes “tool-less” is not as good as “requires tools”), I’m able to confirm that the drive itself is fine.
- Problem #4: Vista Image Backup is junk. Finally, I am able to boot the Vista install disk again, select the restore system image option, and find my backup. Great! The restore process clunks away for a few seconds, then tells me it can’t find a target partition large enough to restore to. Huh? I’m restoring a 279.5 GB image, and I have a 279.5 GB drive. What’s the problem?
- Much digging and experimenting confirmed the awful truth: Vista’s Image Backup is a piece of crap, and I think that is a charitable description. If you make an image backup and restore it to the *same* drive, it will work. But usually you are using an image backup because, oh, I don’t know, maybe the original drive failed? If you get exactly the same model of drive, it might work. Maybe. If you get a drive with the same nominal size but a slightly different model, it almost certainly will *not* work. Even after a great deal of fooling around, I couldn’t get it to work. Absolute truth time: I eventually determined that the old drive array and my new drive were, in fact, different in size, by about 500 bytes. So maybe, if the new drive had been *exactly* the same size, Image Backup would have worked. I can’t be bothered to even entertain that possibility; personally, I stand by my statement that Image Backup in Vista is a total waste of time and effort.
- Now, you might be saying, “But what about file backup? That works, right?” Yes, indeed it does. I was able to recover all of my files, but absolutely none of my configuration. Zero installed applications. No drivers, no settings, no users: just my files, which, I will agree, is better than nothing. But what I, as a selfish user, want is the ability to completely and rapidly restore my system to a fully functional, configured state. Getting my files back is good, but it still leaves me with a dozen or more hours of reinstalling applications and recreating my configuration.
- Problem #5: Unknown boot error on partition copy. At this point I rebooted the system and was contemplating my final options. I was stunned when the system booted off the old hard drive array, right into Vista. Apparently, Problem #2 hadn’t actually overwritten the original drive. I am not sure exactly what happened there, but it seemed I still had one more chance to make this all work without having to struggle through reinstalling my apps. I carefully booted the Partition Manager recovery CD and was overjoyed when, this time, it copied my original drive partition to the new drive without any hassle. This naturally took a couple of hours, but I thought it was time well spent. The copy completed, I reconfigured the BIOS, and I rebooted from the new drive. The familiar “Starting Vista” screen appeared and the progress bar started to move; then came a brief flash of blue with some sort of error message, and the machine rebooted. For the love of… now what?
- Investigation revealed that the copied drive contents appeared complete, i.e., the directory structure was there, including the directories I expected to see. The boot process was failing at something named crcdisk.sys. What I read about it pointed to possible disk corruption or driver conflicts. I spent several more hours with the Vista install disk recovery tools, running chkdsk with bad-sector detection and rewriting the master boot record, all to no avail. My speculation: the Windows image I was trying to boot had been running on a RAID array. A fully supported Intel RAID array, mind you, one that doesn’t require an optional driver disk, but a RAID array nonetheless. I suspect this meant the OS expected to boot from a RAID array and loaded inappropriate drivers.
- If true, this smacks of poor design assumptions in Vista. Why does the OS assume it is booting from the same physical device it was originally installed on? This makes any kind of full image restore dependent on restoring to the same type of hard drive configuration, once again making me shake my head in dismay.
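Why would a restore refuse a drive that shows the same 279.5 GB as the original? The Python sketch below models only the constraint implied by the “can’t find a target partition large enough” message: a whole-disk image needs a target at least as large as the disk it came from. It is a hypothetical model, not Vista’s actual restore logic; `can_restore` is an illustrative name, the 300,000,000,000-byte figure is made up for the example, and the new drive being the *smaller* one is inferred from the error message.

```python
# Hypothetical model of the size check behind "can't find a target partition
# large enough". Not Vista's real logic; the sizes below are illustrative.

GIB = 2**30  # Windows reports "GB" in binary units


def can_restore(image_disk_bytes: int, target_disk_bytes: int) -> bool:
    """A whole-disk image can only be laid down on a disk at least as large."""
    return target_disk_bytes >= image_disk_bytes


old_array = 300_000_000_000   # illustrative size of the imaged RAID 0 array
new_drive = old_array - 500   # assumed ~500 bytes smaller, as the error implies

# Both sizes print identically once rounded to one decimal place...
print(f"old array: {old_array / GIB:.1f} GB, new drive: {new_drive / GIB:.1f} GB")
# ...but a strict byte-level comparison still rejects the new drive.
print("restore allowed:", can_restore(old_array, new_drive))
```

A restore tool that shrank the last partition by a few sectors could get around this, but a strict “target must be at least as large” check like this one has no such slack.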
My last resort for getting my system back to an operational state was a full Vista install followed by restoring my individual files. Yes, that worked. No, I wasn’t happy knowing I still had to reinstall all of my applications and configuration settings. My one happy moment: realizing that I have largely stopped using my Vista machine as my main system and have moved most of my “productivity” apps to my Macintosh. That means far less to reinstall…