I find the failure of a computer’s power supply to be one of the most difficult failures to diagnose. And earlier this week I had that perception reinforced.
The second computer in my office, the "guest" gaming PC, has become my wife Irene's primary EverQuest II machine. It is built from parts retired during my most recent upgrades: basically, it is a two-year-old "bleeding edge" computer, which means it is still pretty decent for playing EverQuest.
A few weeks ago, Irene experienced a couple of odd events while using this machine. On one occasion when I was watching, the screen suddenly became “corrupted”, with garbage characters appearing amongst the graphics. A screen refresh cleaned things up. Then there was the “crackle” the machine emitted one day just before rebooting. I decided this week that it was time to give that computer some TLC.
My initial thought was that the machine was probably a dust-clogged nightmare. I could see globs of dusty fur sticking out of the drive bay, so I imagined discovering a huge mat of fur and dust inside. When I cracked the case, however, it actually wasn't too bad: yes, it definitely needed cleaning, but the air filter on the case had kept most of the dust and hair buildup from entering the chassis. The innards needed a few blasts of compressed air, and the corners of the case benefited from some vacuuming.
The whole cleaning process took an hour or two. When I reassembled everything and turned the box on, I got…nothing. No POST screen, no Windows startup, no beeps. The fans started and the machine *seemed* to power up, but it never actually got into the boot process. My initial assumption was that dust was causing a short, or that I had damaged a component during the cleanup. But that wasn't it….
I started the usual debugging process: pulling components one by one and seeing whether the system recovered. After an hour or so I had the machine down to the bare motherboard with nothing but the CPU plugged in, and I was still seeing the same behavior. Then I noticed that the power button didn't seem to be working correctly. As on any typical ATX machine, the power switch on the front is really a software switch: it doesn't actually disconnect power. But the expected behavior is that pressing and holding this button while the machine is on will cause it to "power off". Instead, the machine would power off and then immediately power back on. Weird.
I also noticed that the initial power-on after a "hard" power off (i.e., switching off the power supply itself rather than using the soft power switch) produced another strange behavior. Instead of coming on steadily, the power indicator lights on the front of the machine would flash for a minute or so. When I listened closely to the fans, it sounded as if the power was cycling off and on roughly every half second during that minute, before finally "catching" and staying on. It was at this point, having eliminated everything except the motherboard and the CPU, that I started to think "power supply".
But I had to be sure. So, with no "spare" power supply, what do I do? I remembered that the power supply in my current primary machine was pretty high-end, with a wonderful collection of nice, long power connectors. I pulled the covers off both machines, put them close together, and ran the minimum set of cables from my main machine to the failing system. Motherboard power, CPU power, video card power….that's enough for now. Connect up video, keyboard and mouse, turn on the power supply…and voila: the "dead" machine springs back to life without a hiccup. Just to be sure, I finished up by hooking power to the hard drives: it completed a full boot into Windows without a single error.
As I was sitting there, two machines pushed so close together that they were touching, cables running between them…two images came to mind at once. First, if you've ever seen someone use one car to give another a boost…it looked just like that. The second was open heart surgery, with the chest splayed open and the heart being "shocked" back into life once the surgery is complete.
I ordered a nice, new 620-watt Corsair modular power supply for the secondary machine. A little over a day later, the secondary computer…Irene's link to EverQuest II…is as good as new. Better, maybe, since the new power supply is even more stable and (supposedly) more reliable than the high-end one in my primary machine.
Ah, that's the price you pay for mucking around inside your machines all the time. In my case, most hardware failures are power supply failures, so that's what I look for first 😉
Yeah, but even my oldest machines, some of which are five or more years old, don't have failing power supplies. I also support dozens of aging machines at work, and power supply failures remain uncommon but vexing problems.
You are right, though, that my experience frames my problem solving approach. I don’t look at power supplies early in the process since I haven’t encountered a high percentage of situations where power supply failure was the root cause of a problem.
The worst part is that power supply failure can look like just about any other kind of problem. And the power supply is arguably the hardest part to isolate: you can disconnect hard drives, memory, disk drives, video cards, network interfaces, and the computer should still work to some limited degree. Disconnecting the power supply without replacing it doesn't tell you much at all 🙂
The “right” way to diagnose power supply failure is to use a multimeter, put the supply under load, and measure the output. But I think the last multimeter I owned was a $10 thing I bought when I was about 20 years old….it has been AWOL for a decade at least, and I’ve never really missed it.
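For anyone who does own a working multimeter, here is a minimal sketch of what "measure the output" means in practice: compare each loaded rail reading against its nominal voltage. It assumes the commonly cited ATX12V tolerances (±5% on +12V, +5V, +3.3V and +5VSB, ±10% on -12V); check the spec revision for your own supply. The `check_rails` function and the sample readings are hypothetical illustrations, not measurements from the machine described above.

```python
# Nominal voltage and allowed fractional tolerance per ATX rail.
# Assumption: tolerances as commonly cited from the ATX12V design guide.
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rails(readings):
    """Map each measured rail to True if within tolerance, else False."""
    results = {}
    for rail, measured in readings.items():
        nominal, tol = ATX_RAILS[rail]
        # abs() keeps the window correct for the negative -12V rail.
        low = nominal - abs(nominal) * tol
        high = nominal + abs(nominal) * tol
        results[rail] = low <= measured <= high
    return results


if __name__ == "__main__":
    # Hypothetical multimeter readings taken with the supply under load.
    sample = {"+12V": 11.31, "+5V": 5.02, "+3.3V": 3.31}
    for rail, ok in check_rails(sample).items():
        print(f"{rail}: {'OK' if ok else 'OUT OF SPEC'}")
```

The load matters: a failing supply can read fine with nothing attached and sag only when the motherboard and drives start drawing current.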