The System Is Down
Posted on 2007-03-20 (Updated on 2019-01-22)
One of the perks of my job is the flexible schedule that goes along with it - I come in around ten, and nobody minds too much if I'm a couple (20) minutes late as long as I make up the time somewhere else.
One of the crappy parts of my job is the flexible schedule that goes along with it - I leave at six, unless I'm not done.
Tonight was one of those nights. We were upgrading some of the servers, and since I work after hours, I was the prime candidate for doing it. Just a simple RAM upgrade, right? No problem. Except it's a server upgrade, and the servers in question just happen to house the financial system that manages the whole county's purchasing and payroll. Goody goody.
So shortly after 5:30 PM, I start hunting down the stragglers that are still logged in (turning off the financial system with people working on it is a great way to get your tea poisoned), and Payroll informs me that they can be done by 6:00 PM. So, I'm starting after my shift ends. But six rolls around, they log off, and I get started.
For those of you who have never seen a server room, it looks like this: There are dozens of servers sandwiched together in what we affectionately call a "rack". If they're put in right, it's a great way to conserve space. If they're not, it's a great way to make them hard to work on.
The first server was not easy to get out. Instead of being mounted on rails that slide out, it was simply screwed to the side of the rack. So, instead of sliding it out and doing the upgrade, I had to wrench the thing out of place and plop it on a desk somewhere. The case required two people to open, so I had to grab a janitor - the night crew are great people. Anyways, once that was done, I wrestled it back into place and moved on to server #2.
The first thing I noticed was that this server was installed right, and slid right out. The rails it was on kept it suspended in mid-air right in front of me, which was nice.
The second thing I noticed was that it was completely exposed already. There was no case to remove; I was already staring at the internals of the server. Apparently the last time anyone worked on this server, they neglected to put the case back on. That was three years ago.
If one of the battery backups directly above this server had leaked, the County could have lost millions of dollars in records (all the way back to 2004), as well as millions of dollars' worth of work. I quickly did my maintenance, dug the rest of the case out of a corner, and reassembled the thing.
I ended up getting out an hour late, but everything went pretty flawlessly with the upgrades. All in all, things could have gone much worse, and this has definitely shown me how valuable a good initial setup is. Without it, everything from that point on becomes an exercise in torture.