Rich Lafferty's LiveJournal (mendel) wrote,

This is the most annoying hardware problem ever.

I have an Ultra 10 at work which handles mail for a small group of users who haven't moved onto Notes for whatever reason. Lately it's been hanging over the weekend: the console reports that /var is full and / is out of inodes, and a hard reboot brings it back up without a full /var or an inode-full /. Last weekend I managed to have a console actually connected when it failed, and saw some additional IDE errors. OK, for some reason there's a scheduled reboot on Saturdays; I guess it doesn't like that, because a hard power cycle fixes things. Comment out the crontab entry and away we go.

And then it failed again this weekend, and this time I thought ahead a bit and tried a probe-ide at the console. No hard drive. It just forgets that anything's connected to the first IDE interface. Alright, that explains why it fails the way it does. Came back after a power cycle again, but now it was bugging me, so I started digging through syslog to get some idea of the timing.

So now I know that I have a machine which hangs solid at 4:27 PM on Saturday. Every week. The scheduled reboot was set for later than that, so it wasn't happening at all. There's nothing in anyone's crontab at 4:27. It shares a rack with a handful of other boxes, but nothing that requires weekend intervention, like a tape drive. The other U10s in that rack are unmolested. It's very underloaded, and there's no significant mail traffic around that time.

What the hell?