Posted: 4/22/2002 7:51:30 AM EDT
Alright, I am at a dead end. I have a ProLiant 7000 (yeah, I know it's old) that restarts itself at least once a day. It is running NT 4.0 SP4, and the event log shows nothing except that it detected an ASR. Insight Manager shows the ASRs as well, and occasionally shows "system overheating - temperature unknown". Insight also shows the temperature of the CPU at 45 degrees C, and the shutdown threshold is 47. However, Compaq is telling me that if the problem were system overheating, the server would show many thermal errors and would not automatically restart itself. They say it is a software problem, but there have been no changes to the system in a long time. I have changed the memory and two of the four CPUs to no avail. Any ideas? Compaq wants me to upgrade the service pack, but this is a validated system so I can't do that without a MAJOR hassle. Thanks, Chimborazo
Link Posted: 4/22/2002 7:55:57 AM EDT
Disable ASR and capture the NT bugcheck. Make sure the clear plexi that covers the cpu's is still there. If this dinosaur is near the top of your rack, move it to the bottom. You did not specify if it is a PPro artifact or a XEON fossil?
Link Posted: 4/22/2002 7:58:56 AM EDT
1.) Upgrade to the latest service pack. Once you have done that, support can no longer tell you that it is a software problem. 2.) Make sure all of your ROMpaqs are up to date as well. 3.) Think about trying to get the temperature down in the room where the server is. Check the fans for excess dust, etc. You need proper airflow in that beast to keep it cool. Yeah, I know, all are easier said than done, especially with a production server, and particularly when it comes to applying service packs. But SP4 is a bit outdated. Even the contract I work on is up to SP5... =) IM me if you need more info. the_reject
Link Posted: 4/22/2002 8:25:10 AM EDT
Originally Posted By ar50troll: Disable ASR and capture the NT bugcheck. Make sure the clear plexi that covers the cpu's is still there. If this dinosaur is near the top of your rack, move it to the bottom. You did not specify if it is a PPro artifact or a XEON fossil?
I disabled ASR but haven't seen it lock yet, and the clear plexi is there. It is at the bottom of the rack, and is the only server in there. It is a PPro. Unfortunately, updating the SP and the ROMPaqs is not an option because of the amount of paperwork and approvals involved to get this authorized (we are constrained by gov't regulations as far as changes go). Also, I cannot yet justify it because the system has been fine forever and there have been no software changes. This points everyone to hardware. Also, I have the room at 68 degrees F. I would LOVE to update that stuff, but by the time the changes get approved it will be the weekend. Thanks a lot for your responses!!
Link Posted: 4/22/2002 8:34:31 AM EDT
If everyone you speak with is hell-bent on a hardware problem, I'd attack the system board first. That's really about the only part that could be causing the problem (excusing actual thermal overheats, software, spikey-haired mutants, etc). the_reject
Link Posted: 4/22/2002 9:22:04 AM EDT
Thanks...I'll see if I can get a system board in here ASAP. Do you think it could be due to a faulty temp sensor too?
Link Posted: 4/22/2002 11:59:06 AM EDT
Just shoot the damn thing with your AR. Say that it attacked you, and you had to shoot it in self-defense. Then go out and buy some good hardware, something other than a Cumquat.
Link Posted: 4/22/2002 12:14:39 PM EDT
How about those processor voltage regulator chips/cards? Not sure if the 7000s have those. I did have an issue with memory on a 2500, where one of the mem chips went bad on me. If it is a hardware error, you've got to run some diagnostics.
Link Posted: 4/22/2002 12:21:28 PM EDT
As a fellow IS professional I recommend that you PRECIPITATE some kind of event that forces your employer to buy a new freakin' server. Be sure to back up the data before you accidentally hit the box with a 12-gauge slug.
Link Posted: 4/22/2002 12:24:19 PM EDT
I vote for the main board as well. For a shotgun approach that is your best bet for lock-up/crashing.
Link Posted: 4/22/2002 12:26:49 PM EDT
Shoot it first, then buy a new computer.
Link Posted: 4/22/2002 12:35:55 PM EDT
Originally Posted By Chimborazo: Thanks...I'll see if I can get a system board in here ASAP. If you think it could be due to a faulty temp sensor too?
Well, it's really the only hardware in the box that could be generating a thermal shutdown/overheat condition. IIRC (haven't visited the tech page since I responded this morning), the critical temp is 45 degrees Celsius. And I'd almost bet that the thermal condition is what is causing your ASRs. Granted, I do agree with keeping the SoftPaqs, service packs, and ROMpaqs all up-to-date, but this is real hard to do with validated production systems. A new system board will certainly isolate this. If you are certain that the temps are low enough in the room (which they should be, else it would have been doing this for the life of the machine), and you are certain that the airflow is unobstructed, this is the piece of hardware that I'd lean towards. And to the folks that recommend taking a slug or two at it, these older Pentium Pro boxes are still used widely. These monster servers still perform great at whatever is tossed at 'em. No reason to toss them closer to the edge of obsolescence. the_reject
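For anyone following along, here is a rough sketch of the thermal arithmetic using only the numbers quoted in this thread: the 45 C CPU reading and 47 C shutdown threshold from the first post, the 45 C critical temp recalled above (an IIRC figure, not confirmed), and the 68 F room. The function and variable names are made up for illustration and have nothing to do with Insight Manager itself.

```python
def f_to_c(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def headroom_c(reading_c: float, threshold_c: float) -> float:
    """Degrees Celsius left before the reading hits the threshold."""
    return threshold_c - reading_c


# Figures quoted in the thread (not measured here).
cpu_reading_c = 45.0           # CPU temperature reported in the first post
threshold_first_post_c = 47.0  # shutdown threshold reported in the first post
threshold_recalled_c = 45.0    # critical temp recalled from memory (unconfirmed)
room_c = f_to_c(68.0)          # room temperature, converted from 68 F

# Margin under each of the two threshold figures.
margin_first_post = headroom_c(cpu_reading_c, threshold_first_post_c)
margin_recalled = headroom_c(cpu_reading_c, threshold_recalled_c)

# How far the CPU reading sits above room temperature.
rise_over_room = cpu_reading_c - room_c

print(f"Room temperature: {room_c:.1f} C")
print(f"Headroom vs 47 C threshold: {margin_first_post:.1f} C")
print(f"Headroom vs 45 C threshold: {margin_recalled:.1f} C")
print(f"CPU rise over room: {rise_over_room:.1f} C")
```

Either way the margin is slim: about 2 C if the 47 C threshold is right, and none at all if the 45 C figure is. A sensor on the system board that reads even slightly high could plausibly cross the line in a 68 F room, which would fit the faulty-sensor theory.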