Posted: 9/2/2004 9:50:54 AM EDT
|
Looking to tap the mighty ARFcom well of knowledge. Since our management has decided that we don't need to have operators in our Network Operations Center anymore (against the advice of the people who actually do the work!!!), our Network Admin group has been tasked with part of the jobs the NOC used to handle. One of these processes is the rebooting of servers, both Windows 2K and Unix, that have hung up. Realizing that the Unix boxes are more easily remotely rebooted than Windoze machines, are there any products that you guys can recommend that will let us remotely reboot the servers? If we need separate products for the OS's, that's fine. Thanks! Michael |
| I was just about to mention the ilo/rilo cards if you have hp servers. Pretty sweet little cards. Allow you to login and start servers up, reboot servers, replay the original shutdown sequence. You can even use a PDA to reboot the servers with the ilo/rilo cards. |
|
|
We have RIB boards in several of our HP servers, and that is an option, we were just wondering if there is something better out there. We're talking about 100 W2K servers, and about 85 Unix boxes. And I agree, something is going to bite us in the ASS, and then they'll decide having operators is cheaper than having the owner of the company bitch about not getting his morning reports when he's used to. Michael |
Indeed. I have a pile of batch files that run in Task Scheduler on a management XP station that just run down the list once a month and reboot everything preventively on a Sunday. (16 W2K servers, never had a problem.) One could do the same thing with some creative SSH scripting for the *nix boxes. |
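For anyone who wants to try the run-down-the-list approach, here is a minimal sketch in Python rather than batch files. It is an illustration, not the poster's actual scripts: the host names are made up, the Unix side assumes key-based ssh login as root is already set up, and `shutdown -r now` is the Linux/BSD form (Solaris and others take different arguments). It defaults to a dry run that only prints the commands.

```python
import subprocess


def reboot_command(host, os_type):
    """Build the remote reboot command for one host."""
    if os_type == "windows":
        # shutdown.exe: /r reboot, /f force apps closed,
        # /m target machine, /t delay in seconds
        return ["shutdown", "/r", "/f", "/m", "\\\\" + host, "/t", "0"]
    if os_type == "unix":
        # Assumption: key-based ssh login as root already works
        return ["ssh", "root@" + host, "shutdown", "-r", "now"]
    raise ValueError("unknown os_type: " + os_type)


def reboot_all(hosts, dry_run=True):
    """Walk the list; return the commands that were (or would be) run."""
    issued = []
    for host, os_type in hosts:
        cmd = reboot_command(host, os_type)
        issued.append(cmd)
        if not dry_run:
            try:
                subprocess.run(cmd, check=False, timeout=60)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # Keep going down the list even if one box fails
                print("failed on", host, exc)
    return issued


if __name__ == "__main__":
    HOSTS = [("w2k-app01", "windows"), ("sun-db01", "unix")]  # made-up names
    for cmd in reboot_all(HOSTS, dry_run=True):
        print(" ".join(cmd))
```

Schedule it from Task Scheduler or cron with dry_run switched off once the printed commands look right. Like any in-band method, it does nothing for a box that is hung hard.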
|
I guess I'd better modify my answer, seeing as shutdown /r /f may not work depending on how locked up the server is. We're a Dell shop and they all have DRAC cards, which allow you to power cycle the box remotely all the way down and all the way up. However, I will say that in my experience monthly preventive reboots can go a long way in helping avoid emergency reboots. |
|
The datacenters I manage are 80% lights out. If you're talking about hung processes, but you can still gain access to the OS, you should use remote desktop for MS, and a terminal server (like a Cisco 2600) for the Unix systems. The Cisco product will allow you to connect to it via ssh, then connect to the Unix machine via the console port. If you're talking about a hung system, then gaining access to either will not give you any advantage. I suggest one of the power management devices; there are plenty out there. Basically an IP-enabled power switch which allows you to execute a turn on, off, or reboot (turn off, wait X seconds, then turn back on). Feel free to email me if you need any more specific details. |
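The "turn off, wait X seconds, turn back on" sequence can be sketched like this. Every name here is hypothetical: FakeSwitch just records calls so the logic can be tried without hardware, and a real unit would need set_outlet() wired to whatever interface that vendor actually provides (telnet, web, SNMP).

```python
import time


class FakeSwitch:
    """Stand-in for an IP power switch; records calls instead of switching power."""

    def __init__(self):
        self.log = []

    def set_outlet(self, outlet, on):
        self.log.append((outlet, "on" if on else "off"))


def power_cycle(switch, outlet, wait_seconds=10, sleep=time.sleep):
    """Cut power to one outlet, wait, then restore it."""
    switch.set_outlet(outlet, False)
    # Give the power supply time to drain before restoring power
    sleep(wait_seconds)
    switch.set_outlet(outlet, True)


if __name__ == "__main__":
    sw = FakeSwitch()
    power_cycle(sw, outlet=3, wait_seconds=0)
    print(sw.log)
```

The server only comes back on its own if it is set to power on when AC returns, so check that BIOS setting before relying on this.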
Just went with this solution for our web boxes and edge equipment. Hope it works. We are having LOTS of power issues right now (Imagine that with 2 hurricanes down and one on the way). |
|
+1 Joe_Blacke If it's a Compaq/HP shop, get RILO cards. We have 6 remote datacenters with 70+ DL380's per site . . . I've never set foot in any of them. The only thing you need physical access for is network issues, hardware failures and upgrades. They work regardless of the OS and are accessible through a web interface. The problem with software methods is that if the server is locked, so is your remote management software. RILO's are 100% hardware and powered separately from the main box. I'm sure that RILO-like devices are available for most other platforms also. IM for more info if you're interested. _Disconnector_ |
Yup, DRAC is the Dell Version I mentioned above. |
What could be better than a RIB card? |
|
You can slap an sshd server onto a windows box and then just ssh in and reboot from the command line. The cygwin version works. (www.cygwin.com). Free, too. Of course, it's no help if the server is so wedged that the sshd server is down. But that's true of anything. |
|
I agree - ILO for HP and DRAC for Dell. This is a minimum requirement before my department will even support a box. We manage thousands of servers... and less than 100 are local. We use Terminal Services/VNC/Tivoli/etc... for day to day remote control, and DRAC/ILO for hung server issues. IP based KVM is also an appealing emerging technology... which is promising remote hung server reboot capability as well. Easier and cheaper to retrofit than adding DRAC/ILO to every server if you don't already have that. |
Anything but the RIB card. There can be no OS on the box and you can still connect. There can be no hard drive even. There is a reason that HP/Proliant servers are the best Wintel box made. And don't chime in with some Dell is better shit. Dell is just a Proliant clone. Dell has been fighting the suits from Compaq for years. |
|
Make sure your computers are either using AT power supplies or support motherboards with the option to turn the system on when power is restored. APC makes a product called the MasterSwitch which has a telnet and web interface as well as receptacles on it. Through the interface you can tell the MasterSwitch which receptacles to turn on or off. The switch even has serial ports that can be hooked to the server that, when the server is running APC's software, will give the computer notice to safely shut down in advance if it is not hung up. Like I said, if the system board does not support system on when power is restored then you don't have much hope. Perhaps if the board supports wake-on-lan you could do something to help out there. -Foxxz |
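On the wake-on-lan idea: the "magic packet" is just 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. A minimal sketch follows; the MAC address is made up, and port 9 and the 255.255.255.255 broadcast address are common choices rather than requirements. WOL has to be enabled in the BIOS/NIC, and it only wakes a box that is powered down, so it helps after a power cut, not with a hung OS.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like 00:11:22:33:44:55."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must have 12 hex digits: " + mac)
    mac_bytes = bytes.fromhex(hex_digits)
    # 6 bytes of 0xFF, then the MAC repeated 16 times (102 bytes total)
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    send_wol("00:11:22:33:44:55")  # made-up MAC address
```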
|
Guys, he's talking about if the OS locks up. If the OS is locked, you obviously can't get in using Terminal Services or VNC. You need to be able to go in out of band. The newest generation of Dell PowerEdge servers will have a BMC (baseboard management controller) that operates out of band, so if your OS locks up, you can still remotely access the BMC and bounce the box from there. This will be implemented in the x8xx series of PowerEdge servers. Pretty neat stuff. |
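If the BMC speaks IPMI over the network, bouncing the box out of band might look like the sketch below, which wraps the open-source ipmitool CLI. Treat the details as assumptions: the host and user names are made up, the password is read from the IPMI_PASSWORD environment variable (ipmitool's -E option), and whether a given BMC wants the "lan" (IPMI 1.5) or "lanplus" (IPMI 2.0) interface depends on the hardware.

```python
import subprocess

VALID_ACTIONS = ("status", "on", "off", "cycle", "reset")


def ipmi_power_command(bmc_host, user, action, interface="lanplus"):
    """Build an ipmitool chassis power command aimed at a server's BMC."""
    if action not in VALID_ACTIONS:
        raise ValueError("unsupported action: " + action)
    # -E makes ipmitool read the password from IPMI_PASSWORD
    return ["ipmitool", "-I", interface, "-H", bmc_host, "-U", user,
            "-E", "chassis", "power", action]


def bounce(bmc_host, user, dry_run=True):
    """Power cycle through the BMC; with dry_run, only return the command."""
    cmd = ipmi_power_command(bmc_host, user, "cycle")
    if not dry_run:
        subprocess.run(cmd, check=True, timeout=30)
    return cmd


if __name__ == "__main__":
    # Made-up BMC address and account
    print(" ".join(bounce("bmc-pe2850-01", "admin", dry_run=True)))
```

Note the command targets the BMC's own address, not the server's, which is what makes it usable when the OS is locked.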
|
I know a lot of places that use remote managed power strips. No matter what kind of OS you use, sooner or later that sucker is going to hang for one reason or another. With the power strip you can remotely power cycle individual power outlets. Crap Foxxz beat me to it |
That BMC is pretty worthless IMHO..... in the Dell 8G line, it will be disabled anytime there is a DRAC installed, and pretty much anything in the 8G line will be coming with DRAC4 on the motherboard. The 2800, 2850, 1850.... etc... all come with DRAC4. You wanna talk COOL? Wait till you see the DRAC4. They fixed everything that sucked about DRAC3 and then some! It's awesome! Dell is touting the integrated BMC just because that's going to be an industry standard... blah blah blah. But it's nothing in comparison to DRAC! |
|
I worked in a MasterSwitch shop. The only problem is that you cannot get a true console connection to see boot-time errors. For term services to work (and VNC, Tivoli, etc.) the server must be up . . . if you have a boot-time issue you are SOL unless you have a DRAC/RILO. MasterSwitch is basically a remote power switch. RILO's are that and much more. Our RILO's are powered off of a separate circuit and UPS system. Even if there is an issue with one of the redundant power supplies, we can connect to the server itself. KVM over IP is super cool, but extremely bandwidth intensive. Be prepared for mega latency unless you are on a gigalan. I will never again work in a shop that isn't RILO equipped. _Disconnector_ |
Not any more! The newer stuff uses the same technology in RILO cards... and has similar latency. Works fine over VPN/DSL connections even.... and "bearable" over dialup (if anything on dialup is actually "bearable") |
As others have mentioned, IP based KVM. We use both RILO and IP KVM (Avocent switches with DSView), and I prefer the KVM's. Especially with Windows servers. I haven't had any problems with latency at WAN speeds. |
Yes, but they are not mutually exclusive. Management cards give you remote floppy, remote CDROM, alert management, lights-out power up, power down, resets, in addition to remote out of band KVM capabilities. I see IP based KVM as an excellent solution for daily use remote control, but it does not remove the need for an out of band mgmt card... it just enhances your capabilities. The comparison is kind of apples to oranges, if you pit them against each other. |
Yes, you have a separate port. Where I'm at we use teaming NICs as well. So that's a minimum of 3 ports per server. My client has over 800 servers, so you do the math. And that's just front end. PeopleSoft is on Hitachi servers and many *NiX boxes too. I also freelance at Enron and Lyondell, both of them do the same thing and have over 500 servers. If you have to worry about port cost, you've got bigger problems. |
That's a great issue! I have TRIED to get Dell to put two interfaces on their DRAC mgmt ports.... like a two port mini-hub, so we could daisy chain the DRAC ports with only one primary switch port required per rack. I think that's a genius idea for the little used mgmt ports but Dell hasn't embraced it yet. They probably will wait and copy HP once HP starts doing it. But yes.... switch ports aside..... you gotta have out of band management! |
|
Best thing is to set the RILO on a separate switch and router if possible. This will allow you to connect via the RILO in case there are issues with the network or power. I've been using RILO for years. It was my best tool when I worked for our Server Monitoring group. I had over 1350 Windows Servers I was monitoring real time with CIM, IBM Director, and Dell Openview, BMC Patrol, and a few other tools. Only two of those servers could be considered "local" and all the rest were managed via TS, Computer Management, and scripts. That's everything from Domain Controllers, Exchange servers, SNA Gateway servers, SQL boxes and everything in between. If terminal services was not responding, I could connect via the RILO and perform a virtual power down and reboot, provided I couldn't kill the offending processes with command line utilities. It's also a wonderful tool if your server is stuck at an F1 prompt on an abnormal boot sequence. You can literally watch the entire boot sequence, which is something you can't do with many of the software products which require the server to be up at least to the user interface (TS, RDP, etc.). |
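The "try Terminal Services first, fall back to the RILO" decision can be roughed out as a simple reachability check. This is a sketch under stated assumptions: the host name is made up, port 3389 is the usual Terminal Services/RDP port, and an open port only shows that something is listening, not that the OS is healthy enough to log in to.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_reboot_path(host, port=3389, probe=tcp_port_open):
    """In-band if the remote-control port answers, otherwise out-of-band."""
    if probe(host, port):
        return "in-band"      # log in, kill the offending process or reboot cleanly
    return "out-of-band"      # go through the management card and power cycle


if __name__ == "__main__":
    print(pick_reboot_path("w2k-exch01"))  # made-up host name
```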
RIBs rule... Not the roofie kind. And yeah, Dell will wait till HP does it and copy them (see above post). Why do you think Dell is the last to market with blade servers? Why do you think Dell's mgmt suite looks just like Insight Manager....hmmmmm |