BMC Nagios

From Secure Computing Wiki
Revision as of 09:35, 10 December 2007 by Ecrist (added all but two command descriptions - saving my work.)

We just purchased a Dell PowerEdge 2950. Incidentally, we use FreeBSD on everything except most of our desktops. Our new, super-fast 2950 has the Dell PERC 5/i RAID On Motherboard (ROMB) card, for which there are no userland management utilities on FreeBSD. I was tasked with finding a way to monitor its RAID health, along with the other sensors available on the motherboard.

In addition to the ROMB health check, I've written 5 other Nagios plugins to monitor ambient temperature, system voltages (go/no-go), system fan speed, chassis intrusion, and power supply health. I'll also cover the general setup needed to use the BMC (Baseboard Management Controller) and to get our Nagios installation talking to it.

Note: I'll be updating the scripts available on NagiosExchange to include support for Nagios performance data (perfdata), to enable graphing in tools such as Oreon and Zenoss.
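For reference, perfdata is appended to a plugin's one-line status after a pipe character, one item per metric, in the form 'label'=value[UOM];[warn];[crit];[min];[max]. The following is a minimal Python sketch of that formatting; the helper names format_perfdata and plugin_output are hypothetical and are not part of the published scripts.

```python
from typing import List, Optional


def format_perfdata(label: str, value: float, uom: str = "",
                    warn: Optional[float] = None, crit: Optional[float] = None,
                    vmin: Optional[float] = None, vmax: Optional[float] = None) -> str:
    """Format one perfdata item as 'label'=value[UOM];[warn];[crit];[min];[max]."""
    # A literal single quote inside a label is written as two single quotes.
    quoted = label.replace("'", "''")
    # Unset thresholds become empty fields; trailing empty fields are dropped.
    fields = [f"{value}{uom}"] + ["" if v is None else str(v) for v in (warn, crit, vmin, vmax)]
    return f"'{quoted}'=" + ";".join(fields).rstrip(";")


def plugin_output(status: str, items: List[str]) -> str:
    """Join the human-readable status line and the perfdata items with ' | '."""
    return status if not items else f"{status} | {' '.join(items)}"
```

For example, format_perfdata("FAN 1 RPM", 4575, vmin=0) yields 'FAN 1 RPM'=4575;;;0, which a grapher can plot without any further parsing of the status text.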

BMC Configuration

To set up our Dell's BMC for communication with FreeBSD, we had to enter the BMC's setup screen by pressing Ctrl-E during system boot. From the menu, enable LAN access. You'll also need to set up an IP address that is reachable from your Nagios monitoring station. Lastly, set a user name and password for the BMC; for our tests we used root and 1234. It is possible to add more users and privilege levels to the BMC, but that is beyond the scope of this page.

Under the network section, enable the 'Shared' option, unless you have a DRAC card, which provides its own dedicated network port. FreeBSD works fine with the Shared option enabled. It lets the BMC use the server's primary network interface, where the BMC presents its own MAC address, so there is little conflict with the operating system. Apparently, some higher-end switches object because they see multiple MAC addresses on a single port; you should be able to allow this by setting the appropriate policy on the switch.

Because the plugins talk to the BMC over IPMI, your monitoring station needs ipmitool installed.
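Before wiring anything into Nagios, it helps to confirm that the monitoring station can reach the BMC with ipmitool. The Python sketch below builds the kind of remote "sdr type" query the checks on this page rely on and runs it. The function names are hypothetical, and it assumes the plain "lan" interface; the sensor type strings (such as "Drive Slot / Bay" or "Power Supply") are the names ipmitool's sdr type subcommand accepts, and `ipmitool sdr type list` prints them all.

```python
import shlex
import subprocess
from typing import List


def ipmitool_sdr_cmd(host: str, user: str, password: str, sensor_type: str) -> List[str]:
    """Build an ipmitool command that lists one SDR sensor type from a remote BMC."""
    return [
        "ipmitool",
        "-I", "lan",      # IPMI-over-LAN interface ("lanplus" selects IPMI v2.0 RMCP+)
        "-H", host,       # BMC IP address or hostname
        "-U", user,       # BMC user name
        "-P", password,   # BMC password (visible in the process list)
        "sdr", "type", sensor_type,
    ]


def run_ipmitool(cmd: List[str], timeout: float = 10.0) -> str:
    """Run ipmitool and return its standard output.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    subprocess.TimeoutExpired if the BMC does not answer in time.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder; root/1234 are our test credentials.
    print(shlex.join(ipmitool_sdr_cmd("192.0.2.10", "root", "1234", "Power Supply")))
```

Run as a script, this prints `ipmitool -I lan -H 192.0.2.10 -U root -P 1234 sdr type 'Power Supply'`, which you can paste into a shell on the Nagios host. ipmitool can also read the password from the IPMI_PASSWORD environment variable when given -E, which keeps it out of ps output.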

Script Descriptions

PERC 5/i RAID Check

This check queries the SDR (Sensor Data Repository) entries of type Drive Slot / Bay and looks at the text output. You could also do this by parsing the hex codes. The table below shows the general logic of the script, with an additional column listing the known hex code for each RAID state.

Nagios Level   IPMI Text                                  IPMI Hex Code
Normal         Drive Present                              0x0180
Warning        Drive Present, Parity Check In Progress    0x1180
Critical       Drive Present, In Critical Array           0xA180
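The table translates directly into a text-matching rule. Here is a minimal Python sketch of that logic, not the author's script. It assumes ipmitool's pipe-separated sdr layout (name | sensor id | status | entity | reading text) and reports any reading not listed in the table as UNKNOWN, which is our own choice. The exit codes are the standard Nagios plugin values.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
# Order used to pick the worst drive; ranking CRITICAL above UNKNOWN is our choice.
RANK = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}


def drive_state(reading: str) -> int:
    """Map the reading text of one Drive Slot / Bay sensor to a Nagios state."""
    if "In Critical Array" in reading:
        return CRITICAL
    if "Parity Check In Progress" in reading:
        return WARNING
    if reading == "Drive Present":
        return OK
    return UNKNOWN  # assumption: text not in the table is reported as UNKNOWN


def check_raid(sdr_output: str) -> Tuple[int, str]:
    """Evaluate every drive line and return (exit code, status text)."""
    states = []
    for line in sdr_output.splitlines():
        # Assumed layout: name | sensor id | status | entity | reading text
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # blank or non-sensor line
        states.append(drive_state(fields[-1]))
    if not states:
        return UNKNOWN, "UNKNOWN - no Drive Slot / Bay sensors found"
    worst = max(states, key=RANK.__getitem__)
    return worst, f"{NAMES[worst]} - {len(states)} drive(s) checked"
```

A real plugin would print the status text and exit with the returned code; the hex-code variant would swap drive_state for a lookup keyed on 0x0180, 0x1180 and 0xA180.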

This command takes the following arguments: IP/HOSTNAME USERNAME PASSWORD.

Sample Nagios Config:

define service{
        host_name                       hostname
        service_description             Check PERC 5/i RAID
        use                             generic-service
        check_command                   check_bmc_dell_raid!!root!1234
        }

Power Supply Check

This check queries the SDR type Power Supply and looks at the text output. The following table shows how the Nagios state relates to the output from IPMI:

Nagios State   IPMI Output
Normal         Presence detected
Normal         Fully Redundant
Critical       Failure detected, Power Supply AC lost
Critical       Presence detected, Failure detected
Critical       Presence detected, Failure detected, Power Supply AC lost
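A minimal Python sketch of the mapping above, under the same assumptions as any hand-rolled parser here: ipmitool's pipe-separated sdr layout with the reading text in the last field, and UNKNOWN for any reading the table does not list (for example a redundancy sensor reporting something other than Fully Redundant). Unlike a bare count, it names the sensors that caused a non-OK result. The function names are hypothetical.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
# Order used to pick the worst result; ranking CRITICAL above UNKNOWN is our choice.
RANK = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}


def psu_state(reading: str) -> int:
    """Map one Power Supply reading to a Nagios state, following the table above."""
    if "Failure detected" in reading or "AC lost" in reading:
        return CRITICAL
    if reading in ("Presence detected", "Fully Redundant"):
        return OK
    return UNKNOWN  # assumption: readings not in the table are reported as UNKNOWN


def check_power_supplies(sdr_output: str) -> Tuple[int, str]:
    """Evaluate every Power Supply sensor line and return (exit code, status text)."""
    results = []  # (sensor name, reading text, state)
    for line in sdr_output.splitlines():
        # Assumed layout: name | sensor id | status | entity | reading text
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # blank or non-sensor line
        name, reading = fields[0], fields[-1]
        results.append((name, reading, psu_state(reading)))
    if not results:
        return UNKNOWN, "UNKNOWN - no Power Supply sensors found"
    worst = max((state for _, _, state in results), key=RANK.__getitem__)
    problems = [f"{name}: {reading}" for name, reading, state in results if state != OK]
    detail = "; ".join(problems) if problems else f"{len(results)} sensor(s) OK"
    return worst, f"{NAMES[worst]} - {detail}"
```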


Example Nagios configuration:

define service {
        host_name                       newcheetah
        service_description             Power Supply
        use                             generic-service
        check_command                   check_bmc_ps!!root!1234
        }

Intrusion Sensor

Many modern server cases have a magnetic reed switch that detects whether the case is open or closed. There are a couple of advantages to having this sensor.

  1. Detect when someone opens the chassis cover.
  2. When the cover is open, the system can adjust fan speed and airflow to compensate.


This is a pretty straightforward script: the case is either open or it isn't. Here's the example Nagios config:

define service{
        host_name                       newcheetah
        service_description             Check Intrusion
        use                             generic-service
        check_command                   check_bmc_intrusion!!root!1234
        }
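The open/closed test reduces to a single string check. The Python sketch below is hedged on several assumptions, since this page does not spell out the sensor output or the state the real script returns. It assumes the reed switch appears under ipmitool's "Physical Security" sensor type, that its reading text mentions "intrusion" only while the cover is open, and that an open cover should raise CRITICAL. Compare against what your own BMC prints before relying on it.

```python
from typing import Tuple

# Nagios plugin exit codes used by this check.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_intrusion(sdr_output: str) -> Tuple[int, str]:
    """CRITICAL if any Physical Security sensor reports an intrusion, OK otherwise."""
    found = False
    for line in sdr_output.splitlines():
        # Assumed layout: name | sensor id | status | entity | reading text
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # blank or non-sensor line
        found = True
        # Assumption: the reading text mentions "intrusion" only while the cover is open.
        if "intrusion" in fields[-1].lower():
            return CRITICAL, f"CRITICAL - {fields[0]}: {fields[-1]}"
    if not found:
        return UNKNOWN, "UNKNOWN - no Physical Security sensors found"
    return OK, "OK - chassis closed"
```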

Fan Speeds

Our Dell's BMC has 6 fan sensors, but the system has only 4 fans, not counting the fans inside the power supplies. When we query the BMC, the other 2 sensors are shown as disabled. This script should work on most other Dell systems, since it queries all of the fan sensors, displays their data, and reports the number of disabled sensors.

This script could use a little cleaning up, and I'll work on it as I spend more time with our system. One thing that should be added is a pair of command-line options to set high and low fan-speed thresholds. The table below shows how the Nagios state relates to the sensor output:

Nagios State   IPMI Output
Normal         ### RPM (where ### > 0)
Normal         Disabled
Critical       0 RPM
Critical       Redundancy Lost
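A minimal Python sketch of the fan logic in the table, again assuming ipmitool's pipe-separated sdr layout and readings of the exact form "4575 RPM", "Disabled" and "Redundancy Lost". The min_rpm parameter is a hypothetical stand-in for the low-threshold option mentioned above. Its default of 1 reproduces the table, where only 0 RPM is critical. Readings that match no row (such as "Fully Redundant") are ignored, which is our own choice.

```python
import re
from typing import Tuple

# Nagios plugin exit codes used by this check.
OK, CRITICAL, UNKNOWN = 0, 2, 3
RPM_RE = re.compile(r"^(\d+) RPM$")


def check_fans(sdr_output: str, min_rpm: int = 1) -> Tuple[int, str]:
    """Classify Fan SDR readings: below min_rpm or 'Redundancy Lost' is CRITICAL."""
    running, disabled, problems = 0, 0, []
    for line in sdr_output.splitlines():
        # Assumed layout: name | sensor id | status | entity | reading text
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # blank or non-sensor line
        name, reading = fields[0], fields[-1]
        match = RPM_RE.match(reading)
        if reading == "Disabled":
            disabled += 1  # unpopulated fan header: counted, but not an error
        elif "Redundancy Lost" in reading:
            problems.append(f"{name}: Redundancy Lost")
        elif match:
            rpm = int(match.group(1))
            if rpm < min_rpm:
                problems.append(f"{name}: {rpm} RPM")
            else:
                running += 1
        # Any other reading (e.g. "Fully Redundant") leaves the state unchanged.
    if running == 0 and not problems:
        return UNKNOWN, "UNKNOWN - no fan readings found"
    summary = f"{running} fan(s) running, {disabled} sensor(s) disabled"
    if problems:
        return CRITICAL, f"CRITICAL - {'; '.join(problems)} ({summary})"
    return OK, f"OK - {summary}"
```

Raising min_rpm (for example to 4600) turns a slow but still spinning fan into a CRITICAL result, which is roughly what a low-threshold command-line option would control.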


Sample Nagios Config:

define service{
        host_name                       hostname
        service_description             Check IPMI Fan Status
        use                             generic-service
        check_command                   check_bmc_fan_status!!root!1234
        }

Voltage (go/no-go)

Ambient Temperature