PDU Nagios
We've recently acquired a Tripp Lite PDUMH20ATNET, which is a power-switching appliance. We're using it at our data center, where there was a recent PDU failure. Since we were on the PDU that failed, and we only had power connections to that single PDU, we lost all of our systems.
Going forward, we've added a second PDU circuit to our rack and use the Tripp Lite PDU to switch power for our single-power-supply systems, such as switches and a couple of routers. The unit alone should help mitigate further problems, and since it also supports SNMP monitoring, I've written a script that lets Nagios poll the input and output voltages, input and output frequencies, and output current via SNMP.
In addition, check out the PDU Cacti page for the Cacti graphs I've created, along with the Perl script that goes with them.
Requirements
This script has the following prerequisites:
- Perl 5.8.8 or newer (it may work with older versions; I'm no Perl expert)
- SNMP Perl Module (/usr/ports/net-mgmt/p5-SNMP-Utils on FreeBSD)
- SNMP MIBs, including UPS-MIB (the script looks up names such as UPS-MIB::upsInputVoltage)
Perl Script
Here's the Perl script. It requires that the SNMP Perl module, as well as the obligatory Nagios Perl modules (utils.pm), be installed.
#!/usr/bin/perl
#
use strict;
use warnings;
use SNMP;                                # requires net-mgmt/p5-SNMP-Utils
use lib "/usr/local/libexec/nagios";
use utils qw(%ERRORS);

# The body is single-quoted so that $Id$ is not interpolated; the line with
# the program name is double-quoted and prepended separately.
my $usage = "\nUsage: $0 hostname snmp_community key min max\n" . '
Connects via SNMP to a UPS or PDU and pulls Input/Output voltages and
frequencies, as well as current output (amps).

  hostname is the name of the host you\'re checking
  snmp_community is the SNMP community string for authentication
  key is the specific key you\'re requesting, from:
    inputf......Input Frequency
    inputv......Input Voltage
    outputf.....Output Frequency
    outputv.....Output Voltage
    outputc.....Output Current

This script outputs performance data compatible with Nagios.

$Id$
';

$ENV{'MIBS'} = "ALL";

# All five arguments are required.
my ($host, $community, $key, $min, $max) = @ARGV;
die $usage unless defined $max;

my $session = SNMP::Session->new(DestHost  => $host,
                                 Community => $community,
                                 Version   => "2c");

my $oids = SNMP::VarList->new(['UPS-MIB::upsIdentManufacturer'],  #0
                              ['UPS-MIB::upsIdentModel'],         #1
                              ['UPS-MIB::upsInputVoltage'],       #2
                              ['UPS-MIB::upsInputFrequency'],     #3
                              ['UPS-MIB::upsOutputVoltage'],      #4
                              ['UPS-MIB::upsOutputFrequency'],    #5
                              ['UPS-MIB::upsOutputCurrent']);     #6

my @status = defined $session ? $session->getnext($oids) : ();
unless (@status == 7 and defined $status[2]) {
    print "UNKNOWN: no SNMP response from $host\n";
    exit $ERRORS{'UNKNOWN'};
}

my $manuf = $status[0];
my $model = $status[1];

# Frequencies and current are reported in tenths of a unit.
my %value = (
    inputv  => $status[2],
    inputf  => $status[3] / 10,
    outputv => $status[4],
    outputf => $status[5] / 10,
    outputc => $status[6] / 10,
);
die $usage unless exists $value{$key};

# Both bounds are exclusive. The performance data carries the requested key.
if (($min < $value{$key}) and ($value{$key} < $max)) {
    print "NORMAL: $manuf($model) $value{$key} | $key=$value{$key}\n";
    exit $ERRORS{'OK'};
} else {
    print "CRITICAL: $manuf($model) $value{$key} | $key=$value{$key}\n";
    exit $ERRORS{'CRITICAL'};
}
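The threshold test at the end of the script can be tried out without a PDU on hand. The following is only a sketch of that comparison: check_range and format_line are hypothetical helper names that do not appear in the script, and the exit codes are hard-coded here to the standard Nagios plugin values rather than imported from utils.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard Nagios plugin exit codes (hard-coded for this sketch).
my %ERRORS = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Hypothetical helper: returns the status name for a reading.
# Both bounds are exclusive, matching the script's ($min < value < $max) test.
sub check_range {
    my ($value, $min, $max) = @_;
    return ($min < $value && $value < $max) ? 'OK' : 'CRITICAL';
}

# Hypothetical helper: builds a "STATUS: text | perfdata" output line.
sub format_line {
    my ($status, $key, $value) = @_;
    return "$status: $key is $value | $key=$value";
}

# Try a few made-up input voltage readings against 105/118.
for my $reading (110, 118, 121.5) {
    my $status = check_range($reading, 105, 118);
    print format_line($status, 'inputv', $reading), " (exit $ERRORS{$status})\n";
}
```

Because the bounds are exclusive, a reading exactly equal to min or max is reported as CRITICAL, which is worth keeping in mind when picking thresholds.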
Nagios Configuration
checkcommands
The script above takes five arguments, two of which are easy to hard-code in the checkcommands section of your configuration. The first, the host, comes from the $HOSTADDRESS$ macro, while the community string can be hard-coded as long as you use the same read-only community string across your campus. The remaining three (key, min, and max) are passed in as $ARG1$ through $ARG3$.
Here's what we use:
define command{
    command_name    check_pdu
    command_line    $USER1$/check_pdu $HOSTADDRESS$ comm_string $ARG1$ $ARG2$ $ARG3$
}
service configuration
With the command configured in the checkcommands section, we can define a service as follows:
define service{
    host_name               hostname
    service_description     Input Voltage
    use                     generic-service
    check_command           check_pdu!inputv!105!118
}
The service above polls the key inputv (input voltage), with a minimum of 105 volts AC and a maximum of 118 volts AC. For each of the other keys, follow a similar template and set min/max values appropriate to that key.
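As an illustration of that template applied to two of the other keys, here is a sketch. The thresholds are assumptions, not values from our configuration: 59/61 assumes 60 Hz mains, and 16 is a placeholder ceiling for output current that should be replaced with whatever suits your circuit. The minimum for outputc is set to -1 rather than 0 because the script's bounds are exclusive, so an idle unit drawing 0 amps would otherwise alarm.

```cfg
# Assumed thresholds: adjust for your own mains frequency and circuit rating.
define service{
    host_name               hostname
    service_description     Input Frequency
    use                     generic-service
    check_command           check_pdu!inputf!59!61
}

define service{
    host_name               hostname
    service_description     Output Current
    use                     generic-service
    check_command           check_pdu!outputc!-1!16
}
```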
MIB Notes
Tripp Lite doesn't have very good public MIB files available, as far as I've been able to find (even after contacting support). Some useful tidbits:
Environment Sensor
.1.3.6.1.4.1.850.101.1.1.1.0 = Temperature, Celsius
.1.3.6.1.4.1.850.101.1.1.2.0 = Temperature, Fahrenheit
.1.3.6.1.4.1.850.101.1.1.3.0 = Low Temp Threshold
.1.3.6.1.4.1.850.101.1.1.4.0 = High Temp Threshold
.1.3.6.1.4.1.850.101.1.1.5.0 = ?
.1.3.6.1.4.1.850.101.1.2.1.0 = Humidity
.1.3.6.1.4.1.850.101.1.2.2.0 = Low Humidity Threshold
.1.3.6.1.4.1.850.101.1.2.3.0 = High Humidity Threshold
.1.3.6.1.4.1.850.101.1.2.4.0 = ?
Environment Sensor Dry Contacts
.1.3.6.1.4.1.850.101.2.1.1.2.N: String Description
.1.3.6.1.4.1.850.101.2.1.1.3.N: Alarm = 1, Normal = 0
.1.3.6.1.4.1.850.101.2.1.1.4.N: Normally Open = 0, Normally Closed = 1
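Since there are no MIB names to lean on for these, one option is to keep the numeric OIDs in a small lookup table. The sketch below only builds and prints the OID strings from the notes above; it does not talk to a device. The hash keys and the contact_oid helper are hypothetical names chosen for illustration, and the two OIDs marked "?" are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tripp Lite enterprise subtree used by the environment sensor notes.
my $BASE = '.1.3.6.1.4.1.850.101';

# Scalar sensor readings and thresholds (hash keys are hypothetical names).
my %sensor_oid = (
    temp_c        => "$BASE.1.1.1.0",
    temp_f        => "$BASE.1.1.2.0",
    temp_low      => "$BASE.1.1.3.0",
    temp_high     => "$BASE.1.1.4.0",
    humidity      => "$BASE.1.2.1.0",
    humidity_low  => "$BASE.1.2.2.0",
    humidity_high => "$BASE.1.2.3.0",
);

# Dry contact table columns: 2 = description, 3 = alarm state, 4 = contact type.
my %contact_column = (description => 2, alarm => 3, type => 4);

# Hypothetical helper: OID for one column of the dry contact with index $n.
sub contact_oid {
    my ($column, $n) = @_;
    die "unknown column '$column'\n" unless exists $contact_column{$column};
    return "$BASE.2.1.1.$contact_column{$column}.$n";
}

print "$_ => $sensor_oid{$_}\n" for sort keys %sensor_oid;
print "contact 1 alarm => ", contact_oid('alarm', 1), "\n";
```

Each resulting string could then be used as the target of an SNMP get request, with N taken from the contact's index as it appears in the table.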