InfoDoc ID   Synopsis   Date
21825   RM6 - Power Supply Failure / Monitoring on A1000/A3000   15 Nov 2000

Status Issued

Description
- Current Default Situation - 
A power supply failure on A1000/A3000 disk subsystems currently raises 
no alarm.  Although these systems have redundant power supplies, there 
is currently no method to notify anyone that one of the supplies has 
failed, so the equipment has to be physically inspected periodically 
to detect such a failure.  Without periodic inspections, the failure 
of a second power supply would be catastrophic.

Site Configurations that can benefit from this InfoDoc include:

Hardware:  
 - A3000 attached E3500 X 491(primary configuration) 
 - A1xxx 
 - A3xxx 
 - E5000 
 - E6500 
 - E10000

- How to enable Power Supply failure monitoring - 
RAID Manager 6.1 (RM6), the software that controls and monitors 
A1000/A3000 RAID arrays, has a utility that performs a "health check" 
on the entire storage enclosure.  By creating an hourly cronjob on the 
RM6 server, an administrator can capture power supply failure 
notifications. 

An administrator could execute:

"/usr/lib/osa/bin/healthck -a | grep Pwr >> /var/adm/messages"

and have any arrays controlled by RM6 software place any power supply 
failures that happened within the hour into the RM6 server's 
"/var/adm/messages" file.  The format of these messages would be:

"Servername" Drive Tray-Pwr Supp Failure

The "/var/adm/messages" file can then be reviewed on a regular basis to 
detect when one of the redundant power supplies has generated a failure 
message.  The appropriate repair actions could then be performed during 
a scheduled maintenance window.
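
The hourly capture and the periodic review described above could be 
wrapped in one small script.  The following is only a sketch: the script 
path, the variable names, and the function names are illustrative 
assumptions and are not part of RM6, and the per-server summary assumes 
that the first field of each logged line is the server name, as in the 
message format shown earlier.

```shell
#!/bin/sh
# Sketch only: the script path, variable names, and function names here
# are illustrative assumptions, not part of RM6.

# Both settings can be overridden from the environment.
HEALTHCK="${HEALTHCK:-/usr/lib/osa/bin/healthck}"
LOGFILE="${LOGFILE:-/var/adm/messages}"

# Append any power supply lines reported by the health check to the log.
log_pwr_failures() {
    # grep exits non-zero when nothing matched; that is the healthy case,
    # so the failure status is deliberately ignored.
    "$HEALTHCK" -a | grep Pwr >> "$LOGFILE" || true
}

# Count logged failures per first field (assumed to be the server name).
summarize_pwr_failures() {
    grep 'Pwr Supp Failure' "${1:-$LOGFILE}" | awk '{ print $1 }' | sort | uniq -c
}

# "run" is what the hourly cronjob would pass; "summary" is for review.
case "${1:-}" in
    run)     log_pwr_failures ;;
    summary) summarize_pwr_failures ;;
esac

# Possible root crontab entry, run at the top of every hour
# (the install path is hypothetical):
#   0 * * * * /usr/local/bin/rm6_pwr_check.sh run
```

Running the script with the "summary" argument prints one line per 
server with the number of failure messages found in the log, which may 
be easier to scan than the raw file.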

- Additional monitoring capabilities for Sun Remote Services (SRS) Customers -

SRS 1.x can already search the RM6 server's "/var/adm/messages" file 
and can be configured to look for lines containing the string 
"Pwr Supp Failure".  It can therefore report this failure with minimal 
SRS modification.  The only changes required are to add the cronjob 
described above to the RM6 server and to modify the SRS 1.x 
"search string library" to include the string "Pwr Supp Failure".  An 
alert notification of the failure would then be created using the 
existing SRS 1.x alert notification methods.

Using the same cronjob, an SRS 2.x administrator could use the "File 
Watcher Module" in the Symon software to monitor the "/var/adm/messages" 
file on the RM6 server and report any additions to this log to the 
Symon software "file changes table".  An alarm would be triggered when 
a "Pwr Supp Failure" is listed in the "file changes table".  An alert 
notification would then be sent to the appropriate persons using the 
existing SRS 2.x alert mechanism, and without any modifications to 
current SRS 2.x technology.
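
The idea of reporting only the additions to the log can be illustrated 
with a few lines of POSIX shell.  This is not how the Symon "File 
Watcher Module" is implemented; it is a stand-alone sketch, and the 
function name and the state file it uses are hypothetical.  It remembers 
how many lines the log had at the previous check and prints only newer 
lines that contain "Pwr Supp Failure".

```shell
#!/bin/sh
# Sketch only: report_new_failures and its state file are hypothetical
# and are not part of SRS or Symon.
#
# Usage: report_new_failures LOGFILE STATEFILE
# Prints failure lines added since the previous call.
# Returns 0 if any were found, 1 otherwise.
report_new_failures() {
    log=$1
    state=$2

    # Line count recorded by the previous call (0 on the first call).
    last=0
    [ -f "$state" ] && last=$(cat "$state")

    total=$(wc -l < "$log" | tr -d ' ')

    # A shorter log means it was rotated or truncated; start over.
    [ "$total" -lt "$last" ] && last=0

    start=$(expr "$last" + 1)
    if tail -n +"$start" "$log" | grep 'Pwr Supp Failure'; then
        found=0
    else
        found=1
    fi

    # Remember where this check stopped.
    echo "$total" > "$state"
    return $found
}
```

A caller could raise an alarm whenever the function returns 0, for 
example by mailing the printed lines to the administrator.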

An enhancement to SRS 2.x is also expected to provide an alternative 
monitoring capability for A1xxx, A3xxx, A5xxx, and Enterprise Servers' 
disk subsystems, covering "Full Disk monitoring", "Full Interconnect 
monitoring", and "Full Enclosure monitoring".  By adding the hardware 
module for the specific platform to the "Config Reader Module", one can 
acquire monitoring for all power supplies in the enterprise enclosure.  
This would eliminate the need for the cronjob described previously.

The current production version of SRS is 1.x, and SRS 2.x is scheduled 
for release Feb 7th.  Migrations of existing SRS 1.x sites are scheduled 
to commence in July, at which time all customers will be migrated to 
SRS 2.x.

- Additional reference information can be found in the following manuals - 
Sun Management Center 2.1 for Midrange Servers Platforms 
Sun Management Center 2.1 for Starfire Enterprise Servers 806-1581-10 
Sun Management Center 2.1 User Guide 
Symon 201 Config
INTERNAL SUMMARY:
Change Record:
Revision: 01
Date: 02/09/00
Prepared By: STAR Room TSE Team
Reviewed By: Tom Bull
SUBMITTER: Charles Price
APPLIES TO: Hardware/Disk Storage Subsystem/StorEdge Disk Array/StorEdge A1000, Hardware/Disk Storage Subsystem/StorEdge Disk Array/StorEdge A3000, Hardware/Disk Storage Subsystem/StorEdge Disk Array/StorEdge A3500, Storage/RAID Manager, AFO Vertical Team Docs, AFO Vertical Team Docs/Hardware, AFO Vertical Team Docs/Storage
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.