CHAPTER 3

SMS Internals

SMS operations are generally performed by a set of daemons and commands. This chapter provides an overview of how SMS works and describes the SMS daemons, processes, commands, and system files. For more information about daemons, commands, and system files, refer to the System Management Services (SMS) 1.2 Reference Manual.




Caution - Changes made to files in /opt/SUNWSMS can cause serious damage to the system. Only very experienced system administrators should risk changing the files described in this chapter.




Startup Flow

The following events take place when SMS starts:

  1. User powers on the Sun Fire 15K (CPU/disk, and CD-ROM). The Solaris operating environment on the SC boots automatically.

  2. During the boot process, the /etc/init.d/sms script is called. For security reasons, this script disables forwarding, broadcast, and multicasting over the MAN network. It then starts the SMS software by invoking a background process, which starts and monitors ssd. ssd is the SMS startup daemon responsible for starting and monitoring all the SMS daemons and servers.

  3. ssd(1M) in turn invokes: mld, pcd, hwad, tmd, dsmd, esmd, mand, osd, dca, efe, and smnptd.

    For more information, see SMS Daemons and Message Logging. For efe, refer to the Sun Management Center User's Guide.

  4. Once the daemons are running, you can use SMS commands such as console.

    SMS startup can take a few minutes, during which time any command you run returns an error message indicating that SMS has not completed startup. When startup is complete, the message "SMS software start-up complete" is posted to the platform log, which you can view using the showlogs(1M) command.
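The check described in step 4 can be sketched as a small filter over platform log text. This is only an illustration: the function name and the sample log lines are hypothetical, and the only detail taken from this chapter is the completion message itself. The input could be, for example, the text that showlogs prints.

```python
STARTUP_MESSAGE = "SMS software start-up complete"


def startup_complete(log_lines):
    """Return True if any platform log line reports that SMS startup finished."""
    return any(STARTUP_MESSAGE in line for line in log_lines)


# Hypothetical log excerpt; real platform log entries have their own layout.
sample_log = [
    "ssd: starting SMS daemons",
    "ssd: SMS software start-up complete",
]
ready = startup_complete(sample_log)
```

A script that wraps SMS commands could poll this check and hold off until it returns True, rather than retrying commands that fail during startup.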


SMS Daemons

The SMS 1.2 daemons play a central role on the Sun Fire 15K system. Daemons are persistent processes that provide SMS services to clients using an API.



Note - SMS daemons are started by ssd and should not be started manually from the command line.



Daemons are always running, initiated at system startup, and restarted whenever necessary. Each daemon is fully described in its corresponding man page (with the exception of efe, which is referenced separately in the Sun Management Center documentation).

This section describes the SMS daemons, their relationships to one another, and which CLIs (if any) access them.

FIGURE 3-1 illustrates the Sun Fire 15K client server overview.

FIGURE 3-1 Sun Fire 15K Client Server Overview



Note - The domain X server (dxs) and domain configuration agent (dca), while not daemons, are essential server processes and are included in the following table and section. One instance of dxs and one instance of dca run for each domain, up to 18 domains.



TABLE 3-1 Daemons and Processes

Daemon Name

Description

dca

The domain configuration agent provides a communication mechanism between the dca on the system controller and the domain configuration server (dcs) on the specified domain. There is an instance of dca for every domain up to 18 domains.

dsmd

The domain status monitoring daemon monitors domain status and OS heartbeat for up to 18 domains.

dxs

The domain X server provides software support for a domain. There is an instance of dxs for each domain up to 18 domains.

efe

The event front end daemon is part of Sun Management Center and acts as an intermediary between the Sun Management Center agent and SMS. It is not covered further in this manual. For more information on efe, refer to the Sun Management Center User's Guide.

esmd

The environmental status monitoring daemon monitors system cabinet environmental conditions.

fomd

The failover monitoring daemon detects faults on the local and remote SCs and takes appropriate action (directing a failover or takeover).

frad

The FRU access daemon provides the mechanism by which SMS daemons can access any FRU SEEPROM on the Sun Fire 15K system.

hwad

The hardware access daemon provides hardware access to SMS daemons and a mechanism for all daemons to exclusively access, control, monitor and configure the hardware.

kmd

The key management daemon manages the IPSec security associations (SAs) needed to secure the communication between the system controller (SC) and servers running on a domain.

mand

The management network daemon supports the MAN drivers, providing required network configuration.

mld

The messages logging daemon provides message logging support for the platform and domains.

osd

The OpenBoot PROM server daemon provides software support for OpenBoot PROM.

pcd

The platform configuration database daemon provides and manages controlled access to platform, domain, and system board configuration data.

ssd

The SMS startup daemon starts, stops, and monitors all the key SMS daemons and servers.

tmd

The task management daemon provides task management services, such as scheduling for SMS.


Domain Configuration Administration

dca(1M) supports remote dynamic reconfiguration (DR) by enabling communication between applications and the domain configuration server (dcs) running on a Solaris 8 or Solaris 9 domain. One dca per domain runs on the SC. Each dca communicates with its dcs over the Management Network (MAN).

ssd(1M) starts dca when the domain is brought up. ssd restarts dca if it is killed while the domain is still running. dca is terminated when the domain is shut down.

dca is an SMS application that waits for dynamic reconfiguration requests. When a DR request arrives, dca creates a dcs session. Once a session is established, dca forwards the request to dcs. dcs attempts to honor the DR request and sends the results of the operation to dca. Once the results have been sent, the session is ended. The remote DR operation is complete when dca returns the results of the DR operation.
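The request flow in the preceding paragraph can be sketched as follows. This is a toy model, not the dca implementation: FakeDcs, relay_dr_request, and the request string are all hypothetical names introduced for illustration, and no real SMS or dcs interface is used.

```python
class FakeDcs:
    """Hypothetical stand-in for the domain configuration server on a domain."""

    def __init__(self):
        self.open_sessions = 0

    def open_session(self):
        self.open_sessions += 1

    def handle(self, request):
        # A real dcs would attempt the DR operation; this stub reports success.
        return {"request": request, "status": "ok"}

    def close_session(self):
        self.open_sessions -= 1


def relay_dr_request(dcs, request):
    """Mirror the described flow: open a session, forward, collect, close."""
    dcs.open_session()
    try:
        result = dcs.handle(request)
    finally:
        # The session ends once the results have been sent back.
        dcs.close_session()
    return result
```

The try/finally is a design choice of the sketch so that a session is never left open if the request fails; the chapter does not say how dca handles that case.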

FIGURE 3-2 illustrates the DCA client server relationship to the SMS daemons and CLIs.

FIGURE 3-2 DCA Client Server Relationships

Domain Status Monitoring Daemon

dsmd(1M) monitors domain state signatures, CPU reset conditions, and Solaris heartbeat for up to 18 domains. It also handles domain stop events related to hardware failure.

dsmd detects timeouts that can occur in reboot transition flow and panic transition flow, and handles various domain hung conditions.

dsmd notifies the domain X server (dxs(1M)) and Sun Management Center of all domain state changes and automatically recovers the domain based on the domain state signature, domain stop events, and automatic system recovery (ASR) policy. ASR policy consists of those procedures that restore the system to running all properly configured domains after one or more domains have been rendered inactive. Domains can become inactive because of software or hardware failures or unacceptable environmental conditions. For more information, see Automatic System Recovery (ASR) and Domain Stop Events.

FIGURE 3-3 illustrates DSMD client server relationship to the SMS daemons and CLIs.

FIGURE 3-3 DSMD Client Server Relationships

Domain X Server

dxs(1M) provides software support for a running domain. This support includes virtual console functionality, dynamic reconfiguration support, and HPCI support. dxs handles domain driver requests and events. The virtual console functionality allows one or more users running the console program to access the domain's virtual console. dxs acts as a link between SMS console applications and the domain virtual console drivers.

A Sun Fire 15K system can support up to 18 different domains. Each domain may require software support from the SC, and dxs provides that support for the domain-related functions named above: the virtual console, dynamic reconfiguration, and HPCI.

There is one domain X server for each Sun Fire 15K domain. dxs is started by ssd for every active domain and terminated when the domain is shut down.

FIGURE 3-4 illustrates DXS client server relationship to the SMS daemons.

FIGURE 3-4 DXS Client Server Relationships

Environmental Status Monitoring Daemon

esmd(1M) monitors system cabinet environmental conditions, for example, voltage, temperature, fan tray, and power supply. esmd logs abnormal conditions and takes action to protect the hardware, if necessary.

See Environmental Events for more information on esmd.

FIGURE 3-5 illustrates ESMD client server relationship to the SMS daemons.

FIGURE 3-5 ESMD Client Server Relationships

Failover Management Daemon

fomd(1M) is the core of the SC failover mechanism. fomd detects faults on the local and remote SCs and takes the appropriate action (directing a failover/takeover).

fomd ensures that important configuration data is kept synchronized between both SCs. fomd runs on both the master and spare SC.

FIGURE 3-6 illustrates FOMD client server relationship to the SMS daemons.

FIGURE 3-6 FOMD Client Server Relationships

FRU Access Daemon

frad(1M) is the field replaceable unit (FRU) access daemon for SMS. frad provides controlled access to any SEEPROM within the Sun Fire 15K platform that is accessible by the SC. frad supports dynamic FRUID, which provides improved FRU data access.

frad is started by ssd.

FIGURE 3-7 illustrates FRAD client server relationship to the SMS daemons.

FIGURE 3-7 FRAD Client Server Relationships

Hardware Access Daemon

hwad(1M) provides hardware access to SMS daemons and a mechanism for all daemons to exclusively access, control, monitor, and configure the hardware.

hwad runs in either main or spare mode when it comes up. The failover daemon (fomd(1M)) determines which role hwad will play.

At startup, hwad opens all the drivers (sbbc, echip, gchip, and consbus) and uses ioctl(2) calls to interface with them. It reads the contents of the device presence register to identify the boards present in the system and makes them accessible to the clients. hwad also configures the local system clock and sets the clock source for each board present in the system.

hwad also provides the IOSRAM and Mbox interfaces, which support communication between the SC and the domain. For dynamic reconfiguration (DR), hwad directs communication to the IOSRAM (tunnel switch).

For darb interrupts, hwad notifies dsmd(1M) if there is a dstop or rstop. It also notifies the related SMS daemons, depending on the type of Mbox interrupt that occurs.

hwad detects and recovers from console bus and JTAG errors.

Hardware access to the Sun Fire 15K system on the SC is done either by going through the PCI bus or console bus. Through the PCI bus you can access:

Through the Console bus you can access:

FIGURE 3-8 illustrates HWAD client server relationship to the SMS daemons and CLIs.

FIGURE 3-8 HWAD Client Server Relationships

Key Management Daemon

The key management daemon provides a mechanism for managing security for socket communications between the SC and the domains.

The current default configuration includes authentication policies for the dca(1M) and dxs(1M) clients on the SC, which connect to the dcs(1M) and cvcd(1M) servers on a domain.

kmd(1M) manages the IPSec security associations (SAs) needed to secure the communication between the SC and servers running on a domain.

kmd manages per-socket policies for connections initiated by clients on the SC to servers on a domain.

At system startup, kmd creates a domain interface for each domain that is active. An active domain has both a valid IOSRAM and is running the Solaris operating environment. Domain change events can trigger creation or removal of a domain kmd interface.

kmd manages shared policies for connections initiated by clients on the domain to servers on the SC. The kmd policy manager reads a configuration file and stores policies used to manage security associations. A request received by kmd is compared to the current set of policies to ensure that it is valid and to set various parameters for the request.

Static global policies are configured using ipsecconf(1M) and its associated data file (/etc/inet/ipsecinit.conf). Global policies are used for connections initiated from the domains to the SC. Corresponding entries are made in the kmd configuration file. Shared security associations for domain-to-SC connections are created by kmd when the domain becomes active.



Note - In order to work properly, policies created by ipsecconf and kmd must match.



The kmd configuration file is used for both SC-to-domain and domain-to-SC initiated connections. The kmd configuration file resides in /etc/opt/SUNWSMS/config/kmd_policy.conf.

The format of each entry in the kmd configuration file is as follows:

dir|d_port|protocol|sa_type|auth_alg|encr_alg|domain|login_name|

where:

dir

is identified using the sctodom or domtosc strings.

d_port

is the destination port.

protocol

is identified using the tcp or udp strings.

sa_type

is the security association type. Valid choices are the ah or esp strings.

auth_alg

is the authentication algorithm. The authentication algorithm is identified using the none or hmac-md5 strings or leaving the field blank.

encr_alg

is the encryption algorithm. The encryption algorithm is identified using the none or des strings or leaving the field blank.

domain

is the domain_id associated with the domain. Valid domain_ids are the integers 0-17 or a space. Using a space in the domain_id field defines a policy that applies to all domains. A policy for a specific domain overrides a policy applied to all domains.

login_name

is the login name of the user affected by the policy. Currently this includes sms-dxs, sms-dca, and sms-mld.


For example:

# Copyright (c) 2001 by Sun Microsystems, Inc.
# All rights reserved.
#
#
# This is the policy configuration file for the SMS Key Management Daemon.
# The policies defined in this file control the desired security for socket 
# communications between the system controller and domains.
#
# The policies defined in this file must match the policies defined on the
# corresponding domains. See /etc/inet/ipsecinit.conf on the Sun Fire 15K domain.
# See also the ipsec(7P), ipsecconf(1M) and sckmd(1M) man pages.
# 
# The fields in the policies are a tuple of eight fields separated by the pipe
# '|' character.
#
#<dir>|<d_port>|<protocol>|<sa_type>|<auth_alg>|<encr_alg>|<domain>|<login>|
#
# <dir>         --- direction to connect from. Values: sctodom, domtosc
# <d_port>      --- destination port
# <protocol>    --- protocol for the socket. Values: tcp, udp
# <sa_type>     --- security association type. Values: ah, esp
# <auth_alg>    --- authentication algorithm. Values: none, md5, sha1
# <encr_alg>    --- encryption algorithm. Values: none, des, 3des
# <domain>      --- domain id. Values: integers 0 - 17, space
#                   A space for the domain id defines a policy which applies
#                   to all domains. A policy for a specific domain overrides
#                   a policy which applied to all domains.
# <login>       --- login name. Values: Any valid login name
#
# ----------------------------------------------------------------------------
sctodom|665|tcp|ah|md5|none| |sms-dca|
sctodom|442|tcp|ah|md5|none| |sms-dxs|
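As a rough illustration of the entry format and the override rule described above, the following sketch parses policy lines and picks the policy that applies to a given domain. It is not kmd's own parser: the Policy type, parse_policies, and select_policy are hypothetical names, and the matching on direction and port alone is a simplifying assumption.

```python
from typing import NamedTuple, Optional


class Policy(NamedTuple):
    direction: str
    d_port: int
    protocol: str
    sa_type: str
    auth_alg: str
    encr_alg: str
    domain: Optional[int]  # None stands for a space, meaning all domains
    login: str


def parse_policies(text):
    """Parse pipe-separated policy entries, skipping comments and blank lines."""
    policies = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("|")
        # Entries end with a trailing pipe, so split() yields a ninth, empty item.
        if len(fields) < 8:
            raise ValueError("expected eight fields: %r" % line)
        direction, port, proto, sa, auth, encr, dom, login = fields[:8]
        domain = int(dom) if dom.strip() else None
        policies.append(
            Policy(direction, int(port), proto, sa, auth, encr, domain, login)
        )
    return policies


def select_policy(policies, direction, d_port, domain_id):
    """Prefer a policy for the specific domain over one that covers all domains."""
    matches = [
        p for p in policies if p.direction == direction and p.d_port == d_port
    ]
    for p in matches:
        if p.domain == domain_id:
            return p
    for p in matches:
        if p.domain is None:
            return p
    return None
```

With the two default entries shown above, a lookup for any domain on port 442 returns the sms-dxs policy, because both defaults use a space in the domain field.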

FIGURE 3-9 illustrates KMD client server relationship to the SMS daemons.

FIGURE 3-9 KMD Client Server Relationships

Management Network Daemon

mand(1M) supports the Management Network (MAN). See Management Network Services. mand runs in either main or spare mode when it comes up. The failover daemon (fomd(1M)) determines which role mand plays.

At system startup, mand creates the mapping between domain_tag and IP address in the platform configuration database (pcd), and configures the SC-to-SC private network. This information is obtained from the file /etc/opt/SUNWSMS/config/MAN.cf, which is created by the smsconfig(1M) command. mand then obtains domain configuration information from the pcd and programs the scman(7d) driver accordingly. After initializing the pcd and the scman driver, mand registers for domain keyswitch events, tracks changes in domain active board lists, tracks active Ethernet information from the dman(7d) driver, and updates the scman driver, as appropriate.

mand also communicates system startup MAN information to each domain when the domain is powered on ( setkeyswitch on). This information includes Ethernet and MAN IP addressing information. This information is used during the initial software installation on the domain.

FIGURE 3-10 illustrates MAND client server relationship to the SMS daemons.

FIGURE 3-10 MAND Client Server Relationships

Message Logging Daemon

The message logging daemon, mld , captures the output of all other SMS daemons and processes. mld supports three configuration directives: File, Level, and Mode, in the /var/opt/SUNWSMS/adm/.logger file.

mld monitors the size of each of the message log files. For each message log type, mld keeps up to ten message files at a time, x.0 through x.9. For more information on log messages, see Message Logging.
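One way to picture a ten-file scheme like this is the rotation plan below. It is a sketch under a stated assumption: that lower suffixes are newer and that each rotation shifts every file up by one and discards x.9. This chapter does not say how mld orders or rotates the files, and rotation_plan is a hypothetical helper, not part of SMS.

```python
def rotation_plan(base, existing, keep=10):
    """Return (renames, deletions) needed to free up base.0 for a new file.

    existing is the set of file names currently present. Assumes, for
    illustration only, that base.0 is the newest and base.9 the oldest.
    """
    renames = []
    deletions = []
    oldest = "%s.%d" % (base, keep - 1)
    if oldest in existing:
        deletions.append(oldest)
    # Walk from the second-oldest down so no rename overwrites a live file.
    for n in range(keep - 2, -1, -1):
        src = "%s.%d" % (base, n)
        if src in existing:
            renames.append((src, "%s.%d" % (base, n + 1)))
    return renames, deletions
```

The function only computes the plan; applying it (and choosing when a file is large enough to rotate) is left out because the source does not describe those details.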

FIGURE 3-11 illustrates MLD client server relationship to the SMS daemons and CLIs.

FIGURE 3-11 MLD Client Server Relationships

OpenBoot PROM Support Daemon

osd(1M) provides support to the OpenBoot PROM process running on a domain. osd and OpenBoot PROM communication is through a mailbox that resides on the domain. The osd daemon monitors the OpenBoot PROM mailbox. When the OpenBoot PROM writes requests to the mailbox, osd executes the requests accordingly.

osd runs at all times on the SC even if there are no domains configured. osd provides virtual TOD service, virtual NVRAM, and virtual REBOOTINFO for OpenBoot PROM and an interface to dsmd(1M) to facilitate auto-domain recovery. osd also provides an interface for the following commands: setobpparams(1M), showobpparams(1M), setdate(1M), and showdate(1M). See also Chapter 4.

osd is a trusted daemon in that it will not export any interface to other SMS processes. It exclusively reads and writes from and to all OpenBoot PROM mailboxes. There is one OpenBoot PROM mailbox for each domain.

osd has two main tasks: to maintain its current state of the domain configuration, and to monitor the OpenBoot PROM mailbox.

FIGURE 3-12 illustrates OSD client server relationship to the SMS daemons and CLIs.

FIGURE 3-12 OSD Client Server Relationships

Platform Configuration Database Daemon

pcd(1M) is a Sun Fire 15K system management daemon that runs on the SC with primary responsibility for managing and providing controlled access to platform and domain configuration data.

pcd manages an array of information that describes the Sun Fire system configuration. In its physical form, the database information is a collection of flat files, each file appropriately identifiable by the information contained within it. All SMS applications that want to access the database information must go through pcd.

In addition to managing platform configuration data, pcd is responsible for platform configuration change notifications. When pertinent platform configuration changes occur within the system, the pcd sends out notification of the changes to clients who have registered to receive the notification.
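The register-then-notify pattern described here can be sketched as a minimal observer. This is an illustration of the general pattern only: ChangeNotifier and the event name used in the example are hypothetical, and the real pcd client interface is not shown in this chapter.

```python
from collections import defaultdict


class ChangeNotifier:
    """Deliver change notifications only to clients that registered for them."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def register(self, event_type, callback):
        """Record that callback wants notifications of event_type."""
        self._subscribers[event_type].append(callback)

    def notify(self, event_type, detail):
        """Call each registered callback and return how many were notified."""
        delivered = 0
        for callback in self._subscribers.get(event_type, []):
            callback(detail)
            delivered += 1
        return delivered


# Hypothetical usage: a client collects domain change details in a list.
received = []
notifier = ChangeNotifier()
notifier.register("domain_change", received.append)
notifier.notify("domain_change", {"domain": 2})
```

Clients that never registered for an event type receive nothing, which matches the description of notifications going only to registered clients.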

FIGURE 3-13 illustrates PCD client server relationship to the SMS daemons and CLIs.

FIGURE 3-13 PCD Client Server Relationships

Platform Configuration

The following information uniquely identifies the platform:

Domain Configuration

The following information is domain related:

System Board Configuration

The following information is related to system boards:

SMS Startup Daemon

ssd(1M) is responsible for starting and maintaining all SMS daemons and domain X servers.

ssd checks the environment for availability of certain files and the availability of the Sun Fire 15K system, sets environment variables, and then starts esmd(1M). esmd monitors environmental changes by polling the related hardware components. When an abnormal condition is detected, esmd handles it or generates an event so that the corresponding handlers can take appropriate action, update their current status, or both. Some of those handlers are dsmd, pcd, and Sun Management Center (if installed). The main objective of ssd is to ensure that the SMS daemons and servers are always up and running.

FIGURE 3-14 illustrates SSD client server relationship to the SMS daemons.

FIGURE 3-14 SSD Client Server Relationships

Scripts

ssd uses a configuration file, ssd_start, to determine which components of the SMS software to start and in what order. This configuration file is located in the /etc/opt/SUNWSMS/startup directory.




Caution - This is a system configuration file. Mistakes in editing this file can render the system inoperable. args is the only field that should ever be edited in this script. Refer to the daemon man pages for specific options and pay particular attention to syntax.



ssd_start consists of entries in the following format:

name:args:nice:role:type:trigger:startup_timeout:shutdown_timeout:uid:start_order:stop_order

where:

name

is the name of the program.

args

are the valid program options or arguments. Refer to the daemon man pages for more information.

nice

specifies a process priority tuning value. Do not adjust.

role

specifies whether the daemon is platform or domain specific.

type

specifies whether the program is a daemon or a server.

trigger

specifies whether the program should be started automatically or upon event reception.

startup_timeout

is the time in seconds ssd will wait for the program to start up.

shutdown_timeout

is the time in seconds ssd will wait for the program to shut down.

uid

is the user_id the associated program will run under.

start_order

is the order in which ssd will start up the daemons. Do not adjust. Changing the default values can result in the SMS daemons not working properly.

stop_order

is the order in which ssd will shut down the daemons. Do not adjust. Changing the default values can result in the SMS daemons not working properly.
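To make the eleven-field layout concrete, the sketch below splits one entry into named fields. It is a read-only illustration, not a tool for editing ssd_start: the parser name and the sample entry are hypothetical, and it assumes that no field (including args) contains a colon and that the timeout and order fields are integers.

```python
SSD_START_FIELDS = (
    "name", "args", "nice", "role", "type", "trigger",
    "startup_timeout", "shutdown_timeout", "uid",
    "start_order", "stop_order",
)


def parse_ssd_start_entry(line):
    """Split one colon-separated ssd_start entry into a dict of named fields."""
    parts = line.rstrip("\n").split(":")
    if len(parts) != len(SSD_START_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(SSD_START_FIELDS), len(parts))
        )
    entry = dict(zip(SSD_START_FIELDS, parts))
    # Assumption: timeouts and ordering values are plain integers.
    for key in ("startup_timeout", "shutdown_timeout", "start_order", "stop_order"):
        entry[key] = int(entry[key])
    return entry


# Hypothetical entry for illustration; not copied from a real ssd_start file.
example = parse_ssd_start_entry("exampled:-v:0:platform:daemon:auto:60:30:0:5:95")
```

Sorting parsed entries by start_order would give the startup sequence, and by stop_order the shutdown sequence.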


Spare Mode

Each time ssd starts, it comes up in spare mode. Once ssd has the platform core daemons running, it queries fomd(1M) for its role. If the fomd query returns spare, ssd stays in this mode. If the fomd query returns main, ssd transitions to main mode.

After this initial query phase, ssd switches between modes only in response to events received from fomd.

When in spare mode, ssd starts and monitors all of the core platform role, auto trigger programs in the ssd_start file. Currently, this list is made up of the following programs.

If, while in main mode, ssd receives a spare event, then ssd shuts down all programs except the core platform role and auto trigger programs found in the ssd_start file.

Main Mode

ssd will stay in spare mode until it receives a main event. At that time, ssd starts and monitors (in addition to the already running daemons) all of the platform role (main only) event trigger programs, in the ssd_start file. Currently, this list is made up of the following programs.

Finally, after starting all the platform role, event trigger programs, ssd queries the pcd to determine which domains are active. For each of these domains, ssd starts all the domain role, event trigger programs found in the ssd_start file.
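The spare and main mode transitions described in the last two sections can be summarized as a small state machine. The sketch below is illustrative only: SsdModeTracker is a hypothetical name, the action strings are paraphrases of the text, and no SMS programs are actually started or stopped.

```python
class SsdModeTracker:
    """Track ssd's spare/main mode as described in the text (illustrative)."""

    def __init__(self, initial_fomd_role):
        # ssd always comes up in spare mode, then asks fomd for its role once.
        self.mode = "spare"
        self.actions = []
        if initial_fomd_role == "main":
            self._enter_main()

    def _enter_main(self):
        self.mode = "main"
        self.actions.append("start platform role, event trigger programs")
        self.actions.append("start domain role programs for active domains")

    def _enter_spare(self):
        self.mode = "spare"
        self.actions.append(
            "stop all but core platform role, auto trigger programs"
        )

    def on_fomd_event(self, event):
        """Apply a 'main' or 'spare' event from fomd; return the resulting mode."""
        if event == "main" and self.mode == "spare":
            self._enter_main()
        elif event == "spare" and self.mode == "main":
            self._enter_spare()
        return self.mode
```

Repeating an event for the mode ssd is already in is treated as a no-op here; that behavior is an assumption of the sketch.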

Domain-specific Process Startup

ssd uses domain start and stop events from pcd as instructions for starting and stopping domain-specific servers.

Upon reception, ssd either starts or stops all of the domain role, event trigger programs (for the domain identified) found in the ssd_start file.

Monitoring and Restarts

Once ssd has started a process, it monitors the process and restarts it if the process fails.
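A start-monitor-restart loop of this kind can be sketched with the standard subprocess module. This is a generic supervision pattern, not how ssd is implemented: the supervise function is hypothetical, it treats a nonzero exit status as failure, and it caps the number of restarts so the example terminates.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it after each failure; return the number of starts.

    Assumptions for this sketch: failure means a nonzero exit status, and
    supervision stops after max_restarts restarts or a clean exit.
    """
    starts = 0
    while True:
        starts += 1
        proc = subprocess.Popen(cmd)
        status = proc.wait()  # block until the child exits
        if status == 0 or starts > max_restarts:
            return starts


# A child that always fails is started once and then restarted twice.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
total_starts = supervise(failing, max_restarts=2)
```

A real supervisor for long-running daemons would also treat any unexpected exit as a failure and would usually pause between restarts; both are omitted to keep the sketch short.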

SMS Shutdown

In certain instances, such as SMS software upgrades, the SMS software needs to be shut down. ssd provides a mechanism to shut down itself and all SMS daemons and servers under its control.

ssd notifies all SMS software components under its control to shut down. After all the SMS software components have been shut down, ssd shuts itself down.

Task Management Daemon

tmd(1M) provides task management services such as scheduling for SMS. This reduces the number of conflicts that can arise during concurrent invocations of the hardware tests and configuration software.

Currently, the only service exported by tmd is the hpost(1M) scheduling service. In the Sun Fire 15K system, hpost is scheduled based on two factors.

FIGURE 3-15 illustrates TMD client server relationship to the SMS daemons.

FIGURE 3-15 TMD Client Server Relationships

Environment Variables

Basic SMS environment defaults must be set in your configuration files to run SMS commands.

Setting other environment variables when you log in can save time. TABLE 3-2 suggests some useful SMS environment variables.

TABLE 3-2 Example Environment Variables

SMSETC

The path to the /etc/opt/SUNWSMS directory containing miscellaneous SMS-related files.

SMSLOGGER

The path to the /var/opt/SUNWSMS/adm directory containing the configuration file for message logging, .logger.

SMSOPT

The path to the /opt/SUNWSMS directory containing the SMS package binaries, libraries, and object files, as well as configuration and startup files.

SMSVAR

The path to the /var/opt/SUNWSMS directory containing platform and domain message and data files.
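A script that needs these locations could resolve them as sketched below, using the variable when it is set and the directory from TABLE 3-2 otherwise. The helper name sms_path is hypothetical, and falling back to the table values is a choice made for this example rather than documented SMS behavior.

```python
import os
import posixpath

# Directories taken from TABLE 3-2.
SMS_ENV_DEFAULTS = {
    "SMSETC": "/etc/opt/SUNWSMS",
    "SMSLOGGER": "/var/opt/SUNWSMS/adm",
    "SMSOPT": "/opt/SUNWSMS",
    "SMSVAR": "/var/opt/SUNWSMS",
}


def sms_path(name, *parts, environ=None):
    """Build a path under an SMS directory, honoring the environment variable."""
    env = os.environ if environ is None else environ
    base = env.get(name, SMS_ENV_DEFAULTS[name])
    return posixpath.join(base, *parts)


# Example: locate the message logging configuration file named in this chapter.
logger_conf = sms_path("SMSLOGGER", ".logger", environ={})
```

Passing environ explicitly keeps the helper easy to test; leaving it out makes the function read the process environment.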