
Extending the Health Check Monitor

This section is intended for advanced system administrators with experience in UNIX shell programming and SA administration.

The HCM is implemented as a series of UNIX shell scripts that perform local or global tests on the core servers. The scripts conform to specific naming conventions and reside in predefined directories. You can extend the HCM by writing your own scripts and copying them to the correct directories under /opt/opsware/oi_util.

Requirements for extensions to HCM local tests

An HCM local test is a script that is run by the /etc/init.d/opsware-sas script (see Running HCM Local Tests). A local test script must meet the following requirements:

  • UNIX Shell Script: It is a UNIX shell script that runs as root.
  • Component Server: The script resides and runs on the server of the component validated by the script. For example, if the script validates the Data Access Engine (spin), it resides on the server that runs the Data Access Engine.
  • Executable: The script is an executable file (chmod u+x).
  • File Name: The file name of the script has the following syntax:

    <int><test>.sh

    In this syntax, int is an integer that specifies the test execution order and test is the name of the test. Note that the HCM scripts provided with SA contain OPSW in the script file name; for example, 100_OPSWportping.sh.

  • Directory: The script resides in the following directory:

    /opt/opsware/oi_util/local_probes/<component>/[verify_pre | verify_post | verify_functionality]/

    In this path, component is the internal name of the core component, such as spin or twist. The directories beneath the component directory match the category of the test. For example, if the test performs a runtime validation on a core component, the script resides in the verify_functionality subdirectory. For details, see Categories and Local Test Directories.

    The directories beneath the component directory map to the mode options of the /etc/init.d/opsware-sas command. For example, if you save a script in the verify_pre subdirectory, the script is executed when you run opsware-sas with the verify_pre option. If you specify the health option of opsware-sas, the scripts in all three directories are executed. The following table describes the mapping between the directory names and the mode options.

Modes of opsware-sas and the subdirectories of local test scripts

Mode option of command line   Subdirectories of scripts run for this option
health                        verify_pre, verify_post, verify_functionality
status                        verify_post
verify_functionality          verify_functionality
verify_post                   verify_post
verify_pre                    verify_pre

  • Exit Code: The script returns an exit code of zero to indicate success or nonzero for failure. The /etc/init.d/opsware-sas command uses the exit code to determine the status for the test.
  • Results Displayed: The script displays test results on stdout.
  • Local Preamble Script: The test script sources (reads in with the . command) the local_probe_preamble.sh script, as shown by HCM Local Test Example. The local_probe_preamble.sh script contains a superset of the libraries and shell variables used by the /etc/init.d/opsware-sas command.

    The local_probe_preamble.sh script performs the following tasks:

    • Sets shell variables used by the local tests. For example, it sets $PYTHON (which points to the Python interpreter) and $UTILS_DIR (which points to the directory of utilities available to the tests).
    • Parses the command line, evaluates all name=value pairs, and sets shell variables. For example, if you specify timeout=60 on the command line when running /etc/init.d/opsware-sas, the local_probe_preamble.sh script sets the variable $timeout to the value 60.
    • Provides access to useful functions such as retry, which executes a command multiple times until it succeeds or exceeds the specified timeout.
  • Shell Variables: The test script takes into account the variables specified by the name=value options on the command line. For a list of predefined names, see the name=value option in Options for the HCM Local Test Script.
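The name=value handling described above can be illustrated with a short sketch. This is a hypothetical helper written for illustration, not the code in local_probe_preamble.sh; the function name is invented, it assumes a POSIX shell (such as bash or dash) that supports ${var%%pattern} expansion, and it accepts only names that are valid shell identifiers.

```sh
# Hypothetical sketch, not the code of local_probe_preamble.sh:
# turn each name=value argument into a shell variable, so that
# "timeout=60" sets $timeout to 60. Assumes a POSIX shell.
parse_name_value_args() {
    for arg in "$@"; do
        case "$arg" in
            *=*) ;;
            *) continue ;;             # not a name=value pair
        esac
        name=${arg%%=*}                # text before the first "="
        value=${arg#*=}                # text after the first "="
        # Accept only valid shell identifiers, so that a malformed
        # argument cannot inject commands through eval.
        case "$name" in
            ''|[0-9]*|*[!A-Za-z0-9_]*)
                echo "ignoring invalid argument: $arg" >&2
                continue
                ;;
        esac
        # Only the name is expanded by eval; the value is assigned
        # verbatim from $value.
        eval "$name=\$value"
    done
}
```

For example, parse_name_value_args timeout=60 sets $timeout to 60. Because eval only ever sees the literal text name=$value, a value such as $(command) is stored as text rather than executed.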

Categories and local test directories

The /opt/opsware/oi_util directory has the following subdirectories.

local_probes/<component>/verify_pre

This directory includes prerequisite tests for each component. These tests validate that the necessary conditions exist for the component to operate. For example, the directory twist/verify_pre contains the test script 10check_localhost_spin.sh because the Data Access Engine component must be available for the Web Services Data Access Engine component to function.

local_probes/<component>/verify_post

This directory includes validation tests for each component. These tests verify that a given component is available. For example, the directory spin/verify_post contains the test script 10check_primary_spin.sh to validate that the Data Access Engine component is listening on port 1004 and responds to basic queries.

local_probes/<component>/verify_functionality

This directory includes runtime validation tests for each component. These tests verify that a component is fully operational. They are similar to verify_post tests; however, they might take longer to run. You might choose to skip these tests to save time.
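The mode-to-subdirectory table, the execution-order prefix, and the exit-code rule fit together as in the following runner. It is a hypothetical sketch, not the implementation of /etc/init.d/opsware-sas: the function names are invented, the order in which the health mode visits the three subdirectories is assumed, and the scripts are run without arguments.

```sh
# Hypothetical sketch, not the implementation of opsware-sas.
# Print the local-test subdirectories that a mode option runs,
# following the mode table for opsware-sas.
dirs_for_mode() {
    case "$1" in
        health)               echo "verify_pre verify_post verify_functionality" ;;
        status|verify_post)   echo "verify_post" ;;
        verify_pre)           echo "verify_pre" ;;
        verify_functionality) echo "verify_functionality" ;;
        *)                    return 1 ;;
    esac
}

# Run one component's tests for a mode. Scripts in each subdirectory
# run in ascending order of their leading integer; the function returns
# 1 if any test exits nonzero and 2 for an unknown mode.
# $1 = local_probes directory, $2 = component, $3 = mode option
run_component_tests() {
    probes_root=$1 component=$2 mode=$3
    subdirs=$(dirs_for_mode "$mode") || return 2
    overall=0
    for subdir in $subdirs; do
        dir=$probes_root/$component/$subdir
        [ -d "$dir" ] || continue
        # sort -n orders 9check.sh before 10check.sh
        for test_script in $(ls "$dir" | sort -n); do
            [ -x "$dir/$test_script" ] || continue
            "$dir/$test_script" || overall=1
        done
    done
    return "$overall"
}
```

Sorting numerically rather than alphabetically is what makes the integer prefix meaningful: an alphabetical sort would run 100_OPSWportping.sh before 20check.sh.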


Directory layout for HCM local tests

The following directory layout shows where the local tests reside:

/opt/opsware/oi_util/
 |
 |_lib
 | |_local_probe_preamble.sh
 |
 |_local_probes
   |
   |_COMMON
   | |_<test>
   | |_...
   |
   |_<component>
   | |
   | |_verify_pre
   | | |_<int><test> (can be a symlink to ../../COMMON/<test>)
   | | |_...
   | |
   | |_verify_post
   | | |_<int><test> (can be a symlink to ../../COMMON/<test>)
   | | |_...
   | |
   | |_verify_functionality
   |   |_<int><test> (can be a symlink to ../../COMMON/<test>)
   |   |_...
   |
   |_<component>
     ...
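To share one test among several components, as this layout allows, you can keep the script in COMMON and link it into each component's category directory. The helper below is a hypothetical sketch: the function name and the PROBES_ROOT override are not part of SA. It validates the category and order, sets the execute bit, and creates the relative ../../COMMON/<test> symlink.

```sh
# Hypothetical helper, not part of SA. Installs a shared test in
# local_probes/COMMON and links it into one component's category
# directory as <int><test>. Set PROBES_ROOT to try it in a scratch
# directory; it defaults to the documented location.
# $1 = test script, $2 = component, $3 = category, $4 = integer order
install_common_test() {
    src=$1 component=$2 category=$3 order=$4
    root=${PROBES_ROOT:-/opt/opsware/oi_util/local_probes}
    case "$category" in
        verify_pre|verify_post|verify_functionality) ;;
        *) echo "unknown category: $category" >&2; return 1 ;;
    esac
    case "$order" in
        ''|*[!0-9]*) echo "order must be an integer: $order" >&2; return 1 ;;
    esac
    test_name=$(basename "$src")
    mkdir -p "$root/COMMON" "$root/$component/$category" || return 1
    cp "$src" "$root/COMMON/$test_name" || return 1
    chmod u+x "$root/COMMON/$test_name" || return 1
    # Relative target, as in the layout: ../../COMMON/<test>
    ln -s -f "../../COMMON/$test_name" \
        "$root/$component/$category/$order$test_name"
}
```

For example, install_common_test ./check_cron.sh spin verify_post 10 creates spin/verify_post/10check_cron.sh pointing at ../../COMMON/check_cron.sh. A relative link keeps working if the whole local_probes tree is copied or relocated.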

HCM local test example

The following script verifies that the cron utility is running on the local server:

#!/bin/sh
#
# Verify that cron is running.
#
# Read in our libraries / standard variable settings and parse
# the command line. The preamble is sourced with "." so that its
# variables and functions are available in this script.
. /opt/opsware/oi_util/lib/local_probe_preamble.sh

printf "Verify \"cron\" is running: "
process_running=`ps -eo fname | egrep '^cron$' | head -1`
if [ -z "$process_running" ]; then
    echo "FAILURE (cron does not exist in the process table)"
    exit 1
else
    echo "SUCCESS"
    exit 0
fi
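A test that waits for a condition to become true, rather than checking it once, needs a polling loop. The preamble's retry function serves this purpose, but its arguments are not documented in this section, so the following is a hypothetical standalone equivalent named retry_until, which a test could drive with the $timeout variable set from the command line.

```sh
# Hypothetical helper, not the preamble's retry function.
# Run a command until it succeeds or about TIMEOUT seconds of waiting
# have passed. Only the sleeps are counted, so the total time can exceed
# TIMEOUT if the command itself is slow.
# Usage: retry_until TIMEOUT INTERVAL command [args...]
retry_until() {
    [ $# -ge 3 ] || return 2
    timeout_s=$1 interval_s=$2
    shift 2
    [ "$interval_s" -gt 0 ] || return 2   # avoid a busy loop
    waited=0
    while :; do
        "$@" && return 0
        [ "$waited" -lt "$timeout_s" ] || return 1
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
}
```

For example, the cron test could wrap its ps | egrep pipeline in a function of its own (say, cron_running) and call retry_until "${timeout:-30}" 5 cron_running instead of checking once.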

Requirements for extensions to HCM global tests

An HCM global test is a script invoked by the run_all_probes.sh command (see Running HCM Global Tests). A global test script must meet the following requirements:

  • UNIX Shell Script: It is a UNIX shell script that runs as root.
  • Model Repository Server: The script resides on the Model Repository Server, but it can run remotely on any core server.
  • Executable: The script is an executable file (chmod u+x).
  • File Name: The file name of the script has the following syntax:

    <int><test>.sh[.remote]

    In this syntax, int is an integer that specifies the test execution order and test is the name of the test specified on the command line. Note that the HCM scripts provided with SA contain OPSW in the script file name; for example, 300_OPSWcheck_time.sh.

  • Remote Execution: If the test script runs on a core server other than those described in Overview of HCM Global Tests, then the file name must have the .remote extension. When you execute run_all_probes.sh and specify such a test, the script is automatically copied to all specified servers and executed remotely with the SSH protocol.

    The .remote file name extension is not required for tests that run on the same server as the Model Repository Multimaster Component (in non-sliced installations) or the Management Gateway/Infrastructure Component (in sliced installations). Examples of these tests are the checks for Model Repository integrity and multimaster conflicts. If the script does not have the .remote extension and it needs to communicate with remote servers, the script must use SSH. The global preamble script includes helper functions for handling remote communications with SSH.

  • Directory: The script resides in the following directory:

    /opt/opsware/oi_util/global_probes/[verify_pre | verify_post]/

    For details, see HCM Global Test Directories.

  • Exit Code: The script returns an exit code of zero to indicate success or nonzero for failure. The run_all_probes.sh command uses the exit code to determine the status for the test.
  • Results Displayed: The script displays test results on stdout.
  • Global Preamble Script: The test script sources (reads in with the . command) the global_probe_preamble.sh script, as shown by HCM Global Test Example. The global_probe_preamble.sh script contains a superset of the libraries and shell variables used by the HCM global tests.

    The global_probe_preamble.sh script performs the following tasks:

    • Sets shell variables used by the tests.
    • Parses the command line and evaluates all name=value pairs, setting them as shell variables. For example, if you specify hosts="user1@sys1:pw1 user2@sys2:pw2" on the command line with run_all_probes.sh, the global_probe_preamble.sh script sets the variable $hosts to the value "user1@sys1:pw1 user2@sys2:pw2".
    • Provides access to the following functions:
      • copy_and_run_on_multiple_hosts: Copies and executes a shell script on multiple remote servers.
      • copy_from_remote: Copies a file from a remote server.
      • copy_to_remote: Copies a file to a remote server.
      • run_on_multiple_hosts: Runs an existing command on multiple servers.
      • run_on_single_host: Runs an existing command on a single server.
  • Shell Variables: The test script takes into account the shell variables specified by the name=value options on the command line.
  • Authentication: The script sets up authentication or public/private key generation. See Setting Up Passwordless SSH for Global Tests.
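A global test that handles its own remote communication must split the hosts value into its parts. The sketch below is hypothetical and is not the preamble's parsing code: the function name is invented, and the user@host:password layout of each entry is inferred from the hosts example above. Splitting stops at the first "@" and the first ":" after it, so a password may itself contain ":" or "@".

```sh
# Hypothetical sketch, not the preamble's own parsing code. Splits a
# hosts value such as "user1@sys1:pw1 user2@sys2:pw2" into user, host,
# and password (layout inferred from the documented example) and calls
# a function for each entry.
# $1 = hosts value, $2 = function to call as: $2 user host password
for_each_host() {
    hosts_value=$1 callback=$2
    set -f                             # keep "*" or "?" in passwords literal
    for spec in $hosts_value; do       # entries are separated by spaces
        case "$spec" in
            *@*:*) ;;
            *) echo "bad host entry: $spec" >&2; set +f; return 1 ;;
        esac
        user=${spec%%@*}               # before the first "@"
        rest=${spec#*@}                # after the first "@"
        host=${rest%%:*}               # before the first ":"
        password=${rest#*:}            # everything after that ":"
        "$callback" "$user" "$host" "$password" || { set +f; return 1; }
    done
    set +f
}
```

The sketch turns globbing back on when it finishes, which assumes globbing was enabled when it was called.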

HCM global test directories

The /opt/opsware/oi_util directory has the following subdirectories:

global_probes/verify_pre

This directory includes tests that determine whether the specified servers are core servers. When a global test in this category determines that a server is not running an SA component or the server is unreachable, no further tests are run against that server.

Only tests with a .remote extension are allowed under the verify_pre directory.

global_probes/verify_post

This directory includes tests to determine the state of a specific aspect of the entire core. For example, the directory includes the 600_OPSWcheck_OS_resources.sh.remote script, which checks resources such as virtual memory and disk space.
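Before running new global tests, it can help to check them against the naming and placement rules above. The following lint function is a hypothetical sketch, not a tool shipped with SA. It checks for the <int><test>.sh[.remote] name pattern, for the .remote extension on everything in verify_pre, and for the execute bit.

```sh
# Hypothetical lint helper, not shipped with SA. Reports global test
# files that break the rules in this section: names of the form
# <int><test>.sh[.remote], only .remote scripts in verify_pre, and
# the execute bit set. Returns 1 if it finds any problem.
# $1 = global_probes directory, e.g. /opt/opsware/oi_util/global_probes
check_global_probe_names() {
    probes_dir=$1
    problems=0
    for category in verify_pre verify_post; do
        [ -d "$probes_dir/$category" ] || continue
        for path in "$probes_dir/$category"/*; do
            [ -e "$path" ] || continue         # empty directory
            name=${path##*/}
            if ! printf '%s\n' "$name" |
                grep -Eq '^[0-9]+[^0-9./][^/]*\.sh(\.remote)?$'; then
                echo "bad name: $category/$name"
                problems=1
            fi
            case "$category:$name" in
                verify_pre:*.remote) ;;
                verify_pre:*)
                    echo "not .remote: $category/$name"
                    problems=1
                    ;;
            esac
            if [ ! -x "$path" ]; then
                echo "not executable: $category/$name"
                problems=1
            fi
        done
    done
    return "$problems"
}
```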

Directory layout for HCM global tests

The following directory layout shows where the global tests reside:

/opt/opsware/oi_util/
 |
 |_bin
 | |_run_all_probes.sh
 | |_remote_host.py
 | |_<support_utility>
 | |_...
 |
 |_lib
 | |_global_probe_preamble.sh
 |
 |_global_probes
   |
   |_verify_pre
   | |_<int><probe>.remote
   |
   |_verify_post
     |_<int><probe>[.remote]
     |_...

HCM global test example

The following script checks the free disk space of the file systems used by SA. This script runs on the core servers specified by the hosts option of the run_all_probes.sh command:

#!/bin/sh
#
# Check the used-space percentage on Opsware SA file systems.
#
# Read in our libraries, standard variable settings, and parse
# the command line. The preamble is sourced with ".".
. /opt/opsware/oi_util/lib/global_probe_preamble.sh

MAX_PERCENTAGE=80
# Stays 0 only if every file system passes.
exit_code=0

for filesystem in /opt/opsware /var/opt/opsware \
    /var/log/opsware; do
    # The leading and trailing spaces in the following printf
    # are to improve readability.
    printf " Checking %s: " "$filesystem"
    # Field 5 of "df -k" output is the percentage of space used.
    percent_used=`df -k "$filesystem" 2> /dev/null | \
        grep -v Filesystem | \
        awk '{print $5}' | \
        sed 's/%//'`
    if [ -z "$percent_used" ]; then
        echo "FAILURE (cannot read disk usage)"
        exit_code=1
    elif [ "$percent_used" -ge "$MAX_PERCENTAGE" ]; then
        echo "FAILURE (percent used >= $MAX_PERCENTAGE)"
        exit_code=1
    else
        echo "SUCCESS"
    fi
done

exit $exit_code
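The awk '{print $5}' step assumes that df prints each file system on one line. Some df implementations wrap a long device name onto a line of its own, which shifts the fields. A more tolerant parser, sketched below as a hypothetical helper (not part of SA), reads the df output on stdin and takes the field of the form NN% from the last line. Where your df supports it, the POSIX -P option (df -kP) avoids the wrapping altogether.

```sh
# Hypothetical helper: read "df -k" output on stdin and print the
# used-space percentage (without the "%") from the last line. Taking
# the field that looks like "NN%" copes with df wrapping a long device
# name onto a line of its own, and skips the "Use%" header.
percent_used_from_df() {
    tail -n 1 | awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+%$/) {
                sub(/%$/, "", $i)
                print $i
                exit
            }
    }'
}
```

In the example script, the pipeline could then become percent_used=$(df -k "$filesystem" 2> /dev/null | percent_used_from_df), and the existing empty-value check still catches a file system that df cannot read.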