Operations Manager i 10.62

Configure health checks

An OMi deployment typically consists of the OMi servers, one or more connected servers, and the nodes monitored by Operations Agents. The agents enable you to collect event data, discover topology, and run actions. When you monitor nodes using agents, it is important to check that the agent is running correctly, and that the server and agent can communicate with each other.

Health checks are especially important in agent installations that facilitate the integration of data from other management systems (for example, OpsCx, SiteScope, or ArcSight Logger). If the agent on such a system fails, the event flow from the integration also stops.

By default, health checking of the type Agent & Server is enabled for all agents monitored by OMi, and agents send heartbeat events only to their primary manager.

Tasks

How to configure health checks for individual agents

The following task configures health checking for one or more systems with an Operations Agent installed.

1. Open the properties of the monitored nodes that you want to configure:
   1. Navigate to Monitored Nodes:

      Administration > Setup and Maintenance > Monitored Nodes

      Alternatively, click Monitored Nodes.
   2. Make sure a node filter (from any of the filter categories) is selected in the Node Views browser. Alternatively, select a node group.
   3. Select the node and click Edit. The Monitored Node Properties dialog box opens. Click the Health Check tab.

      To configure multiple nodes at once, hold down the Ctrl or Shift key while selecting them. Then click Edit. The Edit Health Check Configuration dialog box opens.
2. The values for Health Check Configuration indicate the current health check configuration of the agent:
   - Off disables health checking for the system. The agent continues to send heartbeat events, but the gateway server discards them. No further processing takes place.
   - Default means that the default settings from the infrastructure settings are used.
   - Custom enables you to override the default settings.
3. In Health Check Type, configure the type of health check you want to perform:
   - Agent Only configures the agent to send heartbeat events at a regular interval.
   - Agent & Server configures the server to actively check the health of the agent if a heartbeat event does not arrive within the configured timeout.
4. Set the Agent Heartbeat Interval. This is the interval at which the agent sends heartbeat events to the server.

   The server creates a deployment job each time the Agent Heartbeat Interval is changed. For more information about deployment jobs, see Deployment Jobs.
5. Set the Agent Heartbeat Grace Period. This is the period that the server allows before generating an agent problem event or, if Agent & Server checking is enabled, before contacting the agent.

   The server expects to receive a heartbeat event from the agent within the time period defined by the Agent Heartbeat Interval plus the Agent Heartbeat Grace Period.
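
The relationship between the heartbeat interval and the grace period can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not OMi's implementation; the function names and the 30-minute and 5-minute values are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def heartbeat_deadline(last_heartbeat: datetime,
                       interval: timedelta,
                       grace_period: timedelta) -> datetime:
    """Return the latest time by which the next heartbeat is expected.

    Hypothetical helper: the deadline is the time of the last heartbeat
    plus the heartbeat interval plus the grace period.
    """
    return last_heartbeat + interval + grace_period


def is_heartbeat_overdue(now: datetime,
                         last_heartbeat: datetime,
                         interval: timedelta,
                         grace_period: timedelta) -> bool:
    """Return True if no heartbeat arrived within interval + grace period."""
    return now > heartbeat_deadline(last_heartbeat, interval, grace_period)


if __name__ == "__main__":
    # Illustrative values only; they are not product defaults.
    last = datetime(2024, 1, 1, 12, 0, 0)
    interval = timedelta(minutes=30)
    grace = timedelta(minutes=5)
    print(heartbeat_deadline(last, interval, grace))  # 2024-01-01 12:35:00
```

With these example values, a heartbeat received at 12:00 is followed by a deadline of 12:35. A check at 12:36 without a new heartbeat would be treated as overdue.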

How to configure health check defaults

The following task configures health checking globally for all Operations Agents that are managed by OMi.

When a new monitored node is added, the agent inherits the default health check settings from the infrastructure settings.
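
The way a node's Health Check Configuration value (Off, Default, or Custom) relates to the defaults in the infrastructure settings can be pictured as follows. This is a minimal sketch under stated assumptions: the class, the function, and the field names are hypothetical and do not correspond to an OMi API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HealthCheckSettings:
    """Hypothetical container for one set of health check values."""
    check_type: str        # "Agent Only" or "Agent & Server"
    interval_s: int        # agent heartbeat interval, in seconds
    grace_period_s: int    # grace period, in seconds


def effective_settings(mode: str,
                       defaults: HealthCheckSettings,
                       custom: Optional[HealthCheckSettings] = None
                       ) -> Optional[HealthCheckSettings]:
    """Resolve the settings that apply to a node.

    Off     -> None (heartbeats are discarded, no health checking)
    Default -> the defaults from the infrastructure settings
    Custom  -> the node-specific override
    """
    if mode == "Off":
        return None
    if mode == "Default":
        return defaults
    if mode == "Custom":
        if custom is None:
            raise ValueError("Custom mode requires node-specific settings")
        return custom
    raise ValueError(f"unknown health check configuration: {mode!r}")
```

A new node would start in the equivalent of Default mode, so `effective_settings("Default", defaults)` simply returns the global values.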

1. Open the Infrastructure Settings Manager:

   Administration > Setup and Maintenance > Infrastructure Settings

   Alternatively, click Infrastructure Settings.
2. Select the context Monitoring Automation and scroll to the Monitoring Automation - Health Check Settings table.
3. Make sure health checking is enabled on the server. Set the value of Enable Health Check to true.
4. Optional. Modify the Default Agent Heartbeat Interval. This is the interval at which the agent sends heartbeat events to the server.

   Each time you change the agent heartbeat interval, a deployment job is created for all nodes that are configured to use the default settings from the infrastructure settings. For more information about deployment jobs, see Deployment Jobs.
5. Optional. Modify the Default Grace Period. This is the period that the server allows before generating an agent problem event or, if Agent & Server checking is enabled, before contacting the agent.
6. Optional. Configure the Default Health Check Type:
   - Agent Only configures the agent to send heartbeat events at a regular interval. When the server detects that a heartbeat event is missing, it waits for the grace period to pass before generating an event.
   - Agent & Server, in addition to configuring the agent to send heartbeat events, configures the server to contact the agent when heartbeat events are not arriving and the grace period has passed.
7. Optional. Change Enable Health Check on New Agents. By default, health checking is enabled for new agents. To disable health checking for new agents, change this setting to false.
8. Optional. Configure additional advanced server settings, such as the logging of agent up events and the severity of the event that the server generates when it detects a problem with the agent health.
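
The difference between the two health check types can be summarized as a small decision function. The sketch below only mirrors the behavior described in this task; the function name and the returned action labels are hypothetical, and what the server does after contacting the agent is not modeled.

```python
VALID_CHECK_TYPES = ("Agent Only", "Agent & Server")


def server_action(check_type: str,
                  seconds_since_last_heartbeat: float,
                  interval_s: float,
                  grace_period_s: float) -> str:
    """Return the (hypothetical) action label for a missing heartbeat.

    "wait"           -> interval plus grace period has not yet passed
    "generate_event" -> Agent Only: raise an agent problem event
    "contact_agent"  -> Agent & Server: actively check the agent
    """
    if check_type not in VALID_CHECK_TYPES:
        raise ValueError(f"unknown health check type: {check_type!r}")
    if seconds_since_last_heartbeat <= interval_s + grace_period_s:
        return "wait"
    if check_type == "Agent Only":
        return "generate_event"
    return "contact_agent"
```

For example, with an interval of 60 seconds and a grace period of 30 seconds, a silence of 100 seconds leads to `"generate_event"` for Agent Only and to `"contact_agent"` for Agent & Server.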

Troubleshooting

OPC_HB_MSG_INTERVAL configuration variable for Agents 12.00 and later

When you configure the agent heartbeat interval in the infrastructure settings or in the node properties, the configuration variable OPC_HB_MSG_INTERVAL is set on the agent system.

You can verify that the agent heartbeat interval is set correctly on the agent system by running an opr-agt -get_config_var query on the server (see also opr-agt Command-Line Interface).

Example:

`opr-agt -username myU -password myPwd -node_list "node1.example.com" -get_config_var eaagt:OPC_HB_MSG_INTERVAL`

Note: The heartbeat interval on the agent is configured in seconds.
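
If you run this query from a script, it can help to build the argument list in one place. The following Python sketch assembles the same command as in the example, using only the options shown there. The function name is hypothetical, and the format of the command output is not assumed, so the sketch does not parse it.

```python
import shlex
from typing import List


def build_get_config_var_command(
        username: str,
        password: str,
        node_list: str,
        variable: str = "eaagt:OPC_HB_MSG_INTERVAL") -> List[str]:
    """Build the opr-agt argument list from the documented example.

    node_list is passed through unchanged, exactly as it would be
    typed after -node_list on the command line.
    """
    return [
        "opr-agt",
        "-username", username,
        "-password", password,
        "-node_list", node_list,
        "-get_config_var", variable,
    ]


if __name__ == "__main__":
    cmd = build_get_config_var_command("myU", "myPwd", "node1.example.com")
    # Print a shell-safe rendering of the command for review.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on the server where opr-agt is installed.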

The agent sends heartbeat events at the interval configured for OPC_HB_MSG_INTERVAL. The server, however, expects to receive heartbeat events at the interval configured in OMi (the Default Agent Heartbeat Interval in the infrastructure settings, or the Agent Heartbeat Interval in the node properties). If the interval set on the agent is longer than the interval set on the server, the agent sends fewer heartbeat events than the server expects, and the server generates an agent problem event or initiates a health check even though the node is up.
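
A quick comparison of the two values shows whether such a mismatch would cause false alarms. The sketch below is an assumption-based illustration, not part of OMi: it takes all values in seconds and treats the grace period as extra tolerance, because the server expects a heartbeat within the interval plus the grace period.

```python
def agent_interval_too_long(agent_interval_s: int,
                            server_interval_s: int,
                            grace_period_s: int = 0) -> bool:
    """Return True if the agent sends heartbeats too rarely.

    agent_interval_s  : value of OPC_HB_MSG_INTERVAL on the agent (seconds)
    server_interval_s : heartbeat interval configured in OMi (seconds)
    grace_period_s    : grace period configured in OMi (seconds)

    Hypothetical check: network latency and processing delays are ignored.
    """
    return agent_interval_s > server_interval_s + grace_period_s
```

For example, an agent interval of 3600 seconds against a server interval of 1800 seconds and a grace period of 300 seconds returns `True`, which matches the false agent problem events described above.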

The next time you modify the agent heartbeat interval in the infrastructure settings or node properties, OPC_HB_MSG_INTERVAL on the agent is overwritten with the setting from the server. Therefore, do not change OPC_HB_MSG_INTERVAL manually on the agent system. Instead, use OMi to configure the Agent Heartbeat Interval.

For more information about the Operations Agent configuration variables, see the Operations Agent Help.

Health checking problems in manager-of-manager environments

By default, agents send heartbeat events only to their primary manager and only the primary manager checks the health of an agent; that is, only the primary manager expects to receive the heartbeat events from an agent. In Manager-of-Manager (MoM) environments, the agent evaluates the rules defined in the flexible management policy to determine the target manager of the heartbeat event.

Health check problems can occur, for example, when the agent does not send events because of a misconfiguration, when the agent sends the events to the wrong server, or when a server does not notice a failed agent or agent connection. Use the following troubleshooting tips to determine the cause of the problem and to identify the solution:

Symptom 1: The server generates an agent problem event even though the agent is running and the connection works.

Problem 1: The agent is not sending any heartbeat events because the OPC_HB_MSG_INTERVAL variable is not set or is set to the wrong interval.

Solution 1: Verify that the agent heartbeat interval is set correctly on the agent system. For details, see OPC_HB_MSG_INTERVAL configuration variable for Agents 12.00 and later.

Problem 2: The agent is not sending any heartbeat events because the scheduled task policy template is not deployed or the wrong heartbeat interval is set.

Solution 2: Verify that the policy template is deployed and that the agent heartbeat interval is set correctly on the agent system.

Problem 3: The agent sends heartbeat events to another server.

Solution 3: Check the flexible management policy on the agent. If the server does not receive the heartbeat events because it is not the primary manager, add event target rules to the policy that send the heartbeat events to the server expecting the events.

Heartbeat events have the following event attributes:

Event Attribute	Value
Title	`Heart Beat`
Event Key	`8c72e1fa-b1f1-4def-8c7e-71ecee643351`
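
When you inspect events outside of OMi, for example in an exported list, these two attributes are enough to recognize heartbeat events. The following Python sketch assumes that each event is available as a dictionary with the hypothetical keys `title` and `key`; the actual field names depend on how the events were exported.

```python
from typing import Any, Mapping

# Attribute values taken from the table above.
HEARTBEAT_TITLE = "Heart Beat"
HEARTBEAT_EVENT_KEY = "8c72e1fa-b1f1-4def-8c7e-71ecee643351"


def is_heartbeat_event(event: Mapping[str, Any]) -> bool:
    """Return True if both the title and the event key match.

    The dictionary keys "title" and "key" are assumptions made for
    this example.
    """
    return (event.get("title") == HEARTBEAT_TITLE
            and event.get("key") == HEARTBEAT_EVENT_KEY)
```

An event with a different title or a different key, or one in which either field is missing, is not treated as a heartbeat event.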