Set Up Application Failover

To deploy NNMi in an application failover cluster, install NNMi on two servers. This section refers to these two NNMi management servers as the active and standby servers. During normal operation, only the active server runs NNMi services.

The active and standby NNMi management servers are part of a cluster that monitors a heartbeat signal from both of the NNMi management servers. If the active server fails, resulting in the loss of its heartbeat, the standby server becomes the active server.

For application failover to work successfully, the NNMi management servers must meet the following requirements:

  • Both NNMi management servers must be running the same type of operating system. For example, if the active server is running a Linux operating system, the standby server must also be running a Linux operating system.
  • Both NNMi management servers must be running the same NNMi version. For example, if NNMi 10.30 is running on the active server, the identical NNMi version, NNMi 10.30, must be on the standby server. The NNMi patch levels must also be the same on both servers.
  • The system password must be the same on both NNMi management servers.
  • Do not completely disable HTTP access to NNMi before configuring application failover. After successfully configuring the application failover cluster, you can disable HTTP and other unencrypted access.
  • For NNMi installations on Windows operating systems, the %NnmDataDir% and %NnmInstallDir% system variables must be set to identical values on both servers.
  • Both NNMi management servers must be running the same database. For example, both NNMi management servers must be running Oracle or both NNMi management servers must be running the embedded database. You cannot mix the two database types if you plan to use the application failover feature.
  • Both NNMi management servers must have identical licensing attributes. For example, the node counts and licensed features must be identical.
  • The active and standby servers must have unrestricted network access to each other.

To configure an application failover cluster, use the wizard-based configuration or the manual configuration instructions in this topic.

Network Latency and Bandwidth Considerations

NNMi application failover works by exchanging a continuous heartbeat signal between the nodes in the cluster. It uses this same network channel for exchanging other data, such as the NNMi embedded database, database transaction logs, and other NNMi configuration files. A high-performance, low-latency connection is recommended when implementing NNMi application failover over a WAN (wide area network).

The NNMi embedded database can become quite large and can grow to 1 GB or more, even though this file is always compressed. Also, NNMi generates hundreds, or even thousands, of transaction logs during the built-in backup interval (a configuration parameter that defaults to six hours). Each transaction log can be several megabytes, up to a maximum size of 16 MB. (These files are also compressed.) Example data collected from a test environment is shown here:

Number of nodes managed: 15,000
Number of interfaces: 100,000
Time to complete spiral discovery of all expected nodes: 12 hours 
Size of database: 850MB (compressed)
During initial discovery: ~10 transaction logs per minute (peak of ~15/min)
-----------------------------
10 TxLogs/minute X 12 hours = 7200 TxLogs @ ~10MB = ~72GB

This is a lot of data to send over the network. If the network between the two nodes is unable to keep up with the bandwidth demands of NNMi application failover, the standby node can fall behind in receiving these database files. This could result in a larger window of potential data loss if the active server fails.
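
To put the example figures above in perspective, approximately 72 GB of transaction logs generated over 12 hours averages out to roughly 1.7 MB/s (about 13 Mb/s) of sustained transfer that the link between the nodes must absorb during initial discovery.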

Similarly, if the network between the two nodes has high latency or poor reliability, this could result in a false loss-of-heartbeat between the nodes. For example, this can happen when the heartbeat signal does not arrive in a timely manner and the standby node assumes that the active node has failed. Several factors are involved in detecting loss-of-heartbeat. NNMi avoids false failover notification as long as the network keeps up with the application failover data transfer needs.

Application failover works with both the embedded database and an Oracle database for NNMi 10.30. However, with Oracle, the database resides on a server that is separate from any NNMi management server. When you configure NNMi to work with an Oracle database, there is no database replication, which greatly reduces the network demands of application failover. With Oracle, application failover generates less than 1% of the network traffic that it generates with the embedded database. The information in this section describes NNMi application failover traffic when using the embedded database.

After you configure NNMi using the embedded database for application failover, NNMi does the following:

  1. The active node performs a database backup, storing the data in a single ZIP file.
  2. NNMi sends this ZIP file across the network to the standby node.
  3. The standby node expands the ZIP file, and configures the embedded database to import transaction logs on the first startup.
  4. The embedded database on the active node generates transaction logs, depending on database activity.
  5. Application failover sends the transaction logs across the network to the standby node, where they accumulate on the disk.
  6. When the standby node becomes active, NNMi starts, and the database imports all accumulated transaction logs. The amount of time this takes depends on the number of files and the complexity of the information stored within those files (some files take longer to import than other files of comparable size).
  7. After the standby node imports all of the transaction logs, the database becomes available, and the standby node starts the remaining NNMi processes.
  8. The original standby node is now active, and the procedure starts over at step 1.

Network Traffic in an Application Failover Environment

NNMi transfers many items across the network from the active node to the standby node in an application failover environment:

  • Database Activity: the database backup, as a single ZIP file.
  • Transaction logs.
  • A periodic heartbeat so that each application failover node verifies that the other node is still running.
  • File comparison lists so that the standby node can verify that its files are in sync with those on the active node.
  • Miscellaneous events, such as changes in parameters (enable/disable failover and others) and nodes joining or leaving the cluster.

The first two items generate 99% of the network traffic used by application failover. This section explores these two items in more detail.

Database Activity: NNMi generates transaction logs for all database activity. Database activity includes everything in NNMi. This activity includes, but is not limited to, the following database activities:

  • Discovering new nodes.
  • Discovering attributes about nodes, interfaces, VLANs, and other managed objects.
  • State polling and status changes.
  • Incidents, events, and root cause analysis.
  • Operator actions in the NNMi console.

Database activity is outside of your control. For example, an outage on the network results in NNMi generating many incidents and events. These incidents and events trigger state polling of devices on the network, resulting in updates to device status in NNMi. When the outage is resolved, additional node up incidents result in further status changes. All of this activity updates entries in the database.

Although the embedded database itself grows with database activity, it reaches a stable size for your environment, with only moderate growth over time.

Database Transaction Logs: The embedded database works by creating an empty 16 MB file, then writing database transaction information to that file. NNMi closes this file and makes it available to application failover after 15 minutes, or after writing 16 MB of data to the file, whichever comes first. That means a completely idle database generates one transaction log file every 15 minutes, and this file is essentially empty. Application failover compresses all transaction logs, so an empty 16 MB file compresses down to under 1 MB. A full 16 MB file compresses to about 8 MB. Keep in mind that during periods of higher database activity, application failover generates more transaction logs in a shorter period of time, because each file fills faster.
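
For example, a completely idle database generates four nearly empty transaction log files per hour (one every 15 minutes), or about 96 files per day; because each compresses to under 1 MB, this idle baseline amounts to less than roughly 100 MB of failover transfer per day.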

An Application Failover Traffic Test

The following test resulted in an average of about 2 transaction log files per minute, with an average file size of 7 MB per file. This is due to the database activity associated with discovery of the additional 5,000 nodes added with each failover event. The database in this test case eventually stabilized at about 1.1 GB (as measured by the size of the backup ZIP file), with 31,000 nodes and 960,000 interfaces.

Testing Method: During the first 4 hours, test personnel seeded NNMi with 5,000 nodes and waited until discovery stabilized. After 4 hours, test personnel induced failover (the standby node became active, and the previously active node became standby). Immediately after failover, test personnel added approximately 5,000 more nodes, waited another 4 hours to let the NNMi discovery process stabilize, then induced another failover (failed back to the previous active node). Test personnel repeated this cycle several times with some variation in the time between failovers (4 hours, then 6 hours, then 2 hours). After each failover event, test personnel measured the following:

  • The size of the database backup ZIP file (created when the node first became active).
  • The transaction logs: the total number of files and disk space utilization.
  • The number of nodes and interfaces in the NNMi database immediately before inducing failover.
  • Time to complete failover. This included the time from the initial ovstop command on the active node until the standby node became fully active with NNMi running.

The following table summarizes the results:

Application Failover Test Results

Hours   DB.zip Size (MB)   No. of Tx Logs   Tx Logs (GB)   Nodes    Interfaces   Failover Time (Minutes)
4       6.5                50               0.3            5,000    15,000       5
8       34                 500              2.5            12,000   222,000      10
12      243                500              2.5            17,000   370,000      25
16      400                500              3.5            21,500   477,000      23
20      498                500              3.5            25,500   588,000      32
26      618                1100             7.5            30,600   776,000      30
28      840                400              2.2            30,600   791,000      31
30      887                500              2.5            30,700   800,000      16

Observations: When NNMi transferred files from the active node to the standby node, the transfer averaged about 5 GB every 4 hours, which is a continuous throughput of approximately 350 KB/s (kilobytes per second) or 2.8 Mb/s (megabits per second).

This data does not include any other application failover traffic, such as the heartbeat, file consistency checks, or other application failover communication. It also excludes the overhead of network I/O, such as packet headers. The data includes only the actual network payload of each file's contents moving across the network.

The traffic generated by an NNMi application failover environment is very bursty. Application failover identifies new transaction logs on the active node every five minutes and sends these logs to the standby node. Depending on network speed, the standby node should receive all of the new files in a short time, resulting in a relatively idle network for the remainder of that 5-minute interval.

Every time the active and standby nodes switch roles (the standby node becomes active and the active node becomes standby), the new active node will generate a complete database backup and send this across the network to the new standby node. This database backup also occurs periodically, backing up every 24 hours by default. Every time NNMi generates a new backup, it sends this backup to the standby node. Having this new backup available on the standby node reduces the failover time, as all of the transaction logs NNMi generated in that 24 hour interval are already in the database, and do not need to be imported at failover time.

The information in this section helps you understand how the network might perform after a failover when using NNMi application failover with the embedded database.

Configure Application Failover with a Wizard

This procedure works only for an NNMi installation that is configured to use the embedded database. If you use Oracle, skip to Configure Application Failover Manually.

  1. Launch the Cluster Setup Wizard by entering the following into a supported Web browser:

    <NNMi-FQDN>:<port>/cluster

    In this instance, <NNMi-FQDN> is the FQDN of the server that is identified to be the active server; <port> is the HTTP or HTTPS port that NNMi is configured to use on this server.
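
    For example, if the FQDN of the active server is nnmi1.example.com (a placeholder name) and NNMi uses HTTPS port 443 on that server, you would enter:

      nnmi1.example.com:443/cluster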

  2. Type the NNMi system user name and password, and then click Login.
  3. On the Define Cluster Nodes page, provide the following details:

    Field                Description
    Local Hostname       Type the FQDN of the system that is identified to be the active server.
    Remote Cluster Node  Type the FQDN of the system that is identified to be the standby server.
  4. Click Next. The Communication Results page appears.

  5. On the Communication Results page, review the communication verification results. If an error occurs, click Previous and fix the problem; otherwise, click Next.

    The Define Cluster Properties page opens.

  6. Provide the following details:

    Field            Description
    Cluster Name     Specify a name for the cluster.
    Backup Interval  Specify the backup interval in hours. NNMi backs up the database of the active server at this interval.
  7. Click Next. The Define Cluster Ports page appears.

  8. On the Define Cluster Ports page, type the Starting Cluster Port and File Transfer Port values.

    NNMi in an application failover cluster uses four contiguous ports beginning with the Starting Cluster Port.
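
    For example, if you set the Starting Cluster Port to 7800 (an illustrative value), the cluster uses ports 7800 through 7803.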

  9. Click Next, review the summary, and then click Commit.
  10. Immediately stop NNMi by running the ovstop command on both nodes.
  11. Start NNMi on the desired active node by running the nnmcluster command.
  12. On the standby node, run the ovstart command.

Configure Application Failover Manually

To manually configure application failover, follow these steps:

  1. Stop NNMi on both the servers.

    To stop NNMi:

    1. Log on to the server as root or administrator.
    2. Run the following command:

      ovstop

  2. Log on to one of the servers, and then follow these steps:

    1. Open the following file with a text editor:

      • On Windows: %NnmDataDir%\shared\nnm\conf\props\nms-cluster.properties
      • On Linux: /var/opt/OV/shared/nnm/conf/props/nms-cluster.properties
    2. Set the com.hp.ov.nms.cluster.name parameter to a unique name (for example, MyCluster).

    3. Set the com.hp.ov.nms.cluster.member.hostnames parameter to the FQDNs of both the servers, separated by a comma.

      For example:

      com.hp.ov.nms.cluster.member.hostnames=server-A.domain.com,server-B.domain.com
    4. Optional. Specify other com.hp.ov.nms.cluster* parameters within the nms-cluster.properties file. Follow the instructions contained within the nms-cluster.properties file for modifying each parameter.
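
      For example, a minimal sketch that overrides one such parameter (the value shown simply restates the default directory described later in this topic; check the comments in nms-cluster.properties for each parameter's meaning and default):

      # Illustrative override; this matches the documented default location
      com.hp.ov.nms.cluster.archivedir = /var/opt/OV/shared/nnm/databases/Postgres_standby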

      If you use Oracle as your database, NNMi ignores the database parameters contained in the nms-cluster.properties file.

    5. Save the file.

    6. Transfer a copy of the nms-cluster.properties file to the following directory on the other server in the cluster:

      • On Windows: %NnmDataDir%\shared\nnm\conf\props
      • On Linux: /var/opt/OV/shared/nnm/conf/props

      Both the servers in the cluster must contain identical copies of the nms-cluster.properties file.

  3. Also transfer a copy of the following file from one server to the other:

    • On Windows: %NnmDataDir%\shared\nnm\conf\nnmcluster\cluster.keystore
    • On Linux: /var/opt/OV/shared/nnm/conf/nnmcluster/cluster.keystore
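
    On Linux, for example, you could transfer both the nms-cluster.properties and cluster.keystore files with scp (a sketch that assumes server-B.domain.com, as in the earlier hostname example, is the other server):

      # run on the server where the files were edited
      scp /var/opt/OV/shared/nnm/conf/props/nms-cluster.properties server-B.domain.com:/var/opt/OV/shared/nnm/conf/props/
      scp /var/opt/OV/shared/nnm/conf/nnmcluster/cluster.keystore server-B.domain.com:/var/opt/OV/shared/nnm/conf/nnmcluster/
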
  4. On each server, run the following command:

    nnmcluster

  5. Log on to the server that you identified as the active server, and then run the following command:

    nnmcluster -daemon

    As a result, this server assumes the active state.

  6. Log on to the server that you identified as the standby server, and then run the following command:

    nnmcluster -daemon

    As a result, this server assumes the standby state.

  7. If a failover occurs, the NNMi console no longer functions. Close the NNMi console session and initiate a browser session to the server that is currently active (originally the standby server).

    Instruct NNMi users to store two bookmarks in their browsers: one to the active NNMi management server and one to the standby NNMi management server. If a failover occurs, users can connect to the currently active NNMi management server by using the second bookmark.

  8. Instruct network operations center (NOC) personnel to configure their devices to send traps to both the servers.

Configure Cluster Communications

This is an optional procedure. During installation, NNMi queries all Network Interface Cards (NICs) on the system to find one to use for cluster communications (the first available NIC is chosen). If your system has multiple NICs, you can choose which NIC to use for nnmcluster operations by doing the following:

  1. Run nnmcluster -interfaces to list all available interfaces. For more information, see the nnmcluster reference page.
  2. Edit the following file:

    • Windows:

      %NnmDataDir%\conf\nnm\props\nms-cluster-local.properties
    • Linux:

      $NnmDataDir/conf/nnm/props/nms-cluster-local.properties
  3. Look for a line containing text similar to the following:

    com.hp.ov.nms.cluster.interface = <value>
  4. Change the value as desired.

    The interface value must pertain to a valid interface; otherwise, the cluster might not be able to start.
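
    For example, to bind cluster communication to a second NIC (eth1 is a hypothetical interface name; use a name reported by nnmcluster -interfaces):

      # eth1 is an example; substitute an interface listed by nnmcluster -interfaces
      com.hp.ov.nms.cluster.interface = eth1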

  5. Save the nms-cluster-local.properties file.

    The com.hp.ov.nms.cluster.interface parameter permits NNMi administrators to select the communication interface used for nnmcluster communication. This interface is not the interface used for the embedded database or Secure Sockets Layer communication.

    To force application failover communication over a specific interface, use the IP address of that interface in the com.hp.ov.nms.cluster.member.hostnames parameter instead of a hostname. Set the com.hp.ov.nms.cluster.member.hostnames parameter in the following file:

    Windows:

    %NnmDataDir%\shared\nnm\conf\props\nms-cluster.properties

    Linux:

    $NnmDataDir/shared/nnm/conf/props/nms-cluster.properties
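
    For example, a sketch using documentation addresses (192.0.2.10 and 192.0.2.20 stand in for the actual IP addresses of the interfaces on the active and standby servers):

    # example addresses only; use the real interface IP addresses
    com.hp.ov.nms.cluster.member.hostnames = 192.0.2.10,192.0.2.20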

Return to the Original Configuration Following a Failover

If the active node fails and the standby node is functioning as the active node, you can return to the original configuration after the former active node is fixed.

Perform the following steps:

  1. Fix the problem with the former active node.
  2. Run the following command on the desired active node to return to the original configuration:

    nnmcluster -acquire

For more information, see the nnmcluster reference page, or the Linux manpage.

Restart the NNMi Management Servers

You can restart the standby NNMi management server at any time with no special instructions. If you restart both the standby and active NNMi management servers, restart the active NNMi management server first.

To restart either the active or the standby NNMi management server, do the following.

  1. Run the nnmcluster -disable command on the NNMi management server to disable the application failover feature.
  2. Restart the NNMi management server.

    1. Run the ovstop command on the NNMi management server.
    2. Run the ovstart command on the NNMi management server.
  3. Run the nnmcluster -enable command on the NNMi management server to enable the application failover feature.

Restore NNMi Failover Environment

Restoring an NNMi failover environment on a different set of servers requires obtaining backups of both the NNMi active and standby systems, restoring them on the required servers, and changing the hostnames in certain property files.

To restore NNMi failover environments, follow these steps:

  1. Obtain a complete offline backup of all NNMi data on both the active and standby systems in the source failover environment.
  2. Copy the backup files to the respective destination active and standby systems.
  3. Install NNMi at the same version and patch level as were in place when the backup was taken.

  4. Restore NNMi data on both the active and standby systems.

    • Embedded Database: Use the nnmrestore.ovpl command to do a full restore.
    • Oracle Database: Use a restore command similar to the following to restore only the system files.

      nnmrestore.ovpl -partial -source nnmi_backups\offline\<newest_backup>
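
      For the embedded database, the full restore is a similar command without the -partial option (a sketch assuming the same backup location; see the nnmrestore.ovpl reference page for the supported options):

      nnmrestore.ovpl -source nnmi_backups\offline\<newest_backup>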

  5. On both active and standby NNMi management servers, do the following:

    1. Identify the hostnames of both the active and standby NNMi management servers.
    2. Open the following file.

      • Windows: %NnmDataDir%\shared\nnm\conf\props\nms-cluster.properties
      • Linux: $NnmDataDir/shared/nnm/conf/props/nms-cluster.properties
    3. Add the hostnames of both active and standby nodes to the com.hp.ov.nms.cluster.member.hostnames parameter.

      com.hp.ov.nms.cluster.member.hostnames = fqdn_for_active, fqdn_for_standby

  6. Configure the NNMi failover environment to use SSL certificates for secure communication.

Cluster File Transfer Warning Configurations

The NNMi application failover feature continuously synchronizes database and configuration files by periodically transferring them from the active server to the standby server. A network transport issue might cause these file transfers to fail, causing the databases to fall out of sync.

NNMi internally tracks how long file transfers have been failing and generates health warnings in the NNMi Health Report when the file transfers consistently fail. Different health warnings are generated for the durations specified in the table below. You can reconfigure these durations to meet your requirements.

Durations for Generating Health Warnings
Health Warning Level   Timeout Duration
Minor                  15 minutes
Major                  30 minutes
Critical               45 minutes

You can reconfigure the health warning timeout durations by uncommenting and modifying the following properties in the nms-cluster.properties file.

  • #com.hp.ov.nms.cluster.timeout.filetransfer.MINOR = 15

  • #com.hp.ov.nms.cluster.timeout.filetransfer.MAJOR = 30
  • #com.hp.ov.nms.cluster.timeout.filetransfer.CRITICAL = 45
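
For example, to raise the Minor warning threshold to 30 minutes (an illustrative value that also satisfies the scan-interval rule described below), remove the leading # and change the value:

  com.hp.ov.nms.cluster.timeout.filetransfer.MINOR = 30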

The nms-cluster.properties file is located at:

Windows: %NnmDataDir%\shared\nnm\conf\props\nms-cluster.properties

Linux: $NnmDataDir/shared/nnm/conf/props/nms-cluster.properties

The file transfer timeout duration for minor, major, or critical warnings must be:

  • longer than the directory scan interval
  • a multiple of the directory scan interval

For example, for a directory scan interval of 15 minutes, the transfer timeout durations in minutes can be 30 (15*2) for minor, 45 (15*3) for major, and 60 (15*4) for critical warnings. You can confirm the directory scan interval by checking the com.hp.ov.nms.cluster.timeout.scandir property in the nms-cluster.properties file.

Apply Patches to an Application Failover Environment

Both NNMi management servers must be running the same NNMi version and patch level. To add patches to the active and standby NNMi management servers, use one of the following procedures:

Apply Patches to Application Failover (Shut Down Both Active and Standby)

This procedure results in both NNMi management servers being non-active for some period of time during the patch process. To apply patches to the NNMi management servers configured for application failover, follow these steps:

  1. As a precaution, run the nnmconfigexport.ovpl script on both the active and standby NNMi management servers before proceeding.
  2. As a precaution, back up your NNMi data on both the active and standby NNMi management servers before proceeding.
  3. Note the com.hp.ov.nms.cluster.name property value in the nms-cluster.properties file. You will need this value after the patch installation. This file is in the following location:

    • Windows: %NnmDataDir%\shared\nnm\conf\props\nms-cluster.properties

    • Linux: $NnmDataDir/shared/nnm/conf/props/nms-cluster.properties

  4. As a precaution, on the active NNMi management server, do the following steps:
    1. Run the nnmcluster command.
    2. Embedded database only: After NNMi prompts you, type dbsync, then press Enter. Review the displayed information to make sure it includes the following messages:

      ACTIVE_DB_BACKUP: This means that the active NNMi management server is performing a new backup.

      ACTIVE_NNM_RUNNING: This means that the active NNMi management server completed the backup referred to by the previous message.

      STANDBY_READY: This shows the previous status of the standby NNMi management server.

      STANDBY_RECV_DBZIP: This means that the standby NNMi management server is receiving a new backup from the active NNMi management server.

      STANDBY_READY: This means that the standby NNMi management server is ready to take over if the active NNMi management server fails.

  5. Run the nnmcluster -halt command on the active NNMi management server. This shuts down all nnmcluster processes on both the active and standby NNMi management servers.
  6. To verify there are no nnmcluster nodes running on either server, complete the following steps on both the active and standby NNMi management servers.

    1. Run the nnmcluster command.
    2. Verify that there are no nnmcluster nodes present except the one marked (SELF).
    3. Run exit or quit to stop the interactive nnmcluster process you started in step a.
  7. On the active NNMi management server, comment out the com.hp.ov.nms.cluster.name parameter in the nms-cluster.properties file.

    1. Edit the following file:

      • Windows: %NNM_SHARED_CONF%\props\nms-cluster.properties
      • Linux: $NNM_SHARED_CONF/props/nms-cluster.properties
    2. Comment out the com.hp.ov.nms.cluster.name parameter.
    3. Save your changes.
  8. Apply the NNMi patch to the active NNMi management server using the instructions provided with the patch.
  9. On the active NNMi management server, uncomment the com.hp.ov.nms.cluster.name parameter in the nms-cluster.properties file.

    During patch installation the com.hp.ov.nms.cluster.name property value is replaced with the NNMi default value. After you uncomment the line that contains the com.hp.ov.nms.cluster.name parameter, you also need to replace the com.hp.ov.nms.cluster.name property value with the value that was configured before the patch was installed.

    1. Edit the following file:

      • Windows: %NNM_SHARED_CONF%\props\nms-cluster.properties
      • Linux: $NNM_SHARED_CONF/props/nms-cluster.properties
    2. Uncomment the com.hp.ov.nms.cluster.name parameter in the nms-cluster.properties file on the active NNMi management server.

    3. Replace the default value of the com.hp.ov.nms.cluster.name property with the name that was configured in nms-cluster.properties before the patch was installed.

    4. Save your changes.
  10. Run the ovstart command on the active NNMi management server.
  11. Verify that the patch installed correctly on the active NNMi management server by viewing information on the Product tab of the Help > System Information window in the NNMi console.
  12. Run the nnmcluster -dbsync command to create a new backup.
  13. On the standby NNMi management server, comment out the com.hp.ov.nms.cluster.name parameter in the nms-cluster.properties file as described in step 7.

  14. Apply the NNMi patch to the standby NNMi management server.

  15. On the standby NNMi management server, uncomment and update the com.hp.ov.nms.cluster.name parameter in the nms-cluster.properties file as described in step 9.

  16. Run the ovstart command on the standby NNMi management server.
  17. If you installed the NNM iSPI Performance for Metrics, run the NNM iSPI enablement script on both the active and standby NNMi management servers.

Apply Patches to Application Failover (Keep One Active NNMi Management Server)

This procedure results in one NNMi management server always being active during the patch process.

This process results in continuous monitoring of the network; however, NNMi loses the transaction logs that occur during the patch process.

To apply NNMi patches to the NNMi management servers configured for application failover, follow these steps:

  1. As a precaution, run the nnmconfigexport.ovpl script on both the active and standby NNMi management servers before proceeding.
  2. As a precaution, back up your NNMi data on both the active and standby NNMi management servers before proceeding.
  3. Note the com.hp.ov.nms.cluster.name property value in the nms-cluster.properties file. You will need this value after the patch installation. This file is in the following location:

    Windows: %NnmDataDir%\shared\nnm\conf\props\nms-cluster.properties

    Linux: $NnmDataDir/shared/nnm/conf/props/nms-cluster.properties

  4. Run nnmcluster on one of the nodes.
  5. Enter dbsync on the NNMi management server used in the previous step to synchronize the two databases.

    The dbsync option works on an NNMi management server using the embedded database. Do not use the dbsync option on an NNMi management server configured to use an Oracle database.

  6. Wait until the active NNMi management server reverts to ACTIVE_NNM_RUNNING and the standby NNMi management server reverts to STANDBY_READY before continuing.
  7. Exit or quit from the nnmcluster command.
  8. Stop the cluster on the standby NNMi management server by running the following command on the standby NNMi management server:
    nnmcluster -shutdown
  9. Make sure the following processes and services terminate before continuing:

    • postgres
    • ovjboss
  10. Make sure the nnmcluster process terminates before continuing. If the nnmcluster process does not terminate, manually kill it only as a last resort.

  11. Edit the following file on the standby NNMi management server:

    Windows: %NnmDataDir%\shared\nnm\conf\props\nms-cluster.properties

    Linux: $NnmDataDir/shared/nnm/conf/props/nms-cluster.properties

  12. Comment out the cluster name by placing a # at the front of the line, then save your changes:

    #com.hp.ov.nms.cluster.name = NNMicluster

  13. Install the NNMi patch on the standby NNMi management server.
  14. At this point, the standby NNMi management server is patched but stopped, and the active NNMi management server is unpatched but running. Stop the active NNMi management server and immediately bring the standby NNMi management server online to monitor your network.
  15. Shut down the cluster on the active NNMi management server by running the following command on the active NNMi management server:
    nnmcluster -halt
  16. Make sure the nnmcluster process terminates. If it does not terminate within a few minutes, manually kill the nnmcluster process.
  17. On the standby NNMi management server, uncomment the cluster name from the nms-cluster.properties file.

    During patch installation the com.hp.ov.nms.cluster.name property value is replaced with the NNMi default value. After you uncomment the line that contains the com.hp.ov.nms.cluster.name parameter, you also need to replace the com.hp.ov.nms.cluster.name property value with the value that was configured before the patch was installed.

    1. Edit the following file:

      • Windows: %NNM_SHARED_CONF%\props\nms-cluster.properties
      • Linux: $NNM_SHARED_CONF/props/nms-cluster.properties
    2. Uncomment the com.hp.ov.nms.cluster.name parameter in the nms-cluster.properties file on the standby NNMi management server.

    3. Replace the default value of the com.hp.ov.nms.cluster.name property with the name that was configured in nms-cluster.properties before the patch was installed.

    4. Save your changes.

  18. Start the cluster on the standby NNMi management server by running the following command on the standby NNMi management server:
    nnmcluster -daemon
  19. Install the NNMi patch on the active NNMi management server.
  20. At this point, the previous active NNMi management server is patched but offline. Bring it back into the cluster (as the new standby NNMi management server) by performing the following:

    1. Uncomment the com.hp.ov.nms.cluster.name parameter in the nms-cluster.properties file on this server.
    2. Replace the default value of the com.hp.ov.nms.cluster.name property with the name that was configured in nms-cluster.properties before the patch was installed.

    3. Start this NNMi management server using the following command:
      nnmcluster -daemon
  21. To monitor the progress, run the following command on both the active and standby NNMi management servers:

    nnmcluster

    Wait until the previous active NNMi management server finishes retrieving the database from the previous standby NNMi management server.

  22. After the previous active NNMi management server displays STANDBY_READY, run the following command on the previous active NNMi management server:
    nnmcluster -acquire
  23. If you installed the NNM iSPI Performance for Metrics, run the NNM iSPI enablement script on both the active and standby NNMi management servers.

Integrated Applications

When other software or third-party products are integrated with NNMi, the effect of NNMi application failover on an integration depends on how the product communicates with NNMi. For more information, see the appropriate integration document.

If an integrated product must be configured with information about the NNMi management server, the following information applies:

  • If the outage is expected to be long-term, you can update the NNMi management server information within the integrating product configuration. For more information, see the appropriate integration document.
  • If the outage appears to be temporary, you can resume using the integrating product after server X (the NNMi management server with which the product was configured) returns to service. To return server X to service, follow these steps:
  1. On server X, run the following command:

    nnmcluster -daemon

    Server X joins the cluster and assumes a standby state.

  2. On server X, run the following command:

    nnmcluster -acquire

    Server X changes to the active state.

If you anticipate that the original server X will be out of service for a longer time, you can update the NNMi management server IP address within the integrating product. For instructions on how to modify the IP address field, see the integrating product documentation.

Disable Application Failover

The following information explains how to completely disable application failover. Complete the following instructions, including actions on both the active and standby NNMi management servers configured in the application failover cluster.

  1. Run the nnmcluster -enable command on the active NNMi management server.
  2. Run the nnmcluster -shutdown command on the active NNMi management server.
  3. Wait a few minutes for the old standby NNMi management server to become the new active NNMi management server.
  4. Run the nnmcluster -display command on the new active (old standby) NNMi management server.
  5. Search the displayed results for the ACTIVE_NNM_RUNNING status. Repeat step 4 until you see the ACTIVE_NNM_RUNNING status.
  6. Run the nnmcluster -shutdown command on the new active (old standby) NNMi management server.
  7. Run the nnmcluster -display command repeatedly on the new active (old standby) NNMi management server until you no longer see a DAEMON process.
  8. Edit the following file on both NNMi management servers configured in the cluster:

    • Windows: %NnmDataDir%\shared\nnm\conf\props\nms-cluster.properties
    • Linux: $NnmDataDir/shared/nnm/conf/props/nms-cluster.properties
  9. Comment out the com.hp.ov.nms.cluster.name option on both NNMi management servers and save each file.
  10. Edit the following file on both NNMi management servers:

    • Windows: %NnmDataDir%\shared\nnm\databases\Postgres\postgresql.conf
    • Linux: $NnmDataDir/shared/nnm/databases/Postgres/postgresql.conf
  11. Remove the following lines, which application failover adds automatically. The following is an example of what these lines could look like; they might look slightly different on your server.

    # The following lines were added by the NNM cluster.
    archive_command = ...
    archive_timeout = 900
    max_wal_senders = 4
    archive_mode = 'on'
    wal_level = 'hot_standby'
    hot_standby = 'on'
    wal_keep_segments = 500
    listen_addresses = 'localhost,16.78.61.68'

    Make sure to save your changes.

  12. If these are Windows NNMi management servers, navigate to the Services (Local) console and do the following on each server:

    1. Set the Startup type for the NNM Cluster Manager to Disabled.
    2. Set the Startup type for the HP OpenView Process Manager to Automatic.
  13. Create the following trigger file, which tells Postgres to stop running in standby mode and to start fully running:

    Windows: %NnmDataDir%\tmp\postgresTriggerFile

    Linux: $NnmDataDir/tmp/postgresTriggerFile
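
    For example, on Linux you can create the empty trigger file with the touch command (here $NnmDataDir stands for the NNMi data directory, typically /var/opt/OV, if the variable is not set in your shell):

      touch $NnmDataDir/tmp/postgresTriggerFile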

  14. Run the ovstart command on the former active NNMi management server only. In the application failover configuration, this is the NNMi management server that has a permanent NNMi license.
  15. If you were using a non-production license on the former standby server, do not run the ovstart command on the former standby NNMi management server. In the application failover configuration, this is the NNMi management server that has a non-production license. To run this NNMi management server as a standalone server, you must purchase and install a permanent license.
  16. If both NNMi management servers start successfully, then remove the following directory from both the standby and active NNMi management servers:

    • Windows: %NnmDataDir%\shared\nnm\databases\Postgres_standby
    • Linux: $NnmDataDir/shared/nnm/databases/Postgres_standby

      This directory is a default directory and is the value of the com.hp.ov.nms.cluster.archivedir parameter located in the nms-cluster.properties file. These instructions assume you did not change this value. If you changed the value of the com.hp.ov.nms.cluster.archivedir parameter in the nms-cluster.properties file, then remove the directory that equates to the new value.

  17. Remove the following directory from both the standby and active NNMi management servers:

    • Windows: %NnmDataDir%\shared\nnm\databases\Postgres.OLD
    • Linux: $NnmDataDir/shared/nnm/databases/Postgres.OLD