Product Startup Times Out (Windows WSCS 2008)

After upgrading to NNMi 10.30, if the app resource (<resource>-app) in the Failover Cluster Manager changes from "Pending" to "Failed", there might be a timeout issue. If this situation occurs, do the following:

  1. Use the cluster log /gen command to generate the cluster.log file.
  2. Open the log located in the following directory:

    C:\Windows\cluster\reports\cluster.log

  3. If you see an error in the cluster.log file similar to the following, you have a DeadlockTimeout issue:

    ERR [RHS] Resource <resource-name>-APP handling deadlock. Cleaning current operation.

    DeadlockTimeout is the total time allowed for a failover during which the agent might be blocked. PendingTimeout is the time allowed for a single online or offline operation to complete. The default DeadlockTimeout is 45 minutes (2,700,000 milliseconds), and the default PendingTimeout is 30 minutes (1,800,000 milliseconds).

    You can change the DeadlockTimeout and PendingTimeout values. Both values are specified in milliseconds. For example, to set a DeadlockTimeout of 75 minutes (4,500,000 milliseconds) and a PendingTimeout of 60 minutes (3,600,000 milliseconds), run the following commands:

    cluster res "<resource group>-APP" /prop DeadlockTimeout=4500000
    cluster res "<resource group>-APP" /prop PendingTimeout=3600000

    See your High Availability vendor documentation for more information.
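
The procedure above can be sketched as a small helper script. This is only an illustration, not part of the product: the function names and the resource name nnmha-APP are hypothetical, and the search pattern is an assumption based solely on the sample error line in step 3 (real cluster.log lines carry extra prefix fields such as timestamps). The script looks for deadlock errors in the log text and builds the two cluster commands from timeouts given in whole minutes.

```python
import re

# Assumption: pattern derived from the sample error line in step 3.
# Real log lines may have a different prefix, so the match is kept loose.
DEADLOCK_PATTERN = re.compile(
    r"ERR\s+\[RHS\]\s+Resource\s+(.+?)\s+handling deadlock", re.IGNORECASE
)


def find_deadlock_errors(log_text):
    """Return the resource names reported as 'handling deadlock'."""
    return [match.group(1) for match in DEADLOCK_PATTERN.finditer(log_text)]


def minutes_to_ms(minutes):
    """Convert whole minutes to milliseconds (the unit the properties use)."""
    return minutes * 60 * 1000


def timeout_commands(resource, deadlock_minutes, pending_minutes):
    """Build the two cluster commands shown in step 3 for the given timeouts."""
    return [
        f'cluster res "{resource}" /prop DeadlockTimeout={minutes_to_ms(deadlock_minutes)}',
        f'cluster res "{resource}" /prop PendingTimeout={minutes_to_ms(pending_minutes)}',
    ]


if __name__ == "__main__":
    # Hypothetical log line and resource name, for illustration only.
    sample = "ERR [RHS] Resource nnmha-APP handling deadlock. Cleaning current operation."
    for name in find_deadlock_errors(sample):
        for command in timeout_commands(name, 75, 60):
            print(command)
```

The script only prints the commands so that you can review them before running them on a cluster node. To check a real log, read C:\Windows\cluster\reports\cluster.log into a string and pass it to find_deadlock_errors.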