Troubleshoot JGroups on TCP

This topic describes how to resolve some common issues that may occur when you configure JGroups to use TCP.

Issue Resolution
The sm -reportlbstatus command returns nothing when packet loss is higher than 5%.

This issue is caused by a JVM bug. If you experience this issue, run the sm -reportlbstatus command again.
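The rerun can be scripted. The following is a minimal sketch only: retry_until_output is a hypothetical helper, not part of Service Manager, and the attempt count and delay are arbitrary values that you would adjust for your environment.

```shell
#!/bin/sh
# retry_until_output MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it prints something to stdout,
# or until MAX_ATTEMPTS attempts have been made.
retry_until_output() {
    max_attempts=$1
    delay_seconds=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Capture stdout only; a failing exit status is treated like empty output.
        output=$("$@" 2>/dev/null) || true
        if [ -n "$output" ]; then
            printf '%s\n' "$output"
            return 0
        fi
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay_seconds"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (assumes the sm executable is on the PATH):
#   retry_until_output 5 2 sm -reportlbstatus
```

The function returns a nonzero status if every attempt produces no output, so a calling script can tell an empty result from a successful one.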

When you start Service Manager from Service Console on Windows and there are more than 30 servlets configured on one host, you receive the following error message:

Error 1053: The service did not respond to the start or control request in a timely fashion

This issue does not affect the startup process. You can ignore this error message.

When you shut down and then immediately restart all the nodes on a host, errors that resemble the following are logged in the sm.log file:

JRTE I address: 59cfb5bb-82ba-1e02-b42a-7a043a29b3a1 is suspected of having crashed.

JGRP000032: IWFVM02454-55985: no physical address for 59cfb5bb-82ba-1e02-b42a-7a043a29b3a1, dropping message

This issue does not affect functionality. You can ignore this error message.

The following message is logged in the sm.log file on a Linux-based system:

JGRP000032: IWFVM02439-49391: no physical address for 5f31f288-5c4f-b2f6-9742-16693a1bcc65, dropping message

Increase the maximum number of processes for the Service Manager user until this message no longer appears.

For example, run the following command to increase the maximum number of user processes:

ulimit -S -u 95536

If this command fails due to a lack of permissions, contact your system administrator.
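A user without root privileges can raise a soft limit only as far as the hard limit, which is one common reason the ulimit command is refused. The following sketch checks this before the limit is changed. The can_raise_soft_limit function is a hypothetical helper, and 95536 is simply the example value used in this topic.

```shell
#!/bin/sh
# can_raise_soft_limit TARGET HARD_LIMIT
# Hypothetical helper: succeeds if TARGET does not exceed HARD_LIMIT.
# HARD_LIMIT is either a number or the word "unlimited", as printed by ulimit.
can_raise_soft_limit() {
    target=$1
    hard_limit=$2
    if [ "$hard_limit" = "unlimited" ]; then
        return 0
    fi
    [ "$target" -le "$hard_limit" ]
}

# Example (bash, run as the Service Manager user):
#   if can_raise_soft_limit 95536 "$(ulimit -H -u)"; then
#       ulimit -S -u 95536
#   else
#       echo "Hard limit is too low; ask your system administrator to raise it."
#   fi
```

A limit that is set with ulimit applies only to the current shell and the processes started from it, so the command would need to run in the same session that starts Service Manager.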

Some Service Manager processes are running, but the sm -reportlbstatus command returns only the command itself:

------------------Server Instances---------------------------
ProcessID ClusterAddress HttpPort HttpsPort Sessions DbgMode QMode LB State LowMem JAVA_USED/MAX/PERCENT HEAP_USED/MAX/PERCENT
----------Non Server Instances----------------------------
ProcessID ClusterAddress State LowMem JAVA_USED/MAX/PERCENT HEAP_USED/MAX/PERCENT Command Line
14196 LIHAOS2-14196 RUN N (3763512/67108864/5) (338305024/4294836224/7.) -reportlbstatus

Check whether the GossipRouter is started and is working as expected.

If the GossipRouter is started and working as expected, check whether a GossipRouter is configured in sm.ini but is not started or is down. In this situation, messages that resemble the following are printed to the sm.log file:

failed reconnecting stub to GR at peterenvlx1.cpe.sm.com/192.168.0.8:12001: java.lang.Exception: Could not connect to peterenvlx1.cpe.sm.com/192.168.0.8:12001

In this situation, start the GossipRouter process.
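To find out which GossipRouter addresses the nodes failed to reach, you can search the sm.log file for the message shown above. The following is a rough sketch that assumes the log lines match that example format. The list_unreachable_routers function is a hypothetical helper, not a Service Manager command.

```shell
#!/bin/sh
# list_unreachable_routers LOG_FILE
# Hypothetical helper: prints each distinct GossipRouter address (host/ip:port)
# that appears in a "failed reconnecting stub to GR at ..." message.
list_unreachable_routers() {
    sed -n 's/.*failed reconnecting stub to GR at \([^ ]*\): .*/\1/p' "$1" | sort -u
}

# Example (the path to sm.log depends on your installation):
#   list_unreachable_routers sm.log
```

Each address that is printed identifies a GossipRouter that was down or unreachable at the time the message was logged.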