【Symptom】
PowerHA version: cluster.es.server.rte 6.1.0.17, running with SNMPv3.
The /usr/es/sbin/cluster/utilities/cldump command fails with the following error:
rshexec: cannot connect to node frontdb1
cldump: Waiting for the Cluster SMUX peer (clstrmgrES) to stabilize....rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
.rshexec: cannot connect to node frontdb1
The /usr/es/sbin/cluster/clstat command, however, runs normally:
                clstat - HACMP Cluster Status Monitor
                -------------------------------------

Cluster: frontdb_cluster (1491332842)
Wed Jun 22 14:17:20 BEIST 2016
                State: UP               Nodes: 2
                SubState: STABLE

        Node: frontdb1          State: UP
           Interface: frontdb1_boot1 (0)        Address: 1.1.8.25
                                                State:   UP
           Interface: frontdb1_boot2 (0)        Address: 1.2.8.25
                                                State:   UP
           Interface: frontdb1_tty0_01 (2)      Address: 0.0.0.0
                                                State:   UP
           Interface: frontdb_svc (0)           Address: 192.1.8.25
                                                State:   UP
           Resource Group: frontdb_rg           State:   On line

        Node: frontdb2          State: UP
           Interface: frontdb2_boot1 (0)        Address: 1.1.8.26
                                                State:   UP
           Interface: frontdb2_boot2 (0)        Address: 1.2.8.26
                                                State:   UP
           Interface: frontdb2_tty0_01 (2)      Address: 0.0.0.0
                                                State:   UP
【Problem Analysis】
We had previously hit a case where an incorrect SNMPv3 configuration file left PowerHA 6.1 unable to report cluster status, but in that case both clstat and cldump failed. Here the behavior is different: clstat runs normally, and only cldump fails, reporting "Waiting for the Cluster SMUX peer (clstrmgrES) to stabilize....rshexec: cannot connect to node frontdb1".
Following that earlier experience, we edited the SNMPv3 configuration file /etc/snmpdv3.conf and checked the following four entries (a quick way to verify they are present is sketched after the list):
(1) VACM_VIEW defaultView internet - included -
(2) VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -
(3) COMMUNITY public public noAuthNoPriv 0.0.0.0 0.0.0.0 -
(4) smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
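As a quick check before editing anything, the four entries can be confirmed with grep. This is only a minimal sketch, assuming the stock AIX file location and the "public" community string used here:
# Minimal sketch: confirm the four required entries exist in /etc/snmpdv3.conf
grep "VACM_VIEW defaultView internet" /etc/snmpdv3.conf
grep "VACM_VIEW defaultView 1\.3\.6\.1\.4\.1\.2\.3\.1\.2\.1\.5" /etc/snmpdv3.conf
grep "^COMMUNITY[ 	]*public" /etc/snmpdv3.conf
grep "^smux[ 	]*1\.3\.6\.1\.4\.1\.2\.3\.1\.2\.1\.5" /etc/snmpdv3.conf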
After the edits, the /etc/snmpdv3.conf file reads as follows:
VACM_GROUP group1 SNMPv1 public -

VACM_VIEW defaultView internet - included -
VACM_VIEW defaultView 1.3.6.1.4.1.2.2.1.1.1.0 - included -
VACM_VIEW defaultView 1.3.6.1.4.1.2.6.191.1.6 - included -

# exclude snmpv3 related MIBs from the default view
VACM_VIEW defaultView snmpModules - excluded -
VACM_VIEW defaultView 1.3.6.1.6.3.1.1.4 - included -
VACM_VIEW defaultView 1.3.6.1.6.3.1.1.5 - included -
VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -

# exclude aixmibd managed MIBs from the default view
VACM_VIEW defaultView 1.3.6.1.4.1.2.6.191 - excluded -

VACM_ACCESS group1 - - noAuthNoPriv SNMPv1 defaultView - defaultView -

NOTIFY notify1 traptag trap -
TARGET_ADDRESS Target1 UDP 127.0.0.1 traptag trapparms1 - - -
TARGET_PARAMETERS trapparms1 SNMPv1 SNMPv1 public noAuthNoPriv -

COMMUNITY public public noAuthNoPriv 0.0.0.0 0.0.0.0 -

DEFAULT_SECURITY no-access - -

logging file=/usr/tmp/snmpdv3.log enabled
logging size=100000 level=0

smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
Then restart the snmp service:
stopsrc -s snmpd; sleep 5; startsrc -s snmpd
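After the restart it is worth confirming that snmpd and the cluster manager are actually up before blaming SNMP; a minimal sketch using the standard AIX SRC commands (the expected output strings are assumptions based on a healthy node):
# Minimal sketch: confirm snmpd is active and the cluster manager is stable
lssrc -s snmpd                          # should report "active"
lssrc -ls clstrmgrES | grep -i state    # should report a stable state (ST_STABLE)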
The symptom persisted: clstat still ran normally, and cldump kept failing with the same error.
【Solution】
The root cause is a bug in the /usr/es/sbin/cluster/utilities/cldump script: it invokes clrexec by its bare name, so the call fails when the utility cannot be found that way. The fix is to edit the file manually with vi and prepend the absolute path /usr/es/sbin/cluster/utilities/ to clrexec in the line
GET_MIB_INFO="clrexec $ACTIVE_NODE SNMPINFO"
Before the change:
# Gather info from active node.  Obtain only what
# is needed, so as not to exceed the environment limit
GET_MIB_INFO="clrexec $ACTIVE_NODE SNMPINFO"
After the change:
# Gather info from active node.  Obtain only what
# is needed, so as not to exceed the environment limit
GET_MIB_INFO="/usr/es/sbin/cluster/utilities/clrexec $ACTIVE_NODE SNMPINFO"
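If you prefer not to edit the script by hand, the same change can be scripted with sed. This is only a sketch: it assumes the GET_MIB_INFO line appears exactly as shown above, and it keeps a backup copy so the original can be restored:
# Sketch only: back up cldump, then prepend the absolute path to clrexec
# (verify the resulting line with vi afterwards)
cp /usr/es/sbin/cluster/utilities/cldump /usr/es/sbin/cluster/utilities/cldump.orig
sed 's|GET_MIB_INFO="clrexec |GET_MIB_INFO="/usr/es/sbin/cluster/utilities/clrexec |' \
    /usr/es/sbin/cluster/utilities/cldump.orig > /usr/es/sbin/cluster/utilities/cldump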
After the change, cldump runs successfully:
<frontdb1>:#./cldump

Obtaining information via SNMP from Node: frontdb1...

_____________________________________________________________________________
Cluster Name: frontdb_cluster
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________

Node Name: frontdb1              State: UP

  Network Name: net_ether_01     State: UP

    Address: 1.1.8.25        Label: frontdb1_boot1      State: UP
    Address: 1.2.8.25        Label: frontdb1_boot2      State: UP
    Address: 192.1.8.25      Label: frontdb_svc         State: UP

  Network Name: net_rs232_02     State: UP

    Address:                 Label: frontdb1_tty0_01    State: UP

Node Name: frontdb2              State: UP

  Network Name: net_ether_01     State: UP

    Address: 1.1.8.26        Label: frontdb2_boot1      State: UP
    Address: 1.2.8.26        Label: frontdb2_boot2      State: UP

  Network Name: net_rs232_02     State: UP

    Address:                 Label: frontdb2_tty0_01    State: UP


Cluster Name: frontdb_cluster

Resource Group Name: frontdb_rg
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node                         Group State
---------------------------- ---------------
frontdb1                     ONLINE
frontdb2                     OFFLINE