
Heartbeat split-brain failure, iptables edition (with a detailed fix)

Date: 2013-05-23 14:39  Source: www.linuxyw.com  Author: admin


Yesterday I ran a heartbeat experiment and ended up with a split-brain condition.
First, my test environment:

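OS: CentOS 5.8, 64-bit
Platform: VMware virtual machines

Hostname    Role     NIC   IP             Notes
drfdai-21   master   eth0  192.168.1.233  bridged (external network)
                     eth1  100.100.1.233  VM3 (heartbeat link)
                     eth2  172.1.1.233    VM2 (internal network)

drfdai-22   backup   eth0  192.168.1.234  bridged (external network)
                     eth1  100.100.1.234  VM3 (heartbeat link)
                     eth2  172.1.1.234    VM2 (internal network)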

VIP: 192.168.1.235

hosts file entry on the master:
172.1.1.233   drfdai-21

hosts file entry on the backup:
172.1.1.234   drfdai-22

A host route to the peer's heartbeat address was added on each server:
route add -host 100.100.1.234 dev eth1   # on the master
route add -host 100.100.1.233 dev eth1   # on the backup

Configuration files: (omitted)
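
Since the configuration files are omitted, the following is only a rough sketch of what a matching heartbeat 2.1.3 setup could look like. It is not the author's actual configuration. The multicast group, port, ttl/loop values, node names, and the resource line are taken from the log and tcpdump excerpts in this post. `keepalive 2` and `initdead 120` are guesses based on the 2-second packet interval and the 2-minute delay before the peer is declared dead. `deadtime`, `warntime`, `auto_failback`, and the scratch directory variable `HA_SKETCH_DIR` are assumptions for illustration. An `authkeys` file (mode 600) is also required and is not shown.

```shell
#!/bin/sh
# Sketch only: values are inferred from the logs in this post or assumed,
# not copied from the author's real files.
# write_ha_conf DIR  -> writes DIR/ha.cf and DIR/haresources
write_ha_conf() {
    dir=$1
    mkdir -p "$dir" || return 1

    # ha.cf: cluster membership and heartbeat transport
    cat > "$dir/ha.cf" <<'EOF'
logfile /var/log/ha-log
keepalive 2
deadtime 30
warntime 10
initdead 120
mcast eth1 225.0.0.181 694 1 0
auto_failback on
node drfdai-21
node drfdai-22
EOF

    # haresources: drfdai-21 is the preferred owner of the VIP
    cat > "$dir/haresources" <<'EOF'
drfdai-21 IPaddr::192.168.1.235/24/eth0
EOF
}

# Write to a scratch directory first; review before copying into /etc/ha.d.
write_ha_conf "${HA_SKETCH_DIR:-/tmp/ha.d-sketch}"
```

Both nodes need the same `mcast` line and the same `haresources` content. The node names must match the output of `uname -n` on each machine.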


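With the environment set up, heartbeat installed, and the configuration files in place,
I started heartbeat on both machines and found that both had acquired the VIP and bound it to eth0:0,
as shown here: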
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:F7:5E:31  
          inet addr:192.168.1.235  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
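
To confirm the symptom quickly on each node, a small helper along these lines may be handy. It is a sketch, and `has_addr` is a hypothetical name. It reads interface listings from stdin and reports whether a given IPv4 address is bound, accepting both the `ifconfig` format shown above and the `ip addr` format.

```shell
#!/bin/sh
# has_addr ADDR: read `ifconfig` or `ip addr` output on stdin and succeed
# if ADDR appears in it as an IPv4 address.
has_addr() {
    # Escape the dots so they only match literal dots in the regex.
    pat=$(printf '%s' "$1" | sed 's/\./\\./g')
    # "inet addr:A " is the ifconfig form, "inet A/24" is the ip addr form.
    grep -Eq "inet (addr:)?${pat}([ /]|\$)"
}

# Run on each node. If both nodes print "VIP present", the cluster is split.
if /sbin/ifconfig 2>/dev/null | has_addr 192.168.1.235; then
    echo "VIP present"
else
    echo "VIP absent"
fi
```

In a healthy two-node setup only one node should report the VIP at any time.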
 
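At first I assumed the configuration was wrong. I searched through a lot of material online and found nothing amiss, then asked a friend for his configuration files, swapped them in, and restarted. Same problem.
The logs were odd as well: the master and the backup each declared the other dead ("node drfdai-22: is dead" on the master, "node drfdai-21: is dead" on the backup).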

Log excerpt (from the master, drfdai-21):
heartbeat[5812]: 2013/05/23_11:57:43 info: Heartbeat generation: 1369255042
heartbeat[5812]: 2013/05/23_11:57:43 info: glib: UDP multicast heartbeat started for group 225.0.0.33 port 694 interface eth1 (ttl=1 loop=0)
heartbeat[5812]: 2013/05/23_11:57:43 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5812]: 2013/05/23_11:57:43 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5812]: 2013/05/23_11:57:43 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[5812]: 2013/05/23_11:57:43 info: Local status now set to: 'up'
heartbeat[5812]: 2013/05/23_11:59:43 WARN: node drfdai-22: is dead
heartbeat[5812]: 2013/05/23_11:59:43 info: Comm_now_up(): updating status to active
heartbeat[5812]: 2013/05/23_11:59:43 info: Local status now set to: 'active'
heartbeat[5812]: 2013/05/23_11:59:43 WARN: No STONITH device configured.
heartbeat[5812]: 2013/05/23_11:59:43 WARN: Shared disks are not protected.
heartbeat[5812]: 2013/05/23_11:59:43 info: Resources being acquired from drfdai-22.
harc[5822]:     2013/05/23_11:59:43 info: Running /etc/ha.d/rc.d/status status
mach_down[5889]:        2013/05/23_11:59:43 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[5889]:        2013/05/23_11:59:43 info: mach_down takeover complete for node drfdai-22.
heartbeat[5812]: 2013/05/23_11:59:44 info: mach_down takeover complete.
heartbeat[5812]: 2013/05/23_11:59:44 info: Initial resource acquisition complete (mach_down)
IPaddr[5865]:   2013/05/23_11:59:44 INFO:  Resource is stopped
heartbeat[5823]: 2013/05/23_11:59:44 info: Local Resource acquisition completed.
harc[5959]:     2013/05/23_11:59:44 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[5959]:  2013/05/23_11:59:44 received ip-request-resp IPaddr::192.168.1.235/24/eth0 OK yes
ResourceManager[5980]:  2013/05/23_11:59:44 info: Acquiring resource group: drfdai-21 IPaddr::192.168.1.235/24/eth0
IPaddr[6007]:   2013/05/23_11:59:44 INFO:  Resource is stopped
ResourceManager[5980]:  2013/05/23_11:59:44 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.235/24/eth0 start
IPaddr[6105]:   2013/05/23_11:59:44 INFO: Using calculated netmask for 192.168.1.235: 255.255.255.0
IPaddr[6105]:   2013/05/23_11:59:44 INFO: eval ifconfig eth0:0 192.168.1.235 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[6076]:   2013/05/23_11:59:44 INFO:  Success
heartbeat[5812]: 2013/05/23_11:59:54 info: Local Resource acquisition completed. (none)
heartbeat[5812]: 2013/05/23_11:59:54 info: local resource transition completed.

Log excerpt (from the backup, drfdai-22):
heartbeat[5526]: 2013/05/23_12:00:31 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[5527]: 2013/05/23_12:00:31 info: heartbeat: version 2.1.3
heartbeat[5527]: 2013/05/23_12:00:31 info: Heartbeat generation: 1369255059
heartbeat[5527]: 2013/05/23_12:00:31 info: glib: UDP multicast heartbeat started for group 225.0.0.234 port 694 interface eth1 (ttl=1 loop=0)
heartbeat[5527]: 2013/05/23_12:00:31 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5527]: 2013/05/23_12:00:31 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5527]: 2013/05/23_12:00:31 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[5527]: 2013/05/23_12:00:31 info: Local status now set to: 'up'
heartbeat[5527]: 2013/05/23_12:02:32 WARN: node drfdai-21: is dead
heartbeat[5527]: 2013/05/23_12:02:32 info: Comm_now_up(): updating status to active
heartbeat[5527]: 2013/05/23_12:02:32 info: Local status now set to: 'active'
heartbeat[5527]: 2013/05/23_12:02:32 WARN: No STONITH device configured.
heartbeat[5527]: 2013/05/23_12:02:32 WARN: Shared disks are not protected.
heartbeat[5527]: 2013/05/23_12:02:32 info: Resources being acquired from drfdai-21.
harc[5541]:     2013/05/23_12:02:32 info: Running /etc/ha.d/rc.d/status status
heartbeat[5542]: 2013/05/23_12:02:32 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys drfdai-22] to acquire.
mach_down[5558]:        2013/05/23_12:02:32 info: Taking over resource group IPaddr::192.168.1.235/24/eth0
ResourceManager[5596]:  2013/05/23_12:02:32 info: Acquiring resource group: drfdai-21 IPaddr::192.168.1.235/24/eth0
IPaddr[5623]:   2013/05/23_12:02:32 INFO:  Resource is stopped
ResourceManager[5596]:  2013/05/23_12:02:32 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.235/24/eth0 start
IPaddr[5721]:   2013/05/23_12:02:32 INFO: Using calculated netmask for 192.168.1.235: 255.255.255.0
IPaddr[5721]:   2013/05/23_12:02:32 INFO: eval ifconfig eth0:0 192.168.1.235 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[5692]:   2013/05/23_12:02:32 INFO:  Success
mach_down[5558]:        2013/05/23_12:02:32 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[5558]:        2013/05/23_12:02:32 info: mach_down takeover complete for node drfdai-21.
heartbeat[5527]: 2013/05/23_12:02:32 info: mach_down takeover complete.
heartbeat[5527]: 2013/05/23_12:02:32 info: Initial resource acquisition complete (mach_down)
heartbeat[5527]: 2013/05/23_12:02:42 info: Local Resource acquisition completed. (none)
heartbeat[5527]: 2013/05/23_12:02:42 info: local resource transition completed.
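
The telltale line in both excerpts is "WARN: node ...: is dead". A small filter like the one below can pull those out of a long log. This is a sketch: `dead_peers` is a hypothetical helper name, and the log path is the one used later in this post.

```shell
#!/bin/sh
# dead_peers: read heartbeat log lines on stdin and print the name of each
# node that was declared dead, one per line, without duplicates.
dead_peers() {
    sed -n 's/.* WARN: node \([^:]*\): is dead.*/\1/p' | sort -u
}

# Run on each node. If each node lists the other, the two have lost contact
# with one another while both are actually still running.
if [ -r /var/log/ha-log ]; then
    dead_peers < /var/log/ha-log
fi
```
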
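
I checked the traffic with tcpdump and saw heartbeat packets from both nodes arriving normally. Note that tcpdump captures inbound packets before iptables filters them, so seeing them here does not prove that heartbeat itself received them. An excerpt: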
[root@drfdai-22 ~]# tcpdump -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
14:23:32.605022 IP 100.100.1.234.44856 > 225.0.0.181.ha-cluster: UDP, length 212
14:23:32.779021 IP 100.100.1.233.34088 > 225.0.0.181.ha-cluster: UDP, length 212
14:23:34.605654 IP 100.100.1.234.44856 > 225.0.0.181.ha-cluster: UDP, length 212
14:23:34.779734 IP 100.100.1.233.34088 > 225.0.0.181.ha-cluster: UDP, length 212
14:23:36.604790 IP 100.100.1.234.44856 > 225.0.0.181.ha-cluster: UDP, length 212

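Only at the very end did the firewall occur to me: this environment was left over from an earlier experiment that required the firewall to be enabled.
So I turned the firewall off and set SELinux to disabled as well.
After restarting heartbeat on both machines, everything worked. Here is the log on the backup (drfdai-22):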
tail -f /var/log/ha-log 
heartbeat[5795]: 2013/05/23_13:57:44 info: glib: UDP multicast heartbeat started for group 225.0.0.181 port 694 interface eth1 (ttl=1 loop=0)
heartbeat[5795]: 2013/05/23_13:57:44 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5795]: 2013/05/23_13:57:44 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5795]: 2013/05/23_13:57:44 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[5795]: 2013/05/23_13:57:44 info: Local status now set to: 'up'
heartbeat[5795]: 2013/05/23_13:57:45 info: Link drfdai-21:eth1 up.
heartbeat[5795]: 2013/05/23_13:57:45 info: Status update for node drfdai-21: status up
harc[5802]:     2013/05/23_13:57:45 info: Running /etc/ha.d/rc.d/status status
heartbeat[5795]: 2013/05/23_13:57:45 info: Comm_now_up(): updating status to active
heartbeat[5795]: 2013/05/23_13:57:45 info: Local status now set to: 'active'
heartbeat[5795]: 2013/05/23_13:57:46 info: Status update for node drfdai-21: status active
harc[5822]:     2013/05/23_13:57:46 info: Running /etc/ha.d/rc.d/status status
heartbeat[5795]: 2013/05/23_13:57:56 info: remote resource transition completed.
heartbeat[5795]: 2013/05/23_13:57:56 info: remote resource transition completed.
heartbeat[5795]: 2013/05/23_13:57:56 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[5838]: 2013/05/23_13:57:56 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys drfdai-22] to acquire.

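Now only the master (drfdai-21) holds the VIP; the backup (drfdai-22) no longer has it.
When I stop heartbeat on the master, the backup (drfdai-22) takes over the VIP immediately. When I start the master (drfdai-21) again, it takes the VIP back from the backup. Everything behaves as expected.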

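The firewall was the culprit yet again, and half a day was wasted on it.
Learn from my mistake: check the firewall before you start an experiment like this.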
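
Turning the firewall off entirely worked for this lab. A narrower alternative would be to leave iptables running and allow only the heartbeat traffic. The sketch below assumes the firewall was dropping UDP port 694 on eth1; the post does not show the actual rule set, and SELinux was disabled at the same time, so this is an untested guess at the minimal fix. `heartbeat_rules` is a hypothetical helper that only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# heartbeat_rules IFACE [PORT]: print (do not run) iptables commands that
# accept heartbeat UDP traffic on the dedicated heartbeat interface.
heartbeat_rules() {
    iface=$1
    port=${2:-694}   # 694 is the heartbeat port seen in the logs above
    echo "iptables -I INPUT -i $iface -p udp --dport $port -j ACCEPT"
    echo "iptables -I OUTPUT -o $iface -p udp --dport $port -j ACCEPT"
}

# Print the rules for the heartbeat link used in this post.
heartbeat_rules eth1 694
```

After reviewing the output, it can be applied as root on both nodes with `heartbeat_rules eth1 | sh`, and persisted on CentOS 5 with `service iptables save`. Running `iptables -L INPUT -n -v` before and after shows the per-rule packet counters, which helps confirm whether heartbeat packets were being dropped.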
 

If you repost this article, please credit linux系统运维:
http://www.linuxyw.com/linux/yunweiguzhang/20130523/442.html
