
[Help] OSPF adjacencies between Cisco backbone devices keep dropping abnormally

yufengmin75029
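Level 1
Last edited by yufengmin75029 on 2020-4-25 22:07
[Image: network topology, 214448kzjh3jewvwo3kgoj.jpg]
The topology is shown above: GSR12416 as the egress routers, CRS-3 as the core routers, in a dual-homed distributed design.
Recently the OSPF adjacencies between the routers have suddenly started dropping and re-establishing. I have already ruled out link problems and interface MTU problems. Could the experts here give me some direction? The log on GSR-A is below (the device addresses are confidential, so they are replaced with the device names from the diagram above):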
.Apr 25 05:05:03: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from FULL to DOWN, Neighbor Down: Too many retransmitions
Apr 25 05:05:03: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from FULL to DOWN, Neighbor Down: Too many retransmitions
.Apr 25 05:06:03: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from DOWN to DOWN, Neighbor Down: Ignore timer expired
.Apr 25 05:06:03: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from DOWN to DOWN, Neighbor Down: Ignore timer expired
.Apr 25 05:06:09: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from DOWN to INIT, Received Hello
.Apr 25 05:06:09: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from INIT to 2WAY, 2-Way Received
.Apr 25 05:06:09: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from 2WAY to EXSTART, AdjOK?
.Apr 25 05:06:09: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from EXSTART to EXCHANGE, Negotiation Done
.Apr 25 05:06:09: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from EXCHANGE to LOADING, Exchange Done
.Apr 25 05:06:09: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from LOADING to FULL, Loading Done
.Apr 25 05:06:11: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from DOWN to INIT, Received Hello
.Apr 25 05:06:11: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from INIT to 2WAY, 2-Way Received
.Apr 25 05:06:11: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from 2WAY to EXSTART, AdjOK?
.Apr 25 05:06:11: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from EXSTART to EXCHANGE, Negotiation Done
.Apr 25 05:06:11: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from EXCHANGE to LOADING, Exchange Done
.Apr 25 05:06:11: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from LOADING to FULL, Loading Done
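This link is a WDM link, and there are no interface up/down alarms in the log. I have never run into this before, so any pointers would be greatly appreciated.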
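To put numbers on the flaps, the ADJCHG lines can be parsed and each FULL to DOWN to FULL cycle timed per interface. The following is a minimal sketch, assuming the classic IOS syslog layout quoted above; the function names `parse_time` and `outages` are made up for illustration, and the year 2020 is an assumption taken from the post date because the log lines do not carry a year.

```python
import re
from datetime import datetime

# Matches classic IOS ADJCHG lines like the ones quoted in the post.
# The optional leading "." or "*" is the IOS clock-status marker.
ADJCHG = re.compile(
    r"^[.*]?(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}): %OSPF-5-ADJCHG: "
    r"Process (?P<proc>\d+), Nbr (?P<nbr>\S+) on (?P<intf>\S+) "
    r"from (?P<old>\w+) to (?P<new>\w+), (?P<reason>.*)$"
)


def parse_time(ts, year=2020):
    # Assumption: the log has no year, so one is supplied by the caller.
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def outages(lines):
    """Return (neighbor, interface, seconds_down, reason) per FULL -> ... -> FULL cycle."""
    down_since = {}
    result = []
    for line in lines:
        m = ADJCHG.match(line.strip())
        if not m:
            continue
        key = (m["nbr"], m["intf"])
        when = parse_time(m["ts"])
        if m["old"] == "FULL" and m["new"] != "FULL":
            # Adjacency just left FULL: remember when and why.
            down_since[key] = (when, m["reason"])
        elif m["new"] == "FULL" and key in down_since:
            start, reason = down_since.pop(key)
            seconds = int((when - start).total_seconds())
            result.append((key[0], key[1], seconds, reason))
    return result


SAMPLE = """\
.Apr 25 05:05:03: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from FULL to DOWN, Neighbor Down: Too many retransmitions
Apr 25 05:05:03: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from FULL to DOWN, Neighbor Down: Too many retransmitions
.Apr 25 05:06:03: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from DOWN to DOWN, Neighbor Down: Ignore timer expired
.Apr 25 05:06:09: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet5/0/0 from LOADING to FULL, Loading Done
.Apr 25 05:06:11: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet3/0/0 from LOADING to FULL, Loading Done
""".splitlines()

for nbr, intf, seconds, reason in outages(SAMPLE):
    print(f"{nbr} on {intf}: down {seconds}s ({reason})")
```

On the GSR-A excerpt this reports 66 seconds for GigabitEthernet5/0/0 and 68 seconds for GigabitEthernet3/0/0. Running the same function over a full day of logs gives a per-interface list that is easier to compare across routers than the raw syslog.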
Log from GSR-B:
.Apr 25 05:05:33: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet12/0/0 from FULL to DOWN, Neighbor Down: Too many retransmitions
.Apr 25 05:05:33: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet0/0/0 from FULL to DOWN, Neighbor Down: Too many retransmitions
.Apr 25 05:06:33: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet12/0/0 from DOWN to DOWN, Neighbor Down: Ignore timer expired
.Apr 25 05:06:33: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet0/0/0 from DOWN to DOWN, Neighbor Down: Ignore timer expired
.Apr 25 05:06:38: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet12/0/0 from LOADING to FULL, Loading Done
.Apr 25 05:06:42: %OSPF-5-ADJCHG: Process 10, Nbr CRS-B on GigabitEthernet0/0/0 from LOADING to FULL, Loading Done
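The outage times match and the messages are the same, but this link is a direct fiber run inside the building, so the link itself is not the problem.
Log from CRS-B: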
RP/0/RP1/CPU0:Apr 22 09:50:02 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr GSR-B on TenGigE0/1/0/7 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:50:08 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr C7609-B on TenGigE0/5/0/3 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:50:10 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr GSR-A on TenGigE0/1/0/2 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:50:10 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr GSR-A on TenGigE0/1/0/10 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:50:12 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr GSR-B on TenGigE0/1/0/3 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:51:08 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr C7609-B on TenGigE0/5/0/3 in area 0 from INIT to 2WAY, 2-Way Received, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:51:08 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr C7609-B on TenGigE0/5/0/3 in area 0 from 2WAY to EXSTART, AdjOK?, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:51:08 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr C7609-B on TenGigE0/5/0/3 in area 0 from EXSTART to EXCHANGE, Negotiation Done, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:51:08 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr C7609-B on TenGigE0/5/0/3 in area 0 from EXCHANGE to LOADING, Exchange Done, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:51:08 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr C7609-B on TenGigE0/5/0/3 in area 0 from LOADING to FULL, Loading Done, vrf default vrfid 0x60000000
The log entries after this are all the OSPF adjacencies re-forming.
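One thing worth checking in the CRS-B log is whether the drops toward different neighbors happen together. The sketch below, assuming the IOS XR syslog layout quoted above, groups FULL-to-anything transitions that occur within a short window of each other. The name `drop_clusters`, the 30-second window, and the year 2020 are illustrative assumptions, not anything taken from the devices.

```python
import re
from datetime import datetime, timedelta

# Matches IOS XR ADJCHG lines like the CRS-B excerpt in the post.
XR_ADJCHG = re.compile(
    r"(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) : ospf\[\d+\]: "
    r"%ROUTING-OSPF-5-ADJCHG : Process (?P<proc>\S+), Nbr (?P<nbr>\S+) "
    r"on (?P<intf>\S+) in area (?P<area>\S+) "
    r"from (?P<old>\w+) to (?P<new>\w+), (?P<reason>[^,]+)"
)


def drop_clusters(lines, window_s=30, year=2020):
    """Group adjacency drops whose timestamps are at most window_s apart."""
    drops = []
    for line in lines:
        m = XR_ADJCHG.search(line)
        if m and m["old"] == "FULL" and m["new"] != "FULL":
            when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            drops.append((when, m["nbr"], m["intf"]))
    drops.sort()
    clusters = []
    for event in drops:
        # Extend the current cluster if this drop follows the previous one closely.
        if clusters and event[0] - clusters[-1][-1][0] <= timedelta(seconds=window_s):
            clusters[-1].append(event)
        else:
            clusters.append([event])
    return clusters


SAMPLE = """\
RP/0/RP1/CPU0:Apr 22 09:50:02 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr GSR-B on TenGigE0/1/0/7 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:50:08 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr C7609-B on TenGigE0/5/0/3 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:50:10 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr GSR-A on TenGigE0/1/0/2 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:50:10 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr GSR-A on TenGigE0/1/0/10 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:50:12 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr GSR-B on TenGigE0/1/0/3 in area 0 from FULL to INIT, 1-Way, vrf default vrfid 0x60000000
RP/0/RP1/CPU0:Apr 22 09:51:08 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process dqyt, Nbr C7609-B on TenGigE0/5/0/3 in area 0 from INIT to 2WAY, 2-Way Received, vrf default vrfid 0x60000000
""".splitlines()

for cluster in drop_clusters(SAMPLE):
    neighbors = sorted({nbr for _, nbr, _ in cluster})
    print(f"{cluster[0][0]}: {len(cluster)} drops toward {', '.join(neighbors)}")
```

On the excerpt above, all five drops fall into a single cluster spanning 10 seconds and three different neighbors. The script only shows the timing pattern; it does not by itself say what caused it.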
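These drops happen 2 to 3 times every day. I have been investigating for a long time without finding the cause, and have ruled out link problems, traffic congestion, interface MTU, high device CPU, and so on.
Please take a look and let me know if there is a way to fix this.
10 replies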

YilinChen
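Spotlight
Last edited by YilinChen on 2020-4-26 09:43
Use BFD, or tune the OSPF hello interval; on Cisco this can go down to the millisecond level.
If the interfaces already have that kind of configuration, and what you see is OSPF adjacencies dropping while the link itself is passing traffic normally, consider lengthening the hello interval (back to the default value) so that OSPF does not declare the neighbor timed out.
Other things to check:
1. Check NSF and related features.
2. The log shows a VRF, which is another direction to investigate.
3. I don't know how large the OSPF domain is in production; check whether there are too many LSAs.
4. If you have a support contract, open a case.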

This looks a bit like a loop. If the hello, MTU, and dead settings are all fine, check whether the VRRP/HSRP configuration has a problem.

This feels a bit like a loop. If the hello and dead timers are fine, take a look at VRRP and HSRP.

yunqing
Level 1
If you are able to, open a case.

Wubin2010
Spotlight
Take a look at the show interface output and check whether CRC, error, and similar counters show anything abnormal.
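If there are many interfaces to look at, the error counters can be pulled out of saved `show interfaces` text with a small script. This is a rough sketch, assuming the usual IOS layout of an unindented "is up, line protocol is up" header followed by indented counter lines. The sample text and its values are invented for illustration and are not output from the devices in this thread; `error_counters` and `suspects` are hypothetical helper names.

```python
import re

# Interface header lines start in column 0; counter lines are indented.
HEADER = re.compile(r"^(?P<intf>\S+) is .*line protocol is ")
COUNTERS = re.compile(
    r"(\d+) (input errors|CRC|frame|overrun|ignored|output errors)\b"
)


def error_counters(show_output):
    """Return {interface: {counter_name: value}} from show interfaces text."""
    result = {}
    current = None
    for line in show_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header["intf"]
            result[current] = {}
        elif current:
            for value, name in COUNTERS.findall(line):
                result[current][name] = int(value)
    return result


def suspects(counters):
    """List interfaces that have at least one non-zero error counter."""
    return [intf for intf, c in counters.items() if any(c.values())]


# Illustrative sample only: the numbers below are made up.
SAMPLE = """\
GigabitEthernet3/0/0 is up, line protocol is up
     12 input errors, 7 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 1 interface resets
GigabitEthernet5/0/0 is up, line protocol is up
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 0 interface resets
"""

print(suspects(error_counters(SAMPLE)))
```

With the made-up sample, only GigabitEthernet3/0/0 is listed. Counter names and line layout differ between platforms (IOS XR output is formatted differently), so the patterns would need adjusting per device type.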

18653465190
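Spotlight
From the OP's logs, the OSPF adjacencies on ports 3, 0, and 2 keep forming and then going DOWN. I suspect the OSPF router-ids configured on the WAN routers and the core routers conflict. Could you check that?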
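A duplicate router-id is quick to rule in or out once the router-id of every device has been collected. The sketch below assumes a hand-built mapping of hostname to router-id; the hostnames are the ones used in this thread, but the router-id values are placeholders from the documentation range, since the real addresses were not posted.

```python
from collections import defaultdict


def duplicate_router_ids(inventory):
    """Return {router_id: [hostnames]} for every router-id used more than once."""
    by_id = defaultdict(list)
    for host, router_id in inventory.items():
        by_id[router_id].append(host)
    return {rid: sorted(hosts) for rid, hosts in by_id.items() if len(hosts) > 1}


# Hypothetical inventory: the addresses are placeholders, not real values.
INVENTORY = {
    "GSR-A": "192.0.2.1",
    "GSR-B": "192.0.2.2",
    "CRS-B": "192.0.2.3",
    "C7609-B": "192.0.2.2",  # deliberately duplicated to show a hit
}

print(duplicate_router_ids(INVENTORY))
```

With the placeholder data this reports that 192.0.2.2 is shared by C7609-B and GSR-B. An empty result means no two devices in the inventory share a router-id.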

hanxiaorui08
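Spotlight
This looks a bit like a loop. Also check whether any routers in the network have conflicting addresses such as router-ids. If the hello, MTU, and dead settings are all fine, check whether the VRRP/HSRP configuration has a problem.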

likuo
Spotlight
You need to check carefully; something is wrong in the configuration.

Mansur
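Spotlight
If nobody has changed the configuration in production, consider whether the intermediate links have been rerouted or re-engineered, and check the physical layer: cables, fiber, and optical modules.
See whether any new error counts have appeared under the physical interfaces.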
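Since the interesting signal is counters that have grown, not their absolute values, two snapshots taken some time apart can be compared. This is a minimal sketch, assuming each snapshot has already been reduced to a dictionary of counters per interface; the function name `incremented` and all numbers in the example are made up.

```python
def incremented(before, after):
    """Return {interface: {counter: delta}} for counters that grew between snapshots."""
    changes = {}
    for intf, counters in after.items():
        old = before.get(intf, {})
        delta = {
            name: value - old.get(name, 0)
            for name, value in counters.items()
            if value > old.get(name, 0)
        }
        if delta:
            changes[intf] = delta
    return changes


# Hypothetical snapshots taken before and after a flap; values are invented.
BEFORE = {
    "GigabitEthernet3/0/0": {"input errors": 12, "CRC": 7},
    "GigabitEthernet5/0/0": {"input errors": 0, "CRC": 0},
}
AFTER = {
    "GigabitEthernet3/0/0": {"input errors": 40, "CRC": 31},
    "GigabitEthernet5/0/0": {"input errors": 0, "CRC": 0},
}

print(incremented(BEFORE, AFTER))
```

With the invented snapshots, only GigabitEthernet3/0/0 shows growth (28 input errors, 24 CRC). Taking the snapshots right before and right after one of the daily flaps would show whether errors line up with the outages.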

wuhao0015
Spotlight
I have run into this kind of situation before. I think it has nothing to do with device configuration and is related to link quality.