We have what looks to be a spanning tree problem that's disabling links between our switches. We're not sure why and are hoping someone may have the answer.
Basic network architecture:
- Our core is a pair of 8132F switches configured as a stack
- The edge switches are 7048Ps, and each has a single physical or logical (LACP) link back to the core
- All links from core to edge are configured as trunks
- None of the edge switches are linked to each other, so there should be no possibility of a loop
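To sanity-check the "no loop" claim: a loop-free Layer 2 topology is a tree, so a union-find pass over the links will flag any link that joins two switches that are already connected. This is a minimal Python sketch; the switch names are hypothetical, and it assumes the 8132F stack acts as a single bridge and that each LACP port-channel counts as one logical link (the logs show STP acting on Po2/Po4 rather than on their member ports).

```python
from collections.abc import Iterable


def has_loop(links: Iterable[tuple[str, str]]) -> bool:
    """Return True if the undirected links contain a cycle (union-find)."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        # Path halving keeps the union-find trees shallow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# Hypothetical names: the stacked core is one bridge, and each LACP
# port-channel is one logical link.
star = [("core", "switch1"), ("core", "switch2"), ("core", "switch3")]
print(has_loop(star))  # False

# An accidental cable between two edge switches would close a loop.
print(has_loop(star + [("switch2", "switch3")]))  # True
```

A star of single logical links back to the core is a tree, which matches the description above.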
In terms of spanning tree configuration:
- All switches have 802.1w RSTP enabled
- The 8132F stack is configured with a root bridge priority of 4096
- The 7048P closest to the core is configured with a root bridge priority of 8192
- All other edge switches are set with the default priority of 32768
- Edge ports on all switches are manually set to portfast
- Trunk links between the core and edge switches do not have portfast enabled
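RSTP elects as root the bridge with the numerically lowest bridge ID, compared first on priority and then on MAC address, so the priorities above should make the core stack root, with the 7048P at 8192 next in line. The Python sketch below models only that election; the MAC addresses are made-up placeholders, and the range check reflects the 802.1w rule that priorities are set in steps of 4096 from 0 to 61440.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # configured bridge priority
    mac: str       # base MAC address (placeholder values below)

    def bridge_id(self) -> tuple[int, int]:
        # Priorities must be multiples of 4096 in 0..61440; the low 12 bits
        # of the 16-bit priority field hold the system ID extension.
        if self.priority % 4096 or not 0 <= self.priority <= 61440:
            raise ValueError(f"{self.name}: invalid priority {self.priority}")
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    """The bridge with the lowest (priority, MAC) pair becomes root."""
    return min(bridges, key=Bridge.bridge_id)


bridges = [
    Bridge("8132F core stack", 4096, "00:00:00:00:00:03"),
    Bridge("7048P closest to core", 8192, "00:00:00:00:00:01"),
    Bridge("7048P edge", 32768, "00:00:00:00:00:02"),
]
print(elect_root(bridges).name)      # 8132F core stack
# If the core stack were lost, the 8192 switch would take over as root.
print(elect_root(bridges[1:]).name)  # 7048P closest to core
```

Note that the core wins despite having the highest placeholder MAC: the MAC only breaks ties between equal priorities.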
All firmware is current: v6.1.0.6 on the 8132F stack and v5.1.5.1 on the 7048Ps.
Below is an excerpt from the logs on the core switch. For reference, these are the core switch ports connecting to the edge switches involved:
- Te1/0/22 is a single 1GbE copper link to edge 7048P switch 1
- Te1/0/1 and Te2/0/1 are aggregated 10GbE fiber links forming LACP port-channel 2 to edge 7048P switch 2
- Te1/0/5 and Te2/0/5 are aggregated 10GbE fiber links forming LACP port-channel 4 to edge 7048P switch 3
The sequence of events begins after rebooting edge switch 1, which is attached to port Te1/0/22:
Aug 25 19:27:24 TRAPMGR Link Up: Te1/0/22
Aug 25 19:27:24 TRAPMGR Te1/0/22 is transitioned from the Forwarding state to the Blocking state in instance 0
Aug 25 19:27:27 TRAPMGR Te1/0/22 is transitioned from the Learning state to the Forwarding state in instance 0
Aug 25 19:27:39 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
Aug 25 19:27:39 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Te1/0/22
Aug 25 19:27:41 TRAPMGR Link Down: Te2/0/1
Aug 25 19:27:41 TRAPMGR Link on Te2/0/1 is failed
Aug 25 19:27:41 TRAPMGR Link Down: Te1/0/1
Aug 25 19:27:41 TRAPMGR Link on Te1/0/1 is failed
Aug 25 19:27:41 TRAPMGR Po2 is transitioned from the Forwarding state to the Blocking state in instance 0
Aug 25 19:27:41 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Te1/0/22
<We reboot edge switch 2, which restores its link>
Aug 25 19:37:58 TRAPMGR Link Up: Te1/0/1
Aug 25 19:37:58 TRAPMGR Link Up: Te2/0/1
Aug 25 19:38:02 DRIVER Port USP : 1/0/0 LAG USP : 0/3/1
Aug 25 19:38:02 DRIVER Port USP : 2/0/0 LAG USP : 0/3/1
Aug 25 19:38:02 TRAPMGR Po2 is transitioned from the Forwarding state to the Blocking state in instance 0
Aug 25 19:38:05 TRAPMGR Po2 is transitioned from the Learning state to the Forwarding state in instance 0
Aug 25 19:38:18 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
Aug 25 19:38:20 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Po2
Aug 25 19:38:22 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Po2
Aug 25 19:38:22 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
Aug 25 19:38:22 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Po2
Aug 25 19:38:25 TRAPMGR Link Down: Te2/0/5
Aug 25 19:38:25 TRAPMGR Link on Te2/0/5 is failed
Aug 25 19:38:25 TRAPMGR Link Down: Te1/0/5
Aug 25 19:38:25 TRAPMGR Link on Te1/0/5 is failed
Aug 25 19:38:25 TRAPMGR Po4 is transitioned from the Forwarding state to the Blocking state in instance 0
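Lining up the timestamps shows that each link failure follows, by a few seconds, a topology change received on a different port. The Python sketch below does that pairing mechanically. The PORT_TO_LINK mapping comes from the port list above and the log lines are copied from the excerpt; it assumes all entries fall on the same day, since only HH:MM:SS is used. It shows the timing correlation only, not the cause.

```python
import re

# Member ports -> logical link to each edge switch (from the port list above).
PORT_TO_LINK = {
    "Te1/0/22": "switch 1",
    "Te1/0/1": "Po2 (switch 2)", "Te2/0/1": "Po2 (switch 2)",
    "Te1/0/5": "Po4 (switch 3)", "Te2/0/5": "Po4 (switch 3)",
}

# "Aug 25 19:27:24 TRAPMGR <message>"; DRIVER lines and annotations won't match.
TRAP = re.compile(r"^\w{3} +\d+ (\d\d):(\d\d):(\d\d) TRAPMGR (.*)$")


def link_failures_after_tc(log: str) -> list[tuple[str, str, int]]:
    """Pair each failed logical link with the most recent TC received before it.

    Returns (failed link, port the TC arrived on, seconds between them).
    """
    results: list[tuple[str, str, int]] = []
    seen: set[str] = set()
    last_tc = None  # (seconds since midnight, receiving port)
    for raw in log.splitlines():
        m = TRAP.match(raw.strip())
        if not m:
            continue
        hh, mm, ss, msg = m.groups()
        t = int(hh) * 3600 + int(mm) * 60 + int(ss)
        if msg.startswith("Spanning Tree Topology Change Received"):
            last_tc = (t, msg.rsplit(" ", 1)[-1])
        elif msg.startswith("Link Down: ") and last_tc is not None:
            port = msg.split(": ", 1)[1]
            link = PORT_TO_LINK.get(port, port)
            if link not in seen:  # report each LAG once, not once per member
                seen.add(link)
                results.append((link, last_tc[1], t - last_tc[0]))
    return results


LOG = """
Aug 25 19:27:39 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
Aug 25 19:27:39 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Te1/0/22
Aug 25 19:27:41 TRAPMGR Link Down: Te2/0/1
Aug 25 19:27:41 TRAPMGR Link on Te2/0/1 is failed
Aug 25 19:27:41 TRAPMGR Link Down: Te1/0/1
Aug 25 19:27:41 TRAPMGR Link on Te1/0/1 is failed
Aug 25 19:27:41 TRAPMGR Po2 is transitioned from the Forwarding state to the Blocking state in instance 0
Aug 25 19:27:41 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Te1/0/22
Aug 25 19:38:02 TRAPMGR Po2 is transitioned from the Forwarding state to the Blocking state in instance 0
Aug 25 19:38:05 TRAPMGR Po2 is transitioned from the Learning state to the Forwarding state in instance 0
Aug 25 19:38:18 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
Aug 25 19:38:20 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Po2
Aug 25 19:38:22 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Po2
Aug 25 19:38:22 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
Aug 25 19:38:22 TRAPMGR Spanning Tree Topology Change Received: MSTID: 0 Po2
Aug 25 19:38:25 TRAPMGR Link Down: Te2/0/5
Aug 25 19:38:25 TRAPMGR Link on Te2/0/5 is failed
Aug 25 19:38:25 TRAPMGR Link Down: Te1/0/5
Aug 25 19:38:25 TRAPMGR Link on Te1/0/5 is failed
Aug 25 19:38:25 TRAPMGR Po4 is transitioned from the Forwarding state to the Blocking state in instance 0
"""

for link, tc_port, secs in link_failures_after_tc(LOG):
    print(f"{link} failed {secs}s after a TC received on {tc_port}")
```

Run against this excerpt, it prints "Po2 (switch 2) failed 2s after a TC received on Te1/0/22" and "Po4 (switch 3) failed 3s after a TC received on Po2".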
To summarize: when the link to switch 1 comes up after its reboot, the link to switch 2 is disabled. When we then reboot switch 2 and its link is restored, the link to switch 3 goes down. I suspect the links are being disabled on the edge switches rather than on the core, but since the logs are lost on reboot I can't be sure. Can logging be configured to survive reboots? For the time being we have disabled spanning tree to keep these links up, but we would like to resolve the problem so it can be re-enabled.
Can anyone suggest why these links fail after a topology change, especially given that there are no redundant paths? What steps should we take to resolve it?
Thanks