Channel: PowerConnect Forum - Recent Threads

PowerConnect PCM8024-k (v5.1.6.3) switch stack fails after removal of switch master and re-insertion


I have been testing failover on a pair of PowerConnect PCM8024-k blade switches before putting production workloads on them.  The switches are in the "A" fabric slots of the blade chassis, i.e. A1 and A2.  They are stacked using a pair of twinax (DAC) cables, and stacking was configured according to the Dell best practices guides.  The stack works properly under normal operating conditions.  We have a total of four uplink ports to the core switches in a single 4-member LACP port-channel: two on switch A1 and two on switch A2.  The Dell M620 blades are running VMware ESXi 5.5 Update 2 with a few test VMs.

To perform failover testing, I physically unplug one PowerConnect switch module at a time.  When I unplug switch A1 (the Master), all traffic continues to pass through switch A2.  All VLANs stay up, the LACP port-channel stays up (with two of its four uplinks), and the VMware blades keep working (each has a vSwitch with two uplinks, one to the A1 switch and one to the A2 switch).  Half of the uplinks are down, but all traffic still passes, which is what we would expect.

The problem occurs, oddly enough, when I plug the A1 switch back in.  After the A1 switch boots up, ALL NETWORK TRAFFIC STOPS on both the A1 and A2 switches.  Everything goes down.  The workaround is bizarre: I unplug switch A1 again, and network traffic begins to pass again (on switch A2).  Then I plug switch A1 back in a second time, let it boot up, and network traffic passes correctly on both A1 and A2.  I have tested and re-tested this a dozen times and it is 100% reproducible.  I also tested it in a different data center with a completely different blade chassis and completely different PowerConnect PCM8024-k switches, and saw exactly the same problem.
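For anyone who wants to reproduce this and put numbers on it, here is a minimal sketch of how the outage windows could be pulled out of a reachability log taken during the unplug/re-plug cycle.  It assumes you already have a chronological list of (timestamp, reachable) samples, for example from pinging a test VM once per second.  The names `Outage` and `find_outages` and the sample log are made up for illustration; this is not anything from Dell or from the switch CLI.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Outage:
    start: float  # timestamp of the first failed probe in the window
    end: float    # timestamp of the first successful probe afterwards

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_outages(samples: Iterable[Tuple[float, bool]]) -> List[Outage]:
    """Collapse chronological (timestamp, reachable) samples into outage windows."""
    outages: List[Outage] = []
    down_since: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts                      # outage begins
        elif ok and down_since is not None:
            outages.append(Outage(down_since, ts))  # outage ends
            down_since = None
        last_ts = ts
    if down_since is not None and last_ts is not None:
        # Still unreachable when the log ends: close the window at the last sample.
        outages.append(Outage(down_since, last_ts))
    return outages


if __name__ == "__main__":
    # Hypothetical log: reachable, then unreachable for a while, then reachable again.
    log = [(0, True), (1, True), (2, False), (3, False), (4, True), (5, True)]
    for o in find_outages(log):
        print(f"down from t={o.start}s to t={o.end}s ({o.duration}s)")
```

Running one such log per test cycle would show whether the outage after the first re-insertion of A1 ever clears on its own, or only after A1 is pulled again.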

All switches are running firmware v5.1.6.3.

I suspect this is a firmware bug, possibly relating to the Master switch going offline, then coming back online, thinking it is still the Master, while the other switch thinks it is the Master.  Perhaps the second reboot is enough to force the stack to figure out who the Master is?

Has anyone else run into this problem?  Does Dell know about this bug?  Is there a workaround or a fix?  It's a pretty serious bug in my opinion, because the failure of a switch, and the subsequent replacement of that switch, will cause the entire network to go down.

Thanks.

