We've got a new M1000e chassis, manufactured in 2014, with an A and B fabric each consisting of a Force10 MXL with a dual QSFP+ daughter card. The chassis is loaded with a new set of M630 blades that have Broadcom dual-port 10GbE cards for the B Fabric and similar quad-port cards for the A Fabric. We also have a pair of S4810s dedicated to front-end IP traffic and another pair dedicated to iSCSI only. The FE/IP switches hang off the B Fabric and the iSCSI switches off the A Fabric. We also have several new PS6610 arrays (S/X) that are dedicated to this infrastructure.
The problem we're having is that NO blades seem to be able to pass traffic over the A Fabric internally, so I can't pass basic switched traffic between blades on the A Fabric MXL. I CAN, however, pass traffic from my VLAN interface on the MXL to the iSCSI switches, storage arrays, and the Internet, and vice versa. But I cannot pass any traffic from the blades, or to the blades from any other interface. Our B Fabric functions just fine. Both fabrics run the same code versions on their MXLs, and both MXLs have the same QSFP+ daughter cards. This is baffling.
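For reference, this is roughly what I've been checking on the A Fabric MXL while testing blade-to-blade pings (a sketch from memory; the VLAN and port numbering are just examples from our chassis):

    show interfaces status      ! internal blade-facing ports show up/up
    show vlan                   ! blade ports are members of the test VLAN
    show ip interface brief     ! the MXL's own VLAN interface, which CAN pass traffic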
What I'm seeing is ARP entries that don't match their corresponding local MAC entries on the MXL TenGig interfaces. If I look at Slot 1, port Te 0/1 (the first port on the quad card), its MAC table entry looks proper; however, the MAC in its ARP entry is different, and more importantly, the other corresponding interfaces show that same MAC in their ARP entries. I CAN ping the first port on the quad card, but only that port, and even then my true MAC address isn't represented in the ARP tables.
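For anyone who wants to see exactly what I'm comparing, these are the sort of FTOS commands I'm running on the A Fabric MXL (Te 0/1 is just the first quad-card port in my chassis; substitute your own):

    show mac-address-table interface tengigabitethernet 0/1   ! learned MAC looks correct
    show arp                                                  ! ARP entry shows a DIFFERENT MAC,
                                                              ! repeated across the other blade ports
    clear arp-cache                                           ! clear, re-ping, and watch what gets relearned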
As an FYI, the blades are running a base CentOS install. We've tried multiple other Linux distributions, as well as M600 and M610 blades with different network cards, and get exactly the same results. I've performed all the upgrades I can perform, but I thought I'd turn to the community, as I can't find ANY information out there, and I also can't believe I'm the only person who has encountered this config/setup/problem. We do have support working with us, but that is a SLOW-moving process.
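In case it helps anyone suggest next steps, this is the kind of host-side check we've been doing on a blade (a sketch; the interface name em1 and the peer address are placeholders for whatever your A Fabric port enumerates as):

    ip link show em1                  # the NIC's real MAC, to compare against the MXL's ARP table
    ip neigh show dev em1             # what the blade itself has learned via ARP
    arping -I em1 <other-blade-ip>    # force an ARP exchange with another blade
    tcpdump -nn -i em1 arp            # watch whether ARP requests/replies actually cross the fabric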