Hello everyone,
we have a "star" shaped network where all the peripheral switches are connected to a central N3024F using fiber cable.
Some switches are connected with a 2-port LAG, but most of those LAGs currently use a single cable because the network is still a work in progress; the other switches are connected on a single port with no LAG.
We have already lost the connection to a switch twice (once in August and once in May). To bring the interface back we had to force it down from the N3024F configuration and then bring it back up, so no action was taken on the peripheral switch, only on the N3024F.
My idea was that a dynamic LAG with a single cable (the case of the switch we lost) could be the problem, and I was planning to configure all the LAGs as static (we shouldn't need the dynamic configuration anyway). But on Friday night we had a major downtime caused by the same problem occurring on 4 switches at the same time, 2 of them configured with no LAG. The log on the N3024F shows this:
Notice | Nov 20 21:50:10 | DOT3AD | ch10 is down. |
Notice | Nov 20 21:50:10 | DOT3AD | ch9 is down. |
Notice | Nov 20 21:50:10 | TRAPMGR | Link on Gi1/0/22 is failed |
Notice | Nov 20 21:50:10 | TRAPMGR | Link Down: Gi1/0/22 |
Info | Nov 20 21:50:10 | DOT3AD | Interface Gi1/0/19 detached from ch10. |
Notice | Nov 20 21:50:10 | TRAPMGR | Link on Gi1/0/21 is failed |
Notice | Nov 20 21:50:10 | TRAPMGR | Link Down: Gi1/0/21 |
Info | Nov 20 21:50:10 | DOT3AD | Interface Gi1/0/17 detached from ch9. |
Notice | Nov 20 21:50:10 | TRAPMGR | Link on Gi1/0/19 is failed |
Notice | Nov 20 21:50:10 | TRAPMGR | Link Down: Gi1/0/19 |
Notice | Nov 20 21:50:10 | TRAPMGR | Link on Gi1/0/17 is failed |
Notice | Nov 20 21:50:10 | TRAPMGR | Link Down: Gi1/0/17 |
Info | Nov 18 15:06:04 | CLI_WEB | [WEB:root:192.168.0.1] Disconnected due to Idle Timeout |
The solution was the same as before: force down the ports/LAGs and then bring them back up.
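To make it easier to see which ports dropped together, the log excerpt can be grouped by timestamp. This is only a minimal sketch in Python, run on a PC and not on the switch, assuming the pipe-separated format shown above. The function name `ports_down_by_time` and the regular expression are illustrative, not part of any Dell tool.

```python
import re
from collections import defaultdict

# A few lines copied verbatim from the N3024F log excerpt.
LOG = """\
Notice | Nov 20 21:50:10 | DOT3AD | ch10 is down. |
Notice | Nov 20 21:50:10 | TRAPMGR | Link on Gi1/0/22 is failed |
Notice | Nov 20 21:50:10 | TRAPMGR | Link Down: Gi1/0/22 |
Info | Nov 20 21:50:10 | DOT3AD | Interface Gi1/0/19 detached from ch10. |
Notice | Nov 20 21:50:10 | TRAPMGR | Link Down: Gi1/0/21 |
Notice | Nov 20 21:50:10 | TRAPMGR | Link Down: Gi1/0/19 |
Notice | Nov 20 21:50:10 | TRAPMGR | Link Down: Gi1/0/17 |
Info | Nov 18 15:06:04 | CLI_WEB | [WEB:root:192.168.0.1] Disconnected due to Idle Timeout |
"""

# Assumed layout: severity | timestamp | component | message |
# Only the TRAPMGR "Link Down: <port>" messages are matched.
LINK_DOWN = re.compile(
    r"\|\s*(\w{3} \d{1,2} \d{2}:\d{2}:\d{2})\s*\|\s*TRAPMGR\s*\|\s*Link Down: (\S+)"
)


def ports_down_by_time(log_text):
    """Return {timestamp: [ports]} for every 'Link Down' entry in the log."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        match = LINK_DOWN.search(line)
        if match:
            timestamp, port = match.groups()
            events[timestamp].append(port)
    return dict(events)


if __name__ == "__main__":
    for timestamp, ports in ports_down_by_time(LOG).items():
        print(timestamp, sorted(ports))
```

On the excerpt above this puts Gi1/0/17, Gi1/0/19, Gi1/0/21 and Gi1/0/22 under the same second (Nov 20 21:50:10), both the LAG members and the ports with no LAG.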
So what could cause a problem like this? I know the firmware on the switch is outdated (6.2.0.5), but could that be the only cause? Should I run some tests on the fibers, the patch cables and the transceivers? Could it be a hardware problem on the backplane itself, or something else such as high temperatures?
Thanks