Catalyst 6500 with Dual Sup2-MSFC2 and Undocumented L2_DIST_LRN-SP Errors

Feb 6th, 2012

So I have a Catalyst 6500 with dual Sup2-2GE supervisors and SFMs, and it seems to function fine pretty much all the time.


Recently it had a very disturbing issue with the Dell 6248 Blade Switch stack that it connects to. The two are joined by an LACP port-channel (2 x 1 Gbit Cat5e connections) from the Cat 6500 to the Dell switch stack. One cable is on the WS-X6516-GE-TX and the other is on the WS-X6148A-GE-TX (I plan to fix this mismatch shortly by adding another WS-X6516-GE-TX).


The issue was that suddenly, and with no apparent reason in the logs, the port-channel interface on the 6500 went down/down and would not come back up. The Dell switch showed the port-channel as 'active', which is what it usually shows when the bundle is up and LACP is working.


The physical interfaces in the port-channel on the 6500 both showed up/up but the port-channel did not. We tried shutting down the interfaces one at a time and then both on each end (the 6500 and the Dell) and then bringing them back up. The interfaces responded properly but the port-channel would not come up. It would report admin down when we shut down the port-channel but stay down/down no matter what we did.


Eventually we ended up rebooting the Dell switch and this finally brought the port-channel up. At this time we found that a similar issue was occurring with some Linux blade servers that also connect to the Dell switch using dual-port LACP bundles. The physical links were up but the 'bundle' appeared to be down. This led us to believe that the Dell was at fault (which is probably true).


The reboot of the Dell switch that seemed to fix the port-channel problem with the 6500 did *not* fix the LACP bundle problem with the servers connected to it. 


We ended up installing the latest firmware upgrade on the Dell (a few newer patch releases than what we had on the switch) and after rebooting for that, all the links came up and all is well again.


So it seems the Dell was at fault.


However, during this outage the first problem I noticed on the 6500 was a strange error in the log that I have never seen before:


%L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp


It seems to have been happening for at least a short time before the LACP problem started, but I am not sure how long. I can find no real information on this error, and the documentation for a similar error (L2_DIST_LRN-6-NO_PKT_SEND) is not helpful. Essentially it says 'open a ticket', probably because this is not supposed to happen if your hardware is working right.


Here is the 'show mod':


Mod Ports Card Type                              Model
--- ----- -------------------------------------- ------------------
  1    2  Catalyst 6000 supervisor 2 (Active)    WS-X6K-SUP2-2GE
  2    2  Catalyst 6000 supervisor 2 (Hot)       WS-X6K-SUP2-2GE
  3   16  SFM-capable 16 port 10/100/1000mb RJ45 WS-X6516-GE-TX
  4   48  48-port 10/100/1000 RJ45 EtherModule   WS-X6148A-GE-TX
  5    0  Switching Fabric Module-136 (Active)   WS-X6500-SFM2
  6    0  Switching Fabric Module-136 (Standby)  WS-X6500-SFM2
  9   48  48-port 10/100 mb RJ45                 WS-X6148-RJ45V



Log excerpt:


Feb  2 15:06:08: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  2 20:39:28: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  3 00:07:49: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  3 07:04:29: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  3 18:52:49: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  3 20:16:10: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  4 07:02:00: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  4 18:29:31: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  4 19:11:11: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  4 21:16:11: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  5 04:12:51: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp
Feb  5 04:54:31: %L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp


You can see that the errors can be quite far apart or sometimes fairly close together, minutes or hours separating them.
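
To put numbers on that spacing, the gaps between consecutive messages can be computed from the timestamps. Here is a minimal Python sketch. The year (2012, taken from the post date) and the helper names `parse_timestamp` and `gaps_between` are assumptions for illustration; the timestamps and message text are copied from the excerpt above.

```python
from datetime import datetime, timedelta

# Message text and timestamps copied from the log excerpt in this post.
MESSAGE = ("%L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: "
           "l2_dist_lrn_ff_msg_to_sp")
STAMPS = [
    "Feb  2 15:06:08", "Feb  2 20:39:28", "Feb  3 00:07:49",
    "Feb  3 07:04:29", "Feb  3 18:52:49", "Feb  3 20:16:10",
    "Feb  4 07:02:00", "Feb  4 18:29:31", "Feb  4 19:11:11",
    "Feb  4 21:16:11", "Feb  5 04:12:51", "Feb  5 04:54:31",
]
LOG_LINES = [f"{stamp}: {MESSAGE}" for stamp in STAMPS]


def parse_timestamp(line, year=2012):
    """Parse the leading 'Mon D HH:MM:SS:' of a log line.

    The log carries no year, so one is assumed (hypothetical default: 2012).
    """
    month, day, clock = line.split()[:3]
    clock = clock.rstrip(":")  # drop the colon that follows the time
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def gaps_between(lines):
    """Return the time gaps between consecutive log lines."""
    times = [parse_timestamp(line) for line in lines]
    return [later - earlier for earlier, later in zip(times, times[1:])]


gaps = gaps_between(LOG_LINES)
print("shortest gap:", min(gaps))
print("longest gap: ", max(gaps))
```

For this excerpt the shortest gap works out to 41 minutes 40 seconds and the longest to 11 hours 48 minutes 20 seconds.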


The devices connected to this switch are pretty predictable, layer 2 topology should not be changing very often, if at all. Certainly no interfaces are going up and down in between those errors.


So back to the only clue I have for this...


Sometime before this error started happening, a new VTP trunk was connected to this switch (with another 6500 at the other end which does not have these errors). It is also an LACP port-channel, and it has VLAN mapping in use on it for a couple of VLANs. This VLAN mapping is also in use on the port-channel to the Dell switch. The port-channel to the Dell switch has one port on the 16-port card and one on the 48-port card, as I mentioned. The 48-port card does not support VLAN mapping, but somehow, when the mapping was applied to the port-channel, the switch accepted it on one member port and not on the other. I think this might be the cause of the error.


Does anyone have any insight into this error message or whether it could be caused by improper port-channel design and VTP VLAN mapping?


I've looked briefly through a show tech and no other errors or diagnostics show any issues.


Thanks

Jonathan Bayless Tue, 02/07/2012 - 07:20

No joy? Hmm, I'll probably wait until I get my additional gig line cards so I can move the second link for the port channel with VLAN mapping onto it and see if that fixes the error.

Jonathan Bayless Mon, 02/13/2012 - 07:09

Well, I put in the second WS-X6516-GE-TX card and moved my second port-channel member to it, so the port channel now has one link on each WS-X6516-GE-TX card. I still get the same error every hour or two (sometimes as often as every few minutes):


%L2_DIST_LRN-SP-6-NO_PKT_SEND: Unable to send L2 Dist Lrn packet: l2_dist_lrn_ff_msg_to_sp


Hmm. Very frustrating.


I also tried rebooting the second Sup just to see if that did anything. No change and no other errors.

Jonathan Bayless Tue, 02/14/2012 - 09:01

Oh, well. Uh, that's nice. http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCee55184


It's a cosmetic bug that happens under rare circumstances when you run dual supervisors and have an EtherChannel (port-channel interface) using DEC (member ports spanning different line cards).
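
For anyone wanting a quick check of whether a bundle counts as DEC in this sense, here is a rough Python sketch. It only compares the slot number in each member interface name. The interface names (`Gi3/1`, `Gi4/1`, `Gi3/2`) and the function names are hypothetical; only slots 3 and 4 come from the 'show mod' output earlier in the thread.

```python
import re

# Matches names such as "Gi3/1" or "GigabitEthernet3/1": letters, slot, "/", port.
_INTERFACE = re.compile(r"^[A-Za-z-]+(\d+)/(\d+)$")


def slot_of(interface):
    """Return the line-card slot number from an interface name like 'Gi3/1'."""
    match = _INTERFACE.match(interface)
    if match is None:
        raise ValueError(f"unrecognized interface name: {interface!r}")
    return int(match.group(1))


def spans_line_cards(members):
    """True if the port-channel members sit on more than one line card."""
    return len({slot_of(name) for name in members}) > 1


# Hypothetical member ports: one on slot 3 and one on slot 4.
print(spans_line_cards(["Gi3/1", "Gi4/1"]))
# Hypothetical member ports: both on slot 3.
print(spans_line_cards(["Gi3/1", "Gi3/2"]))
```

This is only a naming check under the assumption that the first number in the interface name is the slot; it says nothing about whether the bug itself is triggered.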


Well, now there are google results for it in case anyone else happens to see this pop up.
