Solved: FATAL Error After Power Cycle- GVRP related?

Andrew Bailey · ‎01-03-2014

Hello all,

Happy New Year!

I have a network of four 300 Series small business switches. All are running software version 1.3.5.58 (latest version).

Three of them are connected by 2-port LAGs to the central switch, which in turn connects to the internet-facing router. There are no loops in the architecture and spanning tree is running. There are a total of 4 VLANs, and GVRP is enabled with the four VLANs statically created on the main switch.

Normally everything runs very happily and is perfectly stable.

However, a power failure this morning resulted in a power cycle on two of the switches. They did not recover gracefully and kept restarting.

I removed the ports from the LAG groups in the main switch (not impacted by the power cycle) and then rebuilt them to recover connectivity and restore normal operation.

Both of the switches which restarted had multiple entries in their log files like this:-

%GVRP-F-NOSTATREG: GVRPP_checkStaticToDynamic_update: Port isn't Statically registered on vlan 100
***** FATAL ERROR *****
Reporting Task: BRMN.
Software Version: 1.3.5.58 (date 10-Oct-2013 time 17:15:41)
0x16adbc 0x166f28 0x6df2b0 0x48fad8 0x4903e8 0x490608 0x51a234 0x79dfd8
0x7b9f8c 0x7c1500 0x445d88 0x41e04c 0x7d4c80 0x61cb68 0x1223f0
***** END OF FATAL ERROR *****
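For anyone who has to sift through many of these entries, a small parser can pull out the fields that matter (mnemonic, VLAN, reporting task, software version, stack addresses). The following is only a sketch in Python: the regular expression is inferred from the single entry quoted above, the function name `parse_fatal` is made up for illustration, and the stack trace in the sample is shortened to its first three addresses. Other firmware versions may format the entry differently.

```python
import re

# Pattern inferred from the one log entry quoted in this thread (an assumption,
# not a documented log format).
FATAL_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[A-Z])-(?P<mnemonic>[A-Z0-9_]+):\s*"
    r"(?P<message>.*?)\s*\*{5} FATAL ERROR \*{5}\s*"
    r"Reporting Task:\s*(?P<task>\w+)\.\s*"
    r"Software Version:\s*(?P<version>[\d.]+)"
    r"(?P<rest>.*?)\*{5} END OF FATAL ERROR \*{5}",
    re.DOTALL,
)


def parse_fatal(entry):
    """Return the interesting fields of one fatal-error entry, or None."""
    match = FATAL_RE.search(entry)
    if match is None:
        return None
    vlan = re.search(r"vlan (\d+)", match["message"])
    return {
        "facility": match["facility"],
        "severity": match["severity"],
        "mnemonic": match["mnemonic"],
        "task": match["task"],
        "version": match["version"],
        "vlan": int(vlan.group(1)) if vlan else None,
        # Stack addresses follow the version/date part of the entry.
        "stack": re.findall(r"0x[0-9a-fA-F]+", match["rest"]),
    }


# Entry from this thread, with the stack trace shortened for brevity.
SAMPLE = (
    "%GVRP-F-NOSTATREG: GVRPP_checkStaticToDynamic_update: Port isn't "
    "Statically registered on vlan 100 ***** FATAL ERROR ***** "
    "Reporting Task: BRMN. Software Version: 1.3.5.58 "
    "(date 10-Oct-2013 time 17:15:41) 0x16adbc 0x166f28 0x6df2b0 "
    "***** END OF FATAL ERROR *****"
)

parsed = parse_fatal(SAMPLE)
print(parsed["mnemonic"], parsed["vlan"], parsed["version"])
```

Grouping the parsed entries by `mnemonic` and `vlan` would show quickly whether every restart points at the same VLAN.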

Has anyone seen this before? Have I made some "rookie" mistake with my GVRP configuration?

Kind Regards,

Andy.

Tom Watts · ‎01-03-2014

Hi Andrew, the GVRP database is kind of a strange creature (the protocol itself is just strange). When you remove a VLAN or an interface from the GVRP database, the next join message carries that update to the downstream switch. I suspect that removing the LAG caused an unintended database change: modifying the trunk strips all of its VLANs, so the receiving switch no longer had the database you intended. The downstream switch then "defaulted" to a single VLAN, and when the LAG was re-established it was sent more VLANs and the GVRP database didn't match up.

Does this kind of make sense?

If you're up for a simple lab, you can create a trunk between two switches with multiple VLANs tagged (use around 5 VLANs, not just 2). Then remove one of those VLANs from the master switch and observe the behavior.
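Before running the lab, it can help to write down what the downstream switch should see change between two advertisements. The sketch below is plain set arithmetic in Python and nothing more: it does not model GVRP's real join/leave state machine or timers, the function name `vlan_delta` is made up, and the VLAN IDs 10 to 50 are hypothetical lab values. The LAG rebuild case assumes, as suggested above, that the trunk briefly falls back to VLAN 1 only.

```python
def vlan_delta(advertised_before, advertised_after):
    """Compare two sets of advertised VLAN IDs and report what changed."""
    before, after = set(advertised_before), set(advertised_after)
    return {
        "withdrawn": sorted(before - after),  # no longer advertised
        "added": sorted(after - before),      # newly advertised
    }


# Hypothetical lab: five tagged VLANs on the trunk, then one is removed.
lab_vlans = [10, 20, 30, 40, 50]
print(vlan_delta(lab_vlans, [10, 20, 40, 50]))

# Assumed LAG rebuild sequence: everything drops to VLAN 1, then returns.
print(vlan_delta(lab_vlans, [1]))
print(vlan_delta([1], lab_vlans))
```

Comparing this expected delta with what the downstream switch actually shows in its VLAN table is one way to spot where the two databases stop agreeing.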

-Tom
Please mark answered for helpful posts
http://blogs.cisco.com/smallbusiness/

Andrew Bailey · ‎01-03-2014

Tom,

Thanks, yes I think that helps a bit. Certainly what you are describing explains why removing and re-adding the LAG seems to fix the issue.

One thing I have noticed since I posted is that the "Auto Voice VLAN" setting was inconsistent. On the main switch I had changed the Auto Voice VLAN to VLAN 100; the other 3 were still at the default value of 1.

I suspect this caused a problem: the Auto Voice VLAN was being propagated from the master, but the others had a different VLAN set.

I've corrected that inconsistency and rebooted one of the switches that power cycled this morning- it seems to have recovered normally.
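A check like this is easy to script once the voice VLAN ID of each switch has been noted down by hand. The sketch below is Python and makes several assumptions: the switch names (`main`, `access-1` and so on) are hypothetical, the values are typed in rather than read from the switches, and the function name `voice_vlan_mismatches` is made up. The IDs mirror the situation described above (100 on the main switch, the default of 1 elsewhere).

```python
def voice_vlan_mismatches(voice_vlan_by_switch, reference):
    """Return the switches whose voice VLAN ID differs from the reference switch."""
    expected = voice_vlan_by_switch[reference]
    return {
        name: vlan_id
        for name, vlan_id in voice_vlan_by_switch.items()
        if vlan_id != expected
    }


# Values as described in this thread; switch names are hypothetical.
settings = {"main": 100, "access-1": 1, "access-2": 1, "access-3": 1}

mismatched = voice_vlan_mismatches(settings, reference="main")
for name, vlan_id in sorted(mismatched.items()):
    print(f"{name}: voice VLAN {vlan_id}, expected {settings['main']}")
```

An empty result means every switch agrees with the reference, which is the state reached after the correction.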

Would that explain it all?

Kind Regards,

Andy.

Tom Watts · ‎01-03-2014

Hi Andrew, yes this sounds about right. Essentially every change made to the master GVRP advertisement affects every VLAN, not just one of them. So I think rebooting the downstream switch, which forces it to rejoin, will update it correctly from the master's join message.

-Tom
Please mark answered for helpful posts
http://blogs.cisco.com/smallbusiness/

Andrew Bailey · ‎01-06-2014

Tom,

Thanks again. I'll mark this as "answered"- everything seems to be working well now.

Couple of points though:-

- This behaviour didn't occur in previous software releases. The first time I saw it occur was when I activated 1.3.5.58. Is this an issue with the 1.3.5.58 release? Should this be tracked as a bug?

- The impact of this (relatively) minor configuration error is pretty severe. The fatal errors and constant switch restarts aren't easy to resolve and certainly have a service impact.

Any thoughts?

Thanks again,

Andy.
