
switch stack often crashes with error message

pokemon
Level 1

Hello,

I would appreciate it if you could point me in the right direction regarding a switch crash.

We are using an SG500 switch stack, but it often crashes with the error below, as recorded on our syslog server.

abcdefg01.log:Jun 13 19:14:15 abcdefg01 1 2016-06-13T23:14:14Z abcdefg01 STCK SYSL - UNITMSG - %STCK SYSL-A-UNITMSG: UNIT ID 2,Msg:%SYSLOG-F-OSFATAL: UDPP_inet_dns_transmit_helper - transmit_handle was not initialized  ***** FATAL ERROR *****   Reporting Task: AAAT.  Software Version: 1.4.2.4 (date  21-Dec-2015 time  16:47:05)  0x16bd2c  0x16c290  0x167e50  0x7f5ce4  0x4581ac  0x45ca04  0x678a64  0x67959c  0x679a58  0x7203f4  0x1223f0    ***** END OF FATAL ERROR *****

Has anyone else seen the same issue?

Is there a solution for it?

Judging by the uptimes, the master and backup units crash repeatedly at random.

Units 1 and 2 have short uptimes, while units 3 and higher have much longer uptimes.

Thanks in advance,

Miyoshi

7 Replies

Mark Malone
VIP Alumni

Hi

I would move off that software version and go to 1.4.5.02, released on the 20th of last month. It looks like you have hit some form of bug.

Thanks Jonathan and Mark,

I think I found what caused this crash...

Actually, I thought this was something related to stacking,

because all our other SG500s, in other countries, are stacked.

But I see the same issue with a standalone switch.

This is the firmware information from the crashed switch.

All the others use the same boot code and software.

atviesw01#sh ver

Unit                SW version          Boot version        HW version
------------------- ------------------- ------------------- -------------------
1                   1.4.5.02            1.4.0.02            V03

atviesw01#

I have picked out only the suspicious parts of the configuration.

This switch is running in L2 mode with 4 queues.

vlan database
default-vlan vlan 988
exit

dot1x system-auth-control

radius-server host dkradius.mydomain.com priority 10 usage dot1.x
radius-server host jpradius.mydomain.com priority 1 usage dot1.x
radius-server host source-interface vlan 989

ip domain name mydomain.com
ip name-server 10.100.104.103 10.100.32.110 10.100.32.109
ip domain polling-interval 8

interface vlan 106
name Guest
dot1x guest-vlan
!

interface vlan 989
name Data
ip address 10.100.28.21 255.255.252.0
no ip address dhcp
!

When I enable 802.1x on an interface as below, nothing happens at first.

interface gigabitethernet1/1/1

 dot1x guest-vlan enable
 dot1x reauthentication
 dot1x authentication 802.1x mac
 dot1x radius-attributes vlan static
 dot1x port-control auto

But when a client computer connects to this interface and the switch starts 802.1x authentication, the switch crashes with the error message below.

Jul  8 13:26:01 atviesw01 1 2016-07-08T11:26:00Z atviesw01 SYSLOG - OSFATAL - %SYSLOG-F-OSFATAL: UDPP_inet_dns_transmit_helper - transmit_handle was not initialized  ***** FATAL ERROR *****   Reporting Task: AAAT.  Software Version: 1.4.5.02 (date  20-Apr-2016 time  12:24:28)  0x16bd2c  0x16c290  0x167e50  0x7f2848  0x454d10  0x459568  0x6755c8  0x676100  0x6765bc  0x71cf58  0x1223f0

This error seems to be related to DNS.

Actually, I use host names for the TACACS and SNTP servers too, and they work fine.

However, a host name for the RADIUS server seems to be a problem and causes the system crash.

If I change the host name to an IP address, then 802.1x works just fine.
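
For reference, a sketch of that change based on the RADIUS lines in the configuration above. The addresses 192.0.2.10 and 192.0.2.20 are placeholders (documentation range), not the real server addresses, and the existing host-name entries would have to be removed before adding these.

```
! 192.0.2.10 and 192.0.2.20 are placeholder addresses, not the real servers
radius-server host 192.0.2.10 priority 10 usage dot1.x
radius-server host 192.0.2.20 priority 1 usage dot1.x
radius-server host source-interface vlan 989
```

With addresses in place of names, the switch should not need a DNS lookup when it contacts the RADIUS servers, which is the step the error message points at.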

What do you think? Is this a bug?

I would appreciate your opinions very much!

Miyoshi

I can confirm that the bug still exists in 1.4.10.6 on my SG300.

Steps to reproduce:

 

1. Add two RADIUS servers by DNS name, with different priorities.
2. Make the first server in the list unavailable somehow (stop the service, disable the RADIUS client, or any other way).
3. Try to connect several 802.1x-enabled workstations to the switch at the same time (within 1-3 seconds). It has to be more than one, because with a single workstation everything is OK.
4. CABOOOM! FATAL ERROR, switch is rebooting.

 

I tried the same config with only one RADIUS server set by DNS name: no problem (except that you don't have a backup RADIUS connection).
I tried connecting only one workstation at a time: no problem, the switch uses the backup RADIUS server.
But many workstations at the same time plus more than one RADIUS server set by DNS name = FATAL ERROR.

 

Can we get a fix for this, or is the SG300 close to EOL?

I was also just hit by this bug, with exactly the same configuration as you outlined. This is on the lower-end SG200 series running firmware 1.4.9.4.

 

Hi guys,

I have SF500, SF300, and SG500 (4 units in a stack) switches in my network. ALL devices have the same problem with the latest firmware (the January firmware).

Steps to reproduce:

1) Configure RADIUS authentication for web or SSH login with 2 RADIUS servers. In my case the first one is set by IP address and the backup one by FQDN (in case I assign a new IP to my RADIUS server).

2) Switch off the RADIUS server.

3) Try to log in to the web UI or via SSH and... CABOOOM! FATAL ERROR, switch is rebooting.

 

Steps to fix:

Set the backup RADIUS server to a different priority! Yes, in my case it was enough to set a different priority.

Before, there were 2 servers with priority 1; after, the priorities were 1 and 2.
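
A minimal sketch of what the changed entries might look like, using the `radius-server host ... priority` form shown earlier in this thread. The address 192.0.2.10 and the name radius2.example.com are placeholders, not the real servers, and any other options (key, usage) are left out.

```
! Placeholder address and host name, not taken from the real setup
! Primary server by IP address, backup server by FQDN, with distinct priorities
radius-server host 192.0.2.10 priority 1
radius-server host radius2.example.com priority 2
```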

 

Dear Cisco, can you check your firmware?

I have 15 units, all with the same problem.

 

Regards Cristian

 

The Sx550 family is not affected.

 

 

 

jonrodr2
Level 1

Hello, my name is Jonathan and I am one of the engineers here on the Cisco SBSC team. I apologize for the inconvenience with these units.

I would suggest first checking whether all of the units in your stack are on the same firmware version and boot code. If they do not match across all of the switches, the first step would be to upgrade them to the latest version, 1.4.5.2 as Mark mentioned, so that they match. It is then best practice to do a factory reset after the upgrade. You can save a backup of the configuration file first if you would like to proceed with the reset.

If you continue to have issues after these steps, feel free to contact us and open a ticket at 1-866-606-1866. Thanks and have a great day.

Lance L
Level 1

I am having the exact same problem with SG300 switches.

The following nmap command will cause any switch running 1.4.5.02 to crash.

nmap -sS -T3 -A -F

We have some switches running 1.3.0.62, which appear unaffected.

EDIT: I meant to post this here => https://supportforums.cisco.com/discussion/13071686/system-crash-again 

>_<
