
  • Cisco Support Forums is in Read Only mode while the site is being migrated.

CSS 11000 redundancy problems - Both are masters

Unanswered Question
Sep 12th, 2005

Hi all,


I had a strange problem with the redundancy between two of my CSS 11000s.

They were both master at the same time, which resulted in total apocalypse :(


07:44:35 5/1 49369 IPV4-4: Duplicate IP address detected: xxx.xxx.xxx.xxx xx-xx-xx-xx-xx-xx

07:44:35 5/1 49370 IPV4-4: Incoming CE 0x401f00, incoming (0 based) SLP 0x1


Just before CSS01 switched to backup mode, I see it logging "SNTP-6: No SNTP replies in 3*poll-interval secs." When CSS01 switches back to master mode, I see this same message on CSS02, but I don't see CSS02 switching back to backup mode. So they were both master at the same time, and it was disaster time.


When I logged in and saw the problem, I rebooted CSS02. After the reboot the situation returned to normal. But I now need to find out why it happened and how to prevent it from happening in the future.


The only thing I can see is the SNTP errors. Does anyone have any idea why this happened, and could it be a result of the SNTP errors? If you need additional information, just let me know.


css01


07:20:19 5/1 49322 SNTP-6: No SNTP replies in 3*poll-interval secs.

07:20:21 5/1 49323 REDUNDANCY-4: Transition to redundancy backup, master is x.x.x.x

…

07:43:58 5/1 49345 REDUNDANCY-4: Transition to redundancy master


css02


02:58:43 5/1 48126 SNTP-6: Setting time to <02:58:43>

07:20:22 5/1 48127 REDUNDANCY-4: Transition to redundancy master

…

07:43:57 5/1 48217 SNTP-6: No SNTP replies in 3*poll-interval secs.



Thanks in advance for your time and help.


With kind regards,


Geert Hermans


pknoops Mon, 09/12/2005 - 05:05

Geert,


Could we see a little more of the sys.log from before the master/master situation? Is it possible the MASTER was so busy that it did not answer the backup's heartbeat polls and also could not process the SNTP polls?


Regards

Pete..

Greenwolf Tue, 09/13/2005 - 01:55

Hi Pete,



First of all thank you for your reply.


It is possible the MASTER was so busy that it did not answer the heartbeat polls to the backup and also could not process the sntp polls ?


That could be possible, but then explain this to me. Let's say it's so busy that it can't reply to the backup's heartbeat polls and also can't process the SNTP polls. Where did it find the resources to send the syslog to the logging server, which is on the same subnet as the SNTP server? It doesn't have the resources for the heartbeat polls to the backup or for the SNTP polls, but it does have the resources to process the logging! That just sounds strange to me!


Maybe I'm wrong, but I was under the impression that the master sends a redundancy protocol message every second to inform the backup CSS that it is alive, and that the backup doesn't send anything to the master.

If the backup CSS doesn't receive anything for 3 seconds, the backup CSS becomes the master CSS and begins sending out redundancy protocol messages. Or am I wrong?
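To make the behaviour described above concrete, here is a minimal Python sketch of a master-advertises / backup-waits model. It is a hypothetical simplification, not Cisco's implementation: the 1-second advert interval and 3-second dead interval are taken from the description above, the function name is invented, and the backup stepping back down when adverts resume is deliberately not modelled.

```python
# Hypothetical model of the behaviour described above; not Cisco's implementation.
ADVERT_INTERVAL = 1  # seconds between master adverts (assumed from the description)
DEAD_INTERVAL = 3    # seconds of silence before the backup takes over (assumed)


def backup_roles(advert_times, duration):
    """Return the backup box's role at each whole second 0..duration-1.

    advert_times: set of seconds at which an advert from the master arrived.
    Stepping back down when adverts resume is deliberately not modelled.
    """
    last_heard = 0
    role = "backup"
    roles = []
    for t in range(duration):
        if t in advert_times:
            last_heard = t
        # Promote once the master has been silent for DEAD_INTERVAL seconds.
        if role == "backup" and t - last_heard >= DEAD_INTERVAL:
            role = "master"
        roles.append(role)
    return roles


# Adverts arrive at t = 0..4 and then stop (for example, the link fails).
adverts = set(range(0, 5, ADVERT_INTERVAL))
print(backup_roles(adverts, 10))
```

With adverts stopping after t = 4, the backup promotes itself at t = 7, three seconds after the last advert it heard. In this model nothing ever tells the promoted box to step back down, which is one way two masters can coexist.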



Here is what I noticed. On CSS01 I see the SNTP poll errors at 07:20:19, and just after that, at 07:20:21, CSS01 transitioned from master to backup. Why would a master transition to backup? On CSS02 I see that at 07:20:22, 3 seconds after the SNTP errors (the redundancy protocol timeout), it becomes the master.

At 07:43:57 I see the same SNTP errors on CSS02, and one second later CSS01 jumps back from backup to master. Why? Wasn't it receiving the redundancy protocol messages?
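To line up the timings discussed above, the following Python sketch merges the syslog entries quoted earlier in the thread into one timeline and prints the gap between consecutive events. It assumes the two boxes' clocks are close enough to compare, which is not guaranteed while SNTP is reporting failures, and the CSS01 backup message is trimmed (the master address is omitted).

```python
from datetime import datetime

# Entries quoted from the CSS01/CSS02 syslogs earlier in this thread.
# Assumption: both clocks are close enough to compare, even though SNTP
# was reporting failures at the time.
EVENTS = [
    ("CSS01", "07:20:19", "SNTP-6: No SNTP replies in 3*poll-interval secs."),
    ("CSS01", "07:20:21", "REDUNDANCY-4: Transition to redundancy backup"),
    ("CSS02", "07:20:22", "REDUNDANCY-4: Transition to redundancy master"),
    ("CSS02", "07:43:57", "SNTP-6: No SNTP replies in 3*poll-interval secs."),
    ("CSS01", "07:43:58", "REDUNDANCY-4: Transition to redundancy master"),
]


def timeline(events):
    """Sort events by time; return (seconds since previous event, box, time, message)."""
    def parse(ts):
        return datetime.strptime(ts, "%H:%M:%S")

    ordered = sorted(events, key=lambda e: parse(e[1]))
    rows = []
    previous = None
    for box, ts, msg in ordered:
        current = parse(ts)
        gap = 0 if previous is None else int((current - previous).total_seconds())
        rows.append((gap, box, ts, msg))
        previous = current
    return rows


for gap, box, ts, msg in timeline(EVENTS):
    print(f"+{gap:>5}s  {ts}  {box}  {msg}")
```

The gaps come out as 0, 2, 1, 1415 and 1 seconds, so each redundancy transition follows an SNTP failure by only a second or two.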


I have attached the complete syslog. If you need any extra information, don't hesitate to ask.



Thanks a million for your help. If you're ever in Belgium, I'll buy you a beer.


With kind regards,


Geert




Gilles Dufour Tue, 09/13/2005 - 03:33
  • Cisco Employee

Geert,


Unfortunately, we won't be able to tell you what happened.

The most important thing with this kind of problem is to capture a sniffer trace on the two CSS ports and see whether VRRP messages are seen and/or sent.


I believe the SNTP message is just an indication of a traffic-related issue: the box was unable to send or receive SNTP messages and unable to receive VRRP messages.


Regards,


Gilles.

pknoops Tue, 09/13/2005 - 04:43

Hi Geert,


I will take a look at the sys.log info. Maybe Gilles already has. It's actually the backup box that sends the polls to the MASTER. If it does not get a response to 3 of the polls, the BACKUP will become MASTER.


As a side note, you can modify the amount of time allowed for the response by changing the "vrrp-backup-timer".


You would need to set this on both the MASTER and the BACKUP and then "bounce" redundancy on the boxes, so a maintenance window would be needed.
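The polling scheme described above can be sketched in a few lines of Python. This is a hypothetical model, not CSS code: the function name is invented, and the `max_missed` parameter only plays the role of the adjustable timeout mentioned above; it does not reproduce the syntax or units of the real "vrrp-backup-timer" command.

```python
# Hypothetical model of the polling scheme described above; not CSS code.
# max_missed stands in for the adjustable timeout; the real
# vrrp-backup-timer command's syntax and units are not modelled here.
def takeover_poll(poll_answered, max_missed=3):
    """Return the index of the poll at which the backup becomes master, or None.

    poll_answered: one boolean per poll the backup sends to the master.
    """
    missed = 0
    for i, answered in enumerate(poll_answered):
        missed = 0 if answered else missed + 1  # any answer resets the count
        if missed >= max_missed:
            return i
    return None


print(takeover_poll([True, True, False, False, False]))
print(takeover_poll([True, False, False, True, False, False]))
```

The first call takes over at poll index 4 (three consecutive misses); in the second, the answer at index 3 resets the count, so the backup never takes over. Raising `max_missed` makes the backup more tolerant of a briefly busy master at the cost of slower failover.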


For more info on this command, see this link:


http://www.cisco.com/univercd/cc/td/doc/product/webscale/css/css_720/advcggd/redndncy.htm#1031447


Regards

Pete..



Greenwolf Tue, 09/13/2005 - 05:28

Hi Pete,


Thanks for this information. I didn't know it worked like that.


Just one more question about the polls. What happens to the master if it doesn't receive any more polls from the backup?


Why did the MASTER become backup?

CSS01 was the master, but at 07:20:21 it transitioned to backup:

5/1 49323 REDUNDANCY-4: Transition to redundancy backup, master is xxx.xxx.xxx.xxx


Everything started because CSS01 became backup.



Thanks again for your help.


Geert


pknoops Tue, 09/13/2005 - 04:47

Geert,


What is port e12? Is it the connection between the boxes? If so, it went down, which would leave the two boxes unable to tell which one is MASTER, so they would both become MASTER.
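The failure mode suggested above (each box loses sight of the other and both promote themselves) can be shown with a small, simplified split-brain model in Python. This is a hypothetical sketch: the box names come from this thread, but the priority values and the function name are invented for illustration, and the real CSS election logic is not reproduced.

```python
# Hypothetical, simplified split-brain model; box names are from this thread,
# priorities are invented for illustration.
def resolve_roles(hears_peer, priority):
    """Decide each box's role from what it can observe locally.

    hears_peer: box name -> whether it still receives redundancy traffic from the peer.
    priority:   box name -> configured priority (higher wins while both can talk).
    """
    roles = {}
    for box in priority:
        peer = next(b for b in priority if b != box)
        if not hears_peer[box]:
            # Silence is indistinguishable from a dead peer, so the box promotes itself.
            roles[box] = "master"
        else:
            roles[box] = "master" if priority[box] > priority[peer] else "backup"
    return roles


PRIORITY = {"CSS01": 200, "CSS02": 100}  # hypothetical values
print(resolve_roles({"CSS01": True, "CSS02": True}, PRIORITY))
print(resolve_roles({"CSS01": False, "CSS02": False}, PRIORITY))
```

With the link up, only one box is master; with the link down, each box sees only silence and both end up as master, which matches the symptom in this thread.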


Regards

Pete..

Greenwolf Tue, 09/13/2005 - 04:51

Pete,


Yes. Port e12 is the connection between the two boxes.


I'll have another look at the config right away. I guess I missed that.


With kind regards,


Geert Hermans

Greenwolf Tue, 09/13/2005 - 05:00

Pete, Gilles,


Yes, port e12 is the connection between the two boxes.

But at 09:01:08 I entered the reboot command on CSS02.

5/1 48376 NETMAN-4: Reboot command entered via CLI


That resulted in the port going down on CSS01 at 09:01:12. They are connected by a crossover cable, as you probably guessed.

5/1 52334 CIRCUIT-6: Port e12 is down for circuit VLANXXX .


I rebooted CSS02 because they were both in master mode.


Maybe this wasn't a good idea, but at the time it seemed like the smart thing to do.



With kind regards,


Geert

pknoops Tue, 09/13/2005 - 06:12

Geert,


What version of software are you running? I did some research on this type of issue, and quite honestly we have not seen it for several years.


Can you do a "show core" to see if you have any recent core dumps on either CSS that would have occurred around the time in question?


Regards

Pete..

Greenwolf Wed, 09/14/2005 - 00:21

Hi Pete,


Thanks for the help, guys. We really appreciate this a lot.


We have had 6 CSSs running here for almost 3.5 years. About a year ago one of them had a hard disk failure, and now this. The hard disk failure wasn't so bad because the other one took over, but this caused some havoc :(


But the other ones are still running smoothly, so they're pretty stable.


Here is the information you requested:


CSS01# sh core

CSS01# sh ver

Version: ap0503034s (5.03 Build 34)

Flash (Locked): 5.00 Build 33

Flash (Operational): 5.03 Build 15

Type: PRIMARY

Licensed Cmd Set(s): Standard Feature Set





CSS02# show core


CSS02# sh version

Version: ap0503034s (5.03 Build 34)

Flash (Locked): 5.00 Build 45

Flash (Operational): 5.03 Build 15

Type: PRIMARY

Licensed Cmd Set(s): Standard Feature Set



No dump files, but we did not enable core dumps.


CSS02# show dump-status

Dump mode is disabled



With kind regards,


Geert Hermans
