I have a pair of ACE 4710s (ACE01 and ACE02) in a fault-tolerant configuration, built following the standard configuration guidelines. I have the following problem:
(1) I reload ACE01, and ACE02 takes over as active.
(2) After the reload completes, ACE01 does not accept SSH logins, so I have to log in via the async router. When I then run 'sh arp' on ACE01, it hangs for about 2 minutes before returning the message below.
(3) About 4 or 5 minutes after ACE01 comes back up, I also lose SSH connectivity to ACE02. Logging in to ACE02 via the async router, I get the same message:
Arpmgr busy, Possible ARP flood, 526801 arp pkts were dropped over last 60 secs
(4) To get out of this state, I have to break the fault-tolerant link and shut down the primary network link (shut down the switchports that the ACE units connect to), then reload both devices again. Only then can I log in via SSH.
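For reference, the recovery steps above look roughly like this (the interface numbers are hypothetical placeholders for your actual FT and uplink ports; your topology will differ):

```
! On the upstream 2960: shut the port facing the ACE (placeholder interface)
conf t
 interface GigabitEthernet0/1
  shutdown

! On each ACE: shut the physical interface carrying the FT VLAN
! (placeholder interface number)
conf t
 interface gigabitEthernet 1/4
  shutdown
 exit
end

! Then reload both units
reload
```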
Please could someone help me? I don't understand what is going on. I have googled the above messages, and the results suggested it might be related to a bug on the switches that the ACE units connect to (2960s). I have since upgraded the switches, but still no luck.
Here is the 'show ft group detail' output after ACE01 has been reloaded. The FT behaviour looks normal, but the problem is still there:
ea-ste10-ace01/Admin# sh ft group detail
FT Group                     : 1
No. of Contexts              : 1
Context Name                 : ea
Context Id                   : 1
Configured Status            : in-service
Maintenance mode             : MAINT_MODE_OFF
My State                     : FSM_FT_STATE_ACTIVE
My Config Priority           : 200
My Net Priority              : 200
My Preempt                   : Enabled
Peer State                   : FSM_FT_STATE_STANDBY_HOT
Peer Config Priority         : 100
Peer Net Priority            : 100
Peer Preempt                 : Enabled
Peer Id                      : 1
Last State Change time       : Tue Dec 13 10:48:32 2011
Running cfg sync enabled     : Enabled
Running cfg sync status      : Running configuration sync has completed
Startup cfg sync enabled     : Enabled
Startup cfg sync status      : Startup configuration sync has completed
Connection sync enabled      : Enabled
Bulk sync done for ARP: 0
Bulk sync done for LB: 0
Bulk sync done for ICM: 0
ea-ste10-ace01/Admin#
ea-ste10-ace02/Admin# sh ft group detail
FT Group                     : 1
No. of Contexts              : 1
Context Name                 : ea
Context Id                   : 1
Configured Status            : in-service
Maintenance mode             : MAINT_MODE_OFF
My State                     : FSM_FT_STATE_STANDBY_HOT
My Config Priority           : 100
My Net Priority              : 100
My Preempt                   : Enabled
Peer State                   : FSM_FT_STATE_ACTIVE
Peer Config Priority         : 200
Peer Net Priority            : 200
Peer Preempt                 : Enabled
Peer Id                      : 1
Last State Change time       : Tue Dec 13 10:48:57 2011
Running cfg sync enabled     : Enabled
Running cfg sync status      : Running configuration sync has completed
Startup cfg sync enabled     : Enabled
Startup cfg sync status      : Startup configuration sync has completed
Connection sync enabled      : Enabled
Bulk sync done for ARP: 0
Bulk sync done for LB: 0
Bulk sync done for ICM: 0
ea-ste10-ace02/Admin#
This would seem to be the result of an ARP flood/storm (possibly triggered by a loop). While the issue is occurring, you could confirm it by executing:
show processes cpu
and checking the CPU usage of arp_mgr (ARP is handled on the control plane in the ACE). It may also help to take a packet trace on the switch, monitoring the port connected to ACE01, to see what the traffic actually is. Additionally, enabling "mac address-table notification mac-move" on the switches can help detect loops.
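As a sketch, the switch-side setup might look like this on a 2960 (interface numbers are placeholders, and the exact `mac address-table` keyword varies by IOS release, older releases use `mac-address-table`):

```
! Log MAC addresses flapping between ports, a classic loop symptom
conf t
 mac address-table notification mac-move
! Mirror the ACE01-facing port to a capture station (placeholder ports)
 monitor session 1 source interface GigabitEthernet0/1
 monitor session 1 destination interface GigabitEthernet0/2
end
```

With the SPAN session in place, a sniffer on the destination port should show whether the dropped packets really are an ARP storm and which hosts are generating them.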
Should the above not clarify the issue, I would suggest opening a TAC SR to get this investigated further.