This document helps you troubleshoot high availability (HA) configuration issues on the ACE. High availability (or fault tolerance, FT) uses a maximum of two ACE appliances to keep your network operational, and your network services and applications available, even if one of the appliances becomes unresponsive.
Each FT group consists of two members: one active context and one standby context. One virtual MAC address (VMAC) is associated with each FT group, and each FT group acts as an independent redundancy instance. When a switchover occurs, the active member in the FT group becomes the standby member and the original standby member becomes the active member. The ACE sends and receives all redundancy-related traffic (protocol packets, configuration data, heartbeats, and state replication packets) on a dedicated FT VLAN.
Show Interface command shows FT Interface is Down
Verify that the FT interface is not shut down
If it is shut down, issue the “no shutdown” command to enable the interface
Verify that the FT VLAN is configured on the MSFC
Issue the “show vlan” command to display the configured VLANs
Verify that the FT VLAN is assigned to the module
Issue the “show svclc group” command
Verify that the FT VLAN is trunked across to the other Catalyst chassis
Issue the “show interface trunk” command
Show FT Peer command shows PEER_DOWN
Check that the local and peer IP addresses are configured correctly on both modules
Verify that a ping or Telnet to the peer IP address works
If the ping fails, check that the interface is up
Verify that the FT interface is up and the FT VLAN is assigned to the module
Enter the “show conn” command on both sides to check whether the HA connections have been set up. If they have not, check the HA DP manager log. Setup can fail if the IXP was hung and did not respond.
Enter the “show ft stats” command on both modules to see whether heartbeats are being sent and received. If the heartbeats missed counter is incrementing, the heartbeats may be getting dropped in the fastpath.
Check the Fastpath counters.
Enter command "show np <1 or 2> me-stats -sfp" to check the counters.
Show FT Peer command shows TL_RETRY
This state means that the HA “CP to CP” Telnet connection is not being established, even though heartbeats are flowing normally.
Verify that heartbeats are flowing by checking “show ft stats”.
Verify that a Telnet or ping to the FT peer IP address works. If the ping fails, the Telnet will most likely fail as well.
Check the IXP stats for the fastpath and the ICM. The Telnet request is most likely being dropped there.
Enter command "show np <1 or 2> me-stats -sfp"
Verify if the following are incrementing
Packets forward to CM : 2
DROP: RX Interface miss: 2
Enter command "show np <1 or 2> me-stats -sicm"
Verify if the following are incrementing
Drop [ACL deny] : 0
Drop [IF FT Standby] : 0
Drop [Encap Miss Msg stat] : 0
If these counters are incrementing, the Telnet requests are being punted to the ICM, and the ICM is dropping them because of ACL denies, encap misses, or an interface state indicating that it is in standby mode.
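Deciding whether a counter is "incrementing" requires comparing two captures of the same command taken a short time apart. The following is a minimal Python sketch of that comparison, assuming the counters appear as "name : value" lines like the ones listed above. The helper names (parse_counters, incrementing) are hypothetical, and the non-zero sample value is made up for illustration; it is not real device output.

```python
import re

# Matches lines such as "Drop [ACL deny] : 0" or "DROP: RX Interface miss: 2".
# The trailing integer is the value; everything before the last colon is the name.
COUNTER_LINE = re.compile(r"^\s*(?P<name>.+?)\s*:\s*(?P<value>\d+)\s*$")


def parse_counters(text):
    """Return {counter name: value} for every 'name : value' line in text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def incrementing(before, after):
    """Return {name: delta} for counters that grew between two captures."""
    first, second = parse_counters(before), parse_counters(after)
    return {
        name: second[name] - first[name]
        for name in second
        if name in first and second[name] > first[name]
    }


# First capture uses the values shown in this document; the second capture
# is invented for illustration (assumption, not real output).
snapshot_1 = """
Drop [ACL deny] : 0
Drop [IF FT Standby] : 0
Drop [Encap Miss Msg stat] : 0
"""
snapshot_2 = """
Drop [ACL deny] : 4
Drop [IF FT Standby] : 0
Drop [Encap Miss Msg stat] : 0
"""

print(incrementing(snapshot_1, snapshot_2))
```

With these sample captures, only the ACL deny counter is reported, which would point at an ACL blocking the Telnet request rather than an encap miss or standby interface state.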
Show FT Peer command shows FT_VLAN_DOWN
This normally occurs when the FT VLAN goes down while the configured query interface is up. Heartbeats fail immediately, and a continuous ICMP ping is started on the query interface. If the ping succeeds, the peer state is declared FT_VLAN_DOWN rather than PEER_DOWN.
To resolve this, restore connectivity on the FT VLAN. Ping or Telnet to the FT VLAN peer IP address to verify.
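The distinction between FT_VLAN_DOWN and PEER_DOWN described above can be summarized as a small decision function. This Python sketch only restates the logic in this section; the function and parameter names are hypothetical, and returning None for the healthy case is an illustrative choice, not an ACE state name.

```python
def peer_state_after_heartbeat_check(heartbeats_ok, query_ping_ok):
    """Restate the FT_VLAN_DOWN vs. PEER_DOWN decision described above.

    heartbeats_ok: True if heartbeats are still arriving on the FT VLAN.
    query_ping_ok: True if the ICMP ping on the query interface succeeds
                   (False if it fails or no query interface is configured;
                   treating "not configured" as False is an assumption).
    """
    if heartbeats_ok:
        # No heartbeat loss, so neither failure state applies.
        return None
    if query_ping_ok:
        # The peer answers on the query interface: only the FT VLAN is down.
        return "FT_VLAN_DOWN"
    # The peer is unreachable on both paths.
    return "PEER_DOWN"


print(peer_state_after_heartbeat_check(False, True))   # FT_VLAN_DOWN
print(peer_state_after_heartbeat_check(False, False))  # PEER_DOWN
```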
HA stuck in STANDBY_CONFIG
This state is seen on the standby module while it is receiving the configuration from the active module. Depending on whether the configuration is being rolled back on the standby or synced from the active, this can take longer than 30 minutes. If the state eventually moves to STANDBY_COLD, refer to the next symptom.
Show FT group command shows STANDBY_COLD
If the standby context is in STANDBY_COLD state, it could mean:
a) The two ACE modules do not have the same SSL key(s) and certificate(s)
b) The two ACE modules do not have the same script file(s)
c) The two ACE modules have different license(s)
d) Configuration sync failed
A configuration sync failure is indicated when the peer state shows “Compatible” and the FT group shows “STANDBY_COLD”. To check the reason for the config sync failure:
Enter the “show ft history cfg_cntlr” command to see where the failure occurred. On failure, you would generally see messages such as the following:
“error: could not rollback configuration file /tmp/Admin-cfgcntlr-rollback-cfg log file name Admin-cfgcntlr-rollback-cfg-863-1.log context Admin.”
“error: could not apply peer running configuration (file /tmp/005_Admin_0_cfgcntlr-peerbulk-cfg ) for context Admin”
e) Connectivity was lost on the FT VLAN. The standby moved to the STANDBY_COLD state because the active is still reachable on the query interface
To quickly verify whether STANDBY_COLD is due to the FT VLAN going down, check the peer state; it should show “FT_VLAN_DOWN”. “show ft stats” will also show that heartbeats are being missed. Restore connectivity on the FT VLAN to recover.
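As a rough aid for working through causes a) to e), the Python sketch below maps the observed peer state and FT group state to the next check suggested in this section. The function name is hypothetical, and the fallback branch (comparing keys, scripts, and licenses when neither of the two documented peer states is seen) is an assumption about ordering, not documented ACE behavior.

```python
def standby_cold_next_step(peer_state, group_state):
    """Suggest the next check for a STANDBY_COLD group, per the list above."""
    if group_state.upper() != "STANDBY_COLD":
        return "Group is not in STANDBY_COLD; this checklist does not apply."
    state = peer_state.upper()
    if state == "FT_VLAN_DOWN":
        # Cause e): heartbeats are lost but the active answers on the query interface.
        return "Restore FT VLAN connectivity and confirm with 'show ft stats'."
    if state == "COMPATIBLE":
        # Cause d): config sync failure.
        return "Run 'show ft history cfg_cntlr' to find where config sync failed."
    # Causes a) to c). Falling through to these is an assumption of this sketch.
    return "Compare SSL keys/certificates, script files, and licenses on both modules."


print(standby_cold_next_step("Compatible", "STANDBY_COLD"))
```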
Connections Table Not Replicated to Standby
Possible reasons for this behavior:
a) Peer is not up. To verify, enter “show ft peer status”
b) FT group is not up. To verify, enter “show ft group status”
c) Config sync did not complete. Check the HA status
d) Encap(s) were unknown to the standby. A connection cannot be created without known encaps, so the first sync may cause an encap lookup
Check the following counters in the output of “show ft stats <ft group id>”: replicate connection sent stat and replicate connection recv stat. Both should be incrementing. They may not be equal, because these messages are sent over UDP, which is an unreliable transport.
Also check whether the connection uses any feature that would disqualify it from connection replication. For example, connections subject to HTTP INSPECT are not eligible.
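The counter check above depends on whether both values grow over time, not on whether they match. A minimal Python sketch of that interpretation follows, assuming you have noted each counter at two points in time. The function name, parameter names, and returned messages are hypothetical.

```python
def replication_status(sent_before, sent_after, recv_before, recv_after):
    """Interpret two samples of the replicate connection sent/recv counters.

    Equality of sent and received totals is deliberately not required,
    because the messages travel over UDP and some may be lost.
    """
    sent_delta = sent_after - sent_before
    recv_delta = recv_after - recv_before
    if sent_delta <= 0:
        return "No replication messages sent; check peer, FT group, and config sync state."
    if recv_delta <= 0:
        return "Messages sent but none received; check the FT VLAN path and encaps."
    return "Replication is progressing."


# Sample values are invented for illustration.
print(replication_status(100, 180, 95, 170))  # Replication is progressing.
```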
Syslog message shows "Peer is incompatible due to error str. Cannot be Redundant"
Make sure that the software version and license details are identical on the paired ACE devices.