Over the last couple of vulnerability scans of devices behind a pair of FWSMs in Active/Standby, single-context mode running 3.2(6), we have noticed that failover is occurring due to lack of a heartbeat within the 30s window.
We have deployed a new Nessus scanner on a dual quad-core machine with lots of RAM. After the first instance we forced the NIC on the Nessus scanner to 100Mbps. The failover still occurs.
I believe the heartbeat message is not getting through because of the CPU overhead of session creation and teardown, combined with debug-level logging to two destinations: our MARS210R and a Linux syslog server.
While the specs for the FWSM state 100,000 conns/sec, our utilization monitoring (MRTG) shows a maximum of only 2973 cps on a 5-minute smoothed average. The MARS reports 100,000 events/min from the FWSM, and the syslog server shows 1.63 million log messages processed in the 5-minute interval in which the failover occurred.
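As a rough sanity check on those figures, the sketch below converts them to per-second rates. It assumes the MRTG maximum and the syslog count cover the same 5-minute window, which the monitoring data does not confirm; the variable names are only illustrative.

```python
# Figures quoted in the post above.
SYSLOG_MSGS_5MIN = 1_630_000      # messages seen by the Linux syslog server
MARS_EVENTS_PER_MIN = 100_000     # events/min reported by the MARS
PEAK_CPS_5MIN_AVG = 2973          # MRTG max, 5-minute smoothed average
WINDOW_SECONDS = 5 * 60

# Convert everything to a per-second rate.
syslog_rate = SYSLOG_MSGS_5MIN / WINDOW_SECONDS
mars_rate = MARS_EVENTS_PER_MIN / 60

# Assumption: the MRTG peak and the syslog count describe the same window.
conns_in_window = PEAK_CPS_5MIN_AVG * WINDOW_SECONDS
msgs_per_conn = SYSLOG_MSGS_5MIN / conns_in_window

print(f"syslog server: {syslog_rate:.0f} msgs/sec")
print(f"MARS:          {mars_rate:.0f} events/sec")
print(f"syslog msgs per connection: {msgs_per_conn:.2f}")
```

Under that assumption the FWSM is emitting several thousand syslog messages per second to the syslog server alone, a little under two per connection, while the connection rate itself stays far below the rated 100,000 conns/sec.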
Will a FWSM OS upgrade help (i.e. something that gives the heartbeats a better CPU slice)? We are constrained to 3.2(6) due to OS dependency H$LL with CSM.
Will a special rule that excludes traffic from the Nessus server from debug logging lower CPU utilization?
I'm not really comfortable adjusting the failover timers. I don't want to compromise the device's ability to respond quickly to a real failure just because we are shooting ourselves in the foot :-0
twitchy/pri/act# sh failover
Failover unit Primary
Failover LAN Interface: failover Vlan 2 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 15 seconds
Interface Policy 50%
Monitored Interfaces 0 of 250 maximum
failover replication http
Config sync: active
Version: Ours 3.2(6), Mate 3.2(6)
Yes, logging on ACLs will spike the CPU. Make sure you remove it and you will see a great improvement.
SNMP and routing protocols can also spike the CPU.
I hope it helps.