I'm facing high CPU issues on a Cisco ASA 5550 firewall running in cluster mode. I have tried the ASA software images 7.2.2, 7.2.4, 8.2.2, 8.2.3, etc.,
but the problem remains the same.
I have approximately 3000 site-to-site VPNs, and the CPU hog is in the IKE daemon.
Kindly check this and let me know. The log files are attached;
the IP addresses have been removed from the logs for confidentiality.
Thanks in advance.
Looking at the logs, you could be under an IKE resource-exhaustion attack. Per Cisco, the 5550 is rated for up to 5000 VPN connections, so you can use this link to troubleshoot further and limit IKE connections.
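On ASA 8.x, one built-in option for rate-based mitigation is threat detection, which can automatically shun hosts that trip scanning thresholds. A minimal sketch, assuming ASA 8.0(2) or later; the shun duration is an illustrative value, not a recommendation:

```
! Basic threat detection (enabled by default on 8.0(2)+)
threat-detection basic-threat
! Optionally shun hosts classified as scanners for one hour
threat-detection scanning-threat shun duration 3600
! Review what the ASA has flagged
show threat-detection statistics
```

This won't distinguish a legitimate peer with a broken config from an attacker, so review the statistics before enabling automatic shunning.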
Could you let me know how to mitigate this issue on ASA 8.x? This doc does not say how to mitigate the issue.
Can anyone help me, please?
Yes, there is a mitigation for your issue.
Since you know the source and destination IP addresses, you can apply an ACL on the internet router to block unnecessary ISAKMP connections to your firewall.
Have a look at the URL below; you can find the mitigation there.
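As a sketch, an IOS extended ACL on the internet router could drop ISAKMP (UDP/500) and NAT-T (UDP/4500) from an offending source while leaving everything else alone. All addresses and the interface name below are placeholders from the documentation ranges, not real peers:

```
ip access-list extended BLOCK-ROGUE-IKE
 ! 192.0.2.77 stands in for an offending source seen in the logs;
 ! 203.0.113.1 stands in for the firewall's outside address
 deny   udp host 192.0.2.77 host 203.0.113.1 eq isakmp
 deny   udp host 192.0.2.77 host 203.0.113.1 eq non500-isakmp
 permit ip any any
!
interface GigabitEthernet0/0
 ip access-group BLOCK-ROGUE-IKE in
```

One deny pair per offender keeps the list short, compared with trying to enumerate all legitimate peers.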
It's me who initiated this thread. As per the doc, the change can only be implemented on the edge routers (internet routers), but can anything be done on the firewall itself? Please let me know.

Beforehand, I would like to know whether this really is an IKE exhaustion attack. Is there any way to confirm it? I have L2 switches behind the firewall and a router in front of it. I would also like to know how to check each connection with its source/destination and the bandwidth usage of each connection, to confirm whether the problem might be due to such traffic and then block it. My NetFlow analyser shows only the average traffic for the entire day, and when I ran NetFlow on the router I couldn't see any bandwidth choke in the report for the CPU-spike period.

Are there any mitigation commands available on the ASA firewall itself if it is confirmed as an attack? Please let me know.
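To help confirm whether rogue initiators are hitting the box, a few standard ASA show commands can correlate the CPU spike with IKE activity; this is a sketch, and output fields vary slightly by version:

```
! Overall CPU usage
show processes cpu-usage
! IKE counters -- watch whether "Responder Fails" / "Initiator Fails" climb during the spike
show crypto isakmp stats
! Current IKE SAs with their peer addresses
show crypto isakmp sa
! Per-connection source/destination through the firewall
show conn
! Per-interface throughput counters
show traffic
```

Comparing `show crypto isakmp sa` peer addresses against your configured peer list is a quick way to spot initiators that shouldn't be there.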
If you look at the output, check "Responder Fails : 1148046", which means peers failed to reply to IKE negotiations. I think you have a lot of tunnels configured on your device for which there isn't any peer response, and also a lot of peers (including ones that should not be initiating an IKE negotiation with your device at all). Even the logs you posted when you started this discussion show entries for no peer response. You should look into your configuration and remove the tunnel peers that no longer exist, and block the IPs that are initiating connections with you but should not be (do it either at the edge router or on the firewall's outside interface using access lists, as mentioned earlier).
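Removing a stale peer typically means deleting both its crypto map entry and its tunnel-group. A hedged sketch, with a placeholder map name, sequence number, and peer address; verify your own values with `show running-config crypto map` first:

```
! Placeholder crypto map entry for the dead peer
no crypto map OUTSIDE_MAP 100
! Placeholder tunnel-group keyed on the dead peer's address
clear configure tunnel-group 192.0.2.77
```

Remember to `write memory` afterwards so the cleanup survives a reload.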
Thanks, Manish. Could there be a situation where the tunnels went down and gradually came back up, and some of the tunnels did not respond, hence the failed attempts? The exact problem was: we noticed high CPU, found the tunnel count had dropped from the usual 3000 to 1500, and after 10 minutes the rest of the tunnels came back up automatically.
Secondly, there were two peer configurations where, instead of one peer IP, we had mentioned two peer IP addresses, for example "crypto map MYMAP 555 set peer 22.214.171.124 126.96.36.199"; we removed those two instances this morning. Also, how do we find out which invalid IP addresses are making such connections? If we follow the Cisco doc, we might have to configure 3000+ ACL entries denying UDP port 500 on the edge router / firewall external interface, which would be huge, I guess. Please share your inputs.
The two peer addresses could just be there for redundancy, so that if one public IP on the peer goes down, the tunnel can be set up using the peer's secondary IP. Anyway, I agree with you that placing a huge access list is not a good solution. I think you should monitor the logs generated by the firewall for failed IKE negotiations, and if you see IPs that aren't configured as peers, just block them on the outside interface.
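One ASA-side option for individual offenders is the shun command, which drops all packets from a given source without needing an ACL rewrite. A caveat worth checking with TAC: traffic terminating on the ASA itself (such as IKE to its outside address) is generally not filtered by regular interface ACLs, which is one reason the edge-router ACL is the more reliable place to block. The address below is a placeholder:

```
! Placeholder rogue source taken from the failed-negotiation logs
shun 192.0.2.77
! Verify the shun is active
show shun
! Remove it later with:  no shun 192.0.2.77
```

Note that shun entries are not saved to the configuration and do not survive a reload.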
Also, if you have SmartNet support, you should contact TAC and have them check for any hardware issues.
The problem has not been rectified. We have escalated this matter to TAC and are waiting for the problem to happen again so we can capture certain logs within a specific period of time.