We have a number of ADSL connected remote sites with IPSEC tunnels terminating on the PIX firewall at our central office.
This morning, we lost connectivity with one of the ADSL connected sites.
I've checked the router at the remote site: it's up and running, its interfaces are all up, etc.
I asked the telco to check the ADSL line and they report there are no problems with the line.
From running some debugs, the router doesn't seem to be establishing the IPSEC communications with the PIX firewall properly - Phase 1 Main Mode fails. The question is why and what can I do to resolve it?
Here's an example of the "debug crypto isakmp" output on the router:
*Mar 1 01:36:54.383: ISAKMP: received ke message (1/1)
*Mar 1 01:36:54.383: ISAKMP (0:0): SA request profile is (NULL)
*Mar 1 01:36:54.383: ISAKMP: local port 500, remote port 500
*Mar 1 01:36:54.387: ISAKMP: set new node 0 to QM_IDLE
*Mar 1 01:36:54.387: ISAKMP: insert sa successfully sa = 817BFFEC
*Mar 1 01:36:54.387: ISAKMP (0:1): Can not start Aggressive mode, trying Main mode.
*Mar 1 01:36:54.387: ISAKMP: Looking for a matching key for xxx.xxx.xxx.xxx in default : success
*Mar 1 01:36:54.387: ISAKMP (0:1): found peer pre-shared key matching xxx.xxx.xxx.xxx
*Mar 1 01:36:54.387: ISAKMP (0:1): constructed NAT-T vendor-03 ID
*Mar 1 01:36:54.387: ISAKMP (0:1): constructed NAT-T vendor-02 ID
*Mar 1 01:36:54.387: ISAKMP (0:1): Input = IKE_MESG_FROM_IPSEC, IKE_SA_REQ_MM
*Mar 1 01:36:54.391: ISAKMP (0:1): Old State = IKE_READY New State = IKE_I_MM1
*Mar 1 01:36:54.391: ISAKMP (0:1): beginning Main Mode exchange
*Mar 1 01:36:54.391: ISAKMP (0:1): sending packet to xxx.xxx.xxx.xxx my_port 500 peer_port 500 (I) MM_NO_STATE.....
Success rate is 0 percent (0/5)
*Mar 1 01:37:04.391: ISAKMP (0:1): retransmitting phase 1 MM_NO_STATE...
*Mar 1 01:37:04.391: ISAKMP (0:1): incrementing error counter on sa: retransmit phase 1
*Mar 1 01:37:04.391: ISAKMP (0:1): retransmitting phase 1 MM_NO_STATE
*Mar 1 01:37:04.391: ISAKMP (0:1): sending packet to xxx.xxx.xxx.xxx my_port 500 peer_port 500 (I) MM_NO_STATE
*Mar 1 01:37:14.391: ISAKMP (0:1): retransmitting phase 1 MM_NO_STATE...
*Mar 1 01:37:14.391: ISAKMP (0:1): incrementing error counter on sa: retransmit phase 1
*Mar 1 01:37:14.391: ISAKMP (0:1): retransmitting phase 1 MM_NO_STATE
*Mar 1 01:37:14.391: ISAKMP (0:1): sending packet to xxx.xxx.xxx.xxx my_port 500 peer_port 500 (I) MM_NO_STATE
*Mar 1 01:37:24.383: ISAKMP: received ke message (1/1)
*Mar 1 01:37:24.383: ISAKMP: set new node 0 to QM_IDLE
*Mar 1 01:37:24.383: ISAKMP (0:1): SA is still budding. Attached new ipsec request to it. (local yyy.yyy.yyy.yyy, remote xxx.xxx.xxx.xxx)
*Mar 1 01:37:24.391: ISAKMP (0:1): retransmitting phase 1 MM_NO_STATE...
*Mar 1 01:37:24.391: ISAKMP (0:1): incrementing error counter on sa: retransmit phase 1
*Mar 1 01:37:24.391: ISAKMP (0:1): retransmitting phase 1 MM_NO_STATE
*Mar 1 01:37:24.391: ISAKMP (0:1): sending packet to xxx.xxx.xxx.xxx my_port 500 peer_port 500 (I) MM_NO_STATE
As I say, this site has had connectivity for some time without any problems - until today. There have been no configuration changes so I don't think any problems are config related.
Any suggestions as to what the problem could be and how to resolve it would be greatly appreciated!
From the messages that you posted I get the impression that the router is receiving ISAKMP messages from the PIX and guess that the problem may be that the PIX is not receiving ISAKMP from the router. If there have not been any config changes on the router or on the PIX then I would wonder if the provider has possibly changed something and might be blocking the ISAKMP packets from the router.
Can you tell from the PIX side whether it is receiving any ISAKMP messages from this router?
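As a rough sketch of how that check might be done on the PIX: the commands below assume PIX OS 6.2 or later (needed for the capture feature) and an outside interface named outside. The address 192.0.2.10 is only a placeholder for the remote router's public address, and the names ike-acl and ike-cap are made up for illustration.

```
: Show current ISAKMP SAs and watch negotiation attempts as they arrive
show crypto isakmp sa
debug crypto isakmp

: Configuration mode: match UDP 500 from the remote peer (placeholder address)
access-list ike-acl permit udp host 192.0.2.10 any eq 500

: Capture matching packets arriving on the outside interface, then view them
capture ike-cap access-list ike-acl interface outside
show capture ike-cap

: Clean up afterwards
no capture ike-cap
no debug crypto isakmp
```

If the capture stays empty while the router reports that it is sending to the PIX, that would suggest the packets are being lost somewhere in between.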
Thanks for the help.
As it happens, as well as the IPSEC tunnels terminating on the PIX, we also establish IPSEC communications between 6 other sites, all using Cisco 837 routers.
Now, none of the other sites are experiencing any problems EXCEPT when they try to establish IPSEC communications with the problem site - we get MM_NO_STATE and Phase 1 Main Mode fails.
I turned on debug isakmp on the problem router and one of the other routers and watched the transactions. I could not see any ISAKMP traffic from the problem router - which would completely correspond with what you say i.e. it looks like something on the provider network is somehow blocking the ISAKMP packets!
The problem is, the provider is insisting that they have not made any changes and there are no problems at their end so I'm kind of stuck at the moment!
Thanks for the help all the same. Any more suggestions that would help pinpoint the cause of this issue for certain would be welcome!
Can I assume that you have remote access to the router (can telnet or SSH to it)? If so, that would rule out a basic IP connectivity issue. If not, can you establish that there is good IP connectivity (ping or traceroute to it from your central site, or ping your central site from the remote router)?
If we are trying to determine if ISAKMP is going through it might work to debug ip packet on the remote router and look for receipt and generation of UDP 500 packets. Depending on the amount of activity on the remote router the debug might generate lots of output. You can reduce the amount of output by using an access list with the debug. It would look something like this:
Choose an extended access list number that is not in use on the router (199 in my example):
access-list 199 permit udp any eq 500 any
access-list 199 permit udp any any eq 500
debug ip packet 199
If you are connected to the router by telnet or SSH, be sure to issue terminal monitor so that you see the debug output. Alternatively, you can enable logging buffered debug and use show log to see what is in the logging buffer, which will include the debug output.
That should show whether the router is receiving and sending ISAKMP. If you can show that the router is generating the traffic and receiving the traffic maybe the provider will accept that as proof. You might need to find a way to examine traffic coming into your central site to demonstrate that you are receiving ISAKMP from other sites but not from this one.
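Putting the steps above together, the whole sequence might look something like the sketch below. The buffer size of 16384 bytes is an arbitrary choice, and 199 is just the example ACL number used above; adjust both to suit the router.

```
! Global configuration: ACL 199 matches ISAKMP (UDP 500) in either direction
access-list 199 permit udp any eq 500 any
access-list 199 permit udp any any eq 500
! Send debug-level messages to the logging buffer (size is arbitrary)
logging buffered 16384 debugging

! Privileged EXEC: display debugs on a telnet/SSH session, then start the debug
terminal monitor
debug ip packet 199 detail

! Review the buffer, then switch all debugging off when finished
show logging
undebug all
```

The detail keyword adds the UDP source and destination ports to each line, which makes it easier to confirm that the packets really are ISAKMP.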
The remote site has an ISDN router too, so as a workaround they are currently using the ISDN link. I can connect to the ADSL router for troubleshooting purposes by telnetting from the ISDN router (the ADSL router is connected to it via Ethernet).
As you have described, I will set up an ACL and debug the IP packets to determine what is happening with ISAKMP traffic. Hopefully that will yield some more info!
Thanks again for your help - will keep you posted on any progress!
Replace the router to rule out the possibility of hardware failure.
Also, just wondering whether all the ADSL sites are provided by the same ISP.
Replacing the router may be something we have to try - though it will probably take me a few days to get hold of another one, get it configured and shipped to the remote site.
Shouldn't there at least be some indications of a hardware problem from the router itself though? I've examined the show tech output and no signs of any problems?
The sites are all provided by the same ISP - however, they are quite separate geographically, so it's still a possibility that the ISP has changed something in a part of the network that only affects the problem site. (Well, that's what I think/hope anyway!)
To make matters worse, the ISP we use is geared more towards home users than businesses, and particularly not businesses using the kind of set-up that we have, so it's difficult to get any decent assistance from their technical support!
(I'm relatively new to this Company so all this is a legacy of the old regime!)
Thanks for all the help so far, guys.
Given that all the ADSL services are provided by the same ISP, it is much less likely that the ISP is blocking or restricting IPSEC on just that particular site.
Another quick question: I was wondering whether all the routers are running the same IOS. I agree with you that there should at least be some indications on the router; unfortunately, that's not always the case.
Yes, all the routers are Cisco 837s running the same IOS - 12.3(2)XC2.
These have all been running pretty much without problems, certainly for the last 3 months or so that I have been here.
The fact that all of a sudden this site has stopped working (without any config changes at our end) is what makes me so suspicious of the provider network (and the fact there doesn't seem to be any indication of any hardware issues with the router itself).
I will try some of the troubleshooting Rick has suggested and will also send out a replacement router to see if that makes any difference, at least then I can eliminate a hardware problem.
I configured the ACL as you suggested on the problem remote router and on one of the remote routers that is working.
When I ping from the "working" router to the "problem" router, on the "working" router I see packets being sent but nothing returning.
When I ping from the "problem" router to the "working" router, on the "problem" router I see messages such as:
IP: s=xxx.xxx.xxx.xxx (local), d=yyy.yyy.yyy.yyy (Dialer0), len 152, encapsulation failed
But presumably, this is just because the IPSEC communications have not been established successfully i.e. the IPSEC tunnel has not been set up so the packets cannot be encrypted and sent?
I should hopefully be able to get a replacement router with the exact same configuration to the remote site tomorrow so we can see if that makes any difference.
It is interesting that the ping fails. We have not yet really ruled out the possibility of IP connectivity problems. The encapsulation failed error message indicates that the router was not able to map the layer 3 destination address to an appropriate layer 2 address.
Can you post the output of show interface (and perhaps the output of show ip interface)? And perhaps you might post the configuration of the interface (if you have reservations about posting the entire config)?
I am wondering if the provider has made any kind of changes about your connection (even if they claim that they have not). Can the problem router ping (or any other way access) the next hop router (the provider router)?
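A few checks along these lines might help narrow down where connectivity stops. This is only a sketch: Dialer0 comes from the debug output posted above, ATM0 is assumed to be the ADSL interface on the 837, and 192.0.2.1 is a placeholder for the provider's next hop address.

```
! Layer 1/2: is the DSL line trained, and is the PVC passing cells?
show dsl interface atm 0
show atm pvc
show interfaces atm 0

! PPP/dialer state and the address negotiated with the provider
show interfaces dialer 0
show ip interface brief

! Layer 3: is there a route out, and does the next hop answer?
show ip route
ping 192.0.2.1
```

Input counters that stay at zero on the ATM interface while output counters increase would point towards the line or the provider rather than the IPSEC configuration.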
We had an issue with IOS version 12.3(2)XC2 before. On that particular project, we were deploying 15 ADSL routers with this IOS. During the implementation, 4 out of 15 had issues with both SSH and IPSEC.
With SSH, we were able to establish a session to the router; however, it dropped out every few minutes.
With IPSEC, the LAN-to-LAN tunnel back to the central site was established. Unfortunately, no traffic traversed the tunnel, and eventually the tunnel was dropped after a minute or so.
We had a discussion with Cisco TAC, with no luck. Since all 14 sites shared an identical config except for the IP scheme, and all routers were running the same IOS, we didn't consider the IOS as a possible cause. We even replaced the router, as we thought it could be a hardware failure, but still no luck.
Anyhow, we did one IOS upgrade on the basis that we had nothing to lose, and surprisingly it started working. We then upgraded all the "naughty" routers and it resolved the issue 100%.
Ok, we have tried replacing the router with a new one but still see the same thing - MM_NO_STATE message, Phase 1 Main Mode failing.
jackko - that's interesting about the IOS and is something we could maybe try. I've certainly had strange issues before that were only ever rectified by an IOS upgrade, but no one knew how or why!
Rick - attached is the output from show int, show ip int, and the interface config itself. (I have blanked out username/password and public address info and changed private address info)
The things that STILL make me suspicious of the provider, though, are that a) this has been working happily for several months until now, b) there have been no config changes from our end, and c) we have replaced the router itself but this hasn't made any difference!
Please let me know if you spot anything from the output or if you have any more suggestions.
Guys, I'm not sure if you're actually getting the points I've assigned to you for all your help on this so far?
For some reason, it doesn't seem to be showing any of the points I am allocating (or trying to allocate!) to you.
Not sure if there's just a glitch in the system at the moment. If you are not getting any points, let me know and I will try to assign them again later. Rest assured though your help IS appreciated!
Thanks for posting the additional information that I asked about. The main thing that I find interesting in it is that the ATM interface (both the main interface and the subinterface) shows 0 packets input. I wonder if this indicates some communications issue.
I am also wondering if we can establish that there is successful IP connectivity or that there is not. From the problem router do you have IP access to anything (something at central site, something at another remote site, addresses within the provider network)?
Also agree that no points show up yet (and appreciate your interest in assigning points).
Ok, latest is - after a bit of a saga between ISP and telco, it looks like it IS a problem with the line after all.
I asked the ISP to carry out some more tests as, especially after replacing the router, it was looking more and more likely that it had to be a comms issue outwith our control. (No IP connectivity beyond the local LAN, to answer your questions, Rick)
This time the ISP tests came back as "inconclusive", though they were still reluctant to offer much more help without us jumping through a few more hoops for them.
So we have bypassed the ISP and gone straight to the telecoms company - they have confirmed there DOES seem to be a fault with the line at the remote site and are sending an engineer to have a look at it. Will let you know the outcome.
As for points - for some reason, it's not accepting any points I allocate to you (no error message or anything, but I just noticed that they weren't actually showing up on the conversation). I will try to allocate you some points again tomorrow - hopefully it's just a temporary glitch with the system.
I currently manage a private network of over 1000 ADSL connections and 1000 public ADSL connections, so I was interested in this call. There is a good document on cisco.com on troubleshooting these connections using debug commands (debug ppp nego, debug ppp authen and debug atm events). I have just completed a call where all three came in useful.
If you are having issues with the ISP, or suspect issues, these commands will come in handy as long as you log to the buffer (no other way, I suppose). The paper I am talking about is as follows.
Hope this helps. By the way, NetPro is becoming a wonderland for me. Thanks for future help. lol
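For anyone wanting to try the three debugs mentioned above, a minimal sketch might look like this. The full command names are used here rather than the abbreviations, and the 16384-byte buffer size is an arbitrary assumption.

```
! Global configuration: keep debug-level messages in the logging buffer
logging buffered 16384 debugging

! Privileged EXEC: enable the PPP and ATM debugs
debug ppp negotiation
debug ppp authentication
debug atm events

! Let the line attempt to connect, then review the buffer and stop debugging
show logging
undebug all
```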
Thanks for the info - that may very well come in handy for me in future!
This particular issue has now been resolved - turns out it was a problem with the telco all along, which they have now sorted.
Guys, I have tried and failed to assign points to you for your help on this topic but for some reason it's not accepting any of the points I try to assign!
It seems to be a problem only with this conversation as I can assign points to other conversations without any issue. (I also noticed there is no "solved my problem" checkbox appearing on this conversation for me, for some reason - weird!)
Anyway, I emailed Cisco Netpro support about it but I've had no response so it looks like you are going to miss out on your well-earned points in this case. Sorry about that! But thanks anyway for all the help on this issue.