Guest Anchor Controller - Foreign Controller Control Path Down

Aug 17th, 2012

We have a Cisco 4400 series wireless controller deployed as a Guest Anchor in a private DMZ.  We have 13 foreign controllers anchored to this for Guest Wireless.  We recently anchored 17 additional controllers to this Anchor controller. Since we have done that, periodically on just 3 of the foreign controllers, the control path shows down on the mobility peer, then comes back up.  We have had this issue in the past, but it resolved itself.  However, now we are seeing this issue again. Are we reaching a limit on EoIP tunnels?  I have read that there is a max of 71, and that is per controller, not SSID. We do have a firewall in the middle but all necessary ports are open.

We have had this issue for quite some time; it just does not happen frequently.  Since we added the additional controllers, it is now happening very often, but only with 3 controllers.  There is not much in common with these 3 controllers.  2 are 4400 series, and 1 is a 5508.  All 3 are local on a campus LAN, on different networks.  Could it have anything to do with memory or utilization?

Thanks.

Amjad Abdullah Sat, 08/18/2012 - 03:34

Do mobility pings work fine when the problem is happening?

Try debugs to check further:

debug mobility handoff enable

debug mobility keepalive enable

HTH

Amjad

Sent from Cisco Technical Support iPad App

awatson20 Sat, 08/18/2012 - 05:28

Yes, mobility pings (eping/mping) work when this happens; however, it is disruptive to clients. I have run all of the debugs, but nothing stands out. I have 29 foreign controllers anchored. This problem started happening when I added the additional controllers. So, as a test, I removed 5 of them, and since I did that, none have dropped. I understand the sizing limitations and am not exceeding them, although it acts like I am.

Sent from Cisco Technical Support iPhone App

Scott Fella Sat, 08/18/2012 - 05:55

Are you using the same mobility group name by chance? You might be hitting the limit of 25 per mobility group. Each building and anchor can be on a different mobility group if there is no roaming between sites.

Sent from Cisco Technical Support iPhone App

awatson20 Sat, 08/18/2012 - 06:26

If I understand what you're asking, no. Here is how we do it. The anchor has one group name, and the foreign controllers are each in different mobility groups, not the same one.

Controller           Group Name

Anchor Controller    Anchor-1

Controller-1         Controller-1

Controller-2         Controller-2
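
For anyone who wants to rule out the per-group limit quickly, the layout above can be checked with a minimal sketch like the one below. The inventory dictionary and the `oversized_groups` helper are hypothetical names made up for illustration, and the limit is passed in as a parameter because the figures quoted in this thread (25 per group, 71 per controller) should be verified against the documentation for your software release.

```python
from collections import Counter

# Hypothetical inventory: controller name -> mobility group name.
# Only the naming pattern comes from the thread; extend it with your own peers.
controllers = {
    "Anchor Controller": "Anchor-1",
    "Controller-1": "Controller-1",
    "Controller-2": "Controller-2",
}


def oversized_groups(inventory, per_group_limit):
    """Return {group_name: member_count} for groups above per_group_limit."""
    counts = Counter(inventory.values())
    return {group: n for group, n in counts.items() if n > per_group_limit}


# 25 is the per-group figure quoted in this thread, not a verified limit.
print(oversized_groups(controllers, per_group_limit=25))
```

With one group name per controller, as described here, every group has a single member, so the check returns an empty result whatever the actual limit is.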

Sent from Cisco Technical Support iPhone App

Scott Fella Sat, 08/18/2012 - 07:26

If you put those WLCs back on and remove a few others, is it still stable, or is it isolated to the ones you removed?

Sent from Cisco Technical Support iPad App

awatson20 Mon, 08/20/2012 - 10:23

Thanks Scott, sorry for not getting back on your question. 

When I add the 6 controllers back to the anchor controller as mobility peers, the problem starts recurring, and it is the same 3.  If I sort the list of wireless controllers in my mobility group, the 3 this is happening to have the highest IP addresses out of the 29 controllers (172.31.211.250, 172.31.228.225, 172.31.254.193).  Not that that has anything to do with the problem; that's just the only thing these 3 WLCs have in common.

I have an open TAC case with Cisco, but no luck yet on a resolution.
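
The "highest IP addresses" observation is easy to reproduce with a short sketch such as the following. Only the three affected addresses come from the post above; the `others` list and the `sort_peers` helper are hypothetical stand-ins for the rest of the 29-entry peer list. Sorting uses numeric address order, since a plain string sort would put 172.31.100.5 ahead of 172.31.9.10.

```python
import ipaddress

# The three flapping peers, as listed above.
affected = ["172.31.211.250", "172.31.228.225", "172.31.254.193"]

# Hypothetical stand-ins for the other peers; replace with the real list.
others = ["172.31.9.10", "172.31.100.5", "172.31.20.77"]


def sort_peers(addresses):
    """Sort dotted-quad strings numerically rather than lexically."""
    return sorted(addresses, key=ipaddress.IPv4Address)


ordered = sort_peers(affected + others)
tail = ordered[-len(affected):]

# True when the affected peers occupy the last positions of the sorted list.
print(tail == sort_peers(affected))
```

This only confirms where the peers fall in a sorted list; it does not show that list position causes the flapping.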

Scott Fella Mon, 08/20/2012 - 10:32

I'm curious to see what Cisco comes up with.  The ip address really should have no impact on the situation.  Keep us updated.

Stephen Rodriguez Mon, 08/20/2012 - 10:44

In your firewall, are the WLCs allowed to establish the tunnel bi-directionally?

IIRC, the WLC with the lowest MAC will be the 'master' for the pair.  So if the WLC in the DMZ is the master, there could be issues initiating the traffic for keep-alive.

HTH,
Steve


awatson20 Mon, 08/20/2012 - 11:13

Yes. We checked the firewall, which is a small Check Point SOHO device, and the rule is set up so that either side, foreign or anchor controller, can initiate the tunnel.
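
A bidirectional check like the one described can be sketched as below. It assumes the commonly documented requirements for this generation of WLC mobility, UDP 16666 for the control path and IP protocol 97 for the EoIP data tunnel, which should be confirmed for your release. The `Rule` type, the `missing_rules` helper, and the example rule set are all hypothetical and do not reflect how any particular firewall stores its policy.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    src: str
    dst: str
    ip_proto: int            # 17 = UDP, 97 = EtherIP (EoIP data tunnel)
    dst_port: Optional[int]  # None when the protocol has no ports


# Assumed requirements: mobility control on UDP 16666, EoIP on IP protocol 97.
REQUIRED = [(17, 16666), (97, None)]


def missing_rules(rules, anchor, foreign):
    """List the (src, dst, proto, port) flows that no rule permits."""
    allowed = set(rules)
    missing = []
    for src, dst in ((anchor, foreign), (foreign, anchor)):
        for proto, port in REQUIRED:
            if Rule(src, dst, proto, port) not in allowed:
                missing.append((src, dst, proto, port))
    return missing


# Hypothetical one-way rule set: only foreign -> anchor is permitted.
rules = [
    Rule("foreign", "anchor", 17, 16666),
    Rule("foreign", "anchor", 97, None),
]
print(missing_rules(rules, "anchor", "foreign"))
```

An empty list means both directions are covered for both flows. The one-way example reports the two anchor-to-foreign flows as missing, which is the case Steve's question is probing for.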

George Stefanick Mon, 08/20/2012 - 11:45

You know, I have 1 controller that goes up and down in my environment. It just so happens this one controller doesn't have any APs on it.  So I have it on my list to dig into, just not a priority at the moment.

Do your controllers have access points and clients on them?

awatson20 Mon, 08/20/2012 - 11:51

Yes, all of the foreign controllers have access points and clients on them.   

awatson20 Mon, 08/20/2012 - 11:58

I will.  I do not plan on closing this one until there is a definite resolution from Cisco.  Our Guest wireless network has lost credibility due to this issue.

George Stefanick Mon, 08/20/2012 - 12:05

You, my friend, are on the right track. Wireless and credibility ALWAYS go hand in hand. I can't tell you how many hours I spend a week educating folks on the difference between sucky wireless and sucky clients.

Darren Ramsey Tue, 08/21/2012 - 20:45

Had a similar issue back in 2008 after upgrading from 4.0 to 4.2 and the issue was not fixed until 7.0.  Basically the WLC was sending out unicast ARP requests every 5 seconds to the gateway to prevent the ARP entries from getting stale. However GLBP changes the gateway with every ARP request and the WLC did not receive any response from the 2nd virtual MAC on the GLBP.  So it's not able to resolve its ARP and hence the UDP packets were being dropped until the GW ARP got updated.

Long story short, it presented itself like this: once enough foreign controllers (more than 5 or 6) were added to the guest anchor, the EoIP tunnels would go up and down only for the last ones in the list. It was a timing issue around the number of foreigns, going top-down through the list based on MAC address sorting. The anchor trusted interface was pointed to a GLBP VIP and the other interface was in the DMZ for guest access. Now this may not be your exact issue, but it might help you steer TAC in the right direction. Cisco ended up sending one of the original Airespace engineers on site, who figured out the issue after TAC and the WBU had been stumped for 2+ months. Our short-term workaround was to point the WLC anchor at a real address and not a GLBP VIP, at the cost of losing FHRP. This was finally fixed in 7.0. The bug IDs are CSCsv21464 and CSCsu67530. Hope this is helpful and good luck.

Posted August 17, 2012 at 10:29 AM