ISE and Citrix Netscaler for LB

Sep 24th, 2013

I'm working on a solution where NetScaler load balancers distribute RADIUS requests from the NADs to their respective PSNs. Authentication works, redirect URLs work, and so on. The challenge we're having is with EAP-TLS sessions. The user gets a provisioned certificate and chain that validates correctly on the endpoint. When the user tries to connect with the device, we see EAP timeouts from the ISE session to the supplicant. Each PSN has an identity cert configured for EAP authentication, issued by the same internal CA within the customer's PKI.


Has anyone configured a NetScaler for use with ISE? Besides the general guidelines below, are there more specific things that need to be done to make this work with Citrix NetScalers?


Load Balancing guidelines.


No NAT.


  • Each PSN must be reachable by the PAN / MNT directly, without having to go through NAT (Routed mode LB, not NAT).
  • Each PSN must also be reachable directly from the client network for redirections (CWA, Posture, etc…)


Perform sticky (aka: persistence) based on Calling-Station-ID and Framed-IP-address


  • Session-ID is recommended if load balancer is capable (ACE is not).


VIP for PSNs gets listed as the RADIUS server on each NAD for all RADIUS AAA.


Each PSN gets listed individually in the NAD CoA list by real IP address (not VIP).


  • If the load balancer performs "Server NAT" on the PSN-initiated CoA traffic, a single VIP can be listed in the NAD CoA list instead.


Load Balancers get listed as NADs in ISE so their test authentications may be answered.


ISE uses the Layer 3 address to identify the NAD, not the NAS-IP-Address in the RADIUS packet. This is a primary reason to avoid Source NAT (SNAT) for traffic sent to VIP.
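
To illustrate the last guideline, here is a minimal Python sketch of why SNAT in front of the VIP breaks NAD identification. It is not ISE code: the `NAD_TABLE` contents, the `identify_nad` function, and all addresses are hypothetical and only model a lookup keyed on the packet's Layer 3 source address.

```python
# Hypothetical NAD table keyed by Layer 3 source address (illustrative values only).
NAD_TABLE = {
    "10.1.1.10": "access-switch-1",
    "10.1.1.20": "wlc-1",
}


def identify_nad(source_ip, nas_ip_address):
    """Look up the NAD by the packet's source IP.

    The NAS-IP-Address attribute is accepted but deliberately ignored,
    mirroring the guideline above.
    """
    return NAD_TABLE.get(source_ip)


# Routed mode: the switch's real address is preserved, so the lookup succeeds.
print(identify_nad("10.1.1.10", "10.1.1.10"))

# SNAT: the request arrives from the load balancer's address (assumed here to be
# 192.168.10.2), so the lookup fails even though NAS-IP-Address is unchanged.
print(identify_nad("192.168.10.2", "10.1.1.10"))
```

In this model the only way to make the second lookup succeed is to register the load balancer itself as a NAD, at which point every switch behind it becomes indistinguishable.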

Tarik Admani Tue, 09/24/2013 - 20:12

Jacob,


I have not had a chance to set this up, but I wanted to know whether the user's session is being authorized correctly. Are you seeing this for user certs as well as machine certs? I know that with ISE, during bootup and sometimes during user authentication, the supplicant will initiate multiple EAP sessions: one of them is used to authorize the session and the other is left to age out on the RADIUS server's side.


I wanted to make sure you are not running into that common issue. Also, what are your timeout settings for EAP conversations on the port, and what is the retry interval? With EAP-TLS the session takes a little more time to authenticate than with PEAP.


thanks,



Tarik Admani

kj.stjernqvist Thu, 01/29/2015 - 05:09

Use "any" instead of "RADIUS" as the protocol specified in the NetScaler and it will work. There seems to be some kind of bug in the NetScaler's RADIUS inspection that drops EAP-TLS traffic.

j-sutterfield Thu, 01/29/2015 - 06:05

The solution was the release of 10.5 50.10 version of NetScaler code.  This fixed several AAA bugs and corrected packet handling of load balanced RADIUS traffic.

 

The other comment suggesting 'any' instead of 'RADIUS' is misinformation.  Our setup is using RADIUS as the protocol and works perfectly.

Nick Ciesinski Sun, 02/22/2015 - 18:14

@j-sutterfield can you elaborate on your setup?  I am having a similar issue with ISE and NetScaler, but I am having it with all EAP types, not just TLS.  I can do captive web portal without issue.  I am on NetScaler 10.5 53.9, so past the version you mentioned as fixing the bug.  I just feel I am doing something small incorrectly.

Jacob Gibb Sun, 02/22/2015 - 19:12

Can you post the rules you have set up in Citrix to track sessions? I had to use source IP, which was the NAD (the WLC), and RADIUS 1812/1813 in our case due to the limitation at the time. I'm not sure whether they have upgraded to 10.5 as mentioned above.

j-sutterfield Mon, 02/23/2015 - 06:55

The policy expression we use for persistence is:
add policy expression FramedIP_CallingStationID "CLIENT.UDP.RADIUS.ATTR_TYPE(8)+CLIENT.UDP.RADIUS.ATTR_TYPE(31)"

I notice that we have that policy bound to the virtual servers themselves as well as the persistence group (see below).  I'm not certain that is necessary but I can say we have a working environment.

add lb vserver isepsn_radius-acct RADIUS 192.168.10.30 1813 -rule FramedIP_CallingStationID -cltTimeout 120
add lb vserver isepsn_radius-auth RADIUS 192.168.10.30 1812 -rule FramedIP_CallingStationID -cltTimeout 120

set lb group isepsn-pg -persistenceType RULE -rule FramedIP_CallingStationID
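
As a rough illustration of what that rule keys on, the Python sketch below pulls RADIUS attributes 8 (Framed-IP-Address) and 31 (Calling-Station-Id) out of a raw packet and joins them into one persistence key. It assumes the `+` in the expression concatenates the two values; `PersistenceTable`, `build_packet`, and the sample addresses are hypothetical and only model the idea, not NetScaler's actual implementation.

```python
import socket
import struct

FRAMED_IP_ADDRESS = 8     # RADIUS attribute type 8
CALLING_STATION_ID = 31   # RADIUS attribute type 31


def parse_attributes(packet: bytes) -> dict:
    """Return {type: value} for the first occurrence of each attribute."""
    _code, _ident, length = struct.unpack("!BBH", packet[:4])
    length = min(length, len(packet))
    attrs = {}
    offset = 20  # 4-byte header plus 16-byte authenticator
    while offset + 2 <= length:
        attr_type, attr_len = packet[offset], packet[offset + 1]
        if attr_len < 2 or offset + attr_len > length:
            break  # malformed attribute; stop parsing
        attrs.setdefault(attr_type, packet[offset + 2:offset + attr_len])
        offset += attr_len
    return attrs


def persistence_key(packet: bytes) -> bytes:
    """Concatenate Framed-IP-Address and Calling-Station-Id (empty if absent)."""
    attrs = parse_attributes(packet)
    return attrs.get(FRAMED_IP_ADDRESS, b"") + attrs.get(CALLING_STATION_ID, b"")


class PersistenceTable:
    """Hypothetical model: new keys are assigned round-robin, known keys stick."""

    def __init__(self, psns):
        self.psns = list(psns)
        self.table = {}
        self.next_index = 0

    def select(self, key: bytes) -> str:
        if key not in self.table:
            self.table[key] = self.psns[self.next_index % len(self.psns)]
            self.next_index += 1
        return self.table[key]


def build_packet(code: int, attrs) -> bytes:
    """Build a minimal RADIUS packet with a zeroed authenticator (test helper)."""
    body = b"".join(bytes([t, len(v) + 2]) + v for t, v in attrs)
    return struct.pack("!BBH", code, 0, 20 + len(body)) + bytes(16) + body


sample_attrs = [
    (FRAMED_IP_ADDRESS, socket.inet_aton("10.20.30.40")),
    (CALLING_STATION_ID, b"AA-BB-CC-DD-EE-FF"),
]
auth = build_packet(1, sample_attrs)  # code 1: Access-Request
acct = build_packet(4, sample_attrs)  # code 4: Accounting-Request

table = PersistenceTable(["psn1", "psn2"])
print(table.select(persistence_key(auth)))
print(table.select(persistence_key(acct)))
```

Because both virtual servers share one table in this model, the authentication and accounting packets for the same endpoint resolve to the same PSN, which is the point of binding the rule to a persistence group.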

 

 

 

j-sutterfield Mon, 02/23/2015 - 06:59

I think we worked through a similar issue (it's been a while), and I think it had to do with the routing flow.  Basically, you want your PSNs to use the NetScaler as their gateway (or at least have the traffic flow back through it at some point in the network).

Can you be any more specific about what you are seeing as the failure?  Any logs or results from the switch?

Nick Ciesinski Tue, 02/24/2015 - 18:14

@j-sutterfield I would see basic errors in ISE, and I didn't see any in the NS, but I am new to the NS so I may not have been looking in the right spot.  All I would see is the connection log in ISE and then an error about not being able to complete the EAP session.  I can't recall the exact message since it was a month or two ago and I put the project on hold for a bit.  Would it be possible for you to post your working configuration for ISE and the NS?  I originally tried doing it at L2 to mimic what we did with the ACE, but when that didn't work I tried doing it at L3 and hit the same issue.

Nick Ciesinski Mon, 05/04/2015 - 12:03

For those stuck at the same point I was: I discovered what the issue was with LB type RADIUS.  I could get everything working correctly with LB type ANY but not with type RADIUS.  While earlier versions of NS code had an issue with processing under type RADIUS, I was not on one of the problem versions.  It turned out that, for whatever reason, the NS would not work with type RADIUS unless the service group or service had the "Use Proxy Port" option set to YES.

 

Nick

David Melanz Wed, 05/07/2014 - 11:06

I am also experiencing this issue while using NetScaler as the load balancer.  I have a policy set up to allow PEAP and EAP-TLS coming in on a Called-Station-ID or specific SSID.  PEAP works fine on this single SSID, but EAP-TLS fails when NetScaler is the load balancer.  However, when I use Cisco ACE as the load balancer, EAP-TLS works fine and devices are able to authenticate.  Any insight on the matter would be greatly appreciated!

Ken Daldine Mon, 05/04/2015 - 09:27

Does anyone have a working configuration for this?  I'm getting successful authentications from the supplicant, but CoA fails. When I perform a CoA I get two of each of the following messages:

1) Event & Failure reason "5436 RADIUS packet already in the process"

then

2) Event "5417 Dynamic Authorization failed" / Failure reason "11215 No response has been received from Dynamic Authorization Client in ISE"

 

The policy nodes are not physically located behind the NetScaler, so I have them pointing to the NetScaler as the default GW.  I'm not sure if we have the policy on the NS configured correctly though, because I had to add the NetScaler as a Network Device and I was under the impression that the switch and PSN should continue to talk directly to each other.

 

Any help would be greatly appreciated!

Cheers!

Ken

kj.stjernqvist Mon, 05/04/2015 - 10:37

Hi,

 

It works great! But we have the ISE network behind the NS, routed, which is the way to go. If you have to add the NS to your Network Devices, it is probably doing NAT. This is not supported, as you noticed with CoA. It causes ISE to send the CoA to the NS instead of the switch, so it will not work.

 

The only time you should have the NS in the Network Device list in ISE is if the NS needs to check that the PSN servers are alive using RADIUS, for example.

 

Regards

 Karl-Johan

Nick Ciesinski Mon, 05/04/2015 - 11:56

Ken,

Do you have an RNAT configuration on your NS for the CoA packets?  This will rewrite the source address of the packet so your NAD will see the packet coming from your NS VIP.

 

add ns acl COA ALLOW -srcIP = <Range of source ISE IP's to catch> -destPort = 1700 -protocol UDP

set rnat COA -natIP <IP of VIP>

 

Nick

Ken Daldine Wed, 05/06/2015 - 07:50

Just tried to apply the add ns acl command above and we get an error that the add command is not found.  We're running NetScaler 10.1, did they use a different syntax for that release? 

Ken Daldine Wed, 05/06/2015 - 08:23

Got the commands to take, just want to make sure I have this right.  My VIP is 172.18.75.82 and my policy server is 172.18.68.53.

We sent the following commands.

add ns acl COA ALLOW -srcIP = 172.18.68.53  -destPort = 1700 -protocol UDP

apply ns acls

set rnat COA -natIP 172.18.75.82

Do we have that right or is it backwards?

Nick Ciesinski Wed, 05/06/2015 - 08:26

That looks correct.  As long as the COA packet is coming to the NS it should re-write the source IP from 172.18.68.53 to 172.18.75.82.
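
The rewrite can be modelled with a short Python sketch. It only simulates the match-and-rewrite logic of the ACL and RNAT rule above; `UdpPacket`, `RnatRule`, `apply_rnat`, and the switch address 10.0.0.5 are hypothetical, while the PSN and VIP addresses are the ones from Ken's post.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UdpPacket:
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class RnatRule:
    """Models the ACL match (source network plus destination port) and the NAT IP."""
    src_network: ipaddress.IPv4Network
    dst_port: int
    nat_ip: str

    def matches(self, pkt: UdpPacket) -> bool:
        in_network = ipaddress.ip_address(pkt.src_ip) in self.src_network
        return in_network and pkt.dst_port == self.dst_port


def apply_rnat(pkt: UdpPacket, rules) -> UdpPacket:
    """Rewrite the source IP using the first matching rule; otherwise pass through."""
    for rule in rules:
        if rule.matches(pkt):
            return replace(pkt, src_ip=rule.nat_ip)
    return pkt


# PSN 172.18.68.53 sending CoA (UDP 1700) is rewritten to the VIP 172.18.75.82.
rules = [RnatRule(ipaddress.ip_network("172.18.68.53/32"), 1700, "172.18.75.82")]

coa = UdpPacket(src_ip="172.18.68.53", dst_ip="10.0.0.5", dst_port=1700)
print(apply_rnat(coa, rules).src_ip)

# Traffic from the same PSN to another port does not match and is left alone.
other = UdpPacket(src_ip="172.18.68.53", dst_ip="10.0.0.5", dst_port=1812)
print(apply_rnat(other, rules).src_ip)
```

With more than one PSN, the source network in the rule would need to cover every PSN address, for example a wider prefix instead of the /32 used here.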

 

Nick

Ken Daldine Thu, 05/07/2015 - 07:41

OK, so we were able to get the CoA to work; however, I can only get a device to reauth once. After that, nothing happens.  Here are some screenshots of how we have the NetScaler configured.  As a side note, the only virtual server that we currently have persistence configured on is the CoA server; I'm not sure if it needs to be applied elsewhere.

Nick Ciesinski Thu, 05/07/2015 - 07:50

You shouldn't need an LB setup for the CoA traffic, since it's outbound and the RNAT should translate the return ACK back to the server the request came from.  We just have an LB for RADIUS on 1812 and 1813.  Those two should have persistence configured if you have multiple backend policy servers.  Based on the CoA ACL you posted earlier, though, it seemed like you only have one policy server.  If you have more, then that CoA ACL needs to include all of them.

Ken Daldine Tue, 05/19/2015 - 07:58

Sorry it's been so long.  The CoA LB has been removed and it still isn't working.  I was looking at the switch config: since we're initiating the CoA directly from the policy nodes, the 'aaa server radius dynamic-author' section should list the true IPs of each of the policy nodes as clients and not the NetScaler VIP, correct? I have tried it both ways and I'm still getting 'no response received' failure messages.

 

Nick Ciesinski Tue, 05/19/2015 - 12:40

The purpose of the RNAT for CoA is to rewrite the source address to be that of the VIP rather than the policy server, since your network device would be pointing to the VIP for requests.  So, if you have the RNAT in place, then the switch should be using the VIP.

If you do a packet capture, do you see the CoA packet coming in and out of the NS?  You can do the capture right on the NS and see if the RNAT is taking place.  If you see it, then I would make sure the switch is getting it.  If the switch is, I would turn on some debugging to see why it isn't accepting it.

I recall having an issue with a wireless controller ignoring the CoA packet and never responding because of a persistence problem: the CoA packet wouldn't have the proper session ID.  Do you have the persistence group set up to make sure both accounting and authentication packets go to the same policy server?  

Ken Daldine Thu, 05/21/2015 - 06:31

It looks like I have the switch configuration sorted out.  I had to point the 'aaa server radius dynamic-author' to the VIP and set the individual 'radius server <host>' to the real IP of the PSN. On the PSN I left the default GW as the NetScaler and I'm seeing successful auths and reauths consistently!!

 

Now I just need to test with our WLCs, fingers crossed :)

 

Thanks for all your help Nick!
