ACE: problem with balance of the RADIUS flows

k-gerasymenko
Level 1

Hello!

There is one RADIUS client (IP: 10.10.10.60) and two RADIUS servers (IP: 10.10.10.15 and 10.10.10.16).

The ACE module is needed to balance RADIUS requests between the RADIUS servers. The requests
from the one RADIUS client must be balanced across the two RADIUS servers based exclusively
on their "calling-station-id".

Right now I am testing this scheme with the ACE module, but there is a problem.
The ACE module in the Cisco 7604 is connected to the network in "one-arm" mode. When the RADIUS client sends requests,
the ACE terminates them and answers the client with success. The ACE balances these requests between the two RADIUS servers in a proportion of approximately 2/3.
But these calls fail on the server side. Sticky has not helped. Without the ACE, all requests are successful.

Is there a solution to this problem?

My config is:

access-list ANY line 8 extended permit ip any any

rserver host TEST1
  ip address 10.10.10.15
  inservice
rserver host TEST2
  ip address 10.10.10.16
  inservice

serverfarm host SERVERFARM1
  rserver TEST1
    inservice
  rserver TEST2
    inservice

class-map type management match-any MGMT_CLASS
  2 match protocol icmp any
  3 match protocol ssh any
  4 match protocol telnet any
class-map match-any RADIUS_L4
  2 match virtual-address 10.10.10.100 udp range 1812 1813
class-map type radius loadbalance match-any RADIUS_L7
  2 match radius attribute calling-station-id ".*"

policy-map type management first-match MGMT_POLICY
  class MGMT_CLASS
    permit

policy-map type loadbalance radius first-match RADIUS_L7_POLICY
  class RADIUS_L7
    serverfarm SERVERFARM1

policy-map multi-match RADIUS
  class RADIUS_L4
    loadbalance vip inservice
    loadbalance policy RADIUS_L7_POLICY
    loadbalance vip icmp-reply active
    nat dynamic 1 vlan 10

interface vlan 10
  ip address 10.10.10.10 255.255.255.0
  access-group input ANY
  access-group output ANY
  nat-pool 1 10.10.10.100 10.10.10.100 netmask 255.255.255.0 pat
  service-policy input MGMT_POLICY
  service-policy input RADIUS
  no shutdown

Best regards

Konstantyn

10 Replies

Gilles Dufour
Cisco Employee

You haven't configured stickiness right now.

You are just trying to match RADIUS requests containing a "calling-station-id".

Requests not having a calling-station-id will get dropped.
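
Whether a given request carries Calling-Station-Id can be checked directly on the raw UDP payload of a captured packet. A minimal sketch, assuming the RFC 2865 packet layout (a 20-byte header followed by type/length/value attributes, with Calling-Station-Id being attribute type 31); the helper names and the sample values are invented for illustration:

```python
import struct

CALLING_STATION_ID = 31  # attribute type defined in RFC 2865


def radius_attributes(packet: bytes):
    """Yield (type, value) for each attribute in a raw RADIUS packet."""
    if len(packet) < 20:
        raise ValueError("RADIUS header is 20 bytes")
    # Header: code (1 byte), identifier (1 byte), length (2 bytes), authenticator (16 bytes)
    _code, _identifier, length = struct.unpack("!BBH", packet[:4])
    if length < 20 or length > len(packet):
        raise ValueError("bad length field")
    pos = 20
    while pos + 2 <= length:
        attr_type, attr_len = packet[pos], packet[pos + 1]
        # attr_len counts the two header bytes as well as the value
        if attr_len < 2 or pos + attr_len > length:
            raise ValueError("malformed attribute")
        yield attr_type, packet[pos + 2 : pos + attr_len]
        pos += attr_len


def calling_station_id(packet: bytes):
    """Return the Calling-Station-Id as text, or None if it is absent."""
    for attr_type, value in radius_attributes(packet):
        if attr_type == CALLING_STATION_ID:
            return value.decode("ascii", errors="replace")
    return None


def build_request(identifier: int, attrs) -> bytes:
    """Hypothetical helper: build an Access-Request (code 1) with a zeroed authenticator."""
    body = b"".join(bytes([t, len(v) + 2]) + v for t, v in attrs)
    return struct.pack("!BBH", 1, identifier, 20 + len(body)) + bytes(16) + body


# Sample data only: User-Name (type 1) plus Calling-Station-Id (type 31)
with_csid = build_request(7, [(1, b"alice"), (31, b"380441234567")])
without_csid = build_request(8, [(1, b"bob")])
print(calling_station_id(with_csid), calling_station_id(without_csid))
```

A request for which `calling_station_id` returns `None` would not match a class-map that requires the attribute.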

Did you try simple load balancing without RADIUS?

Did you get a sniffer trace in the current situation to see where it fails?

If the request gets to the server, where is the response from the server sent?

Is it correctly sent to the ACE?

Is ACE then correctly forwarding the response to the source?

Gilles.

Filip Talpa
Level 1

From my experience, round-robin tends to end up with an unequal distribution. I prefer using the least-connections predictor on the serverfarm.

k-gerasymenko
Level 1

Hi,

Here is another config, with sticky, but the result is the same.

With both RADIUS servers active: for example, the RADIUS client sends 10 requests and all of them are successful. On the RADIUS server side I see that 15 calls have been received (6 calls on TEST1, 9 calls on TEST2) and only 8 calls are successful (3 on TEST1, 5 on TEST2). The log files on the servers show: call timeout detecting.

If one of the RADIUS servers is down, all calls are successful and the number of calls on the RADIUS server equals the number sent by the RADIUS client.


access-list ANY line 8 extended permit ip any any

probe icmp PROBE_ICMP
  interval 2
  faildetect 2
  receive 5
probe radius PROBE_RADIUS


rserver host TEST1
  ip address 10.10.10.15
  inservice
rserver host TEST2
  ip address 10.10.10.16
  inservice

serverfarm host SERVERFARM1
  predictor leastconns
  probe PROBE_ICMP
  rserver TEST1
    inservice
  rserver TEST2
    inservice

sticky radius framed-ip calling-station-id STICKY-1
  serverfarm SERVERFARM1

class-map type management match-any MGMT_CLASS
  2 match protocol icmp any
  3 match protocol ssh any
  4 match protocol telnet any
class-map match-any RAD_L4_C
  2 match virtual-address 10.10.10.100 udp range 1812 1813
class-map type radius loadbalance match-all RAD_L7_C
  2 match radius attribute calling-station-id ".*"

policy-map type management first-match MGMT_POLICY
  class MGMT_CLASS
    permit

policy-map type loadbalance radius first-match RAD_L7_P
  class RAD_L7_C
    sticky-serverfarm STICKY-1

policy-map multi-match POLICY_L7
  class RAD_L4_C
    loadbalance vip inservice
    loadbalance policy RAD_L7_P
    loadbalance vip icmp-reply active
    nat dynamic 1 vlan 10

interface vlan 10
  ip address 10.10.10.10 255.255.255.0
  access-group input ANY
  access-group output ANY
  nat-pool 1 10.10.10.100 10.10.10.100 netmask 255.255.255.0 pat
  service-policy input MGMT_POLICY
  service-policy input POLICY_L7
  no shutdown

You still have no idea what is happening in the network.

ACE is a network device, so we do not need to know what the server reports; we need to know which packets come in, which packets go out, and what the content of each packet is.

So we need a sniffer trace.

Also, you keep talking about calls?

Is the call traffic also going through ACE?

G.

This is IMO where Cisco gets it all wrong: YOU need to understand the application behind it. After all, with this approach any server is just a network device.

Back on topic -- I'll check whether stickiness really works as it is supposed to. From the waiting for call disconnect one can assume that the server never gets the call-cleared event. (@g - see, no sniffer required...)

For those following this thread and interested in getting better at troubleshooting, this is a perfect example of why people fail to solve their problems.

They start from a server error message and try to understand what happened somewhere in the network.

Since they can't figure out what caused the error, they start changing the configuration in all directions without knowing the source of the problem.

Even if, by luck, they change the right component, they would still be unable to explain why it fixed the problem.

This is a very common error.

The best way to troubleshoot a network device is to capture a sniffer trace.

Why?

Because a network device is there to transport traffic (most often TCP/IP)... and usually knowing the details about the application is useless.

In some cases, it is indeed necessary to go deeper into the packet in order to make a more intelligent routing/switching decision.

For example, when you do HTTP cookie stickiness or RADIUS load balancing.

But even in that case, the network device will work on a packet-by-packet basis and for each of them decide what to do with it.

With the sniffer trace you can first compare successful client-server exchanges with failed ones.

You can then see what is different (asymmetric path? traffic misrouted? traffic blocked? packet corrupted? fragmentation? ...)

These are all network problems that can lead to many different error messages on the server.

Messages that will be different depending on the application, the hardware, the OS, the vendor, ...
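
The comparison of successful and failed exchanges can be partly automated once the trace has been exported. A rough sketch, assuming the capture has already been decoded into simple records (the `Packet` tuple and `unanswered_requests` are hypothetical names, not part of any capture tool, and the sample trace is made up). It pairs each RADIUS request with its response by addresses, ports and the RADIUS Identifier field, and lists the requests that never got a matching reply:

```python
from collections import namedtuple

# One decoded RADIUS packet: addresses, UDP ports, RADIUS code and Identifier
Packet = namedtuple("Packet", "src sport dst dport code ident")

REQUEST_CODES = {1, 4}          # Access-Request, Accounting-Request
RESPONSE_CODES = {2, 3, 5, 11}  # Access-Accept, Access-Reject, Accounting-Response, Access-Challenge


def unanswered_requests(packets):
    """Return requests with no response coming back on the mirrored address/port pair."""
    pending = {}
    for p in packets:
        if p.code in REQUEST_CODES:
            # A retransmission with the same Identifier simply overwrites the entry
            pending[(p.src, p.sport, p.dst, p.dport, p.ident)] = p
        elif p.code in RESPONSE_CODES:
            # A valid response mirrors the request's addresses and ports
            pending.pop((p.dst, p.dport, p.src, p.sport, p.ident), None)
    return list(pending.values())


# Made-up trace as seen on the client side; source port 1645 is an assumption.
trace = [
    Packet("10.10.10.60", 1645, "10.10.10.100", 1812, 1, 1),
    Packet("10.10.10.100", 1812, "10.10.10.60", 1645, 2, 1),  # reply from the VIP: matches
    Packet("10.10.10.60", 1645, "10.10.10.100", 1812, 1, 2),
    Packet("10.10.10.15", 1812, "10.10.10.60", 1645, 2, 2),   # reply from the real server: no match
]

for p in unanswered_requests(trace):
    print("no matching response for request id", p.ident, "from", p.src)
```

In this invented trace the second reply comes straight from a real server instead of the VIP, so the request stays unanswered from the client's point of view, which is the kind of asymmetric path the trace is meant to reveal.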

k-gerasymenko
Level 1

Of course, there are some logs.

One thing is certainly clear from the test: the RADIUS client and the RADIUS servers work correctly when connected directly. In that case all calls are successful.

I have run the sniffer. The sniffer showed that everything is fine in the RADIUS packets.

Why do I see more calls on the server side than the client has sent? It would be good to run debug on the ACE, if it has one.

What does the output of show sticky database show?

Well, ACE has a debug mode, but it requires special code to be loaded -- much like you do on Nexus switches.

Can you share the sniffer trace?

Do you know which flow in the trace the server reported as a failure?

Gilles.

k-gerasymenko
Level 1

I have found the following in the ACE documentation:

"The ACE does not load balance RADIUS accounting on/off messages. Instead, it
replicates those messages to each real server in the server farm that is configured
in the RADIUS LB policy."
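
If replication of accounting on/off messages is what inflates the server-side counters, the numbers reported earlier in the thread can be checked with simple arithmetic. A minimal sketch, assuming replication is the only source of extra messages; the function names are made up for illustration, and the thread does not say how many of the test messages were actually accounting on/off:

```python
def server_side_total(balanced: int, replicated: int, servers: int) -> int:
    """Messages seen across all servers: balanced ones arrive once, replicated ones once per server."""
    return balanced + replicated * servers


def replicated_count(sent: int, received: int, servers: int) -> int:
    """Solve (sent - r) + r * servers == received for r, the number of replicated messages."""
    if servers < 2:
        raise ValueError("replication needs at least two servers")
    extra = received - sent
    r, remainder = divmod(extra, servers - 1)
    if extra < 0 or remainder or r > sent:
        raise ValueError("counts are not explained by replication alone")
    return r


# Figures from the test above: 10 messages sent, 15 seen on 2 servers
print(replicated_count(sent=10, received=15, servers=2))
```

With one server down, `servers - 1` replicas disappear and the totals on both sides match again, which is consistent with the behaviour described for the single-server case.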
