ACS redundancy

-kostas-
Level 1

Greetings all,

I am trying to configure an ACS redundancy model on our routers. We have 2 ACS servers running on W2K. In the router configuration I've created an "aaa group server tacacs+ TEST" group and listed in it our 2 ACS servers from the global config.

However, when I shut down the first ACS server, the whole thing doesn't work and I get a strange error in the debug (see below).

Below is a snapshot of the config, just in case I left something out.

Has anyone implemented this?

Thanks in advance,

Kostas

-----------------------------------------------------------------------------------------------------------

aaa new-model
!
!
aaa group server tacacs+ TEST
 server 10.10.10.1
 server 10.10.10.2
!
aaa authentication login telnet group tacacs+ local
aaa authentication login aux local
aaa authentication login console local
aaa authorization exec default group tacacs+ local
aaa accounting exec default start-stop group tacacs+
aaa session-id common
.
.
.
!
tacacs-server host 10.10.10.1 key tes1t
tacacs-server host 10.10.10.2 key tes2t
tacacs-server directed-request
.
.
.
line con 0
 login authentication console
line aux 0
 login authentication aux
line vty 0 4
 login authentication telnet
 transport input telnet
!
end

-----------------------------------------------------------------------------------------------------------

debug-error:

Jun 2 13:04:31.611: TPLUS: Queuing AAA Authentication request 12951 for processing
Jun 2 13:04:31.611: TPLUS: processing authentication start request id 12951
Jun 2 13:04:31.611: TPLUS: Authentication start packet created for 12951()
Jun 2 13:04:31.611: TPLUS: Using server 10.10.10.1
Jun 2 13:04:31.615: TPLUS(00003297): Select released but nopeername.. Failover
Jun 2 13:04:31.615: TPLUS: Choosing next server: 10.10.10.2
Jun 2 13:04:36.616: TPLUS(00003297): Select Timed out

15 Replies

msitzman
Cisco Employee

Since you have a tacacs group defined, you should change your aaa authentication statement to select that group name rather than the tacacs+ keyword, i.e.:

aaa authentication login telnet group TEST local
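
Applied to the config posted above, a minimal sketch might look like the following. It assumes the same change is wanted in the authorization and accounting method lists, which the posted config also points at "group tacacs+". That extension is an assumption, not something confirmed in this thread.

```
aaa group server tacacs+ TEST
 server 10.10.10.1
 server 10.10.10.2
!
! Each "server" entry has to match a globally defined tacacs-server host
tacacs-server host 10.10.10.1 key tes1t
tacacs-server host 10.10.10.2 key tes2t
!
aaa authentication login telnet group TEST local
! Assumption: authorization and accounting are moved to the same group
aaa authorization exec default group TEST local
aaa accounting exec default start-stop group TEST
```

The servers in the group are tried in the order listed, and "local" is only consulted if no server in the group responds.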

Hope this helps...

Marcus

ywadhavk
Cisco Employee

Hi,

A couple of things to check:

1. On the router, run the test command below to test the TACACS+ operation:

test aaa group tacacs+ username password

e.g. test aaa group tacacs+ cisco cisco

2. Try to bump up the tacacs timeout value from the default 5 sec to 10 sec.

3. What is the version of the IOS? There could be an associated bug:

CSCdx41454

4. Are you using the command

ip tacacs source-interface
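
For points 2 and 4, a sketch of what the router side might look like. The 10-second value comes from the suggestion above. Loopback0 is only a placeholder interface; whichever interface is used, its address has to match the client address configured for this router on both ACS servers.

```
! Raise the global TACACS+ timeout from the default 5 seconds
tacacs-server timeout 10
!
! Optional: source all TACACS+ packets from one interface
! (Loopback0 is a hypothetical choice, not taken from the posted config)
ip tacacs source-interface Loopback0
```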

Thanks,

yatin

Hi,

Well, I did all these things but no luck.

I checked the bug ID. The thing is that I am not using "ip tacacs source-interface Loopback0" on the router that I have for testing. It has a single FastEth, and this is the address I have also configured in the ACS server.

Any more ideas?

Thanks in advance,

Kostas

ywadhavk
Cisco Employee

OK, the next thing to check is whether the secondary TACACS server is really set up correctly. You have authorization configured, so check on the ACS that EXEC is selected. Try to match the settings with those of the working server.

The other thing to check is that all the ACS services are indeed running on that server.

If this still doesn't resolve the issue, please send the 'sh ver' output for the router and take a look at the details in the package.cab file.

Thanks,

yatin

Hi Yatin,

Well, the secondary TACACS is working properly. I configured the router first only with the main TACACS, then only with the secondary, and they both worked well.

Only the redundancy model doesn't seem to work.

The IOS is 12.1(3)T1.

What I don't understand is the thing about the package.cab file.

Could you please explain ?

Thanks in advance,

/kostas

ywadhavk
Cisco Employee

Hi Kostas,

What if you reverse the order of the tacacs servers (keeping each server's own key) from

tacacs-server host 10.10.10.1 key tes1t
tacacs-server host 10.10.10.2 key tes2t

to

tacacs-server host 10.10.10.2 key tes2t
tacacs-server host 10.10.10.1 key tes1t

As for the package.cab file, here's the procedure; looks lengthy but it is simple.

Follow these instructions even if your server is already running in detailed logging mode. This will ensure that all the proper service startup information is included in the package.cab file. If these instructions are not followed properly, we will need to request the information again.

- Log onto the ACS server itself as the local administrator.

- Browse to the UTILS directory in the ACS program directory.

- Run the program there called CSSupport.

- Select "Set Log Levels Only" and click Next.

- Select "Set Diagnostic Log Verbosity to Maximum."

- Check "Keep TACACS+ Packet Capture."

- Check "Keep RADIUS Packet Capture."

- Click Next, then click Finish.

At this point we need to duplicate the issue. Do whatever is causing the problem, or wait for the problem to occur again if it's not triggered by a direct sequence of events. Once that's done, we need to gather the verbose logs created. To do so, follow the instructions below AFTER the problem has been recreated and recorded:

- Log onto the ACS server itself as the local administrator.

- Browse to the UTILS directory in the ACS program directory.

- Run the program there called CSSupport.

- Select "Run Wizard" and click Next.

- If we need more than today's logs:

-- Put a check in both "Previous Logs" checkbox.

-- Select the number of days to go back.

- Click Next four times.

- When the Finish button appears, click it.

The package.cab will be found in the UTILS\Support directory under the ACS program directory. This file contains all of the log information from ACS and limited information about the computer that ACS is running on. All collected information is essential for proper troubleshooting.

Hello again,

So, I did reverse the TACACS servers on my router, but that didn't solve anything.

Once again, let me give a short description of what I am doing.

I configure 2 ACS servers on my router.

After a successful login with the primary server, I shut it down (the primary ACS) and try to log in with my secondary ACS. That is where my problem is.

I also tried two methods. The first is to simply add the ACS servers in the global config. The other is, after putting them in the global config, to also put them in "aaa group server tacacs+ TEST" and change the aaa authentication and authorization statements accordingly. Neither of these worked.

Now for the package.cab I produced: which of the files do you need, and where can I send them?

Kind regards,

Kostas

Hi Kostas,

The Failed Attempts csv and the tcs.log would be a good starting point. How about the ACS services on this server? Are they all running fine? Has this server even once authenticated a login properly? What you need to confirm is that the server is functioning properly as a primary server. That's why I asked to put this server as the first entry.

What was the result of the "test aaa ......" command?

Thanks,

yatin

Hi Yatin,

Well, I am in the middle of a strange situation.

After checking the logs that you pointed out, I didn't find anything strange.

So I reversed the configuration one more time. I mean I put the active ACS (10.10.10.1) as the backup and the backup (10.10.10.2) as the active in the router configuration.

tacacs-server host 10.10.10.2
tacacs-server host 10.10.10.1

Then I unplugged the network cable from the current active ACS (10.10.10.2) and tried to log in to my router, and out of nowhere everything worked just fine!

Then I reconfigured my router as it was (put the ACS servers back in the previous order) and it didn't work.

tacacs-server host 10.10.10.1
tacacs-server host 10.10.10.2

The strange thing in all this is that my active ACS (10.10.10.1) does a FULL replication to the backup (10.10.10.2) so that both ACS servers have the same configuration. So although I first thought that there was something wrong with my active ACS (10.10.10.1), I ended up concluding that there couldn't be anything wrong with it, since it's doing the replication. So, since the backup ACS has exactly the same configuration as the active (I triple-checked it!), it shouldn't have worked when I did the reverse. Correct?

I know it sounds a bit confusing but still this is the true story. :-)

Any more good ideas ?

Thanks in advance,

/kostas

PS1: If you still want the package.cab you can send me an e-mail and I will reply to it. I can't post the files here since they contain sensitive information.

PS2: Is it possible the problem occurred because of a RADIUS distribution table? But then again, the backup ACS has the same distribution table....

Hi Kostas,

Let me take a look at the package.cab.

Thanks,

yatin@cisco.com

Hi,

Did you make the following change suggested in the first reply:

From,

aaa authentication login telnet group tacacs+ local

To,

aaa authentication login telnet group TEST local

If this doesn't resolve the problem, the problem seems to be with the IOS code.

Thanks,

Mynul

Hi Mynul,

Yes I did that.

It was my first change, and to tell you the truth I felt like a complete idiot when I saw my obvious mistake.

Nevertheless, that didn't solve my problem.

Kind regards,

/kostas

Hi Kostas,

The standby/backup ACS server doesn't seem to be in the domain NOC, i.e. it is not a member of this domain. Please check that. If it is in a different domain, then there needs to be a proper trust relationship between those two domains.

Error in the log file ---------------

We are NOT a member of a domain => we cannot authenticate accounts on other trusted domains.

---------------------------------------

Because of this, it seems that there was no replication happening between the primary and secondary servers. The primary ACS is installed on the PDC of domain NOC.

Thanks,

yatin

Hi,

This shouldn't cause any problem with ACS replication. But it is a problem if you want to authenticate users against the domain controller, as the minimum requirement is to install ACS on a member server. Have you integrated ACS with the domain controller, i.e. are you trying to authenticate users with domain accounts through ACS? If that's the case, maybe the primary ACS is sending malformed packets when it cannot authenticate users against the domain controller. To eliminate the possibility that it's ACS, please stop the primary ACS services altogether, then see if the router falls back to the secondary server. If that doesn't happen, then the problem is in IOS; if you can share the version info for the router, I can suggest whether this is a bug in the code.

Thanks,

Mynul