Solved: DCR internal error in communication channel

helexis · ‎10-12-2009

I have reported this error before...and have a TAC case open for it...but have found a workaround that I wanted to share that might shed some light on the issue.

The URL of Common Services > Device Management when I get the error contains the FQDN.

If I modify this URL by removing the domain suffix and attempt the same change in the DCR the change is successful.

Any ideas?

Joe Clarke · ‎11-04-2009

I found the problem. As I predicted, it has nothing to do with browser, FQDN, or anything. It is a transient issue that only affects Windows SMP systems. It tends to occur mostly on faster machines. A patch is on its way.

Joe Clarke · ‎10-12-2009

Hostname problems commonly cause errors due to certificate mismatches. One should always access LMS using the same hostname as configured in the certificate.
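A minimal sketch of that rule, for illustration only: it compares the hostname in the URL you browse to against the names a certificate was issued for. The function names, example URLs, and hostnames are hypothetical and not part of LMS, and the comparison is a plain case-insensitive exact match (no wildcard handling).

```python
from urllib.parse import urlsplit


def url_host(url):
    """Return the lowercase hostname portion of a URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    return host


def matches_certificate(url, cert_names):
    """True if the URL's hostname equals one of the certificate names.

    Assumption: exact, case-insensitive comparison only. A short hostname
    does not match a certificate issued for the FQDN, and vice versa.
    """
    host = url_host(url)
    return host in {name.lower() for name in cert_names}


# Hypothetical example: certificate issued for the FQDN only.
print(matches_certificate("https://lms1.example.com/cwhp", ["lms1.example.com"]))
print(matches_certificate("https://lms1/cwhp", ["lms1.example.com"]))
```

With a certificate issued for the FQDN, the first call matches and the second (short hostname) does not, which is the kind of mismatch described above.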

helexis · ‎10-13-2009

OK...but the applications registered for remote servers show a FQ hostname in application registration status.

I assume this makes all links to that application point to the FQDN URL as well. That would cause users trouble when going to CS on the remote server: they would get this internal comm error because the link used the FQDN URL.

I just realized that the application registration uses the FQDN when you register using the import from other servers option. If I were to unregister everything yet again and register them with the 'from template' option, they would no longer call the FQDN in the links. Is this accurate?

Why is the remote servers option there if the system doesn't mesh well with FQDN?

Joe Clarke · ‎10-13-2009

No, the register from remote server option works just fine with short hostnames. I'm using it that way in my lab. Everything comes down to the SSL cert. If you create the certs with short hostnames, then you register servers with each other using short hostnames, and you access servers using the short hostnames, then everything will work.

Alternatively, if all of this is done with FQDN, then FQDN should work.

You need to decide how you want this all to work, then start from scratch. Regenerate the SSL certs on each server using the proper hostname. Remove all accepted peer server certs from each server, then reimport the new certs using the proper hostnames. Finally, re-register the applications from remote servers using the proper hostnames, re-setup DCR integration using the proper hostnames, etc. and everything should just work.
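As a rough illustration of the "pick one form and use it everywhere" advice, the sketch below lists the places where a recorded hostname differs from the one in the certificate. The place names and hostnames are made up for the example, and nothing here reads real LMS configuration; you would have to fill in the values by hand.

```python
def hostname_form(name):
    """Classify a hostname as 'fqdn' (contains a dot) or 'short'."""
    return "fqdn" if "." in name.strip(".") else "short"


def find_inconsistencies(usages):
    """Return the places whose hostname differs from the certificate's.

    `usages` maps a place (hypothetical labels) to the hostname used there.
    The 'certificate' entry is treated as the reference; comparison is
    case-insensitive.
    """
    reference = usages["certificate"].lower()
    return sorted(
        place for place, name in usages.items() if name.lower() != reference
    )


# Hypothetical values written down while walking through both servers.
usages = {
    "certificate": "lms1.example.com",
    "peer_cert_import": "lms1.example.com",
    "app_registration": "lms1",
    "dcr_master": "LMS1.example.com",
}
print(find_inconsistencies(usages))
```

Here only the application registration would be flagged, because it uses the short hostname while everything else uses the FQDN.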

helexis · ‎10-13-2009

Ok so I tried 'starting from scratch' tonight...

Reverted both servers back to StandAlone mode for both DCR and SSO.

Deleted all peer server certificates.

Unregistered all applications.

Restarted the Daemon Managers.

Regenerated the certificates with the FQDN.

Modified the Homepage settings Server Name to reflect the FQDN.

Restarted the Daemon Managers.

Imported peer certificates using FQDN.

Changed DCR modes appropriately.

Changed SSO mode appropriately.

Restarted Daemon Managers.

Preparing to Register Applications:

On each server I chose import from remote server and wrote down what was detected as already registered on the remote servers.

DCR Master w/RME,DFM: Common Services, Setup Center, CiscoWorks Assistant, and Dev Diag Tools

DCR Slave w/CM,CV,IPM: Common Services, Setup Center, CiscoWorks Assistant, Dev Diag Tools, CM Setup Center

With that in mind I find it strange that CM Setup Center is showing up while IPM Setup Center does not. I have come to assume that you have to register CM, IPM, RME, and DFM but not Common Services. Is this accurate?

In any case...I proceeded to register the main components on their respective servers. The big question here is, after choosing Register From Templates, I am prompted to enter a server name, for which I assume we would stick to the 'plan' and input the FQDN. Is this accurate?

Having opted for the implied answer and input the FQDN, I successfully registered all the local applications and proceeded to import the apps from the remote servers using the FQDN.

Having followed your advice I still seem to have missed something for I have some strange things that occur now.

Most prominent is the Device Allocation Summary. On the DCR master it reports "Error In getting Installed Applications in DCR domain", and the slave only shows DFM and RME with all devices managed.

To me this suggests that one server has discrepancies somewhere. I just don't know where to start looking.

The first thing I have done is unregister CM, since it is one of the applications that isn't showing up correctly. I then re-registered it with the short name, and it shows up accurately in the Device Allocation Summary.

Hence my utter confusion. Help!

Joe Clarke · ‎10-13-2009

You shouldn't be using templates to register apps from remote servers. You should select the Remote Server option, enter the FQDN, and select the apps from the list. You are right that you should only import the main apps. These include RME, CM, CS, DFM, IPM, and CiscoView.

Where are you seeing this discrepancy in the auto allocation summary? A screenshot would be helpful. It still sounds like your application registration approach is wrong.

I tried doing some of the things I believe you want to do, and so far, I haven't encountered any problems.

helexis · ‎10-14-2009

I have no doubt that my application registration approach is incorrect. LOL...

What I haven't seen you mention is the process of registering local applications. Maybe this is where the problem lies. Do you have to do this?

For instance: DCR master houses CM and IPM, and those applications aren't available to a remote server for import unless I first register them on the local server. Hence the confusion: do I register with the template option and the FQDN, or what?

helexis · ‎10-14-2009

I have a TAC case for this whole thread which was closed yesterday. I have emailed the engineer that I was corresponding with and requested it be reopened. Do you ever get involved via Webex? I'd be extremely grateful if you would offer your expertise via Webex and help resolve this ongoing issue. :)

Joe Clarke · ‎10-14-2009

Local applications should never need to be registered. This happens at install time. If, however, you lose local applications (like Common Services) there is a command-line only procedure to get them all back. TAC needs to walk you through this, though.

I don't see how applications that are installed on a server could be unavailable for remote registration. I certainly cannot reproduce that. You may very well have a problem with the CMIC registration database, and it might be a good idea to have TAC walk you through the procedure to dump and re-register all local applications on both servers.

helexis · ‎10-14-2009

That sounds fabulous! I knew something wasn't right. I haven't heard from my TAC contact yet. We are currently down. So I guess I open a new case for this.

helexis · ‎10-14-2009

As a matter of fact, I have opened a TAC case searching for this procedure in the past and was led in another direction.

Is there something I can call this procedure so the engineers know exactly what I am talking about?

I was offered the hostnamechange script and asked to redo the Master/Slave configuration.

Joe Clarke · ‎10-14-2009

The hostnamechange script does modify the CMIC records, but I'm not sure it will fix all of your problems. It really sounds like some things are missing which should not be. It's certainly easier to give it a try first, though. However, the procedure to which I refer involves deleting the existing CMIC database, then re-registering the local templates from the command line.

helexis · ‎10-14-2009

The hostnamechange script will not run if the hostname hasn't changed. Right?

Joe Clarke · ‎10-14-2009

No, it will state that the two hostnames are the same, and exit.

helexis · ‎10-14-2009

I haven't heard back from my TAC engineer. They were going to research the procedure and call back. I really need to get the servers back online. I submitted the level 3 case this morning. Can you assist in this matter?