Cisco Support Community

New Member

DCR internal error in communication channel

I have reported this error before...and have a TAC case open for it...but have found a workaround that I wanted to share that might shed some light on the issue.

The URL of Common Services > Device Management when I get the error contains the FQDN.

If I modify this URL by removing the domain suffix and attempt the same change in the DCR the change is successful.

Any ideas?

3 ACCEPTED SOLUTIONS

Accepted Solutions
Cisco Employee

Re: DCR internal error in communication channel

No, registering from a remote server works just fine with short hostnames. I'm using it that way in my lab. Everything comes down to the SSL cert. If you create the certs with short hostnames, register the servers with each other using short hostnames, and access the servers using the short hostnames, then everything will work.

Alternatively, if all of this is done with FQDN, then FQDN should work.

You need to decide how you want this all to work, then start from scratch. Regenerate the SSL certs on each server using the proper hostname. Remove all accepted peer server certs from each server, then reimport the new certs using the proper hostnames. Finally, re-register the applications from remote servers using the proper hostnames, re-setup DCR integration using the proper hostnames, etc. and everything should just work.

Cisco Employee

Re: DCR internal error in communication channel

Local applications should never need to be registered. This happens at install time. If, however, you lose local applications (like Common Services) there is a command-line only procedure to get them all back. TAC needs to walk you through this, though.

I don't see how applications that are installed on a server could be unavailable for remote registration. I certainly cannot reproduce that. You may very well have a problem with the CMIC registration database, and it might be a good idea to have TAC walk you through the procedure to dump and re-register all local applications on both servers.

Cisco Employee

Re: DCR internal error in communication channel

I found the problem. As I predicted, it has nothing to do with browser, FQDN, or anything. It is a transient issue that only affects Windows SMP systems. It tends to occur mostly on faster machines. A patch is on its way.

58 REPLIES
Cisco Employee

Re: DCR internal error in communication channel

Hostname problems commonly cause errors due to certificate mismatches. One should always access LMS using the same hostname as configured in the certificate.
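To make the point above concrete, here is a minimal sketch of the comparison involved. It is not LMS code; the function name and the example hostnames are hypothetical, and it assumes the certificate carries a single hostname that must equal the host in the URL exactly.

```python
from urllib.parse import urlsplit


def url_host_matches_cert(url: str, cert_hostname: str) -> bool:
    """Return True when the host in `url` equals the certificate hostname.

    The comparison is case-insensitive and exact: a short hostname does
    not match its own FQDN, which is the kind of mismatch discussed here.
    """
    url_host = urlsplit(url).hostname or ""
    return url_host.lower() == cert_hostname.strip().lower()


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    cert = "lms-master.example.com"
    for url in ("https://lms-master.example.com/", "https://lms-master/"):
        verdict = "match" if url_host_matches_cert(url, cert) else "MISMATCH"
        print(url, "->", verdict)
```

The sketch only shows why a server reached by its short name fails against a certificate generated with the FQDN, and vice versa.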

New Member

Re: DCR internal error in communication channel

OK...but the applications registered for remote servers show a FQ hostname in application registration status.

I assume this makes all links to that application point to the FQDN URL as well. That would cause trouble for users who go to CS on the remote server, because the link uses the FQDN URL and they then get this internal communication error.

I just realized that application registration uses the FQDN when you register using the import from other servers option. If I were to unregister everything yet again and register with the 'from template' option, the links would no longer use the FQDN. Is this accurate?

Why is the remote servers option there if the system doesn't mesh well with FQDN?

Cisco Employee

Re: DCR internal error in communication channel

No, registering from a remote server works just fine with short hostnames. I'm using it that way in my lab. Everything comes down to the SSL cert. If you create the certs with short hostnames, register the servers with each other using short hostnames, and access the servers using the short hostnames, then everything will work.

Alternatively, if all of this is done with FQDN, then FQDN should work.

You need to decide how you want this all to work, then start from scratch. Regenerate the SSL certs on each server using the proper hostname. Remove all accepted peer server certs from each server, then reimport the new certs using the proper hostnames. Finally, re-register the applications from remote servers using the proper hostnames, re-setup DCR integration using the proper hostnames, etc. and everything should just work.
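One way to keep track of "the proper hostname" across all those places is to write down what each setting currently holds and compare it with the hostname in the certificate. The sketch below assumes you collect those values by hand; the setting labels and hostnames are hypothetical and nothing here talks to LMS.

```python
def find_inconsistent_names(cert_hostname: str, configured: dict[str, str]) -> dict[str, str]:
    """Return the settings whose hostname differs from the certificate hostname."""
    expected = cert_hostname.strip().lower()
    return {
        setting: name
        for setting, name in configured.items()
        if name.strip().lower() != expected
    }


if __name__ == "__main__":
    # Hypothetical values noted down from each server's settings pages.
    configured = {
        "peer certificate import": "lms-master.example.com",
        "remote application registration": "lms-master.example.com",
        "DCR master hostname": "lms-master",
        "SSO master hostname": "lms-master.example.com",
    }
    mismatches = find_inconsistent_names("lms-master.example.com", configured)
    for setting, name in mismatches.items():
        print(f"{setting}: {name!r} does not match the certificate hostname")
```

Anything the function returns is a place where the short name and the FQDN have been mixed.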

New Member

Re: DCR internal error in communication channel

Ok so I tried 'starting from scratch' tonight...

Reverted both servers back to StandAlone mode for both DCR and SSO.

Deleted all peer server certificates.

Unregistered all applications.

Restarted the Daemon Managers.

Regenerated the certificates with the FQDN.

Modified the Homepage settings Server Name to reflect the FQDN.

Restarted the Daemon Managers.

Imported peer certificates using FQDN.

Changed DCR modes appropriately.

Changed SSO mode appropriately.

Restarted Daemon Managers.

Preparing to Register Applications:

On each server I chose import from remote server and wrote down what was detected as already registered on the remote servers.

DCR Master w/RME,DFM: Common Services, Setup Center, CiscoWorks Assistant, and Dev Diag Tools

DCR Slave w/CM,CV,IPM: Common Services, Setup Center, CiscoWorks Assistant, Dev Diag Tools, CM Setup Center

With that in mind I find it strange that CM Setup Center is showing up while IPM Setup Center does not. I have come to assume that you have to register CM, IPM, RME, and DFM but not Common Services. Is this accurate?

In any case...I proceeded to register the main components on their respective servers. The big question here is this: after choosing Register From Templates, I am prompted to enter a server name. I assume we should stick to the 'plan' and input the FQDN. Is this accurate?

Having opted for the implied answer and inputting the FQDN, I successfully registered all the local applications and proceeded to import the apps from the remote servers using the FQDN.

Having followed your advice I still seem to have missed something for I have some strange things that occur now.

Most prominent is the Device Allocation Summary. On the DCR master it reports "Error In getting Installed Applications in DCR domain", and the slave only shows DFM and RME with all devices managed.

To me this suggests that one server has discrepancies somewhere. I just don't know where to start looking.

The first thing I did was unregister CM, since it is one of the applications that isn't showing up correctly. I then re-registered it with the shortname and it shows up accurately in the device allocation summary.

Hence my utter confusion. Help!

Cisco Employee

Re: DCR internal error in communication channel

You shouldn't be using templates to register apps from remote servers. You should select the Remote Server option, enter the FQDN, and select the apps from the list. You are right that you should only import the main apps. These include RME, CM, CS, DFM, IPM, and CiscoView.

Where are you seeing this discrepancy in the auto allocation summary? A screenshot would be helpful. It still sounds like your application registration approach is wrong.

I tried doing some of the things I believe you want to do, and so far, I haven't encountered any problems.

New Member

Re: DCR internal error in communication channel

I have no doubt that my application registration approach is incorrect. LOL...

What I haven't seen you mention is the process of registering local applications. Maybe this is where the problem lies. Do you have to do this?

For instance: the DCR master houses CM and IPM, and those applications aren't available to a remote server for import unless I first register them on the local server. Hence my confusion about whether to register with the template option and the FQDN, or something else.

New Member

Re: DCR internal error in communication channel

I have a TAC case for this whole thread which was closed yesterday. I have emailed the engineer that I was corresponding with and requested it be reopened. Do you ever get involved via Webex? I'd be extremely grateful if you would offer your expertise via Webex and help resolve this ongoing issue. :)

Cisco Employee

Re: DCR internal error in communication channel

Local applications should never need to be registered. This happens at install time. If, however, you lose local applications (like Common Services) there is a command-line only procedure to get them all back. TAC needs to walk you through this, though.

I don't see how applications that are installed on a server could be unavailable for remote registration. I certainly cannot reproduce that. You may very well have a problem with the CMIC registration database, and it might be a good idea to have TAC walk you through the procedure to dump and re-register all local applications on both servers.

New Member

Re: DCR internal error in communication channel

That sounds fabulous! I knew something wasn't right. I haven't heard from my TAC contact yet. We are currently down. So I guess I open a new case for this.

New Member

Re: DCR internal error in communication channel

Matter of fact I have opened a TAC case searching for this procedure in the past and was led in another direction.

Is there something I can call this procedure so the engineers know exactly what I am talking about?

I was offered the hostnamechange script and asked to redo the Master/Slave configuration.

Cisco Employee

Re: DCR internal error in communication channel

The hostnamechange script does modify the CMIC records, but I'm not sure it will fix all of your problems. It really sounds like some things are missing which should not be. It's certainly easier to give it a try first, though. However, the procedure to which I refer involves deleting the existing CMIC database, then re-registering the local templates from the command line.

New Member

Re: DCR internal error in communication channel

The hostnamechange script will not run if the hostname hasn't changed. Right?

Cisco Employee

Re: DCR internal error in communication channel

No, it will state that the two hostnames are the same, and exit.

New Member

Re: DCR internal error in communication channel

I haven't heard back from my TAC engineer. They were going to research the procedure and call back. I really need to get the servers back online. I submitted the level 3 case this morning. Can you assist in this matter?

Cisco Employee

Re: DCR internal error in communication channel

You can escalate the case to severity 1 if you are available and can work on it now. That will queue it to the next engineer. Otherwise, if you have your engineer contact me tomorrow, I can help them find the necessary procedure.

New Member

Re: DCR internal error in communication channel

I need assistance with the command I was provided for re-registering the applications. There must be a typo...Can't get my engineer to return an e-mail or phone call. Can you advise?

I'm not sure if you want me to paste the command on here or not since it was so hard to come by, but here is the error I get:

Exception in thread "main" java.lang.NoClassDefFoundError: administration/1/0

The filename used was administration.1.0.xml and I was advised to remove the .xml for this command.

Cisco Employee

Re: DCR internal error in communication channel

Yes, and you're missing a piece. You forgot the actual class name to execute. The class name, which comes after the end of the classpath argument, and before the filename is:

com.cisco.nm.cmf.registry.CMICApplicationRegistry
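For anyone wondering why the error names "administration/1/0": the java launcher treats the first token after its options as the class to run, and the JVM reports class names with slashes instead of dots. The sketch below is a simplified model of that parsing, not the launcher itself. "<classpath>" is a placeholder because the real classpath is not given in this thread, and the command shape is an assumption based on the description above.

```python
def split_java_command(argv: list[str]) -> tuple[str, list[str]]:
    """Simplified model of how the `java` launcher reads its arguments.

    Options are skipped (-cp / -classpath consume the next token as their
    value), the first remaining token is taken as the main class, and
    everything after it is passed to that class as program arguments.
    """
    i = 0
    while i < len(argv) and argv[i].startswith("-"):
        i += 2 if argv[i] in ("-cp", "-classpath") else 1
    if i >= len(argv):
        raise ValueError("no main class given")
    return argv[i], argv[i + 1:]


def as_internal_name(class_name: str) -> str:
    """The JVM reports class names with '/' in place of '.'."""
    return class_name.replace(".", "/")


if __name__ == "__main__":
    # "<classpath>" is a placeholder, not the real value.
    broken = ["-cp", "<classpath>", "administration.1.0"]
    fixed = ["-cp", "<classpath>",
             "com.cisco.nm.cmf.registry.CMICApplicationRegistry",
             "administration.1.0"]
    for args in (broken, fixed):
        main_class, program_args = split_java_command(args)
        print(as_internal_name(main_class), program_args)
```

With the class name missing, the filename argument lands in the main-class slot, which is consistent with the NoClassDefFoundError quoted earlier.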

New Member

Re: DCR internal error in communication channel

Thanks!

New Member

Re: DCR internal error in communication channel

I don't think we have resolved this issue.

It still occurs from time to time.

I am still not 100% sure about the dos and don'ts of hostname vs. FQDN in a multiserver setup.

I know you recommended using the shortname, but there are times when that doesn't make sense.

For example, let's say I have 2 servers (ciscoworks-cm.domain.net and ciscoworks-rme.domain.net).

I was advised to generate the certificates using the hostname only (ciscoworks-cm and ciscoworks-rme).

In the Homepage settings then I would have to put the short hostnames as well.

When I register applications from a remote server it asks for a server name and display name both of which I assume should be the short hostname.

Then when the apps are registered you see a hostname column for each app that is registered and it apparently reads the FQDN from somewhere and that is what is shown as the hostname for the remote apps. (I'd imagine this could be the md.properties file.)

You also have to provide the servername when you setup SSO and the DCR Master/Slave settings both of which rely on the imported certificates and therefore must match with the short hostname.

Somewhere though the server is told to use the FQDN for URLs and this throws things off when you have your certificate generated with the short hostname.

Out of the box several weeks ago, this was the problem, and it was apparently what caused the "internal error in communications channel" issues.

I then began a mission to get everything to reference the FQDN since I couldn't successfully get everything to use the shortname.

Plus it just doesn't seem acceptable to expect users to address the site by its short hostname only. This requires tedious finagling with each user's HOSTS file or DNS settings to make certain that it won't append an alternate domain suffix.

I have chased this goose entirely too long.

I even upgraded to LMS 3.2 hoping that would work out some of these kinks but the "internal communication channel" error still seems to rear its head as it pleases.

Cisco Employee

Re: DCR internal error in communication channel

The display name of the registered application can be anything you want. The hostname should be the short hostname.

I did some testing with FQDN vs. short hostname internally on my LMS 3.2 servers, and found things to work generally pretty well when using FQDN except when it comes to application/device mapping (PIDM) and Device Center. I have two machines registered with each other by FQDN, and so far I have not had any communications problem with DCR (though Device Center links use the short hostname of the peer server).

On top of that, the logs I have seen thus far don't point to any real root cause of these issues. There also doesn't appear to be any debugging which can be enabled to give more information. At the very least some code changes would be required to get more clues as to what is going on.

For this reason, you will need to work with TAC so patches can be provided to try and isolate what is going on when this error occurs.

New Member

Re: DCR internal error in communication channel

Just to be certain that the issue wasn't due to a server domain suffix change after installation, I fully formatted and reinstalled LMS 3.2.

I have 2 licensed servers still, but have added a third HUM trial to the mix. So 2 slaves.

I am determined to get this working with FQDN, which may prove to be more trouble than it is worth. :)

All configurable references to the remote servers use the FQDN. The certs were generated with the FQDN and the Homepage Settings reflect the FQDN.

Still getting "internal error in communication channel". It actually appears that the slaves initially attach to the master but drop off soon after.

Browsing the attached logs brought me to this theory.

I also noticed that the DCR mode settings report:

Current DCR Settings

Mode: Slave

Master Hostname: [masterhostname.domain.com]

Port: 443

Master Certificate: Valid

Master Server is unreachable.

So I changed the DCR settings to use the short hostname only, but that wasn't sufficient.

I still get "Certificate HostName [masterhostname.domain.com] and the URL Host Name [masterhostname] do not match"

Before Calling the astandalone to slave

--------------------

I obviously have to generate the certs with the shortname until this issue is addressed further...

Again, the problem I have with having the cert use the shortname is the browser complaints of the URLs not matching when our end users access the server by the FQDN. It doesn't seem plausible to expect users to open their browser and go to https://masterhostname instead of https://masterhostname.domain.com.

Any thoughts or additions?

Cisco Employee

Re: DCR internal error in communication channel

There is not enough information in these logs to determine why your DCRs are unable to sync up.

I recommend you open a TAC service request, and keep it open until this is working. I know it works as I'm currently running in such a configuration. I can only guess that something still has not been done right (or hostname resolution is not working correctly for FQDN).
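A quick way to rule out the hostname resolution guess is to resolve both forms of the name on each server and compare the answers. This is a minimal sketch using Python's standard socket module; the function names are hypothetical, and the resolver is passed in as a parameter so the logic can be tried without touching DNS.

```python
import socket
from typing import Callable


def compare_resolution(short_name: str, fqdn: str,
                       resolve: Callable[[str], str] = socket.gethostbyname) -> dict[str, str]:
    """Resolve both forms of a hostname and report what each one maps to.

    Returns a dict of name -> IPv4 address, or an error string when the
    lookup fails.
    """
    results = {}
    for name in (short_name, fqdn):
        try:
            results[name] = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[name] = f"lookup failed: {exc}"
    return results


def same_address(results: dict[str, str]) -> bool:
    """True when every name resolved and all of them map to one address."""
    values = list(results.values())
    resolved = all(not value.startswith("lookup failed") for value in values)
    return resolved and len(set(values)) == 1
```

Run on the slave, something like `same_address(compare_resolution("masterhostname", "masterhostname.domain.com"))` should come back True if both names point at the master. A False result would at least narrow things down to name resolution rather than DCR itself.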

As to your last point: given that Device Center will still use short hostnames even if everything else is using FQDN, your users will still get prompted to accept the cert hostname mismatch (and to authenticate again if using SSO).

New Member

Re: DCR internal error in communication channel

Ok and good point about Device Center's URLs.

Thanks for all your help, anyhow. I have really appreciated it.

New Member

Re: DCR internal error in communication channel

Update: I have opened a TAC case for this "internal error in communication channel."

I advised my TAC engineer that I had been working with you for a couple of weeks now on this issue. I hope you might find the time to assist. :)

New Member

Re: DCR internal error in communication channel

Still seeing the internal error in communication channel error...ugh! ;)

I was hoping you could explain the purpose of the Home Page Server Name in the Home Page Settings under CS > Server > Home Page Admin.

In a multiserver environment, should this be your appointed web server and thus match on all servers, or just each server's own local hostname or FQDN?

What is the Provider Group Name function? I assume it is one and the same, but does it affect anything I am seeing?

Right now all of my certs are configured with the shortname as advised, however this provider group name or home page server name is the FQDN of each server itself. Should I modify that?

Today when I received the internal communication error I was trying to update the device credentials on a few devices none of which were successful. All returned the error.

I then went to the browser address and modified it to just the shortname. Still saw the error.

I then decided to clear my browser cookies, history, and temp files and tried again. Same error with the FQDN url, but when I tried the shortname I was able to modify the credentials of every device in the db. ;)

Any insight here? Does this help? LOL

Cisco Employee

Re: DCR internal error in communication channel

The homepage name can be anything you want. You could call it "Cowboy Server" if you wanted. It's just a logical name to present to users (though there are some internal uses as well). There used to be some issues with making this something other than the hostname, but those should be fixed now.

No, this still doesn't explain why this error is occurring. Given the transient nature, and the fact that I cannot reproduce it on two clusters, perhaps there is something wrong with the server itself (e.g. bad memory). Or, maybe there is some conflict with something else installed on this server. What services are currently running on the master?

New Member

Re: DCR internal error in communication channel

I assume you mean just LMS services. So I have attached the pdshow.

The servers are all brand new servers with no other "obvious" apps on them. But you never know, I know.

I am starting to wonder if I ever see the error when accessing the DCR directly from the server. I will test that some.

I guess I didn't mention that I usually don't access the server directly when making changes to the devices in DCR. In fact that may be why I couldn't reproduce the exact error while I was with TAC. Hmmm...I'll begin testing that immediately. We definitely have some tight firewalls here that we could be battling with...

However, it is important to note that there are no firewalls between the master and its slaves. They are all in the same subnet. So the "Master Server is unreachable" message should have a different cause.

New Member

Re: DCR internal error in communication channel

Oops attachment!

Cisco Employee

Re: DCR internal error in communication channel

No, I meant non-LMS services. LMS will not conflict with itself. But other services could be hindering it.

The client shouldn't have a bearing on how DCR works. All of the communication happens either internally or between servers.
