Cisco Support Community
Community Member

Relationship of CRS Node Manager and Master Browser

We had an outage in our call center because the master browser of the call center workgroup went into a bug check (a different issue). The CRS Node Manager services on the remaining servers could not 'talk' to any other servers because the master browser went offline (a conclusion deduced from the event logs on all nodes). The Node Manager attempted to restart, failed to restart, and then rebooted the machines (all nodes/machines).

My specific question is: if I set up a local hosts file on all IPCC servers (master, backup and addons), would this prevent the Node Manager from using the Browser service, so that it uses the local hosts file instead to contact/check the heartbeat of other nodes?

With my NT admin hat on, I would say yes, but I do not know the inner workings of the Node Manager or whether it follows the 'rules' of the Microsoft TCP/IP Host Name Resolution Order. If it does follow the 'rules', then I would say that we should not see an outage/reboot on every node in the IPCC architecture because the master browser went offline. I did see Document ID: 49860 (http://tinyurl.com/a4ldq), but I don't think the resolution completely addresses the 'problem' in this case (we don't EVER want to rely on the Browser service). Thanks for any input. (ciscoOS 4.2sr6, IPCC 4.04 Build140, CCM 4.13SR3a)

Reposted.

3 REPLIES
Bronze

Re: Relationship of CRS Node Manager and Master Browser

You're correct that the Name Resolution order in Windows 2000 (assuming hybrid mode) goes:

1) Local cache

2) Hosts file

3) DNS

4) WINS

5) Broadcasts

6) LMHosts

It's considered a best practice to configure a hosts file for all CIPT servers, so that name resolution between the servers is not dependent on DNS or WINS. So, you're totally correct in that respect.
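If you go the hosts file route, it is worth checking that every server's file actually lists every other node. Here is a minimal sketch of such a check. The server names and addresses are hypothetical placeholders, not values from this thread, and the parser assumes the usual hosts layout (an IP address followed by one or more names, with `#` comments).

```python
# Sketch: report which required cluster nodes have no entry in a hosts file.
# All names and addresses below are hypothetical placeholders.

def parse_hosts(text):
    """Return {lowercased name: ip} for every name found in hosts-file text."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry for a name, mirroring top-down lookup.
            entries.setdefault(name.lower(), ip)
    return entries


def missing_hosts(text, required):
    """Return the required names that have no entry in the hosts text."""
    entries = parse_hosts(text)
    return [name for name in required if name.lower() not in entries]


SAMPLE_HOSTS = """\
# hypothetical IPCC cluster entries
10.0.0.11   crs-master
10.0.0.12   crs-backup
10.0.0.13   crs-addon1   # first addon
"""

REQUIRED = ["crs-master", "crs-backup", "crs-addon1", "crs-addon2"]

if __name__ == "__main__":
    print(missing_hosts(SAMPLE_HOSTS, REQUIRED))
```

On Windows 2000 the file to feed in would be %SystemRoot%\system32\drivers\etc\hosts. The check only confirms that entries exist; it says nothing about whether the Node Manager consults them.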

However, the Master Browser service in Windows is different from the Name Resolution order. In a workgroup, one server on each subnet is elected Master Browser and keeps a list of members and their corresponding shares. In a Windows 2000 Domain, this information instead gets stored in Active Directory.

CallManager and CRS do not explicitly use this service; however, they do use a share called "IPC$" for piping between servers in the cluster. What I'm not sure of is how a failing Master Browser would affect this share. According to Microsoft, when a Master Browser fails, the client should force a new election and select one of the backups. So in theory, loss of a Master Browser should only cause a delay in browsing and not affect any active shares.

Just curious - How did you conclude that the Master Browser service led to the outage? I would expect that to be a symptom of a dying CRS server, and not the actual cause.

Community Member

Re: Relationship of CRS Node Manager and Master Browser

I will attempt to sound logical. There are 4 servers in the CRS infrastructure: 2 are engines (master, slave) and 2 are addons. One of the addon servers went into bugcheck and rebooted. That server just happened to be the master browser at the time of the bugcheck/reboot. As soon as it rebooted, the "CRS Node Manager" services on all of the other servers went into restart mode. They could not restart on the first and second try, and then those servers rebooted (reboot on third try is configured by Cisco).

Server 1 went into a bugcheck - rebooted itself (master browser in the workgroup).

Servers 2, 3 and 4 had Node Manager service restarts. Since the service could not 'restart', it rebooted each machine, as it was set to do (configured by Cisco).

I could only conclude that the master browser was holding some key that required the node manager on all other servers to run.

Input appreciated.

Bronze

Re: Relationship of CRS Node Manager and Master Browser

The setup makes sense. It sounds like all 4 servers are in the same workgroup, on the same subnet, and running the same OS version.

I still haven't been able to find anything in the documentation suggesting that the CRS Engine is dependent on the Workgroup Master Browser but then again, the troubleshooting documentation for CRS is pretty skimpy (as you probably have figured out by now) and that would be better answered by a developer. You've certainly collected enough information to get a TAC case rolling at this point.

In the meantime, the best workaround I can think of is to force your Primary Engine to act as Master Browser. The browsing behavior can be modified with the registry key

\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Browser\Parameters

Since this is a non-domain setup, you may have to set MaintainServerList = False on the other 3 servers. To verify, open My Network Places on each server, then use the "netstat" command to see which server the request on TCP port 139 gets sent to.
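To make the netstat check repeatable across the servers, the output can be filtered for sessions to remote port 139. The following is a rough sketch, assuming the output comes from `netstat -an` in the usual four-column Windows layout (Proto, Local Address, Foreign Address, State). The sample text is illustrative only, not captured from a real server.

```python
# Sketch: list remote hosts this machine has established TCP sessions with
# on port 139 (NetBIOS session service), given `netstat -an` output.

def port139_peers(netstat_output):
    """Return sorted remote IPs of ESTABLISHED TCP connections to port 139."""
    peers = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP rows.
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        host, _, port = fields[2].rpartition(":")  # split Foreign Address
        if port == "139" and fields[3].upper() == "ESTABLISHED":
            peers.add(host)
    return sorted(peers)


# Illustrative output only; the addresses are hypothetical.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.12:1045         10.0.0.11:139          ESTABLISHED
  TCP    10.0.0.12:139          0.0.0.0:0              LISTENING
  TCP    10.0.0.12:1050         10.0.0.13:445          ESTABLISHED
"""

if __name__ == "__main__":
    print(port139_peers(SAMPLE))
```

If the same remote address shows up on each of the other three servers right after browsing, that is a reasonable hint about which machine is answering browse requests. It is not proof of which machine holds the Master Browser role.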

But again, I would guess, based on experience, that both the Master Browser errors and CRS Node Manager service problems are symptoms of problems with the CRS Engine. Did you get a chance to look at the subsystems and see what their status was before the reboot, or check syslog & CRS logs?

Curious to see what the outcome of this is. Please post what you find.
