Phone KeepAlives

will.alvord · ‎01-19-2009

We have several remote branches on v4 and v5 clusters. The phones at the remote branches periodically re-register due to what appear to be TCP timeouts on the KeepAlive requests.

The only devices I have access to at these remote branches are the ISRs terminating the local PRIs, so I can't verify the QoS configs. I'm wondering what the best course of action is at this point. It doesn't seem that adjusting the "Station and Backup Server KeepAlive Interval" and "Station KeepAliveInterval" service parameters would correct the TCP timeouts, but I wanted to get some advice.

Any ideas?

thanks,

Will

Nicholas Matthews · ‎01-19-2009

Do you know what the QoS is configured for now?

If there is voice qos for DSCP EF (46), you could change your signaling for that value and hope it gets prioritized like voice traffic.

This is going to depend largely on the configuration of QoS on the WAN router pointed towards CUCM.

You can try increasing the keepalive intervals, but it will only lessen the problem rather than solve it.

hth,

nick

will.alvord · ‎01-19-2009

Unfortunately I don't have access to any of their switches or wan routers so I can't confirm. But at one of the sites I have a pc connected to a test phone with the phone port spanned to the pc port. From the wireshark captures, it looks like the keepalives are being sent with dscp 18/cs3. Are you sure they should be ef?

Nicholas Matthews · ‎01-19-2009

By default, call signaling is marked CS3 or AF31.

But, you can trick the WAN routers into thinking it is voice traffic by marking it as DSCP EF.

This assumes that the WAN router / WAN cares about the DSCP marking.

hth,

nick

Jaime Valencia · ‎01-19-2009

No, they shouldn't be EF. Also, CS3 is 24, not 18. It needs to be one or the other, but they're not equivalent DSCP/PHB values, so look again, as that cannot be right.

Read carefully: Nick never said signaling was marked EF.

"If there is voice qos for DSCP EF (46), you could change your signaling for that value and hope it gets prioritized like voice traffic. "

He said you could also prioritize signaling along with the voice traffic.

Have someone who has access to the QoS config look into this.

HTH

java

if this helps, please rate

will.alvord · ‎01-19-2009

Good deal. Thanks. By the way, 18 hex = 24 dec. I just didn't do the conversion.
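
The hex/decimal mix-up above is easy to reproduce, so here is a minimal Python sketch of the standard DSCP naming rules (CSn = 8n, AFxy = 8x + 2y, EF = 46). It assumes the capture tool printed the DSCP in hex, as Will describes; the helper name `dscp_name` is made up for illustration.

```python
def dscp_name(value: int) -> str:
    """Return the standard PHB name for a 6-bit DSCP value."""
    if not 0 <= value <= 63:
        raise ValueError("DSCP is a 6-bit field (0-63)")
    if value == 46:
        return "EF"
    if value == 0:
        return "BE"  # default / best effort
    cls = value >> 3               # top three bits: class
    low = value & 0b111            # bottom three bits
    if low == 0:
        return f"CS{cls}"          # class selector: 8 * n
    drop = low >> 1
    if 1 <= cls <= 4 and low & 1 == 0 and 1 <= drop <= 3:
        return f"AF{cls}{drop}"    # assured forwarding: 8x + 2y
    return f"DSCP{value}"          # no standard name


# int(text, 0) accepts both "0x18" (hex) and "24" (decimal)
for shown in ("0x18", "24", "26", "46", "18"):
    value = int(shown, 0)
    print(shown, "->", value, dscp_name(value))
```

Read as hex, 18 is CS3 (24 decimal); read as decimal, 18 would be AF21, which is why the value looked wrong at first glance.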

will.alvord · ‎01-20-2009

I have access to the access switches now, and it looks like AutoQoS was applied, with the service-policy on the phone ports and DSCP trusted at the egress. I'm not a QoS master by any means, but I've found that AutoQoS is pretty reliable.

I'm not seeing any errors on the phone ports, but from the ccm event logs I can see a ton of re-registrations with reason code 8 -- device unregistration due to loss of keepalives. I've got a TAC case open but need to continue to troubleshoot in the meantime.

From Wireshark, I can see that the keepalive from phone to CCM is marked CS3 (DSCP 24), but the Skinny keepalive ack coming back is not marked. I checked the switches that the CCMs are on, and the same AutoQoS configs are applied.

It looks like I may have narrowed the problem down to the keepalive acks not being marked. I'll continue to troubleshoot, and any advice will be much appreciated.

thanks,

Will
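
For anyone repeating the per-direction check described above outside the Wireshark GUI, a rough Python sketch follows. It reads the DSCP from the ToS byte of a raw IPv4 header and the TCP ports that follow it. The two sample packets are fabricated for the demo (not taken from Will's capture), the helper names are hypothetical, and port 2000 is assumed as the default SCCP port.

```python
import struct

SCCP_PORT = 2000  # default Skinny (SCCP) TCP port; adjust if changed


def dscp_and_ports(packet: bytes):
    """Return (dscp, src_port, dst_port) for a raw IPv4/TCP packet."""
    version_ihl, tos = struct.unpack_from("!BB", packet, 0)
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    if packet[9] != 6:
        raise ValueError("not TCP")
    header_len = (version_ihl & 0x0F) * 4      # IHL is in 32-bit words
    src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return tos >> 2, src_port, dst_port         # DSCP = top 6 bits of ToS


def fake_packet(tos: int, src_port: int, dst_port: int) -> bytes:
    """Build a 20-byte IPv4 header plus TCP ports (demo data only)."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, tos, 24, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return ip + struct.pack("!HH", src_port, dst_port)


phone_to_ccm = fake_packet(0x60, 51000, SCCP_PORT)  # ToS 0x60 = DSCP 24
ccm_to_phone = fake_packet(0x00, SCCP_PORT, 51000)  # unmarked ack

for label, pkt in (("phone -> ccm", phone_to_ccm),
                   ("ccm -> phone", ccm_to_phone)):
    dscp, sport, dport = dscp_and_ports(pkt)
    status = "marked" if dscp else "NOT marked"
    print(f"{label}: {sport} -> {dport}, DSCP {dscp} ({status})")
```

The point of checking both directions separately is that marking is applied per hop and per direction, so a correct phone-to-CCM marking says nothing about the return path.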

will.alvord · ‎01-20-2009

I found and corrected the keepalive ack DSCP markings. We were setting DSCP on the phone ports, but on the switches that the CallManagers are on we were trusting CoS rather than DSCP. I've confirmed that keepalive acks from all nodes are now marked properly.

I've been getting very frequent reports of re-registrations so I should know pretty quickly if this resolved it.

thanks,

Will

will.alvord · ‎01-21-2009

The remote branch with the phone re-registrations has a couple of internet-based IPsec VPN tunnels back to HQ. Since the ASA Skinny inspection was removed, there haven't been any abnormal device re-registrations.

For comparison, over the same time window one day earlier, there were 65 phone re-registrations in the application event logs.

At the risk of jinxing myself, it's safe to say that this has been resolved.

thanks,

Will