I have 5 sites, all connected with VPNs. For the most part everything works fine except for our internal calls between sites: we are seeing calls drop at random intervals ranging from about 2 minutes to 35 minutes. We have set up some QoS on the firewalls, but we have not set up any QoS on our internal switch gear, and currently we don't have any routers in place. I have also found some discussions saying that VLANs should be created to separate VoIP traffic.

Our phone system is an Avaya IP Office 500v2, so all of the handsets are digital and connect back to the phone system. Our internal calls go over the Avaya Small Community Network, which uses H323. Our average ping time to the site having the worst trouble with dropped calls is about 130ms, and I'm wondering if that is part of the problem at that site. The ping test was done both out of network and in network, with the same results.

Below are the QoS commands we have on our firewalls, which are all running ASA 8.6. Questions follow below, separated out for ease of answering. Any ideas or help in this matter is much appreciated.
1. Having digital handsets, should we still VLAN the phone system?
2. Should we set up QoS on all switches, and if so, what QoS should we configure?
3. What other QoS should we configure on the firewall?
4. Should we be using a router for any reason?
5. Should we implement a routing protocol between sites?
6. Is an average ping time of 130ms a problem for VoIP traffic?
access-list voip extended permit object-group TCPUDP any any eq sip
access-list voip extended permit object h323 any any
crypto ipsec df-bit clear-df Outside-Optimum
crypto ipsec fragmentation before-encryption Outside-Optimum
priority-queue Inside
priority-queue Outside-Optimum
match access-list voip
policy-map type inspect h323 no-inspection
description This rule allows all traffic through
service-policy voip interface Outside-Optimum
service-policy voip interface Inside
Normally you should not need to prioritize signaling traffic, but rather concentrate on RTP.
h323 will use TCP so it should deal pretty well with packet loss.
130ms latency should be fine (bidir).
There's an example of ASA's VPN QoS config here (for example):
That being said, 35 minutes of broken connectivity is WAY too high; QoS by itself will not account for this.
Get in touch with Avaya and have them help pick the optimum settings for their voip infrastructure on your network. QoS might help, but it's not a guarantee of success.
What you might WANT to check on the ASA is behavior with and without h323 inspection. Just do it cautiously; disabling it might affect voip calls over the internet (as opposed to over VPN, with default settings on the ASA).
I don't think adding complexity (RP, routers) will help much, although routers typically offer better QoS facilities.
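To concentrate on RTP as suggested, one option on the ASA is to match the DSCP marking on the media packets instead of signaling ports. A rough sketch, assuming the IP Office marks voice traffic with DSCP EF (worth verifying in its system settings; the class and policy names here are illustrative, not from the posted config):

```
class-map voip-rtp
 match dscp ef
policy-map voip-priority
 class voip-rtp
  priority
service-policy voip-priority interface Outside-Optimum
```

This relies on the `priority-queue` already being enabled on the interface, as in the config above.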
Thank you Marcin for the link and the suggestions. We will implement the changes regarding RTP from the guide; hopefully we will see an improvement. To clarify, the range of 2 to 35 minutes is how long the calls seem to last before dropping: at some point in that range a call will just drop and disconnect the users, and the time a call lasts varies. The call quality while on the call is perfect, but at some point it just drops off. Do you think this is an issue unrelated to the VPN? If so, what do you think may be causing it?
Typically, IF the problem were with call control, you would most likely see the phone re-registering or similar (I'm not sure how those particular phones would react). If voice is the problem, you would (most of the time) see a decrease in quality over time.
I doubt it's a pure VPN problem; the VPN just takes packets and encrypts them, and modern crypto engines should not have a big problem with varying packet sizes.
What I'd really need is a capture on both sides of the call (RTP + signaling traffic) to determine what went wrong, and only then could actions be taken (with some level of guesswork).
I did a packet capture while on a call between sites. Unfortunately, during the call the far side lost connection for some reason, and while I was trying to get the packet trace back up, the call failed. I'm uploading the side that I do have, plus some monitoring information from the phone system showing what ports the call was using. The packet trace captured all UDP traffic between the near- and far-side phone systems only, using their IP addresses. The link below is to the PCAP file. The call was initiated around 13:11 and failed about 13:34. Please let me know if this is not the info you needed and I will try to post a trace with both sides.
14882087mS H323Evt: v=0 stacknum=30 State, new=NullState, old=Active id=3955
14882087mS H323Evt: v=0 stacknum=30 State, new=NullState, old=Active id=38
14882087mS H323Evt: Shared tcp socket for line 30 disconnected
14882087mS H323Evt: H323Pipe::DisconnectIndication cap 1
14882087mS H323Evt: H323Pipe::DisconnectIndication cap 2
14882088mS H323Evt: H323Pipe::DisconnectIndication cap f
14882088mS H323Evt: H323 stack for line 30 is disconnected
14882102mS H323Evt: RTP(END): 10.1.99.225/49160 10.1.97.225/49154 CODEC=G729A8K(6) PKTSZ=20 RFC2833=off AGE=1326654 SENT=66324 RECV=65944 RTdelay=0 jitter=0 loss=0 remotejitter=0 remoteloss=0
14882103mS H323Evt: RTP(END): 10.1.99.225/49158 10.1.97.225/49156 CODEC=G729A8K(6) PKTSZ=20 RFC2833=off AGE=35559 SENT=1770 RECV=1765 RTdelay=0 jitter=0 loss=0 remotejitter=0 remoteloss=0
14882109mS H323Evt: v=0 stacknum=30 State, new=NullState, old=NullState id=-1
14882109mS H323Evt: v=0 stacknum=30 State, new=NullState, old=NullState id=4262
The capture shows something interesting (while indeed not full info): from 21:33:57 almost all traffic is sourced from 10.1.97.225 (an odd packet here or there from 99.225), until 21:34:12 (packet # 149524), where *.97 sends a big UDP packet and *.99 seems to be able to respond again.
That's almost 15 seconds (which could be a timeout of some sort).
What would be interesting to see is whether those packets actually got to the other side.
I didn't go into the details of every packet, but I did try graphing traffic sourced from both IPs... here's the result.
Hopefully this reflects what I was saying.
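The gap-spotting done by eye above can also be automated. A minimal sketch, assuming the packet arrival times have been exported from the capture (e.g. via Wireshark's CSV export); the data and names here are illustrative, not taken from the actual trace:

```python
# Flag silent periods where a given source IP stops sending packets
# for longer than a threshold -- the kind of ~15 s gap seen in the capture.

def find_gaps(arrivals, src_ip, threshold=5.0):
    """Return (gap_start_time, gap_length) pairs where src_ip went quiet.

    arrivals: list of (epoch_seconds, source_ip) tuples from the capture.
    """
    times = sorted(t for t, ip in arrivals if ip == src_ip)
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > threshold:
            gaps.append((prev, cur - prev))
    return gaps

# Toy data: 10.1.99.225 goes silent for 15 seconds.
arrivals = [(0.0, "10.1.99.225"), (0.5, "10.1.99.225"),
            (1.0, "10.1.99.225"), (16.0, "10.1.99.225")]
print(find_gaps(arrivals, "10.1.99.225"))  # → [(1.0, 15.0)]
```

Running this per source IP against the real export would show exactly when each side went quiet and for how long.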
Thank you for looking at the trace. It is odd that most traffic is coming from the 97 network and then, at the end, there's a long loss of communication from the 99 network. I will do another trace today, with both sides this time. I called the site from the 99 network last time and the call seemed to last longer for some reason; I'll have the far site call me this time and see if it drops sooner. Also, I wanted to ask about QoS on the RTP: the RTP port range is very large, so do you QoS all of the ports? How else would you go about QoS on RTP? I also noticed in an earlier post that you said H323 usually uses TCP. I think Avaya mainly uses UDP; could this be an issue? I'll post later today with the packet traces. Should I collect all traffic between the phone systems instead of just UDP?
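On the RTP port-range question: rather than per-port QoS rules, a single range entry in an ACL can cover the whole media range. A sketch, assuming the IP Office default RTP port range (commonly 49152-53246, which matches the 4915x ports in the monitoring logs, but confirm the configured range in IP Office Manager; the ACL name is illustrative):

```
access-list voip-rtp extended permit udp any any range 49152 53246
```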
So I finally got a packet capture on both sides of the call. I'm posting a link to the results below, along with call data from our phone system monitoring. I noticed something odd in the monitoring: when my call failed (it took almost an hour this time, versus 25 minutes last time), the phone system showed that our connections to three of our sites all failed at the same time. I'll post the results below; all of these connections between the sites use H323 for communication. The three that dropped did come back up quickly, though. One of the sites for some reason did not fail, but they are still having call problems. In the last section of phone monitoring you can see what port my call was using, if it helps. The call dropped about 17:25. Let me know if I can provide any more information.
Administration Building Side
Disconnects and Reconnects
374412983mS H323Evt: Shared tcp socket for line 23 disconnected
374412983mS H323Evt: H323Pipe::DisconnectIndication cap 1
374412984mS H323Evt: H323Pipe::DisconnectIndication cap 2
374412984mS H323Evt: H323Pipe::DisconnectIndication cap f
374412984mS H323Evt: H323 stack for line 23 is disconnected
374413905mS H323Evt: Shared tcp socket for line 20 disconnected
374413905mS H323Evt: H323Pipe::DisconnectIndication cap 1
374413905mS H323Evt: H323Pipe::DisconnectIndication cap 2
374413905mS H323Evt: H323Pipe::DisconnectIndication cap f
374413905mS H323Evt: H323 stack for line 20 is disconnected
374415080mS H323Evt: Shared tcp socket for line 30 disconnected
374415080mS H323Evt: H323Pipe::DisconnectIndication cap 1
374415080mS H323Evt: H323Pipe::DisconnectIndication cap 2
374415080mS H323Evt: H323Pipe::DisconnectIndication cap f
374415080mS H323Evt: H323 stack for line 30 is disconnected
374415087mS H323Evt: RTP(END): 10.1.99.225/49152 10.1.97.225/49154 CODEC=G729A8K(6) PKTSZ=20 RFC2833=off AGE=3554247 SENT=177666 RECV=177663 RTdelay=0 jitter=0 loss=0 remotejitter=0 remoteloss=0
374428039mS H323Evt: V7 on the other side of line 23 tcp f4e67808 shared_tcp f4e67808 tcp f4e67808 shared_msg_sent 1 new_caps f ver_major 7
374428040mS H323Evt: H323 stack for line 23 is connected
374428040mS H323Evt: H323Pipe::ConnectIndication cap 1
374428040mS H323Evt: H323Pipe::ConnectIndication cap 2
374428040mS H323Evt: H323Pipe::ConnectIndication cap f
374428040mS H323Evt: Shared TCP became operational for line 23
374428952mS H323Evt: V7 on the other side of line 20 tcp f56852a4 shared_tcp f56852a4 tcp f56852a4 shared_msg_sent 1 new_caps f ver_major 7
374428952mS H323Evt: H323 stack for line 20 is connected
374428952mS H323Evt: H323Pipe::ConnectIndication cap 1
374428952mS H323Evt: H323Pipe::ConnectIndication cap 2
374428953mS H323Evt: H323Pipe::ConnectIndication cap f
374428953mS H323Evt: Shared TCP became operational for line 20
374430348mS H323Evt: V7 on the other side of line 30 tcp f4eaaf74 shared_tcp f4eaaf74 tcp f4eaaf74 shared_msg_sent 1 new_caps f ver_major 7
374430348mS H323Evt: H323 stack for line 30 is connected
374430348mS H323Evt: H323Pipe::ConnectIndication cap 1
374430348mS H323Evt: H323Pipe::ConnectIndication cap 2
374430348mS H323Evt: H323Pipe::ConnectIndication cap f
374430348mS H323Evt: Shared TCP became operational for line 30
Checking UDP is not going to be enough.
You can see that both sides of the capture follow a similar pattern during the disconnect. Also, the logs seem to indicate (I'm guessing, since the debugs are not ones I know) a problem at the h323/TCP level.
(Granted, the timestamps are a bit off?)
The RTP traffic doesn't seem to be dropped.
Looking at TCP flow "tcp.stream eq 0" in Admin to Gunisson...
You can see 10.1.97.225 finishing h323 session with RSTs.
I don't see a specific reason for those resets (nor do I see them on both sides of the capture - maybe due to the different timestamps?), but maybe it's time to discuss this with the phone vendor?
You can also open up a case with our TAC and have someone go over h323 debugs from the ASA (we'll have to take a capture on the ASA and most likely a debug).
Potentially, yes. h323 inspection requires a certain level of TCP proxying, so it could affect the behavior (which would most likely signify a bug on our side).
h323 inspection over VPN should typically not be required (unless you have NAT or VPN ACLs).
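Testing with h323 inspection disabled, as discussed earlier in the thread, could be sketched like this on the ASA, assuming the default global_policy is applied (re-enable afterwards with the corresponding `inspect` commands, and test cautiously since it can affect voip calls over the internet):

```
policy-map global_policy
 class inspection_default
  no inspect h323 h225
  no inspect h323 ras
```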