cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
5799
Views
49
Helpful
16
Replies

SPA-112 periodically loses registration

spacehymns
Level 1
Level 1

We have 2 SPA-112 units (running the latest 1.3.3(015) firmware) which are experiencing loss of registration on a regular basis.   The web interface shows that the line is off-hook, and that it is registered.  However, there is no dial tone on the line, and the SIP registrar (a Mitel 3300) shows that the line is out of service.  The telltale sign of this problem is that the "Next Registration In:" field in the Voice tab is blank.  Attached is the SIP debug log of an event - sometime during that window, the SPA lost registration.

1 Accepted Solution

Accepted Solutions

I had a very long back-and-forth with Cisco small business support.  They recommended the firmware which was released just a few days ago (1.3.5).  I upgraded the firmware, and also applied a patch to the 3300 that Mitel released just for us.  We have not experienced the problem since.

View solution in original post

16 Replies 16

Dan Lukes
VIP Alumni
VIP Alumni

The attached debug log seems not to confirm your claiming. The important lines of them are:

 --------------------------------------------

 

  1. 11:14:50 | [0]Off Hook
  2. 11:14:59 | Calling:918885651662@10.10.0.17:0, rc=0
  3. 11:15:02 | CNG tone detected
  4. 11:15:05 | NEW_CALL_STATE(), call 0: old state = CC_CST_CALLING, new state CC_CST_PROCEEDING
  5. 11:15:06 | NEW_CALL_STATE(), call 0: old state = CC_CST_PROCEEDING, new state CC_CST_CONNECTED
  6. 11:15:55 | [0]On Hook
  7. 11:20:54 | [0]Off Hook
  8. 11:21:03 | Calling:918885651662@10.10.0.17:0, rc=0
  9. 11:21:06 | CNG tone detected
  10. 11:21:09 | NEW_CALL_STATE(), call 0: old state = CC_CST_CALLING, new state CC_CST_PROCEEDING
  11. 11:21:10 | NEW_CALL_STATE(), call 0: old state = CC_CST_PROCEEDING, new state CC_CST_CONNECTED
  12. 11:21:52 | [0]On Hook
  13. 11:22:26 | End of Log

 --------------------------------------------

 

According the log, there has been outgoing fax call from 11:14:50 to 11:15:55 then second outgoing fax call from 11:20:54 to 11:21:52. Both calls has been connected to caller. It mean that from 11:14:50 to 11:21:52 (e.g. almost complete debug window) the line has NOT been out of service. It has been in use.

 

OK, the attached log can't help us so much with the issue analysis.

 

Please catch the SIP packets between your ATA and Mitel. We need to see registration requests send and related responses from Mitel. I wish it will help to solve the problem.

 

 

spacehymns
Level 1
Level 1

Thanks, Dan!

I will set up a wireshark trace.  In the meantime, here's another log file that definitely covers the time period when the SPA lost registration.

Please see the attached wireshark trace.  There are some interesting things here: several 'malformed packet' instances.  During the trace, the device lost registration.

There is no sucessfull registration catched in the file. SO I can't decide if registration has been lost prematurely or it has just expired.

Also, all files you supplied looks simmilar - they cover active T.38 fax call. No other type of call, no phone in idle state. Should I assume the issue manifests only during such type of call ? You didn't mention such condition ...

 

The SIP boxes in question only handle fax calls.  At the beginning of the time represented in the trace, the box was registered.  At the end, it was not registered, at least from the point of view of the registrar.  I will do another trace.

Dan Lukes
VIP Alumni
VIP Alumni

It should start at REGISTER time (including the register itself) and should not end before the phone become unregistered AND active call has ended. It's not necesarry to catch the RTP  ...

OK, I've got another capture in progress.  The problem is that from the SPA's perspective, the call is still in progress.  The web interface shows "off hook".

If SPA claim off-hook, then fax is holding the line off-hook. The fax need to hang up the line. It can't be done by SPA.

spacehymns
Level 1
Level 1

I've attached a capture where the phenomenon occurred, but this time the SPA was on-hook but unregistered.  .40 is the SPA, .17 is the SIP registrar.

Lets analyze. Both lines has been registered successfully (packets #13-#20) with expiration set to 3600seconds.

Registration has been renewed about 3570s later (packets #21-#28).

The registration has been successfully renewed so many times then. Successful incoming as well as outgoing calls has occurred several times in meantime.

Important part seems to start at #544. Incoming call to 774201 is diverted to 628 because of "user busy". But previous call has been disconnected so long time ago (#388-#407). So why the 774201 is busy ? Syslog&debug packets are not included in the packet dump, but I assume the fax is holding the line off-hook with no apparent reason. Similar session occurred three times.

But it doesn't affect the registration so we can ignore it.

No interesting event happened up to packet #1001.  A device with IP 10.10.0.24 seems to flood the SPA by suspicious (and possibly malformed) T.38 UDPTL packets.

Not registration is still not lost, so even this can be ignored.

Mos suspicious part is starting at packet #1231. External PBX is attempting to setup incomming call, but upstream PBX decided to cancel it in ringing phase. SPA doesn't respond to such attempt. It's not so good, but it should not harm - upstream PBX should consider the dialog expired.

From packet #1245 we see attempt to setup another incoming call - such requests are not responded from SPA's side. The device seems to be dead. Although previous registration is not expired yet, the device is in wrong state and may claim anything - including it's not registered.

 

Well, what's now ?

 

I see several suspicious event that may cause device's abend.

 

At the first, the device's configuration related to T.38 seems to be broken. T.38 reinvite is attempted during almost all calls, but they are unsuccessful. If you are wishing for T.38 fax mode, the issue needs to be solved. Ask your VoIP provider for help as we can't know why the T.38 is rejected by them. If no T.38 is supported then it should be deconfigured on SPA's side.

At the second, there is a flood of packets from .24 node. The source and purpose of such packets needs to be identified. It may confuse the SPA.

Well, if it will not help, there is "at the third" paragraph. Call cancelled during RINGing phase of incoming call is latest event before the SPA become frozen. It should not be so hard to verify it cause the problem or not. Just call the SPA, don't allow fax to go on-hook, cancel the call during ringing phase. If you confirm it freeze the SPA you discovered the firmware bug. Downgrade may help then.

Note - don't try "third" unless "first" and "second" issue become solved.

Hope it help ...

 

 

 

Dan, this is an utterly fantastic analysis.

I'm going to go through your advice bit by bit.  One note: the .24 node is the on-site SIP proxy, which is a Mitel Border Gateway.

Thank you!

 

 

It required about two hours of analysis (more detailed in first version, but I deleted unrelated parts then), so it SHOULD be considered useful ;-)

Start with T.38 - just try not to use it even in the case you wish to use it. You may reactivate it later.

Note the SPA112 is not state-of-the-art device even with the latest firmware. The abandoned PAP2T seems to be still more reliable and stable than SPA112. So try not to push the SPA112 to the edges.

And rate useful responses ;-)

 

For interest's sake, here's the SIP log from the device at the time of packet #1231 and #1245:

Aug 25 09:41:09 10.10.0.41  ################# CC: running... ############
Aug 25 09:41:53 10.10.0.40  ################# CC: running... ############
Aug 25 09:46:09 10.10.0.41  ################# CC: running... ############
Aug 25 09:46:52 10.10.0.40  ################# CC: running... ############
Aug 25 09:47:39 10.10.0.40 CC_eventProc(), event: CC_EV_SIG_CALL_ARRIVED(0x31), lid: 1, par: 0, par2: 0x407b9900
Aug 25 09:47:39 10.10.0.40 AUD_ccEventProc: event 49 vid 1 par 0x0 par2 0x407b9900
Aug 25 09:47:39 10.10.0.40 CC_lineBusy()
Aug 25 09:47:39 10.10.0.40 getCodecList line 0x22cc04 call 0x22cc08 clRemote: 0x0 bInbound 1
Aug 25 09:47:39 10.10.0.40 pconly: 0
Aug 25 09:47:39 10.10.0.40 ====== Local codec num 4 ======
Aug 25 09:47:39 10.10.0.40 0
Aug 25 09:47:40 10.10.0.40 130
Aug 25 09:47:40 10.10.0.40 8
Aug 25 09:47:40 10.10.0.40 18
Aug 25 09:47:40 10.10.0.40 ================================
Aug 25 09:47:40 10.10.0.40 [AUD]Get UCH node for AUD_LINE 1 0.
Aug 25 09:47:40 10.10.0.40 uchAllocateNode(), Node 0 allocated ret=0
Aug 25 09:47:41 10.10.0.40 [AUD]UCH node 0 allocated to AUD_LINE 1.
Aug 25 09:47:41 10.10.0.40 uchConnectEpToNode(), connecting EP FXS 2 to node 0
Aug 25 09:47:41 10.10.0.40 uchEnableNode(), Node 0 enbaled ret=0
Aug 25 09:47:41 10.10.0.40 CC_eventProc(), inf.strName = ONT GOVT
Aug 25 09:47:41 10.10.0.40 CC_eventProc(), inf.strPhone = 4163142695
Aug 25 09:47:41 10.10.0.40 callEventProcTable[0] is cepIdleProc
Aug 25 09:47:41 10.10.0.40 cepIdleProc(lid=1, call=0x22cc08, event=18(CC_EV_USR_ACCEPTCALL), par=0, par2=0x407b9900)
Aug 25 09:47:42 10.10.0.40 cepIdleProc(), lid=1
Aug 25 09:47:42 10.10.0.40 cepIdleProc(), line->sigProc(CC_CMD_ACCEPT)
Aug 25 09:47:42 10.10.0.40 cepIdleProc(), call->cinf.bAutoAnswer = 0
Aug 25 09:47:42 10.10.0.40 NEW_CALL_STATE(), call 0: old state = CC_CST_IDLE, new state CC_CST_RINGING
Aug 25 09:47:42 10.10.0.40 CID_initGen 8
Aug 25 09:47:42 10.10.0.40 [1]CID:CID_initGen() >>> offhook 0 delay 2200 phone 4163142695 name ONT GOVT
Aug 25 09:47:42 10.10.0.40 SLIC_startRing state 0 ts 0x25b18con 2000 off 4000 len 60000
Aug 25 09:47:43 10.10.0.40 [1]Ring cad event 0 pol 0
Aug 25 09:47:43 10.10.0.40 RTP_nextMediaPort(), port = 16476
Aug 25 09:47:43 10.10.0.40 RTP_nextMediaPort(), rc=16474
Aug 25 09:47:43 10.10.0.40 AUD_allocCallObj() call(0x277360)
Aug 25 09:47:43 10.10.0.40 [1:0]AUD ALLOC CALL (port=16474)
Aug 25 09:47:43 10.10.0.40 +++++ SIP_lineCcCmdProc CC_CMD_ACCEPT AUD_startRtpRx
Aug 25 09:47:43 10.10.0.40 [AUD]AUD_startRtpRx(0x277360) lid 1
Aug 25 09:47:44 10.10.0.40 Local loopback mode: None. Type: None.
Aug 25 09:47:44 10.10.0.40 Remote loopback mode: None. Type None.
Aug 25 09:47:44 10.10.0.40 RTP channel setup: udp_no_checksum 0, sysmmetric_rtp 0, tos 0xb8, cos 6, mlb 0.
Aug 25 09:47:44 10.10.0.40 uchConnectEpToNode(), connecting EP VoIP 0 to node 0
Aug 25 09:47:44 10.10.0.40 uchEnableEchoCan(), lid 1 EP 1 enable
Aug 25 09:47:44 10.10.0.40 uchEnableModemCall() Modem call state(0) not change
Aug 25 09:47:44 10.10.0.40 UCH sync parameter hold off time is 70
Aug 25 09:47:45 10.10.0.40 Set QoS succeed
Aug 25 09:47:45 10.10.0.40 uchSetGTD(), Disable GTD for FXS 2  
Aug 25 09:47:45 10.10.0.40 uchSetGTD() GTD state not change
Aug 25 09:47:45 10.10.0.40 uchSetDTMFMute(), ENABLE
Aug 25 09:47:45 10.10.0.40 cordless_start_rtp(), chan:0 remote ip:(null) port:0 local:16474 rx:1 ipt:0 ptime:30 bInMdmPasstru:0
Aug 25 09:47:45 10.10.0.40 Starting Rx only RTP.
Aug 25 09:47:45 10.10.0.40 Socket 0 bound to port 16474.
Aug 25 09:47:46 10.10.0.40 Remote IP/port: 0.0.0.0:0
Aug 25 09:47:46 10.10.0.40 set RTP_SESSION_OPT_DTMF
Aug 25 09:47:46 10.10.0.40 VAD = 0
Aug 25 09:47:46 10.10.0.40 RTP configuration:
Aug 25 09:47:46 10.10.0.40   audio_mode RTP_MODE_REC_ONLY, media_loop_level RTP_LOOP_LEVEL_NONE, dtmf2833numEndPakcets 3, opts 0x0
Aug 25 09:47:46 10.10.0.40   Codec: duration 30, rx_pt_event 101, tx_pt_event -1, tx_pt 0  
Aug 25 09:47:47 10.10.0.40          rx[0] 0 PCMU/8000, rx[1] 8 PCMA/8000, rx[2] 2 G.726/8000
Aug 25 09:47:47 10.10.0.40          rx[3] 18 G.729/8000, rx[4] 9 G.722/16000, rx[5] 0  
Aug 25 09:47:47 10.10.0.40   Jib: max 180ms, min 60ms, adapt 1
Aug 25 09:47:47 10.10.0.40 RTP Channel 0 is virgin: 1.
Aug 25 09:47:47 10.10.0.40 #### rtp seq number is 5067  
Aug 25 09:47:47 10.10.0.40 RTP session 0 started
Aug 25 09:47:47 10.10.0.40 [AUD]RTP Rx Up
Aug 25 09:51:09 10.10.0.41  ################# CC: running... ############
Aug 25 09:56:09 10.10.0.41  ################# CC: running... ############
Aug 25 10:01:09 10.10.0.41  ################# CC: running... ############ 

It conform with analysis results.

We can see the Call Control is aware of incoming call (CC_EV_SIG_CALL_ARRIVED) , call has entered RINGING Call STate (CC_CST_RINGING ). Also, rx-only early-audio channel has been established (Starting Rx only RTP and bellow).

There is nothing related to neither CANCEL nor INVITE ...

Unfortunately, there is nothing that can help us with identification of device freezing reason as well - so no new things nor advices will follow ;-)

 

Getting Started

Find answers to your questions by entering keywords or phrases in the Search bar above. New here? Use these resources to familiarize yourself with the community: