Just picked up the SPA112 on the weekend, and am setting it up with voip.ms. Several times a day, registration fails, resulting in no dialtone when the phone is picked up. It does eventually seem to reconnect, but I won't be able to use it if this continues. I followed all the instructions for setting it up as posted on the voip.ms wiki. It also looks like I could reduce the "Reg Retry Long Intvl" param in the Voice-SIP settings, but I feel this would just be masking the real issue.
I have also tried using the Cisco Support chat, but after sitting there for hours with no response, I thought I'd try this forum -- it looks like it is monitored by the support engineers. (I've captured my settings and log files, ready to send them in for analysis.)
Anyone else had this issue and been able to resolve it?
In the log file, how often is the registration supposed to re-occur? The service provider sends a setting and we do as well. The parameter that controls this on the ata, is called 'register expires'.
Also, is the service provider looking for a keep alive? We do provide for this, it's the NAT Keep Alive. You would enable the NAT keep alive for this.
I cannot find any indication in the log files (even though I have them set to debug level) as to why the registration fails. In fact, I don't see anything even indicating it failed.
In the Voice menu, I have set the following params:
NAT Mapping Enable: Yes
NAT Keep Alive Enable: Yes
Proxy and Registration:
Register Expires: 60
Proxy Fallback Intvl: 60
After my posting yesterday, I've also made the following changes in the SIP section:
SIP Timer Values (sec):
Reg Retry Intvl: 10
Reg Retry Long Intvl: 60
At least now after making these last changes, I haven't found an instance where there is no dialtone on picking up the phone. I will try to find out from my provider the correct value to set in the "Response Status Code Handling-Retry Reg RSC" parameter (which is currently empty). It would seem if I can get the right value there then the "Reg Retry Long Intvl" timer should not kick in very often.
Is there any way I can get the reason for the registration failure? It would seem that if I could get the reason, then it would be possible to determine what the root cause is.
P.S. I've already upgraded to the latest firmware, version 1.1.0(011).
Based on your registration settings, you should be seeing Reg sent out every 60 secs. Are you seeing that in the log file?
Make sure the SIP debug is set to All.
Further details at https://supportforums.cisco.com/docs/DOC-9862
Ok, I've followed your instructions to successfully capture the debug log. I reverted the latest settings I'd made last night in order to reproduce the problem, and waited for it to re-occur. It did, and I have captured a log file. How can I get the log file to you for analysis?
Hi nseto, I sent you my logs as well. I'd like to suggest for the engineers involved with these units to get a free account with voip.ms and les.net (I don't think they charge unless you buy a DID), and it takes a minute to set up. This way you can look at the registration problem directly, and hopefully at least determine if you can do anything about it.
I have opened a support case
You will find the logs there, I will also e-mail them to you directly.
As per Rob's note below, since the issue is reproducible, opening your own account with one of these providers might be the best way to troubleshoot this issue.
Obviousely, the problem is this buggy 1.1 SPA112 firmware which makes the shame on Cisco. PAP2T should have been kept :-(
Also posted here :
Nomado VoIP provider
Best Quality Rate of VoIP (datanews.com)
This looks like the same problem I have. See...
Of course my solution doesn't do anything to address the problem.
I will re-open my case and submit my log as well.
I agree -- it sounds the same.
I tried connecting my SPA112 directly to my cable modem (not going through my router), and it still lost the dialtone, so it definitely isn't a router issue.
To get the proper logs, put the ip address of your computer running slogsrv into Voice -> System -> Miscellaneous Settings -> Debug Server:
Make sure its level 3.
Don't use the Log module in the Administration section.
I attached my log file as an example. For the purposes of the log I set...
Register Expires: 60
Reg Retry Long Intvl: 30
To attach the log file, switch to advanced editor.
Message was edited by: Robert Kralik (log file removed)
Thanks for the logging tips. I had followed the links nseto had provided above, and managed to get some detailed logs with the failures. (They don't look quite as nice as yours with the date/timestamps, but do have more detailed info on each transaction.)
From what I can see, both your logs and mine indicate registration is failing on authorization. It fails 3 times, then the long timer kicks in (default of 1200s). Question I have, why does authorization suddenly start failing?
Another thing I see (that my provider also pointed out) was a number of "489 Bad event" entries in the log, and they'd suggested that often indicates a NAT issue. But I've got the NAT Mapping & Keep Alive both set to "Yes". And also, I still lost dialtone when connecting directly to my cable modem (taking my router completely out of the loop).
Anyways, I was able to get through to Cisco support today and open a case, referring them to this thread. They suggested the issue had to do with the expiry time not matching my voip provider's registration expiry time. So I'm currently running some tests with it set to 100 to see. I am skeptical that is the real issue, given the clues in the log files indicating the bad events and authorization failures. But I'm willing to give it a shot.
Will keep you posted.
Well, I think I corrected my "489 Bad event" issue. I changed the "NAT Keep Alive Msg" from $NOTIFY to $REGISTER. My provider's server didn't like the $NOTIFY for the NAT keep-alive event.
So maybe that was the problem? If the keep-alive didn't get through, then perhaps that eventually results in an auth failure? Would be curious to know if you are also getting those 489 messages in your debug logs.
The other thing I've done is enable and set the "Use auth ID" in the Voice-Line 1-Subscriber Information section. Although I am betting that is redundant, so I will probably revert that change if this keep alive change works.
I'll let it go overnight, and see how things look in the morning.
My providers instruction suggests NAT mapping and keep-alive remain off, so that probably explains why I'm not seeing the 489 messages. The only responses I'm seeing from my provider are "100 Trying", "401 Unauthorized", and eventually "200 OK".
I get three or four 401's before the spa122 fails and goes into reg retry long invl.
I am also doubtful the issue has to do with the local reg expiry not matching the providers'. Following the reg retry long intvl (un-registered state, ie. no expiry to factor in), I'm still seeing a couple 401's occurring.
The logs aren't giving me any clue, I'll send them to Cisco tomorrow to take a look.
I got a voip.ms account to play around with and was getting the same loss of registration out-of-the-box. But since I changed $NOTIFY to $REGISTER in NAT Keep Alive Msg like you suggest, it's been re-registering more successfully than before. I had to edit this to add that I was a bit too optimistic and it's once again in reg retry long intvl.
Message was edited by: Robert Kralik
Changing the message from $NOTIFY to $REGISTER seems to have improved things, but not totally solved the issue. My wife picked up the phone yesterday (after I'd made the change), and again there was no dialtone so I am assuming it lost the registration again.
I'm going to try to collect some more logs today.
I'm getting to the point where I am thinking this may not be worth it in the end. In addition to this issue, my wife has said the calls she has been able to make successfully were of poor quality (although I'd specified the G711u codec, and the premium service), so she's basically hung up and redialed on our old line, which isn't saving us anything
I initialy set the spa122 up with my provider's general access proxy and calls had a bit of a bubbles under water quality plus slight delay despite using g7.11u, and there were no packet loss or errors noted. However, since my provider granted permission to use their exclusive local access proxy, the quality is brilliant with no perceptible delay, and I'd say it's equivalent to Shaw Digital phone.
The details I was able to gather is the general access proxy is on the east coast and serves lots of connections, while the local access proxy is in my city and I'm sure serves much fewer connections.
I switched to point to the chicago.voip.ms server (from the toronto2.voip.ms server) this morning, reverted back to $NOTIFY (as that's what voip.ms support told me it should be), and so far haven't lost the registration. According to voip.ms, the chicago server has the latest software on it. I'll give it some more time, though, before I am convinced...
Did the switch over work? When the keepalive is sent to the proxy, does it respond with 200 ok instead of the 489 bad event?
Normally the unit will send Register and the proxy will send 200 Ok. In this particular case, the proxy wants auth as well, so it sends 401 Unauth and so the unit replies with the Register with the Auth added.
The reg expires is set to 60 secs, so there should be a reg request from the unit every minute to the proxy. You can turn on the timestamp option with the slogsrv by adding the -t to it. For various options available, type in slogsrv -h for the help.
If the register is sent out but there's nothing coming back from the proxy, that's when the keepalive may be help (as the port may be closed on the router after some time, this keepalive keeps the port open).
Still getting 489 Bad Event messages. Tried enabling Auth ID, but it made no difference.
In spite of that, so far so good on my side, no registration failures since switching to Chicago. Will continue to monitor.
"In this particular case, the proxy wants auth as well, so it sends 401 Unauth and so the unit replies with the Register with the Auth added."
Is there a way to make the device send the correct event to the proxy (i.e. register with auth) the first time, to eliminate the 489's?
Quick update, have had a few auth failures this morning, but all recovered within 10 seconds and re-registered successfully.
AuthID, isn't the same as the proxy requesting auth during a registration.
If the proxy needed AuthID in the first place and you didn't have it, the unit would never have been able to register. You would have gotten something like a 403 registration error.
A 401 Unauth (proxy does this which is optional) is always sent in response to a regular registration.
Unit would send a Registration, proxy either sends 200 OK or 401 Unauth. If it sends back 401 Unauth, then the unit sends another Registration with the Auth in the Registration.
You might as well turn off the Notify keepalives if the proxy is only sending 489 in response to that.
toronto.voip.ms is running Asterisk 126.96.36.199
chicago.voip.ms is running Asterisk 1.4.32
The most interesting item that comes up in the changelog http://downloads.asterisk.org/pub/telephony/asterisk/old-releases/ChangeLog-1.4.32 is:
2009-07-10 16:23 +0000 [r205804] David Vossel
* channels/chan_sip.c: SIP registration auth loop caused by stale nonce If an endpoint sends two registration requests in a very short period of time with the same nonce, both receive 401 responses from Asterisk, each with a different nonce (the second 401 containing the current nonce and the first one being stale). If the endpoint responds to the first 401, it does not match the current nonce so Asterisk sends a third 401 with a newly generated nonce (which updates the current nonce)... Now if the endpoint responds to the second 401, it does not match the current nonce either and Asterisk sends a fourth 401 with a newly generated nonce... This loop goes on and on. There appears to be a simple fix for this. If the nonce from the request does not match our nonce, but is a good response to a previous nonce, instead of sending a 401 with a newly generated nonce, use the current one instead. This breaks the loop as the nonce is not updated until a response is received. Additional logic has been added to make sure no nonce can be responded to twice though. (closes issue #15102) Reported by: Jamuel Patches: patch-bug_0015102 uploaded by Jamuel (license 809) nonce_sip.diff uploaded by dvossel (license 671) Tested by: Jamuel Review: https://reviewboard.asterisk.org/r/289/
If this is the issue with the spa1xx devices, as it seems to be, then any provider running pre-1.4.26 version of Asterisk will be affected.
I started logging my connection to the chicago.voip.ms server that you mentioned at 1:44 this after noon. It's interesting that the only times that it failed and went into reg retry long intvl occurred at 2:43, 3:43, and 4:43. I'll keep the log running to see if this pattern continues.
New edit: Actually the chicago.voip.ms server has not failed. I just realized the instances of failure are ocurring on line 1, my les.net connection, which is set to expire every hour. Voip.ms chicago server on line 2 is holding strong.
Message was edited by: Robert Kralik
I set up a test to confirm that indeed the spa1xx devices are interacting with the pre-1.4.26 Asterisk registration deadlock condition I mentioned above [r205804]. I reproduced this failure on my LAN with the spa122 and Asterisk 188.8.131.52 running on Linux with netem delay introduced on the network interface. The failure starts to occur somewhere between 40-45ms of delay (40-45ms ping time as seen from the spa122) and greater. As expected, once I applied the 1.4.26 patch to Asterisk, the failure no longer occured.
As noted in https://supportforums.cisco.com/message/3616349
The T1 timer is 10x shorter than what's entered in the interface. This appears to be the reason why the spa1xx are sending the second registration request so quickly and thus interacting with the deadlock condition mentioned above.
If you can't upgrade your Asterisk server (e.g. hosted VoIP), Cisco now has a beta load of firmware available which solves this problem.