Cisco Support Community

Troubleshooting A11 PDSN and Application Unreachable alarm on Alcatel-Lucent RNC PCF

Alarms such as Application Unreachable or PDSN Unreachable are often reported to Starent for Alcatel-Lucent PCFs. These alarms are triggered on the PCF to indicate EITHER that no A11 Registration Reply (RRP) was received for a set of repeated A11 Registration Requests (RRQ) for a session OR that the RRPs for a session repeatedly carried one of a number of specific Deny codes (not all codes, more on this below). The term “unreachable” can be misleading in the case of a Deny reply, because the PDSN was in fact reachable: it replied and the PCF received the reply, but the PCF did not accept the reply. In both cases, the behavior of the PCF is the same: it retries 3 times at 1-second intervals, for a total of 4 attempts.

Terms/Explanations:

Traffic Processor (TP) – each TP generally communicates with two PDSNs, typically handling 1000 sessions per PDSN. There are multiple types of TPs: 690, 750 and UTP.

Universal TP – higher processing power than a normal TP

Application Processor (AP) – each AP typically has 2 TPs, but it can have 1 or 2 UTPs.

RNC Frame – each RNC frame typically has 8 APs, a combination of APs, TPs, or UTPs. There are two types, R1SR and UNC.

Important note: 1X and EVDO RNCs are built on completely different hardware and software platforms, and the error logging is also very different.

Commands/Logistics/Troubleshooting on the PCF:

- show pdsn – a script command that can be run from the RNC frame. It shows the number of RP sessions on each TP and each PDSN, as well as the state of each PDSN for every TP, such as PDSN Application Reachable, PDSN Application Unreachable, Ping Unreachable, Initial, or Try_Out.

- Unlike on the PDSN, you cannot monitor a specific subscriber; A11 logging must be turned on for all A11 traffic on each TP. Under higher load, it is better to obtain a packet capture of the A11 messages from the network.

- All the logging is written to a single file that grows large fairly quickly, as it contains information from multiple RNC frames. Most of the logging is cell-side and not useful for troubleshooting PCF-PDSN related issues, so one needs to run grep/search commands to mine the information relevant to the issue being troubleshot. The timestamp/log # for each log entry shows up on a separate line; it is not relevant and can be ignored (see examples below).

In the following example, the alarm was triggered due to the repeated Deny of Unspecified Reason (0x80 = 128). Note the following:

TP #

PDSN IP address (the TP address is not displayed in logging to conserve space, as the TP’s IP address is obviously known by the TP)

Error code

Date/time

**03 REPT: EVDO: TP 326, ALARM
    PDSN 10.209.29.132: APPLICATION UNREACHABLE
    UNSPECIFIED REASON(80H)-RETRIES EXHAUSTED, Timer=1000,Retries=3
    PERCEIVED SEVERITY: MAJ, CAUSE: Loss Of Signal
    2010-01-04 09:03:42 REPORT #000001 FINAL

01/04/10 09:03:31 #803463

03 REPT: EVDO: TP 326, ALARM CLEARED
    PDSN 10.209.29.132: APPLICATION UNREACHABLE
    UNSPECIFIED REASON(80H)-RETRIES EXHAUSTED, Timer=1000,Retries=3
    2010-01-04 09:03:42 REPORT #000001 FINAL

01/04/10 09:03:31 #803468
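The four fields called out above (TP #, PDSN IP address, error code, date/time) can be pulled out of the log file with a few regular expressions. The following is a minimal sketch only: the patterns are inferred from the sample entries in this article, the function name `parse_alarms` is hypothetical, and other RNC releases may format these entries differently.

```python
import re

# Patterns inferred from the sample log entries; treat them as assumptions.
HEADER = re.compile(r"REPT: EVDO: TP (\d+), ALARM( CLEARED)?")
PDSN = re.compile(r"PDSN (\d+\.\d+\.\d+\.\d+): (.+)")
CODE = re.compile(r"\(([0-9A-F]{2})H\)")
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) REPORT")

def parse_alarms(text):
    """Return one dict per alarm entry found in the log text."""
    alarms, current = [], None
    for line in text.splitlines():
        line = line.strip()
        m = HEADER.search(line)
        if m:
            # A new entry starts at the "REPT: EVDO: TP ..." line.
            current = {"tp": int(m.group(1)), "cleared": m.group(2) is not None,
                       "pdsn": None, "state": None, "code": None, "time": None}
            continue
        if current is None:
            continue  # separate timestamp/log # lines and unrelated logging
        m = PDSN.search(line)
        if m:
            current["pdsn"], current["state"] = m.group(1), m.group(2).strip()
            continue
        m = CODE.search(line)
        if m:
            current["code"] = int(m.group(1), 16)  # e.g. 80H -> 128
            continue
        m = STAMP.search(line)
        if m:
            # The "REPORT #... FINAL" line closes the entry.
            current["time"] = m.group(1)
            alarms.append(current)
            current = None
    return alarms

SAMPLE = """**03 REPT: EVDO: TP 326, ALARM
    PDSN 10.209.29.132: APPLICATION UNREACHABLE
    UNSPECIFIED REASON(80H)-RETRIES EXHAUSTED, Timer=1000,Retries=3
    PERCEIVED SEVERITY: MAJ, CAUSE: Loss Of Signal
    2010-01-04 09:03:42 REPORT #000001 FINAL

01/04/10 09:03:31 #803463

03 REPT: EVDO: TP 326, ALARM CLEARED
    PDSN 10.209.29.132: APPLICATION UNREACHABLE
    UNSPECIFIED REASON(80H)-RETRIES EXHAUSTED, Timer=1000,Retries=3
    2010-01-04 09:03:42 REPORT #000001 FINAL
"""

for alarm in parse_alarms(SAMPLE):
    print(alarm)
```

Entries without a reason code (for example the no-reply case) simply leave `code` as `None`.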

Each TP maintains a state for each PDSN. If an alarm triggers, the TP marks the PDSN out of service. Although it will not assign NEW calls to that PDSN, the protocol/message exchange for existing calls continues, and if a single transaction is successful, the alarm is cleared. Otherwise, the TP waits 15 minutes and then tries sending new calls again; if successful, the alarm is cleared, otherwise the timer is reset. Normally, because of the call volume, alarms tend to clear quickly unless there are no existing sessions OR there is a true reachability issue.
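The per-PDSN state handling described above can be modelled roughly as follows. This is an illustrative sketch, not the RNC implementation: the class and method names are hypothetical, and time is passed in as plain seconds to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_WAIT_S = 15 * 60  # the 15-minute wait described in the text

@dataclass
class PdsnState:
    """Hypothetical model of the state one TP keeps for one PDSN."""
    in_service: bool = True
    retry_at: Optional[float] = None  # when new calls may be tried again

    def on_alarm(self, now: float) -> None:
        # Alarm raised: stop assigning new calls and start the wait timer.
        self.in_service = False
        self.retry_at = now + RETRY_WAIT_S

    def on_success(self) -> None:
        # Any successful transaction (existing or new call) clears the alarm.
        self.in_service = True
        self.retry_at = None

    def on_new_call_failure(self, now: float) -> None:
        # A trial new call failed after the wait: reset the timer.
        self.retry_at = now + RETRY_WAIT_S

    def may_assign_new_call(self, now: float) -> bool:
        if self.in_service:
            return True
        return self.retry_at is not None and now >= self.retry_at

state = PdsnState()
state.on_alarm(now=0)
print(state.may_assign_new_call(now=60))   # still inside the 15-minute wait
print(state.may_assign_new_call(now=900))  # wait elapsed, new calls are tried
```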

There are various deny reason codes associated with Deny RRPs. In the case where the PDSN discards RRQs, there is NO reply, and so there are no standard codes like there are for Denies, but the PDSN still counts the discards overall and stores its own reason codes for them as well. The PDSN command “show rp statistics” displays counters for all the reason codes by category. Below is a snippet from the output of that command. For RRQ Deny, the actual error codes can be found in the PDSN Administration Guide, PDSN Service Reply Codes, and have been included here in parentheses for your convenience.

Note that there are sub-error codes listed underneath some of the Deny reason codes AND for Discards. These codes are NOT part of the 3GPP2 specifications, but rather are Starent/Cisco specific codes that are NOT included in the RRP (for Denies) but are counted in the RP interface counters. There are respective bulkstats variables for these (also listed below in the respective sections), and these may need to be retrieved during troubleshooting to determine more specifically when and why the Denies/Discards occurred. Even for a number of these sub codes, further investigation may be necessary: collecting monitor subscriber or logging monitor output (if the subscriber experiencing the issue can be anticipated), or debug logging for specific facilities (if the issue is random) as directed by engineering.

Note: some of the sub-error code counters may not add up properly, and this is being investigated by Starent/Cisco.

Registration Request Denied:

  (80H/128) Unspecified Reason:       10674         (81H/129) Admin Prohibited:        4           

  (82H/130) Insufficient Resources:   114           (83H/131) PCF Failed Auth:         0           

  (85H/133) Identification Mismatch:  0           

  (86H/134) Poorly Formed Request:    0           

  (88H/136) Unknown PDSN Address:     340         

  (89H/137) Reverse Tunnel Unavail:   0           

  (8AH/138) Reverse Tunnel Required:  0           

  (8DH/141) Unrecognized Vendor Id:   0           

  Session Already Closed:   0           

deny-unspec, deny-adminprohib, deny-noresource, deny-auth, deny-idmismatch, deny-badrequest, deny-unknownpdsn,

deny-revtununavail, deny-revtunreq, deny-unrecogvend

RRQ Denied - Insufficient Resource Reasons:

  No Session Manager:       0             No Memory:               0           

  Session Managers Retried: 0             Input-Q Exceeded:        0           

  Policy Rejected:          0             Session Manager Rejected:71          

  A11 Manager Rejected:     0           

deny-noresource-a11mgrrej, deny-noresource-inputq, deny-noresource-nomem, deny-noresource-nosessmgr,

deny-noresource-policy, deny-noresource-sessmgrrej, deny-noresource-sessmgrretried

RRQ Denied - Poorly Formed Request Reasons:

  Session Already Dormant:  0             Already Active:          0           

  Airlink Setup Absent:     0             Mismatched CoA/Src addr: 0           

  Packet Too Short:         0             Packet Too Long:         0           

  Invalid Field Length:     0             Invalid Flags:           0           

  HOA Non Zero:             0             Invalid SSE:             0           

  Invalid VSE:              0             Invalid Auth Extn:       0           

  Invalid Unknown Extn:     0           

  Other Reason:             0           

deny-badrequest-alractive, deny-badrequest-alrdorm, deny-badrequest-authextn, deny-badrequest-fieldlen,

deny-badrequest-flags, deny-badrequest-hoanonzero, deny-badrequest-miscoaaddr, deny-badrequest-other,

deny-badrequest-pkttoolong, deny-badrequest-pkttooshort, deny-badrequest-setupabsent, deny-badrequest-sse,

deny-badrequest-unkextn, deny-badrequest-vse

RRQ Denied - Unspecified Reasons:

  Null Packet Received:     0             LifeTime Zero In Initial RRQ: 0           

  Session Manager NotReady: 0             Closed RP Handoff In Progress:0           

  No Airlink Setup:         0             Intra PDSN Handoff Triggered:  2731  

deny-unspec-crphandoff, deny-unspec-intrahandoff, deny-unspec-lifezero, deny-unspec-noairlink,

deny-unspec-notready, deny-unspec-nullpkt

     

RRQ Denied - Overload/Congestion Control:

  Admin Prohibited(reject): 0             Unknown PDSN (redirect): 0           

Registration Request Discard Reasons:

  No Session Manager:       0             No Memory:               0           

  Malformed:                0             Auth Failure:            0           

  Session Manager Dead:     0             Admin Prohibited:        0           

  Session Manager NotReady: 0             Unknown PDSN:            0           

  Internal Bounce Error:    0             Input-Q Exceeded:        0           

  Max Sessions Reached:     0             Invalid Packet Length:   0           

  GRE Key Changed:          0             Overload/Congestion:     0           

  Session Not Created:      9303          Unknown Extn:            0           

  Unhandled RRQ message:    0             HO without Airlink setup:0           

  Reply send failed for code:

    Accept:                 0             Unspecified Error:       0           

    Poorly Formed Request:  0             No AirLink Setup:        0           

    Session Already Closed: 0           

  Dropped During Handoff:   0             Misc Reasons:            0      

rrqdiscard-adminprohib, rrqdiscard-authfail, rrqdiscard-bounce, rrqdiscard-grekey, rrqdiscard-inputq,

rrqdiscard-invlen, rrqdiscard-maxsess, rrqdiscard-misc, rrqdiscard-nomem, rrqdiscard-nosessmgr,

rrqdiscard-overload, rrqdiscard-smgrdead, rrqdiscard-smgrnotready, rrqdiscard-unkpdsn
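When this output is long, it can help to list only the counters that are non-zero, grouped by section. The sketch below assumes the layout shown in the snippet above (unindented section headers ending in a colon, indented "Name: value" pairs, possibly two per line); the function name `nonzero_counters` is hypothetical, and real output may differ between software releases.

```python
import re

# One "Name: value" pair; several may appear on the same line.
COUNTER = re.compile(r"([^:\s][^:]*?):\s*(\d+)")

def nonzero_counters(output):
    """Return (section, counter name, value) for every counter above zero."""
    section, hits = None, []
    for raw in output.splitlines():
        pairs = COUNTER.findall(raw)
        if not pairs:
            # An unindented line ending in ":" is treated as a section header.
            if raw and not raw[0].isspace() and raw.rstrip().endswith(":"):
                section = raw.rstrip().rstrip(":")
            continue
        for name, value in pairs:
            if int(value) > 0:
                hits.append((section, name.strip(), int(value)))
    return hits

SAMPLE = """Registration Request Denied:
  Unspecified Reason:       10674         Admin Prohibited:        4
  Insufficient Resources:   114           PCF Failed Auth:         0
RRQ Denied - Insufficient Resource Reasons:
  Policy Rejected:          0             Session Manager Rejected:71
"""

for section, name, value in nonzero_counters(SAMPLE):
    print(f"{section} / {name}: {value}")
```

Running the command twice, before and after an incident window, and comparing the two results shows which counters actually moved during the incident.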

For reference, here is an example of monitor subscriber output for an RRP Deny “Mobile Node Failed Authentication” (“PCF Failed Auth” in counters) 83H. In this test case the Deny was caused by the SPI not matching. It is shown just to give an idea of what an RRP Deny looks like:

<<<<OUTBOUND  21:24:20:030 Eventid:29001(3)

A11 Tx PDU, from 192.168.50.150:699 to 192.168.50.200:699 (41)

        Message Type: 0x03 (Registration Reply)

                Code: 0x83 (Mobile Node Failed Authentication)

            Lifetime: 0x0078

        Home Address: 0.0.0.0

  Home Agent Address: 192.168.50.150

      Identification: 0xCF3AEA0E

      Identification: 0x000D62E2

Session Specific Extension Follows:

      Extension Type: 0x27

              Length: 0x13

       Protocol Type: 0x8881 (Unstructured Byte Stream)

                 Key: 0x00000000

            Reserved: 0x0000

   MN Session Ref Id: 0x0001

          MN Id Type: 0x0006 (IMSI)

        MN Id Length: 0x06

  Odd Even Indicator: 0x00

     Identity Digits: Hex: <00 00 00 00 00 01 02 03 04 05 >

Not all RRP Deny codes result in Retrying with RRQ as described earlier:

Deny code “Insufficient Resources” 82H results in an immediate unreachable alarm in order to avoid overloading the PDSN with retries, which is why you see no retries in this case:

**31 REPT: EVDO: TP 325, ALARM

    PDSN 10.209.29.132: APPLICATION UNREACHABLE

    INSUFFICIENT RESOURCES(82H)

    PERCEIVED SEVERITY: MAJ, CAUSE: Loss Of Signal

    2010-01-04 13:31:27 REPORT #000001 FINAL

01/04/10 13:31:16 #877086

31 REPT: EVDO: TP 325, ALARM CLEARED

    PDSN 10.209.29.132: APPLICATION UNREACHABLE

    INSUFFICIENT RESOURCES(82H)

    2010-01-04 13:31:27 REPORT #000001 FINAL

Deny code “Registration Identification Mismatch”  85H, for example:

<<<<OUTBOUND  21:42:30:821 Eventid:29001(3)

A11 Tx PDU, from 192.168.50.150:699 to 192.168.50.200:699 (63)

        Message Type: 0x03 (Registration Reply)

                Code: 0x85 (Registration Identification Mismatch)

            Lifetime: 0x0078

        Home Address: 0.0.0.0

  Home Agent Address: 192.168.50.150

      Identification: 0xCF3AEE16

      Identification: 0x000A9B17

This results in the PCF resending an RRQ with a matching timestamp, which in turn results in an RRP Accept. (This could be addressed on the PDSN with the “timestamp-tolerance” keyword of the spi config line.)

Deny code “Administratively Prohibited”  81H, for example:

<<<<OUTBOUND  21:54:14:335 Eventid:29001(3)

A11 Tx PDU, from 192.168.50.150:699 to 192.168.50.200:699 (63)

        Message Type: 0x03 (Registration Reply)

                Code: 0x81 (Administratively Prohibited)

This results in the PCF simply giving up because it is being told to do so. (This could be implemented on the PDSN with the newcall policy reject feature.)

Deny code “Unknown PDSN Address”  88H, for example:

<<<<OUTBOUND  21:46:57:390 Eventid:29001(3)

A11 Tx PDU, from 192.168.50.150:699 to 192.168.50.200:699 (63)

        Message Type: 0x03 (Registration Reply)

                Code: 0x88 (Unknown PDSN Address)

            Lifetime: 0x0078

        Home Address: 0.0.0.0

  Home Agent Address: 1.1.1.1

      Identification: 0xCF3AEF5C

      Identification: 0x000432B0

This results in the PCF directing a new RRQ to the PDSN address specified in the Home Agent Address field (1.1.1.1 in this example). (This could be implemented on the PDSN with the newcall policy redirect feature.) This could also happen if the Home Agent Address in the RRQ doesn’t match the destination IP address of the RRQ packet itself.
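The PCF reactions described in this article can be summarized as a small lookup table. This is only a summary sketch of the behavior stated here for the Alcatel-Lucent PCF: the enum and function names are hypothetical, and codes whose handling is not described in this article are deliberately left out rather than guessed.

```python
from enum import Enum

class PcfAction(Enum):
    RETRY = "retry the RRQ 3 times at 1 s intervals, then raise the alarm"
    ALARM_NOW = "raise the unreachable alarm immediately, no retries"
    RESEND_TIMESTAMP = "resend the RRQ with a matching timestamp"
    GIVE_UP = "give up, no further RRQ"
    REDIRECT = "send a new RRQ to the address in Home Agent Address"

# Only the codes whose PCF reaction is described in this article.
ACTIONS = {
    0x80: PcfAction.RETRY,             # Unspecified Reason
    0x81: PcfAction.GIVE_UP,           # Administratively Prohibited
    0x82: PcfAction.ALARM_NOW,         # Insufficient Resources
    0x85: PcfAction.RESEND_TIMESTAMP,  # Registration Identification Mismatch
    0x88: PcfAction.REDIRECT,          # Unknown PDSN Address
}

def pcf_action(code):
    """Return the described PCF reaction, or None if the article is silent."""
    return ACTIONS.get(code)

for code in (0x80, 0x82, 0x83):
    action = pcf_action(code)
    print(f"{code:02X}H ({code}): {action.value if action else 'not described here'}")
```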

In the following example, the alarm was triggered because no RRP was received after the RRQ had been sent 4 times. Unlike in the previous examples, no reason code is displayed because nothing was received, but the behavior of the PCF is the same: it retries 3 times.

01/28/10 20:36:27 #660163
**36 REPT: EVDO: TP 331, ALARM
    PDSN 10.211.28.132: APPLICATION UNREACHABLE
    RRQ RETRIES EXHAUSTED-Timer=1000,Retries=3
    PERCEIVED SEVERITY: MAJ, CAUSE: Loss Of Signal
    2010-01-28 20:36:44 REPORT #000001 FINAL

01/28/10 20:36:30 #660189
36 REPT: EVDO: TP 331, ALARM CLEARED
    PDSN 10.211.28.132: APPLICATION UNREACHABLE
    RRQ RETRIES EXHAUSTED-Timer=1000,Retries=3
    2010-01-28 20:36:50 REPORT #000001 FINAL
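The Timer=1000, Retries=3 values in these log entries correspond to one initial RRQ plus three resends. A minimal sketch of that arithmetic follows; the function name and the `reply_received` callback are hypothetical, and the assumption that the alarm is raised once the last timer expires is not confirmed by the logs.

```python
def run_rrq_attempts(reply_received, timer_ms=1000, retries=3):
    """Simulate RRQ sends until an RRP arrives or retries are exhausted.

    reply_received(attempt) should return True if an RRP arrived for the
    given attempt number (1-based) before the timer expired.
    """
    waited_ms = 0
    for attempt in range(1, retries + 2):  # 1 initial send + `retries` resends
        if reply_received(attempt):
            return {"attempts": attempt, "alarm": False, "waited_ms": waited_ms}
        waited_ms += timer_ms  # timer expired without a reply
    return {"attempts": retries + 1, "alarm": True, "waited_ms": waited_ms}

# No reply at all: 4 attempts, then the alarm (assumed after the last timer).
print(run_rrq_attempts(lambda attempt: False))

# Reply arrives on the third attempt: no alarm.
print(run_rrq_attempts(lambda attempt: attempt == 3))
```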

To troubleshoot, you would want to look at the RP counters for any RRQ Discards and sub reason codes during the incident period. Here are some example PDSN logs for facility a11mgr showing discards:

[a11mgr 9854 unusual] [4/0/512 <a11mgr:1> a11mgr_rp.c:1881] [context: source, contextID: 3]  [software external user] A11Mgr-1 Rcvd RRQ from PCF <10.80.90.13>, PDSN service addr <10.80.90.15>, MSID <111115054016914>, Lifetime <1800>, rp_status <29>; RRQ discarded for reason <Discarded old queued RRQ waiting for release from sessmgr>

[a11 29812 unusual] [3/0/819 <sessmgr:11> rpservfsm.c:2375] [software internal user] RP driver silently discarding RRQ from PCF <10.246.163.45> to PDSN service addr <10.64.126.20>, MSID <311274056451750>; call dropped while about to handoff.

If there are no discards matching the period in question, then the packets were dropped in the network: either the PDSN never received the RRQ, or the RRP never made it back to the PCF. Troubleshoot dropped packets, firewalls, Access Control Lists, and the like in the network, with the help of equipment logging and packet captures as necessary. Note that A11 RRPs are sent with DSCP = 0x00 (best effort) rather than, for example, Expedited Forwarding (DSCP 46, which appears as ToS byte 0xB8), and such packets have been seen to be dropped during traffic spikes in some networks.
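When checking packet captures for the marking mentioned above, note that 0xB8 is the value of the whole ToS/DS byte, while the DSCP itself is the upper six bits of that byte. A short sketch of the conversion (the function names are illustrative):

```python
def tos_to_dscp(tos_byte):
    """DSCP is the upper 6 bits of the 8-bit ToS/DS field."""
    return (tos_byte & 0xFF) >> 2

def dscp_to_tos(dscp):
    """Place a 6-bit DSCP into the ToS/DS byte, with the low 2 bits zero."""
    return (dscp & 0x3F) << 2

print(tos_to_dscp(0xB8))       # Expedited Forwarding
print(hex(dscp_to_tos(46)))
print(tos_to_dscp(0x00))       # best effort
```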

Finally, there are also entries labeled PING UNREACHABLE:

**12 REPT: EVDO: TP 307, ALARM

     PDSN 10.211.28.132: PING UNREACHABLE

     PERCEIVED SEVERITY: MAJ, CAUSE: Loss Of Signal

     2010-01-27 22:12:49 REPORT #000001 FINAL

  18 REPT: EVDO: TP 307, ALARM CLEARED

     PDSN 10.211.28.132: PING UNREACHABLE

     2010-01-27 22:19:09 REPORT #000001 FINAL

PDSN Ping Unreachable is generated when ping monitoring of the PDSN has been provisioned. The PCF sends ICMP ping packets to the PDSN every second, and if 3 pings are missed it generates this alarm. While in this state, no new RP sessions will be set up on the PDSN, since this indicates a connectivity issue. Pings continue to be sent, and a successful reply clears the alarm.
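The ping monitoring logic can be sketched as below. The text does not say whether the 3 missed pings must be consecutive; this sketch assumes they are, and the function name is hypothetical.

```python
def ping_alarm_states(replies, miss_threshold=3):
    """Return the alarm state after each 1-second ping.

    replies: iterable of booleans, True when the PDSN answered the ping.
    Assumes the alarm needs `miss_threshold` consecutive misses.
    """
    missed, alarmed, states = 0, False, []
    for answered in replies:
        if answered:
            # A successful reply resets the count and clears the alarm.
            missed, alarmed = 0, False
        else:
            missed += 1
            if missed >= miss_threshold:
                alarmed = True  # no new RP sessions while in this state
        states.append(alarmed)
    return states

print(ping_alarm_states([True, False, False, False, False, True]))
```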

Imported from Starent Networks Knowledgebase Article # 11060

Version history
Revision #:
1 of 1
Last update:
‎01-24-2012 09:57 AM