Re: STUN problem

csgnm · ‎06-16-2003

Hi,

We have implemented a FEP-to-FEP connection with basic STUN (TCP encapsulation), but the connection is not stable. The NetView log shows the FEP-FEP connection going down and recovering later. The interval between drops varies from 2 minutes to 40 minutes.

The router configuration is:

stun peer-name 1.1.1.1

stun protocol-group 1 basic

interface loopback 0

ip address 1.1.1.1 255.255.255.255

interface serial0/0

mtu 2104

encapsulation stun

nrzi-encoding

clockrate 2000000

stun group 1

stun route all tcp 2.2.2.2

Can anyone help me, please?

- Is it the MTU size? (The value 2104 is apparently the default; I haven't configured it.)

- Could it be the combination of clock rate and cable length? The X.21 cable from the FEP is 15 meters, plus a DCE cable of 1.5 meters.

regards,

Phi

dixho · ‎06-16-2003

There is a bug that changes the default MTU of a STUN interface from 1500 to 2104; I think the bug ID is CSCdz28817. However, changing the MTU size on a STUN interface should not cause the SDLC link to disconnect.

Based on the symptom, the most common cause is that the FEP is sending a frame larger than the STUN interface can handle, so the router drops the I-frame. Eventually, the FEP detects a gap in the SDLC sequence numbers and disconnects the SDLC link.
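To make that failure mode concrete, here is a minimal Python sketch of a receiver that sits behind a link which silently drops oversize frames. It is a simplified model, not how NCP or IOS actually implement SDLC: the function name, the frame tuples, and the sizes are all hypothetical, and real stations would go through REJ or timeout recovery before giving up.

```python
def first_sequence_gap(frames, mtu, modulus=8):
    """Return the N(S) at which the receiver first notices a gap, or None.

    frames  -- list of (ns, size_in_bytes) tuples in send order (hypothetical data)
    mtu     -- largest frame the link passes; larger frames are silently dropped
    modulus -- SDLC sequence modulus (8 or 128)
    """
    expected = 0
    for ns, size in frames:
        if size > mtu:
            continue  # dropped on the way, the receiver never sees it
        if ns != expected:
            return ns  # an I-frame is missing between the last one and this one
        expected = (expected + 1) % modulus
    return None


# Frame 1 is larger than the 2104-byte MTU, so the receiver sees 0 and then 2.
frames = [(0, 200), (1, 3000), (2, 200)]
print(first_sequence_gap(frames, mtu=2104))
print(first_sequence_gap(frames, mtu=5000))
```

With the smaller MTU the gap shows up at the first frame after the dropped one; with an MTU that fits every frame, no gap is reported.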

Do you know all the applications running on the FEP-FEP link? If so, do you know the modes they use? Please check the mode table and find the RU sizes in use, paying special attention to any APPC applications.

If you cannot find any mode entry using an RU size larger than 1500, please consider increasing the STUN interface MTU to 5000. As the MTU for 4M token ring is 4K, most implementations do not use an RU size larger than 4K.
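The mode-table check can be sketched in a few lines of Python. This is only an illustration: the mode names and RU sizes are made up, and the 29-byte overhead (a 26-byte FID4 transmission header plus a 3-byte request header) is an assumption about what gets added to the RU; whether the interface MTU also counts the SDLC address and control bytes is not covered here.

```python
# Assumed per-frame overhead on top of the RU: FID4 TH (26 bytes) + RH (3 bytes).
ASSUMED_OVERHEAD = 26 + 3


def oversize_modes(mode_ru_sizes, mtu, overhead=ASSUMED_OVERHEAD):
    """Return the mode names whose RU size plus overhead would not fit in mtu."""
    return sorted(
        name for name, ru in mode_ru_sizes.items() if ru + overhead > mtu
    )


# Hypothetical mode table entries: mode name -> maximum RU size in bytes.
modes = {"INTERACT": 1024, "BATCH": 4096, "APPCMODE": 2048}

print(oversize_modes(modes, mtu=2104))
print(oversize_modes(modes, mtu=5000))
```

Any mode listed for the current MTU is a candidate source of the dropped I-frames; an empty list for the proposed MTU suggests that value is large enough for the modes you know about.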

Please be aware that changing the MTU value on an RSP (i.e. 7000 and 7500 routers) will cause all interfaces to reset.

If changing the MTU does not fix the problem, you will need an NCP line trace or an SDLC sniffer trace.

csgnm · ‎06-17-2003

Thanks, Kasing! Your explanation and suggestions are very encouraging!

I have one question about the ADDRESS specification when using STUN SDLC. Maybe you or someone else can help me with this:

When using STUN SDLC for a FEP-to-FEP connection, you have to specify the ADDRESS, and the documentation mentions that this is the relative position of the line in the NCP configuration. Is this regardless of the PUTYPE? Or do you have to count from the line with PUTYPE4? Is this value the same as the first 2 bytes of SDI or NDI when turning "sdlc stun packet" on? Can you determine these values with debugging on the router?

Thanks,

Phi

dixho · ‎06-17-2003

Only count lines with PUTYPE4.

Please see the following URL for how to get the SDLC address from debug stun packet:

http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122sup/122debug/dbfser.htm#1018810

I always use debug sdlc packet. The first byte (i.e. the first 2 hex digits) is the SDLC address. Please be aware that the SDLC address is negotiated during the XID exchange. While the PU4s are exchanging XIDs, they use FF (the broadcast SDLC address), so please ignore any debug messages starting with "FFBF24". After you see "??93" or "??DF", the ?? in the debug message is the SDLC poll address. Also, PU4-PU4 connections use echo addressing. For example, the FEP on one side uses SDLC address 01 and the FEP on the other side uses SDLC address 81. Use SDLC address 01 on the router.
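The address rules above can be expressed as a small Python sketch. The helper names are hypothetical, the input is assumed to be just the hex bytes of the frame as copied out of a debug line (not the full debug line format), and the echo-address rule is inferred from the 01/81 example, i.e. the two addresses are assumed to differ only in the high-order bit.

```python
BROADCAST = 0xFF
# Control bytes named in the post; 0x93 and 0xDF are commonly SNRM and SNRME
# with the poll bit set.
MODE_SETTING = {0x93: "SNRM", 0xDF: "SNRME"}


def sdlc_address(frame_hex):
    """Return the SDLC address from a mode-setting frame, else None.

    frame_hex -- hex string of the frame bytes, e.g. "0193" (assumed input form)
    """
    data = bytes.fromhex(frame_hex)
    if len(data) < 2:
        return None
    address, control = data[0], data[1]
    if address == BROADCAST:
        return None  # XID exchange on the broadcast address, ignore it
    if control not in MODE_SETTING:
        return None  # not one of the "??93" / "??DF" frames
    return address


def router_address(address):
    """Assumed echo addressing: clear the high-order bit (0x81 -> 0x01)."""
    return address & 0x7F


for sample in ("FFBF24", "0193", "81DF"):
    addr = sdlc_address(sample)
    if addr is None:
        print(sample, "-> ignored")
    else:
        print(sample, "-> address %02X, router uses %02X" % (addr, router_address(addr)))
```

The "FFBF24" sample is skipped because it carries the broadcast address, while both 01 and 81 map to 01 for the router side.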

By the way, you do not need to define SDLC address because you use STUN Basic.

One final comment: please be ready to turn off "debug sdlc packet." I usually type "un all" without hitting enter. Once I see "??93", I hit enter right away. Leaving "debug sdlc packet" running will freeze the router.

csgnm · ‎06-17-2003

I think in our situation basic STUN is sufficient, because delay and bandwidth are not currently an issue in our network.

But when delay and bandwidth are an issue, we might need SDLC STUN, because it offers the advantage of local-ack (as I understand from the documentation).

Do I understand correctly that the SDLC addresses can change because they are negotiated? So when you do an IPL, the addresses might change and your SDLC STUN stops working until you change the SDLC addresses on the routers?

many thanks,

Phi

dixho · ‎06-17-2003

Correct. If you add or remove any PU4 links, the SDLC address may change after an IPL.

One more comment about local-ack: Cisco routers support only modulo 8 on SDLC links when local-ack is configured (i.e. when STUN SDLC local-ack, STUN SDLC-TG, or DLSw is used). If you want to configure local-ack on Cisco routers, please make sure that modulo 128 is not configured on the PU4 links.

csgnm · ‎06-22-2003

The MTU is set to 4400, and we have been in production with this implementation since yesterday. So far, no problems.

Thank you very much for your help !!!

Btw, when the MTU on the serial interface with STUN encapsulation was set to 2104, we sometimes got input errors, and the number of input errors was the same as the number of giants (indicating that this was an MTU size problem).

csgnm · ‎06-23-2003

I thought it was working, but it is not. It worked for 20 hours, but after that the FEP-FEP connection disconnected and recovered several times. The MTU size is now set to 5000.

This time we have not seen input errors, but we do see giant packets on the serial interface. Do you have an explanation for this?

What is the disadvantage of setting the MTU size very high?

thanks,

Phi

dixho · ‎06-23-2003

It looks like you are running into an MTU problem. Do you know which application was running on the FEP-FEP link when the link disconnected?

The disadvantage of a high MTU shows up when a large packet has to be retransmitted. If a frame is corrupted, the end station has to retransmit the whole packet. In the old days, end stations retransmitted packets because of a dirty line (i.e. noise).

In your configuration, the FEP is connected to a router through a back-to-back cable, so the error rate is close to zero. Retransmission only occurs when STUN (i.e. the TCP/IP connection) introduces too much delay.
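As a rough feel for what a retransmitted frame costs on the serial link, the arithmetic below computes the time needed to clock one full-size frame onto the wire. It is only a sketch: the function name is made up, the 2,000,000 bps figure is taken from the clockrate in the posted configuration, and TCP/IP transit time and SDLC timers are ignored.

```python
def serialization_ms(frame_bytes, clock_bps):
    """Milliseconds needed to clock frame_bytes onto a line running at clock_bps."""
    return frame_bytes * 8 * 1000 / clock_bps


CLOCK_BPS = 2_000_000  # from "clockrate 2000000" in the posted configuration

for mtu in (1500, 2104, 4400, 5000):
    print(f"{mtu:>5} bytes -> {serialization_ms(mtu, CLOCK_BPS):.3f} ms")
```

At this clock rate even a 5000-byte frame takes only 20 ms to send once, so on a clean back-to-back cable the retransmission penalty of a larger MTU is small compared with the delay the TCP/IP path may add.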

By the way, what is the IOS version of the router? There are a number of TCP bugs that may cause a STUN peer to drop:

http://www.cisco.com/warp/customer/770/fn20673.shtml

Even though the field notice in the above URL only mentions DLSw, the field notice should apply to STUN as well.

Also, can you attach a WAN sniffer and confirm that the problem is not caused by SDLC timeouts? What is the value for REPLYTO?

csgnm · ‎06-23-2003

Thanks for your prompt answer!

The application running at that moment is hard to determine. The only strange thing we saw this time on the serial interface is the number of giant frames, and I am pretty sure that the problem is caused by giants. Strangely enough, the number of input errors is zero (with MTU 4400). (I thought the input error count was the sum of giants, CRCs, and so on, so it should be at least equal to the number of giants.)

Cisco documentation recommends a maximum MTU size of 4400, but this is apparently not enough for our situation. (Delay in our IP network is minimal.) Does the value of 4400 have to do with retransmission?

We use IOS version 122-11.T5 (2651XM platform)

Do you know of any STUN situations where an MTU size bigger than 5000 is necessary? (I am looking for the biggest working MTU size possible. :-)

dixho · ‎06-23-2003

Without knowing the application, it is tough to say what is enough. You have to find the source of the packets. For 4M token ring, the MTU for SRB traffic is 4472 bytes. If you have 16M token ring all the way, it can go up to 16K!