Controller E1 and Serial Interfaces - Troubleshooting

Unanswered Question
Dec 4th, 2007

Is there any information or a block diagram showing how the signal is routed between the controller and the serial interface, and how error control works in timeslots mode versus unframed mode? For example, someone told me that in unframed mode the error control is done on the serial interface, but if I configure timeslots the error control is done at the controller.

I'm interested in this because I'm studying a case of a flapping interface and trying to understand where the problem could be. Because of the lack of technical information about the controller <-> serial relationship, it is sometimes difficult to find the problem. For example, I found an EIGRP flap on serial X (retry limit exceeded) on a link that I suspect has problems (I see some interface resets in the counters), but in the router log I see only the neighbor flap, not a flap of the serial or the controller. Is it possible that with a lower keepalive the serial would sense the problem and go down? And if there is a problem at the electrical level, is it possible that I would not see a flap of the controller?

What is the reaction time of the controller? Does the serial become more reactive if I lower the keepalive? Which counters does the controller take into account internally to go down?

The counters (sh controllers e1) are another story; for now I'm focused on understanding how data flows internally from the controller to the serial, and whether it's possible to have flaps that I will see only on the serial and not on the controller.

Sorry about so many questions; I will appreciate your help very much!

Best Regards

Paolo Bevilacqua Tue, 12/04/2007 - 07:37

Yes, you can have flaps only on the interface and not on the controller.

This can happen for multiple reasons, for example the circuit becomes interrupted somewhere in the middle, but the "tails" are OK, so you will see no errors at controller level.

Many SPs don't care about reflecting the true state of the circuit at the E1 level, due to the way their network is arranged.

In E1 each frame is normally covered by CRC, plus there are other error indications. E1 performance monitoring is done according to ITU standards, so you see the "severely errored" counts for the current 5 minutes, the last 24 hours, etc.

When there is this kind of problem you need to have the circuit fixed by the SP. Changing timers and keepalives only hides the problem and never gives you a usable circuit.

Hope this helps, please rate post if it does!

m.musacchio Tue, 12/04/2007 - 08:15

Thanks for your answer. Actually I am the SP, and I'm having problems with the SDH team: they are telling me that everything is OK, while I know that the problem is in the SDH backbone, and this has to be demonstrated. I would like to understand the logic between the controller and the serial, and under which conditions the controller takes itself down. There is no document available that explains this logic.

Paolo Bevilacqua Tue, 12/04/2007 - 09:15

Hi,

it's very simple: as soon as any alarm (RAI, LOS, etc.) is detected, the controller goes down.

When there are no alarms, the controller goes up. Even with a lot of slips, frame errors, etc., the controller will stay up anyway.

The performance counters, that you can also collect via SNMP (E1 MIB) keep track of errors (slips, CRC, LOS, etc).

When the controller is up, all interfaces under it will go up initially, but will go down if keepalives are missed. When the interface goes down, you will see "state is reset". This means the interface is resetting the hardware in an attempt to recover the circuit.
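The rules described in this reply can be summarized as a small state model. This is only an illustrative sketch, not how IOS is implemented: the class name `E1Port`, the alarm set beyond RAI and LOS, and the limit of three missed keepalives are all assumptions.

```python
from dataclasses import dataclass

# RAI and LOS come from the post above; AIS and LOF are added as
# illustrative examples of the "etc." (assumption).
ALARMS = frozenset({"RAI", "LOS", "AIS", "LOF"})


@dataclass
class E1Port:
    """Hypothetical model of one controller with one serial interface."""
    active_alarms: frozenset = frozenset()
    missed_keepalives: int = 0
    # Assumption: line protocol drops after 3 consecutive missed keepalives.
    keepalive_limit: int = 3

    def controller_up(self) -> bool:
        # Only alarms take the controller down; slips or CRC errors
        # are counted but do not change its state.
        return not (self.active_alarms & ALARMS)

    def serial_up(self) -> bool:
        # The serial needs the controller up AND keepalives arriving.
        return self.controller_up() and self.missed_keepalives < self.keepalive_limit


# A break in the middle of the circuit: no local alarm, keepalives lost.
port = E1Port(missed_keepalives=3)
print(port.controller_up(), port.serial_up())
```

In this model the last two lines reproduce the case from the thread: the controller stays up while the serial goes down.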

The router is very reliable at detecting errors. However, if your SDH team doesn't believe the router, have them put in place an E1 tester with loopback in one side and continuous bert testing on the other, for a minimum of 24H. If the circuit is defective, you should see the same errors just like with the router.

m.musacchio Wed, 12/05/2007 - 05:42

Thanks again for your time.

Do you know if it's true that when you configure the controller with timeslots, the error control is done at the controller level and the serial doesn't see anything, but if it's unframed the controller doesn't do error control and you can see the errors on the serial?

We have just run an error test with a tester and it showed no errors. The problem is that the error occurs so randomly that it could happen once in one month and a couple of times in another. I think it's a small clock slip that I can't see in the counters; the only sign is a couple of interface resets once or more in a month...

We have changed the routers and controllers and the problem is still there on the same SDH circuit, while the other circuits are OK. Again, I suspect at 95% that it is a problem in one of the SDH backbone switches, and I'm trying to demonstrate it... crazy network problem! :)

Paolo Bevilacqua Wed, 12/05/2007 - 10:47

Hi,

Yes, in theory with framed E1 and CRC you should see errors at the controller level and not on the interface.

And yes, with unframed there is no CRC check at the controller level.

Now, which router and interface do you have the problem with? Do you see slips under show controllers?

Also, are you using a DCE with the circuit? Is it unframed?

m.musacchio Thu, 12/06/2007 - 02:56

Thanks! This is a 7500 series router (IOS 12.1.15; it's old, but it's impossible to upgrade for the moment) with several PA-MC-8E1 adapters. The serial is directly connected to the MSH-11 (SDH) through a balun; the baluns have been tested and changed more than once, and so have the cables.

We found the same problem even with a PA-E3, with the same type of error: a couple of interface resets/carrier transitions at random.

The only error I see is a couple of interface resets; all the other counters (all intervals in sh controllers) are clean.

The controller is configured as channel-group timeslots 1-31 (with CRC).

It happens only when the interface is carrying traffic.

m.musacchio Thu, 12/06/2007 - 04:59

The traffic is more or less 900 Kbps max, so the E1 interface is at about 50%. Here is a sh interface of the PA-E3 that has the same problems as the E1:

Note: The drops you see are probably due to the delay between the carrier transitions and the "hole" created at that moment.

Serial5/1/0 is up, line protocol is up
  Hardware is cyBus E3 Serial
  Internet address is 172.28.27.50/30
  MTU 4470 bytes, BW 1000 Kbit, DLY 60000 usec,
  reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation HDLC, crc 16, loopback not set
  Keepalive set (2 sec)
  Restart-Delay is 0 secs
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters 03:02:14
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 235
  Queueing strategy: fifo
  Output queue :0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
  3408410 packets input, 752343201 bytes, 0 no buffer
  Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
  0 parity
  0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
  3579830 packets output, 884070939 bytes, 0 underruns
  0 output errors, 0 applique, 0 interface resets
  0 output buffer failures, 0 output buffers swapped out
  2 carrier transitions
  LC=up CA=down TM=down LB=down TA=down LA=down

Paolo Bevilacqua Thu, 12/06/2007 - 05:10

Hi,

first of all I would bring back the keepalive and bandwidth values to default.

Then observe, and upgrade IOS when possible. I would recommend latest 12.2 S for the 7500.

m.musacchio Thu, 12/06/2007 - 06:05

Thanks for your feedback and the IOS recommendation. If I put the keepalives back to default, I cannot sense the problem on the interface: with the default you see the problem only as an EIGRP flap because of the retry timeout, so for the analysis I prefer to leave it at 2s.

As for the IOS, I don't think this is an IOS problem, because I put a couple of 7206VXRs in the middle of one of the E1s to see what happened, and I had the problem again (once in a month). Here is the output (I'm still 90% convinced that the problem is in the SDH ring):
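The timing argument here can be sketched with simple arithmetic. All numbers below are assumptions rather than values taken from this router: a serial line protocol that drops after 3 missed keepalives, a default HDLC keepalive of 10 seconds, and an EIGRP hold time of 15 seconds. The sketch also compares only against the hold timer, not the retry limit mentioned in the logs.

```python
def serial_down_after(keepalive_s: float, missed_limit: int = 3) -> float:
    """Seconds until line protocol drops, assuming `missed_limit` lost keepalives."""
    return keepalive_s * missed_limit


def first_to_react(keepalive_s: float, eigrp_hold_s: float = 15.0,
                   missed_limit: int = 3) -> str:
    """Which mechanism notices a silent outage first (hypothetical model)."""
    serial = serial_down_after(keepalive_s, missed_limit)
    return "serial" if serial < eigrp_hold_s else "eigrp"


for keepalive in (2, 10):
    print(keepalive, serial_down_after(keepalive), first_to_react(keepalive))
```

Under these assumptions, a 2-second keepalive lets the serial react before EIGRP does, while with a 10-second keepalive the EIGRP neighbor drops first, which is consistent with what is described in this post.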

E1 1/0 is up.
  Applique type is Channelized E1 - balanced
  No alarms detected.
  alarm-trigger is not set
  Framing is CRC4, Line Code is HDB3, Clock Source is Line.
  International Bit: 1, National Bits: 11111
  Data in current interval (62 seconds elapsed):
  (All data intervals were without any error)
  Total Data (last 24 hours)
     0 Line Code Violations, 1319 Path Code Violations,
     0 Slip Secs, 0 Fr Loss Secs, 0 Line Err Secs, 0 Degraded Mins,
     2 Errored Secs, 1 Bursty Err Secs, 0 Severely Err Secs, 147 Unavail Secs

c7206#sh controllers e1
E1 1/0 is up.
  Applique type is Channelized E1 - balanced
  No alarms detected.
  alarm-trigger is not set
  Framing is CRC4, Line Code is HDB3, Clock Source is Line.
  International Bit: 1, National Bits: 11111
  Data in current interval (89 seconds elapsed):
     0 Line Code Violations, 0 Path Code Violations
     0 Slip Secs, 0 Fr Loss Secs, 0 Line Err Secs, 0 Degraded Mins
     0 Errored Secs, 0 Bursty Err Secs, 0 Severely Err Secs, 0 Unavail Secs
  Total Data (last 24 hours)
     0 Line Code Violations, 1319 Path Code Violations,
     0 Slip Secs, 0 Fr Loss Secs, 0 Line Err Secs, 0 Degraded Mins,
     2 Errored Secs, 1 Bursty Err Secs, 0 Severely Err Secs, 147 Unavail Secs
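To keep evidence for the SDH team, the counters in output like the above could be collected and compared over time. The following is a minimal parsing sketch, assuming the text format shown in this thread; `parse_counters` and `nonzero` are hypothetical helper names, and one counter block should be passed at a time because the current interval and the 24-hour totals use the same counter names.

```python
import re

# The 24-hour totals copied from the output above.
SAMPLE = """\
Total Data (last 24 hours)
0 Line Code Violations, 1319 Path Code Violations,
0 Slip Secs, 0 Fr Loss Secs, 0 Line Err Secs, 0 Degraded Mins,
2 Errored Secs, 1 Bursty Err Secs, 0 Severely Err Secs, 147 Unavail Secs
"""

# "<number> <counter name>" terminated by a comma or the end of the line.
COUNTER = re.compile(r"(\d+)\s+([A-Za-z][A-Za-z ]*?)\s*(?:,|$)", re.MULTILINE)


def parse_counters(text: str) -> dict:
    """Map each counter name to its integer value."""
    return {name.strip(): int(value) for value, name in COUNTER.findall(text)}


def nonzero(counters: dict) -> dict:
    """Keep only the counters that recorded something."""
    return {name: value for name, value in counters.items() if value}


print(nonzero(parse_counters(SAMPLE)))
```

For the sample, the non-zero counters are the Path Code Violations, Errored Secs, Bursty Err Secs and Unavail Secs, which are the values worth logging with a timestamp each time the problem appears.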
