Intermittent Loss of Internet - How to Troubleshoot?

mnestander
Level 1

Our school has a fairly simple setup. Most department switches (all Cisco) are connected via fiber to our core switches. The core is connected to our router, which is connected to our ISP's router.

Every morning our Internet connection comes to a crawl for a varying amount of time. Internal services are not affected (shares, email etc).

When the bell rings for next period, everything returns to normal. This leads me to believe that perhaps a student is plugged in somewhere causing problems (intentionally or not). We don't have any set policies that disallow students from plugging in laptops, etc. (which we will soon be addressing).

Anyway, my problem is how do I go about troubleshooting this problem? I have a machine running Wireshark in promiscuous mode, but I'm unsure how to analyze the results. Is this a job for something like Snort? Are there some commands on the router or core switches that I could be running to give me some clues?

I realize locking down Ethernet ports, tighter network policies, etc. would help, but I'm trying to learn something about network troubleshooting.

Some other info:

My ISP tells me they see nothing suspicious and that my bandwidth utilization across their line is normally below average

My CPU and bandwidth utilization on all my switches and my router are LOW

I can PING any of the websites I'm trying to reach just fine (at least those that allow ICMP)

None of my interfaces are registering a large number of errors (CRC, collisions, etc)

There are no outgoing ACLs

None of our DNS servers (responsible for outside and inside lookups) seem to be having any issues

Outgoing connections are NAT'ed, we have a few internal web servers and an Exchange server

14 Replies

vijayasankar
Level 4

Hi Mark,

Would it be possible for you to provide a simple network/topology diagram showing the links and IP addresses (including the details of your internet access setup, where NAT is configured, etc.)?

Do you have PAT (overload NAT) configured for general internet browsing?

If so, you can check the NAT translation statistics during the problem period (show ip nat translations).

What is the capacity of your internet link?

What is the min/average/max utilisation of both incoming and outgoing traffic on that link during the problem period?

Remember that for HTTP traffic, the outgoing traffic from your network will be small compared to the return traffic received from the website in response to each request.

So you should watch both the outbound and inbound utilisation of that internet link.
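As a rough illustration of the inbound/outbound check, the sketch below turns a measured rate in bits/sec and a link capacity in Kbit into a utilisation percentage. The function name and the sample figures are hypothetical and are not taken from this network. Note that the "BW" value a router reports is the configured interface bandwidth, which may be higher than what the ISP actually provisions, and that 5-minute averages can hide short bursts.

```python
def utilisation_percent(rate_bps: float, capacity_kbit: float) -> float:
    """Return link utilisation as a percentage of the given capacity."""
    if capacity_kbit <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * rate_bps / (capacity_kbit * 1000.0)


# HYPOTHETICAL figures: a 10 Mbit/s link carrying mostly download traffic.
inbound = utilisation_percent(2_000_000, 10_000)   # return traffic from websites
outbound = utilisation_percent(500_000, 10_000)    # requests leaving the network
print(f"inbound {inbound:.1f}%  outbound {outbound:.1f}%")
```

Checking each direction separately matters because a link can be saturated inbound while the outbound figure still looks idle.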

How is browsing access provided to the users? Is there any kind of proxy server setup?

Please get back to us with as much detail as possible so we can diagnose the issue further.

-VJ

Hi,

one more remark: check for drops on your router interfaces with "show interface".

It will show you input and output drops.

What also might help is nbar protocol discovery, which can be configured through "ip nbar protocol-discovery" on an interface. Reduce the load interval to 30 seconds and check the values during your "crawl time".

Regards, Martin

Thanks for your replies...

All student workstations connect to the Internet via a sideways proxy server (Squid, DansGuardian). However, the problem occurs on all workstations.

I ran "sho ip nat trans" during the problem, but am unsure how to use the info. I can tell you that it was ~200 lines long.
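One way to make a long translation table readable is to count entries per inside local address, so a single host holding an unusual share of the table stands out. The sketch below assumes the usual five-column layout of "show ip nat translations" (Pro, Inside global, Inside local, Outside local, Outside global). The function name and the sample addresses are hypothetical, since the actual table was not posted.

```python
from collections import Counter


def translations_per_host(output: str) -> Counter:
    """Count dynamic NAT entries per inside local IP address."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly five columns and start with the protocol.
        # The header row and static "---" entries are skipped.
        if len(fields) != 5 or fields[0] not in ("tcp", "udp", "icmp"):
            continue
        inside_local = fields[2].rsplit(":", 1)[0]  # strip the port
        counts[inside_local] += 1
    return counts


# HYPOTHETICAL sample using documentation address ranges.
SAMPLE = """\
Pro Inside global      Inside local       Outside local      Outside global
tcp 203.0.113.46:1025  192.168.10.21:1025 198.51.100.7:80    198.51.100.7:80
tcp 203.0.113.46:1026  192.168.10.21:1026 198.51.100.7:80    198.51.100.7:80
udp 203.0.113.46:1027  192.168.20.5:1027  198.51.100.53:53   198.51.100.53:53
"""

for host, entries in translations_per_host(SAMPLE).most_common():
    print(host, entries)
```

Comparing the per-host counts captured during the slowdown with a capture from a normal period is likely to be more telling than either list on its own.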

** sho int on my router **

GigabitEthernet0/0

Internet address is 192.168.254.254/16

MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,

reliability 255/255, txload 1/255, rxload 1/255

Encapsulation ARPA, loopback not set

Keepalive set (10 sec)

Full-duplex, 1000Mb/s, media type is T

output flow-control is XON, input flow-control is XON

ARP type: ARPA, ARP Timeout 04:00:00

Last input 00:00:00, output 00:00:00, output hang never

Last clearing of "show interface" counters 22:23:54

Input queue: 0/75/3/0 (size/max/drops/flushes); Total output drops: 0

Queueing strategy: fifo

Output queue: 0/40 (size/max)

5 minute input rate 142000 bits/sec, 67 packets/sec

5 minute output rate 472000 bits/sec, 67 packets/sec

4180958 packets input, 2204810418 bytes, 0 no buffer

Received 486642 broadcasts, 0 runts, 0 giants, 3 throttles

0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

0 watchdog, 0 multicast, 0 pause input

0 input packets with dribble condition detected

4121684 packets output, 3616949559 bytes, 0 underruns

0 output errors, 0 collisions, 0 interface resets

0 babbles, 0 late collision, 0 deferred

0 lost carrier, 0 no carrier, 0 pause output

0 output buffer failures, 0 output buffers swapped out

GigabitEthernet0/1

Internet address is *clipped*.164.46/28

MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,

reliability 255/255, txload 1/255, rxload 1/255

Encapsulation ARPA, loopback not set

Keepalive set (10 sec)

Full-duplex, 100Mb/s, media type is T

output flow-control is XON, input flow-control is XON

ARP type: ARPA, ARP Timeout 04:00:00

Last input 00:00:00, output 00:00:00, output hang never

Last clearing of "show interface" counters 22:23:54

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

Queueing strategy: Class-based queueing

Output queue: 0/1000/64/0 (size/max total/threshold/drops)

Conversations 0/2/256 (active/max active/max total)

Reserved Conversations 4/4 (allocated/max allocated)

Available Bandwidth 5000 kilobits/sec

5 minute input rate 474000 bits/sec, 68 packets/sec

5 minute output rate 112000 bits/sec, 62 packets/sec

4209955 packets input, 3640882477 bytes, 0 no buffer

Received 630 broadcasts, 0 runts, 0 giants, 0 throttles

0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

0 watchdog, 0 multicast, 0 pause input

0 input packets with dribble condition detected

3735834 packets output, 1913232861 bytes, 0 underruns

0 output errors, 0 collisions, 0 interface resets

0 babbles, 0 late collision, 0 deferred

0 lost carrier, 0 no carrier, 0 pause output

0 output buffer failures, 0 output buffers swapped out
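Because the counters above were last cleared almost a day earlier, the totals alone do not show whether drops happen during the slow period. A minimal sketch of comparing two snapshots follows; it only parses the "Input queue" line in the format shown above. The function names are made up for illustration, and the second snapshot is invented purely to demonstrate the comparison.

```python
import re

QUEUE_RE = re.compile(
    r"Input queue: (\d+)/(\d+)/(\d+)/(\d+) \(size/max/drops/flushes\); "
    r"Total output drops: (\d+)"
)


def queue_counters(show_interface_text: str):
    """Extract the cumulative drop counters from 'show interface' output."""
    match = QUEUE_RE.search(show_interface_text)
    if match is None:
        return None
    _size, _max, drops, flushes, out_drops = map(int, match.groups())
    return {"input_drops": drops, "input_flushes": flushes, "output_drops": out_drops}


def counter_delta(before: dict, after: dict) -> dict:
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}


# The first line is copied from the Gi0/0 output above. The second is a
# HYPOTHETICAL later snapshot, not real data from this router.
baseline = queue_counters(
    "Input queue: 0/75/3/0 (size/max/drops/flushes); Total output drops: 0"
)
during = queue_counters(
    "Input queue: 12/75/48/0 (size/max/drops/flushes); Total output drops: 0"
)
print(counter_delta(baseline, during))
```

Taking one snapshot just before the slowdown and one during it isolates the drops that belong to the problem window.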

----

sho run

!

interface GigabitEthernet0/0

ip address 192.168.254.254 255.255.0.0

ip access-group 100 in

no ip redirects

no ip unreachables

no ip proxy-arp

ip accounting output-packets

ip accounting access-violations

ip inspect SDM_LOW out

ip nat inside

ip virtual-reassembly

ip route-cache flow

duplex auto

speed auto

no mop enabled

!

interface GigabitEthernet0/1

ip address *clipped*.164.46 255.255.255.240

ip access-group 102 in

ip verify unicast reverse-path

no ip redirects

no ip unreachables

no ip proxy-arp

ip nbar protocol-discovery

ip inspect SDM_LOW out

ip nat outside

ip virtual-reassembly

ip route-cache flow

duplex auto

speed auto

no mop enabled

service-policy output SDM-Pol-GigabitEthernet0/1

!

ip classless

ip route 0.0.0.0 0.0.0.0 *clipped*.164.33

ip nat inside source list 1 interface GigabitEthernet0/1 overload

*** end sho run ***

Thanks

Hi Mark,

Thanks for getting back.

I would be interested to see your ACLs too (ACL 100, 102).

I can see that "ip accounting" is enabled on interface Gi0/0.

IP accounting is a CPU-intensive process. You can turn it on when needed for troubleshooting and disable it afterwards to save CPU cycles.

If Internet access for everyone is available only through the proxy server, then you may have to focus on that server to find the reason for the slow browsing.

For test purposes, can you allow one workstation to have direct internet access instead of going through the proxy? Once this workstation has direct internet access, bypassing the proxy server, you can check the performance from that workstation.

From your initial post, I understand that only internet access is crawling, whereas PING goes through fine. Hence I strongly feel that there might be some issue with your proxy servers.

As stated earlier, I would need ACL 100 to check what is allowed from the inside LAN to the outside.

I have seen instances where the Squid proxy periodically rebuilds/invalidates its cache, and during that time HTTP browsing via that proxy server takes a performance hit. I am not sure if this is what is happening in your setup.

As stated earlier, if you could check the performance of internet access from an internal host (after bypassing the proxy for this host alone), you will be able to isolate whether the issue is related only to the proxy server or to outbound internet access in general.

Kindly get back with your observations.

Hope this helps.

-VJ

Ok, I may have been unclear. Not all of our machines exit via the squid proxy, just the student machines. All of our machines, proxy'ed or not, experience the slow-down.

Here are the ACLs on our router (clipped for privacy):

Router#sho access-lists

Standard IP access list 1

10 permit 192.168.0.0, wildcard bits 0.0.255.255 (1232783)

Extended IP access list 100

10 deny ip **.164.32 0.0.0.15 any (1899591)

20 deny ip host 255.255.255.255 any

30 deny ip 127.0.0.0 0.255.255.255 any

40 permit ip any any (37876428)

Extended IP access list 101

10 permit udp any host **.164.39

20 permit tcp any host **.164.39

30 permit udp any host **.164.38

40 permit tcp any host **.164.38

50 permit tcp host **.161.64 any

60 permit tcp host **.161.64 host **.164.46 eq 443

70 permit tcp host **.161.64 host **.164.46 eq 22

80 permit tcp host **.161.64 host **.164.46 eq cmd

90 permit tcp any host **.164.37 eq www

100 permit tcp any host **.164.35 eq 443

110 permit tcp any host **.164.35 eq www

120 permit tcp any host **.164.35 eq smtp

130 permit tcp any host **.164.34 eq www

140 deny ip 192.168.0.0 0.0.255.255 any

150 permit icmp any host **.164.46 echo-reply

160 permit icmp any host **.164.46 time-exceeded

170 permit icmp any host **.164.46 unreachable

180 deny ip 10.0.0.0 0.255.255.255 any

190 deny ip 172.16.0.0 0.15.255.255 any

200 deny ip 127.0.0.0 0.255.255.255 any

210 deny ip host 255.255.255.255 any

220 deny ip host 0.0.0.0 any

230 deny ip any any log

Extended IP access list 102

10 deny tcp host **.32.71 any (11113)

20 deny tcp host **.32.70 any (1016)

30 deny tcp host **.32.69 any (17584)

40 deny tcp host **.32.68 any (20633)

50 permit tcp any host **.164.34 eq www (2733833)

60 permit tcp any host **.164.42 eq www (3090)

70 permit tcp any host **.164.42 range 1417 1420 (6958)

80 permit udp any host **.164.42 range 1417 1420 (8)

90 permit tcp any host **.164.42 eq 407 (5077)

100 permit udp any host **.164.42 eq 407 (15)

110 permit tcp any host **.164.42 eq 443

120 permit tcp any host **.164.43 eq 3389 (20173)

130 permit tcp any host **.164.37 eq 3389 (9614)

140 permit tcp any host **.164.34 eq 3389 (6)

150 permit tcp any host **.164.37 eq www (937081)

160 permit tcp any host **.164.36 eq www (2058)

170 permit tcp any host **.164.45 range 5900 5901

180 permit tcp any host **.164.35 eq 443 (954183)

190 permit tcp any host **.164.35 eq smtp (714017)

200 permit tcp any host **.164.35 eq www (1663)

210 permit udp any host **.164.39 eq 5632

220 permit tcp any host **.164.39 eq 5631

230 permit udp any host **.164.38 eq 5632 (2)

240 permit tcp any host **.164.38 eq 5631 (71879)

250 permit udp any host **.164.41 eq 5632 (18)

260 permit tcp any host **.164.41 eq 5631 (141015)

270 permit udp any host **.164.40 eq 60554

280 permit tcp any host **.164.40 eq 60554

290 deny ip 192.168.0.0 0.0.255.255 any (10)

300 permit icmp any host **.164.46 echo-reply

310 permit icmp any host **.164.46 time-exceeded (1031061)

320 permit icmp any host **.164.46 unreachable (9984)

330 permit tcp host **.161.64 host **.164.46 eq 443

340 permit tcp host **.161.64 host **.164.46 eq 22

350 permit tcp host **.161.64 host **.164.46 eq cmd

360 deny ip 10.0.0.0 0.255.255.255 any (84)

370 deny ip 172.16.0.0 0.15.255.255 any (3)

380 deny ip 127.0.0.0 0.255.255.255 any

390 deny ip host 255.255.255.255 any

400 deny ip host 0.0.0.0 any

410 deny ip any any log (374627)

Extended IP access list **name clipped**

10 permit tcp any any eq 8200

20 permit tcp any any eq 443
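With this many entries, it can help to rank them by hit counter so the busiest permits and denies are visible at a glance. The sketch below is one hedged way to do that. It accepts the counter either as "(11113)", as pasted above, or as "(11113 matches)". The function name is hypothetical, and the sample is a handful of lines copied from ACL 102.

```python
import re

ENTRY_RE = re.compile(
    r"^\s*(\d+)\s+(permit|deny)\s+(.*?)(?:\s+\((\d+)(?: match(?:es)?)?\))?\s*$"
)
LIST_RE = re.compile(r"access list (\S+)")


def acl_hits(text: str) -> list:
    """Return (hits, acl, sequence, action, rule) tuples, busiest first."""
    entries = []
    acl = None
    for line in text.splitlines():
        header = LIST_RE.search(line)
        if header:
            acl = header.group(1)
            continue
        match = ENTRY_RE.match(line)
        if match:
            seq, action, rule, hits = match.groups()
            # Entries printed without a counter have never matched.
            entries.append((int(hits or 0), acl, int(seq), action, rule))
    return sorted(entries, key=lambda entry: entry[0], reverse=True)


SAMPLE = """\
Extended IP access list 102
    10 deny tcp host **.32.71 any (11113)
    50 permit tcp any host **.164.34 eq www (2733833)
    110 permit tcp any host **.164.42 eq 443
    410 deny ip any any log (374627)
"""

for hits, acl, seq, action, rule in acl_hits(SAMPLE):
    print(f"{hits:>8}  ACL {acl} line {seq}: {action} {rule}")
```

As with the interface counters, the totals are cumulative, so the difference between two captures says more about the slowdown than a single listing.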

Thanks!

hello,

This can also be due to the class-based QoS configured on your interface. Can you post that config too? Are you giving priority to any traffic other than HTTP? Which protocols flow between your network and the internet? You can also remove the service-policy command and run just HTTP traffic to isolate the issue.

Raj

Incoming traffic:

http (webservers)

https (exchange OWA)

smtp (exchange)

Out:

websurfing (http, https, streaming audio, etc)

ftp

I'm not sure what you mean by "remove the service policy"

As for QoS, I think this is what you are referring to:

Router#sho class-map

Class Map match-any SDMSignal-GigabitEthernet0/1 (id 1)

Match protocol h323

Match protocol rtcp

Class Map match-any SDMRout-GigabitEthernet0/1 (id 6)

Match protocol bgp

Match protocol egp

Match protocol eigrp

Match protocol ospf

Match protocol rip

Match protocol rsvp

Class Map match-any SDMBulk-GigabitEthernet0/1 (id 2)

Match protocol exchange

Match protocol ftp

Match protocol irc

Match protocol nntp

Match protocol pop3

Match protocol printer

Match protocol secure-ftp

Match protocol secure-irc

Match protocol secure-nntp

Match protocol secure-pop3

Match protocol smtp

Match protocol tftp

Class Map match-any SDMManage-GigabitEthernet0/1 (id 8)

Match protocol dhcp

Match protocol dns

Match protocol imap

Match protocol kerberos

Match protocol ldap

Match protocol secure-imap

Match protocol secure-ldap

Match protocol snmp

Match protocol socks

Match protocol syslog

Class Map match-any SDMIVideo-GigabitEthernet0/1 (id 3)

Match protocol rtp video

Class Map match-any class-default (id 0)

Match any

Class Map match-any SDMSVideo-GigabitEthernet0/1 (id 9)

Match protocol cuseeme

Match protocol netshow

Match protocol rtsp

Match protocol streamwork

Match protocol vdolive

Class Map match-any SDMTrans-GigabitEthernet0/1 (id 7)

Match protocol citrix

Match protocol finger

Match protocol notes

Match protocol novadigm

Match protocol pcanywhere

Match protocol secure-telnet

Match protocol sqlnet

Match protocol sqlserver

Match protocol ssh

Match protocol telnet

Match protocol xwindows

Class Map match-any SDMVoice-GigabitEthernet0/1 (id 5)

Match protocol rtp audio

Class Map match-any SDMScave-GigabitEthernet0/1 (id 4)

Match protocol napster

Match protocol fasttrack

Match protocol gnutella

policy-map SDM-Pol-GigabitEthernet0/1

class SDMVoice-GigabitEthernet0/1

priority percent 70

set dscp ef

class SDMManage-GigabitEthernet0/1

bandwidth remaining percent 3

set dscp cs2

class SDMTrans-GigabitEthernet0/1

bandwidth remaining percent 33

set dscp af21

class SDMRout-GigabitEthernet0/1

bandwidth remaining percent 3

set dscp cs6

class SDMSignal-GigabitEthernet0/1

bandwidth remaining percent 40

set dscp cs3

!
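The "Available Bandwidth 5000 kilobits/sec" line in the earlier Gi0/1 output appears consistent with this policy. The worked arithmetic below is a sketch under one loudly stated assumption: that the router uses the IOS default of allowing at most 75% of the interface bandwidth to be reserved (no max-reserved-bandwidth setting is shown in the thread). It also treats each "bandwidth remaining percent" value as a share of what is left after the priority class, which is my reading of that command rather than something confirmed here.

```python
INTERFACE_KBIT = 100_000      # "BW 100000 Kbit" from the Gi0/1 output
MAX_RESERVED_PERCENT = 75     # ASSUMPTION: IOS default, not shown in the thread
PRIORITY_PERCENT = 70         # "priority percent 70" in class SDMVoice

# "bandwidth remaining percent" values from the policy-map above.
REMAINING_PERCENT = {
    "SDMManage": 3,
    "SDMTrans": 33,
    "SDMRout": 3,
    "SDMSignal": 40,
}


def available_kbit() -> int:
    """Bandwidth left for the other classes once the priority class is reserved."""
    reservable = INTERFACE_KBIT * MAX_RESERVED_PERCENT // 100
    priority = INTERFACE_KBIT * PRIORITY_PERCENT // 100
    return reservable - priority


def class_guarantees() -> dict:
    """Minimum kbit/s per class under congestion (a guarantee, not a cap)."""
    left = available_kbit()
    return {name: left * pct // 100 for name, pct in REMAINING_PERCENT.items()}


print(available_kbit())
print(class_guarantees())
```

Under these assumptions the figures are minimum guarantees that only matter when the interface is congested. Web traffic does not appear in any named class, so it would fall into class-default.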

Thanks again

Not sure if this is relevant, but I noticed the DROPS:

Router#sho int g0/1 switching

GigabitEthernet0/1 $FW_OUTSIDE$$ETH-WAN$

Throttle count 78

Drops RP 323 SP 0

SPD Flushes Fast 0 SSE 0

SPD Aggress Fast 0

SPD Priority Inputs 1021562 Drops 0

Protocol IP

Switching path Pkts In Chars In Pkts Out Chars Out

Process 4023090 1323166173 3987250 305176872

Cache misses 1 - - -

Fast 63086484 3162115666 49777734 1082941822

Auton/SSE 0 0 0 0

Protocol ARP

Switching path Pkts In Chars In Pkts Out Chars Out

Process 11803 708180 962 57720

Cache misses 0 - - -

Fast 0 0 0 0

Auton/SSE 0 0 0 0

Protocol Other

Switching path Pkts In Chars In Pkts Out Chars Out

Process 0 0 119200 7152000

Cache misses 0 - - -

Fast 0 0 0 0

Auton/SSE 0 0 0 0

NOTE: all counts are cumulative and reset only after a reload.

Router#sho int g0/1 switchport ?

| Output modifiers

Router#sho int g0/1 switchport

% Gi0/1 is not a switchable port

Router#sho int g0/0 switching

GigabitEthernet0/0 $ETH-SW-LAUNCH$$INTF-INFO-GE 0/0$$FW_INSIDE$$ETH-LAN$

Throttle count 375

Drops RP 5993 SP 0

SPD Flushes Fast 0 SSE 0

SPD Aggress Fast 0

SPD Priority Inputs 3967308 Drops 2

Protocol IP

Switching path Pkts In Chars In Pkts Out Chars Out

Process 4992099 656598230 2978590 1117785666

Cache misses 0 - - -

Fast 49909339 1202096437 62845107 3042327746

Auton/SSE 0 0 0 0

Protocol ARP

Switching path Pkts In Chars In Pkts Out Chars Out

Process 4003643 240590776 1281527 76891620

Cache misses 0 - - -

Fast 0 0 0 0

Auton/SSE 0 0 0 0

Protocol Other

Switching path Pkts In Chars In Pkts Out Chars Out

Process 3117710 3652788604 119203 7152180

Cache misses 0 - - -

Fast 0 0 0 0

Auton/SSE 0 0 0 0

NOTE: all counts are cumulative and reset only after a reload.

Router#
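One figure that can be read off this output is how much of the inbound IP traffic is process-switched rather than fast-switched, since process-switched packets are the ones handled by the CPU. The sketch below parses only the "Protocol IP" section in the layout shown above. The function name is hypothetical, and the sample rows are copied from the Gi0/1 output.

```python
def ip_process_share(switching_text: str) -> float:
    """Percentage of inbound IP packets that were process-switched."""
    rows = {}
    in_ip = False
    for line in switching_text.splitlines():
        fields = line.split()
        if fields[:2] == ["Protocol", "IP"]:
            in_ip = True
        elif fields[:1] == ["Protocol"]:
            in_ip = False  # a different protocol section has started
        elif in_ip and len(fields) >= 2 and fields[0] in ("Process", "Fast"):
            rows[fields[0]] = int(fields[1])  # "Pkts In" column
    total = rows.get("Process", 0) + rows.get("Fast", 0)
    if total == 0:
        raise ValueError("no IP switching counters found")
    return 100.0 * rows.get("Process", 0) / total


GI0_1 = """\
Protocol IP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 4023090 1323166173 3987250 305176872
Cache misses 1 - - -
Fast 63086484 3162115666 49777734 1082941822
Auton/SSE 0 0 0 0
Protocol ARP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 11803 708180 962 57720
"""

print(f"{ip_process_share(GI0_1):.2f}% of inbound IP packets were process-switched")
```

These counters are cumulative since the last reload, so they describe the long-run mix rather than the slow period specifically.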

What is the capacity of your connection to your upstream provider? You may be hitting the ceiling during peak times, and since you have proxy server(s), they may have a significant impact on your total capacity, depending on prefetch options, etc.

Regards,

Chris

Thanks for your reply - as I mentioned, our ISP has noticed no capacity thresholds being met at any time.

One thing I'm wondering about: could the fact that our ACLs are applied to returning traffic be part of the problem? For example, the log shows several packets from outside port 80 to inside being blocked.

ACLs are exact. They either work or they don't. You have a pattern that points to a slowdown of some resource on your network (seeing that it's not bandwidth related).

Do a 'show proc cpu' during the slowdown period and compare with the same for normal period and let us know what the CPU load is like. Also, what platform is your core router?

Here is an example from my log. There are tons of these from various IP addresses, all from port 80 (as though it is returning traffic from sites surfed to from within our school):

1/30/2007 9:37 Local7.Info 192.168.254.254 304191: 304199: *Jan 30 16:41:57.879 UTC: %SEC-6-IPACCESSLOGP: list 102 denied tcp 17.250.248.77(80) -> ***.***.164.46(49177), 4 packets

***.***.164.46 is the public side of router g0/1
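To see how much of the denied traffic really comes from port 80, and whether it clusters around the slow period, the syslog lines can be tallied by source port. This is a minimal sketch that assumes every entry follows the same %SEC-6-IPACCESSLOGP layout as the line quoted above. The function name is hypothetical, and the only sample line is the one from this post.

```python
import re
from collections import Counter

LOG_RE = re.compile(
    r"%SEC-6-IPACCESSLOGP: list (\S+) (permitted|denied) (\S+) "
    r"([^\s(]+)\((\d+)\) -> ([^\s(]+)\((\d+)\), (\d+) packets?"
)


def denied_packets_by_source_port(lines, acl="102") -> Counter:
    """Sum the packet counts of denied entries, keyed by source port."""
    totals = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group(1) == acl and match.group(2) == "denied":
            totals[int(match.group(5))] += int(match.group(8))
    return totals


LOG_LINES = [
    "1/30/2007 9:37 Local7.Info 192.168.254.254 304191: 304199: "
    "*Jan 30 16:41:57.879 UTC: %SEC-6-IPACCESSLOGP: list 102 denied tcp "
    "17.250.248.77(80) -> ***.***.164.46(49177), 4 packets",
]

print(denied_packets_by_source_port(LOG_LINES))
```

Run over the full log, the same tally could be grouped by timestamp instead to check whether the denies spike during the morning slowdown.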

I totally agree that ACLs are exact. It's so strange that we would be able to surf at all if port 80 coming back in is denied.

Our platform: Cisco IOS Software, 2800 Software (C2800NM-ADVSECURITYK9-M), Version 12.4(3d)

Hope that's what you meant.

Yes, that's what I wanted to see. Also, please provide the output from 'show proc cpu'

Chris

I was going to post the sho proc cpu, but everything is < 1% for each time interval. In fact, most of them are at 0.00. The only one that changes much is IP Input with a 5 minute interval of ~0.55

thanks
