How could we solve intermittent network failure issues
I have a problem whose root cause I am finding difficult to identify. The end users are connected to access switches, which in turn connect to the core switches running HSRP; all of the switches are 3550s. Intermittently, the users are thrown out of their applications and the switches become unreachable from the NOC. Both the NOC and the application servers are outside this network: the applications are reached over the WAN, and the NOC is on another LAN. There are no suspicious logs on the switches, and the problem clears by itself. Is there any logging I can enable on the switches to find the cause, such as logging of STP changes?
Re: How could we solve intermittent network failure issues
There's not enough detail here to talk definitively about the issue, but I'll throw a couple of suggestions your way.
First, identify which layer the failure is occurring at: L2, L3, and so on.
If logging is in fact enabled and you were losing any of the routing or uplink interfaces on the switches, there should be a message in the log. Likewise, routing between the networks should only fail if the routed interfaces (i.e. the SVIs) went down, or if the routing protocol changed the route for the application network or for the end-user network. A show ip route command should tell you whether this is happening: for dynamically learned routes, each entry shows a last-update timer, which would have reset at the time of the failure.
If you're not seeing errors in the logs on any of the switches and layer 3 routing doesn't appear to be the culprit, then I would investigate spanning tree. You may be experiencing a spanning-tree reconvergence that temporarily blocks your users' traffic. One quick way to check is to run show spanning-tree detail and look at the last topology change for the VLAN that's experiencing the problem.
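If you save the show ip route output to a text file after each outage, the timer check above can be scripted. Here is a rough Python sketch. The function names, the sample routes, and the 10-minute window are all made up for illustration, and it assumes the age appears either as hh:mm:ss or in unit-suffixed form such as 2d04h.

```python
import re

# Seconds per unit letter used in IOS-style ages such as "2d04h" or "1y0w".
UNIT_SECONDS = {"y": 365 * 86400, "w": 7 * 86400, "d": 86400,
                "h": 3600, "m": 60, "s": 1}

# Matches the "<prefix> [AD/metric] via <next-hop>, <age>" part of a route line.
ROUTE_RE = re.compile(
    r"(?P<prefix>\d+\.\d+\.\d+\.\d+(?:/\d+)?) \[\d+/\d+\] "
    r"via (?P<next_hop>[\d.]+), (?P<age>[\w:]+)"
)


def age_to_seconds(age: str) -> int:
    """Convert an age such as '00:03:10' or '2d04h' to seconds."""
    if ":" in age:
        hours, minutes, seconds = (int(part) for part in age.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    parts = re.findall(r"(\d+)([ywdhms])", age)
    if not parts:
        raise ValueError(f"unrecognised age format: {age!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def recently_changed_routes(output: str, window_seconds: int):
    """Return (prefix, age_in_seconds) for routes updated within the window."""
    recent = []
    for line in output.splitlines():
        match = ROUTE_RE.search(line)
        if match is None:
            continue  # connected routes and headers carry no age
        age = age_to_seconds(match.group("age"))
        if age <= window_seconds:
            recent.append((match.group("prefix"), age))
    return recent


# Illustrative text only, not taken from the poster's switches.
SAMPLE = """\
O    10.20.0.0/24 [110/2] via 10.0.0.2, 00:03:10, Vlan10
O    10.30.0.0/24 [110/3] via 10.0.0.6, 2d04h, Vlan20
C    10.0.0.0/30 is directly connected, Vlan10
"""

if __name__ == "__main__":
    for prefix, age in recently_changed_routes(SAMPLE, window_seconds=600):
        print(f"{prefix} was updated {age} seconds ago")
```

A route whose age is shorter than the time since the outage is a strong hint that the problem is at layer 3.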
The output should look something like this:
VLAN0001 is executing the rstp compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 1, address 0000.0000.0000
Configured hello time 2, max age 20, forward delay 15
We are the root of the spanning tree
Topology change flag not set, detected flag not set
Number of topology changes 1 last change occurred 1y0w ago
If the "number of topology changes" is high and the "last change" is fairly recent, then spanning tree has reconverged often or recently for the VLAN that is experiencing the problem. If this is the case, review your switching design against best practices: turn on PortFast for all end-user ports and remove unnecessary loops in the topology without sacrificing redundancy. If you turn on PortFast for end-user ports, make sure you also use protective features on the switches such as BPDU guard, loop guard, and storm control, so that a user who plugs a hub into your network cannot take it down with a broadcast storm.
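If you have many VLANs, reading the topology-change counters by eye gets tedious. The rough Python sketch below pulls the count and the time since the last change out of saved show spanning-tree detail text. The function names and the VLAN0010 sample lines are made up for illustration, and it assumes the "last change occurred" value is either hh:mm:ss or unit-suffixed like the 1y0w shown above.

```python
import re

# Seconds per unit letter used in ages such as "1y0w" or "3d12h".
UNIT_SECONDS = {"y": 365 * 86400, "w": 7 * 86400, "d": 86400,
                "h": 3600, "m": 60, "s": 1}

VLAN_RE = re.compile(r"^\s*(VLAN\d+) is executing")
CHANGE_RE = re.compile(
    r"Number of topology changes (\d+) last change occurred (\S+) ago"
)


def age_to_seconds(age: str) -> int:
    """Convert an age such as '00:04:12' or '1y0w' to seconds."""
    if ":" in age:
        hours, minutes, seconds = (int(part) for part in age.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    parts = re.findall(r"(\d+)([ywdhms])", age)
    if not parts:
        raise ValueError(f"unrecognised age format: {age!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def topology_changes(output: str) -> dict:
    """Map each VLAN name to (change_count, seconds_since_last_change)."""
    results = {}
    vlan = None
    for line in output.splitlines():
        vlan_match = VLAN_RE.search(line)
        if vlan_match:
            vlan = vlan_match.group(1)
            continue
        change_match = CHANGE_RE.search(line)
        if change_match and vlan is not None:
            results[vlan] = (int(change_match.group(1)),
                             age_to_seconds(change_match.group(2)))
    return results


# VLAN0001 lines follow the output quoted above; VLAN0010 is hypothetical.
SAMPLE = """\
VLAN0001 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 1 last change occurred 1y0w ago
VLAN0010 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 57 last change occurred 00:04:12 ago
"""

if __name__ == "__main__":
    for name, (count, age) in topology_changes(SAMPLE).items():
        print(f"{name}: {count} changes, last one {age} seconds ago")
```

A VLAN with a high count and a last change that lines up with a user outage is the one to look at first.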