I am currently running a network monitor between a 2950 switch and a 2600 router. This is showing about 1% frame errors of a type which would normally be dropped by a NIC.
I am trying to narrow down where these errors are coming from and so would like to know if either of the devices mentioned would normally try to send these frames or if they would drop them.
Thanks in advance.
"* Checks for errors on a received packet, determines the destination port, stores the packet in shared memory, and then forwards the packet to the destination port."
From this I understand that the 2950 does error checking, so it won't pass on the bad frames. The 2600 is an L3 device, so the errors cannot be passed across either of these devices.
If you are seeing these errors at all, they must be generated by the devices sharing that LAN. I am not sure how you are monitoring this connection, though!
I have the 2950, 2600 and a PC with Win2K running CommView (http://www.tamos.com/products/commview/) into a hub. There are no other connections.
The quote above doesn't say that the 2950 will drop frames with errors, only that it checks for them, unless I am misreading it.
The frame errors we are receiving come in groups, i.e. we may get none for half an hour and then get many all at once. This coincides with a performance drop, which is what caused me to investigate in the first place.
Thanks for replying.
I understand from the abstract that the 2950 would drop the packets with errors. I do not see any reason for it to forward a frame which has errors. Now that you have explained that the topology includes a hub, I am pretty sure that the error is being introduced by one of the devices on the hub. You should check by eliminating them one by one.
The switch and the router would normally be connected directly via a standard CAT 5E cable. The only reason I introduced the hub is so that I could monitor the frames being transmitted between the two of them. The switch, router and monitoring PC are the only devices attached to the hub.
I agree it doesn't make sense for the switch to forward a frame with errors. Therefore, given the above setup, it must be either the router or the switch creating the errors.
Does that sound reasonable?
Do the port stats on either the 2950 or the 2600 show any frame errors? Why do you suspect that there are errors between the 2950 and the 2600?
My guess is that the 1% error rate you are observing is due to the monitoring PC and the hub, and not really because of the switch or the router.
Maybe you can check with some other hub and PC.
Unfortunately I can't get any stats off either piece of equipment, as they were installed for us as part of a broader project and I have no access codes, passwords or any other information.
The only reason I am doing any of this is that in 6 months we have had no response from the people responsible for this equipment whilst in the meantime we are having to put up with a substandard service. The idea is to produce evidence of a problem which will then hopefully provoke some action. Unfortunately due to the politics of the situation we cannot simply decide to not use it and go elsewhere.
As for the PC and hub I will move them to a different point in the network and see if the errors persist.
Thanks for your time.
Get a MIB browser and see if they left SNMP access open to the "public" community. If they did, you can get interface stats from the devices that way.
If the devices normally run full duplex, and may be hard-configured to do so, your hub can be causing the problems you are seeing.
The router will always drop error packets, but the switch can be configured in cut-through mode, where it starts forwarding a packet as soon as it has the destination address, or in fragment-free mode, where it forwards after it has received 64 bytes. Both can forward bad packets because they forward before they see the CRC.
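If the counters do turn out to be readable, two polls of the same interface are enough to work out an error rate. Here is a minimal sketch of the arithmetic in Python. The dictionary keys are named after the standard IF-MIB counters (ifInErrors, ifInUcastPkts, ifInNUcastPkts), but the readings are made up for illustration, and fetching them is left to whatever MIB browser you use.

```python
COUNTER32_MODULUS = 2 ** 32  # IF-MIB Counter32 values wrap at 2^32


def counter_delta(old: int, new: int) -> int:
    """Difference between two Counter32 readings, allowing for one wrap."""
    return (new - old) % COUNTER32_MODULUS


def input_error_rate(first: dict, second: dict) -> float:
    """Fraction of inbound frames counted as errors between two polls."""
    errors = counter_delta(first["ifInErrors"], second["ifInErrors"])
    good = (counter_delta(first["ifInUcastPkts"], second["ifInUcastPkts"])
            + counter_delta(first["ifInNUcastPkts"], second["ifInNUcastPkts"]))
    total = errors + good
    return errors / total if total else 0.0


# Made-up readings from two polls of one interface, for illustration only.
# The unicast counter wraps past 2^32 between the two polls.
poll_1 = {"ifInUcastPkts": 4_294_960_000, "ifInNUcastPkts": 1_000, "ifInErrors": 50}
poll_2 = {"ifInUcastPkts": 2_604, "ifInNUcastPkts": 1_100, "ifInErrors": 150}

print(f"{input_error_rate(poll_1, poll_2):.2%}")
```

With these invented numbers the rate comes out at just under 1%. Taking the difference between two polls rather than the raw totals matters because the counters accumulate from the last reboot and wrap around.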
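To make the difference between the forwarding modes concrete, here is a small Python sketch. It is only a model of the decision, not how any particular switch is implemented. The function names are made up for illustration, and the FCS is computed with zlib.crc32, which uses the same CRC-32 as Ethernet.

```python
import struct
import zlib

MIN_FRAME = 64  # minimum legal Ethernet frame length in bytes, FCS included


def add_fcs(frame_without_fcs: bytes) -> bytes:
    """Append the 4-byte Ethernet FCS (CRC-32, least significant byte first)."""
    return frame_without_fcs + struct.pack("<I", zlib.crc32(frame_without_fcs))


def fcs_ok(frame: bytes) -> bool:
    """Return True if the trailing 4 bytes match the CRC-32 of the rest."""
    if len(frame) < 4:
        return False
    body, fcs = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs


def forwards(frame: bytes, mode: str) -> bool:
    """Decide whether a switch in the given mode would pass the frame on."""
    if mode == "cut-through":
        # Forwarding starts once the 6-byte destination address has arrived,
        # long before the FCS at the end of the frame can be checked.
        return len(frame) >= 6
    if mode == "fragment-free":
        # Waits for the first 64 bytes, which filters collision fragments
        # (runts) but still never looks at the FCS.
        return len(frame) >= MIN_FRAME
    if mode == "store-and-forward":
        # The whole frame is buffered, so length and FCS can both be checked.
        return len(frame) >= MIN_FRAME and fcs_ok(frame)
    raise ValueError(f"unknown mode: {mode}")


good = add_fcs(bytes(60))                  # 60 zero bytes + 4-byte FCS = 64 bytes
bad = bytes([good[0] ^ 0x01]) + good[1:]   # same frame with one bit flipped
runt = good[:32]                           # truncated, as after a collision

for mode in ("cut-through", "fragment-free", "store-and-forward"):
    print(mode, forwards(good, mode), forwards(bad, mode), forwards(runt, mode))
```

In this model only store-and-forward drops the frame with the flipped bit. Fragment-free catches the runt but not the CRC error, and cut-through passes all three.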
You would have to see a lot of errors for them to visibly affect performance.
A couple of testing options. We have several remote networks that go through providers who don't allow any visibility into their network.
From one side of your questionable equipment, perform extended ping tests using large packet sizes and variations in the data pattern, i.e. 1500-byte packets filled with all 0s, all 1s, and a mixture in between (this is not the proper terminology for this).
Start extended pings to devices that are on your local lan. Record the results.
Next extended ping interfaces on the suspect equipment working your way across the equipment.
Finally extended pings to devices beyond the suspect equipment to your destination devices.
Look for high latency at the problem hops
Look for retransmissions, indicating errors on one of the devices
Look for problems at the end devices.
Trace routes can be helpful if latency is a big issue, by pinpointing when the slow responding device enters the picture.
All of this testing can be captured and used convincingly to expose a device or network that may be having problems and requires attention.
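Once the ping results are captured, the hop-by-hop comparison can be boiled down to a few numbers per target. Here is a rough Python sketch, assuming you have already collected the round-trip times for each target in order from nearest to farthest. The hop names, the sample times and the thresholds are all invented for illustration; use whatever service level you have been promised.

```python
from statistics import mean
from typing import List, Optional, Tuple


def summarise(rtts_ms: List[Optional[float]]) -> dict:
    """Loss percentage and latency figures for one target.

    Each entry is a round-trip time in milliseconds, or None for a timeout.
    """
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    return {
        "sent": len(rtts_ms),
        "loss_pct": 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


def first_bad_hop(results: List[Tuple[str, List[Optional[float]]]],
                  max_loss_pct: float = 1.0,
                  max_avg_ms: float = 100.0) -> Optional[str]:
    """Return the name of the first hop that breaches either threshold."""
    for name, rtts in results:
        s = summarise(rtts)
        too_slow = s["avg_ms"] is not None and s["avg_ms"] > max_avg_ms
        if s["loss_pct"] > max_loss_pct or too_slow:
            return name
    return None


# Invented sample data: three targets, nearest first, four pings each.
results = [
    ("local LAN host", [1.0, 1.2, 0.9, 1.1]),
    ("suspect router", [2.0, 2.5, 2.1, 2.2]),
    ("far side", [40.0, None, 350.0, None]),
]

for name, rtts in results:
    print(name, summarise(rtts))
print("first hop over threshold:", first_bad_hop(results))
```

With this sample data the first two targets are clean and the far side is flagged for 50% loss. A table of these figures per hop, taken at different times of day, is the kind of thing that is hard to argue with in a report.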
It also helps to know what service level the network is expected to provide. You may want that answer before you present your documentation. WAN links are an example: a certain latency range is acceptable, and the service provider should know when a threshold has been exceeded.
Placing your sniffer on the local LAN to check responses and identify when certain types of failures or retransmissions occur is very helpful.
The SNMP "public" community string is closed, so I can get no info from the equipment itself.
I have moved the hub and monitoring PC to a different location and the errors persist, so the errors could be caused by the monitoring equipment itself.
Using the SolarWinds network management tools, I have run a continuous ping to all the hops on the route. There is an occasional high ping internally (which gives a correspondingly high ping to all hops), but by far the most consistently poor performer is the last hop that answers either TraceRoute or Ping. I am told this is a Cisco Content Engine (is this the same as a proxy server?), so it would seem the issues we are experiencing stem from it.
Having managed to find out the names of other companies attached to this system, and after spending a while on the phone, it seems that other people are having similar problems, which would appear to confirm this.
Thanks to all for your help and if you have any further suggestions please post them.
With the understanding that you cannot access the equipment, and that the individuals who set it up are not responding to you, maybe you should break into the router and switch. A configuration problem could be behind all of this, or it might not be; you won't know until you verify the configuration. I don't know whether you would get permission, or whether you can do it on the down low. When you break into the equipment, you can either change the passwords, or leave them alone and repeat the break sequence every time you need to access the router, so that no one knows you were in it. Again, this assumes you can get the downtime for what is a five-minute or shorter process.
Cisco 2600 Router
Cisco 2950 Switches
VERY interesting links which I have made a note of. We have, however, got enough material for an initial report and we are going to give them a couple of weeks to respond. After that, who knows?