I'm running into a performance issue, and I'm at a loss as to how to troubleshoot it any further.
I have two switches, a 6509 (SUP II/MSFC II) and a 4006 (SUP II), that I use for a backup environment. A client on the 4006 NFS-mounts from an NFS server on the 6509. The backup server is also on the 6509, so the backup flow runs from the NFS server (6509) to the client (4006) and back to the backup server (6509). Crazy, I know, but that's for another day. The switches are connected via a 2xGEC as follows:
6509:WS-X6516-GBIC:4/9 to 4006:SUPII:1/1
6509:WS-X6516-GBIC:4/13 to 4006:SUPII:1/2
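In case it matters, here's how I've been sanity-checking the channel itself (CatOS syntax from memory, output trimmed, so take the exact form with a grain of salt):

```
show port channel      # confirm both ports are bundled and in the channel state
show channel traffic   # rx/tx distribution across the two GEC links
```

Both links report as channeling on each end, so I don't believe the channel itself is misconfigured.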
The servers in question are connected as follows:
NFS Server: WS-X6316-GE-TX:9/5
BACKUP SVR: WS-X6316-GE-TX:9/9
It used to be that the backup server was on the same group of 8 ports as the client (i.e. on WS-X4448-GB-RJ45:2/17). Performance between those two servers was good, but it was horrible between the backup server and many other servers, most of which live on the 6509 or on other switches that transit the 6509. Based on flow analysis of the traffic, we moved the backup server to the 6509 as described above, and that resolved the backup problems for all systems except the one client mentioned above.
Now, I'm aware of the shared 1 Gbps per 8-port grouping and how oversubscription limits show up in the counters. But this time I'm not seeing any queue/error/pause counters increment on any of the ports involved (NFS server, backup server, client, or the 2xGEC at either end). I've also ensured that only one client per 8-port group is being backed up at any given time. If I move the client to the 6509 (something I can't do permanently), performance is good again. If I move the backup server back to 4006:2/17, performance to the client is also good again. While a backup is occurring, the 2xGEC runs at about 6%-9% utilization on one of the links; the other is idle, as expected, since a single flow hashes to a single link.
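For reference, these are the spots I've been watching for drops (CatOS syntax from memory; 2/17 stands in for any port in the client's affected 8-port group):

```
show port 9/5          # NFS server: errors, pause frames
show port 9/9          # backup server: same
show counters 2/17     # client's 8-port group on the 4006
show mac 4/9           # In-Lost/Out-Lost on the 6509 side of the GEC
show channel traffic   # per-link utilization across the 2xGEC
```

None of these show anything incrementing while a backup is running.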
Any help or pointers on diagnosing congestion internal to the switch, using something other than "show counters" and "show port", or other techniques for finding internal 4006/SUP II architecture bottlenecks, would be greatly appreciated.