09-15-2011 01:28 PM - edited 03-04-2019 01:37 PM
Hey all,
I started having a really odd issue Tuesday evening and I really don't know where else to look.
I have my HQ location and a backup datacenter (DR) location. They are interconnected via a 250 Mbps point-to-point Ethernet link. Backup replication and other traffic had been flowing back and forth across this link just fine for the past couple of months until this week. I got an email from my App Dev folks telling me that their nightly process was taking 4 hours to complete vs. the normal 1.5 hours, so I started running some iPerf tests between my HQ location and my DR location.
If you aren't familiar, iPerf is a point-to-point bandwidth tester. I ran my tests for 300 seconds, with 30-second reporting intervals and 5 parallel streams.
From HQ -> DR, I can get 250 Mbps total across the 5 flows, each connection running at roughly 50 Mbps.
Sample:
[ ID] Interval Transfer Bandwidth
[328] 0.0-30.0 sec 167 MBytes 46.8 Mbits/sec
[320] 0.0-30.0 sec 167 MBytes 46.8 Mbits/sec
[312] 0.0-30.0 sec 167 MBytes 46.8 Mbits/sec
[296] 0.0-30.0 sec 167 MBytes 46.7 Mbits/sec
[304] 0.0-30.0 sec 167 MBytes 46.8 Mbits/sec
[SUM] 0.0-30.0 sec 837 MBytes 234 Mbits/sec
From DR -> HQ, I am only able to get around 50 Mbps TOTAL across the 5 streams.
I have run similar tests from other points in both my HQ and DR locations. I have run the tests with and without encryption on the link, with the same results. I have a 3945E router on each end of this circuit. I purposely don't have any QoS or queueing on these interfaces on the 3900s, or on the cores going to the 3900s.
I am using VM servers connected to core switches via vPC, as well as a standalone PC connected to an access switch; I also tested by connecting this PC right to the core switches.
These results are similar even if I do other file transfers that don't involve iPerf, so I know it's not an issue specific to the iPerf app.
HQ relevant LAN setup:
Nexus 5548 x2 in VPC
3945E connected to both 5548s via a routed port on each Nexus
We run OSPF between the Nexuses and the 3900; OSPF adjacencies are formed and stable.
DR LAN setup:
3945E connects to a Nexus 5548P and a 4503-E via a routed port to each. OSPF runs here as well.
Packet captures don't show any ICMP errors coming back in either direction from the router toward the hosts.
HQ 3900, relevant config:
flow exporter 1Flow
description Exporting flow to Solarwinds
destination 192.168.59.243
source Loopback0
output-features
transport udp 2055
export-protocol netflow-v5
!
!
flow monitor flow-1
record netflow ipv4 original-input
exporter 1Flow
!
interface GigabitEthernet0/2
bandwidth 250000
ip address 192.168.100.29 255.255.255.252
ip ospf message-digest-key 1 md5 XXXXXXX
load-interval 30
duplex full
speed 1000
no cdp enable
crypto map eline-map
!
interface Tunnel1
description Eline to DR
bandwidth 250000
ip unnumbered GigabitEthernet0/2
no ip redirects
ip mtu 1440
ip flow monitor flow-1 input
ip virtual-reassembly in max-reassemblies 1024
ip ospf message-digest-key 1 md5 XXXX
load-interval 30
tunnel source GigabitEthernet0/2
tunnel destination 192.168.100.30
tunnel path-mtu-discovery
end
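For reference, the "ip mtu 1440" on the tunnel leaves headroom for the GRE encapsulation plus the IPsec overhead added by the crypto map. A rough back-of-the-envelope accounting is below; the 36-byte ESP figure is an assumption for illustration, since the real ESP overhead depends on the cipher, mode, and padding:

```python
# Rough GRE-over-IPsec MTU accounting (overhead values are approximations).
PHYSICAL_MTU = 1500   # Ethernet IP MTU on Gi0/2
GRE_OVERHEAD = 24     # 20-byte outer IP header + 4-byte GRE header
ESP_OVERHEAD = 36     # assumed ESP overhead; varies with cipher/mode/padding

tunnel_ip_mtu = PHYSICAL_MTU - GRE_OVERHEAD - ESP_OVERHEAD
print(tunnel_ip_mtu)  # 1440, matching "ip mtu 1440" on Tunnel1
```

With "tunnel path-mtu-discovery" enabled as well, the router should keep post-encapsulation packets from exceeding the path MTU.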
DR 3900 relevant config:
DR-3945E#sh run int gi0/2
Building configuration...
Current configuration : 267 bytes
!
interface GigabitEthernet0/2
bandwidth 250000
ip address 192.168.100.30 255.255.255.252
ip flow monitor flow-1 input
ip ospf message-digest-key 1 md5 7 XXXXXXXX
duplex full
speed 1000
no cdp enable
crypto map eline-map
end
DR-3945E#sh run int tu1
Building configuration...
Current configuration : 386 bytes
!
interface Tunnel1
description Eline to HQ
bandwidth 250000
ip unnumbered GigabitEthernet0/2
no ip redirects
ip mtu 1440
ip flow monitor flow-1 input
ip virtual-reassembly in max-reassemblies 1024
ip ospf message-digest-key 1 md5 7 XXXXXXXXX
load-interval 30
tunnel source GigabitEthernet0/2
tunnel destination 192.168.100.29
tunnel path-mtu-discovery
end
Neither router is showing any CPU spikes. I can run a similar iPerf test within either LAN, HQ or DR, and get 1.8+ Gbps in both directions, so the problem seems to be limited to this link. I've opened a ticket with my provider to see if they've changed anything, but it's STRANGE. Anyone have ANYWHERE I could look? I've tested point-to-point from every spot I can think of to rule out various parts of my network. I think I'm going to connect my PC right to a spare interface on the 3900 and see if I get the same results. Keep in mind, I've changed NOTHING on my network to cause this. Zero changes Tuesday. Could the FNF (Flexible NetFlow) have anything to do with it?
09-19-2011 06:52 PM
Actually, it's not that strange: the service provider could have inadvertently done some reconfiguration on the circuit, pegging it down to 50 Mbps. Are there any results from their investigation? Did they do any recent maintenance somewhere on their network that could have affected your circuit?
09-21-2011 11:25 AM
It was actually a TCP windowing issue. I increased the TCP window on my iPerf tests to 256 KB, and I was able to flood the circuit with a single flow. Also, the main host that was having trouble replicating its traffic was rebooted. Once that was done, performance returned to normal for them.