I have a 500-node MPLS network. All the remote routers are 2811s, and all the remote traffic terminates here at the central site on a 7606 with a DS3. We have implemented service policies on all the remote routers and they seem to be working well. However, we are "slamming," or overrunning, the remote circuits because we are not shaping our traffic down per remote site as it leaves our DS3. My understanding is that you can nest policies, but that would mean a huge number of ACLs and policies. I'm not even sure it would work.
Any advice or recommendations would be appreciated.
To give a better picture of the situation: how different are the remote sites in terms of QoS policy and circuit speeds? I take it there is a lot of variation, hence your frustration?
The network is very consistent: 475 of the branches are identical (128Kb) and 20 of the remaining are also identical (256Kb).
We are concerned that 500 cascaded policies would be too much for the CPU. We are also concerned that it simply wouldn't work - we can find no examples of anyone having that many classes in a single configuration.
OK, I think I see the problem. Let me guess: on the DS3 you do not have a sub-interface for each location, just one physical interface?
If you did have a sub-interface for each location, you could define just one policy-map to cover all 475 identical sites.
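If sub-interfaces were available, a single reusable shaping policy could be attached to each one. A rough sketch, assuming point-to-point sub-interfaces on the DS3 and hypothetical names and rates (the VOICE class, 48 Kbps of priority bandwidth); whether the 7600 accepts this exact hierarchy depends on the line card:

```
! Hypothetical per-site policy for the 128 Kb branches
class-map match-all VOICE
 match ip dscp ef
!
policy-map BRANCH-128K
 class VOICE
  priority 48              ! assumed LLQ bandwidth for voice
 class class-default
  fair-queue
!
policy-map SHAPE-128K
 class class-default
  shape average 128000     ! shape to the remote access rate
  service-policy BRANCH-128K
!
interface Serial3/0.101 point-to-point
 service-policy output SHAPE-128K
```

The same SHAPE-128K policy would then be applied to each of the 475 identical sub-interfaces, with a second SHAPE-256K variant for the 256 Kb sites.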
If your DS3 is one physical interface, it does get more complicated. You want to make sure you don't burst beyond the link capacity of each site, but you don't want to define 475 different classes of traffic in one policy-map just to attach a policer to each.
If my hunch is correct you may be able to use MicroFlow Policing:
Using MicroFlow Policing, the PFC on the 7606 would be able to distinguish sites based on the destination address in the flow and individually police each flow with a single class configuration, rather than defining each site as one of 500 different classes.
The challenge would be understanding how many flows the PFC could see at one time for each site and setting the policer accordingly.
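For reference, a microflow policing sketch on a Sup720-class PFC might look something like the following. The ACL, rates, and names are all assumptions for illustration, and note that a later reply in this thread points out that output policies do not support microflow policing:

```
mls qos
!
access-list 101 permit ip any 10.0.0.0 0.1.255.255
!
class-map match-all BRANCH-FLOWS
 match access-group 101
!
policy-map MICROFLOW-POLICE
 class BRANCH-FLOWS
  ! one policer instance per flow, keyed on destination address
  police flow mask dest-only 128000 8000 conform-action transmit exceed-action drop
```

The `dest-only` flow mask is what would make each remote site's destination prefix look like a separate "flow" to the policer.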
Yes - that is exactly the problem. I didn't word it very well!
MicroFlow Policing looks like a good option, but I need to define ranges of addresses (e.g. 10.131.45.1/24) for the source addresses of the remote hosts. It didn't look like I could do that. It also appears that you are limited to 64 flows.
I would assume that policing is not exactly what you want, because it will drop packets (or remark them) when you are above the configured bandwidth. To me it would seem preferable to shape the traffic to the remote site's access link bandwidth. This would presumably not lead to drops (or at least fewer drops), but would queue packets instead.
The implication, however, is that you need a number of classes equal to the number of sites, which is impossible in your case, AFAIK. The number of classes per policy is limited to 256.
So a solution could be to create a couple of sub-interfaces on the main DS3 towards the service provider, each with a policy of fewer than 256 classes. The complexity of this approach is quite substantial, however.
There is no "nice" solution to your problem, afaik.
I've read the answers from the others here, and just wanted to make some comments..
First of all, microflow policing is done on a flow-by-flow basis; you can't define the flows as you wish. It's all up to the flow mask you use (dest-only, src-only, full-flow). You can't use netmasks for this.
Second, output policies do not support microflow policing. This alone would make the option useless for you, I would guess.
I am not sure I have understood your network correctly. You say you have a 500-node MPLS network terminated on a DS3, but what we need to know is the following:
- Is the DS3 terminated directly in your 7606, or via an external unit which you connect to your 7606 over an Ethernet connection?
- Is the MPLS network implemented in YOUR equipment, or in a provider network?
If you are in fact using a provider-based MPLS network with only an Ethernet connection towards your DS3, I think we could make a workaround for you. Depending on the type of supervisor in your 7606, you could use either VRF-Lite or three (or more) external routers to do this work for you.
I have attached a quick drawing of what I am thinking of, but in short:
- Make three VRFs
- VRF 1 is for half of the outgoing sites
- VRF 2 is for the other half
- VRF 3 gathers the traffic and handles the incoming traffic from the sites
If you have 500 sites in the address space 10.0.0.0/16 (250 sites) and 10.1.0.0/16 (250 sites), with each site a /24, route 10.0.0.0/16 to VRF 1 and 10.1.0.0/16 to VRF 2.
- Set up import/export between VRF 1 and VRF 3, and likewise between VRF 2 and VRF 3, so that VRF 1 knows about the routes out to the sites, and VRF 2 likewise
- Set up import/export between VRF 3 and the global VRF so that VRF 3 knows about the internal routes. The global VRF should NOT route directly to VRF 3
- Then set up an external physical link between the global VRF and VRF 1, and likewise between the global VRF and VRF 2
- These physical links will then be available for applying your QoS policies
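The VRF split above might be sketched roughly like this. All the RD/RT values, addressing, and names are made up for illustration, and depending on the platform, leaking routes between VRFs may additionally require BGP or static routes:

```
! Hypothetical VRF-Lite definitions; RD/RT values assumed
ip vrf SITES-A
 rd 65000:1
 route-target export 65000:1
 route-target import 65000:3
!
ip vrf SITES-B
 rd 65000:2
 route-target export 65000:2
 route-target import 65000:3
!
ip vrf AGGREGATE
 rd 65000:3
 route-target export 65000:3
 route-target import 65000:1
 route-target import 65000:2
!
! External loop cabling carries traffic from the global table
! into each site VRF; QoS policies are applied on these links
interface GigabitEthernet1/1
 description Global side of loop towards SITES-A
 ip address 192.168.255.1 255.255.255.252
 service-policy output SHAPE-SITES-A    ! hypothetical policy name
```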
If you can't use VRFs, use three Ethernet routers instead.. should do the same trick.. :)
Any comments or something I have missed?
If, on the other hand, you are in fact doing the MPLS-stuff yourself and the DS3 is installed in your 7606, this won't work.. :(
Did it help? If so, please rate it.
It should be up to the remote sites in this network to ensure they never get oversubscribed... the hub just doesn't have to do it...
You can use WRED with policing or shaping (MQC) at each spoke router. WRED uses weighting to drop packets before congestion occurs, yet still allows optimum line utilization (as opposed to a tail drop when the queue fills). In the network I built, the hub didn't do QoS at all (no, we were not running voice over this network). If we had, the hub would have just LLQ'd the voice packets and transmitted them first.
When you have a network with X spokes and it's more likely they are going to get oversubscribed first, you limit ingress/egress at the spokes while prioritizing the traffic!
(DON'T HAVE THE HUB DO QOS EXCEPT FOR VOICE!!!)
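The spoke-side approach described above (shape to the access rate, LLQ the voice, let WRED manage the rest) might look roughly like this on a 2811. Class names, the 48 Kbps priority figure, and the interface are assumptions:

```
! Hypothetical spoke policy: LLQ for voice, WRED for the rest,
! nested under a shaper matched to the 128 Kb access rate
class-map match-all VOICE
 match ip dscp ef
!
policy-map SPOKE-QUEUES
 class VOICE
  priority 48                  ! assumed voice bandwidth
 class class-default
  random-detect dscp-based     ! WRED drops early instead of tail drop
!
policy-map SPOKE-SHAPE
 class class-default
  shape average 128000
  service-policy SPOKE-QUEUES
!
interface Serial0/0/0
 service-policy output SPOKE-SHAPE
```

The shaper creates the software queue ahead of the provider's policer, which is what gives WRED and the LLQ something to act on.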
Have you found a solution to this problem? We have the exact same problem, with over 1000 sites hitting dual MCI DS3 connections on our 7206 router. We are looking at putting in some sort of QoS towards these sites, as the hub is the bottleneck for our VoIP calls.
Not really - WRED seems to be working well for most applications. I still believe that microflow policing on a per-site basis at the hub is the ultimate solution.
We are currently looking at adding MQC (rate limiting on IPs) to every VoIP site we have, but this is going to overload our router at some point, and the same will happen at the hub site. We currently don't have any other solutions on the table. Is it possible to discuss your attempts and issues with this problem to see if maybe we can find a solution?
Yes - I would be interested in doing that. Sounds like your network is almost identical to ours!
Is there some way to PM you my contact info?
We have a similar setup to yours - 220 nodes with variable speeds. The hub site is 200 Mbps, and the end nodes range from 128 Kbps to 1024 Kbps.
And we have a working solution - all QoS is done on both sides of the bottleneck. In every case the bottleneck is the remote's slow line.
E.g. If it's:
central CPE <-> CE <-> PE <-> CLOUD <-> PE <-> CE <-> CPE remote
The bottleneck is remote PE <-> remote CE; the rest of the network is high speed. We have QoS implemented on the output of the remote PE towards the remote CE, and on the output of the remote CE towards the remote PE.
You need to implement QoS where your software queues are created. One way is to manually create those queues using shaping at the central site, but that's not scalable or easily possible, as you have mentioned. Therefore you need to implement it on every device that has a slow link connected.
Hope this helps.
We have tried a simple test like that and did get some good results, but we have only dual DS3s as our connection to the MPLS cloud, which is also a bit oversubscribed - that is the main problem. We don't have a problem adding MQC to the remote nodes out there, but the hub site is going to be a bit harder.
Well, the hub is almost a non-issue.
If you have done QoS on ALL of your branch lines, there is not much point in doing QoS at the central site. Just trace the speeds and see where the queues will form. Sure, there will be some at the hub's output towards the branches, but they are orders of magnitude smaller than what will form on the branch side.
I don't understand why you don't recommend using QoS at the central site. You need to mark the outbound packets from the hub to ensure reliable delivery to the remotes. Otherwise, everything would be seen as "default" and subject to whatever the carrier chose to do with it. This would be especially true for voice traffic - it would not receive jitter-free handling.
I agree with Pavlo, best place to initially place QoS is where the bottlenecks are.
The reason I asked whether your routers were LSRs is to understand whether you also control/set QoS policy on MPLS egress. If not, then without getting into the problem of managing some type of shaping at the central site: if your carrier offers MPLS egress QoS policies, mark your outbound packets at the central site to take advantage of them. (Marking packets outbound at the hub I would consider part of WAN end-to-end QoS.)
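A minimal outbound marking policy at the hub might look like the following sketch. The class names, the hypothetical BUSINESS-APPS ACL, and the choice of EF/AF31 are all assumptions about what the carrier's MPLS classes expect:

```
! Hypothetical outbound marking on the hub; DSCP values assumed
class-map match-any VOICE-BEARER
 match ip dscp ef
class-map match-any BUSINESS-DATA
 match access-group name BUSINESS-APPS    ! hypothetical ACL
!
policy-map MARK-TO-CARRIER
 class VOICE-BEARER
  set ip dscp ef
 class BUSINESS-DATA
  set ip dscp af31
 class class-default
  set ip dscp default
!
interface Serial2/0
 service-policy output MARK-TO-CARRIER
```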
Another possibility: I don't recall whether there are any options to shape, police, or set policies on tunnels, but if there are, you could control the outbound traffic via a tunnel.
Let me clarify. Queues formed at the central site are negligible, since it is not a primary (or even secondary) bottleneck. The classification and marking features of QoS, as opposed to congestion management and avoidance, are almost mandatory at the central hub, because the first router in front of a slow link should be able to classify on DSCP values.
So, QoS at the central hub: definitely. But congestion management and congestion avoidance there will have little to no effect. I still strongly recommend implementing them for times of high load towards multiple branches at once.
Also, if the MPLS network is under your control, remember that the three high-order bits of the DSCP field are automatically copied into the MPLS EXP field. If you are controlling the MPLS routers, you can set up congestion management and avoidance on those routers based on the MPLS EXP bits. But again - concentrate on your bottlenecks.
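On core routers you control, matching on the EXP bits might look like this sketch (the EXP value, percentage, and interface are assumptions):

```
! Hypothetical core policy matching the MPLS EXP bits
class-map match-all EXP-VOICE
 match mpls experimental topmost 5    ! EF maps to EXP 5 by default
!
policy-map CORE-QOS
 class EXP-VOICE
  priority percent 30
 class class-default
  random-detect
!
interface GigabitEthernet0/1
 service-policy output CORE-QOS
```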
It is not as hard to configure 2x500 routers (on both sides of each slow link) as you might think - the same policy map (one that uses percentages rather than absolute values) can be applied to all of them. And if you (or your provider) are using the same interface numbers on all the branches, it can even be pushed out via SNMP or some other automation tool.
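A percentage-based policy of the kind described, reusable across the 128 Kb and 256 Kb branches alike, might look like this (the class name and percentage are assumptions):

```
! Hypothetical reusable branch policy; works at any link speed
! because it reserves by percentage, not absolute bandwidth
class-map match-all VOICE
 match ip dscp ef
!
policy-map BRANCH-UNIVERSAL
 class VOICE
  priority percent 33      ! LLQ sized relative to the link
 class class-default
  fair-queue
  random-detect
!
interface Serial0/0/0
 service-policy output BRANCH-UNIVERSAL
```

Because `priority percent` scales with the interface (or shaped) rate, one policy map pasted to every branch yields a 128 Kb site and a 256 Kb site each reserving a proportional voice queue.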