InPkts arriving from PIX FW are process switched in 3750 = 100% CPU (IP Input)

Sep 21st, 2007

Hi all

I'm trying to solve/identify a problem for one of our customers in a PIX 535 <-> 3750-stack environment.


I think I may know the answer but I really hope someone can prove me wrong.


Symptom:

100% CPU load (average for several hours during backup/batch job hours!) where IP Input claims almost all CPU time.

And yes, I've read "Troubleshooting High CPU Utilization in IP Input Process" http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a00801c2af3.shtml



ASCII version of network topology:


(Internet)

|

External network (our AS)

|

Redundant PIX 535 FW

|

Redundant 3750stacks (Internal networks)

|

Even more Internal networks...



The problem arises in the 3750 stack connected to the Active PIX 535, on the transit Vlan between the PIX and the 3750 stack.


All internal networks with VLAN interfaces configured in the 3750 stacks seem to do well regarding fast switching, as in 90% (or so) of the packets are NOT process switched/routed. Packets from the 3750 to the PIX FW included.


But... 100% of all packets from the PIX FW to the 3750 are process switched!


I have found figures stating that the 3750 is able to process switch somewhere between 2500 and 3000 pps, which gives around 45 Mbit/s throughput in the best case. (This is well below what our customer needs.)
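For reference, the pps-to-bandwidth conversion can be sketched as below. The 1500-byte packet size is an assumption for illustration (the figures I found did not state one), and the function name is made up. With 1500-byte packets, 2500 to 3000 pps works out to roughly 30 to 36 Mbit/s, so the same order of magnitude as the figure above.

```python
def throughput_mbps(pps, packet_bytes):
    """Convert a packet rate to Mbit/s for a fixed packet size."""
    return pps * packet_bytes * 8 / 1_000_000


# Assumed packet size: 1500 bytes (full-size IP packets, e.g. backup traffic)
low = throughput_mbps(2500, 1500)   # 30.0 Mbit/s
high = throughput_mbps(3000, 1500)  # 36.0 Mbit/s
print(f"{low:.1f} to {high:.1f} Mbit/s")
```

Smaller packets make it worse: at the same packet rate, throughput scales linearly with packet size.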



To the best of my knowledge, this is how it works:


Inbound traffic from 'outside' and the DMZs on the PIX FW is simply sent to the (from the PIX perspective) "next hop", which happens to be Interface vlanX in the 3750. The PIX knows that the IP address of the next hop has MAC address "AA-BB-CC-DD-EE-FF" and forwards all packets there at layer 2.


Arriving at the 3750, all packets from the PIX are addressed to the Interface vlanX MAC address (the 3750 is still not aware of the destination host IP address). The 3750 now has to pick the packet up to layer 3 (IP) to see where it's destined. Okay, now the 3750 knows where to forward the packet... another trunk or an access switch port etc.


As far as I understand, this has to be done with each and every packet coming from the PIX, since I don't see any chance for the 3750 to "IP CEF" this when all packets are destined for the Interface vlanX MAC address at layer 2.
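The layer 3 step described above, looking at the destination IP to pick an outgoing interface, amounts to a longest-prefix-match lookup. A minimal sketch follows, assuming a made-up routing table (the prefixes and the Vlan10/Vlan20 names are hypothetical, not from the actual config). It only illustrates the lookup itself and says nothing about whether the 3750 does it in hardware or in the CPU.

```python
import ipaddress

# Hypothetical routing table: prefix -> outgoing interface (illustrative only)
ROUTES = {
    "10.1.0.0/16": "Vlan10",
    "10.1.20.0/24": "Vlan20",
    "0.0.0.0/0": "VlanX",  # default route back towards the PIX
}


def lookup(dst_ip, routes=ROUTES):
    """Return the interface of the most specific prefix containing dst_ip."""
    dst = ipaddress.ip_address(dst_ip)
    best_net, best_iface = None, None
    for prefix, iface in routes.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching prefix with the longest mask
        if dst in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_iface = net, iface
    return best_iface


print(lookup("10.1.20.5"))  # matches both 10.1.0.0/16 and 10.1.20.0/24
```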


Please, someone tell me that I'm missing something here or else I will have to slap the designer of our "refreshed" datacenter core switch/router environment in his face... hard!


(We used to have 7200VXR routers connected with Gigabit interfaces to Catalyst 4000 chassis before, which ran smoothly without any problems. Nearly 100 servers are served by the 3750 today.)

Edison Ortiz Sat, 09/22/2007 - 09:52

A diagram is worth a thousand words. Can you put together a diagram and post it along with a sanitized config?

cratejockey Sat, 09/22/2007 - 23:27
There is a lot here to absorb. Seeing a detailed diagram and sanitized configuration files would be very helpful. However, your last line about your old edge connection versus your new connection concerns me.


You mention that your 3750 is "stacked" and that nearly 100 servers are served by your 3750. So the assumption I make is that you have at least three 3750 switches stacked using the 32 Gbps backplane cables, with just under 100 servers connected at 1 Gbps. Assuming that these servers are under heavy usage and that your designer used 48-port switches instead of 24-port switches, you could overcommit an individual switch's backplane. Add to this the fact that you have one switch at the top of the stack connected to the PIX, and you could have two heavily committed switches overcommitting the 32 Gbps stacking link to the switch connected to the PIX. Feel free to verify the numbers at http://www.cisco.com/en/US/products/hw/switches/ps5023/products_data_sheet0900aecd80371991.html


The other thing to note is your 535's capabilities. Based on the documentation at http://www.cisco.com/en/US/products/hw/vpndevc/ps2030/products_data_sheet09186a008007d05d.html

the 535 has a max cleartext throughput of about 1.7 Gbps. Again, if you are running a failover pair of 535s and you have 100 servers that are outbound of that device, you could also be overcommitting the PIX's interfaces.
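To put rough numbers on both points, here is a small sketch of the worst-case oversubscription ratio. It assumes 100 servers all sending at a full 1 Gbps at the same time, which is an upper bound and not a measurement; the function name is made up for illustration.

```python
def oversubscription(hosts, host_gbps, uplink_gbps):
    """Worst-case ratio of aggregate host bandwidth to a shared link's capacity."""
    return hosts * host_gbps / uplink_gbps


# Assumed worst case: 100 servers, each at 1 Gbps line rate
stack_ratio = oversubscription(100, 1.0, 32.0)  # vs. 32 Gbps stacking link
pix_ratio = oversubscription(100, 1.0, 1.7)     # vs. ~1.7 Gbps PIX 535 cleartext

print(f"stack: {stack_ratio:.1f}:1, PIX: {pix_ratio:.1f}:1")
```

Real traffic is rarely anywhere near line rate on every port, so these ratios only show where the headroom is thinnest.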


At the end of the day it might not be any of this. I would recommend several things.

1. Open a TAC case ASAP to backstop any findings and/or solutions that you come up with.

2. If you are able to design your own projects, do so. A designer who does not deploy in the field probably should not be designing complex networks. In our environment we had very gifted L2 engineers designing complex L3, security and high-availability environments with no field experience. Until we put an end to that we were losing our shirts! If you cannot do this, then you need to build time into your deployment to review the design, with or without the designer, to avoid potentially disastrous deployments.

3. This is non-technical but important. Avoid the use of threatening language in forums, emails and especially in person! Trust me, I know where you are coming from, and there are times I would like to strap on some boxing gloves to resolve a work issue. But also, from the experience of a friend being sent to anger management to keep his job for telling his manager he needed his A** kicked, it's not worth it.


Good luck and if you add more info I will review and try to respond.


www.staticnat.com
