simple Multicast - same vlan different switch causes high cpu utilization

mo shea
Level 1

Hi,

I have followed the example in the Cisco document below to allow multicasting within the same VLAN but across different switches:

http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008059a9df.shtml

To summarize what I did: I added an ip pim sparse-mode statement under the VLAN interface where the multicast server (Symantec Ghost) resides, and I was able to push images to the clients, but the core switch (6500 Sup720, IOS) CPU shot up to 90% and only returned to normal after the image push finished.

I haven't configured any RPs or multicast routing. There is only one core switch connected to several 2950 switches (single uplinks to the core).
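
For reference, the relevant piece of the config looks roughly like this (the VLAN number and address are placeholders, not the real ones):

interface Vlan100
 ip address 10.10.100.1 255.255.255.0
 ip pim sparse-mode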

Is this behaviour normal?

Any help is appreciated.

Thanks

12 Replies

Hello,

Can you do a 'show proc cpu' and see if you can identify the process that is causing the high CPU? I vaguely remember a similar situation where the culprit in the end turned out to be the Ghost server (configuration).

Can you try configuring 'ip mroute-cache' on the VLAN interface where you have the 'sparse-mode' statement configured?
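
In other words, something along these lines (Vlan100 is just a placeholder for your multicast VLAN):

interface Vlan100
 ip pim sparse-mode
 ip mroute-cache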

Regards,

GP

Hi,

Thanks for the reply. I added the ip mroute-cache statement under the same VLAN interface where PIM is defined, but got almost the same result. Below is a sample of 'show proc cpu'; the IP Input process had the highest CPU utilization.

Core#sho proc cpu | exclude 0.00

CPU utilization for five seconds: 93%/84%; one minute: 91%; five minutes: 51%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

4 58392300 4538907 12864 0.47% 0.21% 0.19% 0 Check heaps

105 20811508 153969576 135 7.67% 6.22% 3.41% 0 IP Input

106 4736216 22356527 211 0.07% 0.09% 0.05% 0 CDP Protocol

134 12296 6023371 2 0.07% 0.03% 0.01% 0 Adj Manager

135 8121008 40789586 199 0.15% 0.11% 0.07% 0 CEF process

208 64698260 600222526 107 0.23% 0.20% 0.22% 0 Port manager per

Any advice is appreciated

Hi,

I was wondering whether changing from PIM sparse mode to PIM sparse-dense mode would affect the situation?

Thanks

Since you have not configured RPs or mcast routing, changing the PIM mode is not going to help.

I have a strong suspicion of what the problem is, though.

Some versions of the Ghost server do a 'TTL discovery' thing that, if configured in a certain way, will cause some or all of the multicast traffic to hit the router with an expiring TTL.

IP packets with an expiring TTL can't be hardware switched (the router has to spin a CPU cycle to generate the ICMP 'TTL expired' packet).

Ghost clients locate the Ghost server using an expanding TTL search: a client first sends out a request with TTL=1, then TTL=2, and so on, until it gets a response from the server. The server then sends image data with a TTL equal to the largest TTL value received from a client.

If the PC and the server are on the same segment, the server will send all image data with TTL=1, which the router sees as an expiring packet and sends to the CPU. In a lot of cases, the result is a high CPU condition on the Cisco device.

Take a look at 'sho ip traffic' and see what 'bad hop count' says. If it's high (and increments when the mcast traffic is running), this is probably our issue.
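
A quick way to watch just that counter, using standard IOS output filtering:

Core#show ip traffic | include bad hop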

The best way around this is to disable TTL discovery or manually set the TTL threshold to 7 or some other value > 1.
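
If you can't change the server behaviour right away, some Sup720 IOS releases also support a hardware rate limiter for TTL-expiry punts as a safety net. This is only a sketch (the rate and burst values are made up), so verify the command support on your exact supervisor/IOS release first:

mls rate-limit all ttl-failure 100 10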

HTH,

Bobby

Thanks Bobby.

I tried to check for the discovery settings in the Ghost application but was not successful. I only found an option that specifies the number of router hops, with a max of 16. I will reduce it to 1 or might even try zero, but I haven't had time to do it yet.

We will be upgrading to a unicast-based ghosting application. Any advice on how to measure the safe number of clients to be imaged simultaneously?

Thanks a lot.

Did you ever get this resolved? We are seeing the exact same thing on a Sup720 with a ghosting app, where it buries the CPU with "IP Input" as the cause of the CPU utilization. It took me the better part of a month to track this down; I finally was able to dump the input queue buffer and find out who was doing this. What I see in the buffer is one address, we'll say 192.168.10.2, sending repeated packets to 192.168.10.255, all going to port 7777. We tracked them down, and apparently they were ghosting something out to around 40 PCs at once. Any way to eliminate or lessen the CPU utilization? Below is what we see; I didn't include all of the packet information.
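
For anyone who needs to do the same digging, output like the dump below can usually be pulled with commands along these lines (exact keywords vary a bit by IOS release, so treat this as a sketch):

show buffers input-interface vlan 99 header
show buffers input-interface vlan 99 packet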

Buffer information for Big buffer at 0x44140798

data_area 0x83BCEC4, refcount 1, next 0x501F5AC0, flags 0x280

linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1

if_input 0x51265070 (Vlan99), if_output 0x0 (None)

inputtime 8w1d (elapsed 02:09:42.424)

outputtime 00:00:00.000 (elapsed never), oqnumber 65535

datagramstart 0x83BCF3A, datagramsize 1486, maximum size 1740

mac_start 0x83BCF3A, addr_start 0x83BCF3A, info_start 0x0

network_start 0x83BCF48, transport_start 0x83BCF5C, caller_pc 0x40335B54

source: 192.168.99.2, destination: 192.168.99.255, id: 0x9380, ttl: 127,

TOS: 0 prot: 17, source port 4032, destination port 7777

Hi Glen,

From the output you provided, I don't see anything there that would cause it to be punted to the CPU. Can you post a 'show ip traffic'?

thanks,

Bobby

The problem isn't currently happening, but I'll post this anyway. At the bottom is a "show int" for the problem VLAN 99, and you can see a large number of drops and flushes, and this is just over a day since the counters were cleared.

IP statistics:

Rcvd: 2935904773 total, 142741780 local destination

0 format errors, 0 checksum errors, 189568 bad hop count

0 unknown protocol, 191485 not a gateway

0 security failures, 0 bad options, 0 with options

Opts: 0 end, 0 nop, 0 basic security, 0 loose source route

0 timestamp, 0 extended security, 0 record route

0 stream ID, 0 strict source route, 0 alert, 0 cipso, 0 ump

0 other

Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble

0 fragmented, 0 couldn't fragment

Bcast: 94959580 received, 347822 sent

Mcast: 28514452 received, 28197293 sent

Sent: 48988279 generated, 2143035604 forwarded

Drop: 10052992 encapsulation failed, 0 unresolved, 0 no adjacency

70 no route, 0 unicast RPF, 0 forced drop

Drop: 10 packets with source IP address zero

ICMP statistics:

Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 47 unreachable

1849208 echo, 9298 echo reply, 21 mask requests, 3 mask replies, 0 quench

0 parameter, 0 timestamp, 0 info request, 0 other

224 irdp solicitations, 0 irdp advertisements

Sent: 2 redirects, 1140 unreachable, 9164 echo, 1849205 echo reply

0 mask requests, 0 mask replies, 0 quench, 0 timestamp

0 info reply, 2531 time exceeded, 0 parameter problem

0 irdp solicitations, 0 irdp advertisements

UDP statistics:

Rcvd: 139364551 total, 0 checksum errors, 94172254 no port

Sent: 45638633 total, 0 forwarded broadcasts

TCP statistics:

Rcvd: 171888 total, 0 checksum errors, 3807 no port

Sent: 180304 total

Probe statistics:

Rcvd: 0 address requests, 0 address replies

0 proxy name requests, 0 where-is requests, 0 other

Sent: 0 address requests, 0 address replies (0 proxy)

0 proxy name replies, 0 where-is replies

PIMv2 statistics: Sent/Received

Total: 0/0, 0 checksum errors, 0 format errors

Registers: 0/0, Register Stops: 0/0, Hellos: 0/0

Join/Prunes: 0/0, Asserts: 0/0, grafts: 0/0

Bootstraps: 0/0, Candidate_RP_Advertisements: 0/0

State-Refresh: 0/0

IGMP statistics: Sent/Received

Total: 0/0, Format errors: 0/0, Checksum errors: 0/0

Host Queries: 0/0, Host Reports: 0/0, Host Leaves: 0/0

DVMRP: 0/0, PIM: 0/0

BGP statistics:

Rcvd: 0 total, 0 opens, 0 notifications, 0 updates

0 keepalives, 0 route-refresh, 0 unrecognized

Sent: 0 total, 0 opens, 0 notifications, 0 updates

0 keepalives, 0 route-refresh

EGP statistics:

Rcvd: 0 total, 0 format errors, 0 checksum errors, 0 no listener

Sent: 0 total

OSPF statistics:

Rcvd: 1674991 total, 0 checksum errors

1018027 hello, 18 database desc, 5 link state req

257102 link state updates, 71357 link state acks

Sent: 1307346 total

IP-EIGRP statistics:

Rcvd: 0 total

Sent: 0 total

ARP statistics:

Rcvd: 35269014 requests, 89297 replies, 326 reverse, 0 other

Sent: 7386077 requests, 8471810 replies (6887812 proxy), 0 reverse

Drop due to input queue full: 1017

Vlan99 is up, line protocol is up

Hardware is EtherSVI, address is 0011.5df3.b800 (bia 0011.5df3.b800)

Internet address is XXX.XXX.XXX.XXX/24

MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,

reliability 255/255, txload 1/255, rxload 1/255

Encapsulation ARPA, loopback not set

ARP type: ARPA, ARP Timeout 04:00:00

Last input 00:00:01, output 00:00:19, output hang never

Last clearing of "show interface" counters 1d02h

Input queue: 0/75/6175308/6173969 (size/max/drops/flushes); Total output drops: 0

Queueing strategy: fifo

Hi Glen, thanks for the output. I wanted to verify that a TTL of 1 was not the problem, even though the sample packet showed 127. The 'bad hop count' number is low enough that it's not the issue here.

Now, the destination IP was a directed broadcast, so I'm assuming that the Ghosting is using directed broadcast mode instead of multicast. Since the Vlan interface is part of the subnet that the directed broadcast is aimed at, that traffic will hit the CPU. Since the high cpu correlates with the Ghosting, I have a strong feeling that this is the culprit.

Can your server folks use multicast instead? If so, make sure they set the TTL to something sufficiently higher than 1 to prevent another high cpu issue with TTL expiry as the cause ;-)
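
If they have to keep using directed broadcast for a while, one possible stop-gap on the 6500 side is to police that traffic toward the CPU with CoPP. This is only a sketch (the ACL, names, and rates are made up from the sample packet above, and how much of the broadcast punt path CoPP actually covers in hardware depends on the Sup720/IOS release), so lab it before deploying:

ip access-list extended GHOST-BCAST
 permit udp any host 192.168.99.255 eq 7777
!
class-map match-all GHOST-BCAST
 match access-group name GHOST-BCAST
!
policy-map CONTROL-PLANE-POLICY
 class GHOST-BCAST
  police 1000000 31250 conform-action transmit exceed-action drop
!
control-plane
 service-policy input CONTROL-PLANE-POLICY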

HTH,

Bobby

Thanks for the reply. After reading your description, what I have seen is exactly what is happening: they are using directed broadcast, and this also correlates with the rapidly increasing drops and flushes whenever it occurs. Amazing that one user can bury a Sup720 CPU like that. Thanks again for the reply.

This thread is a very interesting read. Thanks much for the continued responses.

My problem is that I've never done this type of in-depth searching for a "culprit." :)

Would someone mind copying and pasting the parts of the output that were the key factors in finding the problem?

Can you show me what you read that uncovered the problem?

Bobby, do you know why they would be using a directed broadcast to the same subnet that they are attached to? I would understand it if they were directed-broadcasting to a different subnet. I profess to not knowing how the Ghost application actually works when transmitting data; I just wonder why they would be doing a directed broadcast to their own subnet.
