I have followed the example given in the Cisco document below to allow multicasting within the same VLAN but across different switches.
To summarize what I did: I added the ip pim sparse-mode statement under the VLAN interface where the multicast server (Symantec Ghost) sits, and I was able to push images to the clients, but the core switch (6500 Sup720, IOS) CPU shot up to 90% and only returned to normal after the image was pushed.
I haven't configured any RPs or multicast routing. There is only one core switch, connected to several 2950 switches (single uplinks to the core).
Is this behaviour normal?
Any help is appreciated.
Can you do a 'show proc cpu' and see if you can identify the process that causes the high CPU? I vaguely remember a similar situation where the culprit in the end turned out to be the Ghost server (its configuration).
Can you try configuring 'ip mroute-cache' on the VLAN interface where you have the 'sparse-mode' statement configured?
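For reference, a minimal sketch of what that interface configuration would look like. The VLAN number and addressing here are assumptions; substitute your own:

```
interface Vlan99
 ip address 192.168.99.1 255.255.255.0
 ip pim sparse-mode
 ip mroute-cache
```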
Thanks for the reply. I added ip mroute-cache under the same VLAN interface where PIM is defined, but got almost the same result. Below is a sample of show proc cpu; the IP Input process had the highest CPU utilization.
Core#sho proc cpu | exclude 0.00
CPU utilization for five seconds: 93%/84%; one minute: 91%; five minutes: 51%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
4 58392300 4538907 12864 0.47% 0.21% 0.19% 0 Check heaps
105 20811508 153969576 135 7.67% 6.22% 3.41% 0 IP Input
106 4736216 22356527 211 0.07% 0.09% 0.05% 0 CDP Protocol
134 12296 6023371 2 0.07% 0.03% 0.01% 0 Adj Manager
135 8121008 40789586 199 0.15% 0.11% 0.07% 0 CEF process
208 64698260 600222526 107 0.23% 0.20% 0.22% 0 Port manager per
Any advice is appreciated.
I was wondering whether changing from PIM sparse mode to PIM sparse-dense mode would affect the situation?
Since you have not configured RPs or mcast routing, changing the PIM mode is not going to help.
I have a strong suspicion of what the problem is, though.
Some versions of the Ghost server do a 'TTL discovery' routine that, if configured in a certain way, will cause some or all of the multicast traffic to hit the router with an expiring TTL.
IP packets with an expiring TTL can't be hardware switched (the router has to burn CPU cycles to generate the ICMP 'time exceeded' packet).
Ghost clients locate the Ghost server using an expanding TTL search: a client first sends out a request with TTL=1, then TTL=2, and so on, until it gets a response from the server. The server then sends image data with a TTL equal to the largest TTL value received from a client.
If the PC and the server are on the same segment, the server will send all image data with TTL=1, which the router sees as an expiring packet and punts to the CPU. In a lot of cases, the result is a high-CPU condition on the Cisco device.
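The client side of that expanding-ring search can be sketched roughly like this. This is a hypothetical illustration of the technique, not Ghost's actual code; the probe payload and the idea of reusing the winning TTL are assumptions based on the description above:

```python
import socket

def ttl_schedule(max_hops=16):
    """TTL values the client tries in turn: 1, 2, ..., max_hops."""
    return list(range(1, max_hops + 1))

def discover(group, port, timeout=1.0, max_hops=16):
    """Send probes with increasing TTL until the server answers.

    The TTL that finally elicits a reply is what the server reuses
    for the image data stream -- so if client and server share a
    segment, that TTL is 1 and the router punts every data packet.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    for ttl in ttl_schedule(max_hops):
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(b"DISCOVER", (group, port))
        try:
            _, server = sock.recvfrom(1024)
            return server, ttl
        except socket.timeout:
            continue
    return None, None
```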
Take a look at 'sho ip traffic' and see what 'bad hop count' says. If it's high (and increments while the multicast traffic is running), this is probably our issue.
The best way around this is to disable TTL discovery, or to manually set the TTL threshold on the server to 7 or some other value greater than 1.
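To check, commands along these lines should do it (treat these as a sketch; exact field names and keywords vary by IOS version):

```
Core#show ip traffic | include bad hop
Core#show processes cpu sorted | exclude 0.00
```

Run the first one twice while an imaging session is active; if the bad hop count climbs between runs, the expiring-TTL theory fits.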
I tried to check for discovery settings in the Ghost application but was not successful. I only found an option that specifies the number of router hops, with a maximum of 16. I will reduce it to 1, or might even try zero, but I haven't had time to do it yet.
We will be upgrading to a unicast-based ghosting application. Any advice on how to measure the safe number of clients to be imaged simultaneously?
Thanks a lot.
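The sizing question never gets answered in this thread, but a rough back-of-envelope sketch: with unicast, each client receives its own copy of the image stream, so the server's uplink is usually the first bottleneck. All numbers below are illustrative assumptions, not measurements:

```python
def max_clients(uplink_mbps, per_client_mbps, headroom=0.7):
    """Rough ceiling on simultaneous unicast imaging clients.

    headroom leaves a fraction of the uplink free for other traffic.
    Illustrative arithmetic only -- measure your real per-client rate.
    """
    return int((uplink_mbps * headroom) // per_client_mbps)

# e.g. a 1 Gb/s server uplink, ~80 Mb/s of image data per client:
print(max_clients(1000, 80))  # -> 8
```

In practice you would also want to check disk throughput on the server and the access-layer uplinks, since any of them can become the limit before the server NIC does.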
Did you ever get this resolved? We are seeing the exact same thing on a Sup720 with a ghosting app, where it buries the CPU with "IP Input" as the cause of the CPU utilization. It took me the better part of a month to try and track this down; I finally was able to dump the input queue buffer and find out who was doing this. What I see in the buffer is one address, we'll say 192.168.99.2, sending repeated packets to 192.168.99.255, all going to port 7777. We tracked them down, and apparently they are ghosting something out to about 40 PCs at once. Any way to eliminate or lessen the CPU utilization? Below is what we see; I didn't include all the packet information.
Buffer information for Big buffer at 0x44140798
data_area 0x83BCEC4, refcount 1, next 0x501F5AC0, flags 0x280
linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1
if_input 0x51265070 (Vlan99), if_output 0x0 (None)
inputtime 8w1d (elapsed 02:09:42.424)
outputtime 00:00:00.000 (elapsed never), oqnumber 65535
datagramstart 0x83BCF3A, datagramsize 1486, maximum size 1740
mac_start 0x83BCF3A, addr_start 0x83BCF3A, info_start 0x0
network_start 0x83BCF48, transport_start 0x83BCF5C, caller_pc 0x40335B54
source: 192.168.99.2, destination: 192.168.99.255, id: 0x9380, ttl: 127,
TOS: 0 prot: 17, source port 4032, destination port 7777
From the output you provided, I don't see anything that would cause it to be punted to the CPU. Can you post a "show ip traffic"?
The problem isn't currently happening, but I'll post this anyway. At the bottom is a "show int" for the problem VLAN 99; you can see a large number of drops and flushes, and this is just over a day since the counters were cleared.
IP statistics:
Rcvd: 2935904773 total, 142741780 local destination
0 format errors, 0 checksum errors, 189568 bad hop count
0 unknown protocol, 191485 not a gateway
0 security failures, 0 bad options, 0 with options
Opts: 0 end, 0 nop, 0 basic security, 0 loose source route
0 timestamp, 0 extended security, 0 record route
0 stream ID, 0 strict source route, 0 alert, 0 cipso, 0 ump
Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
0 fragmented, 0 couldn't fragment
Bcast: 94959580 received, 347822 sent
Mcast: 28514452 received, 28197293 sent
Sent: 48988279 generated, 2143035604 forwarded
Drop: 10052992 encapsulation failed, 0 unresolved, 0 no adjacency
70 no route, 0 unicast RPF, 0 forced drop
Drop: 10 packets with source IP address zero
ICMP statistics:
Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 47 unreachable
1849208 echo, 9298 echo reply, 21 mask requests, 3 mask replies, 0 quench
0 parameter, 0 timestamp, 0 info request, 0 other
224 irdp solicitations, 0 irdp advertisements
Sent: 2 redirects, 1140 unreachable, 9164 echo, 1849205 echo reply
0 mask requests, 0 mask replies, 0 quench, 0 timestamp
0 info reply, 2531 time exceeded, 0 parameter problem
0 irdp solicitations, 0 irdp advertisements
UDP statistics:
Rcvd: 139364551 total, 0 checksum errors, 94172254 no port
Sent: 45638633 total, 0 forwarded broadcasts
TCP statistics:
Rcvd: 171888 total, 0 checksum errors, 3807 no port
Sent: 180304 total
Probe statistics:
Rcvd: 0 address requests, 0 address replies
0 proxy name requests, 0 where-is requests, 0 other
Sent: 0 address requests, 0 address replies (0 proxy)
0 proxy name replies, 0 where-is replies
PIMv2 statistics: Sent/Received
Total: 0/0, 0 checksum errors, 0 format errors
Registers: 0/0, Register Stops: 0/0, Hellos: 0/0
Join/Prunes: 0/0, Asserts: 0/0, grafts: 0/0
Bootstraps: 0/0, Candidate_RP_Advertisements: 0/0
IGMP statistics: Sent/Received
Total: 0/0, Format errors: 0/0, Checksum errors: 0/0
Host Queries: 0/0, Host Reports: 0/0, Host Leaves: 0/0
DVMRP: 0/0, PIM: 0/0
BGP statistics:
Rcvd: 0 total, 0 opens, 0 notifications, 0 updates
0 keepalives, 0 route-refresh, 0 unrecognized
Sent: 0 total, 0 opens, 0 notifications, 0 updates
0 keepalives, 0 route-refresh
Rcvd: 0 total, 0 format errors, 0 checksum errors, 0 no listener
Sent: 0 total
OSPF statistics:
Rcvd: 1674991 total, 0 checksum errors
1018027 hello, 18 database desc, 5 link state req
257102 link state updates, 71357 link state acks
Sent: 1307346 total
Rcvd: 0 total
Sent: 0 total
ARP statistics:
Rcvd: 35269014 requests, 89297 replies, 326 reverse, 0 other
Sent: 7386077 requests, 8471810 replies (6887812 proxy), 0 reverse
Drop due to input queue full: 1017
Vlan99 is up, line protocol is up
Hardware is EtherSVI, address is 0011.5df3.b800 (bia 0011.5df3.b800)
Internet address is XXX.XXX.XXX.XXX/24
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:01, output 00:00:19, output hang never
Last clearing of "show interface" counters 1d02h
Input queue: 0/75/6175308/6173969 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Hi Glen, thanks for the output. I wanted to verify that a TTL of 1 was not the problem, even though the sample packet showed 127. The number of "bad hop count" packets is low enough that it's not the issue here.
Now, the destination IP was a directed broadcast, so I'm assuming that the ghosting is using directed-broadcast mode instead of multicast. Since the VLAN interface is part of the subnet that the directed broadcast is aimed at, that traffic will hit the CPU. Since the high CPU correlates with the ghosting, I have a strong feeling that this is the culprit.
Can your server folks use multicast instead? If so, make sure they set the TTL to something sufficiently higher than 1, to prevent another high-CPU issue with TTL expiry as the cause ;-)
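If you want hard confirmation before leaning on the server folks, one non-disruptive sketch is a counting ACL on the VLAN interface that matches the port-7777 broadcast stream while permitting everything else. The numbers here come from the buffer dump above; verify them against your own subnet before applying:

```
access-list 177 permit udp any host 192.168.99.255 eq 7777
access-list 177 permit ip any any
!
interface Vlan99
 ip access-group 177 in
```

Then watch the match counters with 'show access-lists 177' during an imaging run; if the first entry's counter climbs in step with the CPU, the ghosting traffic is confirmed as the source.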
Thanks for the reply. After reading your description, what I have seen is exactly what is happening: they are using directed broadcast, and this also correlates to the rapidly increasing drops and flushes whenever this is happening. Amazing that one user can bury a Sup720 CPU like that. Thanks again for the reply.
This thread is a very interesting read. Thanks much for the continued responses.
My problem is that I've never done this type of in-depth searching for a "culprit." :)
Would someone mind copying and pasting the parts of the output that were the key factors in finding the problem?
Show me what you read that uncovered the problem?
Bobby, do you know why they would be using a directed broadcast to the same subnet that they are attached to? I would understand it if they were directed-broadcasting to a different subnet. I profess to not knowing how the Ghost application actually works when transmitting information; I just wonder why they would be doing a directed broadcast to their own subnet.