We've been struggling with Mars performance for quite a while now.
It's to the point now where strategically we're thinking we may have
to migrate away from this solution. I don't just mean the slow GUI...I
mean issues with dropping events and high pnparser and java processor
utilization. pnparser frequently restarts. I'm curious if anyone else
is in a situation similar to ours. We have a 200 (4.3.1) that is
processing about 90-100 million events per day. We don't collect
netflow and we don't use SNMP. We collect about 80-90 million events
per day from a single checkpoint management server. The next busiest
devices are our domain controllers and they are a distant 2nd.
1) Does anyone else have a MARS 100 or 200 seeing this many events per
second (theoretically, even the 100 should be able to handle this
load)? If so, what is your typical pnparser processor and memory
utilization (sysstatus)? Does pnparser constantly get restarted (i.e.
2) Does anyone have a checkpoint environment that processes this many
events per day? If so, do you collect them from a single management/
3) I consider our implementation to be "ideal" from a performance
standpoint. What I mean by that is since we don't do netflow or SNMP,
the opportunity to reach the marketed EPS is greatest. Unfortunately,
I don't believe the 200 is capable of even half its rated capacity
even in this situation. Can anyone speak to their experience along
these lines (MARS rated capacity versus real capacity)?
I agree with your sentiments about the MARS box. We too had issues with our MARS 200GC dropping data. We were sending Netflows to the box, though, which could have caused some issues. All in all, I think the MARS is an expensive piece of crap. The GUI is too slow and clunky to get things done efficiently, it needs a more robust CLI, the reporting is too rigid, it is practically impossible to get it working on the first try, every time it is a one-off, etc. What other SIM-type devices are you considering?
Well, we came from a Netforensics environment which was pricey and not very stable either (the developer support was incredible though...you could actually TALK to one and get things done). There are a lot of new options in the SIM space that didn't exist a couple years ago. Frankly, I'm about ready to throw away any device in my environment that has a Cisco label on it and go completely open source --lucky for Cisco I don't manage the network;-). If one of the open-source SIMs doesn't meet our needs we may just build our own.
Not counting netflow events I have around 600 EPS reported on a MARS 50 and I must add I'm a bit disappointed in the MARS performance-wise.
GUI is slow despite 10-15% idle CPU and if I add netflow from two more edge routers the whole system becomes almost unusable.
I'm curious if anybody contributing to this thread is using the ethernet1 interface on the MARS box. It is purported to have more RAM and will produce a faster refresh of the screen. I can attest to its usefulness so I just wanted to separate this parameter from the normal slowness of the GUI. Most installations I have come across don't use it because it requires that a static route be added to the MARS box from the command line.
That's an interesting point. I do use eth1, but as the log-gathering interface, and eth0 is my management interface. I believe this is how the manual suggested it be set up.
Would doing it the other way around increase performance, though? That's something I think I'll try.
So a question is, does the BU have a tool to measure the EPS and to generate a large number of EPS to test? That would be pretty handy.
I use the second interface just for logging in and viewing the GUI and the GUI is still painfully slow. I've opened TAC case after TAC case after TAC case. What has gotten fixed? Nothing. Enough yelling on my part got a temporary patch for the raw message retrieval, and that's it.
I'm with the guy who said he's ready to ditch the Cisco equipment. Any suggestions on a MARS alternative?
The total events per day that you say your MARS200 is receiving works out to about 1k EPS on average (90-100 million events spread over 86,400 seconds), which is about 1/10th of the rated capacity. How are you measuring the dropped events? If your assessment of the performance your box is getting is correct, you should file a customer case asap. If you're not using eth1, I suggest using eth1 to receive events from monitored devices and eth0 exclusively for management purposes.
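For anyone who wants to check the estimate above, here is a minimal sketch of the arithmetic in Python. The 10,000 EPS rating is an assumption inferred from the "1/10th of the rated capacity" remark, not a figure quoted from a data sheet, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumption: rated capacity implied by "about 1/10th of the rated capacity".
ASSUMED_RATED_EPS = 10_000


def average_eps(events_per_day: float) -> float:
    """Return the average events per second for a daily event total."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Daily totals reported in the original post (90-100 million events/day).
    for daily_total in (90_000_000, 100_000_000):
        eps = average_eps(daily_total)
        share = eps / ASSUMED_RATED_EPS
        print(f"{daily_total:,} events/day -> {eps:,.0f} EPS average "
              f"({share:.0%} of the assumed rating)")
```

Keep in mind this is only a daily average. Firewall logs tend to be bursty, so peak EPS during business hours could be several times higher, and the average alone says nothing about whether events are dropped at peak.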
There is an event/incident generated when Mars drops events (supposedly, although I suspect that the frequent pnparser crashes also result in "dropped events"). I have a TAC case open, multiple in fact. Cisco does not appear to have an adequate way to measure peak EPS, although I'm quite sure it's nowhere near the rated capacity anyway.
re: utilizing the second interface. Cisco TAC mentioned this in a prior case. I basically told them to show me the money. eth0 is a gig interface with utilization so low that it can hardly be graphed...help me understand why utilizing a second interface would help? Are they different hardware? different drivers? I may test using the second interface for management, but everything I know about the stack tells me it should not make a difference.
Keep up the good work and please report your findings. I give it a "5" for your tenacity and your desire to get to the bottom of this issue.
Thanks Paul. We haven't given up yet;-) I've got our network group running cables right now. If the second interface helps, I'll be sure to reply as such.
Hi, can you email your case information to firstname.lastname@example.org and we'll look into this. It could be that you may be hitting a known issue but we'll need to look at this further. Please email me directly and I'll be glad to have this looked into.