Re: Accuracy of Mail Flow Monitor's numbers?

mgraci_ironport · ‎02-28-2006

I am running 4.5.6.

The MFM numbers seem to be high (even though they are now a little lower since my upgrade from 4.5.5 to 4.5.6).

If I click on a domain listed in the detail of MFM I notice that the detail numbers are not the same as what is on the first page.

For example, domain abc.com is listed on the MFM screen as:
22 Attempted
15 stopped by Rep
7 clean.

When I click on the domain the domain page reports:
7 received
5 rejected

Note these are both for the past hour.

I also noticed the numbers in MFM do not correlate with the numbers for the same period in MFC.

What gives?

I need to give some numbers to management and don't want to overstate the numbers.

shannon.hagan · ‎02-28-2006

A multiplier is used on the overview page for the number of messages blocked by reputation. From the IronPort manual:
"Notes on Counting Messages in Mail Flow Monitor
The method Mail Flow Monitor uses to count incoming mail depends on the number of recipients per message. For example, an incoming message from example.com sent to three recipients would count as three messages coming from that sender.

Because messages blocked by reputation filtering do not actually enter the work queue, the appliance does not have access to the list of recipients for an incoming message. In this case, a multiplier is used to estimate the number of recipients. This hard-coded multiplier was determined by IronPort Systems, Inc. and based upon research of a large sampling of existing customer data."
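The counting rule quoted above can be sketched in a few lines of Python. This is only an illustration: the function name, its arguments, and the default multiplier of 3 (the value an IronPort employee gives later in this thread) are assumptions, not the appliance's actual code.

```python
def mfm_message_count(recipients_per_accepted_message, rejected_connections, multiplier=3):
    """Estimate a Mail Flow Monitor style count of incoming messages.

    Accepted messages count once per recipient. Connections blocked by
    reputation never reveal their recipients, so each one is multiplied
    by a fixed estimate. All names here are hypothetical.
    """
    accepted = sum(recipients_per_accepted_message)
    estimated_blocked = rejected_connections * multiplier
    return accepted, estimated_blocked, accepted + estimated_blocked


# One accepted message sent to three recipients, plus two blocked connections.
result = mfm_message_count([3], rejected_connections=2)
print(result)
```

The accepted figure is an actual count, while the blocked figure is only as good as the multiplier.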

There seem to be discrepancies in the reporting depending on which report you use. Some of this appears to be due to "messages" that are counted in some reports but not in others (for example: system alerts, system reports, undeliverable notifications), and some to the multiplier that IronPort uses on some of the reports.

Also, I would open a case with IronPort if you are having trouble with the reporting.

tminchin_ironport · ‎03-01-2006

Supposedly the Stopped by Reputation Filter has been wrong since December last year.

Though I doubt you could ever really call it accurate given the fudge factor...

mgraci_ironport · ‎03-01-2006

So IronPort is purposely skewing the numbers to make us think their device is more important and more effective than it really is?

I would just like fairly accurate numbers that don't require me to manually create reports and SQL statements....is that too much to ask?

FYI, I use MFC, but it is useless when it comes to summary data.

-Matt

randwacker_ironport · ‎03-01-2006

Hey all, I'm in charge of Reporting here at IronPort and I wanted to try and answer a couple of questions that came up in this thread (last question first):

Regarding the multiplier for "Stopped by Reputation Filter", it is currently set to 3 recipient-messages for every connection rejected. This number was determined based on an analysis of multiple enterprise log files where we found that each blocked connection resulted in an average of 1.7 SMTP DATA commands, each with an average of 1.6 SMTP RCPT commands. 1.7 x 1.6 = 2.72, but we had to round the result to avoid floating point math. We've had requests from customers for the ability to change this multiplier (both up and down), and so we're making that possible in our upcoming 4.6.0 release (in final qualification now). I'm unaware of any accounts that "Stopped by Reputation Filter has been wrong since December last year", but if you have specific questions please feel free to contact me or Customer Support.
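The arithmetic above is easy to check. The sketch below also applies the resulting multiplier to the abc.com figures from the first post. Treating the domain page's "5 rejected" as raw connections is an assumption that nobody in the thread confirms, so read the reconciliation as one possible explanation only. Variable names are made up for illustration.

```python
# Averages quoted in the post above.
data_per_blocked_connection = 1.7
rcpt_per_data = 1.6

raw_estimate = data_per_blocked_connection * rcpt_per_data  # about 2.72
multiplier = round(raw_estimate)                            # whole number used on the overview page

# Figures from the first post (domain page for abc.com).
# ASSUMPTION: "rejected" here means raw rejected connections.
rejected_connections = 5
received = 7

stopped_by_reputation = rejected_connections * multiplier
attempted = stopped_by_reputation + received

print(multiplier, stopped_by_reputation, attempted)
```

Under that assumption the overview figures of 15 stopped by reputation and 22 attempted fall out of the domain page's 5 rejected and 7 received.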

Regarding the numbers presented in MFC vs MFM, they currently /are/ different because they are reporting on different things. MFC is counting SMTP DATA commands as a message, while MFM is counting SMTP RCPT commands. We made this change because a number of customers wanted our reporting to more accurately match the accounting for "items" that were going into their backend Exchange and Notes servers. You can retrieve counts of SMTP DATA commands from MFM through the reporting API, but because it's possible for a single SMTP message to be split and modified in many different ways, our counters for spam and virus detected are all recipient based.
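To make the DATA versus RCPT distinction concrete, here is a minimal sketch that counts both commands in a toy SMTP session. It assumes the server accepts every command, which real logs would not guarantee, and the function name and session contents are invented for illustration.

```python
def count_smtp_commands(commands):
    """Count DATA and RCPT commands in a list of SMTP command lines.

    ASSUMPTION: every command in the list was accepted by the server.
    """
    data_count = 0
    rcpt_count = 0
    for line in commands:
        verb = line.strip().upper()
        if verb.startswith("RCPT TO:"):
            rcpt_count += 1
        elif verb == "DATA":
            data_count += 1
    return data_count, rcpt_count


# One message addressed to three recipients.
session = [
    "MAIL FROM:<sender@example.com>",
    "RCPT TO:<alice@example.org>",
    "RCPT TO:<bob@example.org>",
    "RCPT TO:<carol@example.org>",
    "DATA",
]

data_based, rcpt_based = count_smtp_commands(session)
print(data_based, rcpt_based)
```

The same session is one message when counted by DATA and three when counted by RCPT, which is why the two tools disagree on identical traffic.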

And finally, for the possible discrepancies in the top-level MFM report vs the detailed domain report, these numbers should correlate, so there may be an oddity on your box, or a bug somewhere that we need to fix. Please open up a ticket with Customer Support so we can retrieve the necessary data and figure out what's going on.

Erich_ironport · ‎03-02-2006

When we did a detailed analysis of the average number of recipients per email, our data averaged to 2.86. This was looking at millions of emails per day for a week, so the default of 3 works for me, although I can see where it would be nice to be able to adjust it. I would love to see a statistic for average recipients per LDAP Invalid Recipients email. From everything I can tell, spam has a higher recipients-per-email rate than valid email does, and this may be a more accurate way to see what the multiplier should be in your own environment. I’m guessing ours would be closer to 4 or 5 average recipients per spam message.
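A per-site figure like the one described above is just a mean over recipient counts, computed separately for each class of mail. The sketch below assumes you can already extract a recipient count per message from your own logs. The sample lists are invented placeholders, not measurements from any site.

```python
def average_recipients(recipient_counts):
    """Mean recipients per message; returns None for an empty sample."""
    if not recipient_counts:
        return None
    return sum(recipient_counts) / len(recipient_counts)


# Placeholder samples, NOT real data: one recipient count per message.
valid_mail = [1, 1, 2, 1, 3]
spam = [4, 6, 3, 5, 7]

valid_avg = average_recipients(valid_mail)
spam_avg = average_recipients(spam)
print(valid_avg, spam_avg)
```

If the two averages differ a lot at your site, the spam average is arguably the better basis for a multiplier, since blocked connections are mostly spam.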

Also, I agree with counting recipients rather than messages. It correlates more accurately with the resultant load on our Exchange systems, though I do find myself having to explain the difference between message counts and recipient counts. Management and the Exchange guys think in terms of how many emails land in each unique mailbox. With 90K+ mailboxes, single-instance storage in Exchange databases doesn't buy us enough to even consider it a benefit, so for us it is always counted as total emails in every mailbox.

Donald Nash · ‎03-02-2006

Regarding the multiplier for "Stopped by Reputation Filter", it is currently set to 3 recipient-messages for every connection rejected.  This number was determined based on an analysis of multiple enterprise log files...

I did a similar analysis here last year and came up with 2.2. Like MFM, I needed a way to quantify how much traffic we were eliminating by rejecting connections so I could give a "we got rid of this much spam" number to the suits. I only applied this number to outright rejected connections. Connections subjected to throttling had every rejected RCPT TO: counted in the numbers I gave to the suits. In retrospect, this was probably not a good idea since some spamware hammers over and over with the same recipient when it gets a 4xx in response to RCPT TO:.
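One way to avoid the over-counting described above is to count each client and recipient pair only once. This is a sketch of that idea, not the method actually used in the post. The event format (client IP, recipient address) and the function name are assumptions.

```python
def count_rejected_recipients(events):
    """Return (raw, unique) counts of rejected RCPT TO events.

    Each event is a hypothetical (client_ip, recipient) tuple. Retries of
    the same recipient from the same client collapse into one unique entry.
    """
    raw = len(events)
    unique = len(set(events))
    return raw, unique


events = [
    ("192.0.2.10", "user@example.org"),
    ("192.0.2.10", "user@example.org"),   # retry after a 4xx
    ("192.0.2.10", "user@example.org"),   # another retry
    ("192.0.2.11", "other@example.org"),
]

raw, unique = count_rejected_recipients(events)
print(raw, unique)
```

The unique count understates traffic from a client that legitimately resends to the same address, so it is better read as a lower bound than as an exact figure.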

I wonder how many other people have re-invented this same wheel? I wonder how close we all got to each other?

randwacker_ironport · ‎03-06-2006

I wonder how many other people have re-invented this same wheel?  I wonder how close we all got to each other?

It's encouraging that we're all going in the same direction though. :wink:

some spamware hammers over and over with the same recipient when it gets a 4xx in response to RCPT TO:

We're seeing some odd new behavior where some spamware will also try to reconnect many many times after a TCP Refuse (like we do when blocking connections). Some sites have seen 60 retries in a minute, which has caused their Rejected Connections number to get huge. Has anyone found a good way to explain this kind of anomaly in their graphs?
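For graphing purposes, one option is to collapse rapid reconnects from the same address into a single attempt. The sketch below assumes connection attempts are available as (timestamp in seconds, client IP) pairs and uses an arbitrary 60-second window. Neither the format nor the window comes from the appliance.

```python
def collapse_retries(attempts, window=60):
    """Count connection attempts, treating rapid retries as one attempt.

    An attempt is counted only if the same IP has not been seen within
    the previous `window` seconds. Input format is hypothetical.
    """
    last_seen = {}
    counted = 0
    for timestamp, ip in sorted(attempts):
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous > window:
            counted += 1
        last_seen[ip] = timestamp
    return counted


# One client retrying once a second for a minute, then again much later,
# plus a single attempt from a second client.
attempts = [(t, "192.0.2.10") for t in range(60)]
attempts.append((5, "192.0.2.11"))
attempts.append((300, "192.0.2.10"))

raw_count = len(attempts)
collapsed_count = collapse_retries(attempts)
print(raw_count, collapsed_count)
```

Plotting both series side by side shows how much of a spike in Rejected Connections comes from a few clients hammering rather than from more senders.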

Donald Nash · ‎03-06-2006

We're seeing some odd new behavior where some spamware will also try to reconnect many many times after a TCP Refuse (like we do when blocking connections).  Some sites have seen 60 retries in a minute, which has caused their Rejected Connections number to get huge.  Has anyone found a good way to explain this kind of anomaly in their graphs?

I think it's the same behavior as hammering over and over after a 4xx response to RCPT TO. My belief is that it's just poorly written spam code. We tend to think of connection and recipient rejection primarily as ways of thwarting spammers, but that's overlooking our history. True temporary problems such as resource unavailability, a reboot in progress, and other "accidentals" can cause these symptoms as well. Programmers who write spamware are only interested in blasting out as much mail as possible, and don't care about behaving themselves (why should they, when they know that they're already behaving badly just by sending spam?). So if your only goal is to deliver as much mail as you can without regard to good behavior, and you have the sort of resources that spammers have (large botnets), then the most logical way to handle temporary errors of any sort is retrying as aggressively as you can.

While it's tempting to think of spammers as vindictive people who want to punish the servers that reject them (that's certainly what it feels like, and there have been documented incidents of this), we need to remember that most of them are just in it for the money, and vengeance doesn't make money. Remember Hanlon's Razor.

It's this sort of behavior that makes me want a "BLACKHOLE" behavior available in the HAT. If the response to a TCP refusal is an immediate retry, then let's just throw the SYN on the floor and let the unscrupulous client time out. It won't stop him, but it should slow him down. It is still possible to retry this failure mode aggressively, but it's more work because you have to keep lots of pending connections open, and I'm not sure the spammers are doing this yet. If they thought that server unreachability was a temporary error worth dealing with aggressively, I'd bet money that they would.

bfayne_ironport · ‎03-06-2006

Regarding the multiplier for "Stopped by Reputation Filter", it is currently set to 3 recipient-messages for every connection rejected.  This number was determined based on an analysis of multiple enterprise log files

I am not an enterprise and my average recipients per connection is even lower than what others have reported here in this forum.

I'm glad that 4.6.0 will allow that multiplier to be changed. The current setting seems to be less of a middle ground than a number that no one really agrees with.