Want Perl script to sort mail_logs

grant · ‎05-29-2007

I would like a Perl script that aggregates all log lines associated with a given message set from reception to end. It would be agonizingly slow for me to tackle the job. Does anyone out there have such a beast? I am looking for something similar to the qmLogsort script for qmail logs.
Any help really appreciated.
--
Grant Basham

mark_ironport · ‎05-30-2007

I recently wrote a Python script that would:

1. Display matching MIDs based on envelope To, envelope From or Subject
2. Given a MID, display the log lines matching that MID with backtracking to get ICID and DCID log lines and following splintered (split) message as well

I am awaiting time from our QA team to verify the functioning of this script. If this would be of general interest prior to formal QA, I could make it available to the IronPort Nation team for "real world" testing and to give us feedback.
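For anyone who wants to experiment before the script is available, step 1 can be approximated in a few lines of Python. This is a minimal sketch, not the script described above: the name `find_mids` is made up, and the line shapes are assumptions modelled on the tracker sample later in this thread, with the address or subject assumed to follow `From:`, `To:` or `Subject`.

```python
import re

# Assumed line shapes (hypothetical, based on the sample output in this thread):
#   "... Info: MID 21 ICID 15 From: <addr>"
#   "... Info: MID 21 ICID 15 RID 0 To: <addr>"
#   "... Info: MID 21 Subject 'text'"
FIELD_PATTERNS = {
    "from": re.compile(r"\bMID (\d+) ICID \d+ From: (.*)$"),
    "to": re.compile(r"\bMID (\d+) ICID \d+ RID \d+ To: (.*)$"),
    "subject": re.compile(r"\bMID (\d+) Subject (.*)$"),
}


def find_mids(lines, field, pattern):
    """Return MIDs, in first-seen order, whose `field` value matches `pattern`."""
    wanted = re.compile(pattern)
    field_re = FIELD_PATTERNS[field]
    mids = []
    for line in lines:
        m = field_re.search(line.rstrip("\n"))
        if m and wanted.search(m.group(2)):
            mid = int(m.group(1))
            if mid not in mids:
                mids.append(mid)
    return mids
```

Called as `find_mids(open("mail_logs"), "from", "mark")`, it would return the candidate MIDs to feed into step 2.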

usommer_ironport · ‎06-01-2007

It's already available on the support webpage:
"spamtowho.exe" will do the job.

chhaag · ‎06-01-2007

If I understood the initial request here -- I think spamtowho will not suffice. Spamtowho provides summary data about what types of messages are flowing through the appliance.

To determine everything that happened to an individual message you need the script Mark is talking about. Stay tuned -- we'll get something posted soon.

In the interim, you can use the on box grep to get this done -- albeit in several steps. Instructions are here:
http://tinyurl.com/jb7z4
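
For logs that have been downloaded, the several grep passes can be approximated in one short script. This is only a sketch: the linked instructions are not reproduced here, so the two passes below (grep for the MID, then grep again for every ICID and DCID that turned up) are an assumption about the general approach, and `trace_mid` is a made-up name.

```python
import re


def trace_mid(lines, mid):
    """Return the lines for one MID plus the lines of any ICID/DCID it used."""
    lines = list(lines)
    mid_re = re.compile(r"\bMID %d\b" % mid)
    conn_re = re.compile(r"\b(ICID|DCID) (\d+)\b")

    # Pass 1: the equivalent of grepping for "MID <n>", noting connection IDs.
    conns = set()
    for line in lines:
        if mid_re.search(line):
            for kind, num in conn_re.findall(line):
                if num != "0":  # ICID 0 shows up on split messages in the sample
                    conns.add((kind, num))

    # Pass 2: the equivalent of grepping again for each ICID/DCID found.
    def wanted(line):
        if mid_re.search(line):
            return True
        return any(conn in conns for conn in conn_re.findall(line))

    return [line for line in lines if wanted(line)]
```

Like the manual approach, this also returns lines for other messages that shared the same ICID or DCID, so the output may need trimming by eye.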

cheers

grant · ‎06-01-2007

You are correct that I had hoped for more than the spamtowho summary, though it is a nice tool. I tried saving qmail-style mail logs and using the qmLogsort utility, but that did not give much more information than spamtowho. I started outlining a Perl script to do the job, but there are so many different structures possible that it will be a bear to get working. I have been downloading the mail_logs and using a standard Linux command-line grep, much like the URL above suggests. I expect I can do that as often as I need the output for years before I amortize the time needed for me to write the script. Maybe when I retire...

It would be nice if the IronPort could hang a unique tag on each entry in the full transaction set in the log; then it would be a trivial job just to sort by tag number. But I expect that it is as hard to do on the IronPort as it is to do in the downloaded logs.

mark_ironport · ‎06-07-2007

Sorry for the delay. Here is a link to the alpha code:

https://supportportal.ironport.com/irppcnctr/srvcd?u=http://secure-support.soma.ironport.com/subproducts/tools/dloads/mid_tracker.py&sid=900017

Please let me know what you think about this utility and whether it addresses your use case.

grant · ‎06-07-2007

Hi Mark.

A reasonable first pass. My desktop is linux, and I generally download the logs to my box and use grep or a "search in pager" on them. If I could push a new logfile thru something that groomed the log by grouping the events, be they incoming or old outgoing-from-queue, good messages and drop/rejects, the information I was looking for would all be together when I hit my target. I could search by to/from and also IP of sending host.

1) In the best of all possible worlds, I am looking for something that would take the whole log as input and output all non-Status log lines grouped by ICID/MESSAGE/DCID. I would include Warning level entries, even if they generated IP or ICID only information.

2) I am using RedHat v5 with python 2.4. If I do -m MID, I get all records for a complex, split message with many recipients. With the split, if I came in with a downstream MID generated by the split, I did not see the upstream ICID/MID. If I start with the initial MID, I see everything. Actually quite an accomplishment.

3) Output of -t TO-ADDRESS or -f FROM-ADDRESS is a grep-like output with a single line for each message for the to address, showing an MID that could be used for further work. Seeing "New SMTP ICID, To:, From:" for each hit would be needed if I were chasing messages with the tool.
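
The grouping asked for in point 1 could look roughly like the sketch below. It is an illustration under assumptions, not an existing tool: `group_by_icid` is a made-up name, Status lines are assumed to contain " Status: ", and the `Start MID`, `was split creating MID` and `ICID 0` shapes are taken from the tracker sample later in this thread. Lines that carry only a DCID are set aside rather than attached to an ICID.

```python
import re

START_RE = re.compile(r"\bStart MID (\d+) ICID (\d+)")
SPLIT_RE = re.compile(r"\bMID (\d+) was split creating MID (\d+)")
ICID_RE = re.compile(r"\bICID (\d+)")
MID_RE = re.compile(r"\bMID (\d+)")


def group_by_icid(lines):
    """Group non-Status lines under the ICID that injected each message."""
    mid_to_icid = {}
    groups = {}        # ICID -> lines, in order of first appearance
    unassigned = []    # e.g. DCID-only lines with no MID on them
    for line in lines:
        if " Status: " in line:  # assumed marker for Status lines
            continue
        start = START_RE.search(line)
        if start:
            mid_to_icid[start.group(1)] = start.group(2)
        split = SPLIT_RE.search(line)
        if split and split.group(1) in mid_to_icid:
            # A split child inherits the ICID of the message it came from.
            mid_to_icid[split.group(2)] = mid_to_icid[split.group(1)]
        mid = MID_RE.search(line)
        icid = ICID_RE.search(line)
        if mid and mid.group(1) in mid_to_icid:
            key = mid_to_icid[mid.group(1)]
        elif icid and icid.group(1) != "0":
            key = icid.group(1)
        else:
            unassigned.append(line)
            continue
        groups.setdefault(key, []).append(line)
    return groups, unassigned
```

Printing each group in turn gives every incoming connection followed by the messages it produced, including their delivery lines.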

Again, a useful first pass. Thanks for the response.

msblack_ironport · ‎06-07-2007

Take a look at Splunk. They have a free version which will accommodate logs up to several gigabytes. The paid version is fairly inexpensive. We've just recently started to use Splunk so I can't give detailed feedback. Their website has a cool live demo.

http://www.splunk.com

bfayne_ironport · ‎06-07-2007

Sawmill has a product that looks good but I have yet to throw major amounts of logs at it.

http://www.sawmill.net/

mark_ironport · ‎06-12-2007

Grant,
Thank you for the feedback. The requirements for this code were a little different from your use case. For a future release we are thinking about adding a quick message tracker into the CLI (sample below). This would allow for searching and displaying the onbox logs with a better interface than grep.

Responding to your points:

1. That would require quite a bit more bookkeeping to backtrack, because an ICID can appear days before the DCIDs that actually deliver the message. In other words, there would have to be a lot of state tracking all MIDs to an ICID and then expiring the ICID information as all of the MIDs get delivered. That would be a bit of work to get efficient with memory usage, performance and just to get it "right". I'm not sure I have the time to work on this level of tool right now.

2. Not backtracking a split MID is by design. I wanted to always track from MID creation to delivery. Based on the CLI, we will not show split MIDs so this is not an issue. Besides, I don't consider split starts as interesting as the actual message being injected via an ICID. I just wanted to handle splits correctly in case someone wanted just that information. Oh, and because the split has the originating MID, it is easy to just query on that MID.

3. I actually had the tool automatically follow (via 2 passes) a To and From. I took it out since multiple hits would generate a lot of data. The intent was to help narrow down all the log entries for a single message.
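
To make the bookkeeping in point 1 concrete, here is a rough sketch of the state tracking and expiry involved. It is an illustration only, not code from the tool: `stream_groups` is a hypothetical name, the regexes assume the line shapes in the sample below, and a group is expired as soon as every MID seen on its ICID has logged "Message finished".

```python
import re

START_RE = re.compile(r"\bStart MID (\d+) ICID (\d+)")
SPLIT_RE = re.compile(r"\bMID (\d+) was split creating MID (\d+)")
DONE_RE = re.compile(r"\bMessage finished MID (\d+) done")
ICID_RE = re.compile(r"\bICID (\d+)")
MID_RE = re.compile(r"\bMID (\d+)")


def stream_groups(lines):
    """Yield (icid, lines) once every MID from that ICID has finished."""
    open_groups = {}   # ICID -> {"lines": [...], "pending": set of MIDs}
    mid_to_icid = {}
    for line in lines:
        start = START_RE.search(line)
        split = SPLIT_RE.search(line)
        new_mid = None
        if start:
            mid_to_icid[start.group(1)] = start.group(2)
            new_mid = start.group(1)
        elif split and split.group(1) in mid_to_icid:
            # A split child is tracked under its parent's ICID.
            mid_to_icid[split.group(2)] = mid_to_icid[split.group(1)]
            new_mid = split.group(2)
        mid = MID_RE.search(line)
        icid = ICID_RE.search(line)
        if mid and mid.group(1) in mid_to_icid:
            key = mid_to_icid[mid.group(1)]
        elif icid and icid.group(1) != "0":
            key = icid.group(1)
        else:
            continue  # nothing to attach this line to
        group = open_groups.setdefault(key, {"lines": [], "pending": set()})
        group["lines"].append(line)
        if new_mid:
            group["pending"].add(new_mid)
        done = DONE_RE.search(line)
        if done:
            group["pending"].discard(done.group(1))
            mid_to_icid.pop(done.group(1), None)
            if not group["pending"]:
                # Expire the ICID state so memory does not grow without bound.
                yield key, open_groups.pop(key)["lines"]
    # Anything still open was not fully delivered within this log.
    for key, group in open_groups.items():
        yield key, group["lines"]
```

Even this simplified version has gaps: lines that carry only a DCID are dropped, and any ICID line that arrives after its group has expired starts a second group under the same ICID. Closing those gaps is where most of the extra work would go.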

I agree that Splunk is a cool tool and could be used to harvest additional data very efficiently. There are also good use cases for the tool and CLI that I wrote.

Mark

-------


c600.mfg> tracker

Currently configured logs:
1. "mail_logs" Type: "IronPort Text Mail Logs" Retrieval: FTP Poll
Enter the number of the log you wish to use for message tracking.
[]> 1

1. Track by envelope FROM
2. Track by Message ID
3. Track by Subject
4. Track by envelope TO
[1]> 1

Enter the regular expression to search for.
[]> mark

1. MID 13 (Fri May 11 15:04:50 2007) 
2. MID 17 (Fri May 11 15:34:03 2007) 
3. MID 21 (Fri May 11 15:35:09 2007) 
[1]> 13

Fri May 11 15:35:09 2007 Info: New SMTP ICID 15 interface main (172.18.0.40) address 172.18.0.40 reverse dns host porky3.mfg verified yes
Fri May 11 15:35:09 2007 Info: ICID 15 ACCEPT SG None match ALL SBRS rfc1918
Fri May 11 15:35:09 2007 Info: Start MID 21 ICID 15
Fri May 11 15:35:09 2007 Info: MID 21 ICID 15 From: 
Fri May 11 15:35:09 2007 Info: MID 21 ICID 15 RID 0 To: 
Fri May 11 15:35:09 2007 Info: MID 21 ICID 15 RID 1 To: 
Fri May 11 15:35:09 2007 Info: MID 21 ICID 15 RID 2 To: 
Fri May 11 15:35:09 2007 Info: MID 21 Message-ID '<5v8i84>'
Fri May 11 15:35:09 2007 Info: MID 21 Subject 'This is a [foobar] test '
Fri May 11 15:35:09 2007 Info: MID 21 ready 509 bytes from 
Fri May 11 15:35:09 2007 Info: MID 21 was split creating MID 22 due to a per-recipient policy foo in the inbound table
Fri May 11 15:35:09 2007 Info: MID 22 ICID 0 From: 
Fri May 11 15:35:09 2007 Info: MID 22 ICID 0 RID 0 To: 
Fri May 11 15:35:09 2007 Info: MID 21 was split creating MID 23 due to a per-recipient policy bar in the inbound table
Fri May 11 15:35:09 2007 Info: MID 23 ICID 0 From: 
Fri May 11 15:35:09 2007 Info: MID 23 ICID 0 RID 0 To: 
Fri May 11 15:35:09 2007 Info: MID 21 was split creating MID 24 due to a per-recipient policy DEFAULT in the inbound table
Fri May 11 15:35:09 2007 Info: MID 24 ICID 0 From: 
Fri May 11 15:35:09 2007 Info: MID 24 ICID 0 RID 0 To: 
Fri May 11 15:35:09 2007 Info: Message finished MID 21 done
Fri May 11 15:35:09 2007 Info: MID 22 queued for delivery
Fri May 11 15:35:09 2007 Info: MID 23 queued for delivery
Fri May 11 15:35:09 2007 Info: MID 24 queued for delivery
Fri May 11 15:35:09 2007 Info: New SMTP DCID 16 interface 172.18.0.40 address 172.18.0.102 port 25
Fri May 11 15:35:09 2007 Info: Delivery start DCID 16 MID 22 to RID [0]
Fri May 11 15:35:09 2007 Info: Message done DCID 16 MID 22 to RID [0]
Fri May 11 15:35:09 2007 Info: MID 22 RID [0] Response 'sent'
Fri May 11 15:35:09 2007 Info: Message finished MID 22 done
Fri May 11 15:35:09 2007 Info: Delivery start DCID 16 MID 23 to RID [0]
Fri May 11 15:35:09 2007 Info: Message done DCID 16 MID 23 to RID [0]
Fri May 11 15:35:09 2007 Info: MID 23 RID [0] Response 'sent'
Fri May 11 15:35:09 2007 Info: Message finished MID 23 done
Fri May 11 15:35:09 2007 Info: Delivery start DCID 16 MID 24 to RID [0]
Fri May 11 15:35:09 2007 Info: Message done DCID 16 MID 24 to RID [0]
Fri May 11 15:35:09 2007 Info: MID 24 RID [0] Response 'sent'
Fri May 11 15:35:09 2007 Info: Message finished MID 24 done

grant · ‎06-12-2007

Many thanks for everyone's input. I am going to use the -msg-csv option from the IronPort "spamtowho" utility to get a pretty well groomed message-based view of the logs. It suits my management style. I have looked at everything else suggested; they all have their weak and strong points. Our IronPort engineer sent along a Perl implementation of the spamtowho utility that I could scavenge for code to do my own, but I think I am going to minimize my input at this point. I have what I need. Again, thanks to all.
--G

lucas.castro_ironport · ‎06-14-2007

Hi Mark,

It would be very nice to have this "tracker" script you've made incorporated into the system.
It would spare everyone a lot of grep work, hehe.

mark_ironport · ‎06-18-2007

Right now it is slated to be in the Luxor (AsyncOS 5.5 for email) release as a CLI only command. We will be looking into making it a GUI page as well in a future release.

ghoule_ironport · ‎06-22-2007

I'm currently using Splunk to aggregate and index syslog data from our mail servers. It makes message tracking, even through multiple mail servers, very easy.

It would be quite a bit easier if Ironport included the MID in every log entry for a particular message. If the MID was on each ICID and DCID line it would decrease the number of steps to watch a message enter and leave the ironport.

Donald Nash · ‎06-24-2007

If the MID was on each ICID and DCID line it would decrease the number of steps to watch a message enter and leave the ironport.

The MID can't be on each ICID line because the incoming connection is established before any messages are received. The MID isn't assigned until the client starts sending a message (the MAIL FROM command). It is possible to have an SMTP connection that doesn't result in any message being transmitted (a probe connection), or that results in multiple messages being transmitted. So there might not be any MID at all, or there might be multiple of them.

Likewise, a single delivery connection can send any number of messages, so which MID are you going to put on all the DCID lines?
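
The one-to-many relationship is easy to see in the tracker sample posted earlier in this thread. The snippet below is a small illustrative sketch (the function name `mids_per_connection` is made up) that maps each ICID and DCID to the MIDs seen on it, using four lines copied from that sample.

```python
import re
from collections import defaultdict


def mids_per_connection(lines):
    """Map each ICID and each DCID to the list of MIDs seen on it."""
    start_re = re.compile(r"\bStart MID (\d+) ICID (\d+)")
    deliver_re = re.compile(r"\bDelivery start DCID (\d+) MID (\d+)")
    icids, dcids = defaultdict(list), defaultdict(list)
    for line in lines:
        m = start_re.search(line)
        if m:
            icids[int(m.group(2))].append(int(m.group(1)))
        m = deliver_re.search(line)
        if m:
            dcids[int(m.group(1))].append(int(m.group(2)))
    return dict(icids), dict(dcids)


sample = [
    "Fri May 11 15:35:09 2007 Info: Start MID 21 ICID 15",
    "Fri May 11 15:35:09 2007 Info: Delivery start DCID 16 MID 22 to RID [0]",
    "Fri May 11 15:35:09 2007 Info: Delivery start DCID 16 MID 23 to RID [0]",
    "Fri May 11 15:35:09 2007 Info: Delivery start DCID 16 MID 24 to RID [0]",
]
icids, dcids = mids_per_connection(sample)
print(icids)  # {15: [21]}
print(dcids)  # {16: [22, 23, 24]}
```

DCID 16 carried three different messages, so no single MID could be stamped on its "New SMTP DCID 16" line.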