sendmail and failover-config for two C100 without loop?

Pat_ironport · ‎04-03-2008

We are running two C100s. The first handles 99.9% of the load; the second is there for failover, or for when we have to apply updates and reboot the first device.
Our ASP uses 'sendmail' to send and receive all mail to and from us. If the first C100 appliance isn't reachable, the second is addressed instead. This works: on every reboot of the first C100 we see a small amount of mail on the second C100 until the first is up and running again.

In this configuration, we have a "strange" problem:
All mails rejected by the first appliance (e.g. 'recipient not valid') go back to our ASP's 'sendmail'. But his 'sendmail' doesn't send such messages back out to the sender elsewhere on the internet; instead it sends them to us again, this time to our second C100.
Our second C100 runs the same checks and sends the already rejected message back to our ASP once more.
THEN, finally, the mail goes back to the sender elsewhere on the internet.

Does anyone have an idea why 'sendmail' would do this? Does anyone have a similar configuration with 'sendmail' and could give us some advice?

(BTW: Outgoing mails originating from within our company don't make this little loop. They are processed by our first C100 and don't come back to the second C100.)

Donald Nash · ‎04-08-2008

It sounds like sendmail at your ASP is walking all the MX records trying to deliver the message, even when it gets a presumably authoritative "no such user" from the first one. I believe this is standard behavior for sendmail, but I'm not certain because we don't use sendmail here. This is a somewhat controversial behavior. Some people say that sendmail is doing the right thing by doing its utmost to deliver the message. Others say that sendmail is being too aggressive, and should respect permanent errors if it receives them from the most preferred MX server. I tend to be in the latter camp.
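To make the two camps concrete, here is a minimal Python sketch that simulates both delivery policies against a pair of MX hosts. It is only an illustration of the behavior described above, not sendmail's actual code; the host names, the preference values, and the `deliver` and `reject_unknown_user` helpers are all hypothetical.

```python
# Hypothetical MX list: (preference value, host). Lower value is tried first.
MX_RECORDS = [(10, "c100-a.example.com"), (20, "c100-b.example.com")]


def deliver(mx_records, try_host, stop_on_permanent_error):
    """Try hosts in preference order.

    Returns (hosts contacted, last SMTP reply code seen).
    """
    contacted = []
    code = None
    for _pref, host in sorted(mx_records):
        contacted.append(host)
        code = try_host(host)
        if 200 <= code < 300:
            break  # message accepted, nothing more to do
        if 500 <= code < 600 and stop_on_permanent_error:
            break  # treat the permanent error as authoritative
        # otherwise fall through and try the next MX host
    return contacted, code


def reject_unknown_user(host):
    """Both appliances reject the invalid recipient with a permanent error."""
    return 550


# Policy A: respect the 5xx from the most preferred MX.
print(deliver(MX_RECORDS, reject_unknown_user, stop_on_permanent_error=True))
# Policy B: keep walking the MX list, so the second C100 sees the mail too.
print(deliver(MX_RECORDS, reject_unknown_user, stop_on_permanent_error=False))
```

Under policy B the same rejected message reaches both appliances before the sender gets a bounce, which matches what shows up in the logs of the second C100.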

There's not much you can do about this. We don't have this problem because we put all our ESAs behind a load balancer, and thus have only a single MX record visible to the outside world. That's the only solution I can think of, other than just living with it. After all, it is harmless, just annoying.

Pat_ironport · ‎04-08-2008

Thank you for the explanation! And 'yes' you are right: It is harmless, but needless at the same time.

On the other hand I could say: This is a special way to check if the failover-machine is working well. :roll: :wink:

Donald Nash · ‎04-08-2008

Yes it's needless, but you wouldn't have noticed it if you hadn't seen it in your logs. It's not worth worrying about. They're burning their own resources more than they're burning yours.

My personal preference is not to have a "failover" machine because it only gets exercised when the primary is down. I prefer to have both machines share the load, with enough headroom that one of them could take the entire load if necessary.

Pat_ironport · ‎04-09-2008

Actually I prefer the solution with two machines, because I can apply updates and test some configurations on the second machine before I do the same on the first one.

But I see your contrary point with load balancing as well!

jloehler_ironport · ‎04-09-2008

Pat,
ask your ISP to share the load between your two C100s by doing DNS-based round robin. I don't think he uses official MX records to forward the mail to your IronPorts. If he does use MX records, ask him to change the MX preferences.

Joerg

Pat_ironport · ‎04-09-2008

Is it true that 'Round Robin' is not officially supported by sendmail?

Donald Nash · ‎04-09-2008

Pat_ironport wrote: "Actually I prefer the solution with two machines, because I can apply updates and test some configurations on the second machine before I do the same on the first one."

That's a good point. We actually do that as well, but we have a larger server farm than you do, and we have a hardware load balancer to spread the traffic. Exercising our test unit with production traffic is much less important because our headroom capacity is spread across all our production units. A single ESA failure will simply result in the others taking up the slack.

In your situation, I'd think strongly about getting an inexpensive load balancer. You could then send production traffic to both units most of the time, then take one unit offline for testing when necessary. And as I mentioned before, it would also solve the problem with your ASP's mail server that started this thread.

steven_geerts · ‎04-11-2008

Maybe a silly suggestion, but why don't you ask your ISP to change the MX records for your domain(s) so that mail is delivered directly to your IronPort machines and no longer runs through their Sendmail?
You have two machines, so redundancy is assured. If you want an external backup, you can always add the provider's Sendmail to your MX records with a higher preference value (that is, a lower priority).
If your (connection) ISP is blocking direct connections to your port 25, I would strongly advise you to look for another provider.
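As a rough sketch of how a sending server would treat such an MX layout: hosts are tried in ascending preference value, and RFC 5321 asks senders to randomize among hosts that share the same value, which spreads the load across them. The Python below assumes both C100s share one preference value and the provider's Sendmail sits behind them as backup; the host names, the values 10 and 50, and the `delivery_order` helper are made up for illustration.

```python
import random
from collections import defaultdict

# Hypothetical records: (preference value, host). Lower value is tried first.
MX_RECORDS = [
    (10, "c100-a.example.com"),
    (10, "c100-b.example.com"),
    (50, "backup-mx.provider.example"),
]


def delivery_order(mx_records, rng=random):
    """Order hosts by ascending preference value, shuffling ties."""
    by_pref = defaultdict(list)
    for pref, host in mx_records:
        by_pref[pref].append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)  # equal-preference hosts share the load
        order.extend(hosts)
    return order


# The two C100s come first in random order; the backup is always last.
print(delivery_order(MX_RECORDS))
```

With this layout the backup host is only contacted when neither C100 accepts the connection, so in normal operation internet mail servers connect to the appliances directly.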

By putting your provider's Sendmail server in front of your IronPort machines you are effectively disabling SenderBase. We drop 98 to 99 percent of our spam using the SenderBase techniques. SenderBase expects the internet mail servers to connect directly to your IronPort. Only then can it do its job properly.

Best regards Steven