My pain, your gain.

Troubleshooting myself in the foot.


PAM configurations

OK, so this one did not take me hours, but I did have to go read a manual, because no one seems to explain these settings; everyone just says “hey! these settings worked for me, plug ’em in.” That was my cue to try to understand them better.

It turns out that the Linux-PAM System Administrators' Guide was not really that bad and had some very useful explanations of how things work.

One thing I wanted to find out was how the newer bracket (“[]”) syntax in PAM config files works. Most admins will probably have seen this example somewhere:

auth [success=1 default=ignore]
auth required try_first_pass
auth required

What is up with those brackets? It turns out it's pretty simple. According to the docs, the value=action control values are just a way of telling PAM what to do with the result of the rule (in this case, the rule is “go check and tell me what you found”).

The value in each pair can be any of the return values the rule can produce: success is rather obvious, and default means anything I didn't explicitly list. The action can be ignore, bad, die, ok, done, reset or a number; the number just means skip the next N rules (the PAM SAG explains the config syntax fully).
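To make the value=action idea concrete, here is a small Python sketch that turns a bracketed control field into a lookup table from return value to action. The helper name `parse_control` is hypothetical (it is not part of PAM), and it only validates the action keywords listed above; it does not check the return-value names against PAM's real list.

```python
# Hypothetical helper, not part of PAM: parse a control field such as
# "[success=1 default=ignore]" into {return_value: action}.
ACTIONS = {"ignore", "bad", "die", "ok", "done", "reset"}


def parse_control(field: str) -> dict:
    body = field.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed control field")
    control = {}
    for pair in body[1:-1].split():
        value, sep, action = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        if action in ACTIONS:
            control[value] = action
        elif action.isdigit() and int(action) > 0:
            control[value] = int(action)  # N: jump over the next N rules
        else:
            raise ValueError(f"unknown action in {pair!r}")
    return control


print(parse_control("[success=1 default=ignore]"))
# prints {'success': 1, 'default': 'ignore'}
```

A jump of 0 is rejected on purpose, since the SAG does not allow N to be zero.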

In the above case, the common-auth config file tells PAM to check LDAP first: if LDAP finds the username:password pair in the directory, skip the next rule; if it can't, ignore the failure. So if LDAP fails, PAM falls back to local accounts, and if it succeeds, PAM jumps straight to the final rule, which just permits anything it gets (keep in mind that if any required rule fails, the whole stack fails, which is why that final rule can't let through something that has already failed).
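The walk described above can be simulated with a toy model. Everything here is an assumption made for illustration: the module names `ldap`, `unix` and `permit` are hypothetical stand-ins for the modules in the config, results are plain strings rather than real PAM return codes, and the evaluation rules are a simplified reading of the SAG, not Linux-PAM's actual code. In particular, this model treats a jump as a pure skip that records no result of its own, so the final permit rule is what reports success after a jump.

```python
# Toy model of evaluating a PAM "auth" stack written with [value=action]
# controls. Simplified for illustration; not Linux-PAM's real implementation.

# The "required" keyword expressed in bracket form, per the Linux-PAM SAG.
REQUIRED = {"success": "ok", "new_authtok_reqd": "ok",
            "ignore": "ignore", "default": "bad"}


def run_stack(stack, results):
    """stack: list of (module, control dict); results: module -> result string."""
    state = None    # no verdict recorded yet
    failed = False  # becomes True once a "bad" or "die" action fires
    i = 0
    while i < len(stack):
        module, control = stack[i]
        result = results[module]
        action = control.get(result, control.get("default", "ignore"))
        i += 1
        if isinstance(action, int):
            i += action  # skip the next N rules (assumption: no result recorded)
        elif action in ("ok", "done"):
            # ok only records a result if nothing has failed so far
            if not failed and state in (None, "success"):
                state = result
            if action == "done" and not failed:
                break
        elif action in ("bad", "die"):
            if not failed:
                # the first failure sticks; a "bad" success counts as a denial
                state = result if result != "success" else "perm_denied"
                failed = True
            if action == "die":
                break
        elif action == "reset":
            state, failed = None, False
        # "ignore": the result does not count either way
    return state if state is not None else "fail"  # nothing decided: fail closed


# The three rules from the post, with hypothetical module names.
STACK = [
    ("ldap", {"success": 1, "default": "ignore"}),
    ("unix", REQUIRED),
    ("permit", REQUIRED),
]

print(run_stack(STACK, {"ldap": "success", "unix": "auth_err", "permit": "success"}))       # success
print(run_stack(STACK, {"ldap": "user_unknown", "unix": "success", "permit": "success"}))   # success
print(run_stack(STACK, {"ldap": "user_unknown", "unix": "auth_err", "permit": "success"}))  # auth_err
```

The third call shows the parenthetical point: once the local check has failed, the permit rule at the end cannot turn that failure back into a success. Dropping the permit rule (running `STACK[:2]`) makes the first call return "fail" in this model, because nothing records a success after the jump.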

That's it: pretty well thought out by those clever PAM people. I hope I've explained it well enough here to help out. For a more in-depth read, check out the Linux-PAM System Administrators' Guide.

My – aww man, do I really have to read this manual – pain, your gain.

postfix bug! no, cyrus bug! no, bdb bug! yay!

So this is what you get for my efforts: a description of something that took me more than a couple of hours to fix and that, in my opinion, was not easy to find a solution for.

I’ve been installing a new mail server with the usual suspects involved: Postfix + Cyrus. In the past they have always played well together, and they continue to do so today. Both, I think, are excellent choices for mail processing and end-user delivery.

However, in this particular installation, getting down to brass tacks, things did not work quite right.

The installation was done on a normal Intel server running the Ubuntu 6.06.1 LTS release, with nothing special installed; pretty much vanilla. The one thing out of the ordinary was using an LDAP backend for all authentication. As things turned out, though, that didn't cause any issues.

Everything was installed and then tested by the truly faithful who volunteered to move their mail spools over to the new server (it had new features, more powerful hardware and updated code, so it was an upgrade from the aging existing server). The problems started showing up once the server was put under some load: Postfix started reporting odd bounced messages with a most peculiar error.

Jul 18 14:52:48 HOST postfix/lmtp[6763]: 9E7451008B: to=<...>, orig_to=<...>,
relay=/var/run/cyrus/socket/lmtp[/var/run/cyrus/socket/lmtp], delay=1, status=bounced
(host /var/run/cyrus/socket/lmtp[/var/run/cyrus/socket/lmtp] said: 250 2.1.5 ok (in reply to DATA command))

Hey postfix, cyrus said OK and you bounced the email. Are you guys listening to each other?
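To see how often this contradiction shows up once the server is under load, a quick Python sketch can pick such entries out of the mail log. The regular expression is an assumption built from the excerpt above (one entry per line, short hexadecimal queue IDs), not from Postfix documentation, and the helper name `contradictory_bounces` is hypothetical.

```python
import re

# A postfix/lmtp entry that was bounced even though the LMTP server's reply
# quoted in the same entry starts with 250 (success). Pattern is an
# assumption based on the log excerpt, not on Postfix documentation.
CONTRADICTION = re.compile(
    r"postfix/lmtp\[\d+\]: (?P<queue_id>[0-9A-F]+): .*"
    r"status=bounced .*said: 250 "
)


def contradictory_bounces(lines):
    """Return queue IDs of bounced LMTP deliveries whose reply was a 250."""
    return [m.group("queue_id") for line in lines
            if (m := CONTRADICTION.search(line))]


# The excerpt from above, joined back into the single line syslog writes.
sample = (
    "Jul 18 14:52:48 HOST postfix/lmtp[6763]: 9E7451008B: to=<...>, orig_to=<...>, "
    "relay=/var/run/cyrus/socket/lmtp[/var/run/cyrus/socket/lmtp], delay=1, status=bounced "
    "(host /var/run/cyrus/socket/lmtp[/var/run/cyrus/socket/lmtp] said: 250 2.1.5 ok "
    "(in reply to DATA command))"
)
print(contradictory_bounces([sample]))  # ['9E7451008B']
```

Assuming your syslog writes mail entries to /var/log/mail.log, you could pass the open file object straight to `contradictory_bounces`, since iterating a file yields its lines.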

So I searched Google for the error and came up with two interesting theories from other admins who were having the same issue. I must admit that I would be defensive too, but Wietse and Viktor got a little crabby. Anyway, one theory says this is the fault of Postfix pipelining and the other says it is the fault of Cyrus duplicate suppression. In my opinion, one of them was partially right.

So rather than disable pipelining, which would affect performance for almost everything dealing with Postfix (although I think I understand now that you can limit it to one particular place within Postfix and let other clients keep using it), I went and messed with duplicate suppression.

In imapd.conf I set, per the docs, duplicatesuppression: 0, restarted Cyrus (and Postfix for good measure) and, yay, it seemed to be fixed. Under load, however, the problem came back. Boo. I re-enabled duplicate suppression in Cyrus, since it was something I wanted to keep anyway, but I knew it had something to do with the problem, so I kept digging in that general vicinity.

It turns out that Jon was right (in the above Neohapsis post): duplicate suppression was causing the problem, but not in the way he thought. The error that Cyrus reported in debug mode was the key. Cyrus kept reporting how many lockers there were on the duplicate database, and that was causing Postfix to abort. Lockers, it turns out, relate more to the database than to Cyrus duplicate delivery.

In this case, with the default setting, Cyrus was using Berkeley DB as the backend for the duplicate database. I switched this setting over to skiplist in imapd.conf (duplicate_db: skiplist) and put the server under load while watching the logs: the errors did not reappear.
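If you manage imapd.conf from a script, the change boils down to setting one key. Below is a minimal Python sketch, assuming the simple one-option-per-line `key: value` format; the helper name `set_option` and the sample config contents are hypothetical. After writing the file back, Cyrus still needs a restart as before.

```python
import re


def set_option(conf_text: str, key: str, value: str) -> str:
    """Return imapd.conf-style text with `key: value` set.

    Replaces an existing uncommented `key:` line, or appends one if absent.
    Hypothetical helper; assumes one option per line.
    """
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*:.*$", re.MULTILINE)
    new_line = f"{key}: {value}"
    if pattern.search(conf_text):
        # a function replacement avoids backslash escapes in `value`
        return pattern.sub(lambda _match: new_line, conf_text)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


# Hypothetical sample contents, not a real server's config.
conf = "configdirectory: /var/lib/cyrus\nduplicate_db: berkeley\n"
print(set_option(conf, "duplicate_db", "skiplist"), end="")
```

Commented-out lines such as `# duplicate_db: berkeley` are left alone because the pattern must match at the start of a line.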

Doing some more research, I found out that BDB 4.2 (which Dapper ships and which its Cyrus is compiled against) has a bug in it. Recompiling Cyrus against BDB 4.3, or switching the duplicate database to a backend that does not involve BDB 4.2, fixed the issue, even under load. Yay!

My pain, your gain.


I decided to stop just searching for info and elevate myself from leech status by blogging about problems that took me more than a couple of hours to fix. My pain in fixing the issues, your gain in benefiting from my fixorating stuff (though possibly your pain in having to read my writing).


© 2023 My pain, your gain.

Theme by Anders Noren