You are viewing a read-only archive of the Blogs.Harvard network.

Troubleshooting myself in the foot.

Category: ubuntu

anything to do with ubuntu

Java update causing ubuntu installer to fail

When I tried to install the Oracle JDK on ubuntu 16.04, I kept getting ERROR 404: Not Found. After some searching, I found that the problem is that there is a new version of Java and the installer (which is essentially a wrapper around the Oracle installer) had not been updated to reference it.

Since I needed a fresh install rather than an update (and I just wanted to know why it was broken), I ended up editing the installer's scripts, with the help of pointers from a Stack Overflow post.

To fix it on 64-bit ubuntu, replace the version, download URL, checksum, and directory name for the old version (8u191) with those for the new version (8u201).

# The installer's maintainer scripts live here; editing them requires root
cd /var/lib/dpkg/info
# Swap each 8u191 value (version, download URL, checksum, install dir) for its 8u201 equivalent
sudo sed -i 's|JAVA_VERSION=8u191|JAVA_VERSION=8u201|' oracle-java8-installer.*
sudo sed -i 's|PARTNER_URL=http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/|PARTNER_URL=https://download.oracle.com/otn-pub/java/jdk/8u201-b09/42970487e3af4f5aa5bca3f542482c60/|' oracle-java8-installer.*
sudo sed -i 's|SHA256SUM_TGZ="53c29507e2405a7ffdbba627e6d64856089b094867479edc5ede4105c1da0d65"|SHA256SUM_TGZ="cb700cc0ac3ddc728a567c350881ce7e25118eaf7ca97ca9705d4580c506e370"|' oracle-java8-installer.*
sudo sed -i 's|J_DIR=jdk1.8.0_191|J_DIR=jdk1.8.0_201|' oracle-java8-installer.*

After that, rerun apt-get update and then apt-get install oracle-java8-installer (as root), and you are set to go.
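Because the fix swaps in a new SHA256SUM_TGZ value, it can be worth confirming a checksum yourself, for example against a tarball you downloaded by hand. Here is a minimal sketch, assuming GNU coreutils' sha256sum; verify_sha256 is a hypothetical helper name and is not part of the installer.

```bash
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX  (hypothetical helper, not part of the installer)
# Succeeds (exit 0) only if FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  local file=$1 expected=$2
  # sha256sum -c reads "<digest>  <path>" lines; "-" makes it read them from stdin,
  # and --status suppresses output so only the exit code reports the result.
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c --status -
}
```

For example, verify_sha256 jdk-8u201-linux-x64.tar.gz cb700cc0ac3ddc728a567c350881ce7e25118eaf7ca97ca9705d4580c506e370 && echo OK, where the tarball file name is an assumption and the checksum is the 8u201 value from the commands above.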

all temps are not the same

I had an interesting problem: we try to build our VMs as lean as possible, so occasionally we have machines that don't have much disk or RAM. When RAM is minimal, our /tmp partition, which is a RAMFS device, is small as well.

Why does this matter? Lots of processes like to use tmp space for, well, temporary files. If that space fills up, whatever process was running usually fails.

Have you ever seen an error like this? This was after running apt-get dist-upgrade on one of our smaller VMs.

tar: ./lib/foo/bar/file.bin: Cannot write: No space left on device
tar: Skipping to next header
tar: Error exit delayed from previous errors
dpkg-deb: subprocess tar returned error exit status 2
debsums: can't unpack /var/cache/apt/archives/foo_i386.deb
E: Problem executing scripts DPkg::Post-Invoke 'if [ -x /usr/bin/debsums ]; then /usr/bin/debsums --generate=nocheck -sp /var/cache/apt/archives; fi'
E: Sub-process returned an error code

apt-get does not like to run out of space, and the /tmp partition on this machine is pretty small: smaller than the space this package needed while being unpacked. The fix is easy, however. First, completely remove the package, since it is more than likely missing files or corrupted. You can do this by entering the following on a console as root (replace “foo” with the name of the package that gave you the error):

# dpkg --purge foo

Once the package is removed, just run apt-get with “env TMPDIR=/var/tmp” prepended to it. The temporary directory does not have to be /var/tmp; it can be any directory that the user you are running as can write to and that has enough free space.

# env TMPDIR=/var/tmp apt-get install foo
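If you would rather not guess which directory has room, here is a minimal sketch that picks one for you. pick_tmpdir is a hypothetical helper name, and it assumes a df that supports the POSIX -P output format (GNU coreutils does).

```bash
#!/usr/bin/env bash
# pick_tmpdir MIN_KB DIR...  (hypothetical helper)
# Prints the first existing, writable DIR with at least MIN_KB kilobytes free.
pick_tmpdir() {
  local min_kb=$1 dir avail
  shift
  for dir in "$@"; do
    # Skip candidates that don't exist or that we can't write to
    [ -d "$dir" ] && [ -w "$dir" ] || continue
    # df -P prints one line per filesystem; column 4 is the available 1K blocks
    avail=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "${avail:-0}" -ge "$min_kb" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1   # no candidate had enough room
}
```

For example, dir=$(pick_tmpdir 1048576 /tmp /var/tmp) && env TMPDIR="$dir" apt-get install foo asks for roughly 1 GiB free, an arbitrary threshold you would adjust to the package at hand.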

As a slight aside, we sometimes also get stuck with this error:

dpkg: error processing linux-image-1.2.3-4-server (--purge):
cannot remove `/boot/System.map-1.2.3-4-server': Read-only file system

This one is easy to fix, and we have been doing it for a while. On a console, before you rerun the command that gave you this error, remount the partition (in this case /boot) read-write:

# mount -o remount,rw /boot
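If you want to check whether a partition is actually read-only before remounting it, here is a minimal, Linux-specific sketch that reads /proc/mounts. is_readonly is a hypothetical helper name, and its optional second argument exists only so it can be tried against a sample file.

```bash
#!/usr/bin/env bash
# is_readonly MOUNTPOINT [MOUNTS_FILE]  (hypothetical helper)
# Succeeds if MOUNTPOINT is mounted read-only. MOUNTS_FILE defaults to /proc/mounts.
is_readonly() {
  local target=$1 mounts=${2:-/proc/mounts}
  # Field 2 is the mount point and field 4 the comma-separated options.
  # The last matching line wins, since a later mount on the same point hides earlier ones.
  # Wrapping opts in commas means "errors=remount-ro" does not count as "ro".
  awk -v t="$target" '$2 == t { opts = $4 }
    END { exit !(("," opts ",") ~ /,ro,/) }' "$mounts"
}
```

Used as root, is_readonly /boot && mount -o remount,rw /boot only remounts when it is needed.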

autofs annoyances with ubuntu lucid (10.04)

Like a lot of admins who run ubuntu, we decided to upgrade many of our machines to ubuntu’s next LTS release, lucid lynx, aka ubuntu 10.04. We don't run a huge shop here (we have under 100 machines, a significant percentage of which are VMs), but repeatedly fixing the same bugs still annoys me. One of the bugs in lucid is particularly annoying because it affects how autofs starts at boot. Services have dependencies, and sorting them out is complicated (I get that), but come on, ubuntu: dependencies are nothing new, and sorting them out should be easy enough for a bunch of smart developers.

The specifics are these:

1. Lucid switched to upstart. To put it succinctly: “upstart is a replacement for the /sbin/init daemon which handles starting of tasks and services during boot, stopping them during shutdown and supervising them while the system is running.”

2. Upstart does not like autofs: at boot, autofs can be started before statd is running.

Not to rant too much, but if you are going to replace init, please do it with something that does not require everyone who installs common packages to apply hacks like the one I am about to describe.

Thankfully, a lot of smart, technical people run ubuntu systems, and when things go wrong they complain and file bug reports. After some quick searching, I thought I had fixed the bug. That was until upstart was updated a couple of weeks back and the autofs boot problems started again.

The solution is similar to the one I originally implemented per the suggestion of comment #15, but it keeps working after the update that broke autofs again. In the /etc/init directory, edit autofs.conf and add the following directly after the "pre-start script" line.

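i=0
statd_status=`status statd | cut -d, -f1`
while [ "$statd_status" != "statd start/running" ]; do
    sleep 5
    # upstart runs this script with sh -e, and "start" exits non-zero if statd
    # is already starting, so don't let that abort the whole pre-start
    start statd || true
    # POSIX arithmetic: "let" is a bash builtin and is not available in /bin/sh
    i=$((i + 1)); statd_status=`status statd | cut -d, -f1`
    if [ "$i" -gt 10 ]; then
        echo "statd startup failed"
        break    # stop waiting after about a minute instead of looping forever
    fi
done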

Once this is in place, you should be able to (re)start autofs, and the next time the machine reboots, autofs should start automatically. Essentially this is the same hack as the one in the aforementioned comment, except that the while loop makes the script wait until it sees that statd has started successfully.

Ok, back to finding more annoying things.

Changing your password

A lot of people think changing passwords is such a pain. I always look at it in terms of security: how many chances has possible malfeasance had to take place in the time you have been using your password? It's also nice to treat it as a review of where your password is stored. Change your password, and everything that has it cached somewhere breaks. It's an A-ha! moment, and there are too few of those in our daily lives. To make a game of it, treat it as a learning activity: how long does it take you to remember it without reading it back, how long until you stop looking at the keyboard, how long until you would have to stop and think to recall it because your muscles have memorized it? Compare with last time: are you getting better or worse?

With that in mind, here are instructions for changing a password that probably has not been changed in a while.

Changing your ssh key password with ssh-keygen

The -p option requests changing the passphrase of a private key file instead of creating a new private key. The program prompts for the file containing the private key (unless you give it with the -f {filename} option), for the old passphrase, and twice for the new passphrase. For example, change to your .ssh directory:

$ cd ~/.ssh

To change your ssh-key passphrase, enter:

$ ssh-keygen -f id_{rsa or dsa} -p
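For scripting, ssh-keygen also accepts the old and new passphrases with -P and -N, so no prompts are needed. Below is a minimal sketch with two hypothetical helper names. Passphrases given on the command line can end up in your shell history and the process list, so the interactive form above remains the safer everyday choice.

```bash
#!/usr/bin/env bash
# change_key_passphrase KEYFILE OLD NEW  (hypothetical helper)
# -p: change passphrase, -f: key file, -P: old passphrase, -N: new passphrase
change_key_passphrase() {
  ssh-keygen -q -p -f "$1" -P "$2" -N "$3" < /dev/null
}

# key_passphrase_ok KEYFILE PASSPHRASE  (hypothetical helper)
# -y prints the public key, which requires decrypting the private key,
# so it only succeeds when PASSPHRASE is correct.
key_passphrase_ok() {
  ssh-keygen -y -f "$1" -P "$2" < /dev/null > /dev/null 2>&1
}
```

Try it on a throwaway key first rather than on your real id_rsa.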

postfix bug! no, cyrus bug! no, bdb bug! yay!

So this is what you get for my efforts: a description of something that took me more than a couple of hours to fix and that, in my opinion, was not easy to find a solution for.

I’ve been installing a new mail server with the usual suspects: postfix + cyrus. In the past they have always played well together, and they continue to do so today. Both, I think, are excellent choices for mail processing and end-user delivery.

However, getting down to brass tacks, this particular installation did not work quite right.

The installation was done on a normal Intel server running the ubuntu 6.06.1 LTS release, with nothing special installed: pretty much vanilla. The one thing out of the ordinary was using an LDAP backend for authentication across the whole setup. As things turned out, however, that did not cause any issues.

Everything was installed and then tested by the truly faithful who volunteered to move their mail spools over to the new server. It had new features, more powerful hardware, and updated code, so it was an upgrade from the aging existing server. The problems started showing up once the server was put under some load: postfix started reporting odd bounced messages with a most peculiar error.

Jul 18 14:52:48 HOST postfix/lmtp[6763]: 9E7451008B: to=<...>, orig_to=<...>,
relay=/var/run/cyrus/socket/lmtp[/var/run/cyrus/socket/lmtp], delay=1, status=bounced
(host /var/run/cyrus/socket/lmtp[/var/run/cyrus/socket/lmtp] said: 250 2.1.5 ok (in reply to DATA command))

Hey postfix, cyrus said OK and you bounced the email. Are you guys listening to each other?

So I searched Google for the error and came up with two interesting theories from other admins who were having the same issue. I must admit that I would be defensive too, but Wietse and Viktor got a little crabby. Anyway, one blames postfix pipelining and the other blames cyrus duplicate suppression. In my opinion, one of them was partially right.

http://www.irbs.net/internet/info-cyrus/0609/0435.html

http://archives.neohapsis.com/archives/postfix/2007-02/0940.html

So rather than disable pipelining, which would affect performance for almost everything passing through postfix (although I now understand that you can disable it for just one particular place within postfix and let other clients keep using it), I went and messed with duplicate suppression.

Per the docs, I set duplicatesuppression: 0 in imapd.conf, restarted cyrus (and postfix for good measure) and, yay, it seemed to be fixed. Under load, however, the problem came back. Boo. I re-enabled duplicate suppression in cyrus, since it was something I wanted to keep anyway, but I knew it had something to do with the problem, so I kept digging in that general vicinity.

It turns out that Jon (in the neohapsis post above) was right that duplicate suppression was causing the problem, but not in the way he thought. The error cyrus reported in its debug output was the key: cyrus kept reporting how many lockers there were on the duplicate db, and that was causing postfix to abort. Lockers, it turns out, are related more to the db than to cyrus duplicate delivery.

In this case, cyrus was using its default setting, Berkeley DB, as the backend for the duplicate database. I switched that backend to skiplist in imapd.conf (duplicate_db: skiplist), put the server under load, and watched the logs: the errors did not reappear.
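For reference, the end result in imapd.conf looks roughly like the fragment below. Only the two option names and values come from this post; the file location is assumed to be the usual ubuntu path and may differ on your system.

```
# /etc/imapd.conf (location assumed; check your installation)
# Keep duplicate delivery suppression turned on...
duplicatesuppression: 1
# ...but store the duplicate database in skiplist instead of Berkeley DB
duplicate_db: skiplist
```

As with the earlier change, restart cyrus afterwards so it picks up the new backend.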

Doing some more research, I found that BDB 4.2, which dapper ships and which cyrus is compiled against, has a bug. Recompiling cyrus against BDB 4.3, or switching the duplicate database to a backend that does not involve BDB 4.2, fixed the issue, even under load. Yay!

My pain, your gain.

© 2024 My pain, your gain.
