2013/09/29: Forward synchronising email

I have been synchronising my (private) mail this way for 4.5 years now. Recently, quite a few people asked about details, so I decided to write things up at a level of detail that should make it easy to reproduce my setup. It might seem like a lot of infrastructure, but it really grew one simple script at a time.

It started with my dissatisfaction with (the then current version of) offlineimap. First, it took ages over a slow connection, as each time the full list of mails in the folder to be synchronised had to be transmitted. Secondly, it didn't cope well with the underlying TCP connection timing out. The latter situation even led to data loss. So I decided I wanted to understand where the difficulties in mail synchronisation lie. My approach to learning was to write my own mail-synchronisation solution, doing everything the most naive way, and see where it fails. I must say, I haven't learned that much: having used it for all my mail for 4.5 years now, I haven't experienced any problems. Nevertheless, my solution still contains the safety nets one would add when testing a program on live email in the expectation that the program does things wrong.

Forward synchronisation

Admittedly, my situation is quite simple. I have a fixed set of machines on which I read and write emails (my server, my desktop, my laptop), I have a shell on each of them (in fact, root), and this set changes rarely.

So I can connect them into a tree (i.e., an undirected graph that is connected and acyclic) and assume each machine knows its neighbours. Then forward synchronising is easy. Each machine detects local changes and tells its neighbours about them; when it receives such a notification, it applies the change locally and passes it on to all its neighbours except the one it received it from. In that way, changes detected locally are propagated to each machine precisely once.
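
The "precisely once" claim can be checked with a small simulation. The following is only a sketch in Python, not one of the scripts described here; the function names and the choice of the server as the inner node of the tree are assumptions made for illustration.

```python
from collections import defaultdict


def build_tree(edges):
    """Build an undirected neighbour map from a list of (a, b) edges."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def propagate(neighbours, origin):
    """Count how often each machine sees a change detected at `origin`."""
    seen = defaultdict(int)
    seen[origin] += 1  # the origin detected the change locally
    # Pending notifications, stored as (sender, receiver) pairs.
    pending = [(origin, n) for n in neighbours[origin]]
    while pending:
        sender, receiver = pending.pop()
        seen[receiver] += 1  # the receiver applies the change locally
        # Forward to every neighbour except the one it came from.
        pending.extend((receiver, n) for n in neighbours[receiver] if n != sender)
    return dict(seen)


# Assumed topology: desktop and laptop both talk to the server only.
tree = build_tree([("server", "desktop"), ("server", "laptop")])
counts = propagate(tree, "laptop")
```

Because the graph is acyclic, the forwarding never loops back; on a graph with a cycle the same loop would not terminate, which is why the tree assumption matters.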

maildirdiff

The scripts I use to generate and incorporate the notifications about changes to a maildir, including their man pages, are contained in the maildirdiff shell archive; my general programs page also contains other versions and a gpg-signature of the hashes.

This section only gives an overview of the scripts and data formats involved. The details of the semantics can be found in the corresponding man pages.

The maildir patch format

As explained, the idea is that each machine detects local changes and sends them to all neighbours. To describe the change to a single mail, I use a simple line-based format.

The status file format

To recognise what has changed locally since the last inspection, for every mailbox, a status file is kept. It contains a line for every mail with the following information separated by tabs.

maildirdiff, maildirpatch, and the loop

Given a status file and an actual maildir, maildirdiff produces a new status file and a set of patches in the format just described. Given a patch and the location where the status files are stored, maildirpatch applies that patch, updating the status files. To avoid races on the status files, it first acquires a lock in the status directory. Additionally, maildirpatch takes two more arguments, the "deletion directory" and the log file. As for the deletion directory: since it all started as an experiment, maildirpatch is, of course, not allowed to delete files; instead, every file to be deleted is moved to this directory. The actual deletion is done by my rotate-maildirdiff script, which is called by a daily cron job.
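
To illustrate the two ideas in this paragraph (comparing a remembered state with the current one, and moving files aside instead of deleting them), here is a rough Python sketch. It is not the actual maildirdiff or maildirpatch code: the snapshot representation (a mapping from the unique part of a maildir file name to its flags), the change tuples, and all function names are hypothetical; the real status and patch formats are documented in the man pages.

```python
import os
import shutil
import tempfile


def split_name(filename):
    """Split a maildir file name 'unique:2,FLAGS' into (unique, flags)."""
    unique, _, flags = filename.partition(":2,")
    return unique, flags


def diff_snapshots(old, new):
    """Compare two hypothetical {unique: flags} snapshots and list changes."""
    changes = []
    for unique in sorted(new.keys() - old.keys()):
        changes.append(("add", unique, new[unique]))
    for unique in sorted(old.keys() - new.keys()):
        changes.append(("delete", unique, old[unique]))
    for unique in sorted(old.keys() & new.keys()):
        if old[unique] != new[unique]:
            changes.append(("flags", unique, new[unique]))
    return changes


def move_to_deldir(path, deldir):
    """Safety net: move a mail to the deletion directory, never unlink it."""
    os.makedirs(deldir, exist_ok=True)
    target = os.path.join(deldir, os.path.basename(path))
    shutil.move(path, target)
    return target


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    mail = os.path.join(tmp, "1380000000.42.hilbert")
    open(mail, "w").close()
    moved = move_to_deldir(mail, os.path.join(tmp, "del"))
    still_there = os.path.exists(mail)
    in_deldir = os.path.exists(moved)
```

The point of the second function is that a bug in the patch logic costs at most a move, which can be undone until the deletion directory is rotated.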

Given the primitives maildirdiff and maildirpatch, what is missing is a loop over all incoming patches and all maildirs, which for me are precisely the subdirectories of one directory (/home/aehlig/MAIL in my case). This loop is provided by maildirdiff-sync. It reads a configuration file and then does the following.

On my laptop, the configuration file looks as follows.


MAILDIR /home/aehlig/MAIL
STATUSDIR /home/aehlig/.maildirdiff/status
INDIR /home/aehlig/uucp-drop
OUTDIR /home/aehlig/.maildirdiff/out
DELDIR /home/aehlig/.maildirdiff/del
REJECTDIR /home/aehlig/.maildirdiff/reject
TMPDIR /home/aehlig/.maildirdiff/tmp
LOG /home/aehlig/.maildirdiff/log
INCLUDE [a-zA-Z0-9]
EXCLUDE SPAM
EXCLUDE MAIRIX
NEIGHBOUR isilmar batch-patch-uucp %s isilmar
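
For readers who want to process such a file themselves, the following Python sketch reads the keyword/value layout shown above. It is not how maildirdiff-sync parses its configuration; which keywords may repeat is inferred from this one example, and the meaning of the patterns and of the %s placeholder is deliberately left uninterpreted.

```python
# Keywords that appear more than once in the example (an inference).
REPEATABLE = ("INCLUDE", "EXCLUDE", "NEIGHBOUR")


def parse_config(text):
    """Parse 'KEYWORD value' lines into a dict; repeatable keys become lists."""
    config = {key: [] for key in REPEATABLE}
    for line in text.splitlines():
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "NEIGHBOUR":
            # First word is the neighbour's name, the rest a command template.
            name, _, command = value.partition(" ")
            config[key].append((name, command.strip()))
        elif key in REPEATABLE:
            config[key].append(value)
        else:
            config[key] = value
    return config


example = """\
MAILDIR /home/aehlig/MAIL
INCLUDE [a-zA-Z0-9]
EXCLUDE SPAM
EXCLUDE MAIRIX
NEIGHBOUR isilmar batch-patch-uucp %s isilmar
"""
config = parse_config(example)
```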

A few remarks on the individual stanzas.

uucp

As a means of putting a patch from one machine into the ${INDIR} on another machine, I use uucp with ssh as the transport layer (specifying that as a pipe modem). The main advantage is that I have a form of remote copy that I can invoke at any time, without having to care whether the other system is currently reachable. Another advantage is that, after an interrupted transfer, uucp can continue at the very byte where the interruption occurred. Additionally, it can transparently fall back to other transport layers, like tunnelling ssh over http using http2tcp.

When working with uucp there are certain things to keep in mind.

The scripts I'm using are batch-patch-uucp to add a patch to the batches, unbatch-patch-uucp to unbatch on my laptop (hilbert) for a given machine, and uucp-batch-drop-hilbert as drop script for my laptop (hilbert) on my server.

Heuristics on when to synchronise

So the only things missing are some heuristics on when to call maildirdiff-sync and when to unbatch.

For maildirdiff-sync, you catch the bulk of the changes with a simple heuristic: the maildir that has most probably changed is the one you just left in your mail reader. Using mutt, I have the following in my .muttrc.


folder-hook . 'set my_oldrecord=$record; set record=^; set my_folder=$record; set record=$my_oldrecord'
folder-hook . 'push ":set nowait_key<enter>!/home/aehlig/conf/mutt-enter-folder.pl $my_folder<enter>:set wait_key<enter>"'

Here the script mutt-enter-folder.pl simply calls maildirdiff-sync on the old folder.

This heuristic doesn't eliminate the need for a cron job or similar to regularly run maildirdiff-sync to inspect all folders, but that doesn't have to happen at high frequency.

The question of when to unbatch is a bit more complicated, as it has to be a compromise between large batches and prompt synchronisation.

Outgoing mail

My approach to outgoing mail is pretty standard. My server is a mail server, and as such speaks SMTP. My desktop and my laptop have not been assigned a static IP address, so they relay all outgoing mail through my server. To do so, they use uucp, using the BSMTP format. BSMTP ("batched SMTP") is essentially everything you would say to an SMTP server, assuming it always gives the canonical positive answer. A corollary of that assumption is that mails cannot be rejected at the gateway, so if anything goes wrong, bounces have to be sent; therefore, only accept BSMTP from machines for which you are willing to relay unconditionally.

Sending BSMTP with postfix

On my desktop and my laptop, I have a local postfix running. To tell it about the BSMTP service, I have the following lines in the master.cf.


bsmtp     unix  -       n       n       -       -       pipe
  flags= user=uucp argv=/root/bsmtp/bsmtp $sender $nexthop $recipient

The referenced executable bsmtp is a simple Perl script that takes the sender, next hop, and recipients from the arguments and the mail from stdin, and pipes that information formatted as BSMTP to uux - $nexthop!rsmtp.
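
As an illustration of what "formatted as BSMTP" means, here is a Python sketch of the formatting step only. It is not the Perl script mentioned above; the function name, the HELO name, and the use of plain newlines as line endings are assumptions. The leading-dot doubling inside the message is the usual SMTP dot-stuffing.

```python
def format_bsmtp(sender, recipients, message, helo="localhost"):
    """Wrap one mail in the SMTP commands a client would normally send."""
    lines = ["HELO " + helo, "MAIL FROM:<" + sender + ">"]
    lines += ["RCPT TO:<" + rcpt + ">" for rcpt in recipients]
    lines.append("DATA")
    for line in message.splitlines():
        # Dot-stuffing: a line starting with '.' gets an extra leading '.'.
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]  # lone dot ends the message
    return "\n".join(lines) + "\n"


batch = format_bsmtp(
    "alice@example.org",
    ["bob@example.org"],
    "Subject: hi\n\n.hidden\nbye\n",
)
```

The resulting text is what would be written to the standard input of the uux command; no replies are read, which is exactly the "canonical positive answer" assumption.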

In the main.cf I just specify that the default is to use BSMTP and to relay to my server (called isilmar).


default_transport = bsmtp
relayhost = isilmar

Receiving BSMTP with qmail

On my server, rsmtp is a simple script that parses BSMTP and passes the mail to qmail-queue. It is located in some directory in the command-path, and the hosts for which the server is the mail relay host have rsmtp listed in their commands.
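
The parsing side can be sketched in Python as follows. This is not the rsmtp script itself; the function names are hypothetical and ESMTP parameters after the addresses are not handled. The envelope layout in the second function is the one qmail-queue expects on file descriptor 1: the sender prefixed with F, each recipient prefixed with T, every address terminated by a zero byte, and one extra zero byte at the end.

```python
def parse_bsmtp(text):
    """Return a list of (sender, recipients, message) from a BSMTP batch."""
    messages = []
    sender, recipients, body, in_data = None, [], [], False
    for line in text.splitlines():
        if in_data:
            if line == ".":  # a lone dot ends the message
                messages.append((sender, recipients, "\n".join(body) + "\n"))
                sender, recipients, body, in_data = None, [], [], False
            else:
                # Undo dot-stuffing: drop one leading dot.
                body.append(line[1:] if line.startswith(".") else line)
            continue
        upper = line.upper()
        if upper.startswith("MAIL FROM:"):
            sender = line[len("MAIL FROM:"):].strip().strip("<>")
        elif upper.startswith("RCPT TO:"):
            recipients.append(line[len("RCPT TO:"):].strip().strip("<>"))
        elif upper == "DATA":
            in_data = True
        # HELO, QUIT and anything else are ignored in this sketch.
    return messages


def qmail_envelope(sender, recipients):
    """Build the envelope bytes that qmail-queue reads from descriptor 1."""
    parts = ["F" + sender + "\0"] + ["T" + rcpt + "\0" for rcpt in recipients]
    return ("".join(parts) + "\0").encode()


batch = (
    "HELO localhost\nMAIL FROM:<alice@example.org>\n"
    "RCPT TO:<bob@example.org>\nRCPT TO:<carol@example.org>\n"
    "DATA\nSubject: hi\n\n..hidden\n.\nQUIT\n"
)
parsed = parse_bsmtp(batch)
```

Actually handing the message over would mean starting qmail-queue with the message on descriptor 0 and this envelope on descriptor 1, which is left out here.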


