Introducing ImapGoose

ImapGoose is a small program to keep local mailboxes in sync with an IMAP server. The wording “keep […] in sync” implies doing so continuously, rather than performing a one-time sync. ImapGoose is designed as a daemon: it monitors both the IMAP server and the local filesystem, and synchronises changes immediately. When the IMAP server receives an email, it shows up in the filesystem within a second. When an email is deleted in another email client, it is removed[1] from the filesystem within a second.

ImapGoose is highly optimised to reduce network traffic and the amount of work it performs. To do so, it relies on a few modern IMAP extensions and only supports modern email servers. “Modern servers”, in the context of email, means servers which support extensions that were standardised between 2005 and 2009. ImapGoose uses the CONDSTORE extension (standardised in 2006), which basically allows it to tell the server “I last saw this mailbox when it was in state XYZ, please tell me what’s new”. This avoids the need to download an entire message list (which can be tens of thousands of emails), making incremental syncs much more efficient. It also uses the QRESYNC extension (standardised in 2008), so that the server also reports which messages have been deleted (via VANISHED responses).

Finally, ImapGoose uses the NOTIFY extension (standardised in 2009), which allows an IMAP client to tell the server “please let me know when there are changes to these mailboxes”, and then leave a connection open. NOTIFY has two nice consequences: (1) the client doesn’t need to poll the server for changes at regular intervals, and (2) the client is informed of any changes immediately, so they can be processed without delay. Unlike the older IDLE extension (from 1997), NOTIFY allows monitoring multiple mailboxes over a single connection, rather than just one.
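To make the NOTIFY request concrete, here is a small Go sketch that builds the kind of command a client could send. It follows the RFC 5465 syntax; the function names, the tag and the chosen event set (MessageNew, MessageExpunge, FlagChange) are illustrative assumptions rather than what ImapGoose actually sends, and real code would let an IMAP library handle quoting.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteMailbox renders a mailbox name as an IMAP quoted string.
// Simplification: only '\' and '"' are escaped; names containing CR/LF
// (which would need an IMAP literal) are not handled.
func quoteMailbox(name string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`)
	return `"` + r.Replace(name) + `"`
}

// notifySet builds a NOTIFY SET command (RFC 5465) asking the server to
// report new messages, expunges and flag changes in the given mailboxes.
// The event set is an assumption made for this example.
func notifySet(tag string, mailboxes []string) string {
	quoted := make([]string, len(mailboxes))
	for i, m := range mailboxes {
		quoted[i] = quoteMailbox(m)
	}
	return fmt.Sprintf("%s NOTIFY SET (mailboxes (%s) (MessageNew MessageExpunge FlagChange))",
		tag, strings.Join(quoted, " "))
}

func main() {
	// One command covers several mailboxes; with IDLE, each monitored
	// mailbox would need its own connection.
	fmt.Println(notifySet("A1", []string{"INBOX", "Archive"}))
}
```

Running it prints `A1 NOTIFY SET (mailboxes ("INBOX" "Archive") (MessageNew MessageExpunge FlagChange))`: a single request covering both mailboxes.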

In this article, I’ll cover some of the general design details, inner workings and other development details.

General mode of operation

First off, ImapGoose keeps a small status database with some minor metadata about the last-seen status of both the server and local Maildirs. This includes the mapping between server UIDs and filesystem filenames. Its general design is strongly inspired by how OfflineIMAP works.
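A minimal in-memory sketch of what such a status database might hold is shown below. The type and field names are hypothetical and persistence is omitted. The UIDVALIDITY field is my own assumption (IMAP UIDs are only meaningful together with it); the text above only states that the last-seen state and the UID-to-filename mapping are stored.

```go
package main

import "fmt"

// MailboxState is a hypothetical per-mailbox record in the status database.
type MailboxState struct {
	UIDValidity   uint32            // assumed: UIDs are only valid for one UIDVALIDITY
	HighestModSeq uint64            // last HIGHESTMODSEQ seen on the server
	Files         map[uint32]string // server UID -> Maildir filename
}

// StatusDB maps mailbox names to their last-seen state (in memory only).
type StatusDB map[string]*MailboxState

// Lookup returns a mailbox's state and whether it has been seen before.
func (db StatusDB) Lookup(mailbox string) (*MailboxState, bool) {
	st, ok := db[mailbox]
	return st, ok
}

// Record stores the Maildir filename for a server UID, creating the
// mailbox entry the first time the mailbox is seen.
func (db StatusDB) Record(mailbox string, uid uint32, filename string) {
	st, ok := db[mailbox]
	if !ok {
		st = &MailboxState{Files: make(map[uint32]string)}
		db[mailbox] = st
	}
	st.Files[uid] = filename
}

func main() {
	db := StatusDB{}
	if _, seen := db.Lookup("INBOX"); !seen {
		fmt.Println("INBOX not seen before: full sync needed")
	}
	db.Record("INBOX", 42, "1700000000.M1P2.host:2,S")
	st, _ := db.Lookup("INBOX")
	fmt.Println(st.Files[42])
}
```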

At start-up, ImapGoose lists all mailboxes on the server and in the local filesystem. It then starts monitoring them (the server via NOTIFY, the local filesystem via inotify/kqueue), so that it receives notifications for any changes that happen after the initial listing. This ensures that, for example, if a new email arrives while the initial sync is running, we still get a notification for it.

Once monitoring is set up, ImapGoose queues a task to perform a full sync of each mailbox. First, we determine whether this is the first time we have seen this mailbox, which is the case if it is absent from the status database. If the mailbox has not been seen before, we request all messages. The server returns all of them along with a HIGHESTMODSEQ, which we store in the status database. HIGHESTMODSEQ is a numeric property of each mailbox which increases every time a change occurs inside that mailbox. If the mailbox has been seen before, we can ask the server only for changes since the stored HIGHESTMODSEQ, which delivers just the data we need and nothing about the thousands of other, unchanged messages.
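The first-sync versus incremental-sync decision can be sketched as choosing between two SELECT variants. This is a hedged illustration, not ImapGoose’s code: it assumes the client has already sent `ENABLE QRESYNC`, that the stored state includes the mailbox’s UIDVALIDITY (which QRESYNC requires alongside the mod-sequence), and that mailbox names need no escaping.

```go
package main

import "fmt"

// mailboxState is the hypothetical subset of the status database needed
// to resume a mailbox; nil means the mailbox has never been seen.
type mailboxState struct {
	UIDValidity   uint32
	HighestModSeq uint64
}

// selectCommand picks the SELECT form for a sync. Assumes ENABLE QRESYNC
// has been sent and that the mailbox name needs no escaping.
func selectCommand(tag, mailbox string, st *mailboxState) string {
	if st == nil {
		// First sync: plain CONDSTORE select; the caller then fetches
		// every message and records the returned HIGHESTMODSEQ.
		return fmt.Sprintf(`%s SELECT "%s" (CONDSTORE)`, tag, mailbox)
	}
	// Incremental sync: the server replies with VANISHED (EARLIER) for
	// deleted UIDs and FETCH responses for messages changed since the
	// stored mod-sequence.
	return fmt.Sprintf(`%s SELECT "%s" (QRESYNC (%d %d))`,
		tag, mailbox, st.UIDValidity, st.HighestModSeq)
}

func main() {
	fmt.Println(selectCommand("A2", "INBOX", nil))
	fmt.Println(selectCommand("A3", "INBOX",
		&mailboxState{UIDValidity: 67890007, HighestModSeq: 90060115194045000}))
}
```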

When a message is present on the server and absent from the filesystem (or vice versa), we need to determine whether it is a new message, or a message that was previously present on both sides and has since been deleted from one of them. To determine this, we use the status database and apply exactly the same algorithm as offlineimap. It’s simple and well tested.
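The three-way comparison can be sketched as a small decision function over “is it on the server?”, “is it local?” and “is it in the status database?”. This is a simplified illustration with my own naming, not a port of offlineimap’s code: the real algorithm also synchronises flags and handles local messages that do not yet have a server UID.

```go
package main

import "fmt"

// Action is what a sync should do with one message.
type Action string

const (
	Keep         Action = "keep"          // nothing to copy or delete
	Download     Action = "download"      // new on the server
	Upload       Action = "upload"        // new in the local Maildir
	DeleteLocal  Action = "delete-local"  // deleted on the server since last sync
	DeleteRemote Action = "delete-remote" // deleted locally since last sync
	Forget       Action = "forget"        // gone on both sides; drop from status
)

// decide compares the server, the local Maildir and the status database
// (the last state both sides agreed on).
func decide(onServer, onLocal, inStatus bool) Action {
	switch {
	case onServer && onLocal:
		return Keep // present on both sides (flags may still differ)
	case onServer && inStatus:
		return DeleteRemote // synced before, now missing locally
	case onServer:
		return Download
	case onLocal && inStatus:
		return DeleteLocal // synced before, now missing on the server
	case onLocal:
		return Upload
	default:
		return Forget
	}
}

func main() {
	fmt.Println(decide(true, false, false)) // never synced: new on the server
	fmt.Println(decide(true, false, true))  // synced before: deleted locally
}
```

The status database is what breaks the symmetry: without it, “missing locally” could mean either “new on the server” or “deleted locally”.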

At times, ImapGoose may disconnect from the server (for example, when a laptop disconnects from Wi-Fi or goes into sleep mode). It will try to reconnect automatically using exponential back-off: after 1 second, then after 2, 4, 8, 16, 32 seconds and so on, all the way up to 17 minutes. From then on, it keeps retrying every 17 minutes. This means users don’t really have to worry about ImapGoose’s current state or whether it’s still working: it knows how to back off when there’s no network and how to get back to work once the network is available again.

As mentioned above, ImapGoose “queues” sync tasks. Internally, it uses a task queue: when changes are detected on the server, a task to sync that entire mailbox is queued. A worker picks this task up from the queue, asks the server for changes in that mailbox, and synchronises them. When changes are detected in the filesystem, a task to sync that particular message is queued instead. Multiple messages may arrive in quick succession for the same mailbox. In this case, we don’t want to trigger multiple syncs of the same mailbox, and we especially don’t want two workers to sync the same mailbox concurrently: that would quickly lead to duplicate emails.
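How the two event sources turn into different kinds of tasks can be sketched as a small routing loop. The event and task types are hypothetical: a server notification becomes a whole-mailbox task (empty `Path`), while a filesystem event becomes a targeted task carrying the message’s path.

```go
package main

import "fmt"

// ServerEvent is a hypothetical change notification received via NOTIFY.
type ServerEvent struct{ Mailbox string }

// FSEvent is a hypothetical inotify/kqueue event for one message file.
type FSEvent struct{ Mailbox, Path string }

// Task is a sync request; an empty Path means "sync the whole mailbox".
type Task struct{ Mailbox, Path string }

// route converts events from both sources into tasks until both input
// channels are closed, then closes the task channel.
func route(server <-chan ServerEvent, fs <-chan FSEvent, tasks chan<- Task) {
	for server != nil || fs != nil {
		select {
		case ev, ok := <-server:
			if !ok {
				server = nil // a nil channel is never selected again
				continue
			}
			tasks <- Task{Mailbox: ev.Mailbox}
		case ev, ok := <-fs:
			if !ok {
				fs = nil
				continue
			}
			tasks <- Task{Mailbox: ev.Mailbox, Path: ev.Path}
		}
	}
	close(tasks)
}

func main() {
	server := make(chan ServerEvent, 1)
	fs := make(chan FSEvent, 1)
	tasks := make(chan Task, 2)
	server <- ServerEvent{Mailbox: "INBOX"}
	fs <- FSEvent{Mailbox: "Sent", Path: "Sent/cur/1700000000.x:2,S"}
	close(server)
	close(fs)
	go route(server, fs, tasks)
	for task := range tasks {
		fmt.Printf("%+v\n", task)
	}
}
```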

To avoid concurrent syncs and redundant mailbox updates, ImapGoose uses a “dispatcher”, which hands off sync tasks to workers. When a task to sync a specific mailbox is handed to a worker, that mailbox is marked as “busy”, and no other tasks for that mailbox are processed until the worker reports that it has finished its work on it. While a worker is synchronising a mailbox, we may receive several notifications of changes to that mailbox. These could be caused by the worker’s own changes, or they could be new emails being delivered, so we have to queue another task to sync that mailbox. These tasks are kept in the queue until the worker frees up the mailbox, and the dispatcher also de-duplicates them: synchronising a mailbox once after the last change notification is enough to pick up the changes from all the notifications.
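A single-threaded sketch of such a dispatcher follows. All names are hypothetical, and a real implementation would run the dispatcher in its own goroutine (or guard it with a mutex) and feed workers over channels. A task with an empty `Path` stands for a full-mailbox sync; de-duplication here simply drops a task identical to one that is still pending.

```go
package main

import "fmt"

// Task is a sync request; an empty Path means a full-mailbox sync.
type Task struct{ Mailbox, Path string }

// Dispatcher hands out at most one task per mailbox at a time and drops
// duplicate pending tasks. Not goroutine-safe: a sketch only.
type Dispatcher struct {
	busy    map[string]bool // mailboxes currently held by a worker
	pending []Task          // FIFO of waiting tasks
	queued  map[Task]bool   // set view of pending, for de-duplication
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{busy: map[string]bool{}, queued: map[Task]bool{}}
}

// Enqueue adds a task unless an identical one is already pending.
func (d *Dispatcher) Enqueue(t Task) {
	if d.queued[t] {
		return
	}
	d.queued[t] = true
	d.pending = append(d.pending, t)
}

// Next returns the oldest pending task whose mailbox is not busy and marks
// that mailbox busy. ok is false if every pending task is blocked.
func (d *Dispatcher) Next() (t Task, ok bool) {
	for i, cand := range d.pending {
		if d.busy[cand.Mailbox] {
			continue
		}
		d.pending = append(d.pending[:i], d.pending[i+1:]...)
		delete(d.queued, cand)
		d.busy[cand.Mailbox] = true
		return cand, true
	}
	return Task{}, false
}

// Done is called when a worker has finished with a mailbox.
func (d *Dispatcher) Done(mailbox string) {
	delete(d.busy, mailbox)
}

func main() {
	d := NewDispatcher()
	d.Enqueue(Task{Mailbox: "INBOX"})
	d.Enqueue(Task{Mailbox: "INBOX"}) // duplicate: dropped
	first, _ := d.Next()
	d.Enqueue(Task{Mailbox: "INBOX"}) // change noticed while INBOX is busy
	_, ok := d.Next()
	fmt.Println(first.Mailbox, ok) // INBOX is still busy, so nothing is handed out
	d.Done("INBOX")
	again, ok := d.Next()
	fmt.Println(again.Mailbox, ok)
}
```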

When a message changes in the filesystem, ImapGoose receives an inotify (or kqueue) event. This doesn’t trigger a sync of the full mailbox, but a “targeted” sync which focuses only on that message. We know that a single message has changed, so there’s no point in re-scanning the thousands of messages in the mailbox. Targeted syncs also take part in de-duplication, but two of them are only merged if they refer to the same path.
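Turning a filesystem event into a targeted task mostly means working out which mailbox the file belongs to. The sketch below assumes a layout of `<local-path>/<mailbox>/{cur,new,tmp}/<file>`, which is my guess rather than ImapGoose’s documented layout, and ignores files in `tmp/`, because Maildir writers only move finished messages into `new/` or `cur/`. Since the resulting task includes the path, two events for the same file produce identical tasks, which a de-duplicating queue can merge.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Task is a hypothetical targeted sync request for one message file.
type Task struct{ Mailbox, Path string }

// taskForEvent maps a changed file under root to a targeted sync task.
// It reports false for paths outside root and for files not in cur/ or new/.
func taskForEvent(root, path string) (Task, bool) {
	rel, err := filepath.Rel(root, path)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return Task{}, false
	}
	dir := filepath.Dir(rel) // e.g. "INBOX/cur"
	if sub := filepath.Base(dir); sub != "cur" && sub != "new" {
		return Task{}, false // tmp/ or not a message file at all
	}
	mailbox := filepath.Dir(dir) // everything between root and cur/new
	if mailbox == "." {
		return Task{}, false
	}
	return Task{Mailbox: filepath.ToSlash(mailbox), Path: path}, true
}

func main() {
	root := "/home/hugo/mail/example"
	fmt.Println(taskForEvent(root, root+"/INBOX/cur/1700000000.x:2,S"))
	fmt.Println(taskForEvent(root, root+"/INBOX/tmp/1700000000.x"))
}
```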

The connection which listens for changes from the server is kept alive by sending periodic NOOP commands, but the connections used by workers are allowed to time out. If there is no activity, these connections simply time out, and a new connection is established once a worker needs one again. Great care has been taken to avoid unnecessary churn throughout.

Prior art

Before developing ImapGoose, I studied prior art in the field. In particular, offlineimap does a great job of synchronising mailboxes. However, it doesn’t “keep in sync” in the same way: offlineimap needs to execute periodic syncs, doesn’t rely on modern extensions, and tends to “hang” when there are network time-outs. ImapGoose is new and has no existing users, so it can simply require modern extensions and declare other scenarios unsupported. Existing tools have to maintain compatibility for existing users, who might rely on some legacy email server. If I couldn’t rely on NOTIFY, implementing ImapGoose in such a clean, efficient way would not have been possible. If I couldn’t rely on CONDSTORE and QRESYNC, I would have had to download lists of thousands of emails every time a single one changed. Thanks to UIDPLUS, the server returns the UID of a newly uploaded message, so we don’t need any ugly workarounds to retrieve it.
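For illustration, this is roughly what UIDPLUS gives the client: the tagged OK for an APPEND carries an `APPENDUID` response code with the mailbox’s UIDVALIDITY and the new message’s UID (RFC 4315). The hand-rolled parser below is only a sketch; in practice an IMAP library such as go-imap parses responses, and this version ignores the UID-set form that the server may return when MULTIAPPEND is used.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseAppendUID extracts UIDVALIDITY and the new message's UID from an
// APPENDUID response code (RFC 4315) in a tagged OK line.
// Only the single-UID form is handled.
func parseAppendUID(line string) (uidValidity, uid uint32, err error) {
	const prefix = "[APPENDUID "
	start := strings.Index(line, prefix)
	if start < 0 {
		return 0, 0, fmt.Errorf("no APPENDUID response code in %q", line)
	}
	rest := line[start+len(prefix):]
	end := strings.IndexByte(rest, ']')
	if end < 0 {
		return 0, 0, fmt.Errorf("unterminated response code in %q", line)
	}
	fields := strings.Fields(rest[:end])
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("malformed APPENDUID in %q", line)
	}
	v, err := strconv.ParseUint(fields[0], 10, 32)
	if err != nil {
		return 0, 0, err
	}
	u, err := strconv.ParseUint(fields[1], 10, 32)
	if err != nil {
		return 0, 0, err
	}
	return uint32(v), uint32(u), nil
}

func main() {
	v, u, err := parseAppendUID("A003 OK [APPENDUID 38505 3955] APPEND completed")
	fmt.Println(v, u, err)
}
```

With the UID in hand, the client can record the new UID-to-filename mapping in the status database straight away, instead of searching the mailbox for the message it just uploaded.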

If someone needs to sync data from legacy servers, plenty of tools are still out there, providing the best experience which those servers can offer.

Development

When working on ImapGoose, I focused exactly on my needs for my particular use case: keep my local mailboxes in sync with an IMAP server. There’s no other supported scenario, there’s no fallback for legacy servers, and there’s no support for alternative email backends. All these constraints allowed me to focus on making a tool that’s great for a single use case: it does one thing and does it well.

I strongly believe that keeping tight constraints (e.g. focusing on just one use case, ignoring legacy servers, keeping things as simple as possible) helped me develop this much faster and with much cleaner results.

I started with a very clear picture of how the whole thing would work. I was also familiar with go-imap, and knew it to be a well-designed and well-implemented IMAP library. My immense appreciation goes to emersion and the contributors who’ve worked on it. I didn’t need to worry about the inner details of talking to an IMAP server, parsing responses, tracking connection state, etc. go-imap provides a simple, idiomatic Go interface for IMAP commands and their responses.

go-imap was lacking two features which I needed: support for the NOTIFY command and for VANISHED responses (RFC 5162). Still standing on the shoulders of giants, I implemented both of these and sent patches upstream (NOTIFY, VANISHED). Until those are merged, ImapGoose is built using my own (temporary) fork which has those two patches applied.

Configuration

For configuration, I opted for the very simple and straightforward scfg configuration format. The configuration file looks something like:

```
account example {
	server imap.example.com:993
	username hugo@example.com
	password-cmd pass show email/example
	local-path ~/mail/example
}
```
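The text above doesn’t spell out how `password-cmd` is evaluated. One plausible approach is sketched here, under two assumptions of mine: the value is run through `/bin/sh`, and the password is the first line of its output (the convention `pass show` follows). `runPasswordCmd` is a hypothetical name, not part of ImapGoose.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// runPasswordCmd runs a password-cmd value through /bin/sh and returns the
// first line of its output. Both the shell and the first-line rule are
// assumptions made for this sketch.
func runPasswordCmd(cmd string) (string, error) {
	out, err := exec.Command("/bin/sh", "-c", cmd).Output()
	if err != nil {
		return "", fmt.Errorf("password-cmd %q: %w", cmd, err)
	}
	first, _, _ := strings.Cut(string(out), "\n")
	return strings.TrimRight(first, "\r"), nil
}

func main() {
	pw, err := runPasswordCmd("echo example-password")
	fmt.Println(pw, err)
}
```

Running a command instead of storing the password in the configuration file keeps the secret in whatever password manager the user already relies on.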

Naming

I wanted something easy to remember, easy to pronounce and that wouldn’t yield thousands of unrelated search engine results. There’s also room for an obvious mascot/logo: a goose wearing a postman’s hat and carrying an envelope, using the colour palette from the Go ecosystem. Please reach out if you are an illustrator willing to contribute artwork.

Open source

ImapGoose is open source and distributed under the terms of the ISC licence. The source code is available via git. Feedback is welcome, including bug reports.

[1] Typically, another client moves a message to Trash, and ImapGoose replicates the same operation, but the general idea still stands.
