Administering the object storage plugin mainly involves making sure that the mail cache (fscache) and the index cache (metacache) perform efficiently and don't take up all the disk space.
When the Dovecot service is stopped, it flushes all pending changes. The idea is to ensure users are not able to re-login onto backends you want to stop. For this you should propagate changes and kick users repeatedly:

1. `doveadm metacache flushall -i` to flush all pending important changes.
2. `doveadm kick '*'` to kick all the existing IMAP, POP3 and ManageSieve connections.
3. `doveadm metacache flushall -i` to flush the important changes that might have happened while stopping the connections.
4. `doveadm kick '*'` again, just in case a few more clients managed to log in.
5. `doveadm metacache flushall -i` one final time.
6. `service dovecot stop` to shut down the Dovecot processes.

This flushing isn't performed when restarting the service or when doing a package upgrade.
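If you want to run the sequence unattended, it can be scripted; a minimal sketch using only the commands above (no error handling):

```
#!/bin/sh
# Flush important metacache changes, kick clients, and repeat,
# so nothing is lost between the kicks and the final shutdown.
doveadm metacache flushall -i
doveadm kick '*'
doveadm metacache flushall -i
doveadm kick '*'          # catch clients that logged in meanwhile
doveadm metacache flushall -i
service dovecot stop
```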
There's also a `metacache-flush.service` that can be manually stopped if you don't want the flushall to be run.
The simplest way to upgrade a Dovecot backend is to run `yum upgrade` or `apt-get upgrade` (depending on your distribution's package manager). This causes very little downtime on that server, so most clients can successfully reconnect after getting disconnected. This method also has the advantage that the users' caches stay filled.
INFO

Make sure that the backend is still online in the cluster after the update: `doveadm cluster backend status`
Sometimes in-place upgrades aren't wanted. Instead, the backend is upgraded by first shutting it down, upgrading, and then bringing the server back up. See below for problems related to this.
Users are moved to new backends with empty caches. Filling the caches causes temporary object storage I/O spikes.
"Unimportant changes" are changes that can be regenerated in case of a backend crash. This most importantly means data added to dovecot.index.cache
file. Usually these get flushed while the indexes get flushed for other reasons, for example every 10th new mail delivery. Indexes with only unimportant changes are automatically flushed to object storage only when metacache disk space runs out and metacache process decides to clean up some disk space. It flushes the indexes before deleting them so nothing is lost. However, if there is enough available disk space in metacache this may mean that after shutting down Dovecot there may be a lot of indexes in metacache with unimportant changes.
This has two problems:

1. When a user is moved to a new backend, these missing unimportant changes may need to be regenerated. Usually this means reading a maximum of 9 mails per folder (`obox_max_rescan_mail_count = 10`), but in some rare situations the cache may have just had huge changes. These changes will need to be redone on the new backend, which may be expensive.
2. Metacache directories with unimportant changes are left lying around.
Normally this shouldn't actually cause problems, because:

- Eventually they may get cleaned up to free disk space, which normally causes them to be flushed to object storage. However, the flushing isn't performed when another server has already changed the indexes, so an obsolete index bundle won't actually be written to object storage.
- When opening an obsolete metacache directory with only unimportant changes, it's not used if there's already a newer index bundle; the obsolete directory just gets deleted. Only if there are important changes does it perform dsync-merging.
There are some things that can be done to mitigate these problems:

- Run `doveadm metacache flushall` every night (a cron sketch follows below). This way there won't be highly out-of-date indexes lying around.
- Delete really old obsolete indexes from all backends before shutting down a backend. Ideally do this only when the user isn't assigned to the backend before the shutdown; otherwise it could unnecessarily delete indexes for users who simply haven't been accessed for a while but don't have any newer indexes anywhere.
- When starting up a backend, also delete rather old (e.g. >1 day) indexes from metacache.

Use the `last-access` timestamp in `doveadm metacache list` output to determine the user's last access timestamp.
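For the nightly flush, a plain cron entry is enough; a minimal sketch (the file path and schedule are examples, pick whatever off-peak time suits your site):

```
# /etc/cron.d/dovecot-metacache-flush (hypothetical path)
# Flush all metacache changes to object storage every night at 03:00.
0 3 * * * root doveadm metacache flushall
```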
TIP

You can't currently use `doveadm metacache clean` to delete changed indexes. The only alternative is to just forcibly `rm -rf` the directory. However, if the user happens to be accessed during the `rm -rf`, this can cause index corruption, which can have rather bad consequences (like redownloading all mails). This is why you should verify which backend the user is currently assigned to, and only `rm -rf` users whose backend is elsewhere.
Dovecot doesn't use metacache for users that were accessed before the backend last crashed. This is tracked using the `/var/lib/dovecot/reboots` file.

When starting up, Dovecot gets `/proc`'s ctime and adds it to the reboots file. At a clean dovecot service shutdown this timestamp is marked as "clean". Each `.state` file in the metacache directories contains the `/proc` ctime from when it was last modified. If opening metacache finds that there has been a crash since the last metacache write, the metacache directory is assumed to be corrupted and is deleted.

Normally this works as expected and the admin doesn't need to worry about it.
The mail cache size is specified in the `obox_fs` setting as the `fscache` parameter, which is commonly set to 1-2 GB.
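For illustration, a minimal sketch of such a configuration (the size, cache directory, and storage driver URL are placeholders; consult your installation's obox documentation for the exact driver string):

```
plugin {
  # fscache:<max size>:<cache dir> wraps the real storage driver,
  # caching mail objects on the local disk.
  obox_fs = fscache:1G:/var/lib/dovecot/fscache:s3:https://ACCESSKEY:SECRET@s3.example.com/
}
```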
If fscache runs out of disk space, most operations won't return user-visible failures (although errors are still logged). Currently the "mail prefetching" can't transparently handle such failures though, so these errors can result in user-visible failures.
If fscache runs out of disk space, it's usually because of one of the following:

- `fscache.log` doesn't match the actual disk space usage, perhaps due to a bug or due to crashes.
- Users are accessing/saving too large emails. See the `quota_max_mail_size` setting.
- Mail files are being kept open for a long time, so already deleted files still reserve disk space on the filesystem; for example, a client downloading a large mail over a slow internet connection.
Generally the problem goes away by syncing `fscache.log` with reality by running `doveadm fscache rescan`. This updates `fscache.log` to contain the correct size, and also prints whether the current size was correct or not. It's also possible to manually delete files from fscache with the rm command; the `doveadm fscache rescan` command must be run afterwards.
Many of our customers run the `doveadm fscache rescan` command in a cronjob every hour. This makes sure that the fscache size won't stay wrong for too long.
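A sketch of such a cron entry (the file path and schedule are examples):

```
# /etc/cron.d/dovecot-fscache-rescan (hypothetical path)
# Re-sync fscache.log with actual disk usage once per hour.
0 * * * * root doveadm fscache rescan
```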
The metacache index size is specified in the `metacache_max_space` setting. This should ideally be as large as possible to reduce both object storage `GET`s for the indexes and local filesystem writes when the indexes are unpacked to the local cache.
Metacache is rarely large enough to contain indexes for all the users on the backend. This is why it also supports priorities, which attempt to keep the most useful information in metacache the longest to reduce object storage I/O.

For example, `INBOX` and `\Junk` folders are usually accessed more often than other folders (due to mail deliveries), so they're prioritized higher than other folders. A user's root indexes are prioritized the highest, mainly because they're always required whenever the user is accessed, but also because they're small enough to be cheaply kept in metacache for a long time.
The metacache performance can be monitored by looking at the number of index `GET` and `PUT` requests. Metacache cleans are also logged by metacache-worker.

To list all users currently known to be in metacache, run `doveadm metacache list`.
There are 4 priorities for index files:

| Priority | Description |
| --- | --- |
| 0 (highest) | User root indexes |
| 1 | FTS indexes |
| 2 | INBOX and \Junk folder indexes |
| 3 (lowest) | Other folders' indexes |
You can also manually clean older indexes from the cache by running `doveadm metacache clean -u user@domain`. If the indexes aren't fully uploaded to the object storage, the clean command will fail.

You can manually upload indexes to object storage with:

```
doveadm metacache flush -u user@domain
doveadm metacache flushall
```

It's also possible to flush only indexes with the specified priority (and below) with the `-p` parameter.
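For example, a priority-restricted flush might look like this (a sketch; assuming `-p 2` selects priorities 0 through 2 from the table above, i.e. root, FTS, and INBOX/\Junk indexes):

```
doveadm metacache flush -p 2 -u user@domain
```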
If a user no longer actually exists on the filesystem, it can be removed from the metacache process with `doveadm metacache remove user@domain`. This command also supports wildcards, so you can remove e.g. `testuser*`, or even `*` for everyone.
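For example, to remove all matching test accounts from the metacache process (the pattern is illustrative):

```
doveadm metacache remove 'testuser*'
```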
If multiple backends make changes to the same mailbox at the same time, Dovecot will eventually perform a dsync-merge for the indexes. Because dsync is quite a complicated algorithm, there's a chance that the merging triggers a bug/crash that won't fix itself automatically. If this happens, the bug should be reported so it can be properly fixed, but a quick workaround is to run:

```
doveadm -o plugin/metacache_index_merging=none force-resync -u user@domain INBOX
```
TIP

To allow easier migration of users and to support the new needs brought up with Palomar, the `doveadm metacache pull` command has been implemented. This command allows pulling the metacache for specific user(s) from another backend:

```
doveadm metacache pull -u user@domain --latest-only --clean 10.0.0.5
```
This is a generic procedure for performing backend maintenance with minimal user impact.
TIP

The best way to avoid any user impact is to avoid having to use this procedure in the first place: apply configuration changes with `doveadm reload`. If there are configuration mistakes, the reload will fail and preserve the original configuration. This only catches syntax mistakes and other mistakes that `doveconf(1)` can detect, though, not mistakes that are detected only at runtime.

WARNING
`doveadm cluster backend evacuate` can potentially increase the load on the backend significantly if many backends are pulling metacache from it at the same time.
Use the `doveadm cluster backend evacuate` command to move all the user groups out of the backend (see the sketch below).

Wait for the evacuation to complete.

Shut down Dovecot on the backend.

Now all sessions are gone and the backend is ready for an upgrade or a major configuration change.
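A sketch of the evacuation step (assuming the command takes the backend host argument the same way as `doveadm cluster backend update` shown later):

```
doveadm cluster backend evacuate <backend host>
```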
INFO

`doveadm cluster backend evacuate` does several things:

- It sets the backend's load factor to `0`, and
- sets the backend status to `standby`.

Thus it is easier to call this command directly. Another option is to manually set the load factor to `0` using `doveadm cluster backend update`, but it might take a significant time until the backend is empty. Set the status of the backend to `standby` after the backend is empty to signify that the backend is not usable for connections.
Synchronize metacache.

The metacache database may not be fully synchronized with the index files that actually exist on the filesystem. It's recommended at this stage to either delete the metacache or rescan it.

Rescan metacache:
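A sketch, assuming the `doveadm metacache rescan` subcommand is available in your Dovecot Pro version:

```
doveadm metacache rescan
```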
Delete metacache:
Remove the old metacache database files. As the metacache service is now reduced to one file, the old files need to be removed:

```
rm -f /var/lib/dovecot/metacache/metacache-users*
```

Remove metacache from the filesystem:

```
rm -rf /var/dovecot/vmail/*
```
Restart Dovecot:

```
systemctl start dovecot
```

Verify with a test user that the backend is usable:

```
# Fetches the mailbox list from metacache.
# This is fetched from storage now as metacache is reset.
doveadm mailbox list -u <uid>

# Fetches more info from metacache.
doveadm mailbox status -u <uid> messages "*"

# Verifies Dovecot can fetch mail objects from storage.
doveadm fetch -u <uid> text all > /dev/null
```
If all of the above commands succeed, the backend can be put back into production.

Add the backend to the cluster (making sure the load factor is restored):

```
doveadm cluster backend update --load-factor 100 --status online <backend host>
```
After an end user has terminated their contract with the service provider, the mail data needs to be removed not only from the (object) storage but also from Cassandra, and cached information needs to be removed from the serving backend as well.
Before running the actual doveadm commands, the user should be disabled in the userdb (e.g. LDAP) to disallow IMAP/POP/LMTP connections, but not yet removed from the userdb: if the user doesn't exist in the userdb, doveadm commands for that user will fail.

After the user is disabled, their existing connections should be closed. This is most easily done on the proxy, which forwards the kick command to the user's current backend; see the example below. This can also be managed by a provisioning system issuing a Doveadm HTTP API call.
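A sketch of the kick, run on a proxy (the user name is a placeholder):

```
doveadm kick john@example.com
```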
Delete the user from the storage by running the `doveadm obox user delete` command on the proxy (or via the provisioning system, which needs to issue Doveadm HTTP API calls to the proxy):

```
doveadm obox user delete -u john@example.com
```
This command removes the user's data from the object storage and the related entries from Cassandra.

Note that this command does NOT remove the user from the userdb.
After the user's data is deleted, the user can be removed from the userdb. Whether that is wanted right away depends on the policy and the provisioning system: whether the provisioning system can keep that email address reserved for the typical 6 months before it's assigned to another user, or whether the userdb needs to keep that information for that period of time.
These settings may be useful if some emails are inaccessible.
Issue: An email is inaccessible during a FETCH.
Workaround: Finish the FETCH as well as possible and return a tagged NO reply. The default is to disconnect the IMAP client immediately on the failure. It depends on the IMAP client whether this behavior is useful or not.
Setting:
Issue: How to handle Cassandra `Object exists in dict, but not in storage` errors.

Workaround: Return empty emails to the IMAP client. The tagged FETCH response will be `OK` instead of `NO`.

Setting:

```
obox_fetch_lost_mails_as_empty = yes
```

This can also be enabled only for specific clients, e.g. a webmail IP range:

```
remote <webmail IP range> {
  plugin {
    obox_fetch_lost_mails_as_empty = yes
  }
}
```