Administering the object storage plugin mainly involves making sure that the mail cache (fscache) and the index cache (metacache) perform efficiently and don't take up all the disk space.
When the Dovecot service is stopped, it flushes all pending changes. The idea is to ensure users are not able to re-login onto backends you want to stop. For this you should propagate changes and kick users repeatedly:

1. `doveadm metacache flushall -i` to flush all pending important changes.
2. `doveadm kick '*'` to kick all the existing IMAP, POP3 and ManageSieve connections.
3. `doveadm metacache flushall -i` to flush the important changes that might have happened while stopping the connections.
4. `doveadm kick '*'` again, just in case a few more clients managed to log in.
5. `doveadm metacache flushall -i` one final time.
6. `service dovecot stop` to shut down the Dovecot processes.

This flushing isn't performed when restarting the service or when doing a package upgrade.
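If you want to run the sequence unattended, it can be scripted; a minimal sketch using only the commands above (no error handling):

```
#!/bin/sh
# Flush important metacache changes, kick clients, and repeat,
# so nothing is lost between the kicks and the final shutdown.
doveadm metacache flushall -i
doveadm kick '*'
doveadm metacache flushall -i
doveadm kick '*'          # catch clients that logged in meanwhile
doveadm metacache flushall -i
service dovecot stop
```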
There's also a `metacache-flush.service` that can be manually stopped if you don't want the flushall to be run.
The simplest way to upgrade a Dovecot backend is to run `yum upgrade` or `apt-get upgrade` (depending on your distribution's package manager). This causes very little downtime on that server, so most clients can successfully reconnect after getting disconnected. This method also has the advantage that the users' caches stay filled.
INFO

Make sure that the backend is still online in the cluster after the update: `doveadm cluster backend status`
Sometimes in-place upgrades aren't wanted. Instead, the backend is upgraded by first shutting it down, upgrading, and then bringing the server back up. See below for problems related to this.
Users are moved to new backends with empty caches. Filling the caches causes temporary object storage I/O spikes.
"Unimportant changes" are changes that can be regenerated in case of a backend crash. This most importantly means data added to dovecot.index.cache
file. Usually these get flushed while the indexes get flushed for other reasons, for example every 10th new mail delivery. Indexes with only unimportant changes are automatically flushed to object storage only when metacache disk space runs out and metacache process decides to clean up some disk space. It flushes the indexes before deleting them so nothing is lost. However, if there is enough available disk space in metacache this may mean that after shutting down Dovecot there may be a lot of indexes in metacache with unimportant changes.
This has two problems:

1. When a user is moved to a new backend, these missing unimportant changes may need to be regenerated. Usually this means reading a maximum of 9 mails per folder (`obox_max_rescan_mail_count = 10`), but in some rare situations the cache may have just had huge changes. These changes will need to be redone on the new backend, which may be expensive.
2. Metacache directories with unimportant changes are left lying around.
Normally this shouldn't actually cause problems, because:

- Eventually they may get cleaned up to free disk space, which normally causes them to be flushed to object storage. However, the flushing isn't performed when another server has already changed the indexes, so an obsolete index bundle won't actually be written to object storage.
- When opening an obsolete metacache directory with only unimportant changes, it's not used if there's already a newer index bundle; the obsolete directory just gets deleted. Only if there are important changes does it perform dsync-merging.
There are some things that can be done to mitigate these problems:

- Run `doveadm metacache flushall` every night (a cron sketch follows below). This way there won't be highly out-of-date indexes lying around.
- Delete really old obsolete indexes from all backends before shutting down a backend. Ideally do this only when the user isn't assigned to the backend before the shutdown; otherwise it could unnecessarily delete indexes for users who simply haven't been accessed for a while but don't have any newer indexes anywhere.
- When starting up a backend, also delete rather old (e.g. >1 day) indexes from metacache.

Use the `last-access` timestamp in `doveadm metacache list` output to determine the user's last access timestamp.
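For the nightly flush, a plain cron entry is enough; a minimal sketch (the file path and schedule are examples, pick whatever off-peak time suits your site):

```
# /etc/cron.d/dovecot-metacache-flush (hypothetical path)
# Flush all metacache changes to object storage every night at 03:00.
0 3 * * * root doveadm metacache flushall
```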
TIP

You can't currently use `doveadm metacache clean` to delete changed indexes. The only alternative is to just forcibly `rm -rf` the directory. However, if the user happens to be accessed during the `rm -rf`, this can cause index corruption, which can have rather bad consequences (like redownloading all mails). This is why you should verify which backend the user is currently assigned to, and only `rm -rf` users whose backend is elsewhere.
Dovecot doesn't use metacache for users that were accessed before the backend last crashed. This is tracked using the `/var/lib/dovecot/reboots` file.

When starting up, Dovecot gets `/proc`'s ctime and adds it to the reboots file. At a clean dovecot service shutdown this timestamp is marked as "clean". Each `.state` file in the metacache directories contains the `/proc` ctime from when it was last modified. If opening metacache finds that there has been a crash since the last metacache write, the metacache directory is assumed to be corrupted and is deleted.

Normally this works as expected and the admin doesn't need to worry about it.
The mail cache size is specified in the `obox_fs` setting as the `fscache` parameter, which is commonly set to 1-2 GB.
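For illustration, a minimal sketch of such a configuration (the size, cache directory, and storage driver URL are placeholders; consult your installation's obox documentation for the exact driver string):

```
plugin {
  # fscache:<max size>:<cache dir> wraps the real storage driver,
  # caching mail objects on the local disk.
  obox_fs = fscache:1G:/var/lib/dovecot/fscache:s3:https://ACCESSKEY:SECRET@s3.example.com/
}
```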
If fscache runs out of disk space, most operations won't return user-visible failures (although errors are still logged). Currently the "mail prefetching" can't transparently handle such failures though, so these errors can result in user-visible failures.
If fscache runs out of disk space, it's usually because of one of the following:

- `fscache.log` doesn't match the actual disk space usage, perhaps due to a bug or due to crashes.
- Users are accessing/saving too large emails. See the `quota_max_mail_size` setting.
- Mail files are being kept open for a long time, so already deleted files still reserve disk space on the filesystem; for example, a client downloading a large mail over a slow internet connection.
Generally the problem goes away by syncing `fscache.log` with reality by running `doveadm fscache rescan`. This updates `fscache.log` to contain the correct size, and also prints whether the current size was correct or not. It's also possible to manually delete files from fscache with the rm command; the `doveadm fscache rescan` command must be run afterwards.
Many of our customers run the `doveadm fscache rescan` command in a cronjob every hour. This makes sure that the fscache size won't stay wrong for too long.
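A sketch of such a cron entry (the file path and schedule are examples):

```
# /etc/cron.d/dovecot-fscache-rescan (hypothetical path)
# Re-sync fscache.log with actual disk usage once per hour.
0 * * * * root doveadm fscache rescan
```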
The metacache index size is specified in the `metacache_max_space` setting. This should ideally be as large as possible to reduce both object storage `GET`s for the indexes and local filesystem writes when the indexes are unpacked to the local cache.
Metacache is rarely large enough to contain indexes for all the users on the backend. This is why it also supports priorities, which attempt to keep the most useful information in metacache the longest to reduce object storage I/O.

For example, `INBOX` and `\Junk` folders are usually accessed more often than other folders (due to mail deliveries), so they're prioritized higher than other folders. A user's root indexes are prioritized the highest, mainly because they're always required whenever the user is accessed, but also because they're small enough to be cheaply kept in metacache for a long time.
The metacache performance can be monitored by looking at the number of index `GET` and `PUT` requests. Metacache cleans are also logged by metacache-worker.

To list all users currently known to be in metacache, run `doveadm metacache list`.
There are 4 priorities for index files:

| Priority | Description |
| --- | --- |
| 0 (highest) | User root indexes |
| 1 | FTS indexes |
| 2 | INBOX and \Junk folder indexes |
| 3 (lowest) | Other folders' indexes |
You can also manually clean older indexes from the cache by running `doveadm metacache clean -u user@domain`. If the indexes aren't fully uploaded to the object storage, the clean command will fail.

You can manually upload indexes to object storage with:

```
doveadm metacache flush -u user@domain
doveadm metacache flushall
```

It's also possible to flush only indexes with the specified priority (and below) with the `-p` parameter.
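For example, a priority-restricted flush might look like this (a sketch; assuming `-p 2` selects priorities 0 through 2 from the table above, i.e. root, FTS, and INBOX/\Junk indexes):

```
doveadm metacache flush -p 2 -u user@domain
```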
If a user no longer actually exists on the filesystem, it can be removed from the metacache process with `doveadm metacache remove user@domain`. This command also supports wildcards, so you can remove e.g. `testuser*`, or even `*` for everyone.
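For example, to remove all matching test accounts from the metacache process (the pattern is illustrative):

```
doveadm metacache remove 'testuser*'
```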
If multiple backends make changes to the same mailbox at the same time, Dovecot will eventually perform a dsync-merge for the indexes. Because dsync is quite a complicated algorithm, there's a chance that the merging triggers a bug/crash that won't fix itself automatically. If this happens, the bug should be reported so it can be properly fixed, but a quick workaround is to run:

```
doveadm -o plugin/metacache_index_merging=none force-resync -u user@domain INBOX
```
TIP

To allow easier migration of users and to support the new needs brought up with Palomar, the `doveadm metacache pull` command has been implemented. This command allows pulling the metacache for specific user(s) from another backend:

```
doveadm metacache pull -u user@domain --latest-only --clean 10.0.0.5
```
This is a generic procedure for performing backend maintenance with minimal user impact.
TIP

The best way to avoid any user impact is to avoid having to use this procedure in the first place: apply configuration changes with `doveadm reload`. If there are configuration mistakes, the reload will fail and preserve the original configuration. This only catches syntax mistakes and other mistakes that `doveconf(1)` can detect, though, not mistakes that are detected only at runtime.

WARNING
`doveadm cluster backend evacuate` can potentially increase the load on the backend significantly if many backends are pulling metacache from it at the same time.
Use the `doveadm cluster backend evacuate` command to move all the user groups out of the backend (see the sketch below).

Wait for the evacuation to complete.

Shut down Dovecot on the backend.

Now all sessions are gone and the backend is ready for an upgrade or a major configuration change.
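A sketch of the evacuation step (assuming the command takes the backend host argument the same way as `doveadm cluster backend update` shown later):

```
doveadm cluster backend evacuate <backend host>
```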
INFO

`doveadm cluster backend evacuate` does several things:

- It sets the backend's load factor to `0`, and
- sets the backend status to `standby`.

Thus it is easier to call this command directly. Another option is to manually set the load factor to `0` using `doveadm cluster backend update`, but it might take a significant time until the backend is empty. Set the status of the backend to `standby` after the backend is empty to signify that the backend is not usable for connections.
Synchronize metacache.

The metacache database may not be fully synchronized with the index files that actually exist on the filesystem. It's recommended at this stage to either delete the metacache or rescan it.

Rescan metacache:
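A sketch, assuming the `doveadm metacache rescan` subcommand is available in your Dovecot Pro version:

```
doveadm metacache rescan
```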
Delete metacache:
Remove the old metacache database files. As the metacache service is now reduced to one file, the old files need to be removed:

```
rm -f /var/lib/dovecot/metacache/metacache-users*
```

Remove metacache from the filesystem:

```
rm -rf /var/dovecot/vmail/*
```
Restart Dovecot:

```
systemctl start dovecot
```

Verify with a test user that the backend is usable:

```
# Fetches the mailbox list from metacache.
# This is fetched from storage now as metacache is reset.
doveadm mailbox list -u <uid>

# Fetches more info from metacache.
doveadm mailbox status -u <uid> messages "*"

# Verifies Dovecot can fetch mail objects from storage.
doveadm fetch -u <uid> text all > /dev/null
```
If all of the above commands succeed, the backend can be put back into production.

Add the backend to the cluster (making sure the load factor is restored):

```
doveadm cluster backend update --load-factor 100 --status online <backend host>
```
After an end user has terminated their contract with the service provider, the mail data needs to be removed not only from the (object) storage but also from Cassandra, and cached information needs to be removed from the serving backend as well.
Before running the actual doveadm commands, the user should be disabled in the userdb (e.g. LDAP) to disallow IMAP/POP/LMTP connections, but not yet removed from the userdb: if the user doesn't exist in the userdb, doveadm commands for that user will fail.

After the user is disabled, their existing connections should be closed. This is most easily done on the proxy, which forwards the kick command to the user's current backend; see the example below. This can also be managed by a provisioning system issuing a Doveadm HTTP API call.
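A sketch of the kick, run on a proxy (the user name is a placeholder):

```
doveadm kick john@example.com
```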
Delete the user from the storage by running the `doveadm obox user delete` command on the proxy (or via the provisioning system, which needs to issue Doveadm HTTP API calls to the proxy):

```
doveadm obox user delete -u john@example.com
```
This command removes the user's data from the object storage and the related entries from Cassandra.

Note that this command does NOT remove the user from the userdb.
After the user's data is deleted, the user can be removed from the userdb. Whether that is wanted right away depends on the policy and the provisioning system: whether the provisioning system can keep that email address reserved for the typical 6 months before it's assigned to another user, or whether the userdb needs to keep that information for that period of time.
These settings may be useful if some emails are inaccessible.
Issue: An email is inaccessible during a FETCH.
Workaround: Finish the FETCH as well as possible and return a tagged NO reply. The default is to disconnect the IMAP client immediately on the failure. It depends on the IMAP client whether this behavior is useful or not.
Setting:
Issue: How to handle Cassandra `Object exists in dict, but not in storage` errors.

Workaround: Return empty emails to the IMAP client. The tagged FETCH response will be `OK` instead of `NO`.

Setting:

```
obox_fetch_lost_mails_as_empty = yes
```

This can also be enabled only for specific clients, e.g. a webmail IP range:

```
remote <webmail IP range> {
  plugin {
    obox_fetch_lost_mails_as_empty = yes
  }
}
```