Dictmap is a required component of obox.
Using obox with Cassandra is done via the fs-dictmap wrapper, which translates internal "lib-fs paths" into dict API.
The dict API paths in turn are translated to SQL/CQL queries via dict-sql.
Using Cassandra requires installing the Dovecot Pro Cassandra plugin package and the cpp-driver from the 3rd party repository.
The fs-dictmap syntax is:
dictmap:<dict uri> ; <parent fs uri>[ ; <dictmap settings>]
Note
The delimiter between the dictmap configuration components is " ; " (<SPACE><SEMICOLON><SPACE>).
The spaces before and after the semicolon are necessary; otherwise Dovecot will emit a syntax error and exit.
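For example, using the placeholders from the syntax line above, the separators must be written with the surrounding spaces (diff-table is just an example parameter from the table below):
# Correct - spaces on both sides of each semicolon:
dictmap:<dict uri> ; <parent fs uri> ; diff-table
# Wrong - Dovecot emits a syntax error and exits:
dictmap:<dict uri>;<parent fs uri>;diff-table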
For obox, dictmap requires using the Cassandra dictionary.
Note
Cassandra support is done via Dovecot's SQL dict, because Cassandra CQL is implemented as a lib-sql driver.
Parameter | Description |
---|---|
refcounting-table | Enable reference counted objects. Reference counting allows a single mail object to be stored in multiple mailboxes, without the need to create a new copy of the message data in object storage. |
lockdir=<path> | If refcounting is enabled, use this directory for creating lock files to objects while they're being copied or deleted. This attempts to prevent race conditions where an object copy and delete run simultaneously and both succeed, but the copied object no longer exists. This can't be fully prevented if different servers do this concurrently. If lazy_expunge is used, this setting isn't really needed, because such race conditions are practically nonexistent. Not using the setting will improve performance by avoiding a Cassandra SELECT when copying mails. |
diff-table | Store diff & self index bundle objects in a separate table. This is a Cassandra-backend optimization. |
delete-dangling-links | If an object exists in dict, but not in storage, delete it automatically from dict when it's noticed. This setting isn't safe to use by default, because storage may return "object doesn't exist" errors only temporarily during split brain. See Obox Troubleshooting: Object exists in dict, but not in storage. |
bucket-size=<n> | Separate email objects into buckets, where each bucket can have a maximum of <n> emails. This should be set to 10000 with Cassandra to avoid the partition becoming too large when there are a lot of emails. |
bucket-deleted-days=<days> | Track Cassandra's tombstones in the buckets.cache file to avoid creating excessively large buckets when a lot of mails are saved and deleted in a folder. The <days> should be one day longer than gc_grace_seconds for the user_mailbox_objects table. By default this is 10 days, so in that case bucket-deleted-days=11 should be used. With this setting, the tombstones are also taken into account when determining whether bucket-size is reached and a new bucket needs to be created. This tracking is preserved only as long as the buckets.cache file exists; it is also not preserved when moving users between backends. This means that it doesn't work perfectly in all situations, but it should still be good enough to prevent the worst offenses. |
bucket-cache=<path> | Required when bucket-size is set. Bucket counters are cached in this file. This path should be located under the obox indexes directory (on the SSD-backed cache mount point), e.g. %h/buckets.cache. |
nlinks-limit=<n> | Defines the maximum number of results returned from a dictionary iteration lookup (i.e. a Cassandra CQL query) when checking the number of links to an object. Limiting this may improve performance. Currently Dovecot only cares whether the link count is 0, 1 or "more than 1", so for a bit of extra safety we recommend nlinks-limit=3. |
delete-timestamp=+<time> | Increase Cassandra's DELETE timestamp by this much. This is useful to make sure the DELETE isn't ignored because the Dovecot backends' clocks are slightly different. The recommendation is to use delete-timestamp=+10s. |
storage-objectid-prefix=<prefix> | Use fake object IDs with object storage that internally uses paths. See Path Based Object Storages. For example storage-objectid-prefix=%u/mails/. |
storage-passthrough-paths=[full\|read-only] | Assume that the object ID is the same as the path. See Path Based Object Storages. |
storage-objectid-migrate | This can be used with storage-objectid-prefix when adding fs-dictmap to an existing installation. See Path Based Object Storages. |
max-parallel-iter=<n> | Describes how many parallel dict iterations can be created internally. The default value is 10. Parallel iterations can especially help speed up reading huge folders. Changed: 3.0.0 Default changed from 1 to 10. |
no-cleanup-uncertain | When enabled, if a write to Cassandra fails with uncertainty (dictmap_cassandra_uncertain_writes), Dovecot does not attempt to clean it up. Changed: 3.0.0 The default behavior changed to attempt to clean up uncertain writes; this setting was added to allow the old behavior. |
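Putting the syntax and parameters together, a minimal sketch of what an obox_fs setting could look like is shown below. The dict URI proxy:dict-async:cassandra assumes a dict named cassandra is defined (as in the dict {} block further down); the S3 URL is a placeholder and the chosen parameters are only examples:
plugin {
  # Sketch only - adapt the parent fs (here s3) and the parameter list to your installation.
  obox_fs = dictmap:proxy:dict-async:cassandra ; s3:https://ACCESSKEY:SECRET@s3.example.com/bucket/ ; refcounting-table ; diff-table ; bucket-size=10000 ; bucket-deleted-days=11 ; bucket-cache=%h/buckets.cache ; nlinks-limit=3 ; delete-timestamp=+10s
}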
The fs-dictmap uses the following dict paths:
Main Access
shared/dictmap/<path>
If refcounting-table is used
shared/dictrevmap/<user>/mailboxes/<folder guid>/<object id>
shared/dictrevmap/<object id>/<object name>
shared/dictrevmap/<object id>
If diff-table is used:
shared/dictdiffmap/<user>/idx/<host>
shared/dictdiffmap/<user>/mailboxes/<folder guid>/idx/<host>
dict {
cassandra = cassandra:/etc/dovecot/dovecot-dict-cql.conf.ext
}
# Location of Cassandra Server(s)
#
# ALL local Cassandra nodes should be added; the Cassandra driver code uses
# this list internally to find the initial list of Cassandra nodes.
#
# Cassandra will perform load balancing internally among all the local
# Cassandra nodes (including ones not specified here).
connect = host=10.2.3.4 \
host=10.3.4.5 \
host=10.4.5.6 \
keyspace=mails \
# Cassandra connection port
# port=9042 \
# User/password authentication
# user=cassandra_user \
# password=cassandra_pass \
# If this error is seen: "Host x.x.x.x received invalid protocol response Invalid or unsupported protocol version: 4"
# Add this parameter to force Cassandra protocol downgrade to version 3
# version=3 \
# For multi-DC consistency on normal operation (see below), add:
# write_consistency=each-quorum \
# write_fallback_consistency=local-quorum \
# delete_consistency=each-quorum \
# delete_fallback_consistency=local-quorum \
# Connection/Request timeouts
# connect_timeout=5 \
# request_timeout=5 \
# Define the number of Cassandra access threads to use
# num_threads=4 \
# Use latency-aware routing
# Existence of setting = yes; absence of setting = no
# See: https://datastax.github.io/cpp-driver/topics/configuration/#latency-aware-routing
# latency_aware_routing \
# DEBUG: Warning timeouts; if request takes longer than this amount of seconds, log query at WARN level
# warn_timeout=5 \
# Interval between heartbeats to Cassandra server
# heartbeat_interval=30s \
# If heartbeat hasn't been received for this long, reconnect to Cassandra.
# idle_timeout=1min \
# Automatically retry Cassandra queries. By default
# nothing is currently retried, so these settings should be enabled.
# how many times and in which intervals the execution is retried on top of the original request sent
execution_retry_interval=500ms \
execution_retry_times=3 \
# Cassandra query result paging: Add page_size=n to dovecot-dict-cql.conf.ext's connect setting.
# can also add log_level=debug so it logs about each pageful.
# page_size=500 \
# DEBUG: Set log level
# log_level=debug \
# DEBUG: Output all Cassandra queries to log at DEBUG level
# Existence of setting = yes; absence of setting = no
# debug_queries=yes \
# DEBUG: Output internal metrics in JSON format to this file.
# Format of data can be found at the end of this document.
# metrics=/tmp/dovecot-cassandra.metrics.%{pid} \
# TLS settings (setting any of these will enable TLS)
# Trusted CA certificates
# ssl_ca=/path/to/ca-certificate \
# Level of verification:
# * none = don't verify
# * cert = verify certificate
# * cert-ip = verify IP from CN or SubjectAltName
# * cert-dns = verify hostname from CN or SubjectAltName as determined by reverse lookup of the IP.
# ssl_verify=none
# TLS client certificate
# ssl_cert=<path>
# TLS client private key
# ssl_key=<path>
# TLS client private key password
# ssl_key_password=<string>
The details of how to create the Cassandra tables and the dict mappings that need to be appended to dovecot-dict-cql.conf.ext are described below.
The connect string is described in more detail in Cassandra configuration.
The following base tables are always needed by fs-dictmap: user_index_objects, user_mailbox_index_objects, user_mailbox_objects, user_mailbox_buckets and user_fts_objects (see the CQL schema below).
Cassandra doesn't handle row deletions very efficiently. The more rows are deleted, the larger number of tombstones and the longer it takes to do lookups from the same partition.
Most of the deletions Dovecot does are index diff & self-bundle updates.
Each Dovecot Backend server always writes only a single such object per folder, which allows storing them with (user, folder, host) primary key and updating the rows on changes, instead of inserting & deleting the rows.
The fs-dictmap diff-table parameter enables this behavior.
Diff-table requires these additional tables to exist in Cassandra: user_index_diff_objects and user_mailbox_index_diff_objects.
Reference counting allows a single mail object to be stored in multiple mailboxes, without the need to create a new copy of the message data in object storage. There are two downsides to it though:
However, the benefits outweigh the concerns as reference counting exchanges expensive storage operations with relatively cheap Cassandra row updates.
The fs-dictmap refcounting-table parameter enables this behavior.
Reference counting requires an additional table: user_mailbox_objects_reverse.
There are only two configurations that are currently recommended:
Quorum within a single datacenter (default):
connect = \
# ...other connect parameters... \
read_consistency=local-quorum \
write_consistency=local-quorum \
delete_consistency=local-quorum
Local-quorum guarantees that reads after writes always return the latest data. Dovecot requires strong consistency within a datacenter.
Quorum within multiple datacenters:
connect = \
# ...other connect parameters... \
read_consistency=local-quorum \
#read_fallback_consistency=quorum \
write_consistency=each-quorum \
write_fallback_consistency=local-quorum \
delete_consistency=each-quorum \
delete_fallback_consistency=local-quorum
As long as the datacenters are talking to each other, this uses each-quorum for writes. If there's a problem, Cassandra nodes fall back to local-quorum and periodically try to switch back to each-quorum. The main benefit of each-quorum is that in case the local datacenter suddenly dies and loses data, Dovecot will not have responded OK to any mail deliveries that weren't already replicated to the other datacenters. Using local-quorum as fallback ensures that in case of a network split the local datacenter still keeps working. Of course, if the local datacenter dies while the network is also split, there will be data loss.
Using read_fallback_consistency=quorum allows reads to succeed even in cases when multiple Cassandra nodes have failed in the local datacenter. For example:
Note that if there are only a total of 3 Cassandra nodes per datacenter and 2 of them are lost, writes can't succeed with either each-quorum or local-quorum. In this kind of a configuration, having read_fallback_consistency=quorum is not very useful.
Also note that there are no consistency settings that allow Dovecot to reliably continue operating if Cassandra in the local datacenter no longer has quorum, i.e. at least half of its nodes have gone down. In this case writes will always fail. If this happens, all users should be moved to be processed by another datacenter.
Dovecot normally sends the Cassandra queries with the primary consistency setting. If a write fails with the primary consistency (for example, because not enough Cassandra nodes are currently available to satisfy it), Dovecot attempts the query again using the fallback consistency. When this happens, Dovecot also switches all the following queries to use the fallback consistency for a while. The consistency is switched back when a query with the primary consistency level succeeds again.
While fallback consistency is being used, the queries are periodically still retried with primary consistency level. The initial retry happens after 50 ms and the retries are doubled until they reach the maximum of 60 seconds.
Cassandra doesn't perform any rollbacks to writes. When Cassandra reports a write as failed, it only means that it wasn't able to verify that the required consistency level was reached yet. It's still likely/possible that the write was successful to some nodes. If even a single copy was written, Cassandra will eventually be consistent after hinted handoffs or repairs. This means that even though a write may initially have looked like it failed, the data can become visible sooner or later.
Changed: 3.0.0 When this happens, Dovecot attempts to revert the Cassandra write by deleting it. If this deletion was successful, the object is deleted from storage as well. This is indicated by appending - Object ID ... deleted to the original write error message.
If the deletion was unsuccessful, file write state is uncertain for object ID ... is logged.
For some writes the revert isn't possible, and success is uncertain, not deleting object ID ... is logged. This also happens when the no-cleanup-uncertain parameter is used. In these cases the object is not deleted from storage.
When the revert wasn't performed, the Cassandra write may become visible at some point later (possibly leading to duplicate mails). If it doesn't become visible, the object becomes leaked in the storage. Currently, to avoid these situations, an external tool has to monitor the logs or exported events and fix up these uncertain writes when Cassandra is again working normally. See fs_dictmap_dict_write_uncertain.
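A minimal sketch of such monitoring, based only on the log messages quoted above (the log file path is an assumption; adapt it to your logging setup):
# Alert on uncertain Cassandra writes that were not cleaned up:
grep -E "file write state is uncertain|success is uncertain, not deleting object ID" /var/log/dovecot.log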
fs-dictmap can also be used with object storages that are accessed by paths rather than by object IDs (e.g. S3). This needs special configuration to avoid unnecessary Cassandra lookups.
Use storage-objectid-prefix=<prefix> to enable fake object IDs for obox_fs. These fake object IDs are stored in Dovecot index files, which can be translated into object paths without doing a Cassandra lookup. The translation is simply <prefix>/<object ID>, unless the migrate feature is used.
Use storage-passthrough-paths=full to enable passthrough object IDs for obox_index_fs and fts_dovecot_fs. With these, the object ID is the same as the object path. The object ID is written as an empty string into Cassandra. If this setting is used, the object can't be copied (which is fine, because copying is not done for index bundle or FTS objects).
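A hedged sketch of how these parameters could be combined for a path-based storage such as S3; everything apart from the storage-objectid-prefix and storage-passthrough-paths parts (the dict URI, the S3 URL, the other parameters) is a placeholder:
plugin {
  # Mails: fake object IDs, translated to <prefix>/<object ID>
  obox_fs = dictmap:proxy:dict-async:cassandra ; s3:https://s3.example.com/bucket/ ; diff-table ; storage-objectid-prefix=%u/mails/
  # Index bundles and FTS objects: the object ID equals the object path
  obox_index_fs = dictmap:proxy:dict-async:cassandra ; s3:https://s3.example.com/bucket/ ; diff-table ; storage-passthrough-paths=full
  fts_dovecot_fs = dictmap:proxy:dict-async:cassandra ; s3:https://s3.example.com/bucket/ ; storage-passthrough-paths=full
}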
Use storage-objectid-migrate to enable migration for obox_fs. Use the storage-objectid-migrate-mails(1) and storage-objectid-migrate-index(1) scripts to migrate the indexes and mails. These scripts list all (index bundle, FTS and email) objects for the user and add them to Cassandra. Note that the user must be completely inaccessible (imap, pop3, managesieve, mail deliveries) while these scripts are run to avoid data loss.
Before migration the mails are stored in <user>/mailboxes/<mailbox_guid>/<oid> paths. The migration script adds all these mails to Cassandra using <oid> as the object ID. The obox-raw-id record is also set to <oid>. The "extra data" byte in the <oid> for path based object storages is always 0. For all newly written emails, when storage-objectid-prefix is non-empty, the 0x80 bit is set in the "extra data" byte. This allows translating the obox-raw-id (<object-id>) into the object path without a Cassandra lookup:
If the 0x80 bit is set (newly written email): <prefix>/<object-id>
If the 0x80 bit is not set (migrated email): <user>/mailboxes/<mailbox_guid>/<object-id>
Note that listing object IDs with e.g. doveadm fs iter --object-ids doesn't add the path prefix. It only returns the <object-id>.
Newly saved mails can be efficiently copied within dictmap, but migrated mails must first be copied from <user>/mailboxes/<mailbox_guid>/<old-object-id> to <prefix>/<new-object-id>.
CREATE KEYSPACE IF NOT EXISTS mails
WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': 3
};
USE mails;
CREATE TABLE IF NOT EXISTS user_index_objects (
u text,
n text,
i blob,
primary key (u, n)
);
CREATE TABLE IF NOT EXISTS user_mailbox_index_objects (
u text,
g blob,
n text,
i blob,
primary key ((u, g), n)
);
CREATE TABLE IF NOT EXISTS user_mailbox_objects (
u text,
g blob,
b int,
n blob,
i blob,
primary key ((u, g, b), n)
);
CREATE TABLE IF NOT EXISTS user_mailbox_buckets (
u text,
g blob,
b int,
primary key ((u, g))
);
CREATE TABLE IF NOT EXISTS user_fts_objects (
u text,
n text,
i blob,
primary key (u, n)
);
CREATE TABLE IF NOT EXISTS user_index_diff_objects (
u text,
h text,
m text,
primary key (u, h)
);
CREATE TABLE IF NOT EXISTS user_mailbox_index_diff_objects (
u text,
g blob,
h text,
m text,
primary key (u, g, h)
);
CREATE TABLE IF NOT EXISTS user_mailbox_objects_reverse (
u text,
g blob,
n blob,
i blob,
primary key (i, n)
);
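Since bucket-deleted-days (see the parameter table above) must be one day longer than gc_grace_seconds of user_mailbox_objects, it can help to verify that value. A sketch of checking and, if desired, changing it with CQL (864000 seconds = 10 days is Cassandra's default):
SELECT gc_grace_seconds FROM system_schema.tables
  WHERE keyspace_name = 'mails' AND table_name = 'user_mailbox_objects';

-- Example only; with the default 10 days, use bucket-deleted-days=11.
ALTER TABLE mails.user_mailbox_objects WITH gc_grace_seconds = 864000;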
dovecot-dict-cql.conf.ext file
# WARNING: The order of the map {} sections is important here.
# Do NOT reorder them or the end result may not work.
map {
pattern = shared/dictmap/$user/idx/$object_name
table = user_index_objects
value_field = i
value_type = hexblob
fields {
u = $user
n = $object_name
}
}
map {
pattern = shared/dictmap/$user/mailboxes/$mailbox_guid/idx/$object_name
table = user_mailbox_index_objects
value_field = i
value_type = hexblob
fields {
u = $user
g = ${hexblob:mailbox_guid}
n = $object_name
}
}
map {
pattern = shared/dictmap/$user/mailboxes/$mailbox_guid/$bucket/$object_name
table = user_mailbox_objects
value_field = i
value_type = hexblob
fields {
u = $user
g = ${hexblob:mailbox_guid}
b = ${uint:bucket}
n = ${hexblob:object_name}
}
}
map {
pattern = shared/dictmap/$user/mailboxes/$mailbox_guid/max_bucket
table = user_mailbox_buckets
#value_field = b # for v2.3.13 and older
value_field = b,writetime(b) # for v2.3.14 and newer
#value_type = uint # for v2.3.13 and older
value_type = uint,uint # for v2.3.14 and newer
fields {
u = $user
g = ${hexblob:mailbox_guid}
}
}
map {
pattern = shared/dictmap/$user/fts/$object_name
table = user_fts_objects
value_field = i
value_type = hexblob
fields {
u = $user
n = $object_name
}
}
### diff-table Settings ###
map {
pattern = shared/dictdiffmap/$user/idx/$host
table = user_index_diff_objects
value_field = m,writetime(m)
value_type = string,string
fields {
u = $user
h = $host
}
}
map {
pattern = shared/dictdiffmap/$user/mailboxes/$mailbox_guid/idx/$host
table = user_mailbox_index_diff_objects
value_field = m,writetime(m)
value_type = string,string
fields {
u = $user
g = ${hexblob:mailbox_guid}
h = $host
}
}
# For listing folder GUIDs during index rebuild:
map {
pattern = shared/dictmap/$user/mailboxes/$mailbox_guid
table = user_mailbox_index_diff_objects
value_field = m
fields {
u = $user
g = ${hexblob:mailbox_guid}
}
}
# Use ONLY if you don't enable the "diff-table" parameter.
#map {
# pattern = shared/dictmap/$user/mailboxes/$mailbox_guid
# table = user_mailbox_index_objects
# value_field = i
# value_type = hexblob
#
# fields {
# u = $user
# g = ${hexblob:mailbox_guid}
# }
#}
### Reference Counting Settings ###
# For reverse set:
map {
pattern = shared/dictrevmap/$user/mailboxes/$mailbox_guid/$object_id
table = user_mailbox_objects_reverse
value_field = n
value_type = hexblob
fields {
u = $user
g = ${hexblob:mailbox_guid}
i = ${hexblob:object_id}
}
}
# For reverse unset and iteration:
map {
pattern = shared/dictrevmap/$object_id/$object_name
table = user_mailbox_objects_reverse
value_field = g
value_type = hexblob
fields {
i = ${hexblob:object_id}
n = ${hexblob:object_name}
}
}
# for reverse gets - this isn't actually used currently
map {
pattern = shared/dictrevmap/$object_id
table = user_mailbox_objects_reverse
value_field = u,g,n
#value_type = hexblob # for v2.2.27.1 and older
value_type = string,hexblob,hexblob # v2.2.27.2 and newer
fields {
i = ${hexblob:object_id}
}
}
Note
Object ID access should always be preferred, since it avoids most of the Cassandra lookups related to emails.
These path-based mappings mainly exist for legacy reasons and for testing.
CREATE KEYSPACE IF NOT EXISTS mails
WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': 3
};
USE mails;
CREATE TABLE user_index_objects (
u text,
n text,
i text,
primary key (u, n)
);
CREATE TABLE user_mailbox_index_objects (
u text,
g blob,
n text,
i text,
primary key ((u, g), n)
);
CREATE TABLE user_mailbox_objects (
u text,
g blob,
b int,
n blob,
i text,
primary key ((u, g, b), n)
);
CREATE TABLE user_mailbox_buckets (
u text,
g blob,
b int,
primary key ((u, g))
);
CREATE TABLE user_fts_objects (
u text,
n text,
i text,
primary key (u, n)
);
CREATE TABLE user_index_diff_objects (
u text,
h text,
m text,
primary key (u, h)
);
CREATE TABLE user_mailbox_index_diff_objects (
u text,
g blob,
h text,
m text,
primary key (u, g, h)
);
CREATE TABLE user_mailbox_objects_reverse (
u text,
g blob,
n blob,
i text,
primary key (i, n)
);
dovecot-dict-cql.conf.ext file
map {
pattern = shared/dictmap/$user/idx/$object_name
table = user_index_objects
value_field = i
fields {
u = $user
n = $object_name
}
}
map {
pattern = shared/dictmap/$user/mailboxes/$mailbox_guid/idx/$object_name
table = user_mailbox_index_objects
value_field = i
fields {
u = $user
g = ${hexblob:mailbox_guid}
n = $object_name
}
}
map {
pattern = shared/dictmap/$user/mailboxes/$mailbox_guid/$bucket/$object_name
table = user_mailbox_objects
value_field = i
fields {
u = $user
g = ${hexblob:mailbox_guid}
b = ${uint:bucket}
n = ${hexblob:object_name}
}
}
map {
pattern = shared/dictmap/$user/mailboxes/$mailbox_guid/max_bucket
table = user_mailbox_buckets
value_field = b
value_type = uint
fields {
u = $user
g = ${hexblob:mailbox_guid}
}
}
map {
pattern = shared/dictmap/$user/fts/$object_name
table = user_fts_objects
value_field = i
fields {
u = $user
n = $object_name
}
}
map {
pattern = shared/dictdiffmap/$user/idx/$host
table = user_index_diff_objects
value_field = m
fields {
u = $user
h = $host
}
}
map {
pattern = shared/dictdiffmap/$user/mailboxes/$mailbox_guid/idx/$host
table = user_mailbox_index_diff_objects
value_field = m
fields {
u = $user
g = ${hexblob:mailbox_guid}
h = $host
}
}
# For listing folder GUIDs during index rebuild:
map {
pattern = shared/dictmap/$user/mailboxes/$mailbox_guid
table = user_mailbox_index_diff_objects
value_field = m
fields {
u = $user
g = ${hexblob:mailbox_guid}
}
}
map {
pattern = shared/dictrevmap/$user/mailboxes/$mailbox_guid/$object_id
table = user_mailbox_objects_reverse
value_field = n
value_type = hexblob
fields {
u = $user
g = ${hexblob:mailbox_guid}
i = $object_id
}
}
# for reverse unset:
map {
pattern = shared/dictrevmap/$object_id/$object_name
table = user_mailbox_objects_reverse
value_field = g
value_type = hexblob
fields {
i = $object_id
n = ${hexblob:object_name}
}
}
fs-dictmap works by providing a view to Cassandra that ends up looking like a filesystem, which is compatible with the obox mailbox format.
There are several hardcoded paths necessary to accomplish this.
The mapping between the filesystem and the dict keys is:
Filesystem Path | Dict Keys (shared/ prefix not included) | Files |
---|---|---|
$user | Hardcoded idx/ and mailboxes/ | |
$user/idx/ | | User root index bundles |
$user/mailboxes/ | | Folder GUID directories |
$user/mailboxes/$mailbox_guid/ | | Email objects |
$user/mailboxes/$mailbox_guid/idx/ | | Folder index bundles |
$user/fts/ | | Full text search index objects |
The filesystem can be accessed using doveadm fs commands. The fs-driver and fs-args parameters are based on the obox_fs and/or obox_index_fs settings, depending on whether you're accessing email objects, index objects, or both.
The following shell script can be used to list all internal Dovecot filesystem names for a user.
Note
This script must be run on a system configured to access the obox storage, as it reads Dovecot’s configuration.
#!/bin/sh
user="$1"
if [ "$user" = "" ]; then
echo "Usage: $0 <username>" >&2
exit 1
fi
obox_fs_args=`doveconf -h plugin/obox_fs | sed 's/^.*dictmap://'`
if [ "$obox_fs_args" = "" ]; then
echo "plugin/obox_fs not set" >&2
exit 1
fi
obox_index_fs_args=`doveconf -h plugin/obox_index_fs | sed 's/^.*dictmap://'`
if [ "$obox_index_fs_args" = "" ]; then
obox_index_fs_args="$obox_fs_args"
fi
doveadm fs iter dictmap "$obox_index_fs_args" "$user/idx/" | sed "s,^,$user/idx/,"
doveadm fs iter-dirs dictmap "$obox_index_fs_args" "$user/mailboxes/" |
while read mailbox; do
doveadm fs iter dictmap "$obox_index_fs_args" "$user/mailboxes/$mailbox/idx/" | sed "s,^,$user/mailboxes/$mailbox/idx/,"
doveadm fs iter dictmap "$obox_fs_args" "$user/mailboxes/$mailbox/" | sed "s,^,$user/mailboxes/$mailbox/,"
done
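For example, if the script above is saved as list-obox-fs.sh (an arbitrary name), it could be run on a backend as:
sh list-obox-fs.sh testuser@example.com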
Below is a list of operations that dictmap does for accessing data.
👁️: Cassandra operations | 📦: Object storage operations
Refreshing user root index
Refreshing folder index
Writing user root self/diff index
Writing user root base index
Writing folder diff/self index
Writing folder base index
Delivering a new email via LMTP, or saving a new email via IMAP APPEND
Reading email
Deleting email
Copying email
Moving email
Running "doveadm force-resync"
In the (extremely unlikely) case that all Cassandra (fs-dictmap) data is lost, it is possible to recover this information by iterating through all objects stored in the object store.
A rough overview of the process is as follows:
Per benchmark data, sizing of the Cassandra node can be estimated by assuming 50 bytes/email is required to store each message. Thus, assuming 512 GB total storage per Cassandra node (= 256 GB of usable storage + 256 GB for repairs/rebuilds), this means that each node can store data on up to 5.1 billion emails.
For high availability, a minimum of three nodes is required for each data center.
The Cassandra cpp-driver library requires a lot of VSZ memory. Make sure the dict process doesn't immediately die out of memory (this may also be visible as strange crashes at startup) by disabling VSZ limits:
service dict-async {
vsz_limit = 0
}
Usually there should be only a single dict-async process running, because each process creates its own connections to the Cassandra cluster, increasing its load. The Cassandra cpp-driver can use multiple IO threads as well. This is controlled by the num_threads parameter in the connect setting in dovecot-dict-cql.conf.ext. Each IO thread can handle 32k requests simultaneously, so usually 1 IO thread is enough. Note that each IO thread creates more connections to Cassandra, so again it's better not to create too many threads unnecessarily. If all the IO threads are full of pending requests, queries start failing with an "All connections on all I/O threads are busy" error.
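A hedged sketch of keeping the dict process count low (combine with the vsz_limit setting shown above; whether an explicit process_limit is needed depends on your installation, and num_threads goes into the connect setting in dovecot-dict-cql.conf.ext):
service dict-async {
  # One dict-async process is usually enough; each extra process opens
  # its own set of Cassandra connections.
  process_limit = 1
  vsz_limit = 0
}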
If you encounter Object exists in dict, but not in storage errors in the Dovecot Pro log file, you most likely have resurrected deleted data, which happened because of inconsistencies due to replication. See: