The Backend does the primary work of reading and writing mails to storage and handling the bulk of the mail protocol interaction with the client.
The Backends are organized as a pool of independent nodes. A user is not permanently assigned to a specific Backend; for performance and load-balancing reasons, the platform is designed to allow users to move between Backends.
Backends are connected to other Backends on the same site, so that doveadm metacache pull can pull index files when users are moved.
The Backends do NOT need to connect to Backends on foreign sites, or any Proxies.
With Dovecot Pro and the obox mailbox format, the Backend is connected to the object storage where users' mail data is stored.
When a user connects to Dovecot to read mails, the user's mail indexes are fetched from the object storage and cached on the local file system. The mail indexes are updated locally while the user modifies the mailbox. The modified local indexes are uploaded back to object storage in the background every 5 minutes, except for LMTP mail deliveries. With LMTP, the indexes are uploaded on every 10th mail (see obox_max_rescan_mail_count) to avoid unnecessary object storage writes. The index updates for LMTP deliveries don't contain anything that can't be recreated from the mails themselves.
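The upload timing described above can be modelled roughly as follows. This is a toy sketch, not Dovecot's implementation: the class and method names are hypothetical, and only the 5-minute interval and the every-10th-delivery count come from the text.

```python
from dataclasses import dataclass

UPLOAD_INTERVAL_SECS = 5 * 60  # background upload interval, per the text
LMTP_UPLOAD_EVERY = 10         # upload on every 10th LMTP delivery, per the text


@dataclass
class IndexUploadPolicy:
    """Toy model of when locally modified indexes get uploaded."""

    last_upload: float = 0.0
    lmtp_since_upload: int = 0

    def on_interactive_change(self, now: float) -> bool:
        """Return True if the periodic background upload is due at `now`."""
        if now - self.last_upload >= UPLOAD_INTERVAL_SECS:
            self.last_upload = now
            self.lmtp_since_upload = 0
            return True
        return False

    def on_lmtp_delivery(self, now: float) -> bool:
        """Return True if this delivery is the one that triggers an upload."""
        self.lmtp_since_upload += 1
        if self.lmtp_since_upload >= LMTP_UPLOAD_EVERY:
            self.last_upload = now
            self.lmtp_since_upload = 0
            return True
        return False
```

In this model, twenty consecutive deliveries cause two index uploads instead of twenty, which is the object storage write saving the text refers to.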
Backends are stateless; if the server crashes, the only things lost for logged-in users are the most recent message flag updates. The next time the user logs in, possibly to another Backend, the indexes are fetched again from the object storage into the local cache. Because LMTP mail deliveries don't update indexes immediately, the email objects are also listed once for each accessed folder to find out whether there are any newly delivered mails that don't yet exist in the index.
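The rescan step amounts to comparing the listed objects against what the index already knows. A minimal sketch of that comparison, assuming hypothetical object names and a hypothetical helper `find_unindexed` (neither is part of Dovecot):

```python
from typing import Iterable, List, Set


def find_unindexed(listed_objects: Iterable[str], indexed: Set[str]) -> List[str]:
    """Return listed mail objects that the cached index does not know yet."""
    return [name for name in listed_objects if name not in indexed]


# Hypothetical object names for one folder.
listed = ["mail-001", "mail-002", "mail-003", "mail-004"]
index = {"mail-001", "mail-002"}
print(find_unindexed(listed, index))  # ['mail-003', 'mail-004']
```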
Backends attempt to do as much as possible within the local cache to minimize object storage I/O. The larger the local cache, the less object storage I/O there is. Typically, each Backend should have at least 2 MB of local cache allocated for each active user (e.g. if there are 100,000 users per Backend who receive or access mails within a 15-minute period, there should be at least 200 GB of local cache on the Backend).
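The sizing rule is simple arithmetic. A small sketch, assuming decimal units (1 GB = 1000 MB) and a hypothetical helper name:

```python
def min_local_cache_gb(active_users: int, mb_per_user: float = 2.0) -> float:
    """Minimum local cache size in GB, using 1 GB = 1000 MB."""
    return active_users * mb_per_user / 1000


print(min_local_cache_gb(100_000))  # 200.0
```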
It's important that the local cache doesn't become a bottleneck, so ideally it should use SSDs. Alternatives are an in-memory disk (tmpfs) or a filesystem on a SAN that provides enough disk IOPS. NFS should not be used for the local cache. Dovecot never uses fsync when writing to the local cache, so after a server crash the cache may be inconsistent or corrupt. This is why the caches should be deleted at server bootup, although Dovecot internally attempts to keep track of crashes and won't open an index that was potentially corrupted.
Dovecot's Backend is responsible for indexing messages for use with Full Text Search when a message is delivered to a mailbox.
When an indexing backend is not present, searching falls back to slow sequential searches through all message headers or text. For commercial-grade email, this performance is unacceptable for the end user. Thus, a search indexing backend is a requirement for Dovecot Pro.
Additionally, for storage backends that do not provide fast sequential access to message data (e.g. object storage), it is critically important to perform searches through a global index. On-demand message body searches will simply not be possible for larger mailboxes otherwise.
Dovecot's standard IMAP SEARCH TEXT/BODY parameters use the FTS indexes. Searches through message headers benefit from Dovecot's fast message index cache implementation, which often contains the necessary information. Optionally, header searches can also be done from FTS indexes.
Triggers for FTS indexing are configurable. Indexing can be started on demand, as a batch job, or automatically when new messages arrive. Indexing on delivery gives the best mix of performance and user experience. The indexing takes place when the user may not even be interacting with the system, and the load is spread across delivery times rather than concentrated in the peak periods of the day when users may be performing the most mailbox searches.
Dovecot Pro provides the fts-dovecot plugin to perform the indexing and searching of mail messages. This plugin is a Pro-only feature that does the actual indexing and efficient storage of indexed data, and it is specifically designed for use and optimization with object storage.
The Pro engine uses the Dovecot Core FTS library to provide common search and indexing features:
Feature | Summary |
---|---|
Normalize | Unify saved form of text as much as possible. |
Stemming | Reduce words to their basic form. |
Detect Language | Detect language of processed text to more accurately apply other filters and features. |
Skip Stop Words | A configurable list of words (per language) not to be indexed. |
Skip Bad Characters | Non-language characters, base64 data, HTML tags, etc. will not be indexed. |
Decompound | Index the parts of compound words separately. |
Attachment Search | Allow data in text-based attachments to be indexed. |
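
To make a few of these features concrete, here is a toy illustration of normalization, stop-word skipping, and bad-character skipping. It is not the Dovecot FTS library: the function names and the stop-word list are hypothetical, and stemming, language detection, and decompounding are left out.

```python
import re
import unicodedata
from typing import List

# Illustrative English stop words only; real lists are per language.
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}


def normalize(text: str) -> str:
    """Strip accents and fold case so equivalent words share one saved form."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def tokens_for_index(text: str) -> List[str]:
    """Keep letter-only tokens that are not stop words."""
    words = re.findall(r"[^\W\d_]+", normalize(text))
    return [w for w in words if w not in STOP_WORDS]


print(tokens_for_index("The Café and the MENU: 12345!"))  # ['cafe', 'menu']
```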
The Pro FTS engine provides a feature-rich system that is entirely integrated within the Dovecot Pro platform, with no need for additional storage systems or dedicated indexing nodes.