The Dovecot Backend does all the hard work of reading and writing mails to storage and handling all of the IMAP/POP3/LMTP protocols.
With Dovecot Pro and the obox mailbox format, the Backend is connected to the object storage where users' mail data is stored.
When a user connects to Dovecot to read mail, the user's mail indexes are fetched from the object storage and cached on the local file system. The indexes are updated locally as the user modifies mailboxes. Modified local indexes are uploaded back to object storage in the background every 5 minutes, except for LMTP mail deliveries. With LMTP, the indexes are uploaded only after every 10th mail (see obox_max_rescan_mail_count) to avoid unnecessary object storage writes. The index updates for LMTP deliveries don't contain anything that can't be recreated from the mails themselves.
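The upload policy described above can be sketched in Python. This is an illustrative model only, not Dovecot's implementation: the class `IndexUploadState`, its methods, and both constants are hypothetical names chosen to mirror the 5-minute interval and the "every 10th mail" behaviour from the text.

```python
from dataclasses import dataclass

UPLOAD_INTERVAL_SECS = 5 * 60  # background upload interval from the text
MAX_RESCAN_MAIL_COUNT = 10     # assumed to mirror obox_max_rescan_mail_count


@dataclass
class IndexUploadState:
    """Hypothetical per-user bookkeeping; not a Dovecot data structure."""

    last_upload: float = 0.0
    dirty: bool = False
    lmtp_mails_since_upload: int = 0

    def on_user_modification(self) -> None:
        # A mailbox change made by the user marks the local index as modified.
        self.dirty = True

    def on_lmtp_delivery(self) -> None:
        # LMTP deliveries are only counted; they do not force an upload.
        self.lmtp_mails_since_upload += 1

    def should_upload(self, now: float) -> bool:
        # Upload once enough LMTP mails have accumulated ...
        if self.lmtp_mails_since_upload >= MAX_RESCAN_MAIL_COUNT:
            return True
        # ... or when user changes are pending and the interval has elapsed.
        return self.dirty and now - self.last_upload >= UPLOAD_INTERVAL_SECS

    def mark_uploaded(self, now: float) -> None:
        self.last_upload = now
        self.dirty = False
        self.lmtp_mails_since_upload = 0
```

In this model, nine consecutive LMTP deliveries trigger no upload, while the tenth does, which is the write-saving behaviour the paragraph describes.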
Backends are stateless, so if a server crashes, the only thing logged-in users lose is their most recent message flag updates. When the user next logs in to another Backend, the indexes are fetched again from the object storage into the local cache. Because LMTP mail deliveries don't update indexes immediately, the email objects are also listed once for each accessed folder to find out whether any newly delivered mails are missing from the index.
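Conceptually, the per-folder listing step is a comparison between the objects present in storage and the objects the index already knows about. The following Python sketch illustrates that idea only; the function name and the use of plain string IDs are assumptions, not Dovecot's internal representation.

```python
def find_unindexed_mails(listed_object_ids, indexed_object_ids):
    """Return IDs found in the storage listing but absent from the index.

    Both arguments are iterables of hypothetical object IDs. The order of
    the storage listing is preserved in the result.
    """
    indexed = set(indexed_object_ids)
    return [oid for oid in listed_object_ids if oid not in indexed]


# Two mails were delivered via LMTP after the last index upload.
new_mails = find_unindexed_mails(["a", "b", "c", "d"], ["a", "b"])
```

Here `new_mails` holds the two IDs that would need to be added to the folder's index.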
Backends attempt to do as much as possible in the local cache to minimize object storage I/O. The larger the local cache, the less object storage I/O there is. As a rule of thumb, each Backend should have at least 2 MB of local cache allocated per active user (e.g. if 100,000 users per Backend are receiving or accessing mails within 15 minutes, the Backend should have at least 200 GB of local cache).
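The sizing rule is simple arithmetic and can be expressed as a small Python helper. The function name is hypothetical, and decimal units (1 GB = 1000 MB) are assumed because that matches the 200 GB figure in the example above.

```python
MB_PER_ACTIVE_USER = 2  # minimum local cache per active user, from the text


def min_local_cache_gb(active_users, mb_per_user=MB_PER_ACTIVE_USER):
    """Return the minimum local cache size in GB (decimal, 1 GB = 1000 MB)."""
    return active_users * mb_per_user / 1000


# 100,000 active users * 2 MB = 200,000 MB = 200 GB
cache_gb = min_local_cache_gb(100_000)
```

Treat the result as a lower bound: per the text, a larger cache further reduces object storage I/O.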
It's important that the local cache doesn't become a bottleneck, so ideally it should be on SSDs. Alternatives are an in-memory disk (tmpfs) or a filesystem on a SAN that provides enough disk IOPS. (NFS should not be used for the local cache.) Dovecot never fsyncs when writing to the local cache, so after a server crash the cache may be inconsistent or corrupted. This is why the caches should be deleted at server bootup, although Dovecot also attempts to keep track of crashes internally and won't open an index that was potentially corrupted.
Dovecot's Backend is responsible for indexing messages for use with Full Text Search when a message is delivered to a mailbox.
When an indexing backend is not present, searches fall back to slow sequential scans through all message headers or text. For commercial-grade email, this performance is unacceptable for the end user. Thus, a search indexing backend is a requirement for Dovecot Pro.
Additionally, for storage backends that do not provide fast sequential access to message data (e.g. object storage), it is critically important to perform searches through a global index. On-demand message body searches simply will not be possible for larger mailboxes otherwise.
Dovecot's standard IMAP SEARCH TEXT/BODY parameters use the FTS indexes. Searches through message headers benefit from Dovecot's fast message index cache implementation, which often contains the necessary information. Optionally, header searches can also be done from FTS indexes.
Triggers for FTS indexing are configurable. Indexing can be started on demand, as a batch job, or automatically when new messages arrive. Indexing on delivery offers the best mix of performance and user experience: the indexing takes place when the user may not even be interacting with the system, and the load is spread across delivery times rather than concentrated in the peak periods of the day when users perform the most mailbox searches.
Dovecot Pro provides the Dovecot Pro FTS engine to perform the indexing and searching of mail messages. The driver itself is a Pro-only feature that does the actual indexing and efficient storage of indexed data, and it is specifically designed and optimized for use with object storage.
The Pro engine uses the Dovecot Core FTS library to perform common search and indexing features:
Feature | Summary |
---|---|
Normalize | Unify saved form of text as much as possible. |
Stemming | Reduce words to their basic form. |
Detect Language | Detect language of processed text to more accurately apply other filters and features. |
Skip Stop Words | A configurable list of words (per language) not to be indexed. |
Skip Bad Characters | Non-language characters, base64 data, HTML tags, etc. will not be indexed. |
Decompound | Index compounded words separately. |
Attachment Search | Allow data in text-based attachments to be indexed. |
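
To make a few of the features in the table concrete, the following Python sketch chains normalization, stop-word skipping, and stemming into a toy indexing pipeline. It is not the Dovecot Core FTS library: the stop-word list, the suffix-stripping "stemmer", and all function names are simplified assumptions for illustration, and language detection and decompounding are omitted.

```python
import re
import unicodedata

# Illustrative English stop words; real lists are configurable per language.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}


def normalize(text):
    """Unify text: strip diacritics and fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def naive_stem(word):
    """Very rough suffix stripping; real stemmers are language-aware."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return the terms that this toy pipeline would index for `text`."""
    # Keeping only ASCII letters and digits also drops punctuation.
    tokens = re.findall(r"[a-z0-9]+", normalize(text))
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


terms = index_terms("The Café is sending invoices")
```

With this input, `terms` is `["cafe", "send", "invoice"]`, so a search for "invoice" or "cafe" would match the original message text.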
The Pro FTS engine provides a feature-rich system that is entirely integrated within the Dovecot Pro platform, with no need for additional storage systems or dedicated indexing nodes.