Search K
Appearance
Appearance
dovecot-cluster - How Palomar works
Cluster's main job is to direct users into specific backends to distribute the load and provide high availability for the cluster. Same user's concurrent connections should go to the same backend. Otherwise the IMAP connections' states may look different, which may confuse user/client. For high availability, there needs to be an eventual automatic failover to move users from broken backends to working ones.
Cluster tries to optimize where a new connection goes. For storage optimization, user should be redirected to the backend where it already has its indexes in metacache.
For cluster optimization, users should be distributed in a way that all backends have approximately equal load (IO, memory, CPU).
For best high availability, users should be moved away from a broken backend immediately. The backend may be broken only for some seconds, in which case it would take longer to move users than to just do nothing. A broken network connection could trigger a site failover, but that needs to be balanced with preventing a huge load spike that would take down the whole cluster if it's done too rapidly. If the network connection is up again after a few minutes, the site failover may not have gotten very far at all.
Cluster tracks the state in a GeoDB (e.g. Cassandra). This way Dovecot doesn't have to be responsible for keeping the state in sync. The GeoDB will also make Cluster work across multiple sites by using the same shared state. The GeoDB schema is designed to support split brain situation, i.e. when the split brain merges and state becomes resyncronized, the cluster won't become a chaos of rapidly moving users between backends/sites.
This is intended for caching GeoDB state to reduce lookups.