Note: This is pre-release documentation.
Please access https://doc.dovecotpro.com/latest/ for documentation on released versions.
Cluster Controller is the Palomar component that manages the cluster state for a given site. Think of it as the "brain" of your Palomar cluster: it continuously tracks the health and activity of all mail Backends, decides where users should be located, and responds automatically to problems. Note that it assesses system health from proxying statistics; it does not monitor the Backends directly.
Each Cluster site must have one Cluster Controller running. The Controller provides health checks, load balancing, evacuation automation, and API access to GeoDB. If the Cluster Controller goes down, there is no immediate impact on the cluster (mail continues to flow normally), but it should be brought back as soon as possible to restore automation and monitoring capabilities.
WARNING
A Cluster Controller only provides automation for its home site. Every Palomar site therefore requires its own Cluster Controller.
See Cluster Controller Installation.
The Cluster Controller consists of several components that work together to manage your mail cluster.
%%{init: {'theme': 'dark'}}%%
flowchart TD
subgraph Legend [Legend]
direction LR
L1[User Interface] ~~~ L2[Compute Layer] ~~~ L3[Data Layer] ~~~ L4[Backends Layer]
end
Legend ~~~ CLUSTER_ADMIN
CLUSTER_ADMIN([Dovecot Cluster Administrator]) --> FRONTEND[Frontend]
CLUSTER_ADMIN --> REST_API
subgraph ClusterController [Cluster Controller]
direction TB
FRONTEND
REST_API[REST API]
CELERY_TASKS[Celery Tasks]
REDIS[(Redis)]
PROMETHEUS[(Prometheus)]
FRONTEND --> REST_API
REST_API --> CELERY_TASKS
REST_API --> REDIS
CELERY_TASKS --> REDIS
CELERY_TASKS --> PROMETHEUS
end
REST_API --> CASSANDRA[(Cassandra)]
REST_API --> DOVECOT_BACKENDS[Dovecot Backends]
CELERY_TASKS --> CASSANDRA
CELERY_TASKS --> DOVECOT_BACKENDS
PROMETHEUS --> DOVECOT_BACKENDS
style FRONTEND fill:#a8d5ba,stroke:#333,color:#000
style REST_API fill:#87ceeb,stroke:#333,color:#000
style CELERY_TASKS fill:#87ceeb,stroke:#333,color:#000
style CASSANDRA fill:#f4a460,stroke:#333,color:#000
style REDIS fill:#f4a460,stroke:#333,color:#000
style PROMETHEUS fill:#f4a460,stroke:#333,color:#000
style DOVECOT_BACKENDS fill:#dda0dd,stroke:#333,color:#000
style L1 fill:#a8d5ba,stroke:#333,color:#000
style L2 fill:#87ceeb,stroke:#333,color:#000
style L3 fill:#f4a460,stroke:#333,color:#000
style L4 fill:#dda0dd,stroke:#333,color:#000
| Component | What It Does | Why It Matters |
|---|---|---|
| Frontend (Web UI) | Browser-based dashboard showing cluster status, Backend health, and controls for manual operations | Allows administrators to monitor the cluster and perform operations without using the API directly |
| REST API | HTTP endpoints for all Controller operations | Enables automation, scripting, and integration with other systems |
| Celery (Scheduler & Workers) | The scheduler (Celery Beat) triggers background tasks on a schedule (every 5-60 seconds, depending on the task); the workers execute the actual work: checking health, moving users, calculating load scores. | Ensures continuous monitoring even when no administrator is watching. The "muscle" that carries out all automated operations |
The Cluster Controller requires one external service and includes two bundled services:
External Service (Customer-Provided):
| Store | What It Stores | What Happens If It's Down |
|---|---|---|
| Cassandra (GeoDB) | Permanent cluster state: which sites exist, which Backends are in each site, user locations, feature flag settings | Controller cannot function - no cluster state can be read or written. Must be highly available. |
Bundled Services (Included with Controller Deployment):
| Store | What It Stores | What Happens If It's Down |
|---|---|---|
| Redis | Temporary data: task queue, cached metrics, ongoing move tracking | Tasks stop executing, UI shows stale data. Controller recovers automatically when Redis returns. |
| Prometheus | Time-series metrics from all Backends: CPU, memory, login success/failure rates | Health checking and load balancing become blind - no automatic decisions can be made. Manual operations still work. |
Redis and Prometheus are provided via the Helm chart or Docker Compose deployment and shouldn't be replaced with external instances.
Site reachability monitoring is performed by the cluster plugin running in Dovecot proxies.
WARNING
The reachability feature is currently informational only: the status is visible in the UI and available via the API, but no automatic action is taken.
The Controller runs several automated tasks that keep your cluster healthy. These tasks run continuously in the background without administrator intervention. The background task lifecycle is briefly described by the diagram below.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#861BE4', 'primaryTextColor': '#FFFFFF', 'primaryBorderColor': '#a855f7', 'lineColor': '#6b21a8', 'secondaryColor': '#1a0a2e', 'tertiaryColor': '#0E0D10', 'actorBkg': '#861BE4', 'actorBorder': '#a855f7', 'actorTextColor': '#FFFFFF', 'signalColor': '#6b21a8', 'signalTextColor': '#FFFFFF', 'noteBkgColor': '#1a0a2e', 'noteTextColor': '#FFFFFF', 'activationBorderColor': '#a855f7', 'activationBkgColor': '#9833FF', 'labelBoxBkgColor': '#1a0a2e', 'labelTextColor': '#FFFFFF'}}}%%
sequenceDiagram
participant Beat as Celery Beat<br/>(Scheduler)
participant Redis as Redis<br/>(Queue)
participant Worker as Celery Worker
participant Prom as Prometheus
participant DB as Cassandra
participant Backend as Dovecot Backend
Beat->>Redis: Schedule task (every Xs)
Redis->>Worker: Dequeue task
Worker->>DB: Check feature flags
alt Feature Enabled
Worker->>Prom: Query metrics
Worker->>DB: Read/Update state
Worker->>Backend: Execute action (doveadm)
else Feature Disabled
Worker->>Worker: Skip (raise Ignore)
end
Scheduled / automatically triggered tasks:
| Task | How Often | What It Does |
|---|---|---|
| cache_metrics | Every 5 seconds | Fetches fresh metrics from Prometheus and caches them in Redis. This makes the Web UI responsive and ensures other tasks have current data. |
| scrape_stats | On demand | Automatically triggered each time Prometheus scrapes the Controller API's /metrics endpoint, ensuring up-to-date stats are collected. |
| check_user_moves | Every 5 seconds | Monitors ongoing user migrations. If a move gets stuck (Backend not responding), it retries or escalates to a force-move. |
| rebalance_sites | Every 60 seconds | Analyzes load across all Backends. If one Backend is significantly more loaded than others, moves some users to balance the load. |
| check_Backend_health | Every 60 seconds | Checks if Backends are healthy by looking at login and mail delivery success rates. Automatically evacuates users from failing Backends. |
| evacuate_zero_load_factor_hosts | Every 60 seconds | Moves users off Backends that have been marked for decommissioning (load_factor = 0). |
| check_site_reachability | Every 60 seconds | Tests connectivity to remote sites in multi-site deployments. |
| delete_stale_data | Every 24 hours | Cleans up orphaned records in the database (e.g., statistics for Backends that no longer exist). |
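The schedule in the table above maps naturally onto a Celery Beat configuration. The sketch below is illustrative only: the task intervals follow the table, but the module paths (`controller.tasks.*`) are assumptions, not the actual Controller code.

```python
# Illustrative Celery Beat schedule for the Controller's recurring tasks.
# Intervals match the table above; module paths are hypothetical.
from datetime import timedelta

beat_schedule = {
    "cache-metrics": {
        "task": "controller.tasks.cache_metrics",        # assumed path
        "schedule": timedelta(seconds=5),
    },
    "check-user-moves": {
        "task": "controller.tasks.check_user_moves",     # assumed path
        "schedule": timedelta(seconds=5),
    },
    "check-backend-health": {
        "task": "controller.tasks.check_backend_health", # assumed path
        "schedule": timedelta(seconds=60),
    },
    "delete-stale-data": {
        "task": "controller.tasks.delete_stale_data",    # assumed path
        "schedule": timedelta(hours=24),
    },
}
```

Celery Beat only enqueues these tasks into Redis; the workers dequeue and execute them, which is why a stopped Redis halts task execution as described above.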
What if a task fails?
Individual task failures are logged but don't kill the Cluster Controller. The task will be retried on the next scheduled run. The Controller continues running because these failures are often temporary (e.g., network, brief service outages). Persistent failures usually indicate a problem with an external service (Prometheus, Cassandra, or a Backend). Check the Cluster Controller logs for details.
Important for health checking: Just because the Cluster Controller process is up doesn't mean it's successfully executing the tasks you expect. Monitor the logs to verify that scheduled tasks are completing successfully.
All automated behavior is controlled by feature flags. For example, health checking can be enabled or disabled independently. This gives you fine-grained control over what the Controller does automatically versus what requires manual intervention.
When you first deploy a Controller or make significant changes, you may want to start features in DryRun mode and review the logged decisions before enabling full automation.
Feature flags are defined at three levels: Global, Site, and Backend. Precedence is top-down — once a feature is disabled at a higher level, lower-level settings cannot re-enable it.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#6366f1', 'primaryTextColor': '#f8fafc', 'primaryBorderColor': '#818cf8', 'lineColor': '#94a3b8', 'secondaryColor': '#1e293b', 'tertiaryColor': '#0f172a', 'background': '#0f172a', 'mainBkg': '#1e293b', 'nodeBorder': '#475569', 'clusterBkg': '#1e293b', 'clusterBorder': '#475569', 'titleColor': '#f8fafc', 'edgeLabelBackground': '#1e293b'}}}%%
flowchart LR
Global["GLOBAL
(default for entire cluster)"]
Site["SITE
(override for one site)"]
Backend["BACKEND
(override for one Backend)"]
Global -->|overridden by| Site
Site -->|overridden by| Backend
classDef primary fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#f8fafc
classDef secondary fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#f8fafc
classDef accent fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#f8fafc
class Global primary
class Site secondary
class Backend accent
⚠️ TODO
API endpoint for Backend feature flags configuration doesn't exist yet
Example: You enable load balancing globally, but disable it for a specific Backend that you're troubleshooting. The rest of the cluster continues to balance automatically while that one Backend is excluded.
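The precedence rule above (lower levels override higher ones, but a Disabled setting can never be re-enabled below) can be sketched as a small resolution function. This is a minimal sketch of the described semantics, assuming each level stores either a state string or `None` for "inherit"; it is not the Controller's actual code.

```python
# Resolve the effective feature-flag state across Global -> Site -> Backend.
# Assumption: None means "inherit from the level above"; "Disabled" at a
# higher level is final and cannot be overridden lower down.
def effective_state(global_state, site_state=None, backend_state=None):
    state = global_state
    for override in (site_state, backend_state):
        if state == "Disabled":
            break                 # disabled at a higher level: stop here
        if override is not None:
            state = override      # lower level overrides the inherited value
    return state
```

For the example above: load balancing enabled globally but disabled for one Backend resolves to `effective_state("Enabled", None, "Disabled") == "Disabled"` for that Backend, while every other Backend stays `"Enabled"`.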
| Feature | What It Controls | When to Disable |
|---|---|---|
| MetricsExport | Whether the Controller collects and exports metrics. Required for all other features. | Rarely - disabling this blinds the Controller |
| BackendHealthAutoHandling | Automatic detection of failing Backends and user evacuation | During planned maintenance when you expect temporary failures |
| BackendLoadBalancing | Automatic movement of users to balance load across Backends | When you want manual control over user placement |
| State | What Happens |
|---|---|
| Enabled | Full automation - the Controller acts on its decisions |
| Disabled | No automation - the feature doesn't run at all (default) |
| DryRun | The Controller makes decisions and logs them, but doesn't execute. Perfect for testing. |
Recommended Workflow
Feature flags can be managed via the Controller API or the Controller Web UI under "Site Features".
Load balancing ensures users are distributed evenly across your Backends. Without it, some Backends might become overloaded while others sit idle, leading to poor performance for some users and wasted capacity. When the BackendLoadBalancing feature is disabled at the Backend level, that Backend is excluded from load balancing.
Every 60 seconds, the Controller:
flowchart TD
Start([Every 60 seconds])
GetScores["Calculate load score<br/>for each Backend"]
FindPair["Find most loaded and<br/>least loaded Backends"]
CheckDelta{"Is the difference<br/>significant enough?"}
MoveUsers["Move some users from<br/>overloaded → underloaded"]
Done([Wait for next cycle])
Skip([No action needed])
Start --> GetScores
GetScores --> FindPair
FindPair --> CheckDelta
CheckDelta -->|"Yes (delta > threshold)"| MoveUsers
CheckDelta -->|"No"| Skip
MoveUsers --> Done
Skip --> Done
The Cluster Controller uses a statistical measure called Z-score to compare Backends fairly. The Z-score tells you how far a Backend is from the average.
The formula combines multiple metrics (memory, CPU, metacache) into a single score, so a Backend with high memory but low CPU is compared fairly against one with low memory but high CPU.
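As a hedged sketch of that idea: for each metric, a Backend's value is expressed in standard deviations from the mean across all Backends, and the per-metric Z-scores are combined into one number. The metric names and the plain sum below are assumptions for illustration, not the Controller's exact formula.

```python
# Combine per-metric Z-scores into a single load score per Backend.
# Assumption: metrics_by_backend maps metric name -> {backend: value},
# and scores are simply summed (the real weighting may differ).
from statistics import mean, stdev

def z_score(value, values):
    spread = stdev(values)
    if spread == 0:
        return 0.0            # all Backends identical: no deviation
    return (value - mean(values)) / spread

def load_score(backend, metrics_by_backend):
    return sum(
        z_score(per_backend[backend], list(per_backend.values()))
        for per_backend in metrics_by_backend.values()
    )
```

This is what makes the comparison fair: a Backend one standard deviation above average on memory but one below on CPU nets out to roughly zero, the same as a Backend with the opposite profile.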
A Backend is excluded from load balancing if:
- its state is not active (e.g., offline or standby)
- users were moved from it too recently (HOST_LOAD_BALANCE_MIN_COOL_TIME_SECS)
- too few metric samples have been collected for it (HOST_LOAD_BALANCE_MIN_SAMPLES)
- its failure rate is too high (HOST_FAILURE_RATIO)
| Setting | What It Controls |
|---|---|
| HOST_LOAD_BALANCE_SCORE_DELTA_THRESHOLD_RATIO | How different two Backends must be before moving users. Higher = less sensitive. |
| HOST_LOAD_BALANCE_MIN_COOL_TIME_SECS | How long to wait before moving the same user again. Prevents thrashing. |
| HOST_LOAD_BALANCE_MIN_SAMPLES | Minimum data points needed before trusting a Backend's score. New Backends need time to gather data. |
Health checking automatically detects and responds to failing Backends. When a Backend starts failing (e.g., rejecting logins, failing to deliver mail), the Controller moves users away before they're significantly impacted.
Every 60 seconds, the Controller:
flowchart TD
Start([Every 60 seconds])
FetchMetrics["Get login and delivery<br/>statistics from Prometheus"]
ForEach["Check each Backend"]
CalcRate["Calculate failure rate:<br/>failures ÷ total attempts"]
CheckSevere{"Failure rate<br/>> 90%?"}
CheckHigh{"Failure rate<br/>> 10%?"}
Evacuate["CRITICAL: Evacuate all users;<br/>when none remain, set Backend OFFLINE"]
MovePartial["WARNING: Move some users<br/>to healthy Backends"]
Healthy["Backend is healthy<br/>No action needed"]
Start --> FetchMetrics
FetchMetrics --> ForEach
ForEach --> CalcRate
CalcRate --> CheckSevere
CheckSevere -->|Yes| Evacuate
CheckSevere -->|No| CheckHigh
CheckHigh -->|Yes| MovePartial
CheckHigh -->|No| Healthy
| Condition | What Happens |
|---|---|
| > 90% failures | Backend is critically failing. All users are evacuated. When no users are left, status is set to OFFLINE. |
| > 10% failures (configurable) | Backend is degraded. Some users are moved away to reduce load and see if it recovers. |
| < 10% failures (configurable) | Backend is healthy. No action taken. |
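The decision table above can be sketched as a small classification function. This is an illustrative sketch, not the Controller's code: the 90% critical threshold is taken from the table, the 10% threshold stands in for the configurable HOST_FAILURE_RATIO, and the minimum-sample guard mirrors HOST_FAILURE_MIN_LOGINS (the value 100 here is an assumption).

```python
# Classify a Backend's health from login/delivery failure statistics.
SEVERE_FAILURE_RATIO = 0.90     # critical threshold from the table above
HOST_FAILURE_RATIO = 0.10       # configurable degraded threshold
HOST_FAILURE_MIN_LOGINS = 100   # assumed minimum sample size

def classify(failures, attempts):
    if attempts < HOST_FAILURE_MIN_LOGINS:
        return "healthy"        # too few samples to make a decision
    ratio = failures / attempts
    if ratio > SEVERE_FAILURE_RATIO:
        return "critical"       # evacuate all users, then set OFFLINE
    if ratio > HOST_FAILURE_RATIO:
        return "degraded"       # move some users away, watch for recovery
    return "healthy"
```

The minimum-sample guard is the reason a single failed login on a quiet Backend does not trigger an evacuation.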
WARNING
In the critical failure case, the Cluster Controller tries to move 20% of the original number of users per cycle, rather than 20% of the current amount, so that evacuating the Backend does not take too long.
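The arithmetic behind that warning can be illustrated with a short sketch (an illustration of the batch-sizing trade-off, not the actual implementation): a batch sized from the original count drains the Backend in a fixed number of cycles, while a batch sized from the shrinking current count leaves a long tail.

```python
# Compare evacuation batch strategies: 20% of the original user count
# (fixed batch) vs. 20% of the remaining count (shrinking batch).
def cycles_to_evacuate(total_users, batch_of_original=True):
    remaining, cycles = total_users, 0
    while remaining > 0:
        base = total_users if batch_of_original else remaining
        batch = max(1, int(0.2 * base))   # never stall with a zero batch
        remaining -= min(batch, remaining)
        cycles += 1
    return cycles
```

With 1000 users, the fixed batch finishes in 5 cycles; the shrinking batch removes 200, then 160, then 128 users per cycle and takes far longer to empty the Backend.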
While a Backend is offline, proxies periodically check whether it has come back online. When the health check succeeds, the Controller automatically brings the Backend back online. Users can then be moved back by normal load balancing.
What if many Backends fail at once (e.g., a network issue affecting half your data center)? Moving all users to the remaining Backends could overload them, making the situation worse.
The Controller has built-in protection: when the share of failing Backends exceeds HOST_FAILURE_BACKEND_NUM_THRESHOLD, automatic evacuation is suspended rather than overloading the remaining Backends.
When Catastrophe Protection Triggers
Check the Controller logs immediately. This usually indicates a serious infrastructure problem (network outage, storage failure, etc.) rather than individual Backend issues.
| Setting | What It Controls |
|---|---|
| HOST_FAILURE_RATIO | Failure rate threshold for moving users. Lower = more sensitive. |
| HOST_FAILURE_MIN_LOGINS | Minimum attempts before making decisions. Prevents overreacting to small sample sizes. |
| HOST_FAILURE_COOL_TIME_SECS | Minimum time in seconds between moving users from a host with failing logins. |
| HOST_FAILURE_BACKEND_NUM_THRESHOLD | Maximum percentage of Backends that can fail before catastrophe protection kicks in. |
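A minimal sketch of the catastrophe-protection guard described above, assuming the threshold is a ratio and that exceeding it blocks automatic evacuation (the function name and the 0.50 default are illustrative assumptions, not Controller internals):

```python
# Guard automatic evacuation: if too many Backends are failing at once,
# moving their users would overload the survivors, so automation stops.
HOST_FAILURE_BACKEND_NUM_THRESHOLD = 0.50  # assumed: at most 50% may fail

def may_auto_evacuate(failing_backends, total_backends):
    if total_backends == 0:
        return False
    ratio = failing_backends / total_backends
    return ratio <= HOST_FAILURE_BACKEND_NUM_THRESHOLD
```

One failing Backend out of ten is handled automatically; six out of ten trips the guard, leaving the response to an administrator.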
While the Controller automates most operations, sometimes you need manual control - for planned migrations, emergency evacuations, or testing.
Gradually migrates a percentage of users from one site to other sites. The Cluster Controller sends a request to the Backend to move users (the doveadm cluster user batch move backend command). If HOST_FAILURE_COOL_TIME_SECS is set, the min-last-moved parameter is added to the doveadm command. This avoids moving the same users too often.
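As a rough sketch of how such a request could translate into the doveadm invocation mentioned above (the command words and the min-last-moved parameter come from the text; the argument order and exact flag spelling are assumptions, so consult the doveadm reference for the real syntax):

```python
# Assemble an illustrative doveadm argv for a batch move. Everything past
# the command words is a hypothetical layout, not the documented syntax.
def build_batch_move_argv(backend, percentage, min_last_moved_secs=None):
    argv = ["doveadm", "cluster", "user", "batch", "move", "backend",
            str(backend), str(percentage)]
    if min_last_moved_secs is not None:
        # Added when HOST_FAILURE_COOL_TIME_SECS is set, so recently moved
        # users are skipped and not moved again too soon.
        argv += ["--min-last-moved", str(min_last_moved_secs)]
    return argv
```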
⚠️ TODO
Add reference to the OpenAPI batch move users post
Use cases:
Immediately moves a percentage of users in a single operation. Faster than batch move but more disruptive.
⚠️ TODO
Add reference to the OpenAPI force move users post
Use cases:
Triggers (via REST API request) an immediate load rebalancing cycle without waiting for the 60-second schedule.
REST API
See Palomar REST APIs for the complete API reference.
Use cases:
The Controller API provides programmatic access to all Controller functions. You can:
The API also exposes an OpenMetrics endpoint for monitoring systems to scrape Controller metrics.
REST API
See Palomar REST APIs for the complete API reference.
The Controller includes a browser-based administration interface. Access it by navigating to your Controller's HTTP endpoint (default port 8080).
The UI provides: