The cluster controller is a component that manages the cluster state for a given site. It does not provide automation beyond its home site.
Each cluster site must have one controller running. The controller provides health checks, load balancing, evacuation automation, and API access to GeoDB. If the cluster controller goes down, there is no immediate impact on the cluster, but it should be brought back up as soon as possible.
The controller observes the proxy destination statistics for each backend and site. It attempts to bring sites up and down based on their failure and success rates, and to reduce load on backends that exhibit failures by moving groups to other hosts.
The controller observes Z-score values in Prometheus. These are calculated from various metrics provided by the cluster hosts. The current algorithm calculates the Euclidean distance from the average in n dimensions, where n is the number of metrics. This yields a flat total score that is used to select the best and worst host pair; load balancing is performed by moving a group between these two.
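A minimal sketch of this calculation, assuming the standard per-metric z-score normalization (the exact recording rules are defined in Prometheus and may differ):

$$
z_{h,m} = \frac{x_{h,m} - \bar{x}_m}{\sigma_m},
\qquad
D_h = \sqrt{\sum_{m=1}^{n} z_{h,m}^{2}}
$$

Here $x_{h,m}$ is host $h$'s value for metric $m$, $\bar{x}_m$ and $\sigma_m$ are the site-wide mean and standard deviation of that metric, and $D_h$ is the host's flat total score.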
The load score for a backend is based on various load indexes called Z-scores, calculated by Prometheus from different resource usage metrics such as system memory, CPU, and metacache (each resource has its own dedicated Z-score). The final score of the host is the sum of all of these Z-scores. Each Z-score is calculated as in the example below.
Example: Memory usage for a backend with hostname "host1"
dovecot_cluster:host:zscore{site="site1", host="host1", score="memused"}
As mentioned above, Z-scores are calculated by and stored in Prometheus. The controller then collects all the values (i.e. the dovecot_cluster:host:zscore metrics) from Prometheus for the whole site and all hosts:
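For example, a site-wide query of the following shape returns the per-host Z-scores (label names follow the example above):

dovecot_cluster:host:zscore{site="site1"}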
The Controller API provides programmatic access to GeoDB for external and internal components. You can perform CRUD operations on certain objects, and read operations on certain other objects.
The API endpoint also provides access to aggregated cluster statistics via an OpenMetrics endpoint.
To install the cluster controller, you first need to set up Palomar; see Palomar Configuration. You also need a functional Kubernetes cluster or a functional Docker environment.
We recommend setting up a highly available Kubernetes cluster with at least 2 worker nodes to ensure proper availability and resilience for the cluster controller service.
Alternatively, as a fallback option, you can deploy a single-node Kubernetes cluster using Minikube or a similar tool, or run the service on a single machine with Docker Compose.
In both scenarios, each node should have at least 4 CPU cores and 8 GB of memory.
You need to use the Helm package manager to install the cluster controller chart.
INFO
See Kubernetes Support Policy for details on Kubernetes version support. Any recent version of Helm should work.
Before installing the controller Helm chart, you will need to make sure you have access to the Open-Xchange container registry and that Helm is logged in to it.
helm registry login registry.open-xchange.com
This step is only needed once; after a successful login, Helm will not need to be re-authenticated.
Before installation, you need to create global.yaml and controller.yaml files for your site. See Palomar Cluster Controller Chart values for the values applicable to this chart: global.yaml contains the parameters starting with global. (these will be shared with other charts in the future), and controller.yaml contains the remaining controller-specific parameters.
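As a purely illustrative sketch of this split (the keys shown are hypothetical; take the real keys from the Palomar Cluster Controller Chart values reference):

# global.yaml -- parameters starting with "global."
global:
  site: site1        # hypothetical key, shown only to illustrate the split

# controller.yaml -- controller-specific parameters
logLevel: info       # hypothetical key, shown only to illustrate the split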
Also note that the cluster controller expects to be deployed in a dedicated Kubernetes namespace.
To install, run
helm install <release-name> oci://registry.open-xchange.com/dovecot-pro/charts/controller --version <version> -f global.yaml -f controller.yaml -n <namespace>
# example
helm install controller oci://registry.open-xchange.com/dovecot-pro/charts/controller --version 3.0.0 -f global.yaml -f controller.yaml -n dovecot
If everything goes well, you should now have a functional controller.
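You can verify the deployment, for example by checking that the release is deployed and its pods are running:

helm status <release-name> -n <namespace>
kubectl get pods -n <namespace>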
You can use the Helm package manager to roll back to a previous working version of your Cluster Controller chart.
First, you need to identify the revision number of the release you want to roll back to. You can list all the previous releases of your Helm chart using the following command:
helm history <release-name> -n <namespace>
Once you have identified the revision number you want to revert to, you can perform the rollback with the following command:
helm rollback <release-name> <revision-number> -n <namespace>
After executing the rollback, you can verify that the rollback was successful by checking the status of the release:
helm status <release-name> -n <namespace>
Even though the primary method to install the cluster controller is on Kubernetes-managed deployments, it is possible to install it via docker-compose as well. All the files needed for this installation method are bundled in a traditional package included in the dovecot-pro repositories.
Dependencies
The package does not have any dependency on docker or docker-compose; it is left to the administrator to choose their preferred method of installing them.
The package includes a compose file along with extra configuration files needed to run the various services required by the controller. A systemd unit file is also included that invokes the docker-compose file to start or stop all containers.
Before installing the controller compose package, you will need to make sure you have access to the Open-Xchange container registry and that your Docker daemon is logged in.
docker login registry.open-xchange.com
This step is only needed once; after a successful login, Docker will not need to be re-authenticated.
Install the dovecot-pro-controller-compose package from the Dovecot Pro repositories. Once installed, all the configuration files and the docker-compose file will be located at /usr/share/doc/dovecot-pro-controller-compose/.
The controller configuration file is located at /usr/share/doc/dovecot-pro-controller-compose/controller_config. Please refer to the controller configurations for all available options. Review and apply any necessary changes.
Apart from the cluster controller's configuration options, there are various deployment variables that can be tweaked to customize the deployment. An example .env file is included with the compose package (located in the same directory as the compose file) that contains a list of key-value pairs for all these deployment variables:
| Variable Name | Description | Default |
|---|---|---|
| CONTROLLER_IMAGE_URL | URL of the registry used to pull the cluster controller image from. | registry.open-xchange.com/dovecot-pro/controller |
| CONTROLLER_IMAGE_TAG | Docker image tag of the cluster controller version used. | <released-version> |
| CONTROLLER_API_LOG_LEVEL | Log level set for the cluster controller API container. | info |
| CONTROLLER_WORKER_LOG_LEVEL | Log level set for the cluster controller worker container. | info |
| CONTROLLER_SCHEDULER_LOG_LEVEL | Log level set for the cluster controller scheduler container. | info |
| CONTROLLER_WORKER_LOW_PRIO_REPLICAS | Number of replicas for the controller worker performing low priority tasks. | 2 |
| CONTROLLER_WORKER_HIGH_PRIO_REPLICAS | Number of replicas for the controller worker performing high priority tasks. | 1 |
| REDIS_IMAGE_URL | URL of the registry used to pull the Redis image from. | docker.io/bitnami/redis |
| REDIS_IMAGE_TAG | Docker image tag of the Redis version used. | 7.2.3 |
| CONTROLLER_REDIS_REPLICAS | Number of replicas for Redis containers. | 1 |
| PROMETHEUS_IMAGE_URL | URL of the registry used to pull the Prometheus image from. | quay.io/prometheus/prometheus |
| PROMETHEUS_IMAGE_TAG | Docker image tag of the Prometheus version used. | 25.8.2 |
Rename the example file to .env or create a new file in the same folder with modified values.
cp /usr/share/doc/dovecot-pro-controller-compose/{example,}.env
Env file
Overridden variables must be in the .env file, otherwise the default values will be used.
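For example, a minimal .env that pins the controller image tag and raises the worker log level could look like this (the values shown are illustrative):

CONTROLLER_IMAGE_TAG=3.0.0
CONTROLLER_WORKER_LOG_LEVEL=debug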
It is expected that there is an external Cassandra cluster running and that the address of the server(s) is passed to the controller in its settings (refer to the controller configurations described in step 3 above). The keyspace and all tables needed for Palomar to function must therefore already have been created correctly for the controller to work properly. It is possible, however, to enable automatic initialization of the database as part of the deployment. If enabled, the keyspace and the necessary tables are created if missing from the database, based on the parameters described in the table below. Most of these initialization parameters are a direct translation of some of the Chart settings for Kubernetes deployments, as described in Palomar Cluster Controller Chart values.
For docker-compose installations these settings must be in the same config file as the other settings outlined above.
| Variable Name | Description | Default |
|---|---|---|
| INIT_GEODB_SCHEMA | Initialize the Cassandra schema (keyspace and tables) needed for Palomar. | True |
| CASSANDRA_DATACENTER_REPLICATIONFACTOR | A sequence of datacenters and their configured replication factor on the Cassandra cluster (can be a Python dict or list object). Each element in the sequence is a string containing a datacenter and its replication factor separated by a comma; see the example after the table. | ( |
| CASSANDRA_GEODB_KEYSPACE | Cassandra keyspace used for Palomar GeoDB. Must have the same value as the CASSANDRA_KEYSPACE controller setting. | "d8s_cluster" |
| INIT_SITE | Whether the site should be created as well. If enabled, the controller API is used to create a site with the name taken from the CLUSTER_SITE controller setting. | True |
| DOMAIN | Public FQDN of the cluster site load balancer. Used when creating the site if INIT_SITE is true. | "d8s.test" |
| DICTMAP_ENABLED | Whether fs-dictmap is configured for Dovecot object storage. | True |
| INIT_DICTMAP_SCHEMA | Initialize the Cassandra schema (keyspace and tables) used for fs-dictmap. | True |
| CASSANDRA_DICTMAP_KEYSPACE | Cassandra keyspace used for fs-dictmap. | "d8s_dovecot" |
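Based on the format described above, a cluster with two datacenters using replication factors 3 and 2 could, for example, be configured as follows (the datacenter names are illustrative):

CASSANDRA_DATACENTER_REPLICATIONFACTOR=("dc1,3", "dc2,2")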
Start the cluster controller either by invoking docker-compose directly or by using the systemd service.
Executable file permissions
Depending on the distribution used, files bundled in the package might have been stripped of their execute permission. Make sure cassandra-bootstrap.sh and config2env.py (located in the same directory as the compose file) are executable.
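If they are not, you can restore the permissions, for example:

chmod +x /usr/share/doc/dovecot-pro-controller-compose/cassandra-bootstrap.sh /usr/share/doc/dovecot-pro-controller-compose/config2env.py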
docker-compose -f /usr/share/doc/dovecot-pro-controller-compose/docker-compose.yml up
Or
systemctl start dovecot-cluster-controller
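If you want the service to start automatically at boot, you can additionally enable it:

systemctl enable dovecot-cluster-controller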
The cluster controller can be managed either via the API or the Web UI. To access these, you need to expose http://controller-api:8080/. This is exposed by default in Docker Compose. For the available API operations, see the OpenAPI documentation.
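As a quick reachability check (the controller-api hostname applies inside the Docker Compose network; substitute the address you exposed in your deployment):

curl http://controller-api:8080/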
The site name is required. Only provide a site ID in special cases; it is autogenerated by default. A site load balancer is needed if the site has an external load balancer.
You can edit the site name, site load balancer, and site tag.
DANGER
The site name must match the Dovecot configuration for the respective site; see cluster_local_site. Changing the site name to a non-matching value will cause your cluster site to stop operating.
The host name is required and must be resolvable by the controller. You cannot add a host by IP address; it must have a valid host name. Only provide a backend ID in special cases; it is autogenerated by default.
If you need to drain a backend, the recommended way is to set its load factor to 0, which will cause the controller to drain the host of users. If you are in a hurry, you can use the evacuate button instead. This will trigger an immediate move of all users to other backends.
DANGER
Using the evacuate button can cause severe load on a site.
The cluster controller performs load balancing automatically. If necessary, you can also trigger extra load balancing.
The cluster keeps its users in multiple smaller groups. The number of groups is determined during cluster installation and bootstrap. You can change the number of groups using the cluster controller.
The cluster controller allows configuring site features; these are disabled by default.
Controls the error mitigation features of the controller. If disabled, the cluster controller will not take broken hosts offline or try to move users away from degraded hosts.
Controls automatic load balancing. If disabled, the cluster controller will not attempt to move users between shards and user groups between hosts to balance load.
Controls the publishing of Prometheus metrics. If disabled, the metrics endpoint in the cluster controller will not work.