Note: This is pre-release documentation.
Please access https://doc.dovecotpro.com/latest/ for documentation on released versions.
Cluster Controller is the Palomar component that manages the cluster state for a given site. Think of it as the "brain" of your Palomar cluster: it continuously tracks the health and activity of all mail Backends, decides where users should be located, and responds automatically to problems. Note that it assesses system health from proxying statistics; it does not monitor the Backends directly.
Each Cluster site must have one Cluster Controller running. The Controller provides health checks, load balancing, evacuation automation, and API access to GeoDB. If the Cluster Controller goes down, there is no immediate impact on the cluster (mail continues to flow normally), but it should be brought back as soon as possible to restore automation and monitoring capabilities.
WARNING
A Cluster Controller only provides automation for its home site. Every Palomar site therefore requires its own Cluster Controller.
See Cluster Controller Installation.
The Cluster Controller consists of several components that work together to manage your mail cluster.
%%{init: {'theme': 'dark'}}%%
flowchart TD
subgraph Legend [Legend]
direction LR
L1[User Interface] ~~~ L2[Compute Layer] ~~~ L3[Data Layer] ~~~ L4[Backends Layer]
end
Legend ~~~ CLUSTER_ADMIN
CLUSTER_ADMIN([Dovecot Cluster Administrator]) --> FRONTEND[Frontend]
CLUSTER_ADMIN --> REST_API
subgraph ClusterController [Cluster Controller]
direction TB
FRONTEND
REST_API[REST API]
CELERY_TASKS[Celery Tasks]
REDIS[(Redis)]
PROMETHEUS[(Prometheus)]
FRONTEND --> REST_API
REST_API --> CELERY_TASKS
REST_API --> REDIS
CELERY_TASKS --> REDIS
CELERY_TASKS --> PROMETHEUS
end
REST_API --> CASSANDRA[(Cassandra)]
REST_API --> DOVECOT_BACKENDS[Dovecot Backends]
CELERY_TASKS --> CASSANDRA
CELERY_TASKS --> DOVECOT_BACKENDS
PROMETHEUS --> DOVECOT_BACKENDS
style FRONTEND fill:#a8d5ba,stroke:#333,color:#000
style REST_API fill:#87ceeb,stroke:#333,color:#000
style CELERY_TASKS fill:#87ceeb,stroke:#333,color:#000
style CASSANDRA fill:#f4a460,stroke:#333,color:#000
style REDIS fill:#f4a460,stroke:#333,color:#000
style PROMETHEUS fill:#f4a460,stroke:#333,color:#000
style DOVECOT_BACKENDS fill:#dda0dd,stroke:#333,color:#000
style L1 fill:#a8d5ba,stroke:#333,color:#000
style L2 fill:#87ceeb,stroke:#333,color:#000
style L3 fill:#f4a460,stroke:#333,color:#000
style L4 fill:#dda0dd,stroke:#333,color:#000
| Component | What It Does | Why It Matters |
|---|---|---|
| Frontend (Web UI) | Browser-based dashboard showing cluster status, Backend health, and controls for manual operations | Allows administrators to monitor the cluster and perform operations without using the API directly |
| REST API | HTTP endpoints for all Controller operations | Enables automation, scripting, and integration with other systems |
| Celery (Scheduler & Workers) | The scheduler (Celery Beat) triggers background tasks on a schedule (every 5-60 seconds, depending on the task); the workers execute the actual work: checking health, moving users, calculating load scores. | Ensures continuous monitoring even when no administrator is watching. The "muscle" that carries out all automated operations |
The Cluster Controller requires one external service and includes two bundled services:
External Service (Customer-Provided):
| Store | What It Stores | What Happens If It's Down |
|---|---|---|
| Cassandra (GeoDB) | Permanent cluster state: which sites exist, which Backends are in each site, user locations, feature flag settings | Controller cannot function - no cluster state can be read or written. Must be highly available. |
Bundled Services (Included with Controller Deployment):
| Store | What It Stores | What Happens If It's Down |
|---|---|---|
| Redis | Temporary data: task queue, cached metrics, ongoing move tracking | Tasks stop executing, UI shows stale data. Controller recovers automatically when Redis returns. |
| Prometheus | Time-series metrics from all Backends: CPU, memory, login success/failure rates | Health checking and load balancing become blind - no automatic decisions can be made. Manual operations still work. |
Redis and Prometheus are provided via the Helm chart or Docker Compose deployment and shouldn't be replaced with external instances.
Site reachability monitoring is performed by the cluster plugin running in Dovecot proxies.
WARNING
The reachability feature is currently informational only: the status is visible in the UI and available via the API, but no automatic action is taken.
The Controller runs several automated tasks that keep your cluster healthy. These tasks run continuously in the background without administrator intervention. The background task lifecycle is briefly described by the diagram below.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#861BE4', 'primaryTextColor': '#FFFFFF', 'primaryBorderColor': '#a855f7', 'lineColor': '#6b21a8', 'secondaryColor': '#1a0a2e', 'tertiaryColor': '#0E0D10', 'actorBkg': '#861BE4', 'actorBorder': '#a855f7', 'actorTextColor': '#FFFFFF', 'signalColor': '#6b21a8', 'signalTextColor': '#FFFFFF', 'noteBkgColor': '#1a0a2e', 'noteTextColor': '#FFFFFF', 'activationBorderColor': '#a855f7', 'activationBkgColor': '#9833FF', 'labelBoxBkgColor': '#1a0a2e', 'labelTextColor': '#FFFFFF'}}}%%
sequenceDiagram
participant Beat as Celery Beat<br/>(Scheduler)
participant Redis as Redis<br/>(Queue)
participant Worker as Celery Worker
participant Prom as Prometheus
participant DB as Cassandra
participant Backend as Dovecot Backend
Beat->>Redis: Schedule task (every Xs)
Redis->>Worker: Dequeue task
Worker->>DB: Check feature flags
alt Feature Enabled
Worker->>Prom: Query metrics
Worker->>DB: Read/Update state
Worker->>Backend: Execute action (doveadm)
else Feature Disabled
Worker->>Worker: Skip (raise Ignore)
end
Scheduled / automatically triggered tasks:
| Task | How Often | What It Does |
|---|---|---|
| cache_metrics | Every 5 seconds | Fetches fresh metrics from Prometheus and caches them in Redis. This makes the Web UI responsive and ensures other tasks have current data. |
| scrape_stats | On demand | Automatically triggered each time Prometheus scrapes the Controller API's /metrics endpoint, ensuring up-to-date stats are collected. |
| check_user_moves | Every 5 seconds | Monitors ongoing user migrations. If a move gets stuck (Backend not responding), it retries or escalates to a force-move. |
| rebalance_sites | Every 60 seconds | Analyzes load across all Backends. If one Backend is significantly more loaded than others, moves some users to balance the load. |
| check_Backend_health | Every 60 seconds | Checks if Backends are healthy by looking at login and mail delivery success rates. Automatically evacuates users from failing Backends. |
| evacuate_zero_load_factor_hosts | Every 60 seconds | Moves users off Backends that have been marked for decommissioning (load_factor = 0). |
| check_site_reachability | Every 60 seconds | Tests connectivity to remote sites in multi-site deployments. |
| delete_stale_data | Every 24 hours | Cleans up orphaned records in the database (e.g., statistics for Backends that no longer exist). |
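The schedule in the table above maps naturally onto a Celery Beat configuration. The sketch below is illustrative only: the task intervals follow the table, but the module paths (`controller.tasks.*`) are assumptions, not the actual Controller code.

```python
# Illustrative Celery Beat schedule for the Controller's recurring tasks.
# Intervals match the table above; module paths are hypothetical.
from datetime import timedelta

beat_schedule = {
    "cache-metrics": {
        "task": "controller.tasks.cache_metrics",        # assumed path
        "schedule": timedelta(seconds=5),
    },
    "check-user-moves": {
        "task": "controller.tasks.check_user_moves",     # assumed path
        "schedule": timedelta(seconds=5),
    },
    "check-backend-health": {
        "task": "controller.tasks.check_backend_health", # assumed path
        "schedule": timedelta(seconds=60),
    },
    "delete-stale-data": {
        "task": "controller.tasks.delete_stale_data",    # assumed path
        "schedule": timedelta(hours=24),
    },
}
```

Celery Beat only enqueues these tasks into Redis; the workers dequeue and execute them, which is why a stopped Redis halts task execution as described above.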
What if a task fails?
Individual task failures are logged but don't kill the Cluster Controller. The task will be retried on the next scheduled run. The Controller continues running because these failures are often temporary (e.g., network, brief service outages). Persistent failures usually indicate a problem with an external service (Prometheus, Cassandra, or a Backend). Check the Cluster Controller logs for details.
Important for health checking: Just because the Cluster Controller process is up doesn't mean it's successfully executing the tasks you expect. Monitor the logs to verify that scheduled tasks are completing successfully.
All automated behavior is controlled by feature flags. For example, health checking can be enabled or disabled independently. This gives you fine-grained control over what the Controller does automatically versus what requires manual intervention.
When you first deploy a Controller or make significant changes, you may want to start features in DryRun mode and review the logged decisions before enabling full automation.
Feature flags are defined at three levels: Global, Site, and Backend. Precedence is top-down — once a feature is disabled at a higher level, lower-level settings cannot re-enable it.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#6366f1', 'primaryTextColor': '#f8fafc', 'primaryBorderColor': '#818cf8', 'lineColor': '#94a3b8', 'secondaryColor': '#1e293b', 'tertiaryColor': '#0f172a', 'background': '#0f172a', 'mainBkg': '#1e293b', 'nodeBorder': '#475569', 'clusterBkg': '#1e293b', 'clusterBorder': '#475569', 'titleColor': '#f8fafc', 'edgeLabelBackground': '#1e293b'}}}%%
flowchart LR
Global["GLOBAL
(default for entire cluster)"]
Site["SITE
(override for one site)"]
Backend["BACKEND
(override for one Backend)"]
Global -->|overridden by| Site
Site -->|overridden by| Backend
classDef primary fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#f8fafc
classDef secondary fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#f8fafc
classDef accent fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#f8fafc
class Global primary
class Site secondary
class Backend accent
⚠️ TODO
API endpoint for Backend feature flags configuration doesn't exist yet
Example: You enable load balancing globally, but disable it for a specific Backend that you're troubleshooting. The rest of the cluster continues to balance automatically while that one Backend is excluded.
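The precedence rule above (lower levels override higher ones, but a Disabled setting can never be re-enabled below) can be sketched as a small resolution function. This is a minimal sketch of the described semantics, assuming each level stores either a state string or `None` for "inherit"; it is not the Controller's actual code.

```python
# Resolve the effective feature-flag state across Global -> Site -> Backend.
# Assumption: None means "inherit from the level above"; "Disabled" at a
# higher level is final and cannot be overridden lower down.
def effective_state(global_state, site_state=None, backend_state=None):
    state = global_state
    for override in (site_state, backend_state):
        if state == "Disabled":
            break                 # disabled at a higher level: stop here
        if override is not None:
            state = override      # lower level overrides the inherited value
    return state
```

For the example above: load balancing enabled globally but disabled for one Backend resolves to `effective_state("Enabled", None, "Disabled") == "Disabled"` for that Backend, while every other Backend stays `"Enabled"`.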
| Feature | What It Controls | When to Disable |
|---|---|---|
| MetricsExport | Whether the Controller collects and exports metrics. Required for all other features. | Rarely - disabling this blinds the Controller |
| BackendHealthAutoHandling | Automatic detection of failing Backends and user evacuation | During planned maintenance when you expect temporary failures |
| BackendLoadBalancing | Automatic movement of users to balance load across Backends | When you want manual control over user placement |
| State | What Happens |
|---|---|
| Enabled | Full automation - the Controller acts on its decisions |
| Disabled | No automation - the feature doesn't run at all (default) |
| DryRun | The Controller makes decisions and logs them, but doesn't execute. Perfect for testing. |
Recommended Workflow
Feature flags can be managed via the Controller API or the Controller Web UI under "Site Features".
Load balancing ensures users are distributed evenly across your Backends. Without it, some Backends might become overloaded while others sit idle, leading to poor performance for some users and wasted capacity. When the BackendLoadBalancing feature is disabled at the Backend level, that Backend is excluded from load balancing.
Every 60 seconds, the Controller:
flowchart TD
Start([Every 60 seconds])
GetScores["Calculate load score<br/>for each Backend"]
FindPair["Find most loaded and<br/>least loaded Backends"]
CheckDelta{"Is the difference<br/>significant enough?"}
MoveUsers["Move some users from<br/>overloaded → underloaded"]
Done([Wait for next cycle])
Skip([No action needed])
Start --> GetScores
GetScores --> FindPair
FindPair --> CheckDelta
CheckDelta -->|"Yes (delta > threshold)"| MoveUsers
CheckDelta -->|"No"| Skip
MoveUsers --> Done
Skip --> Done
The Cluster Controller uses a statistical measure called Z-score to compare Backends fairly. The Z-score tells you how far a Backend is from the average.
The formula combines multiple metrics (memory, CPU, metacache) into a single score, so a Backend with high memory but low CPU is compared fairly against one with low memory but high CPU.
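As a hedged sketch of that idea: for each metric, a Backend's value is expressed in standard deviations from the mean across all Backends, and the per-metric Z-scores are combined into one number. The metric names and the plain sum below are assumptions for illustration, not the Controller's exact formula.

```python
# Combine per-metric Z-scores into a single load score per Backend.
# Assumption: metrics_by_backend maps metric name -> {backend: value},
# and scores are simply summed (the real weighting may differ).
from statistics import mean, stdev

def z_score(value, values):
    spread = stdev(values)
    if spread == 0:
        return 0.0            # all Backends identical: no deviation
    return (value - mean(values)) / spread

def load_score(backend, metrics_by_backend):
    return sum(
        z_score(per_backend[backend], list(per_backend.values()))
        for per_backend in metrics_by_backend.values()
    )
```

This is what makes the comparison fair: a Backend one standard deviation above average on memory but one below on CPU nets out to roughly zero, the same as a Backend with the opposite profile.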
A Backend is excluded from load balancing if:
- its state is not active (e.g., offline or standby)
- users were moved from it too recently (HOST_LOAD_BALANCE_MIN_COOL_TIME_SECS)
- too few metric samples have been collected for it (HOST_LOAD_BALANCE_MIN_SAMPLES)
- its failure rate is too high (HOST_FAILURE_RATIO)
| Setting | What It Controls |
|---|---|
| HOST_LOAD_BALANCE_SCORE_DELTA_THRESHOLD_RATIO | How different two Backends must be before moving users. Higher = less sensitive. |
| HOST_LOAD_BALANCE_MIN_COOL_TIME_SECS | How long to wait before moving the same user again. Prevents thrashing. |
| HOST_LOAD_BALANCE_MIN_SAMPLES | Minimum data points needed before trusting a Backend's score. New Backends need time to gather data. |
Health checking automatically detects and responds to failing Backends. When a Backend starts failing (e.g., rejecting logins, failing to deliver mail), the Controller moves users away before they're significantly impacted.
Every 60 seconds, the Controller:
flowchart TD
Start([Every 60 seconds])
FetchMetrics["Get login and delivery<br/>statistics from Prometheus"]
ForEach["Check each Backend"]
CalcRate["Calculate failure rate:<br/>failures ÷ total attempts"]
CheckSevere{"Failure rate<br/>> 90%?"}
CheckHigh{"Failure rate<br/>> 10%?"}
Evacuate["CRITICAL: Evacuate all users;<br/>when none remain, set Backend OFFLINE"]
MovePartial["WARNING: Move some users<br/>to healthy Backends"]
Healthy["Backend is healthy<br/>No action needed"]
Start --> FetchMetrics
FetchMetrics --> ForEach
ForEach --> CalcRate
CalcRate --> CheckSevere
CheckSevere -->|Yes| Evacuate
CheckSevere -->|No| CheckHigh
CheckHigh -->|Yes| MovePartial
CheckHigh -->|No| Healthy
| Condition | What Happens |
|---|---|
| > 90% failures | Backend is critically failing. All users are evacuated. When no users are left, status is set to OFFLINE. |
| > 10% failures (configurable) | Backend is degraded. Some users are moved away to reduce load and see if it recovers. |
| < 10% failures (configurable) | Backend is healthy. No action taken. |
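The decision table above can be sketched as a small classification function. This is an illustrative sketch, not the Controller's code: the 90% critical threshold is taken from the table, the 10% threshold stands in for the configurable HOST_FAILURE_RATIO, and the minimum-sample guard mirrors HOST_FAILURE_MIN_LOGINS (the value 100 here is an assumption).

```python
# Classify a Backend's health from login/delivery failure statistics.
SEVERE_FAILURE_RATIO = 0.90     # critical threshold from the table above
HOST_FAILURE_RATIO = 0.10       # configurable degraded threshold
HOST_FAILURE_MIN_LOGINS = 100   # assumed minimum sample size

def classify(failures, attempts):
    if attempts < HOST_FAILURE_MIN_LOGINS:
        return "healthy"        # too few samples to make a decision
    ratio = failures / attempts
    if ratio > SEVERE_FAILURE_RATIO:
        return "critical"       # evacuate all users, then set OFFLINE
    if ratio > HOST_FAILURE_RATIO:
        return "degraded"       # move some users away, watch for recovery
    return "healthy"
```

The minimum-sample guard is the reason a single failed login on a quiet Backend does not trigger an evacuation.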
WARNING
In the critical failure case, the Cluster Controller tries to move 20% of the original number of users per cycle, rather than 20% of the current amount, so that evacuating the Backend does not take too long.
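The arithmetic behind that warning can be illustrated with a short sketch (an illustration of the batch-sizing trade-off, not the actual implementation): a batch sized from the original count drains the Backend in a fixed number of cycles, while a batch sized from the shrinking current count leaves a long tail.

```python
# Compare evacuation batch strategies: 20% of the original user count
# (fixed batch) vs. 20% of the remaining count (shrinking batch).
def cycles_to_evacuate(total_users, batch_of_original=True):
    remaining, cycles = total_users, 0
    while remaining > 0:
        base = total_users if batch_of_original else remaining
        batch = max(1, int(0.2 * base))   # never stall with a zero batch
        remaining -= min(batch, remaining)
        cycles += 1
    return cycles
```

With 1000 users, the fixed batch finishes in 5 cycles; the shrinking batch removes 200, then 160, then 128 users per cycle and takes far longer to empty the Backend.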
While a Backend is offline, proxies periodically check whether it has come back online. When the health check succeeds, the Controller automatically brings the Backend back online. Users can then be moved back by normal load balancing.
What if many Backends fail at once (e.g., a network issue affecting half your data center)? Moving all users to the remaining Backends could overload them, making the situation worse.
The Controller has built-in protection: when the share of failing Backends exceeds HOST_FAILURE_BACKEND_NUM_THRESHOLD, automatic evacuation is suspended rather than overloading the remaining Backends.
When Catastrophe Protection Triggers
Check the Controller logs immediately. This usually indicates a serious infrastructure problem (network outage, storage failure, etc.) rather than individual Backend issues.
| Setting | What It Controls |
|---|---|
| HOST_FAILURE_RATIO | Failure rate threshold for moving users. Lower = more sensitive. |
| HOST_FAILURE_MIN_LOGINS | Minimum attempts before making decisions. Prevents overreacting to small sample sizes. |
| HOST_FAILURE_COOL_TIME_SECS | Minimum time in seconds between moving users from a host with failing logins. |
| HOST_FAILURE_BACKEND_NUM_THRESHOLD | Maximum percentage of Backends that can fail before catastrophe protection kicks in. |
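A minimal sketch of the catastrophe-protection guard described above, assuming the threshold is a ratio and that exceeding it blocks automatic evacuation (the function name and the 0.50 default are illustrative assumptions, not Controller internals):

```python
# Guard automatic evacuation: if too many Backends are failing at once,
# moving their users would overload the survivors, so automation stops.
HOST_FAILURE_BACKEND_NUM_THRESHOLD = 0.50  # assumed: at most 50% may fail

def may_auto_evacuate(failing_backends, total_backends):
    if total_backends == 0:
        return False
    ratio = failing_backends / total_backends
    return ratio <= HOST_FAILURE_BACKEND_NUM_THRESHOLD
```

One failing Backend out of ten is handled automatically; six out of ten trips the guard, leaving the response to an administrator.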
While the Controller automates most operations, sometimes you need manual control - for planned migrations, emergency evacuations, or testing.
Gradually migrates a percentage of users from one site to other sites. The Cluster Controller sends a request to the Backend to move users (the doveadm cluster user batch move backend command). If HOST_FAILURE_COOL_TIME_SECS is set, the min-last-moved parameter is added to the doveadm command. This avoids moving the same users too often.
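As a rough sketch of how such a request could translate into the doveadm invocation mentioned above (the command words and the min-last-moved parameter come from the text; the argument order and exact flag spelling are assumptions, so consult the doveadm reference for the real syntax):

```python
# Assemble an illustrative doveadm argv for a batch move. Everything past
# the command words is a hypothetical layout, not the documented syntax.
def build_batch_move_argv(backend, percentage, min_last_moved_secs=None):
    argv = ["doveadm", "cluster", "user", "batch", "move", "backend",
            str(backend), str(percentage)]
    if min_last_moved_secs is not None:
        # Added when HOST_FAILURE_COOL_TIME_SECS is set, so recently moved
        # users are skipped and not moved again too soon.
        argv += ["--min-last-moved", str(min_last_moved_secs)]
    return argv
```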
⚠️ TODO
Add reference to the OpenAPI batch move users post
Use cases:
Immediately moves a percentage of users in a single operation. Faster than batch move but more disruptive.
⚠️ TODO
Add reference to the OpenAPI force move users post
Use cases:
Triggers (via REST API request) an immediate load rebalancing cycle without waiting for the 60-second schedule.
REST API
See Palomar REST APIs for the complete API reference.
Use cases:
The Controller API provides programmatic access to all Controller functions. You can:
The API also exposes an OpenMetrics endpoint for monitoring systems to scrape Controller metrics.
REST API
See Palomar REST APIs for the complete API reference.
The Controller includes a browser-based administration interface. Access it by navigating to your Controller's HTTP endpoint (default port 8080).
The UI provides: