Cluster Management¶
nmg supports active-active clusters with no primary/replica concept. Every node is equal and processes mail independently.
Cluster Topology¶
- No failover, no quorum, no promotion
- Configuration is replicated via app-level fan-out
- rspamd data (Bayes, reputation) is replicated via KeyDB active-active replication
- Mail queue and quarantine are stored locally on each node but are readable cluster-wide
Cluster Status¶
For each node, the cluster overview shows:
| Column | Description |
|---|---|
| Node ID | Unique UUID of the node |
| FQDN | Fully qualified hostname |
| Version | Installed nmg version — differing versions are highlighted in red |
| Config Hash | SHA256 hash of the configuration state; a value that differs from the local node's hash indicates drift |
| Hash Age | Seconds since the last configuration update |
| Status | online / offline / version mismatch |
| Reboot Required | Whether a package update requires a restart |
| Last Seen | Timestamp of last health check signal |
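
How the Status column could be derived from the other columns can be sketched as follows. This is an illustration only, not nmg's implementation: the `NodeInfo` class, the `derive_status` function, the 30-second offline threshold, the version numbers, and the rule that "offline" takes precedence over "version mismatch" are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real health check timeout is not documented here.
OFFLINE_AFTER_SECONDS = 30


@dataclass
class NodeInfo:
    node_id: str
    fqdn: str
    version: str
    config_hash: str
    last_seen: float  # Unix timestamp of the last health check signal


def derive_status(node: NodeInfo, local_version: str, now: float) -> str:
    """Return 'offline', 'version mismatch', or 'online' for one node."""
    # Assumption: a node that has not been seen recently is reported as
    # offline regardless of its version.
    if now - node.last_seen > OFFLINE_AFTER_SECONDS:
        return "offline"
    if node.version != local_version:
        return "version mismatch"
    return "online"


# Made-up example values.
peer = NodeInfo("n2", "mx2.example.org", "1.4.0", "ab12", last_seen=1000.0)
print(derive_status(peer, local_version="1.4.1", now=1010.0))  # version mismatch
```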
Detecting Configuration Drift¶
When a node's config hash differs from the local node's hash, drift has occurred: the nodes are no longer in sync. Possible causes:
- Node was offline during a configuration change
- Network error interrupted replication
- Manual configuration change directly on a node
Drift is marked with a red Drift tag in the table.
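
The hash comparison behind drift detection can be sketched as follows. How nmg serialises its configuration before hashing is not documented here, so the canonical JSON encoding, the `config_hash` and `find_drift` helpers, and the sample configuration are assumptions for illustration.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(local_hash: str, peer_hashes: dict) -> list:
    """Return the peers whose hash differs from the local node's hash."""
    return sorted(peer for peer, h in peer_hashes.items() if h != local_hash)


# Made-up configuration states.
local = {"domains": ["example.org"], "rbl": {"enabled": True}}
stale = {"domains": [], "rbl": {"enabled": True}}

peers = {
    "mx2.example.org": config_hash(local),  # in sync
    "mx3.example.org": config_hash(stale),  # e.g. offline during a change
}
print(find_drift(config_hash(local), peers))  # ['mx3.example.org']
```

Sorting the keys before hashing matters in this sketch: two nodes holding the same settings in a different key order would otherwise produce different hashes and be flagged as drifted.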
Repairing Drift¶
Via the actions in the node row:
- Repair (Push): the local configuration is pushed to the target node (overwrites the target's state)
- Pull from Peer: the target node's configuration is transferred to the local node (overwrites the local state)
Push vs. Pull
Push overwrites the remote node. Pull overwrites the local node. Before a pull action, verify that the remote node has the correct state.
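
The direction of each action is easy to mix up, so here is a minimal sketch that models both nodes as plain dictionaries. The `push` and `pull` functions are hypothetical stand-ins for the UI actions, not nmg APIs.

```python
import copy


def push(local: dict, remote: dict) -> None:
    """Repair (Push): the remote node's state is replaced by the local state."""
    remote.clear()
    remote.update(copy.deepcopy(local))


def pull(local: dict, remote: dict) -> None:
    """Pull from Peer: the local node's state is replaced by the remote state."""
    local.clear()
    local.update(copy.deepcopy(remote))


# Made-up states: the remote node missed the second domain.
local_cfg = {"domains": ["example.org", "example.net"]}
remote_cfg = {"domains": ["example.org"]}

push(local_cfg, remote_cfg)
print(remote_cfg == local_cfg)  # True
```

Had `pull` been called instead, `local_cfg` would have lost `example.net`, which is why the remote state should be checked before pulling.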
Cluster CA (Certificate Authority)¶
nmg operates its own internal CA for mTLS between cluster nodes. CA information is shown in the cluster status:
| Field | Description |
|---|---|
| CA Path | Path to the CA certificate on this node |
| CA Fingerprint | SHA256 fingerprint of the CA certificate (should be identical on all nodes) |
| CA Expiry | Expiry date of the CA certificate |
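
Checking that every node reports the same CA fingerprint can be sketched as follows. The helper names are hypothetical, and placeholder bytes stand in for real DER-encoded certificates; the colon-separated output format is also an assumption about how the fingerprint is displayed.

```python
import hashlib


def sha256_fingerprint(cert_der: bytes) -> str:
    """Colon-separated SHA-256 fingerprint of DER-encoded certificate bytes."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def nodes_with_foreign_ca(local_fp: str, reported: dict) -> list:
    """Return the nodes whose reported fingerprint differs from the local one."""
    return sorted(node for node, fp in reported.items() if fp != local_fp)


# Placeholder bytes, not real certificates.
ca_der = b"placeholder-ca-certificate"
other_der = b"some-other-certificate"

local_fp = sha256_fingerprint(ca_der)
reported = {
    "mx2.example.org": sha256_fingerprint(ca_der),
    "mx3.example.org": sha256_fingerprint(other_der),
}
print(nodes_with_foreign_ca(local_fp, reported))  # ['mx3.example.org']
```

For a PEM file on disk, Python's standard `ssl.PEM_cert_to_DER_cert` can convert the text to DER bytes before hashing.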
Adding a Node¶
On the New Node¶
curl -s https://get.netcell-mailguard.de | sudo bash
# Setup wizard: step "Cluster" → "This node joins an existing cluster"
# Cluster IP: IP of the first node
# Join token: one-time token from the management UI
In the Management UI¶
- Cluster → Add Node
- Generate Join Token — copy and paste into the setup wizard on the new node
- After a successful join, the new node appears in the cluster overview
What happens on join:
- mTLS certificates are automatically issued by the cluster CA
- Configuration (domains, mail config, filters) is replicated
- KeyDB replication link is established
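
The one-time nature of the join token can be illustrated with a small sketch. How nmg actually generates, stores, and expires tokens is not described here; the `JoinTokens` class below is a hypothetical in-memory model without expiry handling.

```python
import hmac
import secrets


class JoinTokens:
    """In-memory store of unused join tokens (hypothetical, no expiry)."""

    def __init__(self) -> None:
        self._unused = set()

    def generate(self) -> str:
        """Create a random URL-safe token and remember it as unused."""
        token = secrets.token_urlsafe(32)
        self._unused.add(token)
        return token

    def redeem(self, presented: str) -> bool:
        """Accept a token once; a second attempt with the same token fails."""
        candidate = presented.encode("utf-8")
        # compare_digest performs a constant-time comparison of the bytes.
        match = next(
            (t for t in self._unused
             if hmac.compare_digest(t.encode("utf-8"), candidate)),
            None,
        )
        if match is None:
            return False
        self._unused.remove(match)
        return True


store = JoinTokens()
token = store.generate()
print(store.redeem(token))  # True
print(store.redeem(token))  # False
```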
Removing a Node¶
- Cluster → select node → Remove
- Enter confirmation
- nmg revokes the node's mTLS certificate
- Node is removed from the peer list of all other nodes
Ghost node detection
If a node with the same machine ID is reinstalled (e.g. after an OS rebuild), it is detected as a "ghost node" and must be explicitly removed from the cluster.
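
One way such a check could work is sketched below. It assumes that a reinstalled node comes back with a new node ID while keeping its machine ID; that premise, the `find_ghost` function, and the sample IDs are illustrative assumptions rather than documented nmg behaviour.

```python
from typing import Optional


def find_ghost(known_nodes: dict, machine_id: str, node_id: str) -> Optional[str]:
    """known_nodes maps node ID to machine ID.

    Return the ID of an existing entry that has the same machine ID but a
    different node ID, or None if there is no such entry.
    """
    for known_id, known_machine in known_nodes.items():
        if known_machine == machine_id and known_id != node_id:
            return known_id
    return None


# Made-up IDs: "uuid-old" was registered before the OS rebuild.
known = {"uuid-old": "machine-a", "uuid-b": "machine-b"}
print(find_ghost(known, machine_id="machine-a", node_id="uuid-new"))  # uuid-old
```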
Configuration Replication¶
All configuration changes are immediately replicated to all reachable peers. Offline nodes are synchronised at the next opportunity.
Replicated objects: Domains, mail config, sender filters, DKIM keys, RBL settings, composites, phishing feeds/keywords/TLDs/URL shorteners, firewall rules, users, API keys
Not replicated (local per node): License keys, mail queue, quarantine contents, mail logs
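
The fan-out behaviour described in this section (immediate delivery to reachable peers, catch-up for offline ones) can be modelled roughly as follows. The per-peer pending queue is an assumption: the catch-up mechanism is not specified here and could equally be a full state transfer. The `FanOut` class and its method names are hypothetical.

```python
from collections import defaultdict, deque


class FanOut:
    """Toy model of app-level fan-out with catch-up for offline peers."""

    def __init__(self, peers):
        self.online = {peer: True for peer in peers}
        self.delivered = defaultdict(list)  # peer -> changes it has applied
        self.pending = defaultdict(deque)   # peer -> changes waiting for it

    def replicate(self, change: str) -> None:
        """Deliver a change to reachable peers and queue it for the others."""
        for peer, is_online in self.online.items():
            if is_online:
                self.delivered[peer].append(change)
            else:
                self.pending[peer].append(change)

    def mark_online(self, peer: str) -> None:
        """Flush queued changes, in order, once a peer is reachable again."""
        self.online[peer] = True
        while self.pending[peer]:
            self.delivered[peer].append(self.pending[peer].popleft())


fan = FanOut(["mx2", "mx3"])
fan.online["mx3"] = False
fan.replicate("add domain example.org")
print(fan.delivered["mx3"])  # []
fan.mark_online("mx3")
print(fan.delivered["mx3"])  # ['add domain example.org']
```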