Cluster Management

nmg supports active-active clusters with no primary/replica concept. Every node is equal and processes mail independently.

Cluster Topology

Node 1 (nmg-1)  ←──mTLS──→  Node 2 (nmg-2)
     ↑                            ↑
     │         Peer-to-Peer       │
     └────────────────────────────┘
  • No failover, no quorum, no promotion
  • Configuration is replicated via app-level fan-out
  • rspamd data (Bayes, reputation) is replicated via KeyDB active-active replication
  • Mail queue and quarantine: stored locally on each node, readable cluster-wide

Cluster Status

The cluster overview shows for each node:

Column            Description
Node ID           Unique UUID of the node
FQDN              Fully qualified hostname
Version           Installed nmg version; differing versions are highlighted in red
Config Hash       SHA256 of the configuration state; a deviation from the local hash indicates drift
Hash Age          Seconds since the last configuration update
Status            online / offline / version mismatch
Reboot Required   Whether a package update requires a restart
Last Seen         Timestamp of the last health check signal

Detecting Configuration Drift

When a node's config hash differs from the local node's hash, configuration drift has occurred: the two nodes are no longer in sync. Possible causes:

  • Node was offline during a configuration change
  • Network error interrupted replication
  • Manual configuration change directly on a node

Drifts are marked with a red Drift tag in the table.
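
The drift check described above comes down to comparing hashes. The following is a minimal Python sketch, assuming the configuration state can be serialised to canonical JSON before hashing. How nmg actually derives its config hash is not documented here; `config_hash`, `find_drift` and the sample data are hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Return a SHA256 hex digest of a canonical JSON dump of the config."""
    # sort_keys and fixed separators make the serialisation deterministic,
    # so equal configurations always produce the same hash.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(local_hash: str, peer_hashes: dict) -> list:
    """Return the sorted names of peers whose hash differs from the local one."""
    return sorted(name for name, h in peer_hashes.items() if h != local_hash)


# Hypothetical sample data: nmg-3 missed a change to the domain list.
local = {"domains": ["example.org"], "rbl": {"enabled": True}}
stale = {"domains": [], "rbl": {"enabled": True}}

peers = {"nmg-2": config_hash(local), "nmg-3": config_hash(stale)}
drifted = find_drift(config_hash(local), peers)
print(drifted)
```

Because the comparison is always against the local hash, the same cluster can show different drift markers depending on which node's UI is being viewed.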

Repairing Drift

Via the actions in the node row:

  • Repair (Push) — Local configuration is pushed to the target node (overwrites its state)
  • Pull from Peer — The target node's configuration is transferred to the local node

Push vs. Pull

Push overwrites the remote node. Pull overwrites the local node. Before a pull action, verify that the remote node has the correct state.
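
The direction of the two repair actions is easy to mix up, so here is a small Python sketch of their semantics. It is an illustration only, not nmg's implementation: the `Node` class and the `push` and `pull` functions are hypothetical, and a configuration is modelled as a plain dictionary.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    config: dict = field(default_factory=dict)


def push(local: Node, target: Node) -> None:
    """Repair (Push): the target's state is replaced by the local state."""
    target.config = copy.deepcopy(local.config)


def pull(local: Node, target: Node) -> None:
    """Pull from Peer: the local state is replaced by the target's state."""
    local.config = copy.deepcopy(target.config)


local = Node("nmg-1", {"domains": ["example.org"]})
remote = Node("nmg-2", {"domains": ["example.org", "example.net"]})

# After a pull, the local node's previous state is gone.
pull(local, remote)
print(local.config == remote.config)
```

In both cases the overwritten side loses whatever it held before, which is why the state of the source node should be checked first.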

Cluster CA (Certificate Authority)

nmg operates its own internal CA for mTLS between cluster nodes. CA information is shown in the cluster status:

Field            Description
CA Path          Path to the CA certificate on this node
CA Fingerprint   SHA256 fingerprint of the CA certificate (should be identical on all nodes)
CA Expiry        Validity date of the CA certificate
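
To check by hand that the CA fingerprint is identical on all nodes, the SHA256 fingerprint can be computed from the PEM file found at the CA path. The Python sketch below uses only the standard library; a certificate fingerprint is the hash of the DER encoding. The helper names are hypothetical, and the demo feeds in placeholder bytes rather than a real certificate, since the actual CA path is only known from the cluster status.

```python
import base64
import hashlib
import ssl


def cert_fingerprint(pem_text: str) -> str:
    """Return the SHA256 fingerprint of a PEM certificate as colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)  # strips the PEM armour, decodes base64
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def all_match(fingerprints: dict) -> bool:
    """True if every node reports the same fingerprint."""
    return len(set(fingerprints.values())) == 1


def make_pem(der: bytes) -> str:
    """Demo helper: wrap raw bytes in PEM armour (not a valid certificate)."""
    body = base64.encodebytes(der).decode("ascii")
    return "-----BEGIN CERTIFICATE-----\n" + body + "-----END CERTIFICATE-----\n"


# Placeholder bytes for demonstration only; in practice, read the PEM text
# from the file shown as "CA Path" on each node.
placeholder_der = b"placeholder bytes, not a real certificate"
fp = cert_fingerprint(make_pem(placeholder_der))

print(all_match({"nmg-1": fp, "nmg-2": fp}))
```

A mismatch between nodes would mean they do not trust the same CA, so mTLS between them cannot be expected to work.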

Adding a Node

On the New Node

curl -s https://get.netcell-mailguard.de | sudo bash
# Setup wizard: step "Cluster" → "This node joins an existing cluster"
# Cluster IP: IP of the first node
# Join token: one-time token from the management UI

In the Management UI

  1. Cluster → Add Node
  2. Generate Join Token — copy and paste into the setup wizard on the new node
  3. After a successful join, the new node appears in the cluster overview

What happens on join:

  • mTLS certificates are automatically issued by the cluster CA
  • Configuration (domains, mail config, filters) is replicated
  • KeyDB replication link is established
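
The join token is described as one-time, meaning it can be redeemed exactly once. The following Python sketch illustrates that property only. The `JoinTokenStore` class is hypothetical, and nothing here reflects how nmg generates, stores or validates its tokens.

```python
import secrets


class JoinTokenStore:
    """Hypothetical in-memory store for single-use join tokens."""

    def __init__(self) -> None:
        self._valid = set()

    def generate(self) -> str:
        # token_urlsafe returns a random, URL-safe text token.
        token = secrets.token_urlsafe(32)
        self._valid.add(token)
        return token

    def redeem(self, token: str) -> bool:
        """Return True the first time a valid token is presented, else False."""
        if token in self._valid:
            self._valid.discard(token)  # consume it so it cannot be reused
            return True
        return False


store = JoinTokenStore()
token = store.generate()
first = store.redeem(token)   # first use succeeds
second = store.redeem(token)  # reuse is rejected
print(first, second)
```

Under this model, a token that was pasted into one setup wizard cannot be reused for a second node; a new token has to be generated for each join.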

Removing a Node

  1. Cluster → select node → Remove
  2. Enter confirmation
  3. nmg revokes the node's mTLS certificate
  4. Node is removed from the peer list of all other nodes

Ghost node detection

If a node with the same machine ID is reinstalled (e.g. after an OS rebuild), it is detected as a "ghost node" and must be explicitly removed from the cluster.
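
One way to picture ghost node detection is as a search for machine IDs that appear more than once in the node list. The Python sketch below assumes that a reinstalled node registers under a new node UUID while keeping its machine ID; the source does not confirm this. The field names `node_id` and `machine_id` and the sample records are hypothetical.

```python
from collections import defaultdict


def find_ghost_candidates(nodes: list) -> dict:
    """Map each machine ID that occurs more than once to its node IDs."""
    by_machine = defaultdict(list)
    for node in nodes:
        by_machine[node["machine_id"]].append(node["node_id"])
    return {m: ids for m, ids in by_machine.items() if len(ids) > 1}


# Hypothetical records: machine "m-1" was rebuilt and joined again.
nodes = [
    {"node_id": "uuid-a", "machine_id": "m-1"},
    {"node_id": "uuid-b", "machine_id": "m-2"},
    {"node_id": "uuid-c", "machine_id": "m-1"},
]

ghosts = find_ghost_candidates(nodes)
print(ghosts)
```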

Configuration Replication

All configuration changes are immediately replicated to all reachable peers. Offline nodes are synchronised at the next opportunity.

Replicated objects: Domains, mail config, sender filters, DKIM keys, RBL settings, composites, phishing feeds/keywords/TLDs/URL shorteners, firewall rules, users, API keys

Not replicated (local per node): License keys, mail queue, quarantine contents, mail logs
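
The replication rules above can be sketched as a small planning function: local-only objects are never sent, and replicated objects go to reachable peers right away while unreachable peers are noted for later. This is a Python illustration under assumptions, not nmg's code. The object identifiers (shortened from the lists above), the `plan_fan_out` function and the peer map are hypothetical.

```python
# Hypothetical identifiers for the object types listed above.
REPLICATED = {
    "domains", "mail_config", "sender_filters", "dkim_keys", "rbl_settings",
    "composites", "phishing_lists", "firewall_rules", "users", "api_keys",
}
LOCAL_ONLY = {"license_keys", "mail_queue", "quarantine", "mail_logs"}


def plan_fan_out(kind: str, peers: dict) -> tuple:
    """Return (send_now, sync_later) peer lists for a change of the given kind.

    peers maps a peer name to True (reachable) or False (offline).
    """
    if kind in LOCAL_ONLY:
        return [], []  # stays on this node
    if kind not in REPLICATED:
        raise ValueError(f"unknown object kind: {kind}")
    send_now = sorted(p for p, up in peers.items() if up)
    sync_later = sorted(p for p, up in peers.items() if not up)
    return send_now, sync_later


peers = {"nmg-2": True, "nmg-3": False}
print(plan_fan_out("domains", peers))
print(plan_fan_out("mail_queue", peers))
```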

Cluster Diagnostics

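# Test mTLS connection to peer
# Note: -k disables verification of the peer's server certificate, so this
# only checks that the peer accepts the local client certificate
curl -k --cert /etc/nmg/secret/peer.crt --key /etc/nmg/secret/peer.key \
  https://<peer-ip>:8443/api/v1/agent/health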

# Check KeyDB replication
keydb-cli -n 0 info replication

# Cluster status via CLI
nmg-ctl cluster status
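
For scripted checks, the output of the replication info command can be turned into a dictionary. The Python sketch below parses the `key:value` line format that KeyDB inherits from the Redis INFO command. The `parse_info` function is hypothetical, and the sample text is hand-written for illustration, not captured from a real nmg node.

```python
def parse_info(text: str) -> dict:
    """Parse INFO-style output ("key:value" lines) into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Replication"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


# Hand-written sample in INFO format; real output contains more fields.
sample = "# Replication\r\nrole:master\r\nconnected_slaves:1\r\n"

info = parse_info(sample)
print(info["role"], info["connected_slaves"])
```

In practice the text would come from the `keydb-cli` command above, for example by capturing its output with `subprocess.run`.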