Cluster endpoints

REST API for cluster status. It is used by external monitoring tools such as Prometheus and Zabbix.

`GET /cluster/status`

Returns a health snapshot of all cluster nodes. As the example shows, the path is relative to the API base path `/api/v1`.

```shell
curl -H 'X-API-Key: nmg_<...>' \
     https://mailguard.example.com:3443/api/v1/cluster/status
```
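Monitoring scripts that do not shell out to curl can build the same request with Python's standard library. The following is a minimal sketch. The function names are hypothetical, and the key stays the placeholder `nmg_<...>` until you substitute your own API key.

```python
import json
import urllib.request

# URL taken from the curl example; adjust host and port for your installation.
STATUS_URL = "https://mailguard.example.com:3443/api/v1/cluster/status"


def build_status_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the API key in the X-API-Key header."""
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


def fetch_status(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body (hypothetical helper)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To query a node, call `fetch_status(build_status_request(STATUS_URL, "nmg_<...>"))` after substituting a real key.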

Response:

```json
{
  "data": {
    "local_node_id": "node-1",
    "local_hostname": "mailguard-1.example.com",
    "peers": [
      {
        "node_id": "node-1",
        "fqdn": "mailguard-1.example.com",
        "api_url": "https://mailguard-1.example.com:3443",
        "last_seen": "2026-05-01T08:42:13Z",
        "healthy": "online",
        "config_hash": "a7f2c1b8...",
        "config_hash_age_s": 3245,
        "outbox_backlog_to_us": 0,
        "license": {
          "valid": true,
          "type": "license",
          "status": "active",
          "expires_at": "2027-04-30T00:00:00Z"
        }
      },
      {
        "node_id": "node-2",
        "fqdn": "mailguard-2.example.com",
        ...
      }
    ],
    "generated_at": "2026-05-01T08:42:15Z"
  },
  "error": null,
  "message": "OK"
}
```
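The payload is wrapped in an envelope with `data`, `error` and `message`. The sketch below unwraps it and indexes the peer list by `node_id`. It assumes that a non-null `error` signals a failed call. That is inferred from the example's `"error": null` rather than documented behaviour, and the helper names are hypothetical.

```python
import json
from typing import Any


def unwrap_status(body: str) -> dict[str, Any]:
    """Return the "data" object; raise if the envelope reports an error (assumed semantics)."""
    envelope = json.loads(body)
    if envelope.get("error") is not None:
        raise RuntimeError(f"cluster/status failed: {envelope['error']}")
    return envelope["data"]


def peers_by_id(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Index the peer list by node_id; the local node appears in this list too."""
    return {peer["node_id"]: peer for peer in data["peers"]}
```

Looking up `peers_by_id(data)[data["local_node_id"]]` returns the local node's own entry.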

Health values

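The `healthy` field of each peer takes one of the following values. The value depends on how long ago the node's last heartbeat arrived.

| Value | Last heartbeat | Meaning |
|---|---|---|
| `online` | < 60 s ago | Node is healthy |
| `stale` | 60 to 300 s ago | Possibly a brief network hiccup |
| `offline` | > 300 s ago | Node is down or partitioned |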
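The server already reports this classification, so clients do not need to compute it. The sketch below only illustrates the thresholds, for example to cross-check `last_seen` in an external tool. The table leaves the exact boundaries open. The sketch assumes that exactly 60 s and exactly 300 s both count as `stale`.

```python
from datetime import datetime, timezone
from typing import Optional


def classify_heartbeat(last_seen: str, now: Optional[datetime] = None) -> str:
    """Map the age of a heartbeat to online/stale/offline using the table's thresholds.

    Assumption: 60 s <= age <= 300 s is "stale"; the boundaries are not specified.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # last_seen uses the ISO 8601 "Z" suffix seen in the response example.
    seen = datetime.fromisoformat(last_seen.replace("Z", "+00:00"))
    age_s = (now - seen).total_seconds()
    if age_s < 60:
        return "online"
    if age_s <= 300:
        return "stale"
    return "offline"
```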

Drift detection

If `config_hash` differs between nodes, the configuration has drifted. Recommended monitoring rule: raise an alert when `config_hash` is not identical on all nodes AND `config_hash_age_s` > 600 (10 minutes).
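The rule could be expressed as follows. The text does not say which node's `config_hash_age_s` to compare against 600. This sketch assumes the smallest age, meaning the most recently changed hash. Under that reading, a change that was just pushed and is still replicating does not fire the alert, but drift that persists for more than 10 minutes does. The function name is hypothetical.

```python
from typing import Any

DRIFT_AGE_THRESHOLD_S = 600  # threshold recommended in the text


def drift_alert(peers: list[dict[str, Any]], threshold_s: int = DRIFT_AGE_THRESHOLD_S) -> bool:
    """Return True if config hashes differ AND the drift is older than threshold_s.

    Assumption: the age condition is checked against the newest hash (smallest
    config_hash_age_s), so freshly pushed changes get time to replicate.
    """
    hashes = {peer["config_hash"] for peer in peers}
    if len(hashes) <= 1:
        return False  # all nodes report the same hash: no drift
    newest_age = min(peer["config_hash_age_s"] for peer in peers)
    return newest_age > threshold_s
```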

Outbox backlog

`outbox_backlog_to_us` shows how many replication events from the respective node are still pending delivery to the local node. An elevated value means replication is lagging. Normally the value should be 0.
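A monitoring check can list the peers whose backlog is non-zero. The text only says the value should normally be 0. Any tolerance above that (`max_backlog` here) is a local choice, and the function name is hypothetical.

```python
from typing import Any


def lagging_peers(peers: list[dict[str, Any]], max_backlog: int = 0) -> dict[str, int]:
    """Return {node_id: backlog} for peers with more than max_backlog pending events.

    With the default max_backlog=0, any non-zero backlog is reported.
    """
    return {
        peer["node_id"]: peer.get("outbox_backlog_to_us", 0)
        for peer in peers
        if peer.get("outbox_backlog_to_us", 0) > max_backlog
    }
```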

Prometheus metrics

As an alternative to this endpoint, every node exposes `/metrics` (default port 9090). It returns the same data in Prometheus format, including cluster health counters.
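Usually Prometheus scrapes `/metrics` directly. For a quick ad-hoc check, the simple lines of the Prometheus text format can be parsed as shown below. This sketch handles only the basic `name{labels} value` form, without spaces or escaped quotes inside label values. The metric names the node exports are not listed here, so inspect the actual `/metrics` output. The function name is hypothetical.

```python
def parse_exposition(text: str) -> dict[str, float]:
    """Parse simple Prometheus text-format samples into {series: value}.

    Only covers `name{labels} value [timestamp]` lines with no whitespace
    inside label values; comment lines (# HELP, # TYPE) are skipped.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        series, value = line.split()[:2]
        samples[series] = float(value)  # float() also accepts NaN and +Inf
    return samples
```

For full format support, the Python client library provides `prometheus_client.parser.text_string_to_metric_families`.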