Bayes Training¶
Bayes training improves rspamd's detection rate by learning from known messages. The training page shows all archived and quarantined mails in a single combined list — training actions are available inline.
Statistics¶
The top of the page shows current corpus numbers:
| Metric | Description |
|---|---|
| Bayes Ham | Number of ham-trained messages in the Bayes corpus |
| Bayes Spam | Number of spam-trained messages in the Bayes corpus |
| Neural Spam Samples | Training data for the neural network (spam) |
| Neural Ham Samples | Training data for the neural network (ham) |
| Scanned | Total number of all processed messages |
| Learned | Sum of all Bayes training actions |
The neural network only starts training at 1,000 samples per class (rspamd default behaviour).
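The threshold above can be expressed as a simple readiness check. This is a minimal sketch: the function and constant names are hypothetical and not part of nmg or rspamd; only the figure of 1,000 samples per class comes from the text.

```python
# Minimum number of samples per class before neural training starts,
# as stated in the text. The name is illustrative only.
NEURAL_MIN_SAMPLES = 1_000


def neural_ready(spam_samples: int, ham_samples: int,
                 minimum: int = NEURAL_MIN_SAMPLES) -> bool:
    """Return True once BOTH classes have reached the minimum sample count."""
    return spam_samples >= minimum and ham_samples >= minimum
```

With 1,500 spam samples but only 999 ham samples, the check is still false, because the minimum applies to each class separately.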
Bayes Classes¶
nmg supports 6 Bayes classes (not just spam/ham):
| Class | Description | Use Case |
|---|---|---|
| spam | Unwanted advertising mail | Standard spam training |
| ham | Legitimate mail | Standard ham training |
| phishing | Phishing attempt | Train specific phishing patterns |
| bec | Business Email Compromise | CEO fraud, targeted impersonation |
| newsletter | Mass mail / newsletter | Correctly classify legitimate bulk mail |
| transactional | Transactional mails | Order confirmations, system notifications |
For spam and ham, quick buttons are available directly in each row. The other 4 classes are accessed via the dropdown menu (⋮) in the action column.
Search¶
The search field above the table enables full-text search across the mail corpus by sender, recipient, or subject. The search filters all table entries and updates the view immediately. Regular expressions are not supported — the search term is matched as a substring.
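The substring matching described above can be illustrated with a short sketch. The `MailRow` type and both function names are hypothetical, and case-insensitive matching is an assumption; the text only states that the term is matched as a substring and that regular expressions are not interpreted.

```python
from dataclasses import dataclass


@dataclass
class MailRow:
    """Hypothetical row of the training table (searchable fields only)."""
    sender: str
    recipient: str
    subject: str


def matches(row: MailRow, term: str) -> bool:
    """Plain substring match on sender, recipient, or subject.

    Case-insensitivity is an assumption of this sketch.
    """
    needle = term.casefold()
    fields = (row.sender, row.recipient, row.subject)
    return any(needle in field.casefold() for field in fields)


def search(rows: list[MailRow], term: str) -> list[MailRow]:
    """Return the rows that match; an empty term keeps every row."""
    if not term:
        return list(rows)
    return [row for row in rows if matches(row, term)]
```

Because the term is never compiled as a pattern, a search for `a.b` matches only the literal characters `a.b`, not `axb`.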
Mail Corpus¶
The table shows all mails available for training — in a single combined list:
- Archived mails (source: delivered): mails from the BCC archive
- Quarantine mails (source: hold): mails in the Postfix hold queue
| Column | Description |
|---|---|
| Time | Receipt timestamp |
| From | Sender (masked depending on role) |
| To | Recipient (masked depending on role) |
| Subject | Subject (masked depending on role) |
| Score | rspamd score at delivery |
| Bayes Status | manualSpam / manualHam / autoSpam / autoHam / notLearned |
| Trained By | Admin account that triggered the training |
| Node | Cluster node where the mail resides |
Per-Mail Actions¶
- Ham / Spam (quick buttons) — train directly as ham or spam
- Other classes (dropdown) — phishing, bec, newsletter, transactional
- Unlearn — undo the training for this mail
- Preview — display mail body and headers
- Download EML — download the raw file
Bulk Training¶
Select multiple rows and train in one step via Train as Spam or Train as Ham. Errors in individual rows do not interrupt bulk training — they are reported separately.
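The error handling described above (a failing row does not stop the batch) might look roughly like the following sketch. `bulk_train` and the `train_one` callback are hypothetical names, not nmg's actual API.

```python
from typing import Callable, Iterable


def bulk_train(mail_ids: Iterable[str], bayes_class: str,
               train_one: Callable[[str, str], None]
               ) -> tuple[list[str], dict[str, str]]:
    """Train every selected mail and collect failures instead of aborting.

    `train_one(mail_id, bayes_class)` is a hypothetical callback that
    trains a single mail and raises an exception on failure.
    """
    trained: list[str] = []
    failed: dict[str, str] = {}
    for mail_id in mail_ids:
        try:
            train_one(mail_id, bayes_class)
        except Exception as exc:  # report the error and keep going
            failed[mail_id] = str(exc)
        else:
            trained.append(mail_id)
    return trained, failed
```

The caller gets both lists back, so the successfully trained rows and the per-row error messages can be shown separately.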
Privacy (GDPR)¶
Sender, recipient, and subject are masked depending on the user role:
| Role | Display |
|---|---|
| admin_full / admin | Always shown in plain text |
| training_operator | Masked; unmasking possible via Reveal button (creates audit entry) |
| other | Always masked, no unmasking |
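The role rules in the table can be sketched as a single display function. Everything besides the role names is an assumption for illustration: the function name, the `***` placeholder, and the shape of the audit entry are hypothetical.

```python
PLAIN_ROLES = {"admin_full", "admin"}   # always see plain text
REVEAL_ROLES = {"training_operator"}    # masked by default, may unmask
MASK = "***"                            # hypothetical placeholder text


def display_field(value: str, role: str, user: str,
                  audit_log: list, reveal: bool = False) -> str:
    """Return the text to display for a sender, recipient, or subject."""
    if role in PLAIN_ROLES:
        return value
    if role in REVEAL_ROLES and reveal:
        # Unmasking leaves a trace; the entry format is an assumption.
        audit_log.append({"user": user, "action": "reveal"})
        return value
    # Any other role, or no reveal requested: stay masked.
    return MASK
```

A reveal request from a role outside `REVEAL_ROLES` is ignored and writes no audit entry, which mirrors the "no unmasking" row of the table.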
Autolearn¶
When configured in Mail Configuration → Autolearn, nmg automatically trains:
- High-score mails as spam
- Low-score mails as ham
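The autolearn decision can be pictured as a pair of score thresholds. This is only a sketch: the threshold values below are placeholders, not nmg or rspamd defaults, and the function name is hypothetical. The real limits are whatever is set under Mail Configuration → Autolearn.

```python
from typing import Optional


def autolearn_class(score: float,
                    spam_threshold: float = 12.0,   # placeholder value
                    ham_threshold: float = -2.0     # placeholder value
                    ) -> Optional[str]:
    """Decide whether a mail is trained automatically, and as what.

    Returns "spam" for high scores, "ham" for low scores, and None for
    everything in between (no automatic training).
    """
    if score >= spam_threshold:
        return "spam"
    if score <= ham_threshold:
        return "ham"
    return None
```

Mails that fall between the two thresholds are left alone and remain available for manual training.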
Spam Bursts¶
The Spam Bursts view lists detected clusters of similar spam mails that arrived within a short time window.
Burst Table¶
| Column | Description |
|---|---|
| Time Window | Start and end of the burst window |
| Count | Number of similar mails |
| Distinct Senders | Number of different sender addresses |
| Sender Domain | Most frequent sender domain |
| Sample Subject | Typical subject line of the burst |
| Sample Recipients | First affected recipients |
| Avg Score | Average rspamd score |
| Active | Whether the burst is still actively blocked |
| Expires | Automatic expiry date of the block |
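To make the table columns concrete, the following sketch derives several of them from the mails of one burst. It does not show how nmg detects bursts; the input format (dicts with `time`, `sender`, `subject`, `score`) and the function name are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime


def summarize_burst(mails: list[dict]) -> dict:
    """Compute burst table columns from the mails of one burst.

    Each mail is a hypothetical dict with the keys "time" (datetime),
    "sender" (address), "subject", and "score" (rspamd score).
    """
    times = [m["time"] for m in mails]
    senders = [m["sender"].lower() for m in mails]
    # Domain part after the last "@" of each sender address.
    domains = Counter(s.rsplit("@", 1)[-1] for s in senders)
    return {
        "time_window": (min(times), max(times)),
        "count": len(mails),
        "distinct_senders": len(set(senders)),
        "sender_domain": domains.most_common(1)[0][0],
        "sample_subject": mails[0]["subject"],
        "avg_score": sum(m["score"] for m in mails) / len(mails),
    }
```

Using the first mail's subject as the sample is a simplification; a real implementation may pick the most common subject instead.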
Actions¶
- Train as Spam — Add all burst mails to the Bayes corpus
- Unblock — Mark burst as handled (without training)
- Delete — Remove the burst entry
Enable Show Expired to see burst blocks that have already expired.
Spam Analytics¶
Spam Analytics shows which rspamd symbols fire most frequently.
Symbol Table¶
| Column | Description |
|---|---|
| Symbol | rspamd symbol name (e.g. RCVD_IN_SPAMHAUS_SBL) |
| Hits | Total hits in the selected time range |
| Avg Score | Average score contribution |
| % of Spam | Share of spam detection traffic |
| % of Ham | Share of ham traffic (false positive indicator) |
Symbols with a high ham percentage are potential sources of false positives; consider reducing their score in Score Tuning.
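Picking out such symbols from the table amounts to a filter on the "% of Ham" column. In this sketch, the 20 % cut-off, the dict keys, and the function name are assumptions; the text does not say at what point a ham percentage counts as high.

```python
def false_positive_candidates(symbols: list[dict],
                              max_ham_share: float = 0.20) -> list[str]:
    """Return symbol names whose ham share exceeds the cut-off.

    Each entry is a hypothetical dict with "symbol" (name) and
    "ham_share" (the "% of Ham" column as a fraction between 0 and 1).
    The result is ordered from highest to lowest ham share.
    """
    flagged = [s for s in symbols if s["ham_share"] > max_ham_share]
    flagged.sort(key=lambda s: s["ham_share"], reverse=True)
    return [s["symbol"] for s in flagged]
```

The returned list is a starting point for review, not an automatic verdict: a symbol can fire on a lot of ham and still be useful if its score contribution is small.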
Score Distribution¶
The bar chart shows which score ranges the processed mails fall into:
| Bucket | Meaning |
|---|---|
| < 0 (Ham) | Clearly legitimate mail |
| 0 – 2, 2 – 4, 4 – 6 | Grey zones |
| 6 – 8, 8 – 10, 10 – 14 | Probable spam |
| ≥ 14 (Reject) | Immediately rejected mail |
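The bucket boundaries from the table can be applied with a binary search over the edges. This is a sketch with hypothetical names; treating each bucket as including its lower bound and excluding its upper bound is an assumption, since the table does not say where a score of exactly 2 or 4 lands.

```python
from bisect import bisect_right
from collections import Counter

# Bucket edges and labels taken from the table above.
EDGES = [0, 2, 4, 6, 8, 10, 14]
LABELS = ["< 0 (Ham)", "0 – 2", "2 – 4", "4 – 6",
          "6 – 8", "8 – 10", "10 – 14", "≥ 14 (Reject)"]


def bucket(score: float) -> str:
    """Map a score to its bucket label (lower bound inclusive)."""
    return LABELS[bisect_right(EDGES, score)]


def histogram(scores: list[float]) -> dict[str, int]:
    """Count mails per bucket, keeping the table's bucket order."""
    counts = Counter(bucket(s) for s in scores)
    return {label: counts.get(label, 0) for label in LABELS}
```

`bisect_right` returns the number of edges that are less than or equal to the score, which is exactly the index of the matching label.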
Near-Threshold Senders (Top 50)¶
Sender domains whose mails average close to the quarantine threshold — early warning for gradually worsening spam sources:
| Column | Description |
|---|---|
| Domain | Sender domain |
| Count | Mails in this time range |
| Avg Score | Average rspamd score |
| Max Score | Highest observed score |
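The columns above can be derived by grouping scores per sender domain. The sketch below is illustrative only: the threshold of 6.0, the margin of 1.0, the ordering by average score, and the function name are all assumptions, since the text does not define how "close to the quarantine threshold" is measured.

```python
from collections import defaultdict


def near_threshold_senders(mails: list[tuple[str, float]],
                           threshold: float = 6.0,   # placeholder value
                           margin: float = 1.0,      # placeholder value
                           limit: int = 50) -> list[dict]:
    """Return domains whose average score lies within `margin` of `threshold`.

    `mails` is a list of (sender_domain, score) pairs. Rows are sorted
    by average score, highest first, and cut off at `limit` entries.
    """
    scores = defaultdict(list)
    for domain, score in mails:
        scores[domain].append(score)

    rows = []
    for domain, values in scores.items():
        avg = sum(values) / len(values)
        if abs(avg - threshold) <= margin:
            rows.append({"domain": domain, "count": len(values),
                         "avg_score": avg, "max_score": max(values)})
    rows.sort(key=lambda row: row["avg_score"], reverse=True)
    return rows[:limit]
```

The maximum score is kept alongside the average because a domain with a moderate average may already have sent individual mails above the threshold.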
False Negatives¶
Mails that passed through the filter but were later reported as spam by users:
| Column | Description |
|---|---|
| Time | Report time |
| Source | delivered (archived), hold (quarantine), other |
| Subject | Subject of the reported mail |
| Sender | Sender address |
| Actor | Who reported the mail |
The time range filter (24h / 7d / 30d) applies to all three views.
Neural Network¶
rspamd includes a neural network module (neural) that automatically learns from Bayes training. It only starts training once it has 1,000 spam and 1,000 ham samples. It is configured under Mail Configuration → Neural Network.