Bayes Training

Bayes training improves rspamd's detection rate by learning from messages whose classification is known. The training page shows all archived and quarantined mails in a single combined list, with training actions available inline.

Statistics

The top of the page shows current corpus numbers:

| Metric | Description |
| --- | --- |
| Bayes Ham | Number of ham-trained messages in the Bayes corpus |
| Bayes Spam | Number of spam-trained messages in the Bayes corpus |
| Neural Spam Samples | Training data for the neural network (spam) |
| Neural Ham Samples | Training data for the neural network (ham) |
| Scanned | Total number of all processed messages |
| Learned | Sum of all Bayes training actions |

The neural network starts training only once 1,000 samples per class have been collected (rspamd default behaviour).

Bayes Classes

nmg supports 6 Bayes classes (not just spam/ham):

| Class | Description | Use Case |
| --- | --- | --- |
| spam | Unwanted advertising mail | Standard spam training |
| ham | Legitimate mail | Standard ham training |
| phishing | Phishing attempt | Train specific phishing patterns |
| bec | Business Email Compromise | CEO fraud, targeted impersonation |
| newsletter | Mass mail / newsletter | Correctly classify legitimate bulk mail |
| transactional | Transactional mails | Order confirmations, system notifications |

For spam and ham, quick buttons are available directly in each row. The other four classes are accessed via the dropdown menu in the action column.

The search field above the table enables full-text search across the mail corpus by sender, recipient, or subject. The search filters all table entries and updates the view immediately. Regular expressions are not supported — the search term is matched as a substring.
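
The substring behaviour described above can be sketched as follows. This is a minimal illustration, not nmg's actual implementation: `MailRow`, `matches`, and `filter_rows` are hypothetical names, and case-insensitive matching is an assumption the text does not confirm.

```python
from dataclasses import dataclass


@dataclass
class MailRow:
    # Hypothetical row type holding only the three searchable fields.
    sender: str
    recipient: str
    subject: str


def matches(row: MailRow, term: str) -> bool:
    """Return True if term occurs literally in sender, recipient, or subject."""
    needle = term.lower()  # assumption: the search ignores case
    fields = (row.sender, row.recipient, row.subject)
    # Plain `in` does substring matching; regex metacharacters have no meaning.
    return any(needle in field.lower() for field in fields)


def filter_rows(rows: list[MailRow], term: str) -> list[MailRow]:
    """Keep only rows that match; an empty term shows every row."""
    if not term:
        return list(rows)
    return [row for row in rows if matches(row, term)]
```

Because the term is matched literally, a pattern such as `inv.*` only finds rows that actually contain the characters `inv.*`.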

Mail Corpus

The table shows all mails available for training — in a single combined list:

  • Archived mails (source: delivered): mails from the BCC archive
  • Quarantine mails (source: hold): mails in the Postfix hold queue

| Column | Description |
| --- | --- |
| Time | Receipt timestamp |
| From | Sender (masked depending on role) |
| To | Recipient (masked depending on role) |
| Subject | Subject (masked depending on role) |
| Score | rspamd score at delivery |
| Bayes Status | manualSpam / manualHam / autoSpam / autoHam / notLearned |
| Trained By | Admin account that triggered the training |
| Node | Cluster node where the mail resides |

Per-Mail Actions

  • Ham / Spam (quick buttons) — train directly as ham or spam
  • Other classes (dropdown) — phishing, bec, newsletter, transactional
  • Unlearn — undo the training for this mail
  • Preview — display mail body and headers
  • Download EML — download the raw file

Bulk Training

Select multiple rows and train in one step via Train as Spam or Train as Ham. Errors in individual rows do not interrupt bulk training — they are reported separately.
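
The "errors do not interrupt" behaviour can be pictured with a short sketch. It assumes a hypothetical `train_one(mail_id, label)` callable that raises on failure; nmg's real training call and its error reporting format are not documented here.

```python
from typing import Callable, Iterable


def bulk_train(
    mail_ids: Iterable[str],
    label: str,
    train_one: Callable[[str, str], None],
) -> tuple[list[str], dict[str, str]]:
    """Train every selected mail; collect failures instead of aborting."""
    trained: list[str] = []
    errors: dict[str, str] = {}
    for mail_id in mail_ids:
        try:
            train_one(mail_id, label)
            trained.append(mail_id)
        except Exception as exc:
            # One failing row is recorded and the loop moves on to the next.
            errors[mail_id] = str(exc)
    return trained, errors
```

The caller receives both lists, so the failed rows can be reported separately after the run has finished.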

Privacy (GDPR)

Sender, recipient, and subject are masked depending on the user role:

| Role | Display |
| --- | --- |
| admin_full / admin | Always shown in plain text |
| training_operator | Masked; unmasking possible via Reveal button (creates audit entry) |
| other | Always masked, no unmasking |
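
The role rules in the table can be sketched as a small policy. Only the role names come from the table; the `MASK` placeholder, the function names, and the shape of the audit entry are assumptions made for illustration.

```python
PLAIN_ROLES = {"admin_full", "admin"}   # always see plain text
REVEAL_ROLES = {"training_operator"}    # masked, but may unmask via Reveal
MASK = "***"  # hypothetical placeholder; the real masking format may differ


def render(value: str, role: str) -> str:
    """Initial display of a sender, recipient, or subject field."""
    return value if role in PLAIN_ROLES else MASK


def reveal(value: str, role: str, actor: str, audit_log: list) -> str:
    """Unmask one field; a reveal by training_operator is written to the audit log."""
    if role in PLAIN_ROLES:
        return value  # nothing was masked, so nothing needs logging
    if role not in REVEAL_ROLES:
        raise PermissionError(f"role {role!r} may not unmask fields")
    audit_log.append({"actor": actor, "action": "reveal"})
    return value
```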

Autolearn

When configured in Mail Configuration → Autolearn, nmg automatically trains:

  • High-score mails as spam
  • Low-score mails as ham
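
As a rough sketch, the autolearn decision is a two-sided threshold check. The threshold values below are hypothetical placeholders; the actual limits are whatever is set in Mail Configuration → Autolearn.

```python
from typing import Optional


def autolearn_class(
    score: float,
    spam_threshold: float = 12.0,  # hypothetical value, not an nmg default
    ham_threshold: float = -2.0,   # hypothetical value, not an nmg default
) -> Optional[str]:
    """Return "spam", "ham", or None when the score is not decisive enough."""
    if score >= spam_threshold:
        return "spam"
    if score <= ham_threshold:
        return "ham"
    return None  # mails in between are not trained automatically
```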

Spam Bursts

Spam Bursts shows detected clusters of similar spam mails that arrived within a short time window.

Burst Table

| Column | Description |
| --- | --- |
| Time Window | Start and end of the burst window |
| Count | Number of similar mails |
| Distinct Senders | Number of different sender addresses |
| Sender Domain | Most frequent sender domain |
| Sample Subject | Typical subject line of the burst |
| Sample Recipients | First affected recipients |
| Avg Score | Average rspamd score |
| Active | Whether the burst is still actively blocked |
| Expires | Automatic expiry date of the block |

Actions

  • Train as Spam — Add all burst mails to the Bayes corpus
  • Unblock — Mark burst as handled (without training)
  • Delete — Remove the burst entry

Enable Show Expired to see burst blocks that have already expired.

Spam Analytics

Spam Analytics shows which rspamd symbols fire most frequently.

Symbol Table

| Column | Description |
| --- | --- |
| Symbol | rspamd symbol name (e.g. RCVD_IN_SPAMHAUS_SBL) |
| Hits | Total hits in the selected time range |
| Avg Score | Average score contribution |
| % of Spam | Share of spam detection traffic |
| % of Ham | Share of ham traffic (false positive indicator) |

Symbols with a high ham percentage are potential sources of false positives; consider reducing their score in Score Tuning.
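
One way to pick review candidates from the symbol table is to filter on the ham share. The sketch below assumes the table has been exported into a hypothetical `SymbolStat` record; the 10 % cut-off is an arbitrary illustration, not a recommended value.

```python
from dataclasses import dataclass


@dataclass
class SymbolStat:
    # Hypothetical record mirroring the columns of the symbol table.
    symbol: str
    hits: int
    pct_spam: float
    pct_ham: float


def false_positive_candidates(
    stats: list[SymbolStat],
    ham_pct_limit: float = 10.0,  # arbitrary cut-off for illustration
) -> list[SymbolStat]:
    """Return symbols at or above the ham share limit, worst first."""
    flagged = [s for s in stats if s.pct_ham >= ham_pct_limit]
    return sorted(flagged, key=lambda s: s.pct_ham, reverse=True)
```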

Score Distribution

The bar chart shows which score ranges the processed mails fall into:

| Bucket | Meaning |
| --- | --- |
| < 0 (Ham) | Clearly legitimate mail |
| 0 – 2, 2 – 4, 4 – 6 | Grey zones |
| 6 – 8, 8 – 10, 10 – 14 | Probable spam |
| ≥ 14 (Reject) | Immediately rejected mail |
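
The bucket boundaries above map naturally onto a sorted edge list. The sketch below assumes each bucket includes its lower bound and excludes its upper bound (the table does not state how exact boundary values are handled) and uses ASCII labels for simplicity.

```python
import bisect
from collections import Counter

# Bucket edges taken from the table; labels are ASCII versions of the buckets.
EDGES = [0, 2, 4, 6, 8, 10, 14]
LABELS = ["<0", "0-2", "2-4", "4-6", "6-8", "8-10", "10-14", ">=14"]


def bucket(score: float) -> str:
    """Map a score to its bucket label (lower bound inclusive: an assumption)."""
    return LABELS[bisect.bisect_right(EDGES, score)]


def histogram(scores: list[float]) -> dict[str, int]:
    """Count mails per bucket, listing every bucket even when it is empty."""
    counts = Counter(bucket(s) for s in scores)
    return {label: counts.get(label, 0) for label in LABELS}
```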

Near-Threshold Senders (Top 50)

Sender domains whose mails average close to the quarantine threshold — early warning for gradually worsening spam sources:

| Column | Description |
| --- | --- |
| Domain | Sender domain |
| Count | Mails in this time range |
| Avg Score | Average rspamd score |
| Max Score | Highest observed score |
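
The aggregation behind this table can be sketched as a per-domain grouping. Several details are assumptions: the quarantine threshold is passed in rather than hard-coded because its value is not stated here, "close to" is modelled as a hypothetical `margin` around the threshold, and results are sorted by distance to the threshold.

```python
from collections import defaultdict


def near_threshold_senders(
    mails: list[tuple[str, float]],  # (sender address, rspamd score)
    threshold: float,                # quarantine threshold from the configuration
    margin: float = 1.0,             # hypothetical definition of "close"
    top: int = 50,
) -> list[dict]:
    """Group scores by sender domain and keep domains averaging near the threshold."""
    by_domain: dict[str, list[float]] = defaultdict(list)
    for sender, score in mails:
        domain = sender.rsplit("@", 1)[-1].lower()
        by_domain[domain].append(score)

    rows = []
    for domain, scores in by_domain.items():
        avg = sum(scores) / len(scores)
        if abs(avg - threshold) <= margin:
            rows.append({
                "domain": domain,
                "count": len(scores),
                "avg_score": avg,
                "max_score": max(scores),
            })
    # Closest to the threshold first (ordering is an assumption).
    rows.sort(key=lambda r: abs(r["avg_score"] - threshold))
    return rows[:top]
```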

False Negatives

Mails that passed through the filter but were reported as spam by users:

| Column | Description |
| --- | --- |
| Time | Report time |
| Source | delivered (archived), hold (quarantine), other |
| Subject | Subject of the reported mail |
| Sender | Sender address |
| Actor | Who reported the mail |

The time range filter (24h / 7d / 30d) applies to all three views.

Neural Network

rspamd contains a neural network (neural) that automatically learns from Bayes training. It starts training only once 1,000 spam and 1,000 ham samples are available. It is configured under Mail Configuration → Neural Network.