Spam Scanner is a drop-in replacement and the best alternative to SpamAssassin, rspamd, SpamTitan, and more.
> \[!NOTE]
> Spam Scanner is actively maintained and used in production at [Forward Email](https://forwardemail.net) to protect millions of emails daily.
## Table of Contents
* [Foreword](#foreword)
* [Why Spam Scanner](#why-spam-scanner)
  * [Key Advantages](#key-advantages)
* [Features](#features)
  * [Core Detection Features](#core-detection-features)
  * [Naive Bayes Classifier](#naive-bayes-classifier)
  * [Phishing Detection](#phishing-detection)
  * [Virus Scanning](#virus-scanning)
  * [Executable Detection](#executable-detection)
  * [NSFW Image Detection](#nsfw-image-detection)
  * [Toxicity Detection](#toxicity-detection)
  * [Macro Detection](#macro-detection)
  * [Language Detection](#language-detection)
  * [Pattern Recognition](#pattern-recognition)
  * [URL Analysis](#url-analysis)
* [Comparison](#comparison)
  * [Spam Scanner vs. Alternatives](#spam-scanner-vs-alternatives)
* [Architecture](#architecture)
  * [System Overview](#system-overview)
  * [Detection Flow](#detection-flow)
  * [Component Architecture](#component-architecture)
* [Requirements](#requirements)
  * [System Requirements](#system-requirements)
  * [Dependencies](#dependencies)
* [Installation](#installation)
  * [ClamAV Installation](#clamav-installation)
* [Quick Start](#quick-start)
  * [Basic Usage](#basic-usage)
  * [With Configuration](#with-configuration)
  * [Checking Specific Features](#checking-specific-features)
* [API Documentation](#api-documentation)
  * [Constructor Options](#constructor-options)
  * [Methods](#methods)
  * [Result Object](#result-object)
* [Advanced Usage](#advanced-usage)
  * [Custom Classifier](#custom-classifier)
  * [Custom Text Replacements](#custom-text-replacements)
  * [Language Filtering](#language-filtering)
  * [Performance Monitoring](#performance-monitoring)
  * [Selective Feature Disabling](#selective-feature-disabling)
  * [Custom Timeout](#custom-timeout)
  * [Custom Logger](#custom-logger)
* [CLI (Command Line Interface)](#cli-command-line-interface)
  * [CLI Installation](#cli-installation)
  * [Commands](#commands)
  * [Exit Codes](#exit-codes)
  * [CLI Examples](#cli-examples)
* [ARF (Abuse Reporting Format)](#arf-abuse-reporting-format)
  * [Parsing ARF Reports](#parsing-arf-reports)
  * [Creating ARF Reports](#creating-arf-reports)
  * [ARF Result Object](#arf-result-object)
* [Mail Server Integration](#mail-server-integration)
  * [Postfix Integration](#postfix-integration)
  * [Dovecot Integration](#dovecot-integration)
  * [TCP Server Mode](#tcp-server-mode)
* [Performance](#performance)
  * [Benchmarks](#benchmarks)
  * [Optimization Tips](#optimization-tips)
  * [Memory Usage](#memory-usage)
* [Contributing](#contributing)
  * [Development Setup](#development-setup)
  * [Running Tests](#running-tests)
* [License](#license)
* [Support](#support)
* [Acknowledgments](#acknowledgments)
## Foreword
Spam Scanner is a tool and service created after hitting countless roadblocks with existing spam-detection solutions. In other words, it's our current [plan for spam](https://forwardemail.net/blog/our-plan-for-spam) and our [better plan for spam](https://forwardemail.net/blog/a-better-plan-for-spam).
Our goal is to build and utilize a scalable, performant, simple, easy to maintain, and powerful API for use in our service at [Forward Email](https://forwardemail.net) to limit spam and provide other measures to prevent attacks on our users.
Initially we tried [SpamAssassin](https://spamassassin.apache.org) and later evaluated [rspamd](https://rspamd.com), but in the end we learned that all existing solutions (even ones besides these) are overly complex, missing required features or documentation, incredibly challenging to configure, high-barrier to entry, or rely on proprietary storage backends (which could store and read your messages without your consent) that limit our scalability.
We value privacy and the security of our data and users. Specifically, we have a "Zero-Tolerance Policy" on storing logs or metadata of any kind (see our [Privacy Policy](https://forwardemail.net/privacy-policy) for more on that). None of these solutions honored this privacy policy (without removing essential spam-detection functionality), so we had to create our own tool, and thus "Spam Scanner" was born.
---
## Why Spam Scanner
> \[!TIP]
> Spam Scanner is the only modern, privacy-focused, Node.js-based spam detection solution with AI-powered features.
### Key Advantages
* **Privacy-First** - Zero logging, zero metadata storage
* **Modern** - Built with Node.js 18+, ES modules, and latest AI models
* **Accurate** - 88%+ detection accuracy with Naive Bayes classifier
* **Fast** - Scans emails in under 3 seconds (with model caching)
* **Comprehensive** - 10+ detection methods (virus, phishing, NSFW, toxicity, macros, etc.)
* **Multilingual** - Supports 40+ languages with automatic detection
* **Easy to Use** - Simple API, extensive documentation, TypeScript support
* **Battle-Tested** - Used in production at Forward Email
---
## Features
Spam Scanner includes modern, essential, and performant features that help reduce spam, phishing, and executable attacks.
### Core Detection Features
| Feature | Description | Status |
| ----------------------------------------------------- | ------------------------------------------------------------------ | ------------ |
| **[Naive Bayes Classifier](#naive-bayes-classifier)** | Machine learning spam classification trained on 100K+ emails | Production |
| **[Phishing Detection](#phishing-detection)** | IDN homograph detection, confusables, suspicious link analysis | Production |
| **[Virus Scanning](#virus-scanning)** | ClamAV integration for attachment scanning | Production |
| **[Executable Detection](#executable-detection)** | Detects 195+ dangerous file extensions + magic number verification | Production |
| **[NSFW Image Detection](#nsfw-image-detection)** | TensorFlow.js-powered image content analysis | Production |
| **[Toxicity Detection](#toxicity-detection)** | AI-powered toxic language detection (threats, insults, harassment) | Production |
| **[Macro Detection](#macro-detection)** | VBA, PowerShell, JavaScript, Batch script detection in attachments | Production |
| **[Language Detection](#language-detection)** | Hybrid franc/lande detection for 40+ languages | Production |
| **[Pattern Recognition](#pattern-recognition)** | Credit cards, phone numbers, IPs, Bitcoin addresses, etc. | Production |
| **[URL Analysis](#url-analysis)** | TLD parsing, Cloudflare blocking detection, suspicious domains | Production |
### Naive Bayes Classifier
Our Naive Bayes classifier is available in this [repository](classifier.json) and in the npm package. It is updated frequently as it receives upstream, anonymized, SHA-256-hashed data from [Forward Email](https://forwardemail.net).
* **Training Data**: 100,000+ spam and ham emails
* **Accuracy**: 88%+ classification accuracy
* **Languages**: Supports 40+ languages with language-specific tokenization
* **Stemming**: Porter Stemmer for English, Snowball for 15+ other languages
* **Privacy**: All training data is anonymized and SHA-256 hashed
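To make the classification step concrete, here is a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is an illustration of the general technique only, not the `@ladjs/naivebayes` implementation. The class name `TinyNaiveBayes`, its tokenizer, and the four training sentences are assumptions made up for this example.

```javascript
// Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
// Illustrative only: TinyNaiveBayes is a hypothetical name, not the project's classifier.
class TinyNaiveBayes {
  constructor() {
    this.docCounts = {spam: 0, ham: 0}; // documents seen per label
    this.tokenCounts = {spam: new Map(), ham: new Map()}; // token frequencies per label
    this.totalTokens = {spam: 0, ham: 0}; // total tokens seen per label
    this.vocabulary = new Set(); // distinct tokens across both labels
  }

  // Hypothetical tokenizer: lowercase runs of letters and digits, no stemming.
  static tokenize(text) {
    return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
  }

  learn(text, label) {
    this.docCounts[label] += 1;
    const counts = this.tokenCounts[label];
    for (const token of TinyNaiveBayes.tokenize(text)) {
      counts.set(token, (counts.get(token) ?? 0) + 1);
      this.totalTokens[label] += 1;
      this.vocabulary.add(token);
    }
  }

  // Returns the label with the highest log prior + summed log likelihoods.
  categorize(text) {
    const totalDocs = this.docCounts.spam + this.docCounts.ham;
    const tokens = TinyNaiveBayes.tokenize(text);
    let best = null;
    let bestScore = -Infinity;
    for (const label of ['spam', 'ham']) {
      let score = Math.log(this.docCounts[label] / totalDocs);
      const denominator = this.totalTokens[label] + this.vocabulary.size;
      for (const token of tokens) {
        const count = this.tokenCounts[label].get(token) ?? 0;
        score += Math.log((count + 1) / denominator);
      }
      if (score > bestScore) {
        best = label;
        bestScore = score;
      }
    }
    return best;
  }
}

const nb = new TinyNaiveBayes();
nb.learn('win free money now', 'spam');
nb.learn('free prize claim now', 'spam');
nb.learn('meeting notes attached', 'ham');
nb.learn('lunch tomorrow at noon', 'ham');
console.log(nb.categorize('claim your free money')); // spam
```

Working in log space avoids numeric underflow on long emails, and the smoothing term keeps an unseen token from zeroing out a label's score.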
### Phishing Detection
Advanced phishing detection using multiple techniques:
* **IDN Homograph Detection**: Detects lookalike domains (e.g., \`аpple.com\` using Cyrillic "а")
* **Confusables Integration**: Uses Unicode confusables database to detect character substitution
* **TLD Analysis**: Validates TLDs and detects suspicious domain patterns
* **Link Analysis**: Checks for mismatched display text and actual URLs
* **Cloudflare Detection**: Identifies domains blocked by Cloudflare
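One simple heuristic behind homograph detection can be sketched as a mixed-script check: a single DNS label that combines Latin letters with Cyrillic or Greek ones is suspicious. The sketch below is an assumption-laden illustration (the function names and the three-script list are hypothetical), and it does not reproduce the project's confusables-based logic.

```javascript
// Scripts commonly abused in lookalike domains (an illustrative subset).
const SCRIPTS = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Greek: /\p{Script=Greek}/u,
};

// Returns the set of scripts used by the characters of one DNS label.
function scriptsInLabel(label) {
  const found = new Set();
  for (const char of label) {
    for (const [name, pattern] of Object.entries(SCRIPTS)) {
      if (pattern.test(char)) found.add(name);
    }
  }
  return found;
}

// Flags a hostname when any single label mixes two or more scripts.
function hasMixedScriptLabel(hostname) {
  return hostname.split('.').some((label) => scriptsInLabel(label).size > 1);
}

console.log(hasMixedScriptLabel('\u0430pple.com')); // true (Cyrillic "а" + Latin "pple")
console.log(hasMixedScriptLabel('apple.com')); // false
```

A label written entirely in one non-Latin script passes this check, which is why a confusables comparison against known brand names is still needed on top of it.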
### Virus Scanning
Integrates with ClamAV for comprehensive virus detection:
* **Real-time Scanning**: Scans all email attachments
* **Buffer Support**: Direct buffer scanning without file I/O
* **Timeout Protection**: Configurable scan timeouts
* **Virus Database**: Uses ClamAV's regularly updated virus definitions
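Timeout protection for a slow scan can be sketched with `Promise.race`. This is a generic pattern under stated assumptions, not the scanner's internal code; `withTimeout` and its `label` parameter are hypothetical names.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
// withTimeout is a hypothetical helper, not part of the spamscanner API.
function withTimeout(promise, ms, label = 'scan') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A stand-in for a scan that never finishes.
const stuckScan = new Promise(() => {});
withTimeout(stuckScan, 20, 'virus scan').catch((error) => {
  console.log(error.message); // virus scan timed out after 20 ms
});
```

Note that the race only stops waiting; it does not cancel the underlying work, so a real integration would also need to abort the scan process.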
### Executable Detection
Detects dangerous executable files:
* **195+ File Extensions**: exe, dll, bat, vbs, ps1, scr, pif, cmd, com, etc.
* **Magic Number Verification**: Detects renamed executables by file content
* **Office Macros**: Detects macro-enabled Office documents (docm, xlsm, pptm)
* **Legacy Office**: Flags legacy Office formats (doc, xls, ppt) as high-risk
* **PDF JavaScript**: Detects malicious JavaScript in PDF files
* **Archive Detection**: Flags archives (zip, rar, 7z) that may hide executables
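Magic number verification means looking at the first bytes of a file rather than trusting its name. The following sketch checks three well-known signatures; the signature table, `sniffExecutable`, and `isDisguisedExecutable` are illustrative assumptions and cover far fewer formats than the project does.

```javascript
// A few well-known executable signatures (not the project's full list).
const SIGNATURES = [
  {type: 'PE/DOS (Windows)', bytes: [0x4d, 0x5a]}, // "MZ"
  {type: 'ELF (Linux)', bytes: [0x7f, 0x45, 0x4c, 0x46]}, // "\x7fELF"
  {type: 'Mach-O 64-bit (macOS)', bytes: [0xcf, 0xfa, 0xed, 0xfe]},
];

// Returns the matching signature type, or null if none matches.
// `content` can be a Buffer or any Uint8Array.
function sniffExecutable(content) {
  const match = SIGNATURES.find(
    ({bytes}) =>
      content.length >= bytes.length &&
      bytes.every((byte, index) => content[index] === byte),
  );
  return match ? match.type : null;
}

// True when the content is executable but the filename does not say so.
function isDisguisedExecutable(filename, content) {
  const claimsExecutable = /\.(exe|dll|scr|com)$/i.test(filename);
  return sniffExecutable(content) !== null && !claimsExecutable;
}

const renamed = Uint8Array.from([0x4d, 0x5a, 0x90, 0x00]);
console.log(sniffExecutable(renamed)); // PE/DOS (Windows)
console.log(isDisguisedExecutable('invoice.pdf', renamed)); // true
```

Short signatures such as the two-byte "MZ" can match harmless text files, so a production check would typically validate more of the header before flagging.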
### NSFW Image Detection
AI-powered image content analysis using TensorFlow\.js:
* **Categories**: Porn, Hentai, Sexy, Neutral, Drawing
* **Model**: NSFWJS model trained on 60K+ images
* **Threshold**: Configurable detection threshold (default: 0.7)
* **Performance**: Model caching for fast subsequent scans
* **Formats**: Supports JPEG, PNG, GIF, WebP, BMP
### Toxicity Detection
Detects toxic language using TensorFlow\.js Toxicity model:
* **Categories**: Identity attack, insult, obscenity, severe toxicity, sexually explicit, threat
* **Threshold**: Configurable toxicity threshold (default: 0.7)
* **Languages**: Optimized for English, supports other languages
* **Performance**: Model caching for fast subsequent scans
### Macro Detection
Detects malicious macros in email content and attachments:
* **VBA Macros**: Detects Visual Basic for Applications code
* **PowerShell**: Detects PowerShell scripts and commands
* **JavaScript**: Detects JavaScript code in emails
* **Batch Scripts**: Detects Windows batch files
* **Office Documents**: Scans docm, xlsm, pptm, xlam, dotm, xltm, potm
* **PDF JavaScript**: Detects JavaScript in PDF attachments
### Language Detection
Hybrid language detection using franc and lande:
* **40+ Languages**: Supports all major world languages
* **Automatic Detection**: Detects language from email content
* **Fallback System**: Uses lande when franc returns "undetermined"
* **Mixed Language Support**: Optional mixed language detection
* **Language Filtering**: Filter results to supported languages only
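The fallback behavior described above can be sketched as a small wrapper that takes both detectors as arguments. The two stub detectors below are hypothetical stand-ins so the example runs on its own; the sketch assumes the primary detector returns an ISO 639-3 code or `'und'` (as franc does) and that the fallback returns a ranked list of `[code, probability]` pairs.

```javascript
// Hypothetical stand-ins for the real detectors so the sketch is self-contained.
const stubPrimary = (text) => (text.length < 10 ? 'und' : 'eng');
const stubFallback = () => [['fra', 0.8], ['eng', 0.2]];

// Uses the primary detector first and only consults the fallback
// when the primary cannot determine a language ('und').
function detectLanguage(text, primary, fallback) {
  const code = primary(text);
  if (code !== 'und') return code;
  const ranked = fallback(text);
  return ranked.length > 0 ? ranked[0][0] : 'und';
}

console.log(detectLanguage('This sentence is long enough.', stubPrimary, stubFallback)); // eng
console.log(detectLanguage('Bonjour', stubPrimary, stubFallback)); // fra
```

Passing the detectors in as parameters keeps the fallback logic testable without loading either language model.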
### Pattern Recognition
Detects various patterns in email content:
* **Credit Cards**: Visa, MasterCard, Amex, Discover, etc.
* **Phone Numbers**: International phone number formats
* **Email Addresses**: RFC-compliant email detection
* **IP Addresses**: IPv4 and IPv6 addresses
* **URLs**: Full URL extraction and analysis
* **Bitcoin Addresses**: Cryptocurrency wallet addresses
* **MAC Addresses**: Network hardware addresses
* **Hex Colors**: Color codes (#RRGGBB)
* **Floating Point Numbers**: Decimal numbers
* **Dates**: Multiple date formats (MM/DD/YYYY, YYYY-MM-DD, etc.)
* **File Paths**: Windows and Unix file paths
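As an example of how one of these patterns might be matched, the sketch below finds credit-card-like digit runs with a regular expression and then filters them with the Luhn checksum. The regex and function names are assumptions for illustration and may differ from the project's actual patterns.

```javascript
// Luhn checksum: double every second digit from the right,
// subtract 9 from results above 9, and require the sum to be a multiple of 10.
function passesLuhn(digits) {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let digit = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      digit *= 2;
      if (digit > 9) digit -= 9;
    }
    sum += digit;
  }
  return sum % 10 === 0;
}

// Finds 13 to 19 digit runs (optionally separated by single spaces or dashes)
// and keeps only those that pass the Luhn check.
function findCardNumbers(text) {
  const candidates = text.match(/\b\d(?:[ -]?\d){12,18}\b/g) ?? [];
  return candidates
    .map((candidate) => candidate.replace(/[ -]/g, ''))
    .filter(passesLuhn);
}

console.log(findCardNumbers('Card: 4111 1111 1111 1111 exp 12/27'));
// [ '4111111111111111' ]
console.log(findCardNumbers('Order 1234 5678 9012 3456'));
// []
```

The checksum step is what separates a likely card number from an order ID or tracking number of the same length.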
### URL Analysis
Comprehensive URL analysis and validation:
* **TLD Parsing**: Uses tldts for accurate TLD extraction
* **Domain Analysis**: Extracts domain, subdomain, public suffix
* **IP Detection**: Identifies IP-based URLs
* **Cloudflare Check**: Detects Cloudflare-blocked domains
* **URL Normalization**: Normalizes URLs for consistent analysis
* **Suspicious Pattern Detection**: Identifies phishing URL patterns
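URL normalization and IP detection can be approximated with the built-in WHATWG `URL` class alone. This sketch is an assumption about one reasonable approach, with a hypothetical `analyzeUrl` helper; it does not extract the public suffix, which is the part that needs a library such as tldts.

```javascript
// Parses a URL, returns its normalized form and whether the host is an IP address.
// analyzeUrl is a hypothetical helper, not part of the spamscanner API.
function analyzeUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return null; // not an absolute, parseable URL
  }

  const host = url.hostname;
  // The URL parser canonicalizes IPv4 hosts to dotted-decimal form
  // and keeps IPv6 hosts wrapped in square brackets.
  const isIPv4 = /^\d{1,3}(?:\.\d{1,3}){3}$/.test(host);
  const isIPv6 = host.startsWith('[') && host.endsWith(']');

  return {normalized: url.href, hostname: host, isIpBased: isIPv4 || isIPv6};
}

console.log(analyzeUrl('HTTP://192.168.0.1:80/a/../b'));
// { normalized: 'http://192.168.0.1/b', hostname: '192.168.0.1', isIpBased: true }
console.log(analyzeUrl('https://Example.COM/login').hostname); // example.com
console.log(analyzeUrl('not a url')); // null
```

Normalizing before comparison means that case differences, default ports, and dot segments cannot be used to make the same destination look like different URLs.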
---
## Comparison
### Spam Scanner vs. Alternatives
| Feature | Spam Scanner | SpamAssassin | rspamd | ClamAV |
| ----------------------------- | :----------: | :-----------: | :-----------: | :-----: |
| **License** | BSL 1.1 | Apache 2.0 | Apache 2.0 | GPLv2 |
| **Language** | Node.js | Perl | C | C |
| **Modern Architecture** | Yes | No | Partial | No |
| **Easy to Use** | Yes | No | No | Yes |
| **Privacy-Focused** | Yes | Partial | Partial | Yes |
| **Naive Bayes Classifier** | Yes | Yes | Yes | No |
| **Virus Scanning** | Yes | Yes | Yes | Yes |
| **Phishing Detection** | Yes | Yes | Yes | No |
| **IDN Homograph Detection** | Yes | No | Yes | No |
| **NSFW Image Detection** | Yes | No | No | No |
| **Toxicity Detection** | Yes | No | No | No |
| **Macro Detection** | Yes | Yes | Yes | Yes |
| **Language Detection** | Yes (40+) | Yes (limited) | Yes (limited) | No |
| **Pattern Recognition** | Yes | Yes | Yes | No |
| **Executable Detection** | Yes (195+) | Yes | Yes | Yes |
| **Magic Number Verification** | Yes | No | No | Yes |
| **PDF JavaScript Detection** | Yes | No | No | Partial |
| **Archive Detection** | Yes | Yes | Yes | Yes |
| **Performance Metrics** | Yes | No | Yes | No |
| **TypeScript Support** | Yes | No | No | No |
| **Active Development** | Yes | Yes | Yes | Yes |
| **Production Ready** | Yes | Yes | Yes | Yes |
> \[!NOTE]
> **Alternative to SpamAssassin**: Spam Scanner provides a modern, Node.js-based alternative to SpamAssassin with AI-powered features and better privacy.
>
> **Alternative to rspamd**: Spam Scanner offers easier configuration and better documentation than rspamd, with comparable detection accuracy.
>
> **Alternative to ClamAV**: While Spam Scanner uses ClamAV for virus scanning, it provides comprehensive spam and phishing detection that ClamAV doesn't offer.
---
## Architecture
### System Overview
\`\`\`mermaid
graph TB
A[Email Input] --> B\{Spam Scanner\}
B --> C[Preprocessing]
C --> D[Language Detection]
D --> E[Tokenization]
E --> F[Naive Bayes Classification]
B --> G[Phishing Detection]
G --> G1[IDN Homograph Check]
G --> G2[Confusables Analysis]
G --> G3[URL Analysis]
B --> H[Attachment Scanning]
H --> H1[Virus Scan]
H --> H2[Executable Check]
H --> H3[Macro Detection]
H --> H4[NSFW Detection]
B --> I[Content Analysis]
I --> I1[Toxicity Detection]
I --> I2[Pattern Recognition]
F --> J[Result Aggregation]
G --> J
H --> J
I --> J
J --> K\{Is Spam?\}
K -->|Yes| L[Spam Result]
K -->|No| M[Ham Result]
\`\`\`
### Detection Flow
\`\`\`mermaid
sequenceDiagram
participant Client
participant Scanner
participant Classifier
participant ClamAV
participant TensorFlow
Client->>Scanner: scan(email)
Scanner->>Scanner: Parse Email
Scanner->>Scanner: Extract URLs
Scanner->>Scanner: Detect Language
par Parallel Detection
Scanner->>Classifier: Classify Tokens
Scanner->>ClamAV: Scan Attachments
Scanner->>TensorFlow: Detect NSFW
Scanner->>TensorFlow: Detect Toxicity
Scanner->>Scanner: Check Phishing
Scanner->>Scanner: Check Macros
end
Scanner->>Scanner: Aggregate Results
Scanner->>Client: Return Result
\`\`\`
### Component Architecture
\`\`\`mermaid
graph LR
A[Spam Scanner] --> B[Core Engine]
A --> C[Classifiers]
A --> D[Detectors]
A --> E[Analyzers]
B --> B1[Email Parser]
B --> B2[Tokenizer]
B --> B3[Preprocessor]
C --> C1[Naive Bayes]
C --> C2[TensorFlow NSFW]
C --> C3[TensorFlow Toxicity]
D --> D1[Phishing Detector]
D --> D2[Virus Scanner]
D --> D3[Macro Detector]
D --> D4[Executable Detector]
E --> E1[Language Analyzer]
E --> E2[URL Analyzer]
E --> E3[Pattern Analyzer]
\`\`\`
---
## Requirements
> \[!WARNING]
> ClamAV is required for virus scanning. If you do not have it installed, virus scanning will be disabled.
### System Requirements
* **Node.js**: >= 18.0.0
* **ClamAV**: Latest version (for virus scanning)
* **Memory**: 2GB+ RAM recommended (for TensorFlow models)
* **Disk Space**: 500MB+ (for models and virus definitions)
### Dependencies
* **@tensorflow/tfjs-node**: For NSFW and toxicity detection
* **@ladjs/naivebayes**: For spam classification
* **clamscan**: For virus scanning
* **mailparser**: For email parsing
* **natural**: For NLP and tokenization
* **tldts**: For TLD parsing
* **confusables**: For Unicode confusables detection
---
## Installation
\`\`\`bash
npm install spamscanner
\`\`\`
### ClamAV Installation
#### macOS
\`\`\`bash
brew install clamav
freshclam
\`\`\`
#### Ubuntu/Debian
\`\`\`bash
sudo apt-get update
sudo apt-get install clamav clamav-daemon
sudo freshclam
sudo systemctl start clamav-daemon
\`\`\`
#### CentOS/RHEL
\`\`\`bash
sudo yum install clamav clamav-update
sudo freshclam
\`\`\`
> \[!TIP]
> See the [ClamAV configuration guide](https://github.com/spamscanner/spamscanner/blob/master/docs/clamav.md) for detailed installation instructions.
---
## Quick Start
### Basic Usage
\`\`\`js
import SpamScanner from 'spamscanner';
const scanner = new SpamScanner();
// Raw email string or Buffer
const email = [
  'From: sender@example.com',
  'To: recipient@example.com',
  'Subject: Test Email',
  '', // a blank line separates the headers from the body
  'This is a test email.',
].join('\r\n');
const result = await scanner.scan(email);
console.log(result);
// \{
// isSpam: false,
// message: 'Ham',
// results: \{ ... \},
// ...
// \}
\`\`\`
### With Configuration
\`\`\`js
import SpamScanner from 'spamscanner';
const scanner = new SpamScanner(\{
// Enable performance metrics
enablePerformanceMetrics: true,
// Filter to supported languages
supportedLanguages: ['en', 'es', 'fr', 'de'],
// Enable macro detection
enableMacroDetection: true,
// Set scan timeout
timeout: 30000,
// Custom ClamAV configuration
clamscan: \{
preference: 'clamdscan',
clamdscanPath: '/usr/bin/clamdscan',
\},
\});
const result = await scanner.scan(email);
\`\`\`
### Checking Specific Features
\`\`\`js
// Check if email is spam
if (result.isSpam) \{
console.log('Spam detected!');
console.log('Reason:', result.message);
\}
// Check for viruses
if (result.results.viruses && result.results.viruses.length > 0) \{
console.log('Viruses found:', result.results.viruses);
\}
// Check for phishing
if (result.results.phishing && result.results.phishing.length > 0) \{
console.log('Phishing detected:', result.results.phishing);
\}
// Check for executables
if (result.results.executables && result.results.executables.length > 0) \{
console.log('Executables found:', result.results.executables);
\}
// Check for NSFW content
if (result.results.nsfw && result.results.nsfw.length > 0) \{
console.log('NSFW content detected:', result.results.nsfw);
\}
// Check for toxic language
if (result.results.toxicity && result.results.toxicity.length > 0) \{
console.log('Toxic language detected:', result.results.toxicity);
\}
\`\`\`
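The repeated checks above could be folded into one helper that returns only the non-empty findings. This is a sketch under the assumption that each of the five `result.results` fields shown above is either missing or an array; `collectFindings` and the mock result are hypothetical and exist only for illustration.

```javascript
// The result fields checked in the examples above.
const FINDING_KEYS = ['viruses', 'phishing', 'executables', 'nsfw', 'toxicity'];

// Returns an object containing only the finding lists that are non-empty.
function collectFindings(result) {
  const findings = {};
  for (const key of FINDING_KEYS) {
    const items = result?.results?.[key];
    if (Array.isArray(items) && items.length > 0) findings[key] = items;
  }
  return findings;
}

// Mock result for illustration only; the field contents are made up.
const mockResult = {
  isSpam: true,
  message: 'Spam',
  results: {viruses: [], phishing: ['example finding'], executables: undefined},
};

console.log(collectFindings(mockResult)); // { phishing: [ 'example finding' ] }
```

An empty object from this helper means none of the five detectors reported anything, independent of the overall `isSpam` verdict.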
---
## API Documentation
### Constructor Options
#### \`new SpamScanner(options)\`
Creates a new Spam Scanner instance.
##### Options
| Option | Type | Default | Description |
| ---------------------------------- | ------------- | --------- | ----------------------------------------------------------------------------- |
| \`enableMacroDetection\` | \`boolean\` | \`true\` | Enable macro detection in emails and attachments |
| \`enablePerformanceMetrics\` | \`boolean\` | \`false\` | Track and return performance metrics |
| \`timeout\` | \`number\` | \`30000\` | Timeout in milliseconds for scans (virus, URL checks) |
| \`supportedLanguages\` | \`string[]\` | \`['en']\` | Array of supported language codes. Empty array \`[]\` = all languages supported |
| \`enableMixedLanguageDetection\` | \`boolean\` | \`false\` | Enable detection of mixed languages in emails |
| \`enableAdvancedPatternRecognition\` | \`boolean\` | \`true\` | Enable advanced pattern recognition (credit cards, phones, etc.) |
| \`toxicityThreshold\` | \`number\` | \`0.7\` | Threshold for toxicity detection (0.0-1.0, higher = more strict) |
| \`nsfwThreshold\` | \`number\` | \`0.6\` | Threshold for NSFW detection (0.0-1.0, higher = more strict) |
| \`debug\` | \`boolean\` | \`false\` | Enable debug logging |
| \`logger\` | \`object\` | \`console\` | Custom logger object (must have \`log\`, \`error\`, \`warn\` methods) |
| \`clamscan\` | \`object\` | See below | ClamAV configuration options |
| \`classifier\` | \`object\` | \`null\` | Custom Naive Bayes classifier data |
| \`replacements\` | \`Map\|object\` | \`null\` | Custom text replacements for preprocessing |
##### ClamAV Options (\`clamscan\`)
| Option | Type | Default | Description |
| -------------------- | -------------- | ---------------------- | ------------------------------------------------ |
| \`removeInfected\` | \`boolean\` | \`false\` | Remove infected files |
| \`quarantineInfected\` | \`boolean\` | \`false\` | Quarantine infected files |
| \`scanLog\` | \`string\|null\` | \`null\` | Path to scan log file |
| \`debugMode\` | \`boolean\` | \`false\` | Enable ClamAV debug mode |
| \`fileList\` | \`string\|null\` | \`null\` | Path to file list |
| \`scanRecursively\` | \`boolean\` | \`true\` | Scan directories recursively |
| \`clamscanPath\` | \`string\` | \`'/usr/bin/clamscan'\` | Path to clamscan binary |
| \`clamdscanPath\` | \`string\` | \`'/usr/bin/clamdscan'\` | Path to clamdscan binary |
| \`preference\` | \`string\` | \`'clamdscan'\` | Preferred scanner: \`'clamdscan'\` or \`'clamscan'\` |
##### Example
\`\`\`js
const scanner = new SpamScanner(\{
enableMacroDetection: true,
enablePerformanceMetrics: true,
timeout: 60000,
supportedLanguages: ['en', 'es', 'fr', 'de', 'ja', 'zh'],
enableMixedLanguageDetection: false,
enableAdvancedPatternRecognition: true,
debug: false,
logger: console,
clamscan: \{
preference: 'clamdscan',
clamdscanPath: '/usr/bin/clamdscan',
scanRecursively: true,
debugMode: false,
\},
\});
\`\`\`
---
### Methods
#### \`scanner.scan(source)\`
Scans an email for spam, viruses, phishing, and other threats.
##### Parameters
* \`source\` (\`string\` | \`Buffer\`) - Raw email content (RFC 822 format)
##### Returns
\`Promise