Bot Detector

Express middleware for multi-layered bot detection with a two-phase pipeline of 17 configurable checkers, cumulative scoring, and canary cookie fingerprinting.

@riavzon/bot-detector is an Express middleware that filters incoming requests through a two-phase pipeline of 17 configurable checkers. Each checker contributes a penalty score toward a configurable ban threshold. Requests that cross the threshold receive a 403 response, or a firewall-level block when punishmentType.enableFireWallBan is enabled.

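The pipeline runs the cheap, synchronous checks first. The heavy phase runs only if the accumulated score is still below the ban threshold after the cheap phase, so requests that are already over the threshold never pay for the expensive checks. This ordering keeps median pipeline latency around 1.2 ms.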

punishmentType.enableFireWallBan requires a Linux environment with ufw available and passwordless sudo for the Node.js process. The detection pipeline itself runs on any Node.js 18+ platform.
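A minimal sketch of guarding the firewall option by platform. Only the names banScore and punishmentType.enableFireWallBan come from this page; the BotDetectorOptionsSketch interface, the buildOptions helper, and the placement of banScore at the top level are assumptions made for illustration, not the package's documented API. The guard checks the operating system only; it does not verify that ufw is installed or that passwordless sudo is configured.

```typescript
// Hypothetical option shape. Only `banScore` and
// `punishmentType.enableFireWallBan` are named in the documentation;
// everything else about this object is assumed.
interface BotDetectorOptionsSketch {
  banScore: number;
  punishmentType: { enableFireWallBan: boolean };
}

// Hypothetical helper: request a firewall ban only when running on Linux.
// On any other platform the option is forced off, so over-threshold
// requests fall back to the 403 response.
function buildOptions(
  platform: string,
  wantFirewallBan: boolean,
): BotDetectorOptionsSketch {
  return {
    banScore: 100, // documented default
    punishmentType: {
      enableFireWallBan: wantFirewallBan && platform === "linux",
    },
  };
}

// Typical call site: pass the current platform reported by Node.js.
const options = buildOptions(process.platform, true);
console.log(options.punishmentType.enableFireWallBan);
```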

Detection Pipeline

The middleware processes each request in two sequential phases to keep latency low without sacrificing accuracy.

Phase 1 - Cheap Checks

Synchronous in-memory lookups against MMDB and LMDB databases. Covers IP validation, User-Agent analysis, header fingerprinting, geolocation consistency, FireHOL threat feeds, Tor analysis, ASN classification, timezone consistency, and honeypot paths.

Phase 2 - Heavy Checks

Asynchronous checks that read from the visitor cache or perform database queries. Covers behavioral rate limiting, proxy and ISP detection, session coherence, velocity fingerprinting, and bad User-Agent pattern matching against the LMDB library.
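The two phases and the cumulative score can be sketched as follows. This is an illustration of the idea, not the package's implementation: every type, checker, and penalty value below is hypothetical, and treating a score equal to the threshold as a ban (>=) is an assumption.

```typescript
// All names here are hypothetical and exist only for this sketch.
type VisitorRequest = { ip: string; userAgent: string; path: string };

interface CheapChecker {
  name: string;
  penalty: number;
  hit(req: VisitorRequest): boolean; // synchronous, in-memory
}

interface HeavyChecker {
  name: string;
  penalty: number;
  hit(req: VisitorRequest): Promise<boolean>; // asynchronous, may hit cache or DB
}

interface Verdict {
  score: number;
  banned: boolean;
  triggered: string[];
  heavyPhaseRan: boolean;
}

const cheapCheckers: CheapChecker[] = [
  { name: "emptyUserAgent", penalty: 40, hit: (req) => req.userAgent.trim() === "" },
  { name: "honeypotPath", penalty: 60, hit: (req) => req.path === "/wp-login.php" },
];

// Stand-in for a visitor cache: request count per IP.
const hitsByIp = new Map<string, number>();
const heavyCheckers: HeavyChecker[] = [
  {
    name: "rateLimit",
    penalty: 50,
    hit: async (req) => {
      const hits = (hitsByIp.get(req.ip) ?? 0) + 1;
      hitsByIp.set(req.ip, hits);
      return hits > 3;
    },
  },
];

async function runPipeline(
  req: VisitorRequest,
  cheap: CheapChecker[],
  heavy: HeavyChecker[],
  banScore = 100,
): Promise<Verdict> {
  let score = 0;
  const triggered: string[] = [];

  // Phase 1: cheap checks, each hit adds its penalty to the running score.
  for (const checker of cheap) {
    if (checker.hit(req)) {
      score += checker.penalty;
      triggered.push(checker.name);
    }
  }
  // Already at the threshold: skip the heavy phase entirely.
  if (score >= banScore) {
    return { score, banned: true, triggered, heavyPhaseRan: false };
  }

  // Phase 2: heavy checks, stopping as soon as the threshold is reached.
  for (const checker of heavy) {
    if (await checker.hit(req)) {
      score += checker.penalty;
      triggered.push(checker.name);
    }
    if (score >= banScore) break;
  }
  return { score, banned: score >= banScore, triggered, heavyPhaseRan: true };
}
```

In this sketch, a request with an empty User-Agent to the honeypot path reaches 100 points in phase 1 and never triggers the rate-limit lookup, which is the latency saving the phase ordering is meant to provide.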

Features

17 Configurable Checkers: Every checker ships with sensible defaults and can be individually disabled or tuned with custom penalty weights.
Custom Checkers: Register your own checkers through CheckerRegistry and plug in custom data sources.
Self-Optimizing: The middleware uses collected visitor data to become faster and more accurate over time. Instead of running the full pipeline for known offenders, it compiles your latest database rows into local MMDB files so that past threats and high-risk visitors are dropped immediately.
Multi-Database Support: The visitor persistence layer supports SQLite, MySQL, PostgreSQL, Cloudflare D1, and PlanetScale through the db0 adapter.
Cumulative Scoring: Penalty points accumulate across all checks. Requests that exceed banScore (default 100) receive a 403 response.
Canary Cookie Fingerprinting: The middleware issues a cryptographic cookie on first contact. Returning visitors are identified by this cookie, enabling behavioral rate tracking and session coherence checks.
Good Bot Exemptions: Verified crawlers like Googlebot and Bingbot are identified through IP range matching and reverse DNS lookups, and are exempt from scoring.
Fast: The full pipeline runs with a median latency of around 1.2 ms.
CLI Tools: The bot-detector CLI manages data source downloads, compilation, and custom threat database generation from your visitor history.
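To illustrate the canary cookie feature listed above, here is one common way to build a tamper-evident visitor cookie: a random identifier signed with an HMAC. This is a sketch under assumptions, not the package's actual cookie format; the issueCanary and verifyCanary helpers, the "id.signature" layout, and the choice of HMAC-SHA256 are all hypothetical. It uses only Node.js built-in crypto functions.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical helper: create a cookie value of the form "<id>.<signature>".
function issueCanary(secret: string): string {
  const id = randomBytes(16).toString("hex");
  const signature = createHmac("sha256", secret).update(id).digest("hex");
  return `${id}.${signature}`;
}

// Hypothetical helper: return the visitor id if the signature is valid,
// or null if the cookie is malformed or was not signed with this secret.
function verifyCanary(cookie: string, secret: string): string | null {
  const [id, signature] = cookie.split(".");
  if (!id || !signature) return null;

  const expected = createHmac("sha256", secret).update(id).digest();
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (given.length !== expected.length) return null;

  return timingSafeEqual(given, expected) ? id : null;
}

// Usage: issue on first contact, verify on later requests.
const cookie = issueCanary("server-side-secret");
const visitorId = verifyCanary(cookie, "server-side-secret");
console.log(visitorId !== null); // true
```

A verified identifier of this kind gives later checks a stable key for per-visitor state, which is what behavioral rate tracking and session coherence need.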

Documentation

Getting Started

Prerequisites, installation, data source setup, and first-run configuration.

CLI

Using the CLI for data generation and a fast start.

Data Sources

The data sources the bot detector uses.

Guides

Guides on using the module, configuring custom checkers, and more.

API Reference

Complete reference for all exported functions, utilities, the checker system, and TypeScript types.

Configuration

Full configuration reference.

Checkers

A breakdown of each checker and what it does.

Security

The scoring system, penalty configuration, the canary cookie mechanism, and persistent bans.
