Bot Detection

Security

The scoring system, penalty configuration, the canary cookie mechanism, and persistent bans.

The middleware evaluates every incoming request through a cumulative scoring model. Checkers contribute penalty points, and when the total reaches the banScore threshold, the request is rejected. This page explains how that model works end to end.

The Scoring System

Every checker returns a numeric score and a list of reason codes. The middleware accumulates these scores across all checkers that run for a given request. When the running total reaches or exceeds banScore, the pipeline stops immediately and the request receives a 403 Forbidden response. No further checkers run.

The score is bounded by maxScore. If multiple checkers fire on the same request and their combined total would exceed maxScore, the surplus is discarded. This prevents extreme outliers from producing scores that are impossible to reason about.
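The accumulation rule can be sketched as a loop that adds each checker's score, clamps the total at maxScore, and stops at the first checker that pushes it to banScore. This is a minimal illustration, not the library's internals: the CheckResult shape, the Checker type, and the accumulate() function are hypothetical, and checkers are modeled as synchronous functions for simplicity.

```typescript
// Hypothetical shape of one checker's output; field names are illustrative.
interface CheckResult {
  score: number;
  reasons: string[];
}

type Checker = () => CheckResult;

// Sums checker scores in order, discarding anything above maxScore and
// stopping as soon as the running total reaches banScore.
function accumulate(
  checkers: Checker[],
  banScore: number,
  maxScore: number,
): { score: number; banned: boolean; reasons: string[] } {
  let score = 0;
  const reasons: string[] = [];
  for (const check of checkers) {
    const result = check();
    score = Math.min(score + result.score, maxScore); // surplus is dropped
    reasons.push(...result.reasons);
    if (score >= banScore) {
      return { score, banned: true, reasons }; // later checkers never run
    }
  }
  return { score, banned: false, reasons };
}
```

With banScore 100 and maxScore 120, checkers scoring 60, 80 and 50 end with a total of 120 (140 clamped), and the third checker is never called.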

Two reason codes skip the scoring check entirely:

BAD_BOT_DETECTED: the pipeline stops and bans the visitor instantly, regardless of the current score.
GOOD_BOT_IDENTIFIED: the pipeline stops and allows the visitor through instantly.

The honeypot checker uses BAD_BOT_DETECTED for any request to a configured trap path. The good-bot DNS verifier uses GOOD_BOT_IDENTIFIED for verified legitimate crawlers.
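The two short-circuit codes can be modeled as a lookup consulted before the threshold comparison. In this sketch, verdictFor() and the Verdict type are hypothetical, and giving the first terminal code in the list precedence (should both ever appear) is an assumption.

```typescript
type Verdict = 'ban' | 'allow' | 'continue';

// Reason codes that end the pipeline regardless of the running score.
// A Map avoids false hits on inherited object keys such as "toString".
const TERMINAL = new Map<string, 'ban' | 'allow'>([
  ['BAD_BOT_DETECTED', 'ban'],
  ['GOOD_BOT_IDENTIFIED', 'allow'],
]);

function verdictFor(reasons: string[], runningScore: number, banScore: number): Verdict {
  for (const code of reasons) {
    const terminal = TERMINAL.get(code);
    if (terminal !== undefined) return terminal; // score is ignored
  }
  return runningScore >= banScore ? 'ban' : 'continue';
}
```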

Detection Phases

The pipeline splits into two sequential phases: cheap and heavy.

The cheap phase runs on every request. It consists of synchronous checks that read pre-loaded MMDB data (geo, ASN, Tor, threat feeds, proxy lists) and inspect headers without any I/O. If the score from cheap checkers reaches banScore, the heavy phase is skipped entirely.

The heavy phase only runs when the cheap phase score stays below banScore. It includes rate tracking, proxy confirmation, session coherence checks, and User-Agent pattern matching against the LMDB database. These operations may query the cache or the database.

This two-phase design means that obvious bots (those that trigger high-confidence cheap signals) are rejected without any database I/O at all.
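A minimal sketch of the phase split, assuming cheap checkers can be modeled as synchronous functions returning a score and heavy checkers as async ones. runPhases() and its types are hypothetical; the point is that the heavy loop, and therefore any awaited I/O, is never entered once the cheap total reaches banScore.

```typescript
type SyncCheck = () => number;           // cheap: in-memory lookups, no I/O
type AsyncCheck = () => Promise<number>; // heavy: may query cache or database

async function runPhases(
  cheap: SyncCheck[],
  heavy: AsyncCheck[],
  banScore: number,
): Promise<{ score: number; heavyRan: boolean }> {
  let score = 0;
  for (const check of cheap) {
    score += check();
    // Obvious bot: reject before any heavy (I/O-bound) checker runs.
    if (score >= banScore) return { score, heavyRan: false };
  }
  for (const check of heavy) {
    score += await check();
    if (score >= banScore) break;
  }
  return { score, heavyRan: true };
}
```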

The Canary Cookie

When detectBots() runs for a visitor with no existing canary_id cookie, the middleware treats the request as a first visit and sets a new canary_id cookie with HttpOnly: true, SameSite: "lax", Secure: true, and a 90-day maxAge. This cookie acts as a persistent visitor identifier across requests.

The canary cookie serves two roles. First, it allows the rate tracking and velocity fingerprint checkers to correlate requests from the same browser session over time. Second, its absence on a non-first request is itself a signal: bots that do not persist cookies between requests are caught by enableProxyIspCookiesChecks, which applies an 80-point penalty by default for a missing canary_id.
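The first-visit logic can be sketched as a pure function that returns the cookie to set, or null when the request already carries one. The attribute values follow the description above; generating the identifier with crypto.randomUUID() and the canaryForRequest() name are assumptions, not the library's implementation.

```typescript
import { randomUUID } from 'node:crypto';

// 90 days in milliseconds (Express's res.cookie expects maxAge in ms).
const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

interface CanaryCookie {
  name: 'canary_id';
  value: string;
  options: { httpOnly: true; sameSite: 'lax'; secure: true; maxAge: number };
}

// Hypothetical helper: returns a cookie only when none was sent (first visit).
function canaryForRequest(cookies: Record<string, string | undefined>): CanaryCookie | null {
  if (cookies.canary_id) return null;
  return {
    name: 'canary_id',
    value: randomUUID(), // id generator is an assumption
    options: { httpOnly: true, sameSite: 'lax', secure: true, maxAge: NINETY_DAYS_MS },
  };
}
```

In an Express handler the result maps directly onto res.cookie(c.name, c.value, c.options).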

The middleware reads req.cookies.canary_id, which requires cookie-parser to be mounted before detectBots() in the Express middleware stack.

server.ts

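import express from 'express';
import cookieParser from 'cookie-parser';
import { detectBots } from '@riavzon/bot-detector';

const app = express();

// cookie-parser must run first so req.cookies.canary_id is populated
app.use(cookieParser());
app.use(detectBots());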

Reputation Healing

A visitor's risk score persists in the database between requests. This is intentional: a bot that probes your server once should still carry that history on its next visit. Legitimate visitors, however, can occasionally trigger one or two checkers because of unusual network conditions or client setups (a VPN, a shared IP, an unusual browser configuration). Reputation healing gives them a path back to a clean score.

After every request that does not result in a ban, the middleware decrements the visitor's stored score by restoredReputationPoints (default 10). A visitor with an accumulated score of 40 recovers to zero after four consecutive clean requests.

Set restoredReputationPoints to 0 to disable healing entirely. Increase it to speed up recovery for environments where false positives are more likely.
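The healing arithmetic, as a sketch. heal() and requestsToRecover() are hypothetical helpers, and flooring the stored score at zero is an assumption drawn from the "recovers to zero" example above.

```typescript
// One clean request's worth of healing; the score never drops below zero.
function heal(score: number, restoredReputationPoints = 10): number {
  return Math.max(0, score - restoredReputationPoints);
}

// Consecutive clean requests needed to return to a score of zero.
function requestsToRecover(score: number, restoredReputationPoints = 10): number {
  if (score <= 0) return 0;
  if (restoredReputationPoints <= 0) return Infinity; // healing disabled
  return Math.ceil(score / restoredReputationPoints);
}
```

requestsToRecover(40) returns 4, matching the example above; requestsToRecover(45) returns 5, because the fifth clean request removes the last 5 points.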

Reading Detection Results

After detectBots() runs for a non-banned request, it sets req.botDetection on the request object. You can read this in any subsequent middleware or route handler.

interface BotDetectionResult {
  success: boolean;  // always true when this field is present
  banned: boolean;   // always false when this field is present; banned requests return early
  time: string;      // ISO 8601 timestamp of when detection completed
  ipAddress: string; // resolved client IP address
}

req.botDetection is only set when detection completes without a ban. Banned requests receive the 403 and return before this field is written, so no downstream handler ever sees a result for them. If checksTimeRateControl.checkEveryRequest is false and the visitor is already cached, the pipeline is skipped entirely and req.botDetection is not set for that request either, so handlers should treat the field as optional.

server.ts

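app.get('/api/data', (req, res) => {
  // undefined when the pipeline was skipped for an already-cached visitor
  const detection = req.botDetection;
  if (detection) {
    // present only when this request ran detection and passed
    console.log(`checked ${detection.ipAddress} at ${detection.time}`);
  }
  res.json({ ok: true });
});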

Ban Enforcement

When a request reaches banScore, the middleware records the ban in the banned table and responds with 403. The visitor's canary_id and IP address are both stored. On all future requests, the enableKnownBadIpsCheck checker reads banned.mmdb (compiled by bot-detector generate) and catches the IP in the cheap phase without re-running the full pipeline.

Firewall-Level Bans

Setting punishmentType.enableFireWallBan: true adds an OS-level block on top of the application-level 403. When a visitor is banned, the middleware issues:

sudo ufw insert 1 deny from <ip>

This blocks the IP at the network layer so subsequent connections from that address never reach the Node.js process. The firewall rule is permanent and survives application restarts.

Firewall bans require a Linux environment with ufw installed and passwordless sudo access for the Node.js process user. This setting has no effect on macOS or Windows and should not be enabled on those platforms.
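If you implement a similar OS-level ban yourself, validate the IP and pass it as a separate argument rather than interpolating it into a shell string, since an unvalidated value can become command injection. This is a hedged sketch: ufwDenyArgs() is a hypothetical helper, not how bot-detector issues the command.

```typescript
import { isIP } from 'node:net';

// Builds argv for `sudo ufw insert 1 deny from <ip>` as an array, suitable
// for execFile('sudo', args) with no shell involved. Anything that is not a
// literal IPv4 or IPv6 address is rejected.
function ufwDenyArgs(ip: string): string[] {
  if (isIP(ip) === 0) {
    throw new Error(`refusing to ban non-IP value: ${JSON.stringify(ip)}`);
  }
  return ['ufw', 'insert', '1', 'deny', 'from', ip];
}
```

The result would be passed as execFile('sudo', ufwDenyArgs(ip), callback) from node:child_process, which still requires the passwordless sudo access described above.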

Updating Records Manually

updateBannedIP upserts a row directly into the banned table. updateIsBot sets the is_bot flag on the visitors row identified by a canary cookie.

import { updateBannedIP, updateIsBot } from '@riavzon/bot-detector';
import type { BannedInfo } from '@riavzon/bot-detector';

// upsert a ban record
const info: BannedInfo = { score: 100, reasons: ['PREVIOUSLY_BANNED_IP'] };
await updateBannedIP('canary-cookie-value', '1.2.3.4', 'us', 'Mozilla/5.0 ...', info);

// set is_bot to false on the visitor record identified by this canary cookie
await updateIsBot(false, 'canary-cookie-value');

Persistent Visitor Records

The middleware writes a record to the visitors table for every unique canary_id it encounters. Each row stores the visitor's IP, accumulated score, reason codes, first-seen and last-seen timestamps, and an is_bot flag. These records are the source data for bot-detector generate, which compiles banned.mmdb and highRisk.mmdb.

Use warmUp() at server startup to prime the database connection pool before the first real request arrives. It runs a set of parallel SELECT 1 queries to open connections and a dummy visitor lookup to warm the query plan cache.
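Why the SELECT 1 queries run in parallel can be shown with a generic query function: issuing them together means none has finished when the next is requested, so a pool with spare capacity typically opens a new connection for each instead of reusing one idle connection. warmPool() and the Query type are hypothetical stand-ins for whatever driver you use; this is not warmUp()'s implementation.

```typescript
type Query = (sql: string) => Promise<unknown>;

// Issues `connections` SELECT 1 queries at once so they are all in flight
// together, rather than one after another on a single reused connection.
async function warmPool(query: Query, connections = 5): Promise<void> {
  await Promise.all(Array.from({ length: connections }, () => query('SELECT 1')));
}
```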

server.ts

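import express from 'express';
import { defineConfiguration, warmUp, detectBots } from '@riavzon/bot-detector';

const app = express();

// top-level await requires the file to run as an ES module
await defineConfiguration({ store: { main: { driver: 'sqlite', name: './bot-detector.db' } } });
await warmUp();

app.use(detectBots());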
