Configuration
defineConfiguration accepts a single configuration object. Only store.main is required; every other field has a default value. The configuration is validated against a Zod schema at startup, and invalid values cause the server to fail immediately with a clear error message.
await defineConfiguration({
  store: {
    main: { driver: 'sqlite', name: './bot-detector.db' },
  },
  // All other options are optional and have defaults
})
store
The store object configures the persistent database where visitor records and ban history are stored. Only store.main is required.
DbConfig is a discriminated union keyed on driver.
| Driver | driver value | Required peer dep |
|---|---|---|
| SQLite | sqlite | better-sqlite3 |
| MySQL connection pool | mysql-pool | mysql2 >= 3 |
| PostgreSQL | postgresql | pg |
| Cloudflare D1 | cloudflare-d1 | Worker environment binding |
| PlanetScale | planetscale | Serverless driver |
// SQLite: good default for single-server deployments
store: { main: { driver: 'sqlite', name: './bot-detector.db' } }
// MySQL pool
store: { main: { driver: 'mysql-pool', host: 'localhost', user: 'root', password: 'secret', database: 'mydb' } }
// PostgreSQL
store: { main: { driver: 'postgresql', connectionString: 'postgres://user:pass@localhost/mydb' } }
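To make the "discriminated union keyed on driver" idea concrete, here is a minimal sketch. The type names and the describeStore helper are hypothetical, and the field lists are inferred only from the three examples above; the library's real DbConfig may carry more fields per driver.

```typescript
// Hypothetical shapes inferred from the examples above, not the library's own types.
type SqliteConfig = { driver: 'sqlite'; name: string };
type MysqlPoolConfig = {
  driver: 'mysql-pool';
  host: string;
  user: string;
  password: string;
  database: string;
};
type PostgresConfig = { driver: 'postgresql'; connectionString: string };

type DbConfigSketch = SqliteConfig | MysqlPoolConfig | PostgresConfig;

// Switching on `driver` narrows the union, so each branch only sees
// the fields that belong to that driver.
function describeStore(config: DbConfigSketch): string {
  switch (config.driver) {
    case 'sqlite':
      return `sqlite file at ${config.name}`;
    case 'mysql-pool':
      return `mysql pool on ${config.host}/${config.database}`;
    case 'postgresql':
      return 'postgresql via connection string';
  }
}
```

Because the driver value selects the shape, passing a field that belongs to a different driver is rejected by the type checker before the configuration ever reaches runtime validation.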
storage
The storage field configures the cache layer where visitor state, behavioral rate counters, and session records are stored between requests. When omitted, everything is stored in process memory, which works for single-process deployments but is lost on restart and not shared across multiple instances.
CacheConfig is a discriminated union keyed on driver. Omit this field to use in-process memory.
| Driver | driver value | Notes |
|---|---|---|
| Memory (default) | (omit storage) | Single-process only, lost on restart |
| LRU cache | lru | In-process LRU; set max (item limit) and ttl (ms) |
| Redis | redis | Recommended for multi-instance deployments |
| Upstash Redis | upstash | Serverless Redis via HTTP |
| Filesystem | fs | Persistent local storage, good for development |
| Cloudflare KV (binding) | cloudflare-kv-binding | Pass binding from the Worker environment |
| Cloudflare KV (HTTP) | cloudflare-kv-http | Pass accountId, namespaceId, apiToken |
| Cloudflare R2 | cloudflare-r2-binding | Pass binding |
| Vercel | vercel | Vercel Runtime Cache |
// Redis: shared state across multiple app instances
storage: { driver: 'redis', host: 'localhost', port: 6379 }
// LRU: bounded in-process cache
storage: { driver: 'lru', max: 10000, ttl: 1000 * 60 * 10 }
Omitting storage or using an in-process driver (lru, fs) means each process maintains its own independent behavioral state. The rate tracking, velocity fingerprint, and session coherence checkers all key their cache entries by canary cookie. If a load balancer routes consecutive requests from the same visitor to different processes, those checkers will see incomplete histories and produce weaker or inconsistent signals. Configure a shared external driver such as redis or upstash so all instances read and write the same state.
banScore
Default: 100. The cumulative score threshold at which a visitor is banned. When a request's accumulated penalty points reach or exceed this value at any point in the pipeline, the middleware immediately responds with 403 and records the ban.
Lower values ban visitors more aggressively. A value of 30 would ban a visitor that triggers just two or three moderate checks, while the default of 100 requires several checks to fail before a ban is issued.
banScore: 75 // ban after accumulating 75 penalty points
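The threshold behavior described above can be pictured as a running total that is checked after every checker. This is only a sketch: the Checker type and runPipeline function are hypothetical stand-ins, not the middleware's internals.

```typescript
// Hypothetical checker: returns the penalty points it wants to add.
type Checker = () => number;

interface PipelineResult {
  score: number;
  banned: boolean;
  checkersRun: number;
}

// Adds penalties one checker at a time and stops as soon as the
// running total reaches or exceeds banScore.
function runPipeline(checkers: Checker[], banScore: number): PipelineResult {
  let score = 0;
  let checkersRun = 0;
  for (const check of checkers) {
    score += check();
    checkersRun += 1;
    if (score >= banScore) {
      return { score, banned: true, checkersRun };
    }
  }
  return { score, banned: false, checkersRun };
}
```

With three checkers that each add 40 points, banScore: 75 stops after the second checker, while the default of 100 needs all three to fire before the ban is issued.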
maxScore
Default: 100. The ceiling on the total score that any single request can accumulate. Penalty points beyond this value are ignored. In most configurations this matches banScore, but you can set it lower to keep extreme outliers from inflating scores beyond what is meaningful.
restoredReputationPoints
Default: 10. The number of points the reputation healer subtracts from a visitor's stored score after each clean request. A clean request is one that does not result in a ban. This gives legitimate visitors a path to recover from an initial high score caused by unusual network conditions.
For example, with restoredReputationPoints: 10 and an initial score of 40, a visitor needs four consecutive clean requests to reach a score of 0.
setNewComputedScore
Default: false. Controls how the computed bot score is written to the database on each request.
false: Snapshot then heal (default). The detector writes the computed score once on the visitor's first request (or after the cache expires). On every subsequent request, the reputation healer decrements the stored score by restoredReputationPoints. This mode is efficient because the expensive score computation only runs on cache misses.
true: Live snapshot. The detector overwrites the stored score on every request, and the reputation healer then immediately decrements it. The database always reflects the freshest computed risk for every visitor, at the cost of one extra database write per request.
Choose true when you need your dashboard or reporting tools to show the current risk score after every page view.
whiteList
Default: []. A list of IP addresses or CIDR strings that bypass the entire detection pipeline. Requests from these addresses skip all checkers and pass directly to the next handler. This is useful for internal monitoring tools, health check probes, or trusted partner IPs.
whiteList: ['127.0.0.1', '::1', '10.0.0.0/8']
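As a rough illustration of how a list that mixes plain addresses and CIDR strings can be matched, the sketch below handles exact matches for any entry and prefix matches for IPv4 ranges only. The helper names are hypothetical, and the middleware's own matcher may differ (for example by supporting IPv6 CIDR ranges).

```typescript
// Converts a dotted IPv4 string to its numeric value, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Sketch only: exact string match for plain entries, prefix match for IPv4 CIDR entries.
function isWhitelisted(ip: string, whiteList: string[]): boolean {
  return whiteList.some((entry) => {
    const slash = entry.indexOf('/');
    if (slash === -1) return entry === ip;

    const prefix = entry.slice(slash + 1);
    if (!/^\d{1,2}$/.test(prefix)) return false;
    const bits = Number(prefix);
    if (bits > 32) return false;

    const baseInt = ipv4ToInt(entry.slice(0, slash));
    const ipInt = ipv4ToInt(ip);
    if (baseInt === null || ipInt === null) return false;

    // Two addresses share a /bits prefix when they fall in the same block.
    const blockSize = 2 ** (32 - bits);
    return Math.floor(baseInt / blockSize) === Math.floor(ipInt / blockSize);
  });
}
```

With the example list above, 10.1.2.3 falls inside 10.0.0.0/8 and is allowed through, while 11.0.0.1 is not.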
checksTimeRateControl
Controls how often the full detection pipeline runs for returning visitors who are already cached.
checkEveryRequest (default: true). When true, runs the full pipeline on every request regardless of cache.
checkEvery (default: 300000). When checkEveryRequest is false, only re-runs the pipeline after this many milliseconds (5 minutes by default). A visitor whose result is cached passes through immediately until this interval elapses.
checksTimeRateControl: {
  checkEveryRequest: false,
  checkEvery: 1000 * 60 * 2, // re-check every 2 minutes
}
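The decision these two fields control can be sketched as a single predicate. The shouldRunPipeline function and the lastCheckedAt timestamp are hypothetical names used for illustration; only checkEveryRequest and checkEvery come from the configuration above.

```typescript
interface ChecksTimeRateControl {
  checkEveryRequest: boolean;
  checkEvery: number; // milliseconds
}

// Sketch: decides whether a returning visitor should go through the full pipeline.
// lastCheckedAt is undefined when there is no cached result for the visitor.
function shouldRunPipeline(
  lastCheckedAt: number | undefined,
  now: number,
  control: ChecksTimeRateControl,
): boolean {
  if (control.checkEveryRequest) return true;
  if (lastCheckedAt === undefined) return true;
  return now - lastCheckedAt >= control.checkEvery;
}
```

With the two-minute setting shown above, a visitor checked 30 seconds ago passes straight through, and the pipeline runs again once 120000 ms have elapsed.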
batchQueue
The batch queue collects visitor writes and flushes them to the database asynchronously. This decouples visitor persistence from the request path, so database latency never affects response time.

| Default | Description |
|---|---|
| 5000 | How often the queue flushes pending writes, in milliseconds. Increase this to reduce database load on busy servers. |
| 100 | Triggers an immediate flush when this many jobs are queued. Increase this if your batch sizes consistently hit the limit before the interval fires. |
| 3 | Retry attempts on a failed flush before the batch is discarded. |

punishmentType
Default: false. When true, issues a ufw OS-level firewall rule in addition to the 403 response. Banned IPs are blocked at the network layer via sudo ufw insert 1 deny from <ip>, preventing traffic from reaching the Node.js process on subsequent connections.
enableFireWallBan requires a Linux environment with ufw installed and passwordless sudo access for the Node.js process. It has no effect and should not be enabled on macOS or Windows.
logLevel
Default: 'info'. Sets the Pino log level for the middleware. Use 'debug' during development to see per-request checker decisions. Set 'warn' or 'error' in production to reduce log volume.
checkers
Every checker is enabled by default. To disable a checker entirely, pass { enable: false }. To adjust its penalty values, pass { enable: true, penalties: { ... } } with the fields you want to override; all unspecified fields keep their defaults.
checkers: {
  enableTorAnalysis: { enable: false }, // disable completely
  enableBehaviorRateCheck: {
    enable: true,
    behavioral_threshold: 20, // stricter rate limit
    penalties: 80, // heavier penalty
  },
}
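The "unspecified fields keep their defaults" rule can be illustrated with a shallow merge. This is a sketch under assumptions: withDefaults is a hypothetical helper, and whether the library merges overrides exactly this way is not stated here. The default values are the ones documented for enableBehaviorRateCheck.

```typescript
interface BehaviorRateOptions {
  enable: boolean;
  behavioral_window: number;
  behavioral_threshold: number;
  penalties: number;
}

// Defaults as documented for enableBehaviorRateCheck.
const behaviorRateDefaults: BehaviorRateOptions = {
  enable: true,
  behavioral_window: 60_000,
  behavioral_threshold: 30,
  penalties: 60,
};

// Hypothetical helper: later spreads win, so overrides replace defaults
// and every field missing from overrides keeps its default value.
function withDefaults<T extends object>(defaults: T, overrides: Partial<T>): T {
  return { ...defaults, ...overrides };
}

const effective = withDefaults(behaviorRateDefaults, {
  enable: true,
  behavioral_threshold: 20,
  penalties: 80,
});
```

Here effective.behavioral_window stays at 60000 because the override never mentions it. A shallow merge like this only covers top-level fields; a nested penalties: {} object would need the same treatment applied one level down.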
enableIpChecks
Phase: cheap
Validates that the client IP is a properly formatted IPv4 or IPv6 address. Requests with a malformed or missing IP receive a penalty. Automated scripts that manipulate the X-Forwarded-For header to produce an invalid IP value are caught here.
Default: 10. Applied when the IP is invalid or cannot be parsed.
enableGoodBotsChecks
Phase: cheap
Identifies legitimate crawlers such as Googlebot, Bingbot, DuckDuckBot, Apple, and Meta by matching the client IP against the compiled goodBots.mmdb. When a match is found, the request is immediately exempted from scoring.
banUnlistedBots controls what happens when a request presents a bot-like User-Agent that is not in the verified crawler list. When true (default), unlisted bots receive the full penalties score.
banUnlistedBots (default: true). When true, bots not present in the verified crawler database receive the full penalty score.
penalties (default: 100). Score applied to unlisted bots when banUnlistedBots is true.
enableBrowserAndDeviceChecks
Phase: cheap
Inspects the parsed User-Agent for browser and device combinations that are impossible or highly suspicious in a real-user context. Each condition carries its own penalty weight.
All weights below live inside the penalties: {} sub-object.

| Default | Description |
|---|---|
| 100 | User-Agent belongs to a CLI tool or HTTP library (curl, python-requests, etc.). |
| 100 | User-Agent identifies as Internet Explorer. |
| 10 | Desktop visit from Linux (elevated risk signal, not a ban signal alone). |
| 30 | Browser and OS combination that cannot exist in practice. |
| 10 | Browser type field could not be determined. |
| 10 | Browser name field could not be determined. |
| 10 | Desktop device type with no operating system in the UA. |
| 10 | Device vendor could not be determined. |
| 10 | Browser version could not be determined. |
| 5 | Device model could not be determined. |

localeMapsCheck
Phase: cheap
Compares the Accept-Language header with the geolocation of the client IP. A legitimate browser sends a language that matches the country the IP is registered in. Automated tools often send a hardcoded or missing Accept-Language.
All weights below live inside the penalties: {} sub-object.

| Default | Description |
|---|---|
| 20 | Accept-Language locale does not match the IP's geo locale. |
| 20 | Accept-Language header is absent. |
| 20 | Geo data is unavailable for the IP. |
| 30 | Accept-Language header is present but cannot be parsed. |

enableKnownThreatsDetections
Phase: cheap
Checks the client IP against the FireHOL threat intelligence feeds. FireHOL maintains four ranked lists of known malicious IPs, plus an anonymizer feed for VPNs and anonymizing proxies.
All weights below live inside the penalties: {} sub-object. The four threatLevels fields are nested one level deeper inside penalties.threatLevels: {}.

| Default | Description |
|---|---|
| 20 | IP is in the FireHOL anonymous (VPN/anonymizer) feed. |
| 40 | IP is in FireHOL level 1 (active attack sources). |
| 30 | IP is in FireHOL level 2 (current attack participants). |
| 20 | IP is in FireHOL level 3 (broader threat list). |
| 10 | IP is in FireHOL level 4 (extended watch list). |

enableAsnClassification
Phase: cheap
Checks the Autonomous System Number associated with the client IP. Hosting providers, cloud platforms, and data centers make up the majority of bot traffic. IPs from AS networks classified as hosting or content delivery receive a penalty. Networks with very few visible routes (low visibility) are also penalized, as they are characteristic of residential proxy services.
All weights below live inside the penalties: {} sub-object.

| Default | Description |
|---|---|
| 20 | ASN is classified as a hosting or CDN network. |
| 10 | ASN classification cannot be determined. |
| 10 | ASN has fewer routes visible than lowVisibilityThreshold. |
| 15 | Minimum route count before lowVisibilityPenalty applies. |
| 20 | ASN is both hosting-classified and low-visibility. |

enableTorAnalysis
Phase: cheap
Checks the client IP against the compiled Tor relay database. Tor exit nodes are the most likely to be used for malicious automation, while guard nodes and running relays carry lower risk. Obsolete Tor versions suggest a misconfigured or malicious relay.
All weights below live inside the penalties: {} sub-object.

| Default | Description |
|---|---|
| 15 | IP belongs to an active Tor relay. |
| 20 | IP is a Tor exit node. |
| 15 | IP is capable of exiting to web ports. |
| 10 | IP is a Tor guard (entry) node. |
| 40 | IP is flagged as a bad Tor exit. |
| 10 | IP is running an outdated Tor version. |

enableTimezoneConsistency
Phase: cheap
Compares the timezone sent in request headers against the timezone expected for the client's geo IP. A mismatch suggests the visitor is using a VPN or proxy that routes through a different region than their actual location.
Default: 20. Applied when the declared timezone does not match the geo timezone.
honeypot
Phase: cheap
Monitors requests to a configurable list of trap URLs. These paths serve no legitimate purpose and should never be visited by a real user. Automated scanners and vulnerability probes routinely request paths like /.env, /wp-login.php, and /admin. Any request to a listed path immediately sets the score to banScore, triggering a ban.
paths (default: []). List of URL paths to treat as honeypots. Any request matching one of these paths is banned immediately.
honeypot: {
  enable: true,
  paths: ['/.env', '/wp-login.php', '/admin', '/phpMyAdmin'],
}
enableKnownBadIpsCheck
Phase: cheap
Checks the client IP against your custom highRisk.mmdb, which is generated by bot-detector generate from your own visitor history. IPs that have previously accumulated a high suspicion score in your database are caught here on all future requests without re-running the full pipeline.
Default: 30. Score applied when the IP is found in highRisk.mmdb.
Run bot-detector generate periodically to keep this database current with your latest visitor data.
enableBehaviorRateCheck
Phase: heavy
Tracks request frequency per canary cookie. When a visitor sends more requests than behavioral_threshold within the behavioral_window time period, the excess is penalized. This catches fast-scanning bots and automated scripts that make hundreds of requests per minute.
behavioral_window (default: 60000). Sliding time window in milliseconds (1 minute by default).
behavioral_threshold (default: 30). Maximum requests allowed within the window before the penalty applies.
penalties (default: 60). Score applied when the threshold is exceeded.
enableBehaviorRateCheck: {
  enable: true,
  behavioral_window: 60_000, // 1-minute window
  behavioral_threshold: 15, // stricter: max 15 req/min
  penalties: 80,
}
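A sliding-window counter of the kind described above can be sketched in a few lines. The createRateCheck factory is hypothetical and keeps timestamps in an in-process Map for simplicity; the real checker stores its counters in the configured storage driver, and its exact bookkeeping is not documented here.

```typescript
// Sketch: returns a function that records one request for a canary cookie
// and reports the penalty to apply (0 when the visitor is under the threshold).
function createRateCheck(
  windowMs: number,
  threshold: number,
  penalty: number,
): (canaryId: string, now: number) => number {
  const hits = new Map<string, number[]>();

  return (canaryId, now) => {
    // Keep only the timestamps that still fall inside the window.
    const cutoff = now - windowMs;
    const recent = (hits.get(canaryId) ?? []).filter((t) => t > cutoff);
    recent.push(now);
    hits.set(canaryId, recent);

    // Penalize only when the visitor has sent more requests than allowed.
    return recent.length > threshold ? penalty : 0;
  };
}
```

Calling createRateCheck(60_000, 15, 80) mirrors the configuration above: the sixteenth request inside any one-minute span yields 80 points, and older requests stop counting once they slide out of the window.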
enableProxyIspCookiesChecks
Phase: heavy
Performs three related checks: proxy detection against the compiled proxy.mmdb, ISP classification for unknown or suspicious providers, and canary cookie presence. A missing canary cookie on a non-first request strongly suggests a bot that discards cookies between requests.
All weights below live inside the penalties: {} sub-object.

| Default | Description |
|---|---|
| 80 | canary_id cookie is absent on a non-first request. |
| 40 | IP is in the proxy database. |
| 50 | IP belongs to a known hosting or data-center provider. |
| 10 | ISP cannot be determined from the IP. |
| 10 | Organisation cannot be determined from the IP. |
| 10 | IP flagged by 2 to 3 proxy sources (cumulative risk bonus). |
| 20 | IP flagged by 4 or more proxy sources. |

enableUaAndHeaderChecks
Phase: heavy
Runs a multi-factor inspection of the User-Agent and HTTP headers. It detects headless browsers by looking for tell-tale header patterns, penalizes suspiciously short User-Agents, and checks basic TLS cipher, protocol version, and HTTP version consistency.
All weights below live inside the penalties: {} sub-object.

| Default | Description |
|---|---|
| 100 | Headers match known headless browser patterns (Puppeteer, Playwright, etc.). |
| 80 | User-Agent string is shorter than expected for a real browser. |
| 60 | The TLS cipher suite, TLS version, or HTTP protocol version forwarded by the proxy does not match what a real browser would use. Requires the proxy to set x-client-cipher and x-client-tls-version headers. |
| true | When true, the LMDB User-Agent pattern library is consulted and penalties from knownBadUserAgents apply. |

enableGeoChecks
Phase: heavy
Checks for missing or incomplete geolocation data and enforces country-level bans. Legitimate residential IPs consistently resolve to a full set of geo fields. Requests where city, region, country, timezone, or coordinates are unknown may originate from misconfigured VPNs, private IP ranges leaked through proxies, or IP addresses not present in the geo database.
bannedCountries accepts a list of ISO 3166-1 alpha-2 country codes. Any request from a banned country receives the full banScore, triggering an immediate ban.
bannedCountries is a top-level field on the checker. All geo unknown penalty weights live inside the penalties: {} sub-object.
bannedCountries (default: []). ISO 3166-1 alpha-2 country codes to ban outright. Example: ['KP', 'IR'].

| Default | Description |
|---|---|
| 10 | Country cannot be determined. |
| 10 | Region cannot be determined. |
| 10 | Coordinates cannot be determined. |
| 10 | District cannot be determined. |
| 10 | City cannot be determined. |
| 10 | Timezone cannot be determined. |
| 10 | Sub-region cannot be determined. |
| 10 | Phone prefix cannot be determined. |
| 10 | Continent cannot be determined. |

enableSessionCoherence
Phase: heavy
Inspects the Referer header for consistency with the request path and domain. Legitimate browsers include a Referer header on navigation requests and it consistently matches the current site's domain. Bots that crawl by constructing URLs directly often produce missing, mismatched, or cross-domain referers.
All weights below live inside the penalties: {} sub-object.

| Default | Description |
|---|---|
| 10 | Referer path does not match the expected navigation flow. |
| 20 | Referer is absent on a request that should have one. |
| 30 | Referer domain does not match the current site domain. |

enableVelocityFingerprint
Phase: heavy
Measures the statistical regularity of a visitor's inter-request timing using the coefficient of variation (CV). Human users have naturally irregular timing between requests. Automated scripts often produce highly regular intervals. When the CV of a visitor's request timing falls below cvThreshold, the request is penalized.
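The coefficient of variation is the standard deviation of the gaps between requests divided by their mean. The sketch below shows one way to compute it; the function names are hypothetical, the population standard deviation is an assumption, and requiring at least three timestamps is a choice made for this example rather than documented behavior.

```typescript
// Returns the coefficient of variation of inter-request intervals,
// or null when there is not enough data to judge regularity.
function coefficientOfVariation(timestamps: number[]): number | null {
  const intervals: number[] = [];
  let previous: number | null = null;
  for (const t of timestamps) {
    if (previous !== null) intervals.push(t - previous);
    previous = t;
  }
  if (intervals.length < 2) return null;

  const mean = intervals.reduce((sum, x) => sum + x, 0) / intervals.length;
  if (mean === 0) return null;

  const variance =
    intervals.reduce((sum, x) => sum + (x - mean) ** 2, 0) / intervals.length;
  return Math.sqrt(variance) / mean;
}

// Applies the penalty when timing is more regular than cvThreshold allows.
function velocityPenalty(timestamps: number[], cvThreshold: number, penalty: number): number {
  const cv = coefficientOfVariation(timestamps);
  return cv !== null && cv < cvThreshold ? penalty : 0;
}
```

Requests arriving exactly one second apart have a CV of 0 and are penalized, while a human-like sequence with gaps of 300, 2200, 400, and 4100 ms has a CV well above 0.1 and is not.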
cvThreshold (default: 0.1). Coefficient of variation below which the penalty applies. Lower values require more regularity to trigger.
Default: 40. Score applied when timing is unnaturally regular.
knownBadUserAgents
Phase: heavy
Cross-references the User-Agent string against an LMDB database of known malicious, scraper, and vulnerability-scanner patterns. Each pattern carries a severity level that maps to a separate penalty. This checker is the most comprehensive User-Agent check and is backed by a continuously updated pattern library.
All weights below live inside the penalties: {} sub-object.

| Default | Description |
|---|---|
| 100 | Pattern matches a known critical-severity bad User-Agent. |
| 80 | Pattern matches a high-severity bad User-Agent. |
| 30 | Pattern matches a medium-severity bad User-Agent. |
| 10 | Pattern matches a low-severity bad User-Agent. |

This checker only runs when enableUaAndHeaderChecks.penalties.badUaChecker is true (the default). Disabling that option bypasses this checker regardless of its own enable setting.
headerOptions
Fine-grained penalty weights for the HTTP header fingerprint analysis performed by enableUaAndHeaderChecks. Each weight corresponds to a specific header anomaly. The total accumulated weight across all detected anomalies is added to the request score.

| Default | Description |
|---|---|
| 20 | Each mandatory header that is absent for the declared browser type. |
| 30 | No browser engine header present. |
| 50 | Headers characteristic of Postman or Insomnia REST clients. |
| 30 | X-Requested-With: XMLHttpRequest present in a non-AJAX context. |
| 20 | Connection: close header, which browsers do not send by default. |
| 10 | Origin header is present but set to null. |
| 30 | Origin header domain does not match the request host. |
| 30 | Accept header is completely absent. |
| 30 | Client hint headers are absent for a Chromium (Blink) User-Agent. |
| 10 | TE header present in a Chromium request (Chromium does not send it). |
| 30 | Client hint headers are present for a Firefox (Gecko) User-Agent. |
| 20 | TE header absent for a Firefox User-Agent (Firefox sends it). |
| 15 | Cache-Control: no-cache or no-store on a GET request. |
| 10 | Cross-site navigation without a Referer header. |
| 20 | Sec-Fetch-Mode value inconsistent with the request type. |
| 40 | Host header does not match the configured server host. |

pathTraveler
Configuration for the path traversal detection logic, which catches requests attempting to access files outside the web root using encoded ../ sequences.
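Encoded traversal attempts are usually caught by decoding the path repeatedly until it stops changing, then looking for ../ in the result. The sketch below follows that idea using the default limits listed for this section. The three penalty property names are hypothetical, and the real detector's decoding rules may differ.

```typescript
interface PathTravelerSketch {
  maxIterations: number;
  maxPathLength: number;
  pathLengthPenalty: number; // hypothetical name
  decodeDepthPenalty: number; // hypothetical name
  traversalPenalty: number; // hypothetical name
}

const pathDefaults: PathTravelerSketch = {
  maxIterations: 3,
  maxPathLength: 1500,
  pathLengthPenalty: 100,
  decodeDepthPenalty: 100,
  traversalPenalty: 60,
};

// decodeURIComponent throws on malformed escapes; treat those as "no change".
function decodeOnce(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch {
    return value;
  }
}

function scorePath(rawPath: string, opts: PathTravelerSketch = pathDefaults): number {
  let score = 0;
  if (rawPath.length > opts.maxPathLength) score += opts.pathLengthPenalty;

  // Decode until the path stops changing or the pass budget is used up.
  let current = rawPath;
  let passes = 0;
  while (passes < opts.maxIterations) {
    const next = decodeOnce(current);
    if (next === current) break;
    current = next;
    passes += 1;
  }

  // Still decodable after the budget: the path needs more passes than allowed.
  if (passes === opts.maxIterations && decodeOnce(current) !== current) {
    score += opts.decodeDepthPenalty;
  }
  if (current.includes('../')) score += opts.traversalPenalty;
  return score;
}
```

A singly or doubly encoded ../ is unwrapped within the pass budget and scored as a traversal, while a path that is still encoded after three passes is penalized for its decoding depth instead.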
| Default | Description |
|---|---|
| 3 | Maximum decoding passes to attempt when looking for traversal sequences. |
| 1500 | Maximum raw path length in characters before a penalty applies. |
| 100 | Penalty applied when the path exceeds maxPathLength. |
| 100 | Penalty applied when the path requires more than maxIterations decoding passes. |
| 60 | Penalty applied when a ../ traversal sequence is found after decoding. |

generator
Controls how bot-detector generate (and the programmatic runGeneration()) compiles your visitor history into custom MMDB databases.

| Default | Description |
|---|---|
| 70 | Minimum suspicious_activity_score a visitor row must have to be included in highRisk.mmdb. Lowering it includes more visitors in the fast-rejection list; raising it makes the list more conservative. |
| false | When true, generates TypeScript type definitions alongside the MMDB files. |
| false | When true, deletes the source database rows after a successful compile. Useful for keeping your database lean, since banned rows and high-risk visitor records compiled into MMDB are no longer needed in SQL form. |
| 'mmdbctl' | Path to the mmdbctl binary. Override this when mmdbctl is not on the system PATH. |