Bot Detection

Data Sources

Overview of every compiled database that bot-detector loads at startup.

bot-detector init downloads threat intelligence from multiple public feeds and compiles it into MMDB and LMDB formats. All databases are loaded into memory at startup and queried synchronously during the cheap detection phase. The compiled files are written to _data-sources/ inside the package directory and are reloaded automatically when they change on disk.

Full schema documentation for each database record type, including every field returned by each lookup method, is covered in the Shield-base module reference.

Geo & Network

These databases resolve IP addresses to geographic and network metadata. The middleware merges results from all three before any checker runs, with city-level data taking precedence where both city and country records exist.
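The precedence rule above can be sketched as a plain object merge. This is only an illustration: the record shapes and the mergeGeo and dropUndefined helpers are hypothetical names, not the middleware's actual API, and the real field list is documented in the Shield-base module reference.

```typescript
// Hypothetical record shapes, reduced to a few fields for illustration.
type CountryRecord = { country?: string; isoCode?: string; continent?: string };
type CityRecord = { city?: string; region?: string; timezone?: string; country?: string };
type AsnRecord = { asn?: number; organization?: string; classification?: string };
type GeoData = CountryRecord & CityRecord & AsnRecord;

// Remove undefined values so an empty city-level field never hides a country-level one.
function dropUndefined<T extends object>(record: T | null): Partial<T> {
  if (!record) return {};
  return Object.fromEntries(
    Object.entries(record).filter(([, value]) => value !== undefined),
  ) as Partial<T>;
}

// Later spreads win: country is the base, city overrides it, ASN fields are added last.
function mergeGeo(
  asn: AsnRecord | null,
  city: CityRecord | null,
  country: CountryRecord | null,
): GeoData {
  return { ...dropUndefined(country), ...dropUndefined(city), ...dropUndefined(asn) };
}
```

With this ordering, a lookup that has no city record still yields country-level fields, and a city record only replaces the fields it actually carries.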

ASN (asn.mmdb)

Maps IP ranges to Autonomous System Numbers using BGP routing data sourced from BGP.tools. Each record includes the ASN identifier, organization name, and a network classification of "Content" (hosting/CDN), "Eyeballs" (residential/business), or "Unknown". The classification is the primary signal for the ASN checker and for the ctx.geoData.hosting field.

City (city.mmdb)

City-level geolocation built from a merged dataset covering city, region, country, postal code, timezone, coordinates, and locale fields. This database provides the majority of geo fields in ctx.geoData.

Country (country.mmdb)

Country-level geolocation used as a fallback when city-level data is unavailable. Records include country name, ISO code, continent, and top-level domain.

Threat Intelligence

These databases identify IPs with a known history of malicious activity. They are compiled from a combination of public threat feeds and anonymity network lists.

Proxy (proxy.mmdb)

A merged list of known proxy and anonymizer IPs aggregated from multiple public sources. Each record carries a comment field listing the source feed names that flagged the IP, which the middleware uses to compute multi-source risk bonuses.
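As a rough sketch of how a multi-source bonus could be derived from the comment field: the comma delimiter, the function names, and every numeric value below are assumptions made for illustration, not the middleware's actual scoring.

```typescript
// Assumption: feed names in the comment field are separated by commas.
function parseSources(comment: string): string[] {
  const names = comment
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  // Count each feed once, even if it is listed twice.
  return names.filter((name, index) => names.indexOf(name) === index);
}

// Hypothetical scoring: a base penalty plus a fixed bonus per additional feed, capped.
function proxyPenalty(
  comment: string,
  base = 20,
  bonusPerExtraSource = 5,
  cap = 40,
): number {
  const extraSources = Math.max(0, parseSources(comment).length - 1);
  return Math.min(cap, base + extraSources * bonusPerExtraSource);
}
```

An IP flagged by a single feed gets only the base penalty, while agreement between several independent feeds raises the score up to the cap.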

Tor Relays (tor.mmdb)

Built from the Onionoo dataset. Records include relay type (exit, guard, middle), exit probability, guard probability, version status, and exit policy. Exit nodes carry the highest penalty; guard nodes and running relays carry lower risk.

FireHOL Anonymous (firehol_anonymous.mmdb)

FireHOL's anonymity feed covering VPNs, open proxies, and Tor exit nodes that are not already in the Tor database. Matching this database sets ctx.anon to true.

FireHOL Level 1 (firehol_l1.mmdb)

Active attack sources. FireHOL maintains this list with a strict no-false-positives policy. An IP in level 1 is a confirmed threat.

FireHOL Level 2 (firehol_l2.mmdb)

Current abuse participants: scanners, brute-force sources, and spam senders. Broader than level 1 and updated frequently.

FireHOL Level 3 (firehol_l3.mmdb)

Broader web threat aggregation combining multiple web attack, exploit, and scanning lists.

FireHOL Level 4 (firehol_l4.mmdb)

Extended watch list with more relaxed inclusion criteria. Higher false-positive rate than levels 1–3 but useful as a low-weight signal.
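One way to reflect the differing confidence of the four FireHOL levels is a per-level weight table. The sketch below is hypothetical: the weights and the fireholPenalty helper are illustrative and do not reflect the package's defaults.

```typescript
type FireholLevel = 1 | 2 | 3 | 4;

// Hypothetical weights: level 1 is the strictest list, level 4 the noisiest.
const FIREHOL_WEIGHTS: Record<FireholLevel, number> = { 1: 60, 2: 35, 3: 25, 4: 10 };

// Take the strongest matched signal instead of summing, so an IP that appears
// on several overlapping lists is not counted more than once.
function fireholPenalty(matchedLevels: FireholLevel[]): number {
  let penalty = 0;
  for (const level of matchedLevels) {
    penalty = Math.max(penalty, FIREHOL_WEIGHTS[level]);
  }
  return penalty;
}
```

Taking the maximum is a design choice for this sketch; a real configuration might instead sum weights or treat a level 1 match as an immediate block.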

Verified Crawlers

Good Bots (goodBots.mmdb)

Compiled IP ranges for legitimate search engine and platform crawlers: Googlebot, Bingbot, DuckDuckBot, Yandex, Apple, Meta, and others. When a client IP matches this database, the middleware exempts the request from scoring. The exemption is not granted on the IP match alone: the enableGoodBotsChecks checker first triggers DNS verification.
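The exact DNS verification is not described here. A common technique for crawler verification is forward-confirmed reverse DNS, sketched below on the assumption that something similar is used. The function name, the suffix list, and the injected forwardLookup callback are all hypothetical; actual DNS resolution is left out so the logic stays self-contained.

```typescript
// Illustrative suffix list; a real deployment would cover every supported crawler.
const CRAWLER_SUFFIXES = [".googlebot.com", ".google.com", ".search.msn.com"];

function isVerifiedCrawler(
  clientIp: string,
  ptrHostnames: string[], // hostnames returned by a reverse (PTR) lookup of clientIp
  forwardLookup: (hostname: string) => string[], // resolves a hostname to its IPs
  suffixes: string[] = CRAWLER_SUFFIXES,
): boolean {
  for (const raw of ptrHostnames) {
    // Normalise case and drop the trailing dot of a fully qualified name.
    const hostname = raw.toLowerCase().replace(/\.$/, "");
    const suffixMatches = suffixes.some((suffix) => hostname.endsWith(suffix));
    // The hostname must belong to the crawler's domain AND resolve back to the same IP.
    if (suffixMatches && forwardLookup(hostname).includes(clientIp)) {
      return true;
    }
  }
  return false;
}
```

The forward step matters because anyone who controls reverse DNS for their own address space can publish a PTR record that claims to be a crawler hostname.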

Behavioral Fingerprinting

These databases are stored in LMDB format and queried by string key rather than by IP.

User-Agent Patterns (useragent.mdb)

An LMDB key-value database mapping known malicious, scraper, and vulnerability scanner User-Agent strings to severity levels (critical, high, medium, low). Used by the knownBadUserAgents checker, which reads each pattern at request time and applies the corresponding penalty weight from your configuration.
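The lookup can be pictured as follows, with an in-memory Map standing in for the LMDB store. The sample patterns, the substring matching, the weight values, and the userAgentPenalty name are assumptions for illustration; the real checker takes its weights from your configuration.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Stand-in for useragent.mdb: pattern -> severity (sample entries only).
const UA_PATTERNS = new Map<string, Severity>([
  ["sqlmap", "critical"],
  ["nikto", "high"],
  ["python-requests", "low"],
]);

// Hypothetical penalty weights per severity level.
const SEVERITY_WEIGHTS: Record<Severity, number> = {
  critical: 100,
  high: 60,
  medium: 30,
  low: 10,
};

// Return the penalty of the most severe pattern found in the User-Agent, or 0 if none match.
function userAgentPenalty(
  userAgent: string,
  patterns: Map<string, Severity> = UA_PATTERNS,
  weights: Record<Severity, number> = SEVERITY_WEIGHTS,
): number {
  const lowered = userAgent.toLowerCase();
  let worst = 0;
  patterns.forEach((severity, pattern) => {
    if (lowered.includes(pattern)) {
      worst = Math.max(worst, weights[severity]);
    }
  });
  return worst;
}
```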

Custom Generated

These two databases do not exist after a fresh init. They are compiled by bot-detector generate (or programmatically via runGeneration()) from your own visitor history and grow as your application accumulates data.

Banned IPs (banned.mmdb)

Built from all rows in your banned table that have a non-null IP address. Once compiled, previously banned visitors are checked in the cheap phase via enableKnownBadIpsCheck, so repeat offenders are rejected quickly without re-running the full pipeline.

High Risk IPs (highRisk.mmdb)

Built from rows in your visitors table where suspicious_activity_score is greater than or equal to generator.scoreThreshold (default 70). IPs in this database receive the highRiskPenalty in the cheap phase on all future requests.
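The selection rule amounts to a threshold filter over visitor rows. In the sketch below, the VisitorRow shape (including the ip column name) and the selectHighRiskIps helper are hypothetical; only suspicious_activity_score and the default threshold of 70 come from the description above.

```typescript
// Hypothetical row shape; only suspicious_activity_score is named in the docs.
type VisitorRow = { ip: string | null; suspicious_activity_score: number };

// Collect the distinct IPs whose score is at or above the threshold.
function selectHighRiskIps(rows: VisitorRow[], scoreThreshold = 70): string[] {
  const ips = new Set<string>();
  for (const row of rows) {
    if (row.ip !== null && row.suspicious_activity_score >= scoreThreshold) {
      ips.add(row.ip);
    }
  }
  return Array.from(ips);
}
```

Because the comparison is inclusive, a visitor scoring exactly 70 is included under the default threshold.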

Run bot-detector generate after bulk ban operations and on a regular schedule to keep both custom databases current. The enableKnownBadIpsCheck checker only reads these databases if they exist; missing files are skipped silently.
