Programmatic Usage
Every data source and compiler in Shield Base is exported as a typed function from @riavzon/shield-base. You can import and call them directly without using the CLI.
Installation
pnpm add @riavzon/shield-base
yarn add @riavzon/shield-base
npm install @riavzon/shield-base
bun add @riavzon/shield-base
Running All Sources
The generateData function runs the full pipeline in parallel and compiles every built-in data source in one call:
import { generateData as executeAll } from '@riavzon/shield-base';
const outputDirectory = './data/mmdb';
const contactInfo = 'Your Name https://example.com - you@example.com';
const mmdbPath = 'mmdbctl'; // or absolute path to the binary
const selectedSources = true; // true = all FireHOL levels included
await executeAll(outputDirectory, contactInfo, selectedSources, mmdbPath);
selectedSources accepts true to include all FireHOL lists, or an array of FireHOL list IDs to include specific ones:
const selectedSources = [
"firehol_anonymous",
"firehol_l1",
"firehol_l2",
"firehol_l3",
"firehol_l4"
]
Running Sources Individually
Each data source has its own exported function. You can run them individually or compose them with Promise.allSettled for parallel execution:
import {
getBGPAndASN,
buildCitiesData,
getGeoDatas,
getListOfProxies,
getTorLists,
getThreatLists,
getCrawlersIps,
getUserAgentLmdbList,
getDisposableEmailLmdbList,
} from '@riavzon/shield-base';
const output = './data/mmdb';
const mmdbPath = 'mmdbctl';
const contactInfo = 'Your Name https://example.com - you@example.com';
// Run all in parallel
const results = await Promise.allSettled([
getBGPAndASN(contactInfo, output, mmdbPath),
buildCitiesData(output, mmdbPath),
getGeoDatas(output, mmdbPath),
getListOfProxies(output, mmdbPath),
getTorLists(output, mmdbPath),
getThreatLists(output, mmdbPath, true),
getCrawlersIps(output, mmdbPath),
getUserAgentLmdbList(output),
getDisposableEmailLmdbList(output),
]);
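Promise.allSettled never rejects, so a failed source is only visible by inspecting the results array. The sketch below shows one way to report which sources failed. summarizeSettled and its label list are hypothetical helpers, not part of @riavzon/shield-base, and the labels are assumed to be given in the same order as the promises.

```typescript
// Hypothetical helper, not part of @riavzon/shield-base.
// Pairs each settled result with a label supplied in the same order as the promises.
interface SettledSummary {
  succeeded: string[];
  failed: { label: string; reason: string }[];
}

function summarizeSettled(
  labels: string[],
  results: PromiseSettledResult<unknown>[],
): SettledSummary {
  const summary: SettledSummary = { succeeded: [], failed: [] };
  results.forEach((result, index) => {
    // Fall back to a positional label if fewer labels than results were given.
    const label = labels[index] ?? `source #${index}`;
    if (result.status === 'fulfilled') {
      summary.succeeded.push(label);
    } else {
      const reason =
        result.reason instanceof Error ? result.reason.message : String(result.reason);
      summary.failed.push({ label, reason });
    }
  });
  return summary;
}
```

Passing the results array from the example above together with labels such as 'asn', 'city', and 'country' yields a failed list that names each source worth retrying.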
Function Reference
| Function | Output file | Notes |
|---|---|---|
| getBGPAndASN(contact, output, mmdbPath) | asn.mmdb | contact is the BGP.tools User-Agent string |
| buildCitiesData(output, mmdbPath) | city.mmdb | |
| getGeoDatas(output, mmdbPath) | country.mmdb | |
| getListOfProxies(output, mmdbPath) | proxy.mmdb | |
| getTorLists(output, mmdbPath) | tor.mmdb | |
| getThreatLists(output, mmdbPath, sources) | firehol_*.mmdb | sources is true or a string array of list IDs |
| getCrawlersIps(output, mmdbPath, customUrls?) | goodBots.mmdb | customUrls is an optional ProvidersLists[] |
| getUserAgentLmdbList(output) | useragent-db/useragent.mdb | No mmdbPath needed |
| getDisposableEmailLmdbList(output) | email-db/disposable-emails.mdb | No mmdbPath needed |
The LMDB functions (getUserAgentLmdbList, getDisposableEmailLmdbList) do not require the mmdbctl binary.
Custom Crawler URLs
getCrawlersIps accepts an optional array of custom provider definitions. The array is merged with the built-in providers, and the result is compiled into a single goodBots.mmdb:
import { getCrawlersIps } from '@riavzon/shield-base';
import type { ProvidersLists } from '@riavzon/shield-base';
const customProviders: ProvidersLists[] = [
{
name: 'cloudflare', // Stored as the provider field in the database
type: 'JSON', // 'JSON' | 'CSV' | 'HTML'
urls: [
'https://www.cloudflare.com/ips-v4',
'https://www.cloudflare.com/ips-v6',
],
},
];
await getCrawlersIps('./data/mmdb', 'mmdbctl', customProviders);
Refreshing Data
The restartData function re-downloads and recompiles your previously compiled data sources using the cached configuration:
import { restartData } from '@riavzon/shield-base';
const outputDirectory = './data/mmdb';
const refreshAll = true; // true = refresh all cached sources
await restartData(outputDirectory, refreshAll);
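If you call restartData on a schedule, one way to avoid unnecessary downloads is to check the age of a compiled file first. The sketch below is an illustration only: isStale is a hypothetical helper, not part of @riavzon/shield-base, and the 24-hour threshold and file path are assumptions.

```typescript
import { statSync } from 'node:fs';

// Hypothetical helper, not part of @riavzon/shield-base.
// Returns true when the file is missing or was last modified more than maxAgeMs ago.
function isStale(filePath: string, maxAgeMs: number, now: number = Date.now()): boolean {
  try {
    const { mtimeMs } = statSync(filePath);
    return now - mtimeMs > maxAgeMs;
  } catch {
    // A file that cannot be stat'ed (for example, not compiled yet) counts as stale.
    return true;
  }
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000; // assumed refresh interval

if (isStale('./data/mmdb/asn.mmdb', ONE_DAY_MS)) {
  console.log('asn.mmdb is missing or older than 24 hours; call restartData here.');
}
```

The check uses the file's modification time, so it reflects when the database was last compiled rather than when the upstream source last changed.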
Reading Compiled Databases
MMDB Databases
You can read from a compiled database via the command line with mmdbctl:
mmdbctl read -f json-pretty 8.8.8.8 outputDirectory/asn.mmdb
Or with a specialized library such as mmdb-lib, @maxmind/geoip2-node, or maxmind.
Example:
import fs from 'fs';
import * as mmdb from 'mmdb-lib';
import type { BgpRecord } from '@riavzon/shield-base';
const db = fs.readFileSync('./data/mmdb/asn.mmdb');
const reader = new mmdb.Reader<BgpRecord>(db);
const result = reader.get('8.8.8.8');
console.log(result);
// { asn_id: '15169', asn_name: 'Google LLC', classification: 'Content', ... }
Or open the same file with @maxmind/geoip2-node:
import { Reader } from '@maxmind/geoip2-node';
const geoReader = await Reader.open('./data/mmdb/asn.mmdb');
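A lookup can come back empty when no record covers the address; mmdb-lib's get returns null in that case. The sketch below handles both outcomes for the ASN record shown in the mmdb-lib example. AsnFields is a hypothetical subset containing only the three fields visible in the sample output, not the package's BgpRecord type, and describeAsn is an illustrative helper.

```typescript
// Hypothetical subset of the ASN record, limited to the fields visible in the sample output.
interface AsnFields {
  asn_id: string;
  asn_name: string;
  classification: string;
}

// Formats a lookup result, treating null as "no record for this address".
function describeAsn(ip: string, record: AsnFields | null): string {
  if (record === null) {
    return `${ip}: no ASN record found`;
  }
  return `${ip}: AS${record.asn_id} ${record.asn_name} (${record.classification})`;
}

console.log(
  describeAsn('8.8.8.8', {
    asn_id: '15169',
    asn_name: 'Google LLC',
    classification: 'Content',
  }),
);
console.log(describeAsn('192.0.2.1', null));
```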
LMDB Databases
You can query any .mdb database from the command line using the lm-read subcommand, or read it programmatically.
For production use, prefer the lmdb-js library directly. It is fully ACID compliant and gives you full control over iteration, transactions, and more:
import { open } from 'lmdb';
const db = open({
path: './data/mmdb/useragent-db/useragent.mdb',
name: 'useragent',
readOnly: true,
useVersions: true,
compression: true,
sharedStructuresKey: Symbol.for('structures'),
});
const record = db.get('sqlmap');
db.close();
The exported reader functions are convenience wrappers suited for scripting and inspection:
import {
getByKey,
getRange,
getByPrefix,
countRecords,
doesExist,
} from '@riavzon/shield-base';
const dbPath = './data/mmdb/useragent-db/useragent.mdb';
const dbName = 'useragent';
const record = getByKey(dbPath, dbName, 'sqlmap');
const first10 = getRange(dbPath, dbName, 10);
const curlMatches = getByPrefix(dbPath, dbName, 'curl', 20);
const total = countRecords(dbPath, dbName);
const exists = doesExist(dbPath, dbName, 'nmap-scripting-engine');
See the API Reference for full signatures and return types.
Running Shell Commands
The run utility executes arbitrary shell commands and returns the output. It is used internally by some scripts to invoke mmdbctl:
import { run } from '@riavzon/shield-base';
await run('mmdbctl read -f json-pretty 8.8.8.8 ./asn.mmdb');
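Because run takes a command string, any value interpolated into it reaches the shell. The sketch below validates an IP address and a database path before building the mmdbctl command from the example above. buildLookupCommand is a hypothetical helper, not part of @riavzon/shield-base, and the set of allowed path characters is an assumption chosen for illustration.

```typescript
import { isIP } from 'node:net';

// Hypothetical helper, not part of @riavzon/shield-base.
// Builds the mmdbctl lookup command only from validated parts.
function buildLookupCommand(ip: string, dbPath: string): string {
  // isIP returns 4 or 6 for valid addresses and 0 otherwise.
  if (isIP(ip) === 0) {
    throw new Error(`Not a valid IP address: ${ip}`);
  }
  // Assumed allow-list: letters, digits, underscore, dot, slash, and dash.
  if (!/^[\w.\/-]+$/.test(dbPath)) {
    throw new Error(`Unsupported characters in database path: ${dbPath}`);
  }
  return `mmdbctl read -f json-pretty ${ip} ${dbPath}`;
}

const command = buildLookupCommand('8.8.8.8', './asn.mmdb');
console.log(command);
```

The resulting string can then be passed to run. Rejecting unexpected input up front is simpler than trying to escape it for the shell afterwards.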