Programmatic Usage

Use Shield Base as a library to call individual data source scripts and the compiler directly from your own build pipelines and scripts.

Every data source and compiler in Shield Base is exported as a typed function from @riavzon/shield-base. You can import and call them directly without using the CLI.


Installation

pnpm add @riavzon/shield-base

Running All Sources

The generateData function runs the full pipeline in parallel and compiles every built-in data source in one call:

build.ts
import { generateData as executeAll } from '@riavzon/shield-base';

const outputDirectory = './data/mmdb';
const contactInfo = 'Your Name https://example.com - you@example.com';
const mmdbPath = 'mmdbctl'; // or absolute path to the binary
const selectedSources = true; // true = all FireHOL levels included

await executeAll(outputDirectory, contactInfo, selectedSources, mmdbPath);

selectedSources accepts true to include all FireHOL lists, or an array of FireHOL list IDs to include specific ones:

build.ts
const selectedSources = [
    "firehol_anonymous",
    "firehol_l1",
    "firehol_l2",
    "firehol_l3",
    "firehol_l4"
]
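
When the selection comes from outside the script, such as a CI variable, it has to be normalized into one of those two shapes first. The sketch below shows one possible way to do that. The FIREHOL_LISTS variable name and the parseFireholSelection helper are hypothetical and not part of Shield Base.

```typescript
// Hypothetical helper, not part of @riavzon/shield-base: turns a
// comma-separated string into the `true | string[]` shape described above.
type FireholSelection = true | string[];

function parseFireholSelection(raw: string | undefined): FireholSelection {
  const value = (raw ?? '').trim();
  // Treat a missing value, an empty value, or "all" as "include every list".
  if (value === '' || value.toLowerCase() === 'all') return true;
  const ids = value
    .split(',')
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  // A string made only of commas and spaces also falls back to "all".
  return ids.length > 0 ? ids : true;
}

// FIREHOL_LISTS is an assumed variable name, e.g. "firehol_l1,firehol_l2".
const selectedFromEnv = parseFireholSelection(process.env.FIREHOL_LISTS);
console.log(selectedFromEnv);
```

The result can then be passed wherever selectedSources is expected. Note that the helper does not check that the IDs are real FireHOL list names.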

Running Sources Individually

Each data source has its own exported function. You can run them individually or compose them with Promise.allSettled for parallel execution:

build.ts
import {
  getBGPAndASN,
  buildCitiesData,
  getGeoDatas,
  getListOfProxies,
  getTorLists,
  getThreatLists,
  getCrawlersIps,
  getUserAgentLmdbList,
  getDisposableEmailLmdbList,
} from '@riavzon/shield-base';

const output = './data/mmdb';
const mmdbPath = 'mmdbctl';
const contactInfo = 'Your Name https://example.com - you@example.com';

// Run all in parallel
const results = await Promise.allSettled([
  getBGPAndASN(contactInfo, output, mmdbPath),
  buildCitiesData(output, mmdbPath),
  getGeoDatas(output, mmdbPath),
  getListOfProxies(output, mmdbPath),
  getTorLists(output, mmdbPath),
  getThreatLists(output, mmdbPath, true),
  getCrawlersIps(output, mmdbPath),
  getUserAgentLmdbList(output),
  getDisposableEmailLmdbList(output),
]);
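
Promise.allSettled never rejects, so a failed source only shows up as a rejected entry in results. A minimal sketch of reporting those failures follows. The reportFailures helper and the label array are illustrative and not part of Shield Base, and the demo uses stand-in promises instead of the real data source calls.

```typescript
// Illustrative helper, not part of @riavzon/shield-base: pairs each settled
// result with a label and returns one message per rejected promise.
function reportFailures(
  labels: string[],
  results: PromiseSettledResult<unknown>[],
): string[] {
  const failures: string[] = [];
  results.forEach((result, index) => {
    if (result.status === 'rejected') {
      const reason =
        result.reason instanceof Error ? result.reason.message : String(result.reason);
      failures.push(`${labels[index] ?? `task ${index}`}: ${reason}`);
    }
  });
  return failures;
}

// Stand-in promises; in a real build these would be the data source calls.
async function demo(): Promise<void> {
  const settled = await Promise.allSettled([
    Promise.resolve('ok'),
    Promise.reject(new Error('download failed')),
  ]);
  for (const line of reportFailures(['asn', 'city'], settled)) {
    console.log(line);
  }
}

void demo();
```

In a build script, a non-empty failure list is a natural place to set a non-zero exit code so that CI marks the run as failed.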

Function Reference

| Function | Output file | Notes |
| --- | --- | --- |
| getBGPAndASN(contact, output, mmdbPath) | asn.mmdb | contact is the BGP.tools User-Agent string |
| buildCitiesData(output, mmdbPath) | city.mmdb | |
| getGeoDatas(output, mmdbPath) | country.mmdb | |
| getListOfProxies(output, mmdbPath) | proxy.mmdb | |
| getTorLists(output, mmdbPath) | tor.mmdb | |
| getThreatLists(output, mmdbPath, sources) | firehol_*.mmdb | sources is true or a string array of list IDs |
| getCrawlersIps(output, mmdbPath, customUrls?) | goodBots.mmdb | customUrls is optional ProvidersLists[] |
| getUserAgentLmdbList(output) | useragent-db/useragent.mdb | No mmdbPath needed |
| getDisposableEmailLmdbList(output) | email-db/disposable-emails.mdb | No mmdbPath needed |

LMDB data sources (getUserAgentLmdbList, getDisposableEmailLmdbList) do not require the mmdbctl binary.
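
Since each function writes a known file, a build script can check afterwards that everything it expects is on disk. The sketch below assumes the files are written directly under the output directory using the names listed in the reference above. The findMissingOutputs helper is hypothetical and not part of Shield Base.

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Fixed output names from the function reference. The firehol_*.mmdb files
// are left out because their names depend on the selected lists.
const EXPECTED_OUTPUTS = [
  'asn.mmdb',
  'city.mmdb',
  'country.mmdb',
  'proxy.mmdb',
  'tor.mmdb',
  'goodBots.mmdb',
  'useragent-db/useragent.mdb',
  'email-db/disposable-emails.mdb',
];

// Hypothetical helper: returns the expected outputs that are not on disk.
function findMissingOutputs(outputDirectory: string): string[] {
  return EXPECTED_OUTPUTS.filter((file) => !existsSync(join(outputDirectory, file)));
}

console.log(findMissingOutputs('./data/mmdb'));
```

An empty array means every fixed-name output exists. The check only tests for presence, not for whether a file is a valid database.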

Custom Crawler URLs

getCrawlersIps accepts an optional array of custom provider definitions. They are merged with the built-in providers and compiled into a single goodBots.mmdb:

build.ts
import { getCrawlersIps } from '@riavzon/shield-base';
import type { ProvidersLists } from '@riavzon/shield-base';

const customProviders: ProvidersLists[] = [
  {
    name: 'cloudflare',       // Stored as the provider field in the database
    type: 'HTML',             // 'JSON' | 'CSV' | 'HTML'; these URLs serve plain text
    urls: [
      'https://www.cloudflare.com/ips-v4',
      'https://www.cloudflare.com/ips-v6',
    ],
  },
];

await getCrawlersIps('./data/mmdb', 'mmdbctl', customProviders);

The type field determines how the fetched data is parsed, so retrieval succeeds only when it matches what the URLs actually return. Use HTML for a regular HTML, Markdown, or other raw-text page, CSV for a link to a CSV file, and JSON for a JSON document (e.g., https://developers.google.com/static/search/apis/ipranges/googlebot.json). All URLs in one provider must share the same format: a provider whose urls mix CSV, JSON, or raw text will fail to process. Visit the built-in providers or check the source code to get an idea of what the parsing engine expects.
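
A quick pre-flight check can catch a provider whose URLs do not match its declared type. The sketch below is only a heuristic based on the URL path extension. It is not how Shield Base itself detects formats, and ProviderSketch is a local stand-in that mirrors only the fields shown in the example above, not the real ProvidersLists type.

```typescript
// Local stand-in with the three fields used in the example above; the real
// ProvidersLists type is exported by @riavzon/shield-base.
type ProviderType = 'JSON' | 'CSV' | 'HTML';

interface ProviderSketch {
  name: string;
  type: ProviderType;
  urls: string[];
}

// Heuristic guess from the URL path extension. Assumption: anything that is
// not clearly .json or .csv is treated as a raw-text or HTML page.
function guessType(url: string): ProviderType {
  const path = new URL(url).pathname.toLowerCase();
  if (path.endsWith('.json')) return 'JSON';
  if (path.endsWith('.csv')) return 'CSV';
  return 'HTML';
}

// Returns the URLs whose guessed format disagrees with the declared type.
function findMismatchedUrls(provider: ProviderSketch): string[] {
  return provider.urls.filter((url) => guessType(url) !== provider.type);
}

const googlebot: ProviderSketch = {
  name: 'googlebot',
  type: 'JSON',
  urls: ['https://developers.google.com/static/search/apis/ipranges/googlebot.json'],
};
console.log(findMismatchedUrls(googlebot)); // prints []
```

A non-empty result is a hint to split the provider into one entry per format. Because the guess relies on file extensions, a JSON endpoint without a .json suffix would be flagged incorrectly, so treat the output as a warning rather than a hard error.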

Refreshing Data

The restartData function re-downloads and recompiles your previously compiled data sources using the cached configuration:

refresh.ts
import { restartData } from '@riavzon/shield-base';

const outputDirectory = './data/mmdb';
const refreshAll = true; // true = refresh all cached sources

await restartData(outputDirectory, refreshAll);

Reading Compiled Databases

MMDB Databases

You can read from a compiled database via the command line with mmdbctl:

mmdbctl read -f json-pretty 8.8.8.8 outputDirectory/asn.mmdb

You can also read it programmatically with a library such as mmdb-lib, @maxmind/geoip2-node, or maxmind.

Example:

read-mmdb.ts
import fs from 'fs';
import * as mmdb from 'mmdb-lib';
import type { BgpRecord } from '@riavzon/shield-base';

const db = fs.readFileSync('./data/mmdb/asn.mmdb');
const reader = new mmdb.Reader<BgpRecord>(db);

const result = reader.get('8.8.8.8');
console.log(result);
// { asn_id: '15169', asn_name: 'Google LLC', classification: 'Content', ... }

read-mmdb-alt.ts
import { Reader } from '@maxmind/geoip2-node';

const reader = await Reader.open('./data/mmdb/asn.mmdb');

LMDB Databases

You can query any .mdb database from the command line using the lm-read subcommand, or read it programmatically.

For production use, use the lmdb-js library directly. It is fully ACID compliant and gives you full control over iteration, transactions, and much more:

read-lmdb.ts
import { open } from 'lmdb';

const db = open({
  path: './data/mmdb/useragent-db/useragent.mdb',
  name: 'useragent',
  readOnly: true,
  useVersions: true,
  compression: true,
  sharedStructuresKey: Symbol.for('structures'),
});

const record = db.get('sqlmap');
db.close();

The exported reader functions are a convenience wrapper suited for scripting and inspection:

read-lmdb.ts
import {
  getByKey,
  getRange,
  getByPrefix,
  countRecords,
  doesExist,
} from '@riavzon/shield-base';

const dbPath = './data/mmdb/useragent-db/useragent.mdb';
const dbName = 'useragent';

const record = getByKey(dbPath, dbName, 'sqlmap');
const first10 = getRange(dbPath, dbName, 10);
const curlMatches = getByPrefix(dbPath, dbName, 'curl', 20);
const total = countRecords(dbPath, dbName);
const exists = doesExist(dbPath, dbName, 'nmap-scripting-engine');

See the API Reference for full signatures and return types.


Running Shell Commands

The run utility executes arbitrary shell commands and returns the output. It is used internally by some scripts to invoke mmdbctl:

import { run } from '@riavzon/shield-base';

await run('mmdbctl read -f json-pretty 8.8.8.8 ./asn.mmdb');

This utility is designed for internal use and does not sanitize its input. Do not pass untrusted values to it.
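
If a command has to include a value that originates elsewhere, validate and quote it before building the string. The sketch below is one way to do that for the mmdbctl read command shown above. Both helpers are hypothetical and not part of Shield Base, and the quoting assumes the command is executed by a POSIX shell such as sh or bash.

```typescript
import { isIP } from 'node:net';

// Quote a value for a POSIX shell: wrap it in single quotes and escape any
// embedded single quote. Assumption: the command runs in sh or bash.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: builds the command string only after the IP has been
// validated, and quotes both arguments so they reach mmdbctl as single words.
function buildReadCommand(ip: string, dbPath: string): string {
  if (isIP(ip) === 0) {
    throw new Error(`Not a valid IP address: ${ip}`);
  }
  return `mmdbctl read -f json-pretty ${shellQuote(ip)} ${shellQuote(dbPath)}`;
}

console.log(buildReadCommand('8.8.8.8', './asn.mmdb'));
// prints: mmdbctl read -f json-pretty '8.8.8.8' './asn.mmdb'
```

The resulting string can be handed to run. Where possible, spawning the binary with an argument array (for example via execFile from node:child_process) avoids shell parsing altogether.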