Custom Data Sources
Shield Base can compile any JSON data you provide into a fully typed MMDB or LMDB database. The compile subcommand and the compiler function handle both formats. The only requirements are that MMDB records contain a range field, and LMDB records contain a key or id field.
Choosing a Format
| Format | Use when | Required field |
|---|---|---|
| MMDB | Your data is keyed by IP address or CIDR range | range (IPv4/IPv6 address or CIDR) |
| LMDB | Your data is keyed by any string identifier | key or id |
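The required-field rule in the table can be checked before compiling. The following is a minimal sketch, not part of Shield Base: `findInvalidRecords`, `DbFormat`, and `JsonRecord` are hypothetical names, and treating only a non-empty string as a valid `range`, `key`, or `id` is an assumption of this example.

```typescript
// Hypothetical pre-flight check; none of these names come from Shield Base.
type DbFormat = 'mmdb' | 'lmdb';
type JsonRecord = { [field: string]: unknown };

// Assumption: a field counts as present only when it is a non-empty string.
function isNonEmptyString(value: unknown): boolean {
  return typeof value === 'string' && value.length > 0;
}

// Returns the indexes of records that lack the field the chosen format requires:
// `range` for MMDB, `key` or `id` for LMDB.
function findInvalidRecords(format: DbFormat, records: JsonRecord[]): number[] {
  const invalid: number[] = [];
  records.forEach((record, index) => {
    const ok =
      format === 'mmdb'
        ? isNonEmptyString(record['range'])
        : isNonEmptyString(record['key']) || isNonEmptyString(record['id']);
    if (!ok) invalid.push(index);
  });
  return invalid;
}
```

Running a check like this first makes it easier to report which records are incomplete. It does not validate that `range` is a well-formed IP address or CIDR block.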
CLI: Compile a Single File
# Compile an MMDB database from a JSON file
pnpm dlx @riavzon/shield-base compile --type mmdb --name myRanges --outputDir ./out example.json
# Compile an LMDB database from a JSON file
pnpm dlx @riavzon/shield-base compile --type lmdb --name myKeys --outputDir ./out example.json
yarn dlx @riavzon/shield-base compile --type mmdb --name myRanges --outputDir ./out example.json
yarn dlx @riavzon/shield-base compile --type lmdb --name myKeys --outputDir ./out example.json
npx @riavzon/shield-base compile --type mmdb --name myRanges --outputDir ./out example.json
npx @riavzon/shield-base compile --type lmdb --name myKeys --outputDir ./out example.json
bunx @riavzon/shield-base compile --type mmdb --name myRanges --outputDir ./out example.json
bunx @riavzon/shield-base compile --type lmdb --name myKeys --outputDir ./out example.json
Programmatic equivalent:
import { compiler } from '@riavzon/shield-base';

// MMDB (IP range data)
await compiler({
  type: 'mmdb',
  input: {
    data: 'example.json', // file path, raw JSON string, or array of objects
    dataBaseName: 'myRanges',
    outputPath: './out',
    mmdbPath: 'mmdbctl', // required for mmdb
    generateTypes: true,
  },
});

// LMDB (key-value data)
await compiler({
  type: 'lmdb',
  input: {
    data: 'example.json',
    dataBaseName: 'myKeys',
    outputPath: './out',
    generateTypes: true,
    // no mmdbPath needed for lmdb
  },
});
This produces:
- ./out/myRanges.mmdb (or myKeys.mdb + myKeys.mdb-lock for LMDB)
- ./out/myRangesTypes.ts (TypeScript types auto-generated from the data schema)
Pass --no-types to skip type generation:
pnpm dlx @riavzon/shield-base compile --type mmdb --name myRanges --no-types example.json
yarn dlx @riavzon/shield-base compile --type mmdb --name myRanges --no-types example.json
npx @riavzon/shield-base compile --type mmdb --name myRanges --no-types example.json
bunx @riavzon/shield-base compile --type mmdb --name myRanges --no-types example.json
CLI: Batch Processing
When you provide multiple input files, the first output uses your --name and subsequent files are indexed:
pnpm dlx @riavzon/shield-base compile --type mmdb --name myRanges --outputDir ./out file1.json file2.json file3.json
yarn dlx @riavzon/shield-base compile --type mmdb --name myRanges --outputDir ./out file1.json file2.json file3.json
npx @riavzon/shield-base compile --type mmdb --name myRanges --outputDir ./out file1.json file2.json file3.json
bunx @riavzon/shield-base compile --type mmdb --name myRanges --outputDir ./out file1.json file2.json file3.json
Produces: myRanges.mmdb, myRanges-1.mmdb, myRanges-2.mmdb and a matching set of type files.
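The naming rule described above (first file takes the bare name, later files get a numeric suffix) can be sketched as a small function. This only mirrors the documented output names; `batchOutputNames` is a hypothetical helper and not the library's implementation.

```typescript
// Hypothetical helper that reproduces the documented batch naming pattern.
function batchOutputNames(name: string, fileCount: number, extension: string): string[] {
  const names: string[] = [];
  for (let i = 0; i < fileCount; i++) {
    // The first output keeps the bare name; later outputs are suffixed -1, -2, ...
    const base = i === 0 ? name : `${name}-${i}`;
    names.push(`${base}.${extension}`);
  }
  return names;
}

// Three input files compiled with --name myRanges:
const expectedOutputs = batchOutputNames('myRanges', 3, 'mmdb');
// → ['myRanges.mmdb', 'myRanges-1.mmdb', 'myRanges-2.mmdb']
```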
Programmatic: Input Formats
The data field in the compiler options accepts three forms:
import { compiler } from '@riavzon/shield-base';
// 1. File path
await compiler({ type: 'mmdb', input: { data: './example.json', dataBaseName: 'db', mmdbPath: 'mmdbctl', outputPath: './', generateTypes: true } });
// 2. Raw JSON string
await compiler({ type: 'mmdb', input: { data: '[{"range":"1.1.1.0/24"}]', dataBaseName: 'db', mmdbPath: 'mmdbctl', outputPath: './', generateTypes: true } });
// 3. JavaScript array
const data = [{ range: '1.1.1.0/24', name: 'Cloudflare' }];
await compiler({ type: 'mmdb', input: { data, dataBaseName: 'db', mmdbPath: 'mmdbctl', outputPath: './', generateTypes: true } });
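One way to picture how the three forms differ is a small classifier. This is purely illustrative: `classifyInput` is a hypothetical function, and the detection logic Shield Base actually uses is not documented here and may differ.

```typescript
// Hypothetical classifier for the three accepted `data` forms.
type InputKind = 'array' | 'json-string' | 'file-path';

function classifyInput(data: unknown[] | string): InputKind {
  // Form 3: an in-memory JavaScript array.
  if (Array.isArray(data)) return 'array';
  try {
    // Form 2: a string that parses to a JSON array or object.
    const parsed: unknown = JSON.parse(data);
    if (typeof parsed === 'object' && parsed !== null) return 'json-string';
  } catch {
    // Not valid JSON, so fall through and treat it as a path.
  }
  // Form 1: assumption of this sketch is that any other string is a file path.
  return 'file-path';
}
```

The assumption that every non-JSON string is a path is a simplification; a real implementation would also need to confirm that the file exists.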
Programmatic: Batch Processing
Provide a StringOfSources[] array to compile multiple JSON files into separate databases in one call:
import { compiler } from '@riavzon/shield-base';
import type { StringOfSources } from '@riavzon/shield-base';

const sources: StringOfSources[] = [
  { pathToJson: 'ranges1.json', dataBaseName: 'rangesA', outputPath: './out' },
  { pathToJson: 'ranges2.json', dataBaseName: 'rangesB', outputPath: './out' },
];

// MMDB batch
await compiler({
  type: 'mmdb',
  input: {
    data: sources,
    dataBaseName: 'rangesA',
    mmdbPath: 'mmdbctl',
    outputPath: './out',
    generateTypes: true,
  },
});

// LMDB batch
await compiler({
  type: 'lmdb',
  input: {
    data: sources,
    dataBaseName: 'keysA',
    outputPath: './out',
    generateTypes: true,
  },
});
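When there are many files, the sources array can be built from a list of paths instead of being written by hand. A minimal sketch follows. `toSources` is a hypothetical helper, the local `StringOfSources` interface mirrors the shape used above, and deriving `dataBaseName` from the file name is an assumption of this example rather than library behavior.

```typescript
// Local copy of the shape used above so the sketch is self-contained.
interface StringOfSources {
  pathToJson: string;
  dataBaseName: string;
  outputPath: string;
}

// Hypothetical helper: one source entry per JSON file, named after the file.
// Assumes forward-slash path separators.
function toSources(files: string[], outputPath: string): StringOfSources[] {
  return files.map((pathToJson) => {
    const fileName = pathToJson.split('/').pop() ?? pathToJson;
    const dataBaseName = fileName.replace(/\.json$/i, '');
    return { pathToJson, dataBaseName, outputPath };
  });
}

const generatedSources = toSources(['data/ranges1.json', 'ranges2.json'], './out');
// generatedSources[0].dataBaseName === 'ranges1'
```

The resulting array would then be passed as `data` in the compiler call.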
Full Example: Nested MMDB Data
Shield Base handles deeply nested JSON structures. Given this input:
[
  {
    "range": "1.1.1.0/24",
    "metadata": {
      "version": "1.0.0",
      "author": "Person",
      "tags": ["dns", "secure", "fast"],
      "sub_data": {
        "level_1": {
          "level_2": {
            "level_3": {
              "level_4": {
                "deep_value": "Success",
                "array_of_objects": [
                  { "index": 0, "active": true },
                  { "index": 1, "active": false }
                ],
                "mixed_types": [1, "two", { "three": 3 }]
              }
            }
          }
        }
      }
    },
    "organization": {
      "name": "Cloudflare, Inc.",
      "details": {
        "headquarters": "San Francisco",
        "employees": 3000,
        "is_public": true
      }
    }
  }
]
Compile and query it:
pnpm dlx @riavzon/shield-base compile --type mmdb --name myDb --outputDir ./out example.json
yarn dlx @riavzon/shield-base compile --type mmdb --name myDb --outputDir ./out example.json
npx @riavzon/shield-base compile --type mmdb --name myDb --outputDir ./out example.json
bunx @riavzon/shield-base compile --type mmdb --name myDb --outputDir ./out example.json
mmdbctl read -f json-pretty 1.1.1.10 ./out/myDb.mmdb
{
  "ip": "1.1.1.10",
  "metadata": {
    "author": "Person",
    "sub_data": {
      "level_1": {
        "level_2": {
          "level_3": {
            "level_4": {
              "array_of_objects": [
                {
                  "active": true,
                  "index": 0
                },
                {
                  "active": false,
                  "index": 1
                }
              ],
              "deep_value": "Success",
              "mixed_types": [
                1,
                "two",
                {
                  "three": 3
                }
              ]
            }
          }
        }
      }
    },
    "tags": [
      "dns",
      "secure",
      "fast"
    ],
    "version": "1.0.0"
  },
  "network": "1.1.1.0/24",
  "organization": {
    "details": {
      "employees": 3000,
      "headquarters": "San Francisco",
      "is_public": true
    },
    "name": "Cloudflare, Inc."
  }
}
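The lookup for 1.1.1.10 returns this record because the address falls inside the 1.1.1.0/24 range. The containment check can be sketched as below, for IPv4 only and assuming well-formed input. The names `ipv4ToInt` and `ipv4InCidr` are hypothetical. This is not how mmdbctl resolves lookups internally (the MMDB format uses a binary search tree over address bits); it only illustrates the range semantics.

```typescript
// Converts a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True when `ip` lies inside `cidr`. A bare address is treated as a /32.
function ipv4InCidr(ip: string, cidr: string): boolean {
  const parts = cidr.split('/');
  const network = parts[0] ?? '';
  const prefix = parts.length > 1 ? Number(parts[1]) : 32;
  // Build a mask with `prefix` leading one-bits; /0 matches every address.
  const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
  // >>> 0 converts the signed result of & back to an unsigned value.
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(network) & mask) >>> 0);
}

const matchesRange = ipv4InCidr('1.1.1.10', '1.1.1.0/24'); // true
const outsideRange = ipv4InCidr('1.1.2.10', '1.1.1.0/24'); // false
```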
The generated types file reflects the full structure:
interface MyMmdbDb {
  range: string;
  metadata: Metadata;
  organization: Organization;
}

interface Organization {
  name: string;
  details: Details;
}

interface Details {
  headquarters: string;
  employees: number;
  is_public: boolean;
}

interface Metadata {
  version: string;
  author: string;
  tags: string[];
  sub_data: Subdata;
}

interface Subdata {
  level_1: Level1;
}

interface Level1 {
  level_2: Level2;
}

interface Level2 {
  level_3: Level3;
}

interface Level3 {
  level_4: Level4;
}

interface Level4 {
  deep_value: string;
  array_of_objects: Arrayofobject[];
  mixed_types: (Mixedtype | number | string)[];
}

interface Mixedtype {
  three: number;
}

interface Arrayofobject {
  index: number;
  active: boolean;
}
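To see how a mixed array such as `mixed_types` ends up as a union type, here is a minimal sketch of schema inference from a JSON value. It is not Shield Base's generator: `inferType` is a hypothetical function, and it inlines object shapes instead of emitting named interfaces like the ones above.

```typescript
// Any JSON value.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical sketch: returns a TypeScript type expression for a JSON value.
function inferType(value: Json): string {
  if (value === null) return 'null';
  if (Array.isArray(value)) {
    // Infer each element, then keep the distinct types in first-seen order.
    const members = value
      .map((item) => inferType(item))
      .filter((type, index, all) => all.indexOf(type) === index);
    if (members.length === 0) return 'unknown[]';
    if (members.length === 1) return `${members[0]}[]`;
    return `(${members.join(' | ')})[]`;
  }
  if (typeof value === 'object') {
    const obj = value;
    const fields = Object.keys(obj).map((key) => `${key}: ${inferType(obj[key] as Json)}`);
    return fields.length === 0 ? '{}' : `{ ${fields.join('; ')} }`;
  }
  // Remaining cases are the primitives: 'boolean', 'number', or 'string'.
  return typeof value;
}

const mixed = inferType([1, 'two', { three: 3 }]);
// → '(number | string | { three: number })[]'
```

Because identical element shapes collapse to one member, an array of uniform objects such as `array_of_objects` yields a single object type followed by `[]` rather than a union.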
Custom Crawler Providers
getCrawlersIps accepts a ProvidersLists[] array to merge your own IP range sources with the built-in crawler datasets:
import { getCrawlersIps } from '@riavzon/shield-base';
import type { ProvidersLists } from '@riavzon/shield-base';

const customProviders: ProvidersLists[] = [
  {
    name: 'cloudflare', // Stored as the `provider` field in the database
    type: 'JSON', // 'JSON' | 'CSV' | 'HTML'
    urls: [
      'https://www.cloudflare.com/ips-v4',
      'https://www.cloudflare.com/ips-v6',
    ],
  },
];

await getCrawlersIps('./out', 'mmdbctl', customProviders);
This compiles the built-in provider datasets and your custom sources into a single goodBots.mmdb database.
Input Type Reference
type CompilerOptions<T> =
  | { type: 'lmdb'; input: LmdbInput<T> }
  | { type: 'mmdb'; input: Input<T> };

type LmdbInput<T> = Omit<Input<T>, 'mmdbPath'> & {
  data: LmdbSources<T>;
};

interface Input<T> {
  outputPath: string;
  dataBaseName: string;
  data: T[] | StringOfSources[] | string;
  mmdbPath: string;
  generateTypes: boolean;
}

interface DatabaseRecord<T> {
  key: string;
  data: T;
}

interface StringOfSources {
  pathToJson: string;
  dataBaseName: string;
  outputPath: string;
}

type LmdbSources<T> = DatabaseRecord<T>[] | StringOfSources[] | string | (T & { key: string })[];
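`LmdbSources<T>` lists two in-memory record shapes: a wrapped form (`DatabaseRecord<T>`) and a flat form (`T & { key: string }`). The sketch below shows how the two relate by converting flat records into wrapped ones. `toDatabaseRecords` and `Keyed` are hypothetical names, and whether Shield Base stores both shapes identically is not stated in this section.

```typescript
// Local copy of the wrapped shape from the reference above.
interface DatabaseRecord<T> {
  key: string;
  data: T;
}

// The flat shape: payload fields with `key` alongside them.
type Keyed<T> = T & { key: string };

// Hypothetical helper: moves every field except `key` under `data`.
function toDatabaseRecords<T extends object>(
  flat: Keyed<T>[],
): DatabaseRecord<Omit<Keyed<T>, 'key'>>[] {
  return flat.map((record) => {
    const { key, ...data } = record;
    return { key, data };
  });
}

const wrapped = toDatabaseRecords([{ key: 'a', score: 1 }]);
// → [{ key: 'a', data: { score: 1 } }]
```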
Use the types subcommand to generate TypeScript types from a JSON file without compiling a database. This is useful for previewing the type output before committing to a schema.