XSS Protection

The multi-pass HTML sanitization pipeline, Zod integration, automatic IP banning on XSS detection, timing-attack prevention, and how to apply the full defense chain in custom handlers.

The IAM service provides a layered XSS defense pipeline that runs on every user-supplied string. The pipeline consists of three components that work together: sanitizeInput performs deep multi-pass HTML stripping and entity encoding, makeSanitizedZodString integrates that sanitizer into Zod schemas so validation and sanitization happen in a single step, and validateZodSchema orchestrates the whole flow while automatically banning the client's IP when an XSS payload is detected.

All three utilities are exported from @riavzon/auth for use in your own route handlers. The built-in authentication controllers (signup, login, MFA verification, email update, password reset, OAuth) already use the full pipeline on every field that accepts user input.


Sanitization pipeline

sanitizeInput (exported as the default from htmlSanitizer) is the core sanitization function. It accepts a raw string and returns a cleaned string along with a detection report indicating whether any HTML was found during processing.

import sanitizeInput from '@riavzon/auth'

const { vall, results } = sanitizeInput(userInput)

if (results.htmlFound) {
  // HTML or script injection was detected and stripped
  console.log('Detected tags:', results.tags)
}

// vall is the fully sanitized string, safe for storage or rendering

The function performs 8 sequential stages. Each stage builds on the previous one, and an attacker must bypass all of them for a payload to survive.

Length guard

Before any processing, the sanitizer rejects input longer than htmlSanitizer.maxAllowedInputLength (default 50000). Oversized input throws rather than entering the loop. This is the hard cap on CPU cost for a single call.

Unicode normalization

The input is normalized to NFKC, which collapses visually similar characters to their canonical form. Zero-width characters, soft hyphens, byte-order marks, and bidirectional override characters are stripped in the same pass. Halfwidth and fullwidth ASCII characters (U+FF01 through U+FF5E) are transliterated back to standard ASCII. This defeats payloads that hide tags inside fullwidth substitutions such as \uFF1C for <.
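
The effect of this stage can be reproduced with the standard String.prototype.normalize. The sketch below is illustrative only: the function name normalizeForScan and the exact set of stripped characters are assumptions, not the library's internals.

```typescript
// Invisible characters stripped in this sketch: soft hyphen, zero-width
// space and joiners, LRM/RLM, bidi embeddings and overrides, word joiner,
// bidi isolates, and the byte-order mark. The real list may differ.
const INVISIBLE = /[\u00AD\u200B-\u200F\u202A-\u202E\u2060\u2066-\u2069\uFEFF]/g

function normalizeForScan(input: string): string {
  // NFKC folds compatibility forms, including fullwidth ASCII
  // (U+FF01..U+FF5E), into their standard equivalents.
  return input.normalize('NFKC').replace(INVISIBLE, '')
}

// Fullwidth brackets plus a zero-width space hidden inside the tag name
const hidden = '\uFF1Cscr\u200Bipt\uFF1E'
const revealed = normalizeForScan(hidden) // '<script>'
```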

Strict URI decode

decodeURIComponent is called once inside a try/catch. If the call throws (malformed percent-encoding like %ZZ), the input is rejected immediately and returned as an empty string with htmlFound: true. Legitimate input does not contain malformed URI sequences, so rejecting early keeps broken data out of the loop.
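
A minimal sketch of the strict decode step, relying only on the fact that decodeURIComponent throws a URIError on malformed percent-encoding. The function name strictDecode and the result shape are hypothetical.

```typescript
type StrictDecodeResult =
  | { ok: true; value: string }
  | { ok: false; value: ''; htmlFound: true }

function strictDecode(input: string): StrictDecodeResult {
  try {
    return { ok: true, value: decodeURIComponent(input) }
  } catch {
    // Malformed sequences such as "%ZZ" end up here and are rejected outright.
    return { ok: false, value: '', htmlFound: true }
  }
}
```

Note that a lone percent sign (for example "100%") is also malformed percent-encoding under this rule.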

Iterative URI and entity decoding

The function enters a decode loop that alternates between decodeURIComponent and he.decode (he is an HTML entity decoder). Each iteration decodes one layer of encoding. The loop continues until the output stabilizes (no change between iterations) or until the IrritationCount limit is reached.

This catches payloads that rely on nested encoding: %253Cscript%253E decodes to %3Cscript%3E on the first pass, then to <script> on the second. Without the loop, a single-pass decoder would leave the inner encoding intact.

If the loop exceeds IrritationCount iterations without stabilizing, the input is rejected entirely and returned as an empty string with htmlFound: true. This protects against intentionally crafted inputs designed to consume CPU through deep encoding chains.
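
The loop described above can be sketched as follows. This is an approximation under stated assumptions: decodeEntitiesOnce is a tiny stand-in for he.decode that only understands a few named entities plus numeric forms, and decodeUntilStable and its maxIterations parameter are hypothetical names standing in for the real loop and IrritationCount.

```typescript
// Stand-in for he.decode (assumption): a handful of named entities only.
const NAMED: Record<string, string> = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'" }

function decodeEntitiesOnce(s: string): string {
  return s.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match: string, body: string) => {
    if (body.startsWith('#')) {
      const hex = body.charAt(1).toLowerCase() === 'x'
      const cp = parseInt(body.slice(hex ? 2 : 1), hex ? 16 : 10)
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : match
    }
    return NAMED[body.toLowerCase()] ?? match
  })
}

type DecodeResult = { rejected: false; value: string } | { rejected: true; value: '' }

function decodeUntilStable(input: string, maxIterations: number): DecodeResult {
  let current = input
  for (let i = 0; i < maxIterations; i++) {
    let next: string
    try {
      // One layer of URI decoding, then one layer of entity decoding
      next = decodeEntitiesOnce(decodeURIComponent(current))
    } catch {
      return { rejected: true, value: '' } // malformed encoding surfaced mid-loop
    }
    if (next === current) return { rejected: false, value: current } // stabilized
    current = next
  }
  return { rejected: true, value: '' } // budget exhausted without stabilizing
}
```

With a budget of 5, the double-encoded input %253Cscript%253E stabilizes on the third iteration. An input wrapped in more encoding layers than the budget allows is rejected.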

Residual cleanup

After the loop, zero-width characters are stripped again (the decoders may have reintroduced them) and any whitespace inside the bodies of surviving tag-like substrings is removed so that <scr\tipt> cannot slip past the tag regex.
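
One way to picture the cleanup is shown below. It assumes a tag-like substring is any span between < and > that contains no other angle brackets; the name residualCleanup and both regular expressions are illustrative, not taken from the library.

```typescript
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g

function residualCleanup(decoded: string): string {
  return decoded
    .replace(ZERO_WIDTH, '')
    // Remove whitespace only inside <...> spans; text outside is untouched.
    .replace(/<[^<>]*>/g, (tag: string) => tag.replace(/\s+/g, ''))
}

const rejoined = residualCleanup('<scr\tipt>alert(1)</scr ipt>')
// '<script>alert(1)</script>'
```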

Pattern detection

After decoding, the function tests the cleaned string against three patterns:

| Pattern | Catches |
| --- | --- |
| /<\s*\/?\s*[A-Za-z][A-Za-z0-9-]*(?:\s+[^>]*?)?\s*>/i | Any HTML tag |
| /on\w+\s*=/i | Inline event handlers (onclick=, onerror=) |
| /javascript\s*:/i | JavaScript protocol URIs |

If any pattern matches, htmlFound is set to true in the results. This flag is used downstream by validateZodSchema to trigger IP banning.
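
The three patterns can be exercised directly. In the sketch below the regular expressions are copied from this page, while the wrapper function looksLikeHtml is a hypothetical name used only for illustration.

```typescript
const TAG = /<\s*\/?\s*[A-Za-z][A-Za-z0-9-]*(?:\s+[^>]*?)?\s*>/i
const EVENT_HANDLER = /on\w+\s*=/i
const JS_PROTOCOL = /javascript\s*:/i

function looksLikeHtml(decoded: string): boolean {
  // No "g" flag on the patterns, so test() carries no state between calls.
  return [TAG, EVENT_HANDLER, JS_PROTOCOL].some((re) => re.test(decoded))
}

looksLikeHtml('<b>bold</b>')          // true: HTML tag
looksLikeHtml('x onerror = 1')        // true: inline event handler
looksLikeHtml('JavaScript :alert(1)') // true: protocol URI
looksLikeHtml('plain text, 2 < 3')    // false
```

Because the event-handler pattern is unanchored, ordinary text such as money=5 also matches it. Keep this in mind when choosing which fields to validate with the pipeline.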

sanitize-html pass

The string is passed through sanitize-html with a strict configuration:

  • allowedTags: [],
  • allowedAttributes: {},
  • allowedIframeHostnames: [],
  • allowedSchemes: [],
  • allowProtocolRelative: false,
  • nestingLimit: 10,
  • nonTextTags: ['script', 'style', 'noscript', 'iframe', 'svg'].

The textFilter re-runs the tag regex on text nodes, and onOpenTag records any tag name and attribute set that the sanitizer had to strip. If the string shrinks during this pass, htmlFound is set to true even if pattern detection did not trigger.

Entity encoding

The final output is entity-encoded: &, <, >, ", ', backtick, and ${ are replaced with their entity or escaped equivalents. The backtick and template-literal escapes prevent injection into JavaScript template strings. The result is trimmed.

| Character | Replacement |
| --- | --- |
| & | &amp; |
| < | &lt; |
| > | &gt; |
| " | &quot; |
| ' | &#x27; |
| ` | &#x60; |
| ${ | \${ |
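
The replacement table can be expressed as a small encoder. This is a sketch of the mapping only; the name encodeOutput and the single-pass approach are assumptions about how such an encoder might be written, not the library's code.

```typescript
const ENTITY_MAP: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '`': '&#x60;',
}

function encodeOutput(clean: string): string {
  return clean
    // Single pass, so an "&" introduced by one replacement is never re-encoded.
    .replace(/[&<>"'`]/g, (ch: string) => ENTITY_MAP[ch] ?? ch)
    // Function replacer avoids "$" being read as a replacement pattern.
    .replace(/\$\{/g, () => '\\${')
    .trim()
}

const encoded = encodeOutput(' <b>"hi" & \'bye\' ')
// '&lt;b&gt;&quot;hi&quot; &amp; &#x27;bye&#x27;'
```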

Zod integration

makeSanitizedZodString creates a Zod string schema that validates length and optional regex constraints, then runs the full sanitization pipeline as a Zod transform. The returned value is always the sanitized output, and any HTML detection is reported as a Zod issue with an 'HTML found' message prefix.

import { makeSanitizedZodString } from '@riavzon/auth'
import { z } from 'zod'

const commentSchema = z.object({
  text: makeSanitizedZodString({ min: 1, max: 1000 }),
  name: makeSanitizedZodString({
    min: 2,
    max: 50,
    pattern: /^[A-Za-z\s]+$/,
    patternMsg: 'Name must contain only letters and spaces',
  }),
})

Parameters

min (number, required): Minimum string length. Enforced before sanitization.
max (number, required): Maximum string length. Enforced before sanitization.
pattern (RegExp, optional): Regex the string must match. Validated after length checks and before sanitization.
patternMsg (string, optional): Custom error message shown when the pattern does not match.

How it works

The schema chains three operations:

  1. Length and pattern validation using standard Zod .min(), .max(), and .regex() validators
  2. HTML detection check via .check() that calls sanitizeInput and pushes a custom Zod issue if htmlFound is true
  3. Sanitization transform via .transform() that calls sanitizeInput again and returns only the cleaned vall string

Every Zod schema in the built-in authentication controllers (signup, login, MFA, email update, password reset, OAuth) uses makeSanitizedZodString for all user-supplied string fields. If you add custom routes, use it for consistency.

The built-in schemas that use makeSanitizedZodString include:

| Schema | Fields |
| --- | --- |
| Signup | name, email, password |
| Login | email, password |
| Email update | email, newEmail, password |
| MFA code | code |
| Password reset | random, reason |
| Custom MFA | random, reason |

Validation with XSS enforcement

validateZodSchema ties the full pipeline together. It parses input against a Zod schema, and when any Zod issue starts with 'HTML found' (produced by makeSanitizedZodString), it calls handleXSS to ban the client immediately.

import { validateZodSchema } from '@riavzon/auth'

const result = await validateZodSchema(commentSchema, req.body, req, log)

if ('valid' in result && result.valid === false) {
  // Validation failed (could be XSS ban or normal validation error)
  res.status(result.errors === 'XSS attempt' ? 403 : 400).json({ errors: result.errors })
  return
}

if (!result.success) {
  // Standard Zod validation error
  res.status(422).json(result.error.format())
  return
}

// result.data is fully sanitized and validated
const { text, name } = result.data

Validation flow

Zod parsing

The schema is parsed with safeParse(). If parsing succeeds, the validated and transformed data is returned immediately.

HTML issue scan

If parsing fails, the function scans the Zod error issues array for any issue whose message starts with 'HTML found'. This marker is set by makeSanitizedZodString when the sanitizer detects HTML content.

XSS punishment

When an HTML issue is found, handleXSS is called with the Express request object. The function bans the client's IP using the Bot Detector service with the configured banScore (defaults to 100) and the reason 'XSS SCRIPTING ATTEMPT'. It also marks the visitor's canary_id as a bot and updates the banned IP record.

validateZodSchema then returns { valid: false, errors: 'XSS attempt' } and the controller responds with HTTP 403.

Normal validation errors

If no HTML issues are found, the function collects all Zod issues into a key-value map (field name to error message) and returns { valid: false, errors: { ... } }.
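
The branching after a failed parse can be sketched without Zod itself. IssueLike below is a structural subset of a Zod issue, and classifyIssues is a hypothetical helper; joining the issue path with dots to form the field name is an assumption about how the key-value map is built.

```typescript
// Only the fields this sketch reads from a Zod issue
interface IssueLike {
  path: Array<string | number>
  message: string
}

type Outcome =
  | { valid: false; errors: 'XSS attempt' }
  | { valid: false; errors: Record<string, string> }

function classifyIssues(issues: IssueLike[]): Outcome {
  if (issues.some((issue) => issue.message.startsWith('HTML found'))) {
    // In the real flow, handleXSS bans the client at this point.
    return { valid: false, errors: 'XSS attempt' }
  }
  const errors: Record<string, string> = {}
  for (const issue of issues) {
    errors[issue.path.map(String).join('.')] = issue.message
  }
  return { valid: false, errors }
}
```

A single 'HTML found' issue takes precedence over any number of ordinary validation errors in the same request.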

The XSS ban is permanent at the Bot Detector level. The client IP is added to the banned list, the visitor is flagged as a bot, and subsequent requests from that IP receive elevated scrutiny from anomaly detection. There is no automatic expiry.

handleXSS

handleXSS is the enforcement function called when an XSS attempt is confirmed. It reads the banScore from botDetector.settings in the configuration (defaulting to 100 if not set) and executes three actions:

import { handleXSS } from '@riavzon/auth'

// Called automatically by validateZodSchema — you rarely need to call this directly
await handleXSS(req, '<script>alert(1)</script>', log)

IP ban

banIp is called first with the client IP and the configured banScore. This adds the IP to the Bot Detector ban list with the reason 'XSS SCRIPTING ATTEMPT'. This step runs before the remaining actions to ensure the IP is blocked as quickly as possible.

Visitor record and bot flag

Two actions run concurrently via Promise.all:

| Action | Function | Effect |
| --- | --- | --- |
| Visitor record | updateBannedIP(canary_id, ip, ua, { score: 10, reasons: [...] }) | Updates the banned IP record with the visitor's cookie, user agent, and ban reason |
| Bot flag | updateIsBot(true, canary_id) | Marks the visitor's canary_id as a confirmed bot in the visitors table |

The function logs a warning before and after the ban.
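
The ordering of the three actions (ban first, then two concurrent updates) can be sketched with injected dependencies. Everything here is hypothetical: the BanDeps interface, the simplified signatures, and the name enforceXssBan exist only to show the sequencing, not the real function bodies.

```typescript
// Simplified, assumed signatures for the three Bot Detector calls
interface BanDeps {
  banIp(ip: string, score: number, reason: string): Promise<void>
  updateBannedIP(canaryId: string, ip: string, ua: string): Promise<void>
  updateIsBot(isBot: boolean, canaryId: string): Promise<void>
}

async function enforceXssBan(
  deps: BanDeps,
  client: { ip: string; ua: string; canaryId: string },
  banScore = 100,
): Promise<void> {
  // Step 1: block the IP before anything else.
  await deps.banIp(client.ip, banScore, 'XSS SCRIPTING ATTEMPT')
  // Steps 2 and 3: independent bookkeeping, so they run concurrently.
  await Promise.all([
    deps.updateBannedIP(client.canaryId, client.ip, client.ua),
    deps.updateIsBot(true, client.canaryId),
  ])
}
```

Awaiting the ban before starting the other two calls means a failure in the bookkeeping step cannot prevent the IP from being blocked.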


Timing attack prevention

timeEnumeration (implemented internally as waitSomeTime) adds a delay to responses where timing differences could leak information. Authentication endpoints use it to pad responses so that valid and invalid inputs take the same amount of time.

import { timeEnumeration } from '@riavzon/auth'

const start = Date.now()

// ... process the request (may return early if user not found)

const elapsed = Date.now() - start
const minimumResponseTime = 3000 // 3 seconds

if (elapsed < minimumResponseTime) {
  await timeEnumeration(minimumResponseTime - elapsed, log)
}

The following controllers enforce a minimum 3-second response time:

| Controller | Route | Why |
| --- | --- | --- |
| initPasswordReset | POST /auth/forgot-password | Prevents enumerating which email addresses have accounts |
| initCustomMfaFlow | POST /custom/mfa/:reason | Prevents timing analysis of custom MFA initiation |

Always use timeEnumeration on any endpoint that reveals presence or absence of a user account through its response time. A password reset endpoint that returns instantly for unknown emails and slowly for known ones leaks account existence.
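
If several handlers need the same padding, the elapsed-time arithmetic can be wrapped once. The helper below is a sketch under assumptions: withMinimumDuration is a hypothetical name, and the local sleep function stands in for timeEnumeration so the example runs on its own.

```typescript
// Stand-in for timeEnumeration (assumption): resolve after ms milliseconds
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

async function withMinimumDuration<T>(minMs: number, work: () => Promise<T>): Promise<T> {
  const start = Date.now()
  try {
    return await work()
  } finally {
    // Runs on early return and on thrown errors alike, so every
    // code path is padded to the same minimum duration.
    const remaining = minMs - (Date.now() - start)
    if (remaining > 0) await sleep(remaining)
  }
}
```

Placing the delay in a finally block matters: an early return for an unknown user and an exception from a downstream service are both padded, which a delay added only on the success path would miss.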

Applying XSS protection in custom handlers

The full pipeline works in four steps: define a schema with makeSanitizedZodString, validate with validateZodSchema, check the result, and use the sanitized data.

import { validateZodSchema, makeSanitizedZodString } from '@riavzon/auth'
import { z } from 'zod'

const profileSchema = z.object({
  displayName: makeSanitizedZodString({ min: 1, max: 100 }),
  bio: makeSanitizedZodString({ min: 0, max: 500 }),
})

// router, getLogger, db, and userId are placeholders for objects provided by your application
router.post('/profile', async (req, res) => {
  const log = getLogger().child({ route: '/profile' })
  const result = await validateZodSchema(profileSchema, req.body, req, log)

  if ('valid' in result && !result.valid) {
    if (result.errors === 'XSS attempt') {
      return res.status(403).json({ banned: true })
    }
    return res.status(400).json({ errors: result.errors })
  }

  if (!result.success) {
    return res.status(422).json(result.error.format())
  }

  // result.data.displayName and result.data.bio are fully sanitized
  await db.updateProfile(userId, result.data)
  res.json({ ok: true })
})

Configuration reference

htmlSanitizer.IrritationCount (number): Maximum number of URI-decode and entity-decode iterations. Higher values catch deeper encoding chains but increase CPU cost per request. The input is rejected entirely if this limit is reached without the output stabilizing.
htmlSanitizer.maxAllowedInputLength (number): Maximum input string length in characters. Inputs exceeding this limit are rejected before any processing begins. Prevents memory exhaustion from oversized payloads.

Setting IrritationCount to a very high value on a public endpoint creates a CPU exhaustion vector. Pair it with maxAllowedInputLength and a request body size limit (e.g. express.json({ limit: '2kb' })) to bound worst-case processing time.
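
A back-of-the-envelope bound shows why the two settings should be tuned together. The sketch assumes each loop iteration walks the full string once per decoder (two decoders per iteration); the helper worstCaseDecodeWork is hypothetical, and the IrritationCount value of 50 is illustrative, not a documented default.

```typescript
interface SanitizerLimits {
  IrritationCount: number
  maxAllowedInputLength: number
}

// Rough upper bound on characters visited by the decode loop for one field
function worstCaseDecodeWork(limits: SanitizerLimits, decodersPerIteration = 2): number {
  return limits.IrritationCount * limits.maxAllowedInputLength * decodersPerIteration
}

const defaultLength = worstCaseDecodeWork({ IrritationCount: 50, maxAllowedInputLength: 50000 })
// 50 * 50000 * 2 = 5,000,000 character visits

const smallBody = worstCaseDecodeWork({ IrritationCount: 50, maxAllowedInputLength: 2048 })
// 50 * 2048 * 2 = 204,800 character visits
```

Under these assumptions, capping input near a 2 KB body limit cuts the worst case by more than an order of magnitude without touching the iteration budget.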

Summary

The XSS protection pipeline integrates with several other IAM subsystems:

| System | Integration point |
| --- | --- |
| Anomaly Detection | Banned IPs from XSS attempts raise the suspicious_activity_score, which triggers MFA challenges or hard blocks when the score exceeds 25% of the ban threshold |
| Bot Detector | handleXSS calls banIp, updateBannedIP, and updateIsBot to record the threat in the bot detection database |
| Fingerprinting | The canary_id cookie ties the XSS ban to a specific device, so the ban persists even if the client's IP changes |
| Signup | New account creation validates name, email, and password through makeSanitizedZodString |
| MFA | OTP code submission validates the code field through the same pipeline |