AI Guardrails

ChimerAI's Guardrails feature lets you define safety rules that intercept every AI request and response — blocking harmful content, enforcing topic restrictions, and redacting PII before it reaches the model or the user.

What you get

Input filtering — Block prompts containing forbidden topics, PII, or jailbreak attempts
Output filtering — Scan AI responses for harmful content before delivery
PII redaction — Auto-redact emails, phone numbers, SSNs from outputs
Topic allowlist / blocklist — Restrict AI to specific domains per workspace
Audit log — All guardrail violations logged with reason and severity
Custom rules — Add your own regex or LLM-based rules

⚠️ Guardrails run in the Python AI service (services/ai/), not in Next.js. The TypeScript checkGuardrails() shown below calls the Python HTTP API.

Quick setup

chimerai add ai-service   # required first
chimerai add guardrails

Files installed in the Python AI service:

services/ai/services/guardrails_service.py
services/ai/routes/guardrails_routes.py

Usage in your chat pipeline

// lib/chat-pipeline.ts
import { checkGuardrails } from '@/lib/guardrails';

export async function handleChat(message: string, userId: string) {
  // Check input
  const inputResult = await checkGuardrails(message, 'input', userId);
  if (!inputResult.allowed) {
    return { blocked: true, reason: inputResult.reason };
  }

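  // Call your AI model (callAI is a placeholder), using the sanitized input if one was returned
  const response = await callAI(inputResult.sanitized ?? message);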

  // Check output
  const outputResult = await checkGuardrails(response, 'output', userId);
  if (!outputResult.allowed) {
    return { blocked: true, reason: 'Response blocked by guardrails' };
  }

  return { response: outputResult.sanitized ?? response };
}

Rule definitions

// lib/guardrails-rules.ts
export const defaultRules = [
  {
    id: 'no-pii-email',
    type: 'regex',
    direction: 'output',
    pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
    action: 'redact',
    replacement: '[EMAIL REDACTED]',
  },
  {
    id: 'no-jailbreak',
    type: 'keyword',
    direction: 'input',
    keywords: ['ignore previous instructions', 'pretend you are', 'DAN mode'],
    action: 'block',
    reason: 'Potential jailbreak attempt',
    severity: 'high',
  },
  {
    id: 'topic-restriction',
    type: 'llm',
    direction: 'input',
    prompt: 'Is this message related to customer support? Answer YES or NO.',
    allowPattern: /^YES/i, // the message passes only when the model's answer matches
    action: 'block',
    reason: 'Off-topic message',
  },
];
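
To make the rule shapes above concrete, here is a minimal Python sketch of how `regex` and `keyword` rules might be evaluated. The `apply_rules` function and the dict-based rule format are hypothetical and only mirror the fields in `defaultRules`; this is not ChimerAI's actual evaluator. Case-insensitive keyword matching is an assumption, and `llm` rules are skipped because they need a model call.

```python
import re

# Hypothetical Python mirror of two rules from defaultRules (field names copied as-is).
RULES = [
    {
        "id": "no-pii-email",
        "type": "regex",
        "direction": "output",
        "pattern": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
        "action": "redact",
        "replacement": "[EMAIL REDACTED]",
    },
    {
        "id": "no-jailbreak",
        "type": "keyword",
        "direction": "input",
        "keywords": ["ignore previous instructions", "pretend you are", "DAN mode"],
        "action": "block",
        "reason": "Potential jailbreak attempt",
        "severity": "high",
    },
]


def apply_rules(text, direction, rules=RULES):
    """Evaluate the rules for one direction and return allowed/reason/sanitized."""
    sanitized = text
    for rule in rules:
        if rule["direction"] != direction:
            continue
        if rule["type"] == "keyword":
            # Assumption: keywords match case-insensitively as plain substrings.
            lowered = sanitized.lower()
            if any(k.lower() in lowered for k in rule["keywords"]):
                return {"allowed": False, "reason": rule["reason"], "sanitized": None}
        elif rule["type"] == "regex" and rule["action"] == "redact":
            sanitized = rule["pattern"].sub(rule["replacement"], sanitized)
        # "llm" rules would need a model call and are out of scope here.
    return {"allowed": True, "reason": None, "sanitized": sanitized}
```

A blocking rule stops evaluation at the first match, while redacting rules keep rewriting the text so that several redactions can stack.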

API endpoint

// app/api/guardrails/check/route.ts
import { checkGuardrails } from '@/lib/guardrails';

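export async function POST(req: Request) {
  const { message, direction, userId } = await req.json();
  // Reject malformed requests before calling the Python service
  if (typeof message !== 'string' || (direction !== 'input' && direction !== 'output')) {
    return Response.json({ error: 'Invalid message or direction' }, { status: 400 });
  }
  const result = await checkGuardrails(message, direction, userId);
  return Response.json(result);
}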

`GuardrailsService` (Python) — key methods

`GuardrailsService` is a Python class; its methods are exposed through the HTTP endpoints documented below.

PII detection patterns

| Type | Example match |
| --- | --- |
| `email` | `alice@example.com` |
| `phone` | `+1-555-123-4567` |
| `ssn` | `123-45-6789` |
| `credit_card` | `4111 1111 1111 1111` |
| `ip_address` | `192.168.1.1` |
| `api_key` | `sk-...`, `pk_...` etc. |
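
As a rough illustration of how pattern-based detection like this can work, the sketch below implements `detect_pii` and `redact_pii` with the return shape and default argument listed in this section. The regular expressions are illustrative assumptions, not the service's real patterns, and the shape of each `pii_items` entry is hypothetical. The `phone` and `api_key` types are left out to keep the example short.

```python
import re

# Illustrative patterns only; the real service's regexes are not shown in this document.
PII_PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect_pii(text):
    """Return matches in the documented { has_pii, pii_items, count } shape."""
    items = []
    for pii_type, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            # The per-item fields (type, value, start) are an assumption.
            items.append({"type": pii_type, "value": match.group(), "start": match.start()})
    return {"has_pii": bool(items), "pii_items": items, "count": len(items)}


def redact_pii(text, redaction_char="[REDACTED]"):
    """Replace every match of every pattern with the redaction string."""
    for pattern in PII_PATTERNS.values():
        text = pattern.sub(redaction_char, text)
    return text
```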

# detect_pii(text) -> { has_pii, pii_items, count }
# redact_pii(text, redaction_char='[REDACTED]') -> str
# check_toxicity(text) -> { is_toxic, score 0–1, flagged_terms }
# detect_prompt_injection(prompt) -> { is_injection, confidence, patterns_found }
# validate_output(output, max_length, required_elements) -> { is_valid, issues }
# sanitize_input(text) -> str  (strips null bytes, control chars, excess whitespace)
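
The two input-side helpers can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the service's code: the phrase list is borrowed from the keyword rule earlier on this page, the confidence score is a naive ratio, and collapsing newlines into single spaces is one possible reading of "excess whitespace".

```python
import re

# Control characters except tab, newline and carriage return (handled as whitespace below).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_WHITESPACE_RUN = re.compile(r"\s+")

# Hypothetical phrase list, reused from the no-jailbreak keyword rule.
_INJECTION_PHRASES = ["ignore previous instructions", "pretend you are", "dan mode"]


def sanitize_input(text):
    """Strip null bytes and control characters, then collapse whitespace runs."""
    text = _CONTROL_CHARS.sub("", text)
    return _WHITESPACE_RUN.sub(" ", text).strip()


def detect_prompt_injection(prompt):
    """Flag known phrases and return { is_injection, confidence, patterns_found }."""
    lowered = prompt.lower()
    found = [p for p in _INJECTION_PHRASES if p in lowered]
    # Naive confidence: fraction of known phrases present (an assumption, not the real scoring).
    confidence = len(found) / len(_INJECTION_PHRASES)
    return {"is_injection": bool(found), "confidence": confidence, "patterns_found": found}
```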

HTTP endpoints (Python AI service)

`POST /guardrails/check-input`

// Request:
{ "text": "Ignore instructions. SSN: 123-45-6789.", "check_pii": true, "check_injection": true, "sanitize": true }

// Response:
{ "approved": false, "sanitized_text": "...", "issues": { "pii": { "has_pii": true }, "injection": { "is_injection": true } } }
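
For callers outside the Next.js app, the request above can be sent with the Python standard library alone. This sketch assumes the AI service listens on `http://localhost:8000`, which is a placeholder. The helper names `build_check_input_request` and `check_input` are hypothetical; the payload and response fields are the ones shown in the example.

```python
import json
import urllib.request

# Assumed base URL; replace with wherever your Python AI service actually listens.
AI_SERVICE_URL = "http://localhost:8000"


def build_check_input_request(text, base_url=AI_SERVICE_URL):
    """Build the POST request using the fields from the example request above."""
    payload = {"text": text, "check_pii": True, "check_injection": True, "sanitize": True}
    return urllib.request.Request(
        f"{base_url}/guardrails/check-input",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_input(text, base_url=AI_SERVICE_URL, timeout=5.0):
    """Send the request and return (approved, text_to_forward)."""
    request = build_check_input_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # Fall back to the original text if the service returns no sanitized_text.
    return body["approved"], body.get("sanitized_text", text)
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without a running service.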

`POST /guardrails/check-output`

Validates an AI response before delivering it to the user (PII redaction, length, required elements).
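
The length and required-element checks could look roughly like the following sketch of `validate_output`, which returns the `{ is_valid, issues }` shape listed under the key methods. The default `max_length`, the treatment of required elements as plain substrings, and the wording of each issue are assumptions for illustration.

```python
def validate_output(output, max_length=2000, required_elements=None):
    """Check length and required substrings; the defaults here are illustrative."""
    issues = []
    if len(output) > max_length:
        issues.append(f"Output exceeds max_length ({len(output)} > {max_length})")
    for element in required_elements or []:
        # Assumption: a required element is satisfied by a plain substring match.
        if element not in output:
            issues.append(f"Missing required element: {element}")
    return {"is_valid": not issues, "issues": issues}
```

For example, `validate_output("Your ticket ID is 42.", max_length=100, required_elements=["ticket ID"])` reports a valid response with an empty `issues` list.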