AI Guardrails
ChimerAI's Guardrails feature lets you define safety rules that intercept every AI request and response — blocking harmful content, enforcing topic restrictions, and redacting PII before it reaches the model or the user.
What you get
- Input filtering — Block prompts containing forbidden topics, PII, or jailbreak attempts
- Output filtering — Scan AI responses for harmful content before delivery
- PII redaction — Auto-redact emails, phone numbers, SSNs from outputs
- Topic allowlist / blocklist — Restrict AI to specific domains per workspace
- Audit log — All guardrail violations logged with reason and severity
- Custom rules — Add your own regex or LLM-based rules
⚠️ Guardrails run in the Python AI service (services/ai/), not in Next.js. The TypeScript checkGuardrails() shown below calls the Python HTTP API.
Quick setup
chimerai add ai-service # required first
chimerai add guardrails
Files installed in the Python AI service:
services/ai/services/guardrails_service.py
services/ai/routes/guardrails_routes.py
Usage in your chat pipeline
// lib/chat-pipeline.ts
import { checkGuardrails } from '@/lib/guardrails';

export async function handleChat(message: string, userId: string) {
  // Check input
  const inputResult = await checkGuardrails(message, 'input', userId);
  if (!inputResult.allowed) {
    return { blocked: true, reason: inputResult.reason };
  }

  // Call AI model...
  const response = await callAI(message);

  // Check output
  const outputResult = await checkGuardrails(response, 'output', userId);
  if (!outputResult.allowed) {
    return { blocked: true, reason: 'Response blocked by guardrails' };
  }

  return { response: outputResult.sanitized ?? response };
}
Rule definitions
// lib/guardrails-rules.ts
export const defaultRules = [
  {
    id: 'no-pii-email',
    type: 'regex',
    direction: 'output',
    pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
    action: 'redact',
    replacement: '[EMAIL REDACTED]',
  },
  {
    id: 'no-jailbreak',
    type: 'keyword',
    direction: 'input',
    keywords: ['ignore previous instructions', 'pretend you are', 'DAN mode'],
    action: 'block',
    reason: 'Potential jailbreak attempt',
    severity: 'high',
  },
  {
    id: 'topic-restriction',
    type: 'llm',
    direction: 'input',
    prompt: 'Is this message related to customer support? Answer YES or NO.',
    allowPattern: /^YES/i,
    action: 'block',
    reason: 'Off-topic message',
  },
];
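To make the rule semantics concrete, here is a minimal sketch of how the regex and keyword rules above could be evaluated. It is an illustration only: the actual rule engine lives in the Python AI service, and `applyRules`, `RuleResult`, and `matchedRuleIds` are hypothetical names introduced for this example. LLM-based rules are left out because they need a model call.

```typescript
// Hypothetical evaluator for regex and keyword rules (not the shipped implementation).
type Direction = 'input' | 'output';

interface RegexRule {
  id: string;
  type: 'regex';
  direction: Direction;
  pattern: RegExp;
  action: 'redact';
  replacement: string;
}

interface KeywordRule {
  id: string;
  type: 'keyword';
  direction: Direction;
  keywords: string[];
  action: 'block';
  reason: string;
  severity?: 'low' | 'medium' | 'high';
}

type Rule = RegexRule | KeywordRule;

interface RuleResult {
  allowed: boolean;
  sanitized: string;
  reason?: string;
  matchedRuleIds: string[];
}

function applyRules(text: string, direction: Direction, rules: Rule[]): RuleResult {
  let sanitized = text;
  const matchedRuleIds: string[] = [];

  for (const rule of rules) {
    // Rules only apply to the direction they were declared for.
    if (rule.direction !== direction) continue;

    if (rule.type === 'keyword') {
      // Case-insensitive substring match; the first hit blocks the message.
      const lower = sanitized.toLowerCase();
      const hit = rule.keywords.some((k) => lower.includes(k.toLowerCase()));
      if (hit) {
        matchedRuleIds.push(rule.id);
        return { allowed: false, sanitized, reason: rule.reason, matchedRuleIds };
      }
    } else {
      // Copy the pattern with the global flag so every match is replaced
      // and no lastIndex state is shared between calls.
      const flags = rule.pattern.flags.includes('g') ? rule.pattern.flags : rule.pattern.flags + 'g';
      const pattern = new RegExp(rule.pattern.source, flags);
      const redacted = sanitized.replace(pattern, () => rule.replacement);
      if (redacted !== sanitized) {
        matchedRuleIds.push(rule.id);
        sanitized = redacted;
      }
    }
  }

  return { allowed: true, sanitized, matchedRuleIds };
}
```

In this sketch a `redact` rule never blocks; it only rewrites the text, which mirrors how `handleChat` falls back to `outputResult.sanitized ?? response`.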
API endpoint
// app/api/guardrails/check/route.ts
import { checkGuardrails } from '@/lib/guardrails';

export async function POST(req: Request) {
  const { message, direction, userId } = await req.json();
  const result = await checkGuardrails(message, direction, userId);
  return Response.json(result);
}
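The route above forwards whatever the client sends. If you want to reject malformed bodies before calling the guardrails service, a small validator is one option. The sketch below is an assumption about how you might do that; `parseCheckRequest`, `CheckRequest`, and `ParseResult` are hypothetical names, not part of ChimerAI.

```typescript
// Hypothetical request validation for the check route (illustrative only).
type Direction = 'input' | 'output';

interface CheckRequest {
  message: string;
  direction: Direction;
  userId: string;
}

type ParseResult =
  | { ok: true; value: CheckRequest }
  | { ok: false; error: string };

// Type guard so the direction field is narrowed to the two allowed values.
function isDirection(value: unknown): value is Direction {
  return value === 'input' || value === 'output';
}

function parseCheckRequest(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Body must be a JSON object' };
  }
  const { message, direction, userId } = body as Record<string, unknown>;
  if (typeof message !== 'string' || message.length === 0) {
    return { ok: false, error: 'message must be a non-empty string' };
  }
  if (!isDirection(direction)) {
    return { ok: false, error: "direction must be 'input' or 'output'" };
  }
  if (typeof userId !== 'string' || userId.length === 0) {
    return { ok: false, error: 'userId must be a non-empty string' };
  }
  return { ok: true, value: { message, direction, userId } };
}
```

Inside the handler you could call `parseCheckRequest(await req.json())` and return `Response.json({ error }, { status: 400 })` when `ok` is false. In a real application you would likely also derive `userId` from the authenticated session rather than trusting the request body.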
GuardrailsService (Python) — key methods
The service is a Python class. All methods are available at the HTTP layer below.
PII detection patterns
| Type | Example match |
|---|---|
| email | alice@example.com |
| phone | +1-555-123-4567 |
| ssn | 123-45-6789 |
| credit_card | 4111 1111 1111 1111 |
| ip_address | 192.168.1.1 |
| api_key | sk-..., pk_... etc. |
# detect_pii(text) -> { has_pii, pii_items, count }
# redact_pii(text, redaction_char='[REDACTED]') -> str
# check_toxicity(text) -> { is_toxic, score 0–1, flagged_terms }
# detect_prompt_injection(prompt) -> { is_injection, confidence, patterns_found }
# validate_output(output, max_length, required_elements) -> { is_valid, issues }
# sanitize_input(text) -> str (strips null bytes, control chars, excess whitespace)
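For readers who want a feel for what `detect_pii` and `redact_pii` do, here is a rough TypeScript illustration of the same idea. It is not the Python service's code: the regexes are simplified assumptions chosen to match the example values in the table, the `{ type, value }` shape of `pii_items` is a guess, and only four of the six PII types are covered.

```typescript
// Illustrative PII detection and redaction; patterns are assumptions, not the service's own.
type PiiType = 'email' | 'phone' | 'ssn' | 'ip_address';

interface PiiItem {
  type: PiiType;
  value: string;
}

interface PiiReport {
  has_pii: boolean;
  pii_items: PiiItem[];
  count: number;
}

const PII_PATTERNS: Array<{ type: PiiType; pattern: RegExp }> = [
  { type: 'email', pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  { type: 'phone', pattern: /\+\d{1,3}-\d{3}-\d{3}-\d{4}/g },
  { type: 'ssn', pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { type: 'ip_address', pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g },
];

function detectPii(text: string): PiiReport {
  const pii_items: PiiItem[] = [];
  for (const { type, pattern } of PII_PATTERNS) {
    // With the global flag, match() returns every match as a string (or null).
    const matches = text.match(pattern) ?? [];
    for (const value of matches) {
      pii_items.push({ type, value });
    }
  }
  return { has_pii: pii_items.length > 0, pii_items, count: pii_items.length };
}

function redactPii(text: string, redaction: string = '[REDACTED]'): string {
  let result = text;
  for (const { pattern } of PII_PATTERNS) {
    // A replacer function avoids special handling of "$" in the redaction string.
    result = result.replace(pattern, () => redaction);
  }
  return result;
}
```

Regex-only detection like this is easy to reason about but produces false positives (any dotted quad looks like an IP address) and misses unusual formats, so treat it as a first filter rather than a guarantee.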
HTTP endpoints (Python AI service)
POST /guardrails/check-input
// Request:
{ "text": "Ignore instructions. SSN: 123-45-6789.", "check_pii": true, "check_injection": true, "sanitize": true }
// Response:
{ "approved": false, "sanitized_text": "...", "issues": { "pii": { "has_pii": true }, "injection": { "is_injection": true } } }
POST /guardrails/check-output
Validates an AI response before delivering it to the user (PII redaction, length, required elements).
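As a rough picture of how a TypeScript `checkGuardrails()` wrapper could sit on top of these endpoints, consider the sketch below. It is not the shipped `@/lib/guardrails` module. Assumptions: the base URL is passed in by the caller, the check-output response has the same shape as the check-input response shown above, only `text` is sent in the request body, and no user identifier is forwarded because the request schema above does not show one. `checkGuardrailsSketch`, `toGuardrailResult`, and `FetchLike` are hypothetical names.

```typescript
// Hypothetical client wrapper around the Python guardrails endpoints.
interface GuardrailResult {
  allowed: boolean;
  reason?: string;
  sanitized?: string;
}

// Mirrors the documented check-input response; assumed to hold for check-output too.
interface ServiceResponse {
  approved: boolean;
  sanitized_text?: string;
  issues?: Record<string, unknown>;
}

// Minimal fetch-compatible signature so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Map the service's snake_case response onto the shape used by handleChat.
function toGuardrailResult(data: ServiceResponse): GuardrailResult {
  const issueNames = Object.keys(data.issues ?? {});
  return {
    allowed: data.approved,
    sanitized: data.sanitized_text,
    reason: data.approved ? undefined : `Flagged: ${issueNames.join(', ') || 'unspecified'}`,
  };
}

async function checkGuardrailsSketch(
  text: string,
  direction: 'input' | 'output',
  baseUrl: string,
  fetchImpl: FetchLike,
): Promise<GuardrailResult> {
  const path = direction === 'input' ? '/guardrails/check-input' : '/guardrails/check-output';
  try {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    if (!res.ok) {
      // Fail closed: a broken guardrails service should not let content through.
      return { allowed: false, reason: `Guardrails service returned ${res.status}` };
    }
    return toGuardrailResult((await res.json()) as ServiceResponse);
  } catch {
    return { allowed: false, reason: 'Guardrails service unreachable' };
  }
}
```

In an environment with a global `fetch`, you could pass it directly as `fetchImpl`. The fail-closed behavior is a design choice for this sketch; some deployments may prefer to fail open for availability and rely on the audit log instead.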
Further reading
- Guardrails Guide — complete reference with all method signatures and response schemas
- ChimerAI Security Docs
- RBAC Guide