Vision Tool (Image Analysis)
ChimerAI's Vision Tool analyses images using multimodal AI models (GPT-4o Vision, Claude 3, Gemini). Use it to describe images, extract text (OCR), detect objects, read charts, or answer questions about visual content.
What you get
- Image description — Natural language description of image contents
- OCR / text extraction — Extract all text from images, screenshots, scanned documents
- Object detection — List objects, people, locations in the image
- Chart/graph reading — Extract data from bar charts, pie charts, line graphs
- Custom questions — Ask any question about the image
- URL or upload — Accepts image URLs or base64-encoded uploads
- Multi-image comparison — Compare two images in one prompt
Quick setup
npx chimerai add ai-tools --only vision
Scaffolds:
app/api/tools/vision/route.ts ← Vision endpoint
services/ai/tools/vision_tools.py ← Python vision implementation
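The endpoint accepts camelCase JSON (imageUrl, imageBase64, mimeType, task, prompt), while the Python implementation takes snake_case arguments. As a minimal sketch, the mapping between the two might look like this. The helper parse_vision_request, the "exactly one image source" rule and the image/png fallback are assumptions, not necessarily what the scaffolded route.ts does.

```python
from typing import Any


def parse_vision_request(body: dict[str, Any]) -> dict[str, Any]:
    """Map the endpoint's camelCase JSON body to analyse_image() keyword args.

    Hypothetical helper: the validation rules here are assumptions.
    """
    image_url = body.get("imageUrl")
    image_base64 = body.get("imageBase64")
    # Require exactly one image source: a URL or a base64 upload, not both.
    if bool(image_url) == bool(image_base64):
        raise ValueError("provide exactly one of imageUrl or imageBase64")
    prompt = body.get("prompt")
    if not prompt:
        raise ValueError("prompt is required")
    return {
        "image_url": image_url,
        "image_base64": image_base64,
        # Fallback MIME type, only relevant for base64 uploads (assumption).
        "mime_type": body.get("mimeType", "image/png"),
        "prompt": prompt,
    }
```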
Usage — describe an image
const res = await fetch('/api/tools/vision/analyse', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    imageUrl: 'https://example.com/screenshot.png',
    task: 'describe',
    prompt: 'What is shown in this image?',
  }),
});
const { result } = await res.json();
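The same request can be made from a Python script or test using only the standard library. This sketch builds the request without sending it. BASE_URL (the Next.js dev-server default, http://localhost:3000) and the helper name build_analyse_request are assumptions.

```python
import json
import urllib.request

# Assumption: the app runs locally on the Next.js dev-server default port.
BASE_URL = "http://localhost:3000"


def build_analyse_request(image_url: str, task: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the fetch() example, without sending it."""
    payload = {"imageUrl": image_url, "task": task, "prompt": prompt}
    return urllib.request.Request(
        f"{BASE_URL}/api/tools/vision/analyse",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it (requires the app to be running):
# with urllib.request.urlopen(build_analyse_request(...)) as resp:
#     result = json.load(resp)["result"]
```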
Usage — OCR (text extraction)
const res = await fetch('/api/tools/vision/analyse', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    imageUrl: 'https://example.com/invoice.jpg',
    task: 'ocr',
    prompt: 'Extract all text from this image.',
  }),
});
const { result } = await res.json();
// result: "Invoice #1234\nDate: 2025-01-15\nTotal: $499.00"
Usage — upload (base64)
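// Assumes a file input on the page, e.g. <input type="file" accept="image/*">
const fileInput = document.querySelector<HTMLInputElement>('input[type="file"]')!;
const file = fileInput.files?.[0];
if (file) {
  const reader = new FileReader();
  reader.onload = async () => {
    // reader.result is a data URL ("data:<mime>;base64,<data>"); keep only the base64 part
    const base64 = (reader.result as string).split(',')[1];
    const res = await fetch('/api/tools/vision/analyse', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        imageBase64: base64,
        mimeType: file.type,
        task: 'custom',
        prompt: 'List all visible UI elements and their labels.',
      }),
    });
    const { result } = await res.json();
    console.log(result);
  };
  reader.readAsDataURL(file);
}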
Python implementation
from openai import AsyncOpenAI

# Reads OPENAI_API_KEY from the environment
client = AsyncOpenAI()


async def analyse_image(
    image_url: str | None,
    image_base64: str | None,
    mime_type: str,
    prompt: str,
) -> str:
    # Prefer an uploaded image, wrapped in a data URL the API accepts
    if image_base64:
        content_url = f"data:{mime_type};base64,{image_base64}"
    elif image_url:
        content_url = image_url
    else:
        raise ValueError("Provide either image_url or image_base64")

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": content_url}},
            ],
        }],
        max_tokens=1024,
    )
    # message.content is typed Optional; normalise None to an empty string
    return response.choices[0].message.content or ""
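For multi-image comparison, the Chat Completions API accepts several image_url parts in one user message. This sketch builds that content list. The helper build_comparison_content is hypothetical, and the endpoint fields a comparison request would use are not documented here.

```python
def build_comparison_content(prompt: str, image_urls: list[str]) -> list[dict]:
    """Build one user message's content: a text part followed by N image parts."""
    if len(image_urls) < 2:
        raise ValueError("comparison needs at least two images")
    content: list[dict] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        # Each entry may be an https URL or a data: URL built as in analyse_image().
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content
```

The result would be passed as messages=[{"role": "user", "content": build_comparison_content(...)}] to the same client.chat.completions.create call used in analyse_image.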
Use cases
- Invoice processing — OCR + structured data extraction
- UI screenshot analysis — QA automation, accessibility checks
- Product image tagging — Auto-generate alt text and tags
- Chart data extraction — Parse dashboards and reports
- Document digitisation — Scan and extract form fields
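Several of these use cases (invoice fields, chart data, form fields) need structured output. One common approach is to ask the model to reply with JSON and parse the reply defensively, because models sometimes wrap JSON in a Markdown code fence. This is a minimal sketch; the parse_json_reply helper is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to stay readable inside Markdown
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_json_reply(reply: str) -> object:
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence."""
    match = _FENCED.search(reply)
    text = match.group(1) if match else reply.strip()
    # Raises json.JSONDecodeError if the model did not return valid JSON.
    return json.loads(text)
```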