
# RAG Pipeline — Integration Guide

## Overview

Retrieval-Augmented Generation (RAG) lets the AI answer questions grounded in your own documents. Upload PDFs, text files, or web pages; ChimerAI splits them into chunks, embeds each chunk, and retrieves the most relevant snippets before generating a response.


## Quick Start

```bash
# Install the Python AI service with RAG capabilities
chimerai add rag
```

⚠️ The RAG engine runs in the Python AI service (FastAPI, port 8002), not in Next.js. The Next.js frontend calls the Python service via HTTP.

Start the Python AI service:

```bash
cd services/ai
pip install -r requirements.txt
uvicorn main:app --reload --port 8002
```

Minimum `.env` for local use:

```bash
OPENAI_API_KEY=sk-...
DEFAULT_CHAT_MODEL=gpt-3.5-turbo
# For local/free embeddings:
DEFAULT_EMBEDDING_MODEL=ollama/nomic-embed-text
```

## Architecture

```text
Document Upload
  → Chunker (text splitter)
  → Embedder (e.g. text-embedding-3-small)
  → Vector Store (pgvector / Pinecone / Qdrant / Weaviate)

User Query
  → Embed query
  → Nearest-neighbour search
  → Top-k chunks → LLM context
  → Streamed answer
```
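The query path above (embed the query, search for nearest neighbours, keep the top k) can be illustrated with a small in-memory sketch. It ranks hand-written vectors by cosine similarity. Both the vectors and the choice of cosine similarity are assumptions, because the service's distance metric and `score` scale are not documented here. In the real pipeline, the vector store performs this search.

```typescript
// Toy version of the query path: rank stored chunk embeddings by cosine
// similarity to the query embedding and keep the k best matches.
// Cosine similarity is an assumption; the service's actual metric is not documented.

interface StoredChunk {
  id: number;
  text: string;
  embedding: number[];
}

interface ScoredChunk {
  id: number;
  text: string;
  score: number; // cosine similarity in [-1, 1]; higher means more similar
  rank: number; // 1-based position in the result list
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], chunks: StoredChunk[], k: number): ScoredChunk[] {
  return chunks
    .map((c) => ({ id: c.id, text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k)
    .map((c, i) => ({ ...c, rank: i + 1 }));
}

// Hand-written 3-dimensional vectors stand in for real embeddings.
const chunks: StoredChunk[] = [
  { id: 1, text: 'Berlin is the capital of Germany.', embedding: [0.9, 0.1, 0.0] },
  { id: 2, text: 'Deep learning uses neural networks.', embedding: [0.0, 0.2, 0.9] },
  { id: 3, text: 'Paris is the capital of France.', embedding: [0.7, 0.3, 0.1] },
];
console.log(topK([1, 0, 0], chunks, 2).map((c) => c.id)); // ids 1 and 3
```

Scoring every chunk like this is linear in the size of the collection, which is why larger deployments typically rely on the vector store's index rather than brute force.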

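## Environment Variables

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Used for OpenAI embeddings and generation |
| `DEFAULT_CHAT_MODEL` | Chat model used for generation (e.g. `gpt-3.5-turbo`) |
| `DEFAULT_EMBEDDING_MODEL` | Embedding model (e.g. `ollama/nomic-embed-text` for local embeddings) |
| `DATABASE_URL` | PostgreSQL connection string; the database needs the pgvector extension |
| `PINECONE_API_KEY` | Optional; only needed when Pinecone is the vector store |
| `PINECONE_INDEX` | Pinecone index name |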
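A startup check can catch a half-configured setup before the first request fails. The sketch below is hypothetical: `checkRagEnv` is not part of ChimerAI, and its rules (`OPENAI_API_KEY` always required, the two Pinecone variables set together) are assumptions drawn from the table and the minimum `.env`.

```typescript
// Hypothetical config check. The rules below are illustrative assumptions,
// not documented ChimerAI behaviour.
type Env = Record<string, string | undefined>;

function checkRagEnv(env: Env): string[] {
  const problems: string[] = [];
  if (!env.OPENAI_API_KEY) {
    problems.push('OPENAI_API_KEY is not set');
  }
  // Assumed: a Pinecone key is useless without an index name, and vice versa.
  const hasPineconeKey = Boolean(env.PINECONE_API_KEY);
  const hasPineconeIndex = Boolean(env.PINECONE_INDEX);
  if (hasPineconeKey !== hasPineconeIndex) {
    problems.push('PINECONE_API_KEY and PINECONE_INDEX must be set together');
  }
  return problems;
}

// A Pinecone key without an index name is reported as one problem.
console.log(checkRagEnv({ OPENAI_API_KEY: 'sk-test', PINECONE_API_KEY: 'pc-test' }));
```

In a Node.js process you could pass `process.env` and refuse to start when the returned list is non-empty.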

## Python AI Service REST API

Base URL: `http://localhost:8002`

### Upload documents

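```http
POST /api/rag/documents
Content-Type: application/json

{ "documents": ["text..."], "metadatas": [{"filename": "report.txt"}] }
```

Response:

```json
{ "status": "success", "added": 4, "total_vectors": 4 }
```

Each object in `metadatas` describes the document at the same position in `documents`. Documents longer than 1,000 characters are automatically split into chunks before embedding, so `added` appears to count stored chunks (vectors) rather than documents: the single document in the example produced four.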
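The splitting rule above can be sketched as a simple character-based chunker. Only the 1,000-character threshold comes from this guide; breaking at the last space in each window and using no overlap between chunks are assumptions, and the service's real text splitter may behave differently.

```typescript
// Illustrative splitter: text up to maxChars is kept whole; longer text is cut
// into pieces of at most maxChars, preferring to break at the last space in each
// window. The break-at-space rule and the lack of overlap are assumptions.
function chunkText(text: string, maxChars = 1000): string[] {
  if (maxChars <= 0) throw new Error('maxChars must be positive');
  if (text.length <= maxChars) return [text];

  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);
    if (end < text.length) {
      // Search backwards from the hard limit for a space so words stay intact;
      // fall back to a hard cut when the window contains no space.
      const lastSpace = text.lastIndexOf(' ', end);
      if (lastSpace > start) end = lastSpace;
    }
    const piece = text.slice(start, end).trim();
    if (piece.length > 0) chunks.push(piece);
    start = end;
  }
  return chunks;
}

const pieces = chunkText('lorem ipsum '.repeat(200)); // 2,400 characters
console.log(pieces.length, pieces.every((p) => p.length <= 1000));
```

Many production splitters also overlap neighbouring chunks slightly so that a sentence cut at a boundary still appears whole in one of them.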

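### Search documents

```http
POST /api/rag/search
Content-Type: application/json

{ "query": "What is deep learning?", "k": 5 }
```

Response:

```json
{ "results": [{ "id": 2, "text": "...", "score": 0.82, "rank": 1, "metadata": {} }] }
```

`k` sets how many chunks are returned.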
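A typed client for the search endpoint can be sketched from the request and response examples. `searchDocuments` and `FetchLike` are hypothetical names; the fetch function is passed in so the helper can be tested without a running service, and the default `k = 5` only mirrors the example request, not a documented server default.

```typescript
// Minimal fetch signature; the global fetch in browsers and Node.js 18+ satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Fields taken from the example search response.
interface SearchResult {
  id: number;
  text: string;
  score: number;
  rank: number;
  metadata: Record<string, unknown>;
}

async function searchDocuments(
  fetchImpl: FetchLike,
  baseUrl: string,
  query: string,
  k = 5,
): Promise<SearchResult[]> {
  const res = await fetchImpl(`${baseUrl}/api/rag/search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, k }),
  });
  if (!res.ok) throw new Error(`Search failed with HTTP ${res.status}`);
  const data = (await res.json()) as { results: SearchResult[] };
  return data.results;
}
```

In the browser this would be called as `searchDocuments(fetch, 'http://localhost:8002', 'What is deep learning?')`.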

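### RAG chat (retrieve + generate)

```http
POST /api/rag/chat
Content-Type: application/json

{ "query": "What is the capital of Germany?", "k": 3, "temperature": 0.7 }
```

Response:

```json
{ "choices": [{ "message": { "role": "assistant", "content": "Berlin." } }],
  "rag_metadata": { "documents": [{ "text": "...", "score": 0.81 }] } }
```

`k` controls how many chunks are retrieved as context, and `temperature` is the sampling temperature for generation. The answer is in `choices[0].message.content`; `rag_metadata.documents` lists the retrieved chunks with their scores.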

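Conceptually, the chat endpoint runs the search step and then places the top-k chunks into the model's context. The sketch below shows one common way to do that assembly. `buildRagMessages` is a hypothetical helper, its prompt wording is an assumption rather than the template the service uses, and it expects chunks to arrive best-first, as search results do.

```typescript
// Hypothetical sketch of turning retrieved chunks into chat messages.
// The system-prompt wording and numbered-source format are illustrative only.

// Same shape as the entries in rag_metadata.documents.
interface RetrievedDoc {
  text: string;
  score: number;
}

interface ChatMessage {
  role: 'system' | 'user';
  content: string;
}

function buildRagMessages(query: string, docs: RetrievedDoc[]): ChatMessage[] {
  // Number each chunk so the answer can refer back to its sources.
  const context = docs.map((d, i) => `[${i + 1}] ${d.text}`).join('\n\n');
  return [
    {
      role: 'system',
      content:
        'Answer the question using only the context below. ' +
        'If the context does not contain the answer, say so.\n\n' +
        `Context:\n${context}`,
    },
    { role: 'user', content: query },
  ];
}

const exampleMessages = buildRagMessages('What is the capital of Germany?', [
  { text: 'Berlin is the capital of Germany.', score: 0.81 },
]);
console.log(exampleMessages[0].content);
```

Instructing the model to stay within the supplied context is what keeps answers grounded in your documents rather than in the model's general knowledge.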
### Other endpoints

| Endpoint | Description |
|---|---|
| `GET /api/rag/stats` | Total vectors, dimension, index type |
| `DELETE /api/rag/delete` | Delete specific document IDs |
| `DELETE /api/rag/clear` | Clear the entire vector store |
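An admin helper for two of these endpoints might look like the sketch below. `getRagStats` and `clearRagStore` are hypothetical names; the stats field names are not documented, so the payload is returned untyped, and the helper assumes `/api/rag/clear` needs no request body. `DELETE /api/rag/delete` is left out because its request format is not described here.

```typescript
// Minimal fetch signature; the global fetch in browsers and Node.js 18+ satisfies it.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Stats field names are not documented, so the payload is returned untyped.
async function getRagStats(
  fetchImpl: FetchLike,
  baseUrl: string,
): Promise<Record<string, unknown>> {
  const res = await fetchImpl(`${baseUrl}/api/rag/stats`, { method: 'GET' });
  if (!res.ok) throw new Error(`Stats request failed with HTTP ${res.status}`);
  return (await res.json()) as Record<string, unknown>;
}

// Destructive: empties the whole vector store. Assumes no request body is needed.
async function clearRagStore(fetchImpl: FetchLike, baseUrl: string): Promise<void> {
  const res = await fetchImpl(`${baseUrl}/api/rag/clear`, { method: 'DELETE' });
  if (!res.ok) throw new Error(`Clear request failed with HTTP ${res.status}`);
}
```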

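## React / Next.js usage

```typescript
// Base URL of the Python AI service; set NEXT_PUBLIC_AI_SERVICE_URL to override.
const AI_URL = process.env.NEXT_PUBLIC_AI_SERVICE_URL ?? 'http://localhost:8002';

// Fields of the /api/rag/chat response used below.
interface RagChatResponse {
  choices: { message: { role: string; content: string } }[];
  rag_metadata?: { documents: { text: string; score: number }[] };
}

// Retrieve the top-k chunks for `query` and return the generated answer.
async function ragChat(query: string, k = 3): Promise<string> {
  const res = await fetch(`${AI_URL}/api/rag/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, k }),
  });
  if (!res.ok) {
    throw new Error(`RAG chat failed: HTTP ${res.status}`);
  }
  const data = (await res.json()) as RagChatResponse;
  return data.choices[0].message.content;
}
```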
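Uploading can follow the same pattern as `ragChat`. In this sketch, `uploadDocuments` and `FetchLike` are hypothetical names, and the fetch function is a parameter so the helper can be tested without a running service (pass the global `fetch` in the browser or Node.js 18+). Two behaviours are assumptions: that `metadatas` must have one entry per document, and that it can be omitted entirely.

```typescript
// Minimal fetch signature; the global fetch in browsers and Node.js 18+ satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Fields from the upload response example.
interface UploadResponse {
  status: string;
  added: number;
  total_vectors: number;
}

async function uploadDocuments(
  fetchImpl: FetchLike,
  baseUrl: string,
  documents: string[],
  metadatas?: Record<string, unknown>[],
): Promise<UploadResponse> {
  // Assumption: metadatas, when given, holds one object per document.
  if (metadatas && metadatas.length !== documents.length) {
    throw new Error('metadatas must contain one entry per document');
  }
  const res = await fetchImpl(`${baseUrl}/api/rag/documents`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Assumption: the metadatas field may be omitted when there is no metadata.
    body: JSON.stringify(metadatas ? { documents, metadatas } : { documents }),
  });
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  return (await res.json()) as UploadResponse;
}
```

For example, `uploadDocuments(fetch, AI_URL, [reportText], [{ filename: 'report.txt' }])` sends the same payload as the upload example, where `reportText` stands for your document's contents.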

---

## Supported Vector Stores

| Store | Adapter |
|---|---|
| PostgreSQL + pgvector | Built-in (default) |
| Pinecone | `@pinecone-database/pinecone` |
| Qdrant | `@qdrant/js-client-rest` |
| Weaviate | `weaviate-ts-client` |

---

## Further Reading

- [RAG Integration Guide](/docs/rag_integration_guide) — 927-line guide covering React, Blazor, Vue, Angular, plain HTML, auth patterns, and troubleshooting
- [RAG Documentation](/docs/rag)
- [Technical Reference](/docs/models)