# RAG Pipeline: Integration Guide
## Overview
Retrieval-Augmented Generation (RAG) lets the AI answer questions grounded in your own documents. Upload PDFs, text files, or web pages; ChimerAI chunks them, embeds them, and retrieves the most relevant snippets before generating a response.
## Quick Start

```bash
# Installs the Python AI service with RAG capabilities
chimerai add rag
```

⚠️ The RAG engine runs in the Python AI service (FastAPI, port 8002), not in Next.js. The Next.js frontend calls the Python service via HTTP.

```bash
# Start the Python AI service:
cd services/ai
pip install -r requirements.txt
uvicorn main:app --reload --port 8002
```

Minimum `.env` for local use:

```bash
OPENAI_API_KEY=sk-...
DEFAULT_CHAT_MODEL=gpt-3.5-turbo
# Optional: local/free embeddings via Ollama instead of the default OpenAI embedder
DEFAULT_EMBEDDING_MODEL=ollama/nomic-embed-text
```

## Architecture

```text
Document Upload
  → Chunker (text splitter)
  → Embedder (text-embedding-3-small)
  → Vector Store (pgvector / Pinecone / Qdrant)

User Query
  → Embed query
  → Nearest-neighbour search
  → Top-k chunks → LLM context
  → Streamed answer
```

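The query path above (embed the query, search for nearest neighbours, keep the top k) can be sketched in memory as follows. This is a minimal illustration, assuming cosine similarity and a brute-force scan over plain number arrays; in the actual pipeline the search is handled by the configured vector store. The `StoredChunk` type and the `topK` helper are hypothetical, and the toy three-dimensional vectors stand in for real embeddings.

```typescript
// Hypothetical shape of a stored chunk: its text plus its embedding vector.
interface StoredChunk {
  id: number;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (1 = same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest-neighbour search: score every chunk, keep the k best,
// and number them 1..k like the `rank` field of the search endpoint.
function topK(
  queryEmbedding: number[],
  chunks: StoredChunk[],
  k: number,
): { id: number; text: string; score: number; rank: number }[] {
  return chunks
    .map((c) => ({ id: c.id, text: c.text, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((r, i) => ({ ...r, rank: i + 1 }));
}

// Toy data: the query vector points in the same direction as chunk 0.
const chunks: StoredChunk[] = [
  { id: 0, text: 'Berlin is the capital of Germany.', embedding: [1, 0, 0] },
  { id: 1, text: 'Paris is the capital of France.', embedding: [0, 1, 0] },
  { id: 2, text: "Germany's capital city is Berlin.", embedding: [0.9, 0.1, 0] },
];
console.log(topK([1, 0, 0], chunks, 2).map((r) => r.id)); // [ 0, 2 ]
```

A production store replaces the linear scan with an approximate index, but the contract is the same: a query vector in, the k highest-scoring chunks out.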
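## Environment Variables

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Used for embeddings and generation |
| `DEFAULT_CHAT_MODEL` | Chat model used for generation (e.g. `gpt-3.5-turbo`) |
| `DEFAULT_EMBEDDING_MODEL` | Embedding model (e.g. `ollama/nomic-embed-text` for local embeddings) |
| `DATABASE_URL` | PostgreSQL connection string; the database needs the pgvector extension |
| `PINECONE_API_KEY` | Optional. API key for the Pinecone vector store |
| `PINECONE_INDEX` | Pinecone index name |
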
## Python AI Service REST API

Base URL: `http://localhost:8002`

### Upload documents

`POST /api/rag/documents`

Request body:

```json
{ "documents": ["text..."], "metadatas": [{"filename": "report.txt"}] }
```

Response:

```json
{ "status": "success", "added": 4, "total_vectors": 4 }
```

Long documents (more than 1,000 characters) are automatically split into chunks.

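For intuition, a fixed-size character splitter behaves roughly like the sketch below. Only the 1,000-character threshold comes from this guide; the 200-character overlap, the sliding-window strategy, and the `splitIntoChunks` name are assumptions, and the service's own text splitter may prefer sentence or paragraph boundaries.

```typescript
// Illustrative fixed-size splitter (not the service's actual implementation):
// windows of at most `chunkSize` characters, each starting
// `chunkSize - overlap` characters after the previous one.
function splitIntoChunks(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  // Short documents are stored as a single chunk.
  if (text.length <= chunkSize) return [text];
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const doc = 'x'.repeat(2500);
console.log(splitIntoChunks(doc).map((c) => c.length)); // [ 1000, 1000, 900 ]
```

Overlapping windows reduce the chance that a passage cut at a chunk boundary loses its surrounding context.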
### Semantic search

`POST /api/rag/search`

Request body (`k` is the number of nearest chunks to return):

```json
{ "query": "What is deep learning?", "k": 5 }
```

Response:

```json
{ "results": [{ "id": 2, "text": "...", "score": 0.82, "rank": 1, "metadata": {} }] }
```

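A typed client for the search endpoint might look like the sketch below. The `SearchResult` interface is inferred from the example response above; the `ragSearch` name and the injectable `fetchImpl` parameter (which lets the function run against a stub instead of a live service) are assumptions, not part of ChimerAI.

```typescript
// Shape of one hit, inferred from the example response of POST /api/rag/search.
interface SearchResult {
  id: number;
  text: string;
  score: number;
  rank: number;
  metadata: Record<string, unknown>;
}

// Minimal fetch signature so a stub can be passed in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POSTs { query, k } and returns the ranked results.
async function ragSearch(
  query: string,
  k = 5,
  baseUrl = 'http://localhost:8002',
  fetchImpl: FetchLike = fetch,
): Promise<SearchResult[]> {
  const res = await fetchImpl(`${baseUrl}/api/rag/search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, k }),
  });
  if (!res.ok) throw new Error(`search failed with HTTP ${res.status}`);
  const data = (await res.json()) as { results: SearchResult[] };
  return data.results;
}
```

In application code the default global `fetch` is used, so a call is simply `await ragSearch('What is deep learning?', 5)`.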
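### RAG chat (retrieve + generate)

`POST /api/rag/chat`

Request body (`k` sets how many chunks are retrieved as context, and `temperature` is the sampling temperature for generation):

```json
{ "query": "What is the capital of Germany?", "k": 3, "temperature": 0.7 }
```

The response follows the OpenAI chat-completion shape, with an extra `rag_metadata` field listing the retrieved chunks and their scores:

```json
{
  "choices": [{ "message": { "role": "assistant", "content": "Berlin." } }],
  "rag_metadata": { "documents": [{ "text": "...", "score": 0.81 }] }
}
```
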
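Because the chat response carries both the answer and the retrieved passages, a client can show sources next to the answer. The sketch below relies only on the fields in the example response; the `RagChatResponse` type and the `extractAnswerAndSources` helper are hypothetical, and treating `rag_metadata` as optional is an assumption.

```typescript
// Fields taken from the example response of POST /api/rag/chat.
interface RagChatResponse {
  choices: { message: { role: string; content: string } }[];
  // Assumed optional, in case no documents were retrieved.
  rag_metadata?: { documents: { text: string; score: number }[] };
}

// Returns the assistant's answer plus the supporting passages, highest score first.
function extractAnswerAndSources(data: RagChatResponse): {
  answer: string;
  sources: { text: string; score: number }[];
} {
  const answer = data.choices[0]?.message.content ?? '';
  // Copy before sorting so the original response object is left untouched.
  const sources = [...(data.rag_metadata?.documents ?? [])].sort((a, b) => b.score - a.score);
  return { answer, sources };
}

const example: RagChatResponse = {
  choices: [{ message: { role: 'assistant', content: 'Berlin.' } }],
  rag_metadata: { documents: [{ text: 'Berlin is the capital of Germany.', score: 0.81 }] },
};
console.log(extractAnswerAndSources(example).answer); // Berlin.
```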
### Other endpoints

| Endpoint | Description |
|---|---|
| `GET /api/rag/stats` | Total vectors, dimension, index type |
| `DELETE /api/rag/delete` | Delete specific document IDs |
| `DELETE /api/rag/clear` | Clear the entire vector store |

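The two parameterless maintenance endpoints can be wrapped in the same way. This sketch assumes `GET /api/rag/stats` and `DELETE /api/rag/clear` take no request body and both return JSON; the stats field names are not documented here, so the result is left loosely typed. `DELETE /api/rag/delete` is omitted because its request format is not specified in this guide. The `ragAdmin` helper and its `fetchImpl` parameter are hypothetical.

```typescript
// Minimal fetch signature so a stub can be injected in tests.
type AdminFetch = (
  url: string,
  init: { method: 'GET' | 'DELETE' },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical wrapper for the parameterless maintenance endpoints.
function ragAdmin(baseUrl = 'http://localhost:8002', fetchImpl: AdminFetch = fetch) {
  async function call(method: 'GET' | 'DELETE', path: string): Promise<Record<string, unknown>> {
    const res = await fetchImpl(`${baseUrl}${path}`, { method });
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    // Assumption: both endpoints respond with a JSON body.
    return (await res.json()) as Record<string, unknown>;
  }
  return {
    // Total vectors, dimension, and index type (field names not documented here).
    stats: () => call('GET', '/api/rag/stats'),
    // Removes every vector from the store.
    clear: () => call('DELETE', '/api/rag/clear'),
  };
}
```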
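## React / Next.js usage

```typescript
// Base URL of the Python AI service; falls back to the local development port.
const AI_URL = process.env.NEXT_PUBLIC_AI_SERVICE_URL ?? 'http://localhost:8002';

// Retrieves the top-k chunks for `query` and returns the generated answer.
async function ragChat(query: string, k = 3): Promise<string> {
  const res = await fetch(`${AI_URL}/api/rag/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, k }),
  });
  // Surface HTTP errors instead of failing later on a missing `choices` field.
  if (!res.ok) {
    throw new Error(`RAG chat request failed: ${res.status} ${res.statusText}`);
  }
  const data = await res.json();
  return data.choices[0].message.content;
}
```
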
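Uploading from the frontend follows the same pattern. The sketch below assumes the request body shown under "Upload documents" (a `documents` array plus a parallel `metadatas` array) and the response fields from that example; the `uploadDocuments` name, the equal-length check, and the injectable `fetchImpl` parameter are assumptions.

```typescript
// Response shape taken from the example under "Upload documents".
interface UploadResponse {
  status: string;
  added: number;
  total_vectors: number;
}

// Minimal fetch signature so a stub can be injected in tests.
type UploadFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: sends raw texts plus one metadata object per text.
async function uploadDocuments(
  documents: string[],
  metadatas: Record<string, unknown>[],
  baseUrl = 'http://localhost:8002',
  fetchImpl: UploadFetch = fetch,
): Promise<UploadResponse> {
  // Assumption: documents and metadatas are parallel arrays of equal length.
  if (documents.length !== metadatas.length) {
    throw new Error('documents and metadatas must have the same length');
  }
  const res = await fetchImpl(`${baseUrl}/api/rag/documents`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ documents, metadatas }),
  });
  if (!res.ok) throw new Error(`upload failed with HTTP ${res.status}`);
  return (await res.json()) as UploadResponse;
}
```

In a Next.js component you would pass the `AI_URL` constant from the previous snippet as `baseUrl`.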
---
## Supported Vector Stores
| Store | Adapter |
|---|---|
| PostgreSQL + pgvector | Built-in (default) |
| Pinecone | `@pinecone-database/pinecone` |
| Qdrant | `@qdrant/js-client-rest` |
| Weaviate | `weaviate-ts-client` |
---
## Further Reading
- [RAG Integration Guide](/docs/rag_integration_guide): a 927-line guide covering React, Blazor, Vue, Angular, plain HTML, auth patterns, and troubleshooting
- [RAG Documentation](/docs/rag)
- [Technical Reference](/docs/models)