Multi-modal RAG Knowledge Base Platform

A production-grade, fully containerized multi-modal RAG platform with real-time SSE streaming, multi-turn conversation compaction, session-scoped retrieval, and an admin console — all running locally via Docker Compose.

Overview

The platform is a production-grade, multi-modal RAG knowledge base with real-time streaming chat, multi-turn conversation-history compaction, session-scoped document retrieval, and an admin console. It is fully containerized and runs entirely on local hardware via Docker Compose, with no external API calls.

Implementation

Layered backend: FastAPI with a clean Router → Service → Repository → Schema architecture, async SQLAlchemy 2.x (AsyncSession + aiosqlite), Alembic migrations, and JWT (HS256) + bcrypt auth with role-based access (admin / user)
Local LLM stack via Ollama: gpt-oss for reasoning/chat, llava:7b as the vision model for image and multi-modal document parsing, and bge-m3 for 1024-dimensional embeddings computed on the GPU
RAG engine: RAGAnything built on LightRAG, with custom adapters (LLM / vision / embedding) and a ChromaVectorDBStorage adapter implementing LightRAG's BaseVectorStorage interface against ChromaDB
Multi-modal ingestion: MinerU for PDF layout/OCR parsing, LibreOffice for DOCX/PPTX/XLSX, and llava vision captioning for images — all chunked, embedded, and indexed into ChromaDB
Conversation compaction: once a session passes a message threshold, older turns are automatically summarized by the LLM while recent turns are kept verbatim, so the context window stays bounded without losing the thread of the conversation
Session-scoped retrieval: retrieval is automatically confined to documents attached to the current session, preventing cross-contamination with the global knowledge base
Streaming frontend: Next.js 16 App Router + TypeScript + shadcn/ui + Zustand, consuming an SSE stream (useSSEStream) for live token rendering, with a WebGL galaxy background (OGL)
Three query modes: Hybrid (semantic + knowledge graph), Local (focused chunks), and Global (graph-wide synthesis)
Infra: Docker Compose orchestrating ChromaDB, Ollama, backend, and a multi-stage-built frontend, with health checks and dependency gating
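
The conversation compaction item above can be illustrated with a minimal sketch. This is not the project's actual code: the `Message` type, the `compact_history` function, the `threshold` and `keep_recent` defaults, and the `summarize` callable (standing in for a call to the chat model) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str      # "user", "assistant", or "system"
    content: str


def compact_history(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],
    threshold: int = 20,     # hypothetical default, not taken from the project
    keep_recent: int = 6,    # hypothetical default, not taken from the project
) -> List[Message]:
    """Return a bounded history: one summary message plus the newest turns."""
    if keep_recent <= 0:
        raise ValueError("keep_recent must be positive")
    # Below the threshold nothing is compacted; return a shallow copy.
    if len(messages) <= threshold:
        return list(messages)
    # Split into the turns to summarize and the turns to keep verbatim.
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # in a real system this would call the LLM
    head = Message("system", f"Summary of earlier conversation: {summary}")
    return [head] + recent
```

With 25 messages and the defaults shown, the 19 oldest messages are passed to `summarize` and the result contains 7 messages: the summary followed by the 6 most recent turns.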
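
Session-scoped retrieval, as described above, comes down to restricting every vector query to the documents attached to the current session. One way to sketch this, assuming each chunk is stored with a metadata key such as `doc_id` (a hypothetical name, not confirmed by the project), is to build a metadata filter in the shape ChromaDB accepts for the `where` argument of `collection.query`:

```python
from typing import Dict, Iterable, Optional


def session_where(session_doc_ids: Iterable[str]) -> Optional[Dict]:
    """Build a metadata filter limiting retrieval to a session's documents.

    Returns None when the session has no documents. The caller should then
    skip retrieval entirely rather than query unfiltered, which would fall
    back to the global knowledge base.
    """
    ids = sorted(set(session_doc_ids))  # de-duplicate, keep order stable
    if not ids:
        return None
    if len(ids) == 1:
        # Plain equality filter for a single document.
        return {"doc_id": ids[0]}
    # Membership filter for several documents.
    return {"doc_id": {"$in": ids}}
```

A caller would pass the result as `where=session_where(ids)` alongside the query embedding. Returning `None` for an empty session is a deliberate choice in this sketch, so that an empty document list can never be mistaken for "no filter".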
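
The SSE stream consumed by the frontend follows the Server-Sent Events wire format, in which each event is a block of `field: value` lines terminated by a blank line. A minimal backend-side sketch of that framing is shown below; the event names `token` and `done` and the `{"text": ...}` payload shape are assumptions for illustration, not the project's actual protocol.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame a stream of model tokens as Server-Sent Events."""
    for token in tokens:
        # JSON-encode the payload so newlines inside a token cannot
        # break the line-oriented SSE framing.
        payload = json.dumps({"text": token})
        yield f"event: token\ndata: {payload}\n\n"
    # A final event lets the client know the stream finished normally.
    yield "event: done\ndata: {}\n\n"
```

In a FastAPI route, a generator like this could be wrapped in `StreamingResponse(sse_events(tokens), media_type="text/event-stream")`, with the client appending each `token` payload to the rendered message as it arrives.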