Overview
My Master's thesis project: a retrieval-augmented generation (RAG) question-answering platform on air pollution, environmental policy, and public health that combines semantic retrieval with a locally hosted LLM.
Implementation
- Embeddings: BAAI/bge-m3 (multilingual, used here for Chinese-language semantic embeddings), recursive punctuation-aware chunking
- Vector DB: ChromaDB with rich metadata (source, page, topic); MMR (maximal marginal relevance) retrieval followed by reranking
- Inference: Gemma 3 12B (Ollama tag gemma3:12b) running locally via Ollama, orchestrated with LangChain, served through a FastAPI backend
- Frontend: React + Vite chat UI with switchable answer formats (paragraph / bullet / emoji)
- Eval: RAGAS (faithfulness, context & answer relevance)
- Deploy: DuckDNS dynamic domain + Nginx reverse proxy with Let's Encrypt HTTPS
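The recursive, punctuation-aware chunking listed under Embeddings can be sketched in plain Python: try the coarsest separator first (paragraph breaks), and fall back to finer ones (Chinese and English sentence punctuation, then commas and spaces) only for pieces that are still too long. The separator list, its priority order, and the 200-character limit are assumptions for illustration, not the thesis settings; LangChain's RecursiveCharacterTextSplitter works on a similar principle.

```python
# Separators in priority order: paragraphs, lines, sentence ends, clauses, words.
# This list is an assumption, not the configuration used in the thesis.
SEPARATORS = ("\n\n", "\n", "。", "！", "？", ". ", "；", "，", " ")


def recursive_chunk(text: str, max_len: int = 200, seps=SEPARATORS) -> list[str]:
    """Split text into chunks of at most max_len characters, cutting at the
    highest-priority separator present and recursing on oversized pieces."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    for i, sep in enumerate(seps):
        if sep not in text:
            continue
        parts = text.split(sep)
        # Re-attach the separator so each sentence keeps its punctuation.
        pieces = [p + sep for p in parts[:-1]] + [parts[-1]]
        chunks, buf = [], ""
        for piece in pieces:
            if len(piece) > max_len:
                # Still too long: flush the buffer, then retry with finer separators.
                if buf:
                    chunks.append(buf)
                    buf = ""
                chunks.extend(recursive_chunk(piece, max_len, seps[i + 1:]))
            elif len(buf) + len(piece) <= max_len:
                buf += piece  # greedily pack whole sentences into one chunk
            else:
                chunks.append(buf)
                buf = piece
        if buf:
            chunks.append(buf)
        return [c for c in chunks if c.strip()]
    # No separator left: hard split by character count.
    return [text[j:j + max_len] for j in range(0, len(text), max_len)]
```

For example, `recursive_chunk("空气污染影响健康。PM2.5是主要污染物。政策需要改进。", max_len=12)` yields one chunk per sentence, because no two adjacent sentences fit together within 12 characters.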
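MMR, listed under Vector DB, picks each next document by trading off relevance to the query against similarity to documents already selected: score = λ·sim(q, d) − (1 − λ)·max sim(d, s). The sketch below implements that selection with cosine similarity over plain Python lists; the name `lambda_mult` and the toy vectors are illustrative assumptions, and the subsequent reranking step (whose model the summary does not name) is not shown.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec: list[float], doc_vecs: list[list[float]],
        k: int = 3, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of k documents chosen by Maximal Marginal Relevance."""
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []

    def mmr_score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Redundancy = similarity to the closest already-selected document.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                         default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` this reduces to plain top-k by relevance; lowering it makes the selector skip near-duplicate chunks, which matters when the same policy text appears in several source documents.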
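The Inference line can be illustrated by how retrieved chunks and their metadata turn into a grounded prompt for the local model. This minimal sketch calls Ollama's REST endpoint `/api/generate` directly with the standard library rather than going through LangChain; the prompt wording and the chunk dictionary keys (`source`, `page`, `text`) are assumptions for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk and tag it with its source and page so the
    model can cite where an answer came from. Keys are assumed, not confirmed."""
    context = "\n\n".join(
        f"[{i}] ({c['source']}, p.{c['page']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite passages by their [number]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def make_request(question: str, chunks: list[dict],
                 model: str = "gemma3:12b") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request for Ollama."""
    payload = {"model": model, "prompt": build_prompt(question, chunks), "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` returns JSON whose `response` field holds the generated answer; a FastAPI route would wrap this call and return that text to the chat UI.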
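For reference on the Eval line, RAGAS defines faithfulness as the fraction of claims in the generated answer that can be inferred from the retrieved context, and answer relevance as the mean cosine similarity between the embedding of the original question $E_o$ and the embeddings $E_{g_i}$ of $N$ questions an LLM generates back from the answer. Both rely on an LLM judge for claim extraction and question generation, so scores depend on which judge model is configured.

```latex
\text{Faithfulness} = \frac{\lvert \{\text{claims in the answer supported by the retrieved context}\} \rvert}{\lvert \{\text{claims in the answer}\} \rvert}
\qquad
\text{AnswerRelevance} = \frac{1}{N} \sum_{i=1}^{N} \cos\!\left(E_{g_i}, E_o\right)
```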
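The DuckDNS part of Deploy works by periodically calling DuckDNS's update endpoint so the subdomain keeps pointing at the server's current public IP; Nginx then terminates HTTPS with the Let's Encrypt certificate and proxies requests to the backend. A minimal sketch of building that update URL follows; the subdomain and token are hypothetical placeholders.

```python
from urllib.parse import urlencode

DUCKDNS_ENDPOINT = "https://www.duckdns.org/update"


def duckdns_update_url(subdomain: str, token: str, ip: str = "") -> str:
    """Build a DuckDNS update URL. Leaving ip empty lets DuckDNS use the
    public IPv4 address the request arrives from."""
    query = urlencode({"domains": subdomain, "token": token, "ip": ip})
    return f"{DUCKDNS_ENDPOINT}?{query}"
```

Requesting this URL on a schedule (for example from cron every few minutes) returns `OK` on success or `KO` on failure, which keeps the domain valid across dynamic-IP changes without a paid DNS service.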