TSAI_CHENG-HUNG
ALL PROJECTS
PROJECT_FILE · 2026– · ★ FEATURED

Construction Knowledge RAG & Steel Procurement Meeting Automation

A production-oriented construction knowledge assistant and steel procurement meeting generator built with LangGraph, hybrid RAG, Gemini grounded search, structured extraction, and DOCX automation.

Solo developer
LangGraphFastAPIHybrid RAGChromaDBBM25RRFGemini Grounded SearchOpenAI EmbeddingsSQLAlchemypython-docxNext.jsTypeScript

At a Glance

A production system for a construction company that does two things: answers internal-procedure questions with source-grounded, traceable answers, and turns a previously manual, multi-website weekly steel-procurement report into a ready-to-download Word document — automatically, in under three minutes.

Construction assistant — system & graph flowConstruction assistant — system & graph flow

Why I Built This

The idea came from watching two very different kinds of friction inside the same company.

New and existing staff regularly need to look up construction procedures, regulations, and the right form for a given situation — but that knowledge lives scattered across dozens of internal manuals. Finding the right passage (and trusting it's the right one, not an outdated or similar-sounding one) takes time and institutional memory that new hires simply don't have yet.

On the procurement side, the pain was more concrete: every week, before a roughly 30-minute steel-purchasing meeting, someone has to manually visit five-plus external sources — Feng Hsin's weekly opening prices, international scrap and iron-ore indices, the Xiben mainland-China index, LME copper, China Steel's monthly/quarterly benchmark prices — copy the numbers down, compute week-over-week deltas by hand, write up a market narrative, and assemble it all into a Word meeting record. It's repetitive, easy to get wrong, and it eats real working hours every week before the meeting even starts.

Both problems share the same shape underneath: people spending time finding and assembling information rather than using it. That's exactly the kind of work an LLM agent — grounded in the right sources, with deterministic checks where it matters — is good at taking off someone's plate.

System Overview

System architecture — browser → Caddy → FastAPI → LangGraph, ChromaDB, LLM providers and PostgreSQLSystem architecture — browser → Caddy → FastAPI → LangGraph, ChromaDB, LLM providers and PostgreSQL

The system is a single FastAPI backend and a single Next.js frontend serving two modules behind one login:

Both modules share the same FastAPI process, the same JWT-based authentication (with a fine-grained per-user permission flag gating the steel-price module), and the same Next.js bundle — but they're deliberately kept at arm's length: SEARCH has its own database and an explicit import boundary, so the two systems can keep evolving without breaking each other.

In production, everything runs behind PM2, fronted by Caddy, and exposed securely to the company over a Tailscale Funnel HTTPS endpoint.

Technology Choices — and Why

Inside the System: Three Things I'm Proud Of

1. A self-correcting retrieval loop (CRAG). Retrieval doesn't always return good context on the first try — especially with vague or multi-part questions. So after retrieving and merging results, a "grader" model judges whether the retrieved context actually suffices. If not, a "query rewriter" reformulates the question and the system tries again — capped at two retries, so a hard question degrades gracefully into "best effort with a clear answer" instead of spinning forever. That loop is the difference between technically retrieved something and actually answered the question.

CRAG — retriever → context builder → grader → query rewriter, capped at two retriesCRAG — retriever → context builder → grader → query rewriter, capped at two retries

2. Keeping the LLM out of the arithmetic. The steel-price pipeline pulls numbers from several sources, computes week-over-week deltas, derives related grades (e.g. SD420W = SD280 + 1,000), converts currencies, and writes a narrative paragraph around all of it. It's tempting to just ask an LLM to "write up this week's report" — but LLMs are unreliable at exact arithmetic and prone to mixing up absolute values with deltas. So the system splits the work on purpose: the LLM handles search, structured extraction, and prose; every number that must be exactly right is computed in plain Python from validated inputs. Anything missing or unpublished is explicitly marked low-confidence (shown in red in the final Word document for human review), and "borrowed" stale values are displayed but never silently written into the historical price table — so the system never quietly corrupts its own ground truth.

3. One LLM call decides what every turn is. The earlier design was a brittle keyword-plus-LLM hybrid that misclassified short messages and grew an ever-expanding keyword list. Now each turn makes a single structured-output LLM call that picks one of six intents — Q&A, static-form download, static-form fill, dynamic-form generate, form continuation, dynamic-form export — and decides whether retrieval is even needed. Plain Python then guard-rails the result: a form id the model hallucinated, a continuation with no prior form, an export with nothing to export all fall back safely to Q&A, so a confident-but-wrong classification can never derail the rest of the graph.

Unified intent — one structured LLM call, six intents, with code-level guard railsUnified intent — one structured LLM call, six intents, with code-level guard rails

Challenges Along the Way

A few of the harder problems, and what they taught me:

Results & Impact

Reflections & What's Next

Building two related-but-independent modules inside one shared process taught me more about boundaries than about RAG or LangGraph specifically — explicit import rules, separate databases, and "no cross-imports except through one mount point" turned what could have been a tangled merge into something I could reason about and roll back safely.

The SQLite incident was a hard lesson, but a useful one: a database choice that's perfect for "ship something that works" can become the wrong choice the moment you have real concurrent users and real schema migrations. Migrating the whole system to PostgreSQL (now done — three databases on one server) removed an entire category of Windows/SQLite-specific failure modes I'd hit more than once.

Image input is now live in the knowledge assistant — a vision-capable model in the loop, so staff can upload a site photo, a scanned form, or a diagram and ask questions grounded in that image, with multi-turn continuity so follow-up questions still know which image you meant. The trickiest design decision — dropping the carried-over image the moment a new document is uploaded, so "summarize this file" isn't hijacked by a stale photo — came straight out of a real bug.

Longer-term, I want to keep extending the steel-price source adapters as external sites inevitably change shape again — and keep applying the idea this project has now validated twice: don't make people assemble information by hand when an agent, grounded in the right sources and backed by deterministic checks where correctness matters, can do it for them.

Construction Knowledge RAG & Steel Procurement Meeting Automation — Tsai Cheng-Hung