No hype, no 'thought leadership'. The failures, the fixes, and what the architecture actually looks like after it survives contact with a real environment.
VRAM fit, quantization impact, multi-node thresholds, egress cost, and real TCO — the five calculations that determine whether a GPU deployment actually works.
Four bugs found while hardening a RAG system for financial services (FSI) and life sciences, and what they reveal about the gap between a working system and a trustworthy one.
The vendor numbers are real. The benchmarks are valid. And the procurement question still does not have a clean answer — here is the gap I kept running into.
Five transport-layer decisions — session pooling, eviction, cancel scope isolation, timeouts, and heartbeat design — each driven by a real failure in a KYC onboarding system.
Bearer token auth at the transport layer, correlation IDs across four servers, lazy session init, and clean shutdown — the system-level decisions that make an MCP client deployable.
Most professional knowledge lives in people's heads. Here's what it looks like when you structure it as an agentic system — personas, tools, skills, rules, and memory.
MCP was designed as an LLM-to-tool protocol. I used it as a service-to-service layer between a LangGraph orchestrator and independently deployable integration servers. It worked — with real tradeoffs.
Why ReAct agents struggle in production, why deterministic orchestration (LangGraph, Temporal) is the pattern that ships in regulated workflows (KYC, lending), and why the auditability argument outlives model improvements.