The knowledge existed. That was the problem. It lived in documents nobody had indexed, tickets nobody had time to read, and people who were already stretched thin answering the same questions they'd answered a hundred times before. Every customer query landed on a human who had to go find the answer somewhere. Resolution times were slow not because the team was slow — because the information wasn't where it needed to be when it needed to be there.
What I Built
A Claude-powered retrieval layer that became the customer-facing AI system:
Pinecone vector database — every document, ticket, and knowledge asset indexed and chunked with precision. Milliseconds from query to answer.
Chunk-level RBAC — sales queries sales knowledge. HR queries HR knowledge. Zero cross-partition leakage, enforced at the vector level before retrieval, not after.
Multi-turn memory via Claude API — the system held context across an exchange. Follow-up questions worked like talking to someone who was paying attention.
Custom evaluation framework — I built an eval pipeline measuring accuracy, relevance, and hallucination rates against 500+ real queries before a single user touched the system. I didn't ship and guess.
Zero-trust retrieval — prompt injection guards from day one. Every query verified at the source.
Architecture Decisions
No SDK. Claude API plus Pinecone plus custom middleware, built from scratch so every layer was auditable and controllable.
RBAC at the vector level, not post-retrieval — because filtering after the fact isn't a security model, it's a liability.
The eval framework ran before any user touched the system, not after the first complaint came in.
Results
Resolution time dropped 60%
Architecture to production in weeks, solo
No security incidents since deployment
The system is still running with minimal intervention — which is the only honest test of whether infrastructure actually worked
Claude APIPineconeRAGn8nTypeScriptPythonRBACPrompt Engineering