Building at the intersection of large language models, enterprise data, and real-world utility. Less hype, more shipping.
Enterprise documents are chaotic: scanned invoices with inconsistent layouts, regulatory filings buried in legalese, multi-column research papers with nested tables. SynthDoc is an LLM-powered extraction pipeline that transforms these unstructured documents into queryable, structured data. It chains Claude for semantic understanding with a custom layout parser for spatial reasoning, feeding results into Pinecone for vector-based retrieval. The system processes 10,000+ pages per run with 96.4% extraction accuracy on benchmark datasets, handling edge cases like rotated text, merged cells, and handwritten annotations through a multi-pass verification loop. Currently deployed in a Salesforce-integrated workflow where extracted contract data auto-populates opportunity records.
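A minimal sketch of what one pass of that verification loop could look like, assuming the Anthropic Python SDK; extract_fields, check_consistency, the prompt, and the invoice schema are all illustrative stand-ins, not the production pipeline:

```python
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def extract_fields(page_text: str, feedback: str = "") -> dict:
    """One extraction pass: ask Claude to emit the target fields as bare JSON."""
    prompt = "Extract invoice fields (vendor, date, line_items, total) as bare JSON, no prose."
    if feedback:
        prompt += f"\nFix these issues found in the previous pass: {feedback}"
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"{prompt}\n\n{page_text}"}],
    )
    return json.loads(msg.content[0].text)

def check_consistency(fields: dict) -> str:
    """Toy rule-based check: line items must sum to the stated total."""
    items_sum = sum(item.get("amount", 0) for item in fields.get("line_items", []))
    if abs(items_sum - fields.get("total", 0)) > 0.01:
        return f"line_items sum to {items_sum} but total reads {fields.get('total')}"
    return ""  # empty feedback means this pass verified clean

def extract_with_verification(page_text: str, max_passes: int = 3) -> dict:
    """Re-extract with the checker's feedback until clean or out of passes."""
    feedback = ""
    for _ in range(max_passes):
        fields = extract_fields(page_text, feedback)
        feedback = check_consistency(fields)
        if not feedback:
            break
    return fields
```

The point of the loop is that the checker's feedback string rides into the next pass, so the model re-extracts with the failure spelled out rather than retrying blind.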
The best AI systems disappear into the workflow. If the user has to think about the model, you have already failed at the design level.
Operating philosophy for every experiment in this lab

Multi-agent system using Claude and tool-use to plan, execute, and self-correct complex business processes. Agents negotiate task allocation, escalate edge cases, and produce audit trails for every decision made.
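The audit-trail half of that design, reduced to a sketch; AuditedAgent and the escalation rule are hypothetical simplifications of the real negotiation logic:

```python
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class AuditEntry:
    agent: str
    action: str   # "tool_call", "escalate", "delegate", ...
    detail: dict
    timestamp: float = field(default_factory=time.time)

class AuditedAgent:
    """Routes every tool call through a shared trail so no decision goes unrecorded."""

    def __init__(self, name: str, tools: dict, trail: list):
        self.name, self.tools, self.trail = name, tools, trail

    def call_tool(self, tool: str, **kwargs):
        self.trail.append(AuditEntry(self.name, "tool_call", {"tool": tool, "args": kwargs}))
        if tool not in self.tools:
            # Edge-case path: escalate rather than guess at an unknown tool.
            self.trail.append(AuditEntry(self.name, "escalate", {"reason": f"unknown tool {tool}"}))
            raise LookupError(tool)
        return self.tools[tool](**kwargs)

trail: list = []
agent = AuditedAgent("invoice-bot", {"add": lambda a, b: a + b}, trail)
agent.call_tool("add", a=2, b=3)
print(json.dumps([asdict(e) for e in trail], indent=2))
```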
RAG-powered code review tool that ingests an entire repository, maps dependency graphs, and provides architecture-aware suggestions. Uses hybrid search with BM25 + dense embeddings for precise retrieval.
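A toy version of the hybrid retrieval step, using the rank_bm25 and sentence-transformers packages; the three documents and the alpha blend are illustrative, and the real tool runs over a full repository index:

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = ["def parse_config(path): ...",
        "class DependencyGraph: builds the import graph",
        "README: build instructions"]

bm25 = BM25Okapi([d.lower().split() for d in docs])
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def hybrid_search(query: str, alpha: float = 0.5, k: int = 2):
    """Blend lexical and semantic scores; alpha weights the dense side."""
    sparse = np.array(bm25.get_scores(query.lower().split()))
    sparse = sparse / (sparse.max() or 1.0)  # scale lexical scores to [0, 1]
    dense = doc_vecs @ encoder.encode(query, normalize_embeddings=True)
    blended = alpha * dense + (1 - alpha) * sparse
    return [docs[i] for i in np.argsort(blended)[::-1][:k]]

print(hybrid_search("where is the dependency graph built?"))
```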
Multimodal pipeline combining GPT-4V and Claude for architectural photo analysis. Extracts spatial relationships, identifies materials, and classifies design style, emitting structured JSON output.
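One way the structured-output contract might be enforced downstream, sketched with Pydantic; the PhotoAnalysis schema is an illustrative guess at the fields, not the pipeline's actual schema:

```python
from pydantic import BaseModel

class PhotoAnalysis(BaseModel):
    """Illustrative target schema the vision models are prompted to fill."""
    materials: list[str]          # e.g. ["brick", "glass", "steel"]
    design_style: str             # e.g. "brutalist"
    spatial_relations: list[str]  # e.g. ["atrium opens onto the courtyard"]

raw = '{"materials": ["brick"], "design_style": "brutalist", "spatial_relations": []}'
analysis = PhotoAnalysis.model_validate_json(raw)  # malformed model output raises here
print(analysis.design_style)
```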
Fine-tuned Mistral 7B on 15K legal and compliance documents to generate executive summaries that preserve critical clauses. Outperforms zero-shot GPT-4 on domain-specific ROUGE-L by 18%.
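For reference, this is roughly how a ROUGE-L comparison like that is computed with Google's rouge-score package; the reference and candidate strings here are invented:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

reference = "The indemnification clause in Section 9 survives termination."
candidates = {
    "fine_tuned": "Section 9's indemnification clause survives termination.",
    "zero_shot": "The contract discusses liability and ending the agreement.",
}
for name, summary in candidates.items():
    score = scorer.score(reference, summary)["rougeL"].fmeasure
    print(f"{name}: ROUGE-L F1 = {score:.3f}")
```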
Custom evaluation framework for comparing LLM outputs across accuracy, latency, cost, and hallucination rate. Runs head-to-head benchmarks with human-in-the-loop scoring and automated regression detection.
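A stripped-down sketch of the head-to-head shape, covering just accuracy, latency, and cost (hallucination scoring needs a judge model and is omitted); every name here is illustrative:

```python
import time
from dataclasses import dataclass
from statistics import mean

@dataclass
class EvalResult:
    model: str
    accuracy: float
    p50_latency_s: float
    cost_usd: float

def run_benchmark(model: str, generate, cases: list, usd_per_call: float) -> EvalResult:
    """Score one model: exact-match accuracy plus median latency and total cost."""
    hits, latencies = [], []
    for prompt, expected in cases:
        start = time.perf_counter()
        answer = generate(prompt)
        latencies.append(time.perf_counter() - start)
        hits.append(answer.strip() == expected)  # swap in a judge model for open-ended tasks
    return EvalResult(model, mean(hits),
                      sorted(latencies)[len(latencies) // 2],
                      usd_per_call * len(cases))

# Toy usage with a stub standing in for a real API call.
cases = [("2+2?", "4"), ("capital of France?", "Paris")]
print(run_benchmark("stub", lambda p: "4" if "2+2" in p else "Paris", cases, 0.002))
```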
Git-like version control for prompt templates with A/B testing, cost tracking, and latency monitoring. Integrates with LangSmith and custom dashboards to catch regressions before they reach production.
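The git-like core of that idea fits in a few lines: content-addressed revisions with parent pointers. PromptStore is a toy, not the project's storage layer:

```python
import hashlib
import json
import time

class PromptStore:
    """Every template revision is content-addressed, like a git object."""

    def __init__(self):
        self.objects: dict = {}  # sha -> revision record
        self.heads: dict = {}    # template name -> latest sha

    def commit(self, name: str, template: str, note: str = "") -> str:
        sha = hashlib.sha256(template.encode()).hexdigest()[:12]
        self.objects[sha] = {"name": name, "template": template, "note": note,
                             "parent": self.heads.get(name), "ts": time.time()}
        self.heads[name] = sha
        return sha

    def log(self, name: str):
        """Walk the parent chain from the head, newest first."""
        sha = self.heads.get(name)
        while sha:
            yield sha, self.objects[sha]["note"]
            sha = self.objects[sha]["parent"]

store = PromptStore()
store.commit("summarize", "Summarize:\n{doc}", "baseline")
store.commit("summarize", "Summarize, keep critical clauses:\n{doc}", "preserve clauses")
print(json.dumps(list(store.log("summarize")), indent=2))
```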
My research interests sit at the boundary where large language models meet messy, real-world enterprise data. I am particularly drawn to problems where off-the-shelf solutions fall short and custom pipelines are the only path to production-grade reliability. Three threads I keep pulling on:
Retrieval-Augmented Generation at scale. Most RAG demos work on a handful of documents. I focus on what breaks when you point the same architecture at 50,000 PDFs with inconsistent formatting, mixed languages, and no clean metadata. Chunking strategy, re-ranking, and hybrid search become the real engineering challenges.
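One of those knobs, sliding-window chunking with overlap, in sketch form; the sizes are placeholders you would tune per corpus:

```python
def chunk_with_overlap(text: str, chunk_chars: int = 1200, overlap: int = 200):
    """Sliding-window chunking: overlap keeps clauses from being cut at boundaries."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_chars, len(text))
        chunks.append({"text": text[start:end],
                       "start": start,  # offsets double as citation metadata
                       "end": end})
        if end == len(text):
            break
        start = end - overlap
    return chunks

doc = "Lorem ipsum " * 500
print(len(chunk_with_overlap(doc)), "chunks")
```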
Agentic systems with guardrails. Autonomous agents are powerful but brittle. My work emphasizes structured tool-use, explicit reasoning traces, and human-in-the-loop checkpoints that let agents operate in regulated environments like financial services and healthcare without sacrificing auditability.
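A minimal sketch of a human-in-the-loop checkpoint, with a hypothetical risky-tool policy; a real deployment would route approve() to a reviewer queue and persist the trace:

```python
RISKY_TOOLS = {"wire_transfer", "delete_record"}  # illustrative policy, not a real ruleset

def gated_call(tool: str, args: dict, approve) -> str:
    """Structured checkpoint: risky tools block until a human approver signs off."""
    trace = {"tool": tool, "args": args, "gated": tool in RISKY_TOOLS}
    if trace["gated"] and not approve(trace):  # approve() would page a reviewer queue
        return f"rejected: reviewer declined {tool}"
    return f"executed {tool}"  # real dispatch happens here

# Stub approver that declines everything, standing in for a human reviewer.
print(gated_call("wire_transfer", {"amount": 50_000}, approve=lambda trace: False))
```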
Evaluation-driven development. You cannot improve what you cannot measure. I build custom evaluation harnesses before writing the first line of application code, establishing baselines against which every prompt revision and architecture change is tested.
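The baseline-first workflow in miniature; the metric names and file path are illustrative:

```python
import json
import pathlib

BASELINE = pathlib.Path("baselines/summarizer.json")  # path is illustrative

def record_baseline(metrics: dict) -> None:
    """Step one, before application code: pin the numbers every change is judged against."""
    BASELINE.parent.mkdir(exist_ok=True)
    BASELINE.write_text(json.dumps(metrics))

def find_regressions(metrics: dict, tolerance: float = 0.01) -> list:
    """Compare a new run to the pinned baseline; return every metric that slipped."""
    baseline = json.loads(BASELINE.read_text())
    return [f"{name}: {metrics.get(name, 0.0):.3f} < baseline {floor:.3f}"
            for name, floor in baseline.items()
            if metrics.get(name, 0.0) < floor - tolerance]

record_baseline({"rougeL": 0.40, "accuracy": 0.95})
print(find_regressions({"rougeL": 0.41, "accuracy": 0.93}))  # flags the accuracy drop
```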