AI Lab

Building at the intersection of large language models, enterprise data, and real-world utility. Less hype, more shipping.

200+ research papers read across NLP, retrieval, and agent architectures
12 models fine-tuned or prompt-engineered for production workloads
96.4% average extraction accuracy across document intelligence pipelines
Featured Experiment 01

The best AI systems disappear into the workflow. If the user has to think about the model, you have already failed at the design level.

Operating philosophy for every experiment in this lab
All Experiments 02
Agents 2025

Agentic Workflow Orchestrator

Multi-agent system using Claude and tool-use to plan, execute, and self-correct complex business processes. Agents negotiate task allocation, escalate edge cases, and produce audit trails for every decision made.

RAG 2025

Context-Aware Code Assistant

RAG-powered code review tool that ingests an entire repository, maps dependency graphs, and provides architecture-aware suggestions. Uses hybrid search with BM25 + dense embeddings for precise retrieval.
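
A minimal sketch of how a BM25 ranking and a dense-embedding ranking could be merged into one result list. The description above does not say how the two signals are combined, so reciprocal rank fusion (RRF) is an assumption here, as are the file names and the constant k=60, which is a commonly used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retriever outputs for one query (best match first).
bm25_ranking = ["utils.py", "auth.py", "db.py"]
dense_ranking = ["auth.py", "models.py", "utils.py"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

RRF only needs ranks, not raw scores, which sidesteps the problem of BM25 scores and cosine similarities living on different scales.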

Vision 2024

Visual Scene Understanding

Multimodal pipeline combining GPT-4V and Claude for architectural photo analysis. Extracts spatial relationships, identifies materials, and classifies design style, returning the results as structured JSON.

Fine-tuning 2025

Domain-Specific Summarizer

Fine-tuned Mistral 7B on 15K legal and compliance documents to generate executive summaries that preserve critical clauses. Outperforms zero-shot GPT-4 on domain-specific ROUGE-L by 18%.
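
For readers unfamiliar with the metric, here is a minimal sketch of a ROUGE-L style score based on the longest common subsequence (LCS) of tokens. It assumes lowercase whitespace tokenization and a balanced F1 of LCS precision and recall; published ROUGE implementations differ in tokenization, stemming, and F-measure weighting, so this is illustrative and not the scoring code behind the figure above. The sample sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """F1 of LCS-based precision and recall over whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "the tenant shall pay rent monthly",
    "the tenant pays rent monthly",
)
print(round(score, 3))
```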

Evals 2025

LLM Evaluation Harness

Custom evaluation framework for comparing LLM outputs across accuracy, latency, cost, and hallucination rate. Runs head-to-head benchmarks with human-in-the-loop scoring and automated regression detection.
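
A minimal sketch of what automated regression detection across these four dimensions could look like. The field names, the 2% tolerance, and the sample numbers are all hypothetical; a real harness would also account for sample size and run-to-run noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    accuracy: float            # fraction of correct outputs
    p95_latency_ms: float      # 95th percentile latency
    cost_usd_per_1k: float     # cost per 1,000 requests
    hallucination_rate: float  # fraction of outputs flagged as unsupported


# For each metric, whether a larger value is an improvement.
HIGHER_IS_BETTER = {
    "accuracy": True,
    "p95_latency_ms": False,
    "cost_usd_per_1k": False,
    "hallucination_rate": False,
}


def find_regressions(baseline, candidate, tolerance=0.02):
    """Return metrics where the candidate is worse than the baseline
    by more than `tolerance`, relative to the baseline value."""
    regressions = []
    for metric, higher_is_better in HIGHER_IS_BETTER.items():
        old = getattr(baseline, metric)
        new = getattr(candidate, metric)
        worsening = (old - new) if higher_is_better else (new - old)
        if worsening > tolerance * abs(old):
            regressions.append(metric)
    return regressions


baseline = EvalResult(0.91, 820.0, 0.40, 0.030)
candidate = EvalResult(0.93, 900.0, 0.40, 0.030)
print(find_regressions(baseline, candidate))
```

In this example the candidate is more accurate but slower, so only the latency metric is flagged.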

Infra 2024

Prompt Versioning & Observability

Git-like version control for prompt templates with A/B testing, cost tracking, and latency monitoring. Integrates with LangSmith and custom dashboards to catch regressions before they reach production.
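
One way to picture Git-like versioning for prompts is content addressing: each template is identified by a hash of its text, so identical content always maps to the same version. The sketch below is an illustration under that assumption; the `PromptRegistry` class and its methods are hypothetical and are not part of LangSmith or any other library.

```python
import hashlib


class PromptRegistry:
    """In-memory store that versions prompt templates by content hash."""

    def __init__(self):
        self._history = {}  # prompt name -> list of (version_id, template)

    def commit(self, name, template):
        """Record a template and return its short content hash."""
        version_id = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
        history = self._history.setdefault(name, [])
        # Skip the write when the latest version already has this content.
        if not history or history[-1][0] != version_id:
            history.append((version_id, template))
        return version_id

    def get(self, name, version_id=None):
        """Return the latest template, or a specific version if given."""
        history = self._history[name]
        if version_id is None:
            return history[-1][1]
        for vid, template in history:
            if vid == version_id:
                return template
        raise KeyError(f"{name} has no version {version_id}")

    def log(self, name):
        """List version IDs for a prompt, oldest first."""
        return [vid for vid, _ in self._history[name]]


registry = PromptRegistry()
v1 = registry.commit("summarize", "Summarize the contract: {text}")
v2 = registry.commit("summarize", "Summarize the contract in 3 bullets: {text}")
print(registry.log("summarize"))
```

Logging the version ID alongside each request is what would let cost, latency, and A/B results be attributed to an exact prompt revision.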

Current Focus 03

My research interests sit at the boundary where large language models meet messy, real-world enterprise data. I am particularly drawn to problems where off-the-shelf solutions fall short and custom pipelines are the only path to production-grade reliability. Three threads I keep pulling on:

Retrieval-Augmented Generation at scale. Most RAG demos work on a handful of documents. I focus on what breaks when you point the same architecture at 50,000 PDFs with inconsistent formatting, mixed languages, and no clean metadata. Chunking strategy, re-ranking, and hybrid search become the real engineering challenges.
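
As a concrete starting point for the chunking question, here is a minimal sketch of fixed-size chunking with overlap, so that a sentence split at a boundary still appears whole in at least one chunk. Splitting on whitespace and the default sizes are assumptions for illustration; production pipelines typically count model tokens and respect document structure such as headings and tables.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of `chunk_size` words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)
```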

Agentic systems with guardrails. Autonomous agents are powerful but brittle. My work emphasizes structured tool-use, explicit reasoning traces, and human-in-the-loop checkpoints that let agents operate in regulated environments like financial services and healthcare without sacrificing auditability.
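
A minimal sketch of a human-in-the-loop checkpoint with an audit trail, assuming a simple policy where certain tools always require approval. The tool names, the `approve` callback, and the log schema are hypothetical; a real deployment would persist the log and route approvals through a review queue.

```python
import json
from datetime import datetime, timezone

# Hypothetical policy: these tools never run without human sign-off.
HIGH_RISK_TOOLS = {"issue_refund", "update_patient_record"}


def run_tool_call(tool_name, args, reasoning, tools, approve, audit_log):
    """Execute a tool call, gating high-risk tools on approval and
    recording every decision (including denials) in the audit log."""
    needs_approval = tool_name in HIGH_RISK_TOOLS
    approved = approve(tool_name, args) if needs_approval else True
    result = tools[tool_name](**args) if approved else None
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,
        "args": args,
        "reasoning": reasoning,  # the agent's stated reason for the call
        "needs_approval": needs_approval,
        "approved": approved,
        "result": result,
    })
    return result


def deny_all(tool_name, args):
    """Stand-in reviewer that rejects every high-risk request."""
    return False


tools = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "issue_refund": lambda order_id, amount: {"order_id": order_id, "refunded": amount},
}
audit_log = []

status = run_tool_call("lookup_order", {"order_id": "A1"},
                       "Need the order status", tools, deny_all, audit_log)
refund = run_tool_call("issue_refund", {"order_id": "A1", "amount": 25.0},
                       "Customer reported damage", tools, deny_all, audit_log)
print(json.dumps(audit_log, indent=2))
```

The low-risk lookup runs immediately, while the refund is blocked and still leaves a record explaining what the agent wanted to do and why.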

Evaluation-driven development. You cannot improve what you cannot measure. I build custom evaluation harnesses before writing the first line of application code, establishing baselines against which every prompt revision and architecture change is tested.

Tech Stack 04
Python, Claude API, LangChain, LangGraph, LangSmith, OpenAI API, Pinecone, ChromaDB, Hugging Face, PyTorch, FastAPI, Pydantic, Docker, AWS Bedrock, Salesforce APIs, PostgreSQL, Redis, Streamlit, Jupyter, Git

Now Exploring

Salesforce + AI Agents: autonomous case routing and field auto-population via Claude tool-use (In Progress)
Voice-Driven Interfaces: real-time speech-to-action pipelines with Whisper and function calling (Prototype)
Multi-Modal RAG: ingesting images, tables, and charts alongside text for richer retrieval (In Progress)
SynthDoc v2: adding layout-aware table extraction and cross-document entity resolution (Shipped)
Guardrail Framework: content filtering, PII redaction, and output validation for regulated industries (Prototype)
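
To make the PII redaction item concrete, here is a minimal regex-based sketch covering email addresses and US Social Security numbers. The patterns and placeholder labels are assumptions for illustration only; regexes miss many real-world formats, so a production guardrail would usually pair them with a trained entity recognizer.

```python
import re

# Illustrative patterns only; real PII detection needs broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text):
    """Replace each match with a [LABEL] placeholder and return the
    redacted text plus a count of replacements per label."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


redacted, counts = redact_pii("Contact jane.doe@example.com, SSN 123-45-6789.")
print(redacted)
print(counts)
```

Returning the counts alongside the text makes it easy to log how often redaction fires without storing the sensitive values themselves.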