Field notes on building AI systems
Hi, I'm Anubhav Anand — I build production AI systems and write about how they actually work.
Full Stack AI/ML Engineer focused on RAG pipelines, LLM fine-tuning, and agent frameworks that hold up at scale. Currently building at Publicis Sapient; previously at Gesund.ai and Spritle.
Outside of work, I also…
- Write about RAG, agents, MCP, fine-tuning, and the fundamentals of generative AI — that's this blog.
- Build the full pipeline end-to-end — from ingestion, chunking, and embeddings to retrieval, reranking, and autoscaling Kubernetes serving.
- Take applied AI products from prototype to production — agentic assistants, RAG systems, and multi-agent workflows.
- Contribute to open source — merged work across promptfoo, deepset-ai/Haystack, WordPress/ai, Supabase, LibreChat, and Arize Phoenix.
- Share notes on LangGraph, LlamaIndex, and whatever I'm figuring out in the field.
The cheapest, fastest token is the one you never generate.
Using one language model to grade another feels like asking the fox to audit the henhouse and trust
Your model scores 89 on MMLU.
"A picture is worth a thousand words" is wrong by about an order of magnitude.
Hallucination is not the model malfunctioning.
There are two ways to get JSON out of a language model.
You bought a model with a giant context window.
Most explanations of the transformer open with a wall of matrices.
A language model does not write a sentence.
Same GPU, same model, same LoRA config — and the run finishes in a third of the time using most of t
Nobody demos the data cleaning.
You proved the task is solvable.
DPO answered the common case.
For a couple of years, teaching a model to prefer good answers over bad ones meant running three mod
Four methods, one question: when you sit down to fine-tune, which do you reach for?
Try to full-fine-tune an 8B model on a single 24 GB consumer card and you won't get to the first tra
A 7-billion-parameter model has 7 billion knobs.
Most fine-tuning projects should have stayed a prompt.
If you only learn one distinction about the protocol landscape, make it this one: **MCP connects an
Give a filesystem server your home directory and you've handed a language model your SSH keys, your
A server that runs as a subprocess on your laptop never has a scaling problem.
Most introductions to MCP stop at tools, resources, and prompts and call it a day.
A protocol that lets a language model run tools on your machine is a loaded gun pointed at your file
"Should I use MCP or function calling?
We built a server last time.
The fastest way to understand a protocol is to make something speak it.
Every couple of years something shows up wearing the word "standard" and promising to end integratio
# Exploration and Discovery
Most agents take the obvious path every time. That'…
# Prioritization
Give an agent one task and it does it. Give it three, and you…
# Evaluation and Monitoring
"It works." On whose machine, against which inputs,…
# Guardrails and Safety Patterns
A support agent at a company I won't name had…
# Reasoning Techniques
Three years ago, getting a model to reason meant trickin…
# Resource-Aware Optimization
The first month, the agent cost $40 to run. It wa…
# Inter-Agent Communication (A2A)
For a while in 2025, every team building a "m…
# Human-in-the-Loop
Autonomy isn't a switch. It's a dial, and the whole craft i…
# Exception Handling and Recovery
The agent booked the flight. Then the hotel A…
# Goal Setting and Monitoring
"Be helpful." Try writing a test for that. You c…
# Learning and Adaptation
Here's a thing that sounds like heresy: most agents p…
# Memory Management
You tell the agent your name on Monday. By Wednesday it ask…
Most multi-agent systems are a meeting that should have been an email.
You hand a planning agent the *what* — "organize the team offsite, budget's $8k, twelve people, some
A language model is a brain in a sealed jar.
There's a demo that always lands.
Do the arithmetic on a slow agent and the answer is almost always the same: it's slow because it's w
A chain assumes you already know the path.
Give one prompt five jobs and it will quietly do four.
A support agent escalated a ticket because the assistant kept telling customers about a discount tha
Every time a model ships with a bigger context window, the same headline returns: *RAG is dead.
A team swaps in a new reranker and declares the RAG system "better.
Eight posts in, we've built up an arsenal: hybrid search, reranking, HyDE, multi-query, the agentic
Ask a normal RAG system "what are the major themes across these 400 board meeting transcripts?
Every pipeline in this series so far has been a conveyor belt.
"why is it slow" That's a real query a real user typed into a real RAG system.…
Most accuracy improvements in RAG cost you something painful — a new index, a bigger model, a re-arc
Dense vector search, the thing this whole series has been building on, has a stupid failure mode: it
An embedding is not a summary of meaning.
Pick a chunk size of 500 tokens and you've made a decision worth more than your choice of embedding
The first RAG demo always works.