Agent Engineering Lab

Field Notes, Signal, Recipes, and Lab Reports from the agentic AI publication by Sunil Prakash.
https://agenticlab.sunilprakash.com/
Language: en-us

FN-004: Loop engineering is a 30-year-old loop with a new hashtag
https://agenticlab.sunilprakash.com/field-notes/004-loop-engineering-30-year-old-loop/
Thu, 25 Jun 2026 00:00:00 GMT
Field Notes
Loop engineering renamed a three-decade-old idea. The stop condition it treats as a detail was named unsolved in 1995, the self-verification it assumes is contradicted by the evidence, and nobody connects loops that run while you sleep to the credential gap that deletes production databases.

SG-003: Multi-agent is not an architecture decision. It is a workload decision.
https://agenticlab.sunilprakash.com/signal/003-multi-agent-decision-variable/
Sun, 07 Jun 2026 00:00:00 GMT
Signal
Anthropic and Cognition published seemingly opposite conclusions on multi-agent systems within a day of each other in June 2025. Both are right, because they measured different workload shapes. The decision is not multi-agent yes or no. It is whether the work requires decomposition a router cannot encode. This Signal gives the decision rule, the primary sources on both sides, and a first-party 100-query experiment measuring the cost of getting it wrong.

FN-003: Anthropic just told you the harness is the product
https://agenticlab.sunilprakash.com/field-notes/003-anthropic-told-you-harness-is-product/
Tue, 26 May 2026 00:00:00 GMT
Field Notes
Anthropic's recent moves, an earlier independent essay, and a late-2025 preprint say the same thing. The unit of evaluation is harness plus model plus task.

SG-002: Long context did not kill RAG. It changed what retrieval is for.
https://agenticlab.sunilprakash.com/signal/002-when-long-context-replaces-rag/
Sat, 23 May 2026 00:00:00 GMT
Signal
Long context replaced lookup, not retrieval. Production architecture turns on whether the system needs to select, cite, govern, or update evidence. This Signal gives the routing rule, the academic literature behind it, and the control-plane reframe that follows.

LAB-001: Multi-agent vs router on 100 customer-support queries
https://agenticlab.sunilprakash.com/labs/multi-agent-vs-router-100-queries/
Wed, 20 May 2026 00:00:00 GMT
Labs
100 real support queries from a public dataset, run twice: once through a single-agent workflow router, once through a 3-agent hierarchical multi-agent system. Measured against the same eval rubric.

FN-002: Your eval rubric needs failure buckets, not just scores
https://agenticlab.sunilprakash.com/field-notes/002-eval-rubric-failure-buckets/
Sun, 17 May 2026 00:00:00 GMT
Field Notes
Two agents can score identically on a benchmark and fail differently in production. Failure buckets are the rubric. The aggregate is what falls out of it.

FN-001: The multi-agent papers nobody is citing
https://agenticlab.sunilprakash.com/field-notes/001-multi-agent-papers-nobody-cites/
Thu, 14 May 2026 00:00:00 GMT
Field Notes
Three papers and one practitioner survey put multi-agent failure rates between 41 and 86.7 percent and error amplification up to 17.2x. The version that survived production looks more like a CI pipeline than a faculty meeting.

SG-001: The skills.md discourse is sharpening the wrong knife
https://agenticlab.sunilprakash.com/signal/001-skills-md-wrong-knife/
Thu, 14 May 2026 00:00:00 GMT
Signal
Most skills.md advice in circulation is optimizing the wrong surface. Anthropic's docs name the real constraint: a 1% context-window budget for descriptions that quietly trims the labels of skills you invoke least. The marketplace economy is a retrieval anti-pattern.

R-001: Build and deploy your first Strands agent on AgentCore Runtime
https://agenticlab.sunilprakash.com/recipes/001-strands-on-agentcore-runtime/
Wed, 13 May 2026 00:00:00 GMT
Recipes
Build and deploy a working Strands agent on AWS Bedrock AgentCore Runtime in an afternoon. Uses the modern @aws/agentcore CLI (npm) on a current AWS account, not the deprecated Python starter toolkit. Verified on a live deploy.