Field Notes — Agent Engineering Lab

FN-004 · 2026-06-25

Loop engineering is a 30-year-old loop with a new hashtag

A two-sentence post got 8.4 million views, and the field spent the next week defining it. The definition is a control loop that computer science named in 1995, and it quietly hands you back the one part that was always hard.

9 min read →

FN-003 · 2026-05-26

Anthropic just told you the harness is the product

Three Anthropic moves in April and May 2026, an independent essay from January, and a late-2025 preprint converge on the same point. The unit of evaluation is harness plus model plus task, not the model alone.

8 min read →

FN-002 · 2026-05-17

Your eval rubric needs failure buckets, not just scores

Two agents can score identically on a benchmark and fail differently in production. The aggregate is not a rubric. It is what falls out of one.

7 min read →

FN-001 · 2026-05-14

The multi-agent papers nobody is citing

Three studies published since March 2025 put multi-agent failure rates between 41 and 86.7 percent and error amplification up to 17.2x over single-agent baselines. The vendor decks have not caught up.

8 min read →