Agents create and judge. Code enforces.
A reusable, enforcement-first delivery operating system for teams of AI coding agents.
Abstract
Individual developers got faster with AI assistants; teams got a new governance problem. Ambiguous intent scales into code in seconds, reviews drown in volume, and "done" becomes a claim rather than a fact. agents-foundation is an attempt to answer that at the process level: a team of role-based agents that create and judge, a Git-native markdown kanban they execute against, and a two-layer system of deterministic gates that make state transitions and quality invariants non-bypassable. The central design rule — agents judge; code does the bookkeeping and enforces it — is what separates this from prompt-centric methods that still trust a human (or an LLM) to remember the rules. It ships as a Claude Code plugin marketplace, split into an agnostic process layer and swappable stack layers.
An AI agent will happily mark its own homework. It will tick a checkbox it didn't satisfy, move a task to done it didn't finish, and report a green suite that only asserts a mock. None of this is malice — it is the predictable behavior of a system optimized to produce plausible continuations. The question this project asks is narrow and practical: what should an LLM be trusted to do, and what should be taken out of its hands entirely?
The problem: speed without governance
The first wave of AI coding tooling optimized the inner loop — one developer, one prompt, more code per hour. At team scale a different friction appears. Requirements turn into implementation before anyone agrees on scope. Pull requests arrive faster than they can be reviewed. Generated code is locally plausible but globally adrift from intent. And the record of why a system is the way it is evaporates the moment the chat window closes.
The naïve fix — "have the agent follow a checklist" — fails for the same reason the problem exists: the agent is the thing you cannot fully trust to follow the checklist. A method that depends on the model never forgetting a constraint has simply moved the failure, not removed it.
The thesis: judgment vs. bookkeeping
The foundation is built on one distinction, applied everywhere:
Agents create and judge. Deterministic steps apply state, and hooks enforce it.
Creating a plan, implementing a feature, judging whether code meets a spec — these need context, taste, and reasoning. They belong to agents. Ticking the acceptance-criteria boxes, stamping the verdict, moving the task between states, refusing a commit that violates an invariant — these are mechanical. They must never depend on an LLM's discretion, because discretion is exactly what fails silently.
So the reviewer agent returns a structured verdict; a deterministic script applies it (ticks the criteria, stamps the section, moves the file); and a hook refuses to let any task reach done without a recorded approval and every criterion checked. The judgment is the agent's. The bookkeeping is code. The enforcement is not optional.
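To make the split concrete, here is a minimal sketch of what the "apply" step might look like. It is not the project's actual script: the `## ` heading level, the `- [ ]` checkbox syntax, the `verdict: ...` stamp format, and the `apply_verdict` name are all assumptions made for illustration. Only the two verdict words come from the role diagram. The file move itself is left out.

```python
import re

# Verdict vocabulary as it appears in the role diagram; everything else is assumed.
ALLOWED_VERDICTS = {"approve", "changes-requested"}
CHECKBOX = re.compile(r"^- \[[ x]\] ")


def apply_verdict(task_text: str, verdict: str, satisfied: set[int]) -> str:
    """Tick the satisfied Spec criteria (1-based) and stamp the Verdict section.

    No judgment happens here: the reviewer decided, this only records it.
    """
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")

    out = []
    section = None
    index = 0
    for line in task_text.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
            out.append(line)
            if section == "Verdict":
                out.append(f"verdict: {verdict}")  # hypothetical stamp format
            continue
        if section == "Verdict":
            continue  # drop any earlier stamp so the step is repeatable
        if section == "Spec" and CHECKBOX.match(line):
            index += 1
            if index in satisfied:
                line = CHECKBOX.sub("- [x] ", line, count=1)
        out.append(line)
    return "\n".join(out) + "\n"
```

The function takes the reviewer's structured output as plain data (a verdict word and the numbers of the criteria it found satisfied), so the model never touches the file and the same input always produces the same result.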
Anatomy of the foundation
Concretely, installing the foundation into a repository gives it five things: a team of agents, a board, a task contract, a set of rules, and the gates that bind them. The rest of this article walks each one and the reasoning behind it.
Two layers: process and stack
The system is split into two plugins. delivery-team carries how work flows — the role agents, the kanban, the gates, the engineering principles — and knows nothing about any particular framework. stack-turbo-nest-react carries which technology — the implementer agents and the opinionated conventions for one concrete stack (NestJS + React/Turborepo). A project installs the process layer alone, or pairs it with a stack.
| Layer | Plugin | Owns |
|---|---|---|
| Process (agnostic) | delivery-team | kanban workflow, role agents, review discipline, the deterministic gates, engineering principles, ADR + doc + test philosophy |
| Stack (opinionated) | stack-turbo-nest-react | implementer agents (backend/frontend/infra) and their rules; stack-specific gates; the C4 documentation model |
The split is not cosmetic. It is what lets the process be reused: a future stack-go-chi would plug into the same reviewer, board, and bootstrapper without touching a line of the process layer — because the process layer's active components contain no framework specifics.
The role team
The orchestrator (a Delivery Manager persona) reads the board, builds a dependency graph, and dispatches role agents: implementers run in parallel, each in an isolated Git worktree, when their work is disjoint, and are serialized when it isn't. Workers run on a fast model; the reviewer, whose judgment matters most, runs on the strongest one.
    flowchart TD
        DM["Delivery Manager<br/>(orchestrator)"]
        PL["Planner<br/>PO + Tech Lead"]
        RV["Reviewer<br/>Quality Gate"]
        subgraph ENG["Engineering: stack layer"]
            BE["Backend"]
            FE["Frontend"]
            IN["Platform / DevOps"]
        end
        DOC["Technical Writer"]
        QA["QA<br/>(specialist)"]
        DM --> PL
        DM --> ENG
        DM --> DOC
        DM --> QA
        DM --> RV
        PL -. specifies tasks .-> ENG
        ENG -. implements + unit/e2e .-> RV
        QA -. load / journeys / resilience .-> RV
        DOC -. living docs .-> RV
        RV -. "approve / changes-requested" .-> DM
Two roles deliberately are not agents. Backlog replenishment and verdict application are deterministic skills — procedures run in-context, not reasoning tasks — precisely because they are bookkeeping.
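The dependency-ordered dispatch the orchestrator performs can be sketched with Python's standard `graphlib`. This is an illustration under assumptions, not the orchestrator's real logic: the task ids and the `dispatch_batches` name are hypothetical, and the sketch only orders tasks by declared dependencies. Deciding whether two tasks in the same batch are truly disjoint, and so safe to run in separate worktrees, is a separate check that is not shown.

```python
from graphlib import TopologicalSorter


def dispatch_batches(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; every task in a batch has all its dependencies done.

    `depends_on` maps a task id to the ids it must wait for.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to keep the output stable
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished before asking again
    return batches


# Hypothetical board: T-003 needs both T-001 and T-002; T-004 needs T-003.
board = {
    "T-001": set(),
    "T-002": set(),
    "T-003": {"T-001", "T-002"},
    "T-004": {"T-003"},
}
print(dispatch_batches(board))
```

Each inner list is a candidate for parallel dispatch, and the order of the lists is the serialization the graph forces.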
The work kanban: state is a location
Work lives as a markdown kanban in work/, and a task's state is the folder it sits in: backlog → ready → active → review → done. A transition is a git mv; the commit history is the audit trail. There is deliberately no status: field in the task — a field goes stale the moment someone forgets to update it, but a file's location cannot lie about where it is.
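A small sketch of "state is a location", assuming a `work/<state>/<task-file>` layout and hypothetical task file names. The rule that a task may only move forward by one state is also an assumption made to keep the example short; the real workflow may, for instance, send a task back from review. The sketch builds the `git mv` command as a list without running it.

```python
from pathlib import PurePosixPath

# The five states named in the text, in board order.
STATES = ["backlog", "ready", "active", "review", "done"]


def state_of(task_path: str) -> str:
    """Read a task's state from its location: work/<state>/<file>."""
    parts = PurePosixPath(task_path).parts
    if len(parts) < 3 or parts[0] != "work" or parts[1] not in STATES:
        raise ValueError(f"not a task path: {task_path!r}")
    return parts[1]


def git_mv_command(task_path: str, target: str) -> list[str]:
    """Build the `git mv` for a transition (assumed here: forward by one state only)."""
    if target not in STATES:
        raise ValueError(f"unknown state: {target!r}")
    current = state_of(task_path)
    if STATES.index(target) != STATES.index(current) + 1:
        raise ValueError(f"cannot move {current} -> {target}")
    destination = PurePosixPath("work", target, PurePosixPath(task_path).name)
    return ["git", "mv", task_path, str(destination)]
```

Because the state is derived from the path on every call, there is no second copy of it that could disagree with the folder.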
Each task is a single file with a standardized shape — Spec (an immutable contract, with acceptance criteria as a checklist), Plan, Todo, Verdict, and Log. The headings and the verdict vocabulary are standardized because automation parses them. The reviewer judges the criteria; it never edits the file.
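Since automation parses the headings, a parser for the task shape might look like the following. The five section names come from the text; the `## ` heading level, the `- [ ]` / `- [x]` checkbox syntax, and the function names are assumptions for illustration.

```python
import re

# Section names as listed in the text; the "## " heading level is an assumption.
REQUIRED_SECTIONS = ["Spec", "Plan", "Todo", "Verdict", "Log"]


def parse_sections(task_text: str) -> dict[str, list[str]]:
    """Split a task file into {heading: body lines}."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in task_text.splitlines():
        heading = re.match(r"^## (.+?)\s*$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def missing_sections(task_text: str) -> list[str]:
    """Names of standard sections the file does not contain."""
    found = parse_sections(task_text)
    return [name for name in REQUIRED_SECTIONS if name not in found]


def acceptance_criteria(task_text: str) -> list[tuple[bool, str]]:
    """(ticked, text) for every checkbox line in the Spec section."""
    criteria = []
    for line in parse_sections(task_text).get("Spec", []):
        box = re.match(r"^- \[([ xX])\] (.+)$", line)
        if box:
            criteria.append((box.group(1) != " ", box.group(2)))
    return criteria
```

A fixed shape is what keeps this parser short: it never has to guess where the criteria live or what the verdict section is called.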
The gates: two layers of enforcement
The same invariant — no task reaches done without a recorded verdict and every acceptance criterion ticked — is enforced twice, by design:
- Agent-time: a PreToolUse hook fires the moment an agent tries to move a task into done/, blocking it early with the reason.
- Commit-time: a Git pre-commit hook runs the same validator over the staged change, so the gate holds even outside the agent's context. (Because Git hooks run outside the tool, the bootstrapper copies the validators into the repository. This is a small, deliberate duplication: anything executed by something other than the agent must live where that something can see it.)
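The shared validator behind both layers could be as small as the sketch below. It is an illustration, not the shipped validator: the `verdict: approve` stamp format, the `## ` headings, the checkbox syntax, and the `done_gate_errors` name are assumptions. How each hook reports a block (exit status, message channel) is deliberately left out and should be taken from the Claude Code and Git hook documentation.

```python
import re


def done_gate_errors(task_text: str) -> list[str]:
    """Return the reasons a task may not enter done/; an empty list means it may."""
    section = None
    criteria = 0
    unticked = []
    approved = False
    for line in task_text.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
        elif section == "Spec":
            box = re.match(r"^- \[([ xX])\] (.+)$", line)
            if box:
                criteria += 1
                if box.group(1) == " ":
                    unticked.append(box.group(2))
        elif section == "Verdict" and line.strip().lower() == "verdict: approve":
            approved = True  # hypothetical stamp format

    errors = []
    if not approved:
        errors.append("no recorded approval in the Verdict section")
    if criteria == 0:
        errors.append("Spec has no acceptance criteria")
    errors.extend(f"criterion not ticked: {text}" for text in unticked)
    return errors
```

Keeping the check a pure function of the file's text is what lets the agent-time hook and the commit-time hook call the same code and reach the same answer.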
A second gate enforces that a schema migration ships with its data-model documentation, so the diagram of the database can't silently drift behind the schema. That gate is stack-specific, and so it lives in the stack layer — not the agnostic one.
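As a rough sketch of that second gate: given the list of staged paths (which a pre-commit hook could obtain from `git diff --cached --name-only`), refuse when a migration is staged without the data-model document. Both path constants below are hypothetical placeholders; the real stack layer defines its own locations.

```python
from fnmatch import fnmatchcase

# Both locations are hypothetical placeholders, not the stack layer's real paths.
MIGRATION_GLOB = "apps/api/migrations/*"
DATA_MODEL_DOC = "docs/data-model.md"


def migration_gate_errors(staged_paths: list[str]) -> list[str]:
    """Reasons to refuse the commit: a staged migration without the doc update."""
    migrations = [p for p in staged_paths if fnmatchcase(p, MIGRATION_GLOB)]
    if migrations and DATA_MODEL_DOC not in staged_paths:
        return [f"{p} staged without an update to {DATA_MODEL_DOC}" for p in migrations]
    return []
```

The check only proves the document was touched in the same commit, not that the update is accurate; judging accuracy remains the reviewer's job.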
Rule-driven agents, not rule-containing agents
An early version of the reviewer hard-coded a stack's concerns — rate limiting, pagination style, design tokens. That was a leak: the "agnostic" process layer secretly knew about NestJS and React. The fix generalizes a principle worth stating on its own:
A strong convention belongs in a rule, applied to every task — never restated per task, where it can be forgotten.
The reviewer became a rule-interpreter instead of a rule-container: it loads whatever rules are present and treats each one as part of the contract. Install a stack, and its rules join the checklist automatically. Install the process layer alone, and there is no stack noise. The same pattern was applied to the documentation agent and the docs-refresh command. The opinions live in exactly one place — the rules — so an agent and a rule can never disagree.
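The rule-interpreter idea can be illustrated in a few lines: the component holds no opinions and simply folds in whatever rule files it finds. The one-markdown-file-per-rule layout, the directory argument, and both function names are assumptions for illustration only.

```python
from pathlib import Path


def load_rules(rules_dir: Path) -> dict[str, str]:
    """Load every rule file present; an absent directory just means no rules."""
    if not rules_dir.is_dir():
        return {}
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(rules_dir.glob("*.md"))  # assumed: one markdown file per rule
    }


def review_contract(spec_criteria: list[str], rules: dict[str, str]) -> list[str]:
    """The checklist the reviewer judges: the task's own criteria plus every installed rule."""
    return list(spec_criteria) + [f"complies with rule: {name}" for name in rules]
```

Nothing in either function names a framework, so installing a stack changes the checklist only by adding files to the rules directory.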
Abstraction-first planning
One idea was adopted from the research surveyed below: model the domain and the module boundaries before writing code, "otherwise the AI sprints on implementation details while the structure falls apart." But it was adopted the foundation's way — not as a section an author fills in per feature (and forgets), but as a principle the planner applies to every task and the reviewer checks. Every Plan opens by naming the entities, their relationships, and the dependency direction across modules, before any step-by-step. Structure that has drifted from the modeled shape is a review finding even when the behavior works.
Related work: prompt-canvas methods (SPDD)
Structured Prompt-Driven Development (SPDD) shares this project's core conviction: make intent explicit and versioned before code, and keep humans in control through judgment rather than typing. SPDD's instrument is the REASONS Canvas, a seven-part structured prompt treated as a first-class, version-controlled artifact, kept in two-way sync with the code. It is a genuinely good idea, and the abstraction-first discipline above is borrowed from it.
Where this foundation diverges is the axis it optimizes. A canvas optimizes the time axis: one feature's spec lives, syncs, and compounds into the next. This foundation optimizes the control axis: the state of many concurrent tasks stays mechanically correct, and the gates that keep it correct cannot be skipped.
| Dimension | agents-foundation | Prompt-canvas (e.g. SPDD) |
|---|---|---|
| Enforcement | Deterministic & non-bypassable — hooks + scripts refuse a bad done | Process discipline + manual review |
| State tracking | Explicit Git-native kanban, dependency graph, auto-replenish | Tracked implicitly through commits |
| Team / scale | Multi-agent orchestration, parallel worktrees, model tiering | One developer + AI, sequential |
| Standards | Strong rules applied to every task — impossible to forget | Declared per feature — relies on the author remembering |
| Reusability | Installable marketplace: agnostic process + swappable stacks | A single methodology + its CLI |
| Judgment vs. bookkeeping | Split by construction | Largely manual |
The sharpest difference is the standards row. A canvas trusts the author to restate the security, performance, and structure constraints on every feature; people forget, and the gap ships silently. This foundation keeps those as strong rules enforced for all tasks, so a constraint cannot be omitted from one task by accident.
If you want a method a disciplined developer follows, a prompt canvas is excellent. If you want a system that won't let the process be skipped, and that scales to a team of agents, this is the bet.
Design principles, distilled
- Take bookkeeping away from the LLM. If a step is mechanical, a script does it and a hook enforces it.
- Make state un-lie-able. Encode it where it cannot be forgotten — a folder, a commit — not in a field someone updates by hand.
- Conventions are rules, not reminders. Enforce for all; never rely on per-task recall.
- Components read opinions; rules hold them. One source of truth, so an agent and a rule never diverge.
- Model the shape before the detail. Abstraction-first, on every task.
- Choose the form by the work. Agent for judgment, skill for procedure, rule for constraint, hook for the non-bypassable.
Try it
A marketplace is just a Git repository — no registry. Add it, install one or both plugins, and bootstrap a repo:
/plugin marketplace add fcms14/agents-foundation
/plugin install delivery-team@agents-foundation
/plugin install stack-turbo-nest-react@agents-foundation # optional stack layer
/delivery-team:init # scaffold work/, docs/, rules, gates
/delivery-team:task-new <goal> # specify → /delivery-team:task-start → review → apply-verdict
The bootstrapper scaffolds the kanban and an ADR-seeded docs/ tree, materializes the rules, and wires the commit-time gates into whatever hook mechanism the repo uses.
References & further reading
- Patton, J. et al. Structured Prompt-Driven Development. martinfowler.com.
- Nygard, M. Documenting Architecture Decisions — the ADR practice the foundation seeds into every repo.
- Brown, S. The C4 model for visualising software architecture — the documentation model used by the stack layer.
- Anthropic. Claude Code documentation — plugins, marketplaces, hooks, and subagents.
- agents-foundation — the source, the plugins, and the full design notes.