Engineering Contracts for Agentic AI: The New Standard for Software Development

How spec-driven engineering turns AI agents into accountable software collaborators

May 20, 2026

AI coding has entered a new phase.

The first phase was autocomplete. Developers used AI to speed up syntax, boilerplate, and routine suggestions.

The second phase was AI-assisted creative flow. In my earlier piece on Vibe Coding, I argued that this style can unlock fast experimentation in the hands of skilled builders, but it becomes dangerous when speed replaces architecture, testing, security, and accountability. The risk was never creativity. The risk was handing too much control to AI without understanding what the code was doing, how it would fail, or whether it could survive real production pressure.

The third phase was agentic coding. In Adapt or Obsolete, I argued that modern developers are no longer hired only to write code. They are expected to command AI tools, manage intelligent agents, maintain context across systems, and use platforms such as Claude Code, Cursor, Windsurf, Copilot, and MCP-enabled workflows responsibly. AI fluency is no longer an advanced skill. It is becoming the baseline.

Now we are entering the fourth phase: spec-driven agentic engineering.

This phase is not defined by bigger prompts or more autonomous tools. It is defined by engineering contracts.

A prompt tells an agent what to do now.

A spec tells the team what must remain true across planning, implementation, validation, review, deployment, and maintenance.

That distinction matters because coding agents have become powerful enough to create real risk. They can inspect repositories, edit multiple files, run commands, generate branches, fix tests, open pull requests, and work in the background. GitHub Copilot cloud agent, for example, can research a repository, create an implementation plan, make code changes on a branch, and prepare work for review. OpenAI Codex can read, edit, and run code in cloud environments and work on background tasks in parallel. Claude Code reads codebases, makes changes across files, runs tests, and delivers committed code.

The old question was: Can AI write code?

The new question is: Can AI execute engineering intent reliably, safely, and reviewably?

That is the real problem spec-driven agentic coding solves.

Why the Conversation Has Changed

In late 2025, spec-driven agentic coding still felt like an emerging method. Today, it is becoming a platform pattern.

GitHub Spec Kit provides an open-source toolkit for bringing spec-driven development into coding agent workflows, including GitHub Copilot, Claude Code, and Gemini CLI. AWS Kiro explicitly turns prompts into detailed specs, then into working code, documentation, and tests. Kiro’s spec workflow breaks work into requirements, design docs, and implementation tasks. Google Antigravity frames agentic development around artifacts such as task lists, implementation plans, screenshots, walkthroughs, and browser recordings, making agent output easier to verify.

At the same time, the agent ecosystem is becoming more standardized. MCP gives AI applications a standard way to connect to external tools, data sources, and context. A2A gives agents a way to communicate and collaborate across vendors and frameworks. AGENTS.md is emerging as a common instruction file for coding agents, with OpenAI Codex and GitHub Copilot both supporting it as a way to guide how agents understand, build, test, and validate changes.

This is not a minor tooling update. It is a structural shift in how software engineering work is being delegated.

The industry is moving from:

Prompt -> Code -> Hope

to:

Intent -> Spec -> Plan -> Code -> Validate -> Review -> Learn

That shift is overdue.

Figure: From “Prompt → Code → Hope” to accountable AI engineering workflows.

The Core Argument: Specs Are the Control Plane

Spec-driven agentic AI coding is the practice of using written, structured, reviewable artifacts to guide autonomous or semi-autonomous coding agents.

A serious workflow does not begin with:

Build this feature.

It begins with:

PRD -> Requirements -> Design -> Tasks -> Implementation -> Validation -> Review

Each artifact answers a different question.

A PRD defines the product problem, users, scope, success metrics, and acceptance criteria.

Requirements define what must be true.

Design defines how the system should satisfy those requirements.

Tasks break the design into executable units of work.

Implementation prompts tell the coding agent exactly what to execute.

Validation proves that the implementation satisfies the requirements.

Review checks maintainability, security, operational risk, and alignment with the original intent.

This is also the model behind SpecRoute, an open-source framework from Enovatr Labs for spec-driven agentic software engineering. The framework treats agentic development as a chain of reviewable artifacts:

PRD -> Spec -> Implementation -> Validation -> Review

The point is traceability. PRD acceptance criteria map to requirement IDs. Requirement IDs map to design decisions. Design decisions map to tasks. Tasks map to code and tests. Tests map to validation results. Reviewers can trace the final implementation back to the original product intent.
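One way to make that chain concrete is to store each link as a plain mapping and walk it backward from a test to the acceptance criterion it ultimately serves. The sketch below is illustrative only. The IDs (AC-1, DES-1, REQ-003, TASK-001), the test name, and the `trace_to_intent` helper are hypothetical and are not part of SpecRoute or any other tool.

```python
# Hypothetical trace links. Each mapping points from a child artifact
# to the parent artifact it was derived from.
TEST_TO_TASK = {"test_opt_in_defaults_false": "TASK-001"}
TASK_TO_DESIGN = {"TASK-001": "DES-1"}
DESIGN_TO_REQUIREMENT = {"DES-1": "REQ-003"}
REQUIREMENT_TO_CRITERION = {"REQ-003": "AC-1"}

# Order matters: test -> task -> design decision -> requirement -> criterion.
CHAIN = [
    TEST_TO_TASK,
    TASK_TO_DESIGN,
    DESIGN_TO_REQUIREMENT,
    REQUIREMENT_TO_CRITERION,
]


def trace_to_intent(test_name: str) -> list[str]:
    """Return the path from a test back to a PRD acceptance criterion.

    Raises KeyError at the first missing link, which is the signal a
    reviewer cares about: the change cannot be traced to approved intent.
    """
    path = [test_name]
    for links in CHAIN:
        path.append(links[path[-1]])
    return path


print(" -> ".join(trace_to_intent("test_opt_in_defaults_false")))
# prints: test_opt_in_defaults_false -> TASK-001 -> DES-1 -> REQ-003 -> AC-1
```

A broken link fails loudly instead of silently, which is the behavior a review gate would want from any real implementation of this idea.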

That is what AI coding needs now.

Not just better prompts.

Better contracts.

Why Prompt-Driven Coding Breaks Down

Prompt-driven coding works well for prototypes, isolated functions, exploratory scripts, and quick scaffolding. It becomes fragile when the work involves architecture, security, product nuance, cross-file changes, data models, legacy systems, regulatory constraints, or non-trivial acceptance criteria.

A prompt such as:

Build user onboarding.

is not an engineering contract. It is an invitation for the model to guess.

The agent must infer:

Who the user is
What onboarding means
Which product state matters
What data should be collected
Which systems should be updated
Which errors should be handled
Which tests prove completion
Which files should not change

That is not autonomy. That is unmanaged ambiguity.

A realistic failure looks like this: a coding agent is asked to “add onboarding” and notices that authentication logic is already connected to the signup flow. Because the prompt never defined scope boundaries, the agent modifies the core authentication file, adds a new onboarding state to the user model, changes a redirect rule, updates session behavior, and generates a passing local test for the new flow.

The code appears to work.

But the agent has quietly changed behavior for existing users. Password reset now routes through onboarding. Admin-created accounts inherit the wrong state. A downstream analytics event breaks because the user lifecycle changed without being documented. The pull request looks productive, but the implementation solved a broader problem than the team actually approved.

This is how agentic coding fails in production. Not because the model cannot write code, but because the task was under-specified.

The same problem appears in chaotic multi-agent workflows. One agent updates the backend. Another modifies the frontend. A third writes tests against assumptions that neither implementation actually satisfies. Without a shared specification, the agents do not collaborate. They multiply ambiguity.

This was the central warning in my earlier writing on vibe coding. Creative AI-assisted development is powerful when the developer understands the system. It becomes dangerous when the developer stops acting as architect, tester, and guardian of quality.

Spec-driven engineering does not reject creative flow. It matures it.

Vibe coding is useful for exploration. Spec-driven engineering is necessary for production.

The Spec Generation Loop

There is one fair objection to spec-driven development: writing specs is hard.

Most developers do not resist specs because they dislike quality. They resist specs because many organizations turned specification writing into slow documentation theater. Long documents were written upfront, ignored during implementation, and abandoned once code shipped.

Spec-driven agentic development must not repeat that mistake.

The better pattern is the Spec Generation Loop.

In this loop, the agent helps generate the spec before it writes the code.

A planner agent can interview the user, extract requirements, identify missing constraints, propose acceptance criteria, and produce the first draft of the PRD, requirements, design, and tasks. The human then reviews, corrects, and approves the artifact before implementation begins.

The loop looks like this:

Human intent
-> Planner agent drafts PRD
-> Human reviews scope and success criteria
-> Planner agent drafts requirements
-> Architecture agent drafts design
-> Task agent decomposes work
-> Human approves the implementation plan
-> Coding agent executes
-> Validation agent checks results
-> Human reviews final diff

This makes the methodology adoptable.

The best version of spec-driven development does not ask every engineer to become a documentation specialist overnight. It gives engineers an AI-assisted way to turn intent into durable artifacts.
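The loop above can be sketched as an ordered list of stages in which human stages act as gates. This is a minimal model under stated assumptions: the `Stage` type, the stage names, and the `approve` callback are hypothetical, and a real workflow would attach artifacts and feedback to each stage rather than a simple yes or no.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    actor: str  # "agent" or "human"


# Stages mirror the Spec Generation Loop, in order.
LOOP = [
    Stage("draft PRD", "agent"),
    Stage("review scope and success criteria", "human"),
    Stage("draft requirements", "agent"),
    Stage("draft design", "agent"),
    Stage("decompose tasks", "agent"),
    Stage("approve implementation plan", "human"),
    Stage("execute tasks", "agent"),
    Stage("check results", "agent"),
    Stage("review final diff", "human"),
]


def run_loop(approve: Callable[[Stage], bool]) -> list[str]:
    """Walk the loop in order and stop at the first rejected human gate."""
    completed = []
    for stage in LOOP:
        if stage.actor == "human" and not approve(stage):
            break  # nothing downstream runs until the human signs off
        completed.append(stage.name)
    return completed


# If the plan is rejected, the coding agent never starts.
stopped = run_loop(lambda stage: stage.name != "approve implementation plan")
print(stopped[-1])  # prints: decompose tasks
```

The design choice worth noticing is that the gate sits in the control flow, not in the prompt. The coding stage is unreachable without approval.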

Kiro reflects this pattern by structuring work around requirements.md, design.md, and tasks.md. Its documentation says specs help teams break down requirements into user stories, build design docs with sequence diagrams and architecture plans, track implementation progress, and collaborate between product and engineering teams.

The practical lesson is simple:

Do not ask agents to code first. Ask them to clarify first.

What an AI-Ready Spec Looks Like

An AI-ready spec is not a 40-page document. It is a structured contract that an agent can read, execute, and validate.

A practical feature spec might look like this:

spec_id: feature-user-email-opt-in
version: "1.0"
summary: Add email opt-in support during signup

prd_reference: prd-user-communications-v2

requirements:
  - id: REQ-001
    statement: Users must be able to opt in to product emails during signup.
    priority: must

  - id: REQ-002
    statement: The opt-in value must be stored as a boolean on the user profile.
    priority: must

  - id: REQ-003
    statement: The default opt-in state must be false.
    priority: must

  - id: REQ-004
    statement: A welcome email must be sent only when opt-in is true.
    priority: should

design_constraints:
  - Use existing signup form components.
  - Do not introduce a new email provider.
  - Do not change existing authentication behavior.
  - Store email send events in email_logs.

tasks:
  - id: TASK-001
    requirement_refs: [REQ-001, REQ-003]
    instruction: Add opt-in checkbox to signup UI using existing form components.

  - id: TASK-002
    requirement_refs: [REQ-002]
    instruction: Add opted_in boolean field to user profile persistence layer.

  - id: TASK-003
    requirement_refs: [REQ-004]
    instruction: Trigger welcome email only when opted_in is true.

validation:
  - REQ-001 must have UI test coverage.
  - REQ-002 must have persistence test coverage.
  - REQ-003 must have default-state test coverage.
  - REQ-004 must have email behavior test coverage.

review_gates:
  - No unrelated files changed.
  - No authentication behavior changed.
  - All tests must pass.
  - Reviewer must confirm requirement-to-test mapping.

This is not bureaucracy. It is executable clarity.

The agent knows what to build. The reviewer knows what to check. The team knows what changed. The future maintainer knows why the change exists.
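Because the spec is structured, its internal consistency can be checked mechanically before any agent runs. The sketch below assumes the spec above has already been loaded into a dictionary (for example by a YAML parser) and reduced to the fields the check needs. The `lint_spec` function and its messages are hypothetical, not a feature of any named tool.

```python
import re

# The example spec, reduced to IDs and references. Inlined here so the
# sketch stays self-contained; in practice it would come from the spec file.
SPEC = {
    "requirements": ["REQ-001", "REQ-002", "REQ-003", "REQ-004"],
    "tasks": {
        "TASK-001": ["REQ-001", "REQ-003"],
        "TASK-002": ["REQ-002"],
        "TASK-003": ["REQ-004"],
    },
    "validation": [
        "REQ-001 must have UI test coverage.",
        "REQ-002 must have persistence test coverage.",
        "REQ-003 must have default-state test coverage.",
        "REQ-004 must have email behavior test coverage.",
    ],
}

REQ_ID = re.compile(r"REQ-\d+")


def lint_spec(spec: dict) -> list[str]:
    """Return human-readable traceability problems; an empty list means clean."""
    known = set(spec["requirements"])
    tasked = {ref for refs in spec["tasks"].values() for ref in refs}
    validated = {m for line in spec["validation"] for m in REQ_ID.findall(line)}

    problems = []
    for req in sorted(known - tasked):
        problems.append(f"{req} has no task")
    for req in sorted(known - validated):
        problems.append(f"{req} has no validation rule")
    for ref in sorted((tasked | validated) - known):
        problems.append(f"{ref} is referenced but not defined")
    return problems


print(lint_spec(SPEC))  # prints: []
```

Adding a REQ-005 without a matching task or validation rule would produce two findings, which is the kind of gap a human reviewer can easily miss in a long document.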

The Spec Triplet: Requirements, Design, Tasks

One of the strongest patterns in SpecRoute is the spec triplet:

requirements.md
design.md
tasks.md

The value is separation of concerns.

requirements.md says what must be true.

design.md says how the system should satisfy it.

tasks.md says what work must be done.

SpecRoute keeps those files separate to avoid a common failure mode: mixing product intent, implementation detail, and task planning into one long document. When everything lives in one blob, reviewers must untangle whether a statement is a requirement, a design choice, or an implementation step. The triplet keeps the engineering contract clean.

This matters because agents execute what they are given.

A clean task file gives the agent a scoped unit of work. A requirement file gives it the outcome that must be satisfied. A design file gives it the acceptable architectural path. A validation plan tells it what must be proven.

That combination reduces drift.

The Tooling Landscape in 2026

The tooling landscape is moving quickly, but most platforms now point toward the same pattern: agents need clearer instructions, better context, stronger validation, and more reviewable artifacts.

Windsurf and Google Antigravity belong in the agentic IDE layer because they show how development environments are evolving from passive editors into active engineering workspaces. In a spec-driven workflow, they are strongest when agents are first grounded in a PRD, requirements, design constraints, and scoped tasks. Their value is not in letting the IDE improvise. Their value is in using the IDE as an execution environment for a clear engineering contract. When paired with traceable requirements, validation checks, and review gates, agentic IDEs become controlled workspaces for turning intent into verified software changes.

The important point is not that every team should adopt every tool. The important point is that the ecosystem is converging around the same operating model.

Agents need:

Structured intent
Persistent project context
Defined task boundaries
Validation rules
Human review gates
Traceability from requirement to implementation

This is why spec-driven development is becoming more than a workflow preference. It is becoming the control layer for agentic software engineering.

The Pattern That Works: Bounded Orchestration

The strongest pattern emerging in agentic software development is not the uncontrolled swarm. It is bounded orchestration.

The weak version of agentic coding says:

Launch multiple agents and let them figure it out.

That sounds powerful, but it often fails in practice.

Without a shared specification, agents multiply ambiguity. As in the multi-agent failure described earlier, backend, frontend, and test agents each work from assumptions the others never satisfy. The result is not collaboration. It is parallel confusion.

More agents do not automatically create more engineering throughput.

The stronger pattern is different:

Clear role -> Clear artifact -> Clear task -> Clear validation -> Human review

A planner agent drafts the PRD and task breakdown.

An architecture agent checks constraints.

A coding agent implements scoped tasks.

A validation agent maps tests back to requirements.

A review agent checks drift, security, maintainability, and scope control.

A human approves the final change.

This is where spec-driven development becomes essential. It gives every agent a boundary and every human reviewer a stable reference point.

The goal is not to make agents more independent. The goal is to make their work more accountable.

That is why the best agentic workflows increasingly rely on structured artifacts: requirements, design docs, task files, validation reports, review gates, and project instructions. Autonomy without structure creates risk. Autonomy inside an engineering contract creates leverage.

The Spec Complexity Threshold

Not every task needs a full spec.

A typo fix does not need a PRD. A small unit test may not need a design document. A one-line dependency update can use a lightweight task instruction.

The practical question is: when does spec-driven development become worth it?

Here is a simple threshold model.

Tier 1: Prompt only

Use for trivial, low-risk work.

Examples:

Rename a variable
Add a small unit test
Fix a typo
Explain a function
Format a file

Tier 2: Task spec

Use when the work touches code but not architecture.

Examples:

Add a validation rule
Fix a contained bug
Add logging
Update a test suite
Modify one endpoint

Artifacts needed:

task.md
acceptance criteria
validation command

Tier 3: Spec triplet

Use when the work affects product behavior or multiple files.

Examples:

Add a feature
Modify data flow
Change UI and backend behavior
Refactor a module
Add integrations

Artifacts needed:

requirements.md
design.md
tasks.md
validation plan

Tier 4: PRD-to-production workflow

Use when the work affects users, business logic, security, compliance, data models, infrastructure, or multiple teams.

Examples:

Launch a new product feature
Migrate a service
Introduce agent workflows
Change authentication
Modify payment, healthcare, legal, or financial logic
Modernize legacy systems

Artifacts needed:

PRD
requirements
design
tasks
risk register
validation report
review gate
rollback plan
operational metrics

This threshold model prevents spec-driven development from becoming ceremony. The goal is not to write more documents. The goal is to match artifact depth to task risk.
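The threshold model can be expressed as a small classifier that checks the highest-risk conditions first. This is a rough heuristic sketch, not a calibrated policy: the `Change` fields, the `HIGH_STAKES` labels, and the `spec_tier` function are hypothetical names chosen for illustration, and each team would tune the rules to its own risk profile. The artifact lists are taken from the tiers above.

```python
from dataclasses import dataclass, field

# Areas that push a change straight to Tier 4 (labels are illustrative).
HIGH_STAKES = {
    "users", "business_logic", "security", "compliance",
    "data_model", "infrastructure", "multi_team",
}

# Artifacts per tier, as listed in the threshold model.
ARTIFACTS = {
    1: [],
    2: ["task.md", "acceptance criteria", "validation command"],
    3: ["requirements.md", "design.md", "tasks.md", "validation plan"],
    4: ["PRD", "requirements", "design", "tasks", "risk register",
        "validation report", "review gate", "rollback plan",
        "operational metrics"],
}


@dataclass
class Change:
    risk_areas: set[str] = field(default_factory=set)
    changes_product_behavior: bool = False
    files_touched: int = 1
    trivial: bool = False  # rename, typo fix, formatting, explanation


def spec_tier(change: Change) -> int:
    """Pick the lowest tier whose conditions cover the change's risk."""
    if change.risk_areas & HIGH_STAKES:
        return 4
    if change.changes_product_behavior or change.files_touched > 1:
        return 3
    if not change.trivial:
        return 2
    return 1


auth_change = Change(risk_areas={"security"}, files_touched=2)
print(spec_tier(auth_change), ARTIFACTS[spec_tier(auth_change)][:3])
# prints: 4 ['PRD', 'requirements', 'design']
```

Checking risk before triviality means the classifier errs toward more structure, so a "trivial" edit to authentication code still lands in Tier 4.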

What CTOs and Engineering Leaders Should Do Now

The strategic move is not to buy every coding agent.

The strategic move is to standardize the engineering contract.

1. Standardize the artifact chain

Adopt a default chain:

PRD -> Requirements -> Design -> Tasks -> Validation -> Review

This can be implemented with SpecRoute, GitHub Spec Kit, Kiro specs, internal templates, or a custom workflow. The tool matters less than the discipline.

2. Create agent-readable project instructions

Use AGENTS.md or equivalent project instruction files to define setup commands, test commands, coding conventions, security rules, review expectations, and prohibited actions.

Keep them focused. Long instruction files can become noise. The purpose is not to flood the model with rules. The purpose is to give it the right rules at the right time.

3. Build a spec generation workflow

Do not force engineers to write every spec manually.

Use planner agents to draft PRDs, requirements, design options, risks, and task breakdowns. Humans should approve the contract before agents implement it.

4. Require validation mapping

Every non-trivial agent-generated change should answer:

Which requirement does this satisfy?
Which test proves it?
Which design constraint did it follow?
What changed outside scope?
What remains unvalidated?
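The question "What changed outside scope?" is one of the easiest to automate. A minimal sketch, assuming each task declares the path patterns it is allowed to touch and that the list of changed files is available (for example from `git diff --name-only`). The `out_of_scope` helper and the paths shown are hypothetical.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return changed files that match none of the task's allowed patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical scope for a signup task, and a diff that strays into auth code.
allowed = ["src/signup/*", "tests/signup/*"]
changed = [
    "src/signup/form.py",
    "tests/signup/test_form.py",
    "src/auth/session.py",
]

print(out_of_scope(changed, allowed))  # prints: ['src/auth/session.py']
```

A non-empty result does not have to block the change. It only has to force the conversation the original prompt skipped.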

5. Measure outcomes, not AI activity

Do not measure success by lines of AI-generated code.

Measure:

Pull request acceptance rate
Review cycle time
Regression rate
Test coverage improvement
Incident reduction
Time from issue to validated patch
Percentage of changes mapped to requirements
Amount of rework caused by ambiguous instructions
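Several of these metrics reduce to simple ratios once changes carry requirement IDs. As an illustration, the sketch below computes the percentage of changes mapped to requirements. The record shape (a dictionary with a `requirement_ids` list) and the `mapped_change_rate` function are assumptions made for this example, not a standard format.

```python
def mapped_change_rate(changes: list[dict]) -> float:
    """Percentage of changes that cite at least one requirement ID."""
    if not changes:
        return 0.0
    mapped = sum(1 for change in changes if change.get("requirement_ids"))
    return 100.0 * mapped / len(changes)


# Hypothetical merged changes for one sprint.
sample = [
    {"id": "PR-1", "requirement_ids": ["REQ-001"]},
    {"id": "PR-2", "requirement_ids": ["REQ-002", "REQ-003"]},
    {"id": "PR-3", "requirement_ids": []},
    {"id": "PR-4", "requirement_ids": ["REQ-004"]},
]

print(mapped_change_rate(sample))  # prints: 75.0
```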

Agentic coding is not a productivity story unless it improves the quality and speed of shipped software.

Where SpecRoute Fits

The methodology matters more than any single tool.

That said, teams need practical reference models. It is not enough to say “write better specs.” Engineering teams need templates, workflows, prompts, validation structures, review gates, and examples that make the discipline usable.

That is where SpecRoute, an open-source framework from Enovatr Labs, fits.

SpecRoute is not another coding agent. It is not tied to one model, IDE, or vendor. It is an artifact framework for spec-driven agentic software engineering.

Its core workflow is:

PRD -> Spec -> Implementation -> Validation -> Review

The value is traceability.

A PRD defines product intent. Requirements define what must be true. Design explains how the system should satisfy those requirements. Tasks break the design into executable units of work. Validation proves the implementation satisfies the requirements. Review checks quality, scope control, maintainability, security, and operational risk.

SpecRoute’s contribution is the structure around the agent:

PRD
requirements.md
design.md
tasks.md
task prompts
validation plans
review gates
agent definitions
skills
hooks
rules
sample project

That structure matters because the coding agent you use today may not be the one you use next quarter. One team may prefer Claude Code. Another may use Codex. Another may use Gemini CLI, Kiro, Cursor, Windsurf, Copilot, Devin, OpenHands, or Google Antigravity.

Vendors will change. Models will change. Interfaces will change.

The engineering contract should survive those changes.

SpecRoute is useful because it gives teams a vendor-neutral way to make agentic coding portable, reviewable, and auditable.

The simplest experiment is this:

Open the SpecRoute sample project.
Choose a coding agent.
Ask the agent to read the PRD, requirements, design, and task files before touching code.
Assign one scoped task.
Require the agent to map implementation and tests back to requirement IDs.
Run validation.
Review the diff against the spec, not against the agent’s explanation.

A practical starter prompt:

You are working inside the SpecRoute sample project.

Before writing code:
1. Read the PRD.
2. Read requirements.md.
3. Read design.md.
4. Read tasks.md.
5. Select the next incomplete task.
6. Identify the requirement IDs it satisfies.
7. Inspect the existing code.
8. Make the smallest scoped change that satisfies the task.
9. Add or update tests mapped to the referenced requirements.
10. Run validation.
11. Summarize the diff by requirement ID.

Do not modify unrelated files.
Do not invent new requirements.
Do not skip validation.

That single experiment makes the difference clear. The agent behaves less like a guessing machine and more like an engineer executing a scoped plan.

The Strategic Bet

The companies that win with AI coding will not be the ones with the most prompts.

They will be the ones with the best engineering contracts.

Models will continue improving. Context windows will grow. IDEs will become more agentic. Background agents will become normal. Multi-agent workflows will expand through MCP and A2A. But none of that removes the need for clear intent.

In fact, it increases it.

When an agent can change one file, a prompt may be enough.

When an agent can change a system, run commands, open pull requests, invoke tools, coordinate with other agents, and operate in the background, the team needs something stronger than conversation.

It needs specs.

Spec-driven development is not a return to slow documentation. It is the operating discipline for AI-enabled software engineering.

The software development lifecycle is no longer:

Prompt -> Code -> Hope

It is:

Intent -> Spec -> Plan -> Code -> Validate -> Review -> Learn

Specs should not be dead documents. They should become living memory for humans and agents.

Closing Thought

Agentic AI coding will not become reliable through larger prompts alone.

It needs durable engineering artifacts: PRDs, requirements, designs, task plans, validation reports, review gates, and operational feedback loops.

This is the real shift. The future of software development is not just AI-assisted coding. It is AI-governed engineering.

The developer’s role is not disappearing. It is becoming more architectural, more judgment-driven, and more accountable. The best developers will not merely ask agents to write code. They will define the contracts that agents execute.

That is the new standard.

Vibe coding accelerated exploration.

Agentic coding expanded delegation.

Spec-driven engineering brings discipline.

The next generation of software teams will be judged not by how much code their agents generate, but by how reliably those agents can turn intent into verified, reviewable, production-ready systems.

For teams ready to experiment, SpecRoute offers one practical starting point. Take the sample project, plug it into your preferred coding agent, and compare the result to a normal prompt-driven session.

The difference is immediate.

When the agent has a contract, the work becomes traceable.

When the work is traceable, it becomes reviewable.

When it is reviewable, it becomes engineering.

Glossary

Agentic AI
AI systems that can pursue a goal through multiple steps, including planning, tool use, code changes, testing, and iteration.

Agentic Coding
A software development workflow where AI agents do more than suggest code. They inspect repositories, make changes, run commands, fix errors, and prepare work for human review.

Spec-Driven Development
A development approach where requirements, design, tasks, validation, and review criteria are defined before implementation begins.

Engineering Contract
A structured agreement between humans and agents that defines what must be built, what constraints apply, how success will be measured, and how the result will be reviewed.

PRD, Product Requirements Document
A document that defines the product problem, users, scope, success metrics, constraints, and acceptance criteria.

Requirements
Statements that define what must be true for the system to satisfy the product intent.

Design Document
A document that explains how the system will satisfy the requirements, including architecture, data flow, interfaces, tradeoffs, and constraints.

Task Plan
A breakdown of implementation work into scoped units that an agent or developer can execute.

Validation Mapping
The practice of linking tests, checks, and validation results back to specific requirement IDs.

Review Gate
A checkpoint where a human or review agent verifies quality, security, scope control, maintainability, and requirement alignment before work is accepted.

MCP, Model Context Protocol
An open protocol that connects AI applications to external tools, data sources, and context. MCP is important because coding agents need access to the right project context, not just a prompt. (Model Context Protocol)

A2A, Agent2Agent Protocol
A protocol for enabling agents to communicate and collaborate across different vendors, systems, and frameworks. (Google Developers Blog)

AGENTS.md
A project-level instruction file that tells coding agents how to work inside a repository, including setup commands, test commands, coding conventions, and validation expectations. (OpenAI Developers)

Agentic IDE
A development environment where AI agents can reason across a codebase, plan changes, edit files, run commands, and generate reviewable artifacts.

Spec Triplet
A three-part spec structure consisting of requirements.md, design.md, and tasks.md. This separates what must be true, how the system should satisfy it, and what work must be done.

Spec Generation Loop
A workflow where planner agents help draft PRDs, requirements, design options, task plans, and validation criteria before coding begins.

References

Chika Ihejimba, PhD, “Vibe Coding: Innovation in the Right Hands, Instability in the Wrong Ones,” Decode with Dr. Chika, April 9, 2025. This article establishes the earlier argument that AI-assisted creative coding is powerful, but risky when used without fundamentals, structure, testing, and accountability.
Chika Ihejimba, PhD, “Adapt or Obsolete: The New Standard for Developers,” Decode with Dr. Chika, June 19, 2025. This article frames AI fluency, agentic thinking, and command of modern AI development tools as baseline skills for developers.
GitHub, Spec Kit Documentation. GitHub’s Spec Kit is one of the clearest public examples of spec-driven development becoming part of the coding agent workflow.
GitHub Docs, About GitHub Copilot Cloud Agent. GitHub documents Copilot cloud agent as an autonomous coding agent that can work on repository tasks in the background and prepare changes for review.
OpenAI Developers, Codex Cloud. OpenAI describes Codex as a coding agent that can work on software tasks in a cloud environment.
Anthropic, Claude Code. Anthropic describes Claude Code as an agentic coding system that reads codebases, makes changes across files, runs tests, and delivers committed code.
Model Context Protocol, MCP Specification. MCP defines a standardized way for AI applications to connect with external data sources, tools, and context.
Google Developers Blog, Announcing the Agent2Agent Protocol. A2A introduces a protocol for agent interoperability across systems and vendors.
OpenAI Developers, Custom Instructions with AGENTS.md. AGENTS.md provides a way to define project-level instructions for coding agents.
Kiro Docs, Specs. Kiro’s spec workflow formalizes development around requirements, design, and tasks.
Enovatr Labs, SpecRoute. SpecRoute is an open-source framework for spec-driven agentic software engineering. It models the workflow as PRD -> Spec -> Implementation -> Validation -> Review and emphasizes traceability across requirements, tasks, validation, and review.

Decode with Dr. Chika
