SDD and IDD are not competing methodologies — they are different layers of the same model. The industry built half a model each and is discovering, through accumulated failure, that the other half was always required.
1.0 The Question Nobody Asked
1.1 Two Stories, One Pattern
Two things happened in 2025 that nobody connected.
AWS launched Kiro — a spec-driven IDE that saw rapid adoption across engineering teams in 2025. The Thoughtworks Technology Radar placed spec-driven development in its “Assess” ring in November 2025, with Kiro cited as a signal of the movement’s momentum. GitHub’s Spec Kit surpassed 78,000 stars within its first year. Analysts converged on the same conclusion: the future of AI-assisted development is structured specification. Write the spec, let the AI execute. Auditable. Repeatable. In control.
Meanwhile, a three-person team at StrongDM shipped 32,000 lines of production Rust, Go, and TypeScript without a single human writing or reviewing a line of code. Their instruction set to the AI: three markdown files. No Jira. No sprints. No spec review cycles. Their founding rules:
“Code must not be written by humans. Code must not be reviewed by humans.” — StrongDM founding principles (via simonwillison.net)
The industry read these as opposite stories. Kiro: the disciplined future. StrongDM: the autonomous extreme. I read them as the same story operating at different layers of the same model — and the failure to see that connection is producing systematic failure in both camps.
That raises a question worth sitting with: if agents can work without human-authored specs — as StrongDM demonstrated — what are the SDD frameworks actually solving for? One answer is governance for human teams. Another is audit compliance. But a third possibility is more uncomfortable: agents may already be generating their own internal execution model from whatever intent signal they receive. The spec the human writes may be an artifact of human communication — something humans need to align with each other — not something the AI finds structurally useful. If that is true, the SDD frameworks are solving the wrong translation problem: making human-readable specs machine-executable, when the machine was already building its own model.
1.2 The Framing the Industry Chose — and What It Cost
The dominant framing is this: spec-driven development (SDD) for engineering teams, intent-driven development (IDD) for consumer AI. Engineering needs structure. Consumer needs flexibility. Two domains, two approaches.
This framing is incomplete — and the industry is catching up to that incompleteness, not declaring a winner. There is no published evidence of organisations rigorously testing both approaches head-to-head and declaring one superior. What exists is adoption patterns: SDD spread through engineering teams because it addressed the hallucination problem. IDD spread through consumer AI because it addressed the adaptability problem. Neither camp has systematically tried the other. The absence of “we tried IDD and it lost” evidence is itself significant: the comparison hasn’t happened at scale.
Engineering teams have intent — the goal behind every spec they write. Consumer AI systems need contracts — something that defines “correctly fulfilled.” Both camps built half a model and are discovering, through accumulated failure, that the other half was always required.
The cost is now measurable. METR’s randomized controlled trial (July 2025) showed AI making experienced developers 19% slower on mature codebases. Veracode’s 2025 GenAI Code Security Report found 45% of AI-generated code contains OWASP Top 10 vulnerabilities — with no improvement from larger models. The SaaStr incident saw an AI agent drop a production database and fabricate 4,000 user records to conceal it. These are not edge cases. They are the predictable outputs of a half-model applied at scale.
2.0 Why SDD Took Over — and What It Actually Solved
2.1 The Vibe Coding Migration
SDD’s explosive growth is not random. It is a migration from vibe coding, and that migration makes complete sense.
Individual developers hitting AI tools for the first time encountered two compounding problems. Hallucinations: the AI generating plausible code that silently violated requirements, with no mechanism to detect the divergence until it broke. Session drift: each new conversation started from scratch. The AI had no memory of architectural decisions, established constraints, or the reasoning behind structural choices. Developers were re-explaining context repeatedly, and the AI was filling gaps through hallucination.
SDD solved both for individuals. One or two spec files gave the AI an anchor. The developer writes the spec, hands it to the system, and begins iterating — editing, redirecting, refining. The spec persists across sessions. The hallucination surface shrinks. For an individual developer on a greenfield project, this is a legitimate, effective workflow. It solved a real problem.
The mistake was treating an individual workflow as an organisational methodology. Kiro’s pricing model made this visible: developers reported costs between $550 and $1,950 per month for full-time use, with Pro+ credit limits exhausted in as little as 15 minutes of a single chat session. AWS acknowledged a credit consumption bug and refunded August 2025 charges. The cost of spec-driven development at scale was not just cognitive — it was financial.
Two related problems that SDD did not solve: spec drift leading to session drift, and the spec sizing problem. When a spec drifts across iterations — patches accumulating, sections rewritten — the AI’s internal model of what was built drifts independently from the spec file. The two fall out of sync. The developer is reading spec version 7; the agent is operating from its memory of version 4. Drift compounds. The second problem: teams never knew how big or small a spec should be. One file per feature? One per service? SDD tools provided mechanism but no guidance on granularity. Specs too small, and the AI fills gaps through hallucination. Specs too large, and agents truncate them in context and hallucinate the rest — the precise failure mode observed with Agent OS.
2.2 Where Structure Genuinely Wins
The empirical case for SDD in the right context is real and should not be dismissed. A financial services firm adopting API-first spec practices with contract testing reported 75% cycle time reduction (arXiv 2602.00180). Controlled studies show 50% error reduction when human-refined specs guide LLM generation versus unstructured prompting. McKinsey’s 2025 Technology Trends Outlook documents 20–45% developer productivity gains across organisations using structured AI workflows.
In regulated industries, SDD provides something that cannot be waived: audit-ready evidence. The EU AI Act — fines up to €35M or 7% of global annual turnover from August 2026 — mandates structured documentation for high-risk AI systems. KPMG research shows organisations with strong AI governance are 25–30% more likely to achieve positive outcomes. SDD’s artifacts map cleanly to compliance requirements. In financial services, distributed systems, and security-critical infrastructure, that structural clarity is not overhead — it is the product.
2.3 The Specifier Gap — What Every SDD Tool Leaves Human
I mapped the four major SDD tools against the five roles of the AI Squad Framework — Specifier, Designer, Builder, Validator, Orchestrator — to find what each tool actually automates versus what it leaves to the human.
Coverage bands:
- None — tool has no mechanism for this role; 100% human cognition
- Low — tool provides structure or templates, but human does all the thinking; tool formats what the human already decided
- Medium — tool actively assists; asks clarifying questions, generates options, or produces a first draft the human refines
- High — tool substantially drives this role; human reviews and approves, but the tool does the primary cognitive work
| Role | Kiro | BMAD | Spec Kit | Agent OS |
|---|---|---|---|---|
| Specifier | Low | Medium | None | None |
| Designer | High | High | Low | Low |
| Builder | High | High | High | Medium |
| Validator | Low | High | Low | Medium |
| Orchestrator | High | High | None | None |
The pattern is consistent across every tool: Builder and Orchestrator reach High coverage in the sophisticated frameworks. The Specifier Gap — the role that owns intent decomposition, decision boundaries, and validation framework design — never rises above Medium, and in three of the four tools sits at Low or None.
The reasoning behind each Specifier assignment: Kiro is Low because the human writes requirements and Kiro structures them into requirements.md — the tool formats intent, it does not generate it. BMAD reaches Medium because Mary, its BA agent, runs a structured elicitation interview that actively pulls intent out of the human — the tool assists rather than transcribes. Spec Kit is None: /speckit.specify is a blank page; the developer writes every line with zero AI contribution. Agent OS is None: it extracts existing codebase patterns and has no mechanism for capturing new intent. This is not a tooling maturity problem. SDD tools cannot raise Specifier coverage because they have no mechanism to capture intent before the spec is written. The spec IS the intent, in SDD’s model. That design creates a ceiling that no tooling improvement can raise — because the limitation is in the model, not the implementation.
3.0 The Enterprise Breakdown
3.1 Individual Workflow vs. Organisational Methodology
The workflow that works for an individual developer breaks at enterprise scale for structural, not implementation, reasons.
Organisations run on explicit process. A commit is not just a commit: it must reference the correct Jira issue at a defined workflow stage. A PR is not just a PR: it enters a review process with defined gates, defined approvers, and defined artifact updates. A deployment is not just a deployment: it has environment-specific rules, approval chains, rollback procedures, and defined failure scenarios. These are not edge cases in enterprise delivery. They are the operational baseline.
Enterprise developers adopting SDD end up writing specs — but those specs become convoluted. Tech specs, frontend specs, backend specs, each fragmenting into smaller commands. Context fragments across these files. Nobody holds the complete picture. The individual workflow scaled to a team produces not clarity but a proliferation of partial specifications that no single agent and no single human has full sight of.
3.2 BMAD: Heavy on Spec, Silent on Execution Quality
BMAD is the most comprehensively documented SDD framework. Nine specialised agents — Mary (Business Analyst), John (Product Manager), Winston (Architect), Bob (Scrum Master), Amelia (Developer), Quinn (QA), Paige (Tech Writer), Sally (UX Designer), and Barry (Quick Flow Solo Dev) — work through structured conversations using “Party Mode” for multi-agent collaboration and “bmad-help” for workflow guidance. The planning pipeline is thorough. Enriched story files embed context for each downstream agent. The documentation investment is substantial and front-loaded.
What BMAD does not answer: whether the code that emerges from that documentation is correct, tested, and production-ready. The methodology is heavy on specification, lean on outcome verification. It moves the problem from “AI hallucinating without direction” to “spec is thorough but execution quality is uncertain.” That is an improvement. It is not a solution. The investment has shifted from building to specifying — and the question of what gets shipped remains open.
3.3 Spec Kit: Eight Commands, No Delivery System
Spec Kit takes the opposite approach: eight slash commands, developer-controlled throughout. /speckit.constitution → /speckit.specify → /speckit.clarify → /speckit.plan → /speckit.tasks → /speckit.analyze → /speckit.implement → /speckit.checklist. Clean. Fast. Lightweight for individual use.
A Scott Logic evaluation found Spec Kit required 33.5 minutes of agent time plus 3.5 hours of human review for a single feature — versus 8 minutes of agent time and 24 minutes of review without SDD. A 10x overhead. The process generated 2,577 lines of markdown for one feature, and the implementation still contained obvious bugs despite exhaustive specs. Their conclusion: “reinvented waterfall.”
What Spec Kit does not address: multi-branch state management (working across multiple feature trees simultaneously, with no memory of what was done in each), promotion paths from development to staging to production, enterprise deployment rules, Jira integration, artifact tracking at defined workflow stages. Spec Kit assumes the system will remember what was decided across sessions and branches. Enterprises run on explicit process precisely because systems do not reliably remember.
“Spec drift and hallucination are inherently difficult to avoid. We still need highly deterministic CI/CD practices to ensure software quality.” — Liu Shangqi, Tech Director APAC, Thoughtworks
3.4 Delivery Intent — The Layer SDD Never Defined
The deepest gap in SDD is not in its handling of code intent. It is in what I call Delivery Intent — the organisation’s intent around how software gets built AND shipped: process steps, artifact updates, deployment rules, failure scenarios, PR review gates, commit timing, and rollback procedures.
SDD scopes intent to the code artifact. When a team ships a feature, the complete intent is not “write code that does X.” It is: write code that does X, commit it referencing the correct issue at the right workflow stage, put it through the PR review process according to team standards, update architecture documentation if structural decisions changed, deploy via the defined pipeline, run the defined smoke tests, obtain the defined approvals, promote during the allowed window, and have the rollback ready if the error rate crosses the threshold.
None of that is in the spec file. All of it is intent. When the delivery process fails — and it does, regularly — SDD has no correction mechanism, because delivery intent was never captured. The spec was correct. The code was generated. The unspecced delivery system created the chaos.
4.0 The Drift Problem
4.1 How Specs Drift in Practice — A Problem as Old as Requirements
Spec drift is not a new problem. It is the same requirements management problem enterprises have been trying to solve for decades, now repeating in a new medium.
Waterfall froze requirements upfront — and failed when business reality changed before the project delivered. Agile embraced change through short cycles — and succeeded by reducing the blast radius of drift, not eliminating it. SAFe added ceremony at scale to coordinate the multi-team requirements management problem — and traded drift for process overhead. Every methodology that makes requirements the primary artifact eventually produces the same failure: the artifact doesn’t self-update when intent changes, and the gap between the artifact and the current reality compounds silently until it surfaces as a delivery failure.
SDD is not solving this pattern. It is repeating it. The same human behaviors that produced requirements drift in waterfall projects are producing spec drift in AI-assisted projects. The tool changed. The behavior did not.
Spec drift is not a failure of discipline. It is a structural property of how specs are used.
A developer writes a spec. The system shows output. The developer reacts — editing, redirecting, adding constraints that weren’t anticipated. The spec changes. The system shows new output. By session five, the spec reflects accumulated decisions from five iterations of reaction, not the original goal. By session ten, the spec has been patched so many times it no longer reflects coherent intent. The agents follow the current spec. The current spec is a product of drift. The original goal is gone.
“A code issue is an outcome of a gap in the specification. Because of non-determinism in AI generation, that gap keeps resurfacing in different forms whenever code is regenerated.” — Augment Code
This is the loop SDD is structurally trapped in. The spec is the anchor. The anchor drifts. The agents follow the anchor. The output drifts with it.
4.2 Contracts That Change Too Often Break Agents
Specs are contracts. Contracts that change too often create a system neither humans nor agents can manage.
Every spec change requires every agent to re-anchor: re-read the updated contract, understand what changed and why, recalibrate the execution model, validate that prior work still satisfies the new terms. When this happens across multiple agents working in parallel — which is the promise of SDD orchestration frameworks — each agent may be executing against a different spec version. Output from one agent, built against spec version 3, conflicts with output from another, built against spec version 5. The human resolves the conflict. The spec gets patched again.
“SDD tools today are optimized for parsing specs, not interpreting intent. Most SDD approaches focus on the hows — implementation detail — rather than capturing underlying intent.” — Isoform.ai
The hows are what changes. Intent — what the system is trying to achieve and what must never happen — is stable. SDD anchors on the hows. That is why specs drift and systems built around them lose coherence at pace.
4.3 Intent as the Drift Correction Mechanism
Drift Correction — the ability to detect that execution has departed from the original goal and recalibrate without losing prior work — is a first-class operation in intent-driven systems and structurally absent in spec-driven ones.
In an IDD system, drift correction is itself an intent: “We were trying to achieve X. Review what has been built and identify where it has departed from that goal. Produce specific corrections.” That intent has a clear success condition. The agent executes against it because the original intent is an unchanging reference point — it is always available to return to.
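Concretely, this works because the original intent is stored once and never mutated, and drift correction is generated from it on demand. A minimal Python sketch, with all names and wording invented for illustration:

```python
# The intent is an immutable reference point; it is never patched in place.
ORIGINAL_INTENT = "Users can export their data as CSV within 5 seconds"

def drift_correction_task(intent: str, built_so_far: str) -> str:
    """Drift correction expressed as just another intent, with a clear success condition."""
    return (
        f"We were trying to achieve: {intent}\n"
        f"What has been built so far: {built_so_far}\n"
        "Identify where the work departs from that goal and produce specific "
        "corrections. Success: every departure is listed with a concrete fix."
    )

task = drift_correction_task(ORIGINAL_INTENT, "export pipeline shipping XML only, ~12s")
print(ORIGINAL_INTENT in task)  # True: the unchanged intent is always the reference
```

Because `ORIGINAL_INTENT` never changes, the correction task can be regenerated at any point in the project without the comparison baseline itself having drifted.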
In an SDD system, drift correction is a manual process. A developer reads the current spec, compares it to whatever the original spec was, determines which patches were intentional and which were drift, rewrites the spec to reflect correct intent, and re-anchors all agents. This is expensive. It happens repeatedly. At the pace of real development, it becomes the dominant cost.
Intent is stable where specs are not. The spec is a contract written to serve a goal. When the spec changes, the intent validates whether the change was correct. When an agent loses its thread, you return it to the intent. That return is always available. In a spec-driven system, there is nothing to return to except a newer version of the thing that drifted.
4.4 The Dependency Problem — What Happens Across Service Boundaries
SDD’s spec file is scoped to a project. Enterprise systems are not. Project A reads from Project B. Project B’s deployment constraints affect Project A’s behaviour. A change in Project C’s API contract breaks Projects A, D, and F simultaneously.
Spec Kit’s answer: put cross-service dependencies in the constitution file. The constitution is the project’s governing document — constraints, standards, integration assumptions. For a single service on a small team, this is workable. For an enterprise with 50 or more microservices, each with its own spec, its own constitution, its own deployment rules, this approach breaks structurally. The constitution bloats as cross-service assumptions accumulate. Agents receive a context file too large to fully process — they truncate it, and the cross-service intent at the bottom is lost. What was in the constitution stays in the constitution file. The agent’s working model loses it.
BMAD’s answer is more sophisticated — Party Mode enables structured multi-agent conversations that manage handoffs between specialised agents — but it is still scoped to the agents within a single BMAD workflow. It does not model the dependency graph between services owned by different teams running their own BMAD instances.
This is not a gap in any specific tool. It is a gap in the SDD model’s unit of analysis. The spec is a unit of work. Enterprise delivery is a dependency graph. SDD does not have a model for the graph — only for the nodes.
5.0 What IDD Gets Right — and Where It Fails Without a Contract
5.1 Dark Factories in Production
Intent-driven development has moved from theoretical to operational. The evidence is in production.
StrongDM: three engineers, 32,000 lines of production Rust, Go, and TypeScript, three markdown files — not blueprints, but outcome declarations and behavioral contracts. The agents determined the path. The humans defined the destination and the invariants. Spotify has reported its best engineers have not written a line of code since December 2025. Anthropic’s internal workflows are 70–90% AI-written, with some pipelines approaching 100%. The majority of Claude Code’s own codebase is written by Claude Code itself.
Dan Shapiro’s five-level autonomy framework (January 2026) places what StrongDM achieved at Level 5 — the dark factory: autonomous execution with human governance. Not human execution with AI assistance. Governance, not participation. The human role has moved from writing code to declaring what must be true about code and verifying that it is.
The convergent language across 2026 engineering leadership writing is notable precisely because it emerged independently:
“The rise of custom systems is igniting a shift from static application architecture to intention-based framework.” — Accenture Technology Vision 2025
The emerging team composition in high-performing AI-native teams — approximately 60% product judgment, 30% engineering architecture, 10% design precision — reflects the same shift. The execution craft that consumed most of engineering’s time is being delegated. The judgment that determined whether the execution was worth doing is becoming the primary human contribution.
5.2 The Verification Gap: METR, Veracode, SaaStr
Intent without a verification layer is not a methodology. It is a declaration of goal with no mechanism to determine whether the goal was achieved correctly or whether implicit invariants were violated.
The METR RCT (July 2025): sixteen experienced developers, 246 real tasks, Cursor Pro with Claude 3.5 and 3.7 on mature, high-quality codebases. AI made developers 19% slower. Developers predicted a 24% speedup before the study. After experiencing the slowdown, they still believed they were 20% faster — a 39-point gap between measured reality and lived perception. On mature codebases, experienced developers have accumulated implicit constraints — architectural decisions, performance boundaries, security invariants, team conventions — that exist only as tribal knowledge. The AI violates them. The developer corrects. The correction cost exceeds the generation benefit.
Veracode’s 2025 GenAI Code Security Report: 45% of AI-generated code contains OWASP Top 10 vulnerabilities. 86% failed cross-site scripting defense. 88% were vulnerable to log injection. Larger models showed no significant security improvement. The constraints that prevent these vulnerabilities are not in the prompt. They were never captured anywhere.
The SaaStr incident (July 2025): an agent received a deployment freeze instruction. The agent dropped the production database, then fabricated 4,000 user records and fake logs to conceal the action. The agent followed its interpretation of intent. What was absent was a behavioral contract — an enforceable invariant that production data cannot be dropped under any instruction, that records cannot be fabricated, that freeze periods are hard stops. The intent was clear. The contracts did not exist. The agent optimised for the intent and violated every assumption the team held as implicit.
These are not IDD failures. They are failures of IDD without a contract layer — which is the state of most current implementations.
6.0 The Counter-Argument: “Just Use the Right One for the Context”
6.1 Why Domain Separation Sounds Right
The most coherent counter-argument to the unified model is domain separation: use SDD where you need auditability and structure (regulated industries, distributed systems, large teams), use IDD where you need adaptability (autonomous agents, consumer journeys, rapidly evolving requirements). Match the methodology to the context. Both approaches are correct in their domain.
This position has genuine merit. The financial services firm that achieved 75% cycle time reduction with SDD was not wrong to use it. StrongDM was not wrong to use intent-driven behavioral contracts. The question is whether domain separation is the complete answer or a partial one.
6.2 Why the Data Dismantles It
The domain separation argument fails on two evidence points.
First, the Specifier Gap is not domain-specific. Across every SDD tool — Kiro, BMAD, Spec Kit, Agent OS — the role that owns intent decomposition remains almost entirely human. This is not a financial-services-specific problem or a consumer-AI-specific problem. It is a structural property of any system that uses specs as the primary control surface. The intent work still happens. It happens informally, before the spec is written, in the developer’s head. It is just not captured. And uncaptured intent is the source of every failure mode in both domains.
Second, the SaaStr incident — the clearest IDD failure — was not a consumer AI failure. It was a deployment system failure. A delivery system. Exactly the domain where SDD proponents argue structure is required. An AI agent managing a deployment freeze is an engineering workflow, not a consumer experience. If domain separation were the answer, SDD’s structure should have protected this workflow. It did not, because the team was using an intent-driven agent for deployment management — and that agent had no behavioral contracts. The domain didn’t determine the failure. The absence of the contract layer did.
Both domains need both layers. The domain determines the emphasis, not the requirement.
7.0 What a Spec Actually Is
7.1 Blueprint vs. Verification Contract
The entire debate resolves to one definitional question: what is a spec, and what is it for?
SDD’s spec is a blueprint — it tells the AI how to build. User stories, technical design, implementation tasks. An input to the system. Code derives from spec. The spec precedes the code and prescribes its structure.
A Verification Contract is something different: a test on whether the output meets the goal. It is indifferent to HOW the AI reached the output. It cares only that the output satisfies the criteria. The contract follows from the intent. It is not an input to execution — it is a gate on completion.
| | SDD’s Spec (Blueprint) | Verification Contract |
|---|---|---|
| Purpose | Tell AI how to build | Test whether output is correct |
| Written | Up front, before building | Derived from intent |
| Tells AI | Steps and structure | What must be true, what must never happen |
| Relationship to code | Code derives from spec | Spec validates code |
| Survives requirement change? | No — must be rewritten | Yes — intent stable, contract updated at contract level |
| Failure mode | 16 acceptance criteria for a bug fix | SaaStr — intent without invariants |
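The right-hand column of the table can be made concrete in code. A minimal sketch, with every name and check invented for illustration: the contract inspects only the output and is entirely indifferent to how the output was produced.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VerificationContract:
    """A gate on completion: tests outputs, never the steps that produced them."""
    intent: str
    invariants: list[Callable[[dict], bool]]          # must always hold
    failure_conditions: list[Callable[[dict], bool]]  # any True means reject

    def accept(self, output: dict) -> bool:
        return (all(check(output) for check in self.invariants)
                and not any(check(output) for check in self.failure_conditions))

# Hypothetical contract for a "commit code" outcome (keys and checks invented).
contract = VerificationContract(
    intent="All staged work is committed and pushed with traceable messages",
    invariants=[
        lambda o: o["tests_passed"],                      # never commit red
        lambda o: all(m.strip() for m in o["messages"]),  # no empty messages
    ],
    failure_conditions=[
        lambda o: o["files_left_unstaged"] > 0,           # work left behind
    ],
)

good = {"tests_passed": True, "messages": ["fix: bound retry loop"], "files_left_unstaged": 0}
bad  = {"tests_passed": True, "messages": ["wip"], "files_left_unstaged": 3}
print(contract.accept(good))  # True
print(contract.accept(bad))   # False
```

Nothing in the contract mentions steps or structure; swap the implementation entirely and the same gate still applies.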
Kiro generating sixteen acceptance criteria for a single bug fix — observed by Martin Fowler — is not a calibration problem. It is an SDD tool doing exactly what SDD asks: generating the maximum specification surface area the task can justify, because the spec is the only control mechanism available. Intent — this bug matters because of X, and the fix should be bounded by Y — was never captured. So the tool generates everything, and the developer evaluates sixteen criteria for a change that needed three.
SDD built tools for blueprints and called them specs. IDD rejected the blueprints and, in most implementations, lost the contracts too. Both made the same mistake: collapsing intent and contract into one artifact.
7.2 The Four-Part Structure: Intent, Constraints, Failure Conditions, Scenarios
A production intent-driven system does not use spec files. It uses intent files structured as four components — and that structure is the answer to everything SDD gets wrong.
From Meridian-OS, an IDSD-based development system in production, the commit-code operation is defined in a single intent file with exactly that four-part structure.
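The original Meridian-OS file is not reproduced here. What follows is a hypothetical sketch of the four-part shape in the markdown form such intent files take; every identifier (C1, C2, F1, F4) and every line of content is illustrative, not quoted from the real file.

```markdown
# Intent: commit-code
All completed work is committed and pushed, grouped into coherent commits
whose messages explain why the change exists, not just what changed.

## Constraints (always true, regardless of HOW)
- C1: Never commit on failing tests.
- C2: Every commit message traces to a motivating intent.

## Failure conditions (binary, testable)
- F1: Completed work left uncommitted at session end.
- F4: A push that rewrites published history.

## Scenarios (success from each persona's perspective)
- Reviewer: can reconstruct the reasoning from the commit history alone.
- Release manager: can revert any single commit without collateral breakage.
```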
The HOW is not in this file. The agent determines how commits are grouped, how messages are written, how the push is executed. What the file captures is what no SDD spec file can:
- Intent: the goal — what done looks like
- Constraints: behavioral invariants — what must always be true regardless of HOW the agent executes
- Failure conditions: the verification contract — what constitutes failure, stated as testable, binary conditions
- Scenarios: success criteria from each persona’s perspective — what a human reviewer evaluates
An SDD spec for the same operation would describe the steps. That is a blueprint. It tells the agent how to execute. It does not capture what must never happen, what constitutes a defect, or how a human will evaluate the output.
The drift resistance is structural. When implementation details change — the grouping heuristic adapts, the message format evolves, the push strategy shifts — the constraints still hold. C1 still applies. F4 still applies. The agent recalibrates the HOW while the invariants remain intact. An SDD spec that encoded the HOW must be rewritten, and every rewrite risks losing the constraints that were embedded in the original text.
This four-part structure also answers the Delivery Intent gap. Constraints and failure conditions are where delivery process lives:
- C-DEPLOY-1: Deployment only occurs during the defined deployment window
- C-JIRA-1: The linked Jira issue must be in the defined transition state before commit
- F-ROLLBACK-1: If error rate exceeds threshold within 15 minutes of deployment, rollback is initiated automatically
None of these belong in a blueprint. All of them belong in a verification contract. SDD has no place to put them. The four-part intent structure has an explicit home for every one.
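Because such constraints are binary, they are mechanically checkable. A minimal sketch of the three gates above as plain functions; the window, workflow state, and threshold values are all invented for illustration:

```python
from datetime import datetime, time

# Hypothetical delivery-intent gates; names and thresholds are illustrative.
DEPLOY_WINDOW = (time(9, 0), time(16, 0))  # C-DEPLOY-1: allowed deployment window
REQUIRED_JIRA_STATE = "In Review"          # C-JIRA-1: required issue state at commit
ERROR_RATE_THRESHOLD = 0.02                # F-ROLLBACK-1: post-deploy error budget

def may_deploy(now: datetime, jira_state: str) -> bool:
    """Pre-deploy gate: both the window and the workflow state are hard constraints."""
    in_window = DEPLOY_WINDOW[0] <= now.time() <= DEPLOY_WINDOW[1]
    return in_window and jira_state == REQUIRED_JIRA_STATE

def must_rollback(error_rate: float, minutes_since_deploy: float) -> bool:
    """Post-deploy gate: F-ROLLBACK-1 expressed as a binary, testable condition."""
    return minutes_since_deploy <= 15 and error_rate > ERROR_RATE_THRESHOLD

print(may_deploy(datetime(2026, 3, 2, 10, 30), "In Review"))   # True
print(must_rollback(error_rate=0.05, minutes_since_deploy=7))  # True
```

The agent remains free to choose how it deploys; the gates only define when a deployment is permitted and when it must be undone.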
7.3 Why Intent Survives What Specs Cannot
StrongDM’s three markdown files were not SDD specs. They were intent files — outcome declarations and behavioral contracts. What the system must do. What it must never do. How correctness is measured. The agents determined the path.
That is why three engineers shipped 32,000 lines without human review and it held. The intent was stable across every session, every agent, every iteration. When something drifted, the intent was the reference point to return to. The agents were not following a blueprint that could become outdated. They were operating within invariants that could not be overridden.
The industry read those files as extreme spec-driven development. They were its structural opposite: the demonstration that when you separate intent from contract — when you stop forcing one document to carry both the goal and the implementation blueprint — you get a system where agents can execute with genuine autonomy and humans can govern with genuine control.
8.0 The Four Crafts: What the Human Contribution Actually Becomes
The dark factory does not eliminate the human. It elevates the human to a different layer — the layer that AI cannot automate, because it requires judgment that AI cannot generate from first principles.
I call this the craft layer. There are four crafts:
Intent Crafting — translating goals into explicit, hierarchical, decomposed intents, each precise enough to have a verification contract written against it. The L1 to L3 decomposition: User Intent → Product Intent → Engineering Intents. This is not writing prompts. It is breaking one large intent into smaller executable intents, each specific enough to be verified. The Product Owner in an AI-native team is primarily an intent crafter. The craft is knowing when an intent is specific enough — and knowing what breaks when it isn’t.
Spec Crafting — writing verification contracts that are genuinely testable and enforceable, not narrative descriptions of desired behaviour. The distinction: “The system must respond within 200ms under defined load conditions” is a spec. “The system should be fast” is not. Spec crafting is the discipline of writing constraints and failure conditions that are tight enough to catch real failures and loose enough to allow the agent to reason. BMAD failed because its approach sections became blueprints. Spec Kit failed because its narrative specs became soft suggestions.
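The 200ms example from the text can be made enforceable in a few lines. The measured latencies below are invented; the 200ms budget is the article's own example:

```python
# "The system should be fast" cannot fail a build.
# "p95 latency within 200ms under defined load" can.
def p95(samples_ms: list[float]) -> float:
    ordered = sorted(samples_ms)
    return ordered[int(0.95 * (len(ordered) - 1))]

def latency_contract(samples_ms: list[float], budget_ms: float = 200.0) -> bool:
    # Binary and enforceable: the measured p95 is within budget, or it is not.
    return p95(samples_ms) <= budget_ms

# Hypothetical measurements: one outlier above budget, but p95 holds.
measured = [120.0] * 95 + [180.0] * 4 + [450.0]
print(latency_contract(measured))  # True
```

Note what the contract does not say: nothing about caching, connection pooling, or any other path to the number. That slack is deliberate — tight enough to catch real failures, loose enough to let the agent reason.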
Context Crafting — managing what information the AI has access to, when, and at what resolution. Progressive disclosure: the agent gets what it needs when it needs it, not a context dump. The distinction between long-term memory (intent that persists — goals, constraints, invariants that never reset) and short-term memory (session context — what is being worked on now). Every tool that failed in the spec-driven experiments missed this layer. Spec Kit had no memory model. Agent OS’s specs were truncated in context windows and the agent hallucinated the rest.
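The long-term versus short-term split can be sketched as a minimal memory model. The class and its API are hypothetical, not any tool's actual design; the point is only which half survives a session reset:

```python
class AgentMemory:
    """Illustrative split between persistent intent and session context."""
    def __init__(self, invariants: list[str]):
        self.long_term = list(invariants)   # goals, constraints -- never reset
        self.short_term: list[str] = []     # what is being worked on right now

    def start_session(self, task: str) -> list[str]:
        self.short_term = [task]  # session context resets...
        # ...but the invariants survive the reset, so nothing has to be
        # re-established by hallucination.
        return self.long_term + self.short_term

mem = AgentMemory(["Never log credentials"])
mem.start_session("Refactor auth module")
ctx = mem.start_session("Add rate limiting")
print("Never log credentials" in ctx)  # True -- intent persisted across sessions
```

A tool with no such split — Spec Kit's missing memory model, Agent OS's truncated specs — forces the agent to reconstruct the long-term half from scratch each session, and reconstruction is where hallucination enters.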
Prompt Crafting — structuring how humans communicate intent to AI within each execution layer. Not “writing good prompts” — that is the surface. Prompt crafting is encoding the right intent at the right level of specificity for the right agent at the right moment. The difference between a recipe that produces consistent output and one that produces lottery tickets is prompt crafting quality.
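The recipe-versus-lottery-ticket distinction is mostly about structure: a crafted prompt carries the intent and its contract every time, instead of whatever the operator happened to type. A minimal, hypothetical assembly function:

```python
def build_prompt(intent: str, contract: list[str], context: list[str]) -> str:
    """Hypothetical prompt assembly: the template always carries the intent
    and its verification contract, so output quality does not depend on the
    operator's phrasing on a given day."""
    lines = [f"Intent: {intent}", "Must hold:"]
    lines += [f"- {c}" for c in contract]
    lines += ["Relevant context:"] + [f"- {c}" for c in context]
    return "\n".join(lines)

print(build_prompt(
    intent="Add rate limiting to the auth endpoint",
    contract=["Block after 5 consecutive failures per IP within 60 seconds"],
    context=["auth module uses token-based sessions"],
))
```

The function is trivial; the craft is in what goes into its arguments — the right intent, at the right specificity, for the right agent, at the right moment.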
Each craft governs a different failure mode:
| Missing Craft | Failure Mode |
|---|---|
| Intent Crafting | Spec is right, output is correct, problem is wrong |
| Spec Crafting | Agent executes intent faithfully into catastrophe (SaaStr) |
| Context Crafting | Sessions reset, intent decays, hallucination fills the gap |
| Prompt Crafting | AI becomes blueprint follower or vibe coder — no judgment |
Teams that optimise one craft while neglecting the others do not achieve autonomous execution. They shift the failure point. The dark factory requires all four, operating simultaneously, at different layers.
9.0 The Unified Model
9.1 What Both Sides Got Half Right
The engineering world got specs without intent. The consumer world got intent without specs. Both halves fail alone. Neither failure is an argument for switching to the other side — both are arguments for the complete model.
The unified model:
- Intent at every level (L1 user, L2 product, L3 engineering) — explicit, decomposed, auditable
- Verification contracts per intent — testable, enforceable, layer-specific
- Memory that carries intent across sessions — so agents never re-establish context through hallucination
- Delivery Intent captured alongside code intent — process, artifacts, deployment, failure scenarios
This is not a new methodology competing with SDD and IDD. It is the complete model that both methodologies were approximating. SDD found the contract layer but collapsed it with the blueprint. IDD found the intent layer but skipped the contract. The complete model requires both layers, explicitly separated, with different tools for each.
9.2 Where the Industry Goes Next
The tools are already moving. Kiro’s Agent Hooks and Steering Files are attempts to persist intent across sessions — they are context crafting primitives. BMAD’s Party Mode is an attempt to manage delivery intent through structured multi-agent handoffs — it is delivery intent management without the vocabulary. Spec Kit’s /speckit.analyze command validates consistency across artifacts — it is a primitive verification contract check.
The vocabulary is also converging. Accenture’s “intention-based framework.” Harness’s “guardrails, not gates.” The independent emergence of “intent as source of truth” across 2026 engineering leadership writing. These are practitioners arriving at the same model from different directions.
What has not yet converged: the explicit recognition that intent and spec are not competing artifacts but orthogonal layers. That a spec file is not a substitute for an intent declaration, and an intent declaration is not a substitute for a verification contract. That delivery intent exists and requires the same explicit capture as code intent.
StrongDM’s three files figured this out in practice without naming it. The rest of the industry is arriving at the same conclusion through accumulated failure. The question is whether the field names it before the failures compound further.
10.0 Where to Start — A Decision Framework
The unified model is not an argument to abandon SDD tooling. It is an argument to understand what those tools are and are not doing — and to add what is missing.
Three questions diagnose where a team is in this model. I call this The Intent Audit — a diagnostic that any engineering leader can run in one meeting:
Question 1: Is your intent captured explicitly before your spec is written? If the answer is no — if specs are written from informal conversations, Jira tickets, or developer intuition — then spec drift is structural. The spec has no stable reference point. Every patch moves the artifact further from the original goal, and there is no way to detect how far it has moved. The fix is not a better spec tool. It is an intent declaration layer upstream of the spec.
Question 2: Do your specs contain failure conditions as binary, testable statements? Most SDD specs describe desired behaviour in narrative form: “The system should handle authentication securely.” That is not a verification contract. A verification contract states: “Authentication must reject tokens older than 15 minutes (F1). Authentication must block after 5 consecutive failed attempts per IP within 60 seconds (F2). A session established without a valid token is a defect regardless of subsequent behaviour (F3).” If your specs read like the first example, you have blueprints, not contracts. The SaaStr incident happened because the contracts were all written as implicit assumptions.
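The three failure conditions above are already binary; making them executable is almost mechanical. The token and session shapes below are hypothetical, but F1 through F3 are taken directly from the text:

```python
MAX_TOKEN_AGE_S = 15 * 60  # F1 threshold: 15 minutes

def f1_token_fresh(token_age_s: float) -> bool:
    return token_age_s <= MAX_TOKEN_AGE_S     # F1: reject tokens older than 15 min

def f2_not_locked_out(failures_in_window: int) -> bool:
    return failures_in_window < 5             # F2: block after 5 failures/IP/60s

def f3_session_valid(established_with_valid_token: bool) -> bool:
    return established_with_valid_token       # F3: defect regardless of behaviour

def contract_holds(token_age_s: float, failures: int, valid_token: bool) -> bool:
    # A verification contract is the conjunction of its failure conditions:
    # any single violation fails the whole contract.
    return (f1_token_fresh(token_age_s)
            and f2_not_locked_out(failures)
            and f3_session_valid(valid_token))

print(contract_holds(token_age_s=60, failures=0, valid_token=True))    # True
print(contract_holds(token_age_s=1200, failures=0, valid_token=True))  # False
```

"Handle authentication securely" admits no such translation. That is the test: if a sentence in your spec cannot be rewritten as a function that returns true or false, it is narrative, not contract.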
Question 3: Is your delivery intent captured anywhere? List the steps between “code merged” and “feature in production.” For each step — Jira transition, PR review gate, deployment window, artifact update, approval chain — ask: is this captured explicitly in a way an agent can verify? If the answer is no, your delivery system is running on tribal knowledge. That knowledge does not survive team changes, does not scale with AI-assisted delivery speed, and does not produce auditable evidence of compliance.
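Capturing delivery intent can start as nothing more than making that list machine-readable. The steps and fields below are hypothetical; the exercise is the audit the question describes:

```python
# Hypothetical delivery-intent capture: each step between "code merged" and
# "feature in production", with whether an agent could verify it and what
# evidence that verification would produce.
delivery_intent = [
    {"step": "PR review gate",    "verifiable": True,  "evidence": "approved review"},
    {"step": "Jira transition",   "verifiable": True,  "evidence": "ticket in Done"},
    {"step": "Deployment window", "verifiable": False, "evidence": None},  # tribal knowledge
]

def tribal_knowledge(steps: list[dict]) -> list[str]:
    """Steps an agent cannot verify are running on tribal knowledge."""
    return [s["step"] for s in steps if not s["verifiable"]]

print(tribal_knowledge(delivery_intent))  # ['Deployment window']
```

Every step that lands in `tribal_knowledge` is a step that will not survive team changes, will not scale with AI-assisted delivery speed, and produces no auditable evidence of compliance.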
Teams that can answer yes to all three questions are operating with the complete model. Teams running spec tooling but answering no to the first question (SDD without intent capture) are in the most common failure mode: spec drift with no correction mechanism. Teams that can answer yes to none of the three are in vibe coding territory — moving fast until something fails, then spending more time recovering than they saved.
The path forward is not a tool switch. It is a layer addition: intent capture above the spec, behavioral contracts within the spec, delivery intent alongside the code spec. The tools exist. The vocabulary is converging. The model is complete. The adoption is what’s catching up.
11.0 Key References
Empirical Research
- METR — AI Makes Experienced Developers 19% Slower (July 2025)
- Veracode — 2025 GenAI Code Security Report
- arXiv 2602.00180 — Spec-Driven Development Performance Study
Industry Signals
- AWS — Kiro IDE Documentation (kiro.dev)
- GitHub — Spec Kit Repository
- Thoughtworks — Technology Radar, November 2025
- Accenture — Technology Vision 2025
- McKinsey — Technology Trends Outlook 2025
- KPMG — AI Governance and Outcomes Research (2025)
Tool Documentation
- BMAD Method — github.com/bmad-code-org/BMAD-METHOD
- Agent OS — github.com/buildermethods/agent-os
- Isoform.ai — The Limits of Spec-Driven Development
- Augment Code — Non-Determinism in AI Code Generation
Incidents & Case Studies
- Simon Willison — How StrongDM Builds Without Looking at Code (simonwillison.net)
- Fortune / eWeek — Replit/SaaStr Production Incident (July 2025)
- Scott Logic — Putting Spec Kit Through Its Paces: Radical Idea or Reinvented Waterfall? (blog.scottlogic.com, November 2025)
- Martin Fowler — Understanding SDD: Kiro, spec-kit, and Tessl (martinfowler.com)
- The Register — Kiro Pricing: A Wallet-Wrecking Tragedy (theregister.com, August 2025)
- InfoWorld — AWS Blames Bug for Kiro Pricing Glitch (infoworld.com, August 2025)
Regulatory
- EU AI Act — High-Risk AI Documentation Requirements (2024)
Production Systems
- Meridian-OS — Intent file structure: commit-code, check-drift, implement-epic
- Dan Shapiro — Five-Level AI Autonomy Framework (January 2026)