Directed Vibe Engineering — DVE V2

Guided Vibe Engineering 2.0

A Full Architecture for Voice-to-Code Software Development With Premortems, Risk Control, Living Architecture, and Real-World Engineering References

Executive Summary

Guided Vibe Engineering, or GVE, is a structured system for building software with AI through voice-to-code, vibe coding, or conversational development. The original GVE idea was:

Keep the speed and creativity of vibe coding, but add engineering discipline so the project does not collapse into confusion, rework, and technical debt.

This expanded version turns GVE into a fuller software development architecture. It keeps the original core ideas:

Intent capture

AI clarification questions

Living requirements

Architecture maps

Premortems

State flow diagrams

Debug history

Iteration logs

Testing

Deployment and rollback

But now it adds 10 major upgrades based on real-world documented practices:

Assumption Ledger

Architecture Decision Records

C4 Architecture Views

SLOs and Error Budgets

Architecture Fitness Functions

Checklist Gates

Secure AI Coding Threat Model

GVE Readiness Levels

Feature Flags and Kill Switches

Cognitive Load Budget

Together, these make GVE more than “AI that asks better questions.” They turn it into a repeatable operating system for AI-assisted software development.

The Core Problem GVE Solves

Most vibe coding fails for one simple reason:

The AI starts building before the project is truly understood.

The user gives a rough prompt:

“Build me a dashboard.”

The AI immediately begins generating files, components, routes, database models, and styling. But the AI may not know:

Who the dashboard is for

What decision the dashboard supports

Where the data comes from

How often the data updates

Who can access it

What happens if data is missing

What happens if an API fails

What must be logged

What must be tested

How it will be deployed

What “done” means

This creates the central GVE problem:

AI can generate code faster than humans can clarify intent.

GVE fixes this by forcing the AI to become a guided engineering partner, not just a code generator.

The New GVE Principle

The original GVE principle was:

Do not let the AI write code until it proves it understands the problem.

The expanded GVE 2.0 principle is:

Do not let the AI write, change, debug, or deploy code unless the intent, assumptions, risks, architecture, tests, and readiness level are visible and traceable.

That sounds heavier, but the user experience should still feel simple. GVE should not feel like:

“Fill out this 40-page requirements form.”

It should feel like:

“Before I build this, let me confirm the two things most likely to cause rework later.”

That is the heart of GVE.

The Full GVE 2.0 Architecture

The new GVE architecture has 12 layers.

Layer 1: Conversation Layer

The human speaks naturally. The AI listens, summarizes, asks questions, and turns messy language into structured project understanding.

Layer 2: Intent Layer

The AI captures the purpose of the project. Output:

Project Intent Brief

Target users

Success criteria

MVP definition

Constraints

Layer 3: Assumption Layer

The AI identifies what is being assumed but not yet proven. Output:

Assumption Ledger

This is based on Assumption-Based Planning, a RAND-developed method that focuses on identifying important assumptions, vulnerabilities, signposts, and hedging or shaping actions. Quick reference: “Assumption-Based Planning, RAND, Dewar.”

Layer 4: Premortem Layer

The AI imagines the project failed and works backward to identify likely reasons. Output:

Premortem Register

Risk list

Prevention actions

Quick reference: “Gary Klein, Performing a Project Premortem, Harvard Business Review, 2007.”

Layer 5: Requirements Layer

The AI converts intent, assumptions, and premortem risks into requirements. Output:

Living Requirements Document

Edge case list

Constraint list

Layer 6: Architecture Layer

The AI designs the system before coding. Output:

C4 architecture diagrams

State flow diagram

Dependency graph

Data ownership map

Architecture Decision Records

The C4 model uses hierarchical architecture views: system context, containers, components, and code. It was created by Simon Brown as a developer-friendly way to visualize software architecture. Quick reference: “Simon Brown, C4 Model.”

Layer 7: Decision Memory Layer

The AI records why major architecture choices were made. Output:

Architecture Decision Records, or ADRs

ADRs are small records of individual architecture decisions and their context, tradeoffs, and consequences. The widely referenced article is Michael Nygard’s “Documenting Architecture Decisions.” Recent research has also explored whether LLMs can help generate ADRs, finding that LLMs can produce relevant architecture decisions but still require human oversight. Quick reference: “Michael Nygard, Documenting Architecture Decisions.”

Layer 8: Implementation Layer

The AI builds modular code in controlled increments. Output:

Code modules

Module contracts

Updated architecture map

Updated change log

Layer 9: Verification Layer

The AI checks whether the code still obeys the architecture. Output:

Tests

Fitness functions

Checklist gates

Readiness score

Architecture fitness functions come from evolutionary architecture thinking. The basic idea is that architecture should have executable checks that verify whether the system is still meeting its design goals. Quick reference: “Neal Ford, Rebecca Parsons, Patrick Kua, Building Evolutionary Architectures.”

Layer 10: Security and Governance Layer

The AI checks the risks of both the software being built and the AI agent doing the building. Output:

AI threat model

Permission boundaries

Tool-use approvals

Audit logs

OWASP’s Top 10 for LLM Applications includes risks such as prompt injection, insecure output handling, sensitive information disclosure, excessive agency, and supply chain vulnerabilities. These are directly relevant to AI-assisted coding workflows. Quick reference: “OWASP Top 10 for LLM Applications” and “NIST AI Risk Management Framework.”

Layer 11: Deployment and Reliability Layer

The AI plans how the software will safely go live. Output:

Deployment plan

Rollback plan

Feature flag registry

SLOs

Error budgets

Observability plan

Google’s SRE framework defines SLIs, SLOs, and error budgets as a way to measure and manage service reliability around what users actually care about. Quick reference: “Google SRE Book, Service Level Objectives.”

Layer 12: Memory and Iteration Layer

The AI preserves project history. Output:

Iteration Change Log

Debug History Log

Premortem Register

Assumption Ledger

ADRs

Readiness history

Predicted-vs-actual failure record

This is what turns GVE from a one-time coding conversation into a long-term development system.

The Updated GVE Flow

The full flow now looks like this:

Conversation → Intent → Assumptions → Premortem → Requirements → Architecture → Decisions → Implementation → Verification → Security → Deployment → Memory → Iteration

Or in plain English:

Understand what the user wants.

Identify hidden assumptions.

Imagine how the project could fail.

Convert risks into requirements.

Design the system.

Record why decisions were made.

Build in small modules.

Test against behavior and architecture.

Check security and AI-agent risk.

Deploy safely with rollback.

Track everything.

Improve without losing context.

The Expanded GVE Artifacts

GVE 2.0 should maintain these living artifacts.

Project Intent Brief

Defines the project’s purpose. Includes:

Problem being solved

Target users

Success criteria

MVP scope

Constraints

Non-goals

Assumption Ledger

Tracks every important assumption. Includes:

Assumption

Source

Risk if false

Warning sign

Validation method

Status

Premortem Register

Tracks imagined future failures. Includes:

Imagined failure

Likely cause

Risk score

Prevention action

Test needed

Status

Linked bug if it later happens

Living Requirements Document

Tracks what the system must do. Includes:

Functional requirements

Non-functional requirements

Edge cases

User roles

Permissions

Failure behavior

C4 Architecture Map

Shows the system at multiple levels. Includes:

Context diagram

Container diagram

Component diagram

Code-level view when needed

State Flow Diagram

Shows where data comes from, where it goes, and where truth lives.

Dependency Graph

Shows how modules depend on each other.

Architecture Decision Records

Records why major decisions were made.

Module Contract Registry

Defines what each module does, accepts, returns, and depends on.

Architecture Fitness Function List

Defines automated architecture checks.

GVE Checklist Set

Short checklists for phase transitions.

Secure AI Coding Threat Model

Defines what the AI agent is allowed to do.

Test Specification

Defines behavioral, integration, regression, and edge-case tests.

Observability Plan

Defines logs, metrics, traces, alerts, and dashboards.

Reliability Contract

Defines SLIs, SLOs, and error budgets.

Feature Flag Registry

Defines flags, kill switches, rollout rules, owners, and cleanup dates.

GVE Readiness Scorecard

Defines maturity level of each feature or module.

Cognitive Load Budget

Tracks whether the system is becoming too hard for humans to understand.

Debug History Log

Records bugs, root causes, fixes, affected modules, and related premortem risks.

Iteration Change Log

Tracks what changed, why, and what was affected.

Upgrade 1: Assumption Ledger

Simple reference to look up later

Assumption-Based Planning, RAND, James Dewar

What it adds to GVE

The Assumption Ledger captures what the AI or human is assuming before those assumptions quietly become architecture. In vibe coding, the AI often fills gaps automatically. That is useful, but dangerous. Example user prompt:

“Build a login system.”

The AI may assume:

Email/password login

No social login

No MFA

Users self-register

Password reset is needed

Sessions expire after a certain time

Admins exist

User roles exist

But the user may not have said any of that.

GVE implementation

After Intent Capture, the AI creates an Assumption Ledger. Each entry should include these fields:

ID: Unique assumption number

Assumption: What is being assumed

Source: User said it, AI inferred it, default convention, technical constraint

Confidence: Low, medium, high

Risk if false: What breaks if wrong

Warning sign: How we know it may be wrong

Validation question: What to ask or test

Status: Unvalidated, validated, invalidated, accepted

How it flows with GVE

The Assumption Ledger feeds:

Requirements

Premortem

Architecture

Test planning

Debugging

Example

Assumption:

Users only need one role: admin.

Risk if false:

Authorization model may need redesign later.

Validation question:

“Will this system ever need different user permissions, such as admin, manager, employee, or client?”

If the answer is yes, that changes the architecture before code begins.

Why it improves GVE

It stops invisible assumptions from becoming expensive rework.
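
The ledger fields described above map naturally onto a small record type. The following is a minimal sketch, assuming Python dataclasses are an acceptable way to hold the ledger; the names `Assumption`, `Source`, `Status`, and `open_questions`, the ID format, and the warning-sign text are illustrative and not part of GVE itself.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER_SAID = "user said it"
    AI_INFERRED = "AI inferred it"
    DEFAULT_CONVENTION = "default convention"
    TECHNICAL_CONSTRAINT = "technical constraint"


class Status(Enum):
    UNVALIDATED = "unvalidated"
    VALIDATED = "validated"
    INVALIDATED = "invalidated"
    ACCEPTED = "accepted"


@dataclass
class Assumption:
    id: str
    assumption: str
    source: Source
    confidence: str  # "low", "medium", or "high"
    risk_if_false: str
    warning_sign: str
    validation_question: str
    status: Status = Status.UNVALIDATED


def open_questions(ledger: list[Assumption]) -> list[str]:
    """Return the validation questions for assumptions nobody has resolved yet."""
    return [a.validation_question for a in ledger if a.status is Status.UNVALIDATED]


# The role example from the text; the warning sign is a made-up illustration.
ledger = [
    Assumption(
        id="A-001",
        assumption="Users only need one role: admin.",
        source=Source.AI_INFERRED,
        confidence="low",
        risk_if_false="Authorization model may need redesign later.",
        warning_sign="User mentions managers, employees, or clients.",
        validation_question="Will this system ever need different user permissions?",
    ),
]
print(open_questions(ledger))
```

Keeping the status as an enum makes it easy for the AI to surface only unresolved assumptions before moving on to requirements.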

Upgrade 2: Architecture Decision Records

Simple reference to look up later

Michael Nygard, Documenting Architecture Decisions

What it adds to GVE

The living architecture map shows what the system looks like. ADRs explain why it looks that way. Without ADRs, future developers and future AI sessions will see the code but not the reasoning.

GVE implementation

Every meaningful architecture decision gets an ADR. Examples:

Use PostgreSQL instead of MongoDB

Use polling instead of WebSockets

Use server-side validation

Use background jobs for email

Use feature flags for risky launches

Use a monolith for MVP instead of microservices

ADR format

Each ADR should be short. Recommended structure:

Title

Status: proposed, accepted, superseded, deprecated

Context

Decision

Alternatives considered

Consequences

Linked assumptions

Linked premortem risks

Linked files/modules

How it flows with GVE

ADRs are created during:

Architecture design

Major implementation decisions

Debug fixes that change design

Iterations that alter module contracts

Example

Title:

ADR-003: Use polling instead of WebSockets for dashboard MVP

Context:

User wants “real-time” dashboard but accepts 30-second refresh.

Decision:

Use polling every 30 seconds for MVP.

Consequence:

Simpler deployment, lower complexity, but not true real-time.

Linked assumption:

Dashboard users do not require sub-second updates.

Linked premortem risk:

Users may perceive data as stale.

Why it improves GVE

It creates project memory. The AI can later say:

“We chose polling because you prioritized fast MVP delivery over true real-time updates.”

That prevents confusion and accidental reversal.

Upgrade 3: C4 Architecture Views

Simple reference to look up later

Simon Brown, C4 Model

What it adds to GVE

The original GVE architecture map is useful, but it needs structure. The C4 model gives GVE a clean diagram system:

Context

Containers

Components

Code

The official C4 model describes itself as a developer-friendly approach using hierarchical abstractions and diagrams, including software systems, containers, components, and code.

GVE implementation

Replace the generic “architecture map” with a GVE-C4 Architecture Map.

Level 1: Context

Shows:

Users

External systems

Business environment

Major data sources

Question:

“Who or what interacts with this system?”

Level 2: Containers

Shows:

Frontend app

Backend API

Database

Worker services

Cache

Queue

Third-party services

Question:

“What deployable or runtime pieces make up this system?”

Level 3: Components

Shows:

Auth component

Dashboard component

Reporting component

Notification component

Billing component

Question:

“What major parts exist inside each container?”

Level 4: Code

Used only when necessary. Shows:

Classes

Functions

Modules

Interfaces

Question:

“What code-level structure matters enough to document?”

How it flows with GVE

C4 views should be updated:

Before implementation

After major module creation

After architecture-changing debug fixes

Before deployment

After major iterations

Why it improves GVE

It lets GVE speak to different audiences:

Investor: context view

Client: container view

Developer: component view

Debugging AI: code view

That keeps the architecture understandable instead of overwhelming.

Upgrade 4: SLOs and Error Budgets

Simple reference to look up later

Google SRE Book, Service Level Objectives

What it adds to GVE

GVE should not only ask:

“Does it work?”

It should ask:

“How well must it work for users to trust it?”

Google’s SRE guidance emphasizes defining service behavior around what users care about, using SLIs, SLOs, and error budgets.

GVE implementation

Add a Reliability Contract artifact.

Key terms

SLI: Service Level Indicator

What you measure. Examples:

API latency

Dashboard load time

File upload failure rate

AI response timeout rate

SLO: Service Level Objective

The target. Examples:

95% of dashboard loads complete in under 2 seconds

99% of login attempts return a response in under 1 second

File upload failure rate stays below 1%

Error Budget

How much failure is acceptable before development slows down to fix reliability. Example:

If 5% of dashboard loads can be slow, that 5% is the error budget.

How it flows with GVE

SLOs should be defined after architecture and before deployment. They feed:

Test plans

Observability

Rollback triggers

Feature flag rules

Maintenance priorities

Example

Feature:

Client reporting dashboard

SLI:

Dashboard load time

SLO:

95% of dashboard requests load in under 2 seconds

Error budget:

5% may exceed 2 seconds over a 30-day window

Rollback trigger:

If new deployment causes 15% of requests to exceed 2 seconds, roll back or disable the feature.

Why it improves GVE

It turns quality into a measurable agreement. GVE stops saying:

“Looks good.”

And starts saying:

“It meets the agreed reliability target.”
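
The dashboard example above (95% of loads under 2 seconds, a 5% error budget, rollback at 15% slow requests) can be checked with a few lines of arithmetic. This is a minimal sketch under those stated numbers; the function names and the sample load times are hypothetical, and a real system would read measurements from its monitoring stack rather than a list.

```python
def slow_fraction(load_times_s, threshold_s=2.0):
    """SLI: fraction of loads that did not finish in under threshold_s seconds."""
    if not load_times_s:
        return 0.0
    slow = sum(1 for t in load_times_s if t >= threshold_s)
    return slow / len(load_times_s)


def evaluate(load_times_s, slo_target=0.95, rollback_at=0.15, threshold_s=2.0):
    """Compare observed load times with the SLO, error budget, and rollback trigger."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    slow = slow_fraction(load_times_s, threshold_s)
    error_budget = 1.0 - slo_target  # 5% for a 95% SLO
    return {
        "slow_fraction": slow,
        "error_budget": error_budget,
        "slo_met": (1.0 - slow) >= slo_target,
        "budget_used": slow / error_budget,  # 1.0 means the budget is fully spent
        "rollback": slow >= rollback_at,
    }


# 100 hypothetical load times: 97 fast loads and 3 slow ones.
sample = [0.8] * 97 + [3.1] * 3
report = evaluate(sample)
print(report["slo_met"], round(report["budget_used"], 2), report["rollback"])
```

Reporting the share of budget used, rather than only pass or fail, gives the team an early warning before the SLO is actually missed.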

Upgrade 5: Architecture Fitness Functions

Simple reference to look up later

Building Evolutionary Architectures, Neal Ford, Rebecca Parsons, Patrick Kua

What it adds to GVE

The architecture map is useful, but it can become stale. Fitness functions make architecture enforceable. A fitness function is a test or check that proves the system is still following its intended design.

GVE implementation

Create an Architecture Fitness Function List. Examples:

Frontend cannot directly access database.

All API routes must require authentication unless marked public.

Every external API call must include timeout handling.

No module may import from deprecated modules.

All database migrations must include rollback notes.

All new endpoints must include tests.

All high-risk modules must include logging.

No secrets may appear in source code.

Business logic should not live inside UI components.

How it flows with GVE

Fitness functions are created from:

ADRs

Premortem risks

Security rules

Architecture map

Module contracts

They run during:

Implementation

Pull request review

Debug fixes

Iteration

Pre-deployment checks

Example

Premortem risk:

AI may create frontend components that bypass backend authorization.

Fitness function:

No frontend component may call protected database functions directly.

Test/check:

Static scan for forbidden imports or database client usage in frontend directories.

Why it improves GVE

It prevents architectural drift. The AI no longer just documents architecture. It actively checks whether the code still obeys it.
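
The static scan mentioned in the example could look roughly like the sketch below. It assumes a JavaScript or TypeScript frontend and a hypothetical list of database client packages (`FORBIDDEN_IMPORTS`); both are illustrative choices. The regular expression only catches `from "..."` and `require("...")` forms, so it is a starting point rather than a complete import parser.

```python
import re
from pathlib import Path

# Hypothetical database client packages; adjust to the project's real stack.
FORBIDDEN_IMPORTS = ("pg", "mysql2", "mongodb", "@prisma/client")
# Matches `from "module"` and `require("module")`; other import forms are not covered.
IMPORT_PATTERN = re.compile(r"""(?:from|require\()\s*['"]([^'"]+)['"]""")
SOURCE_SUFFIXES = {".js", ".jsx", ".ts", ".tsx"}


def find_violations(frontend_dir: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, module) for each forbidden import found."""
    violations = []
    for path in sorted(Path(frontend_dir).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            for module in IMPORT_PATTERN.findall(line):
                if module in FORBIDDEN_IMPORTS:
                    violations.append((str(path), line_no, module))
    return violations


# Example use in a pre-merge check (the directory name is an assumption):
#     problems = find_violations("frontend/src")
#     if problems:
#         raise SystemExit(f"Fitness function failed: {problems}")
```

Run as part of pull request review, a non-empty result fails the check and points at the exact file and line that crossed the boundary.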

Upgrade 6: Checklist Gates

Simple reference to look up later

Atul Gawande, The Checklist Manifesto

WHO Surgical Safety Checklist, Haynes et al., NEJM 2009

What it adds to GVE

GVE should stay conversational, but it still needs short checklists at critical transitions. Checklists are powerful because they prevent obvious mistakes during complex work. The well-known WHO surgical checklist study reported meaningful reductions in complications and deaths after checklist implementation. Quick reference: “A Surgical Safety Checklist to Reduce Morbidity and Mortality in a Global Population, Haynes, NEJM, 2009.”

GVE implementation

Add short checklist gates. These are not long forms. They are 5–8 item safety checks.

Before Coding Checklist

Intent confirmed?

MVP defined?

Assumptions logged?

Premortem completed?

Requirements drafted?

Architecture map created?

State ownership defined?

High-risk items addressed or accepted?

Before Module Build Checklist

Module responsibility clear?

Inputs defined?

Outputs defined?

Dependencies listed?

Failure behavior defined?

Tests planned?

Architecture impact known?

Before Debug Fix Checklist

Expected behavior defined?

Actual behavior captured?

Root cause identified?

Affected modules listed?

Regression risk checked?

Test added?

Debug log updated?

Before Deployment Checklist

Environment variables confirmed?

Secrets protected?

Migrations tested?

Rollback plan defined?

Monitoring active?

Feature flags configured?

Smoke tests passed?

High risks accepted or mitigated?

How it flows with GVE

Checklist gates sit between phases. They should not stop flow unless a major missing piece appears. The AI should say:

“We are ready to code except for one missing item: we have not defined who owns user state. Let’s settle that before I generate the module.”

Why it improves GVE

It protects fast-moving projects from skipping basics.
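
A checklist gate is simple enough to express as a list plus a check for what is still missing. The sketch below uses the Before Coding Checklist items from above; the function names and message wording are hypothetical.

```python
BEFORE_CODING = [
    "Intent confirmed",
    "MVP defined",
    "Assumptions logged",
    "Premortem completed",
    "Requirements drafted",
    "Architecture map created",
    "State ownership defined",
    "High-risk items addressed or accepted",
]


def missing_items(checklist: list[str], completed: set[str]) -> list[str]:
    """Return the checklist items not yet completed, in checklist order."""
    return [item for item in checklist if item not in completed]


def gate_message(missing: list[str]) -> str:
    """Turn the gate result into a short conversational status line."""
    if not missing:
        return "All checklist items are complete. Ready to code."
    return "Not ready to code yet. Missing: " + "; ".join(missing) + "."


# Everything is done except state ownership, as in the example above.
done = set(BEFORE_CODING) - {"State ownership defined"}
print(gate_message(missing_items(BEFORE_CODING, done)))
```

Because the gate only reports what is missing, the conversation can continue on the single open item instead of restarting the whole checklist.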

Upgrade 7: Secure AI Coding Threat Model

Simple reference to look up later

OWASP Top 10 for LLM Applications

NIST AI Risk Management Framework

What it adds to GVE

The original GVE protects the software project. This upgrade protects the development process itself. That matters because AI coding agents may:

Read files

Write files

Run commands

Install dependencies

Access secrets

Call APIs

Modify databases

Interpret untrusted input

Follow malicious instructions embedded in files

OWASP’s LLM Top 10 includes threats such as prompt injection and excessive agency. A 2026 empirical study also examined prompt injection and tool-poisoning attacks across AI-assisted development tools, showing that agentic coding workflows need explicit safeguards.

GVE implementation

Add a Secure AI Coding Threat Model artifact. It should define:

AI permissions

Can the AI read files?

Can it write files?

Can it delete files?

Can it run terminal commands?

Can it install packages?

Can it access environment variables?

Can it access production systems?

Trust boundaries

Which files are trusted?

Which files are user-provided?

Which inputs may contain prompt injection?

Which tool outputs are untrusted?

Approval rules

Require human approval for:

Deleting files

Installing packages

Changing auth/security code

Running migrations

Accessing secrets

Modifying deployment settings

Sending external communications

Touching production data

Audit requirements

Track:

Prompt

AI plan

Tool calls

Files changed

Commands run

Human approvals

Results

How it flows with GVE

This layer runs across the entire workflow. It should especially trigger when:

External files are uploaded

Dependencies are added

Auth code changes

Secrets are involved

Production resources are touched

The AI wants to execute commands

Why it improves GVE

It prevents the AI from becoming an uncontrolled developer with too much power. This is critical if GVE becomes a product or client service.
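
The approval rules and audit requirements above can be combined into a single authorization check. This is a minimal sketch: the action names, the `AuditLog` class, and the deny-by-default handling of unknown actions are assumptions made for illustration, not a prescribed GVE interface.

```python
from dataclasses import dataclass, field
from typing import Optional

# Actions that always need a human approval, following the approval rules above.
APPROVAL_REQUIRED = {
    "delete_file", "install_package", "change_auth_code", "run_migration",
    "access_secret", "modify_deployment", "send_external_message",
    "touch_production_data",
}
# Hypothetical low-risk actions the agent may take on its own.
ALLOWED_WITHOUT_APPROVAL = {"read_file", "write_file", "run_tests"}


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)


def authorize(action: str, target: str, log: AuditLog,
              approved_by: Optional[str] = None) -> bool:
    """Allow an agent action only if policy permits it, and record the decision."""
    if action in ALLOWED_WITHOUT_APPROVAL:
        allowed = True
    elif action in APPROVAL_REQUIRED:
        allowed = approved_by is not None
    else:
        allowed = False  # unknown actions are denied by default
    log.entries.append({
        "action": action,
        "target": target,
        "approved_by": approved_by,
        "allowed": allowed,
    })
    return allowed


log = AuditLog()
print(authorize("read_file", "README.md", log))
print(authorize("run_migration", "0007_add_roles", log))
print(authorize("run_migration", "0007_add_roles", log, approved_by="alice"))
```

Every decision is logged whether or not it was allowed, so the audit trail also shows what the agent attempted.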

Upgrade 8: GVE Readiness Levels

Simple reference to look up later

NASA Technology Readiness Levels

What it adds to GVE

AI often says a feature is “done” when it only means:

“The code was generated.”

NASA’s Technology Readiness Levels, or TRLs, are a 1–9 measurement system used to assess technology maturity, where TRL 1 is lowest and TRL 9 is highest. GVE needs its own readiness scale.

GVE implementation

Create GVE Readiness Levels, or GVE-RL.

GVE-RL 1: Idea

Feature has been mentioned but not clarified.

GVE-RL 2: Intent Confirmed

User, purpose, and success criteria are defined.

GVE-RL 3: Requirements Defined

Requirements, assumptions, and edge cases are documented.

GVE-RL 4: Architecture Designed

Architecture, state flow, dependencies, and ADRs exist.

GVE-RL 5: Built Locally

Code exists and runs in a development environment.

GVE-RL 6: Tested

Unit, integration, edge case, and regression tests exist.

GVE-RL 7: Observable

Logs, metrics, error handling, and monitoring are defined.

GVE-RL 8: Deployable

Deployment plan, rollback plan, feature flags, and checklist are complete.

GVE-RL 9: Production Proven

Feature is live, monitored, stable, and supported by debug/change history.

How it flows with GVE

Every feature and module gets a readiness level. Example:

“The reporting module is GVE-RL 5. It is built locally, but it is not production-ready because tests, observability, and rollback are incomplete.”

Why it improves GVE

It makes progress honest. GVE can stop claiming completion too early.
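
The nine levels form an ordered scale, so an integer enum is one natural representation. In this sketch the enum member names follow the level titles above, while the `status_line` helper and its choice of GVE-RL 8 as the point where a module counts as production-ready are assumptions for illustration.

```python
from enum import IntEnum


class ReadinessLevel(IntEnum):
    IDEA = 1
    INTENT_CONFIRMED = 2
    REQUIREMENTS_DEFINED = 3
    ARCHITECTURE_DESIGNED = 4
    BUILT_LOCALLY = 5
    TESTED = 6
    OBSERVABLE = 7
    DEPLOYABLE = 8
    PRODUCTION_PROVEN = 9


def label(level: ReadinessLevel) -> str:
    """Human-readable title, e.g. BUILT_LOCALLY becomes 'Built Locally'."""
    return level.name.replace("_", " ").title()


def status_line(module: str, level: ReadinessLevel) -> str:
    """Describe a module's level and which levels remain before it is deployable."""
    line = f"{module} is GVE-RL {level.value}: {label(level)}."
    if level < ReadinessLevel.DEPLOYABLE:
        remaining = [
            label(nxt) for nxt in ReadinessLevel
            if level < nxt <= ReadinessLevel.DEPLOYABLE
        ]
        line += " Not production-ready; remaining: " + ", ".join(remaining) + "."
    return line


print(status_line("The reporting module", ReadinessLevel.BUILT_LOCALLY))
```

Because the levels are comparable integers, checks such as "nothing below GVE-RL 8 may be deployed" become one-line comparisons.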

Upgrade 9: Feature Flags and Kill Switches

Simple reference to look up later

Martin Fowler, Feature Toggles

Software Development with Feature Toggles, Mahdavi-Hezaveh, Dremann, Williams

What it adds to GVE

GVE already includes deployment and rollback. Feature flags make deployment safer by allowing features to be enabled, disabled, or rolled out gradually without redeploying everything. Research on feature toggles notes they are widely used for continuous integration and delivery, but improper use can cause complexity, dead code, and even system failure.

GVE implementation

Add a Feature Flag Registry. Each feature flag should include:

Flag name

Feature controlled

Owner

Default state

Rollout audience

Kill-switch behavior

Monitoring condition

Expiration date

Cleanup task

Related premortem risk

Related SLO

Types of flags

Release flag

Used to deploy code without releasing it to everyone.

Experiment flag

Used for A/B testing.

Permission flag

Used to enable features for certain users.

Ops flag

Used to disable risky behavior quickly.

Kill switch

Used to immediately shut off a feature if it causes harm.

How it flows with GVE

Feature flags should be considered when:

Premortem identifies launch risk

Feature is high-impact

Feature uses AI outputs

Feature changes user data

Feature affects billing, auth, reporting, or production workflows

Example

Feature:

AI-generated report summary

Flag:

enable_ai_report_summary

Kill switch:

Disable if hallucination reports exceed threshold or response errors exceed SLO.

Cleanup:

Remove flag after 30 days of stable production use.

Why it improves GVE

It lets GVE ship quickly without betting the whole system on one release.
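
A registry entry with a kill switch can be sketched as a small record with two checks: one that trips the switch and one that flags overdue cleanup. Only a subset of the registry fields is modeled here. The `enable_ai_report_summary` name comes from the example above; the owner, threshold, and dates are hypothetical values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FeatureFlag:
    name: str
    feature: str
    owner: str
    enabled: bool
    kill_switch_threshold: float  # highest tolerated error rate, e.g. 0.02 for 2%
    expires: date                 # date by which the flag should be removed

    def apply_kill_switch(self, error_rate: float) -> bool:
        """Disable the flag if the error rate exceeds the threshold.

        Returns True only when this call tripped the switch.
        """
        if self.enabled and error_rate > self.kill_switch_threshold:
            self.enabled = False
            return True
        return False

    def needs_cleanup(self, today: date) -> bool:
        """A flag at or past its expiration date should be removed from the code."""
        return today >= self.expires


flag = FeatureFlag(
    name="enable_ai_report_summary",
    feature="AI-generated report summary",
    owner="reporting-team",       # hypothetical owner
    enabled=True,
    kill_switch_threshold=0.02,   # hypothetical threshold
    expires=date(2030, 1, 31),    # hypothetical cleanup date
)
print(flag.apply_kill_switch(error_rate=0.05), flag.enabled)
```

Storing the expiration date next to the flag keeps the cleanup task visible, which addresses the dead-code risk noted above.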

Upgrade 10: Cognitive Load Budget

Simple reference to look up later

Team Topologies, Matthew Skelton and Manuel Pais

What it adds to GVE

AI can generate more code than humans can understand. That is a serious problem. A system can work technically but still be too mentally heavy for a team to maintain. Team Topologies emphasizes cognitive load as a key design principle for fast flow and team effectiveness. Quick reference: “Team Topologies, Skelton and Pais.”

GVE implementation

Add a Cognitive Load Budget. Each module should be scored for understandability.

Score factors

Number of responsibilities

Number of dependencies

Number of external services

Number of edge cases

Amount of configuration

Amount of hidden state

Number of concepts needed to understand it

Clarity of documentation

Test coverage

Debug difficulty

Score levels

Low cognitive load

Easy to understand, test, and modify.

Medium cognitive load

Manageable, but needs clear docs and tests.

High cognitive load

Too complex; should be split, simplified, or documented more carefully.

How it flows with GVE

Cognitive load is checked during:

Architecture design

Module planning

Implementation

Debugging

Iteration review

Example

The AI proposes a UserManager module that handles:

Password reset

Role permissions

Billing profile

Email notifications

Audit logs

Session management

GVE should warn:

“This module has high cognitive load. It mixes authentication, authorization, billing, notifications, and auditing. I recommend splitting it into AuthService, UserProfileService, RoleService, and AuditLogService.”

Why it improves GVE

It keeps AI-generated software human-maintainable. This may be one of the most important additions because vibe coding’s biggest hidden risk is not broken code. It is too much code, too fast, with too little shared understanding.
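
The text does not fix a scoring formula, so the sketch below shows only one possible heuristic: rate each factor separately and let the worst factor decide the module's level. The three factors used and every threshold in `THRESHOLDS` are invented for illustration and would need tuning per team.

```python
LEVELS = ("low", "medium", "high")

# (medium_from, high_from) per factor. These numbers are illustrative assumptions.
THRESHOLDS = {
    "responsibilities": (2, 4),
    "dependencies": (4, 8),
    "external_services": (2, 4),
}


def factor_level(name: str, value: int) -> int:
    """Rate one factor as 0 (low), 1 (medium), or 2 (high)."""
    medium_from, high_from = THRESHOLDS[name]
    if value >= high_from:
        return 2
    if value >= medium_from:
        return 1
    return 0


def cognitive_load(factors: dict[str, int]) -> str:
    """Overall level is the worst level among the scored factors."""
    worst = max((factor_level(n, v) for n, v in factors.items()), default=0)
    return LEVELS[worst]


# UserManager from the example handles six responsibilities.
# The dependency and service counts are hypothetical.
user_manager = {"responsibilities": 6, "dependencies": 5, "external_services": 1}
auth_service = {"responsibilities": 1, "dependencies": 2, "external_services": 0}
print(cognitive_load(user_manager), cognitive_load(auth_service))
```

Taking the worst factor instead of an average means a module cannot hide six responsibilities behind a small dependency count.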

The New GVE Phase-by-Phase Workflow

Now we integrate everything into the actual GVE lifecycle.

Phase 1: Conversation Intake

Goal

Capture the raw idea.

AI behavior

The AI listens, summarizes, and reflects.

Outputs

Raw idea summary

Initial intent

Open questions

New architecture elements used

Conversation Layer

Intent Layer

Phase 2: Intent Capture

Goal

Define what is being built and why.

AI questions

What problem are we solving?

Who is this for?

What does success look like?

What is the minimum useful version?

What should this not become?

Outputs

Project Intent Brief

Success criteria

MVP definition

Non-goals

New architecture elements used

Project Intent Brief

GVE-RL starts at level 1 or 2

Phase 3: Assumption Discovery

Goal

Make hidden assumptions visible.

AI questions

What am I assuming from your prompt?

Which assumptions are safe?

Which assumptions could break the project if wrong?

What needs validation?

Outputs

Assumption Ledger

New architecture elements used

Assumption Ledger

Reference

Assumption-Based Planning, RAND

Phase 4: Project Premortem

Goal

Imagine failure before building.

AI question

“Imagine this project failed three months after launch. What probably caused it?”

Outputs

Premortem Register

Top risks

Prevention actions

New architecture elements used

Premortem Register

Risk scoring

Reference

Gary Klein, Performing a Project Premortem

Phase 5: Requirements Definition

Goal

Turn intent, assumptions, and risks into requirements.

AI questions

What should the system do?

What should it never do?

What are the user roles?

What are the edge cases?

What are the failure states?

Outputs

Living Requirements Document

Edge case list

Requirement gap premortem

New architecture elements used

Checklist gate

Assumption-to-requirement conversion

Premortem-to-requirement conversion

Phase 6: Architecture Design

Goal

Design before coding.

AI questions

Where does truth live?

What are the main containers?

What are the major components?

What depends on what?

What could fail under real usage?

Outputs

C4 context diagram

C4 container diagram

C4 component diagram

State flow diagram

Dependency graph

Data ownership map

ADRs

New architecture elements used

C4 model

ADRs

Architecture failure premortem

Cognitive load budget

Security threat model draft

References

Simon Brown, C4 Model

Michael Nygard, ADRs

Phase 7: Reliability and Security Planning

Goal

Define trust, safety, and production behavior before implementation.

AI questions

What user-facing behavior must be reliable?

What should be measured?

What failure rate is acceptable?

What can the AI coding agent access?

What actions require approval?

What data is sensitive?

Outputs

Reliability Contract

SLI/SLO definitions

Error budget

Secure AI Coding Threat Model

Observability draft

New architecture elements used

SLOs

Error budgets

AI threat model

References

Google SRE Book

OWASP Top 10 for LLM Applications

NIST AI RMF

Phase 8: Module Planning

Goal

Prepare each module before coding.

AI questions

What is this module responsible for?

What does it accept?

What does it return?

What errors can it produce?

What depends on it?

What does it depend on?

How could this module fail?

Outputs

Module contract

Module-level premortem

Module tests

Cognitive load score

New architecture elements used

Module Contract Registry

Cognitive Load Budget

GVE-RL update

Phase 9: Controlled Implementation

Goal

Build code in small, traceable increments.

AI rules

No giant code dumps

Reference architecture before coding

Update architecture after meaningful changes

Add tests as code is created

Log decisions

Update readiness level

Outputs

Code

Tests

Updated architecture map

Updated dependency graph

Updated change log

New architecture elements used

Fitness functions

Checklist gates

ADRs if decisions change

Security threat model if permissions/tools are used

Phase 10: Verification

Goal

Prove the system works and still matches the architecture.

AI checks

Do tests pass?

Do fitness functions pass?

Are high-risk premortem items mitigated?

Does the architecture map match the code?

Are assumptions validated?

Is cognitive load acceptable?

Outputs

Test results

Fitness function results

Updated readiness score

Open risk list

New architecture elements used

Architecture fitness functions

Test coverage premortem

Checklist gates

Phase 11: Debugging

Goal

Fix root causes without creating new problems.

AI questions

What was expected?

What actually happened?

Was this risk predicted?

Which assumption failed?

Which module contract was violated?

What side effects could this fix cause?

What regression test should be added?

Outputs

Debug History Log

Fix premortem

Regression test

Architecture update if needed

Readiness downgrade/upgrade if needed

New architecture elements used

Premortem Register linkage

Assumption Ledger linkage

ADR update if design changes

Fitness function update if needed

Phase 12: Deployment Planning

Goal

Ship safely.

AI questions

Where will this run?

What could fail during deployment?

What is the rollback plan?

Do we need feature flags?

What SLOs must be watched?

What smoke tests are required?

Outputs

Deployment plan

Rollback plan

Feature Flag Registry

Launch premortem

Observability plan

Production checklist

New architecture elements used

Feature flags

Kill switches

SLOs

Error budgets

Checklist gates

References

Martin Fowler, Feature Toggles

Google SRE Book

Phase 13: Production Monitoring

Goal

See what is happening after launch.

AI/system checks

Are SLOs being met?

Is the error budget being consumed?

Are logs showing expected behavior?

Are users hitting failure states?

Do feature flags need adjustment?

Did premortem risks become real?

Outputs

Monitoring dashboard

Error budget report

Incident notes

Predicted-vs-actual failure comparison

New architecture elements used

Observability plan

Reliability Contract

Feature Control Registry

Premortem Register
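
The predicted-vs-actual failure comparison listed among the outputs can be as simple as a set comparison between Premortem Register items and observed incidents. This is a sketch under assumptions: the function name `compare_failures` and the risk IDs are hypothetical, and it presumes each incident has already been mapped to a risk ID by hand.

```python
def compare_failures(predicted: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Split risk IDs into hits, misses, and surprises."""
    return {
        "predicted_and_occurred": predicted & observed,
        "predicted_not_occurred": predicted - observed,
        "unpredicted": observed - predicted,  # candidates for new premortem items
    }


report = compare_failures(
    predicted={"stale-data", "crm-api-outage", "slow-load"},  # hypothetical IDs
    observed={"stale-data", "login-loop"},
)
```

The `unpredicted` bucket is the useful one for the next iteration, since it shows where the premortem missed something.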

Phase 14: Iteration and Maintenance

Goal

Improve without losing the thread.

AI questions

Does this change support the original intent?

Does it create scope drift?

Does it break a module contract?

Does it increase cognitive load?

Does it require new ADRs?

Does it change readiness level?

Does the feature flag need cleanup?

Outputs

Iteration Change Log

Updated requirements

Updated architecture

Updated ADRs

Updated readiness levels

Updated cognitive load score

New architecture elements used

Iteration drift premortem

Cognitive load budget

ADRs

Feature flag cleanup

GVE-RL

GVE Risk Scoring System

GVE should use a simple scoring model for assumptions, premortems, and security risks. Each risk gets:

Likelihood

1 = unlikely

2 = possible

3 = likely

Impact

1 = minor

2 = serious

3 = severe

Detectability

1 = easy to detect

2 = somewhat detectable

3 = hard to detect

Risk Score

Risk Score = Likelihood × Impact × Detectability

Maximum score is 27.

Risk levels

1–6: Low

7–14: Medium

15–27: High

High-risk items require one of four actions:

Mitigate

Test

Accept explicitly

Defer with reason

This keeps GVE fast while making risk visible.
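
The scoring model above is small enough to sketch in code. The following Python is an illustration only, not part of GVE itself: the `Risk` class, its field names, and the ratings given to the sample risk are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    likelihood: int      # 1 = unlikely, 2 = possible, 3 = likely
    impact: int          # 1 = minor, 2 = serious, 3 = severe
    detectability: int   # 1 = easy to detect, 3 = hard to detect

    def __post_init__(self):
        # Reject ratings outside the 1-3 scale so scores stay within 1-27.
        for name in ("likelihood", "impact", "detectability"):
            value = getattr(self, name)
            if value not in (1, 2, 3):
                raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")

    @property
    def score(self) -> int:
        """Risk Score = Likelihood x Impact x Detectability."""
        return self.likelihood * self.impact * self.detectability

    @property
    def level(self) -> str:
        if self.score <= 6:
            return "Low"
        if self.score <= 14:
            return "Medium"
        return "High"


stale_data = Risk("Users do not trust stale data", likelihood=3, impact=2, detectability=3)
print(stale_data.score, stale_data.level)  # 18 High
```

Because each rating is 1, 2, or 3, the only possible scores are 1, 2, 3, 4, 6, 8, 9, 12, 18, and 27, so in practice the High band contains just 18 and 27.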

GVE “Definition of Done”

In normal vibe coding, “done” often means:

The AI generated the code.

In GVE, that is not enough. A feature is done only when:

Intent is clear

Assumptions are logged

Premortem risks are handled

Requirements are documented

Architecture map is updated

ADRs exist for major decisions

Module contract is defined

Tests are passing

Fitness functions are passing

Security risks are reviewed

Cognitive load is acceptable

Observability exists where needed

Deployment plan is ready

Feature flag/kill switch exists if needed

Readiness level is appropriate

Change log is updated

Short version:

Done means working, understood, tested, observable, reversible, and traceable.
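
The list above can act as a checklist gate. The sketch below assumes each criterion is tracked as a simple true/false value; the names `DONE_CRITERIA`, `unmet_criteria`, and `is_done` are hypothetical, and conditional items such as "if needed" are treated as satisfied when someone has explicitly marked them true.

```python
DONE_CRITERIA = [
    "Intent is clear",
    "Assumptions are logged",
    "Premortem risks are handled",
    "Requirements are documented",
    "Architecture map is updated",
    "ADRs exist for major decisions",
    "Module contract is defined",
    "Tests are passing",
    "Fitness functions are passing",
    "Security risks are reviewed",
    "Cognitive load is acceptable",
    "Observability exists where needed",
    "Deployment plan is ready",
    "Feature flag/kill switch exists if needed",
    "Readiness level is appropriate",
    "Change log is updated",
]


def unmet_criteria(status: dict[str, bool]) -> list[str]:
    """Return criteria not yet satisfied; missing entries count as unmet."""
    return [c for c in DONE_CRITERIA if not status.get(c, False)]


def is_done(status: dict[str, bool]) -> bool:
    return not unmet_criteria(status)


status = {criterion: True for criterion in DONE_CRITERIA}
status["Tests are passing"] = False
print(unmet_criteria(status))  # ['Tests are passing']
```

Treating a missing entry as unmet is a deliberate choice here: a feature cannot pass the gate simply because nobody recorded a criterion.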

Example: How GVE 2.0 Handles “Build Me a Dashboard”

User says

“Build me a customer dashboard.”

Normal AI response

Starts coding a dashboard UI.

GVE response

“Got it. Before I build, I want to clarify the two things most likely to cause rework: who uses this dashboard, and what decision should it help them make?”

GVE creates

Intent Brief

Customer dashboard for account managers to track client status.

Assumption Ledger

Assumption: data comes from CRM

Assumption: refresh every 5 minutes is acceptable

Assumption: only internal employees access it

Premortem

Possible failure:

Dashboard fails because users do not trust stale data.

Prevention:

Define refresh rate and data timestamp.

Requirements

Show customer list

Show account status

Show last contact

Show open issues

Show data timestamp

C4 Map

User: account manager

Frontend: dashboard web app

Backend: dashboard API

Database: reporting database

External: CRM API

ADR

Decision:

Use polling every 5 minutes instead of WebSockets.

Reason:

User does not require sub-second real-time updates.

SLO

95% of dashboard loads under 2 seconds.
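
An SLO stated this way can be checked directly against measured load times. The following is a minimal sketch, assuming load times are available as a list of seconds; the function name `slo_report`, its parameters, and the sample data are hypothetical. It also reports how much of the error budget (the allowed 5% of slow loads) has been used.

```python
def slo_report(load_times_s, threshold_s=2.0, target=0.95):
    """Summarize compliance with 'target share of loads under threshold_s'."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    total = len(load_times_s)
    if total == 0:
        raise ValueError("need at least one load-time sample")
    fast = sum(1 for t in load_times_s if t < threshold_s)
    compliance = fast / total
    return {
        "compliance": compliance,
        "slo_met": compliance >= target,
        # Share of the allowed slow loads already used (1.0 = budget exhausted).
        "error_budget_used": (1 - compliance) / (1 - target),
    }


samples = [0.4] * 19 + [3.1]   # 19 fast loads, 1 slow load (made-up data)
report = slo_report(samples)
```

With these samples, exactly 95% of loads are under 2 seconds, so the SLO is met but the error budget is fully spent; one more slow load in the same window would breach it.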

Fitness Function

Dashboard frontend cannot call CRM API directly.
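
A rule like this becomes a fitness function once it is checked automatically. The sketch below assumes a Python codebase in which the CRM is reached through a hypothetical `crm_client` package and the backend through a hypothetical `dashboard_api` package; neither name comes from the example itself. It scans source text for imports of the forbidden package.

```python
import ast


def forbidden_imports(source: str, forbidden: set[str]) -> set[str]:
    """Return the forbidden top-level packages imported by the given source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]   # absolute "from x import y" only
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in forbidden:
                found.add(top_level)
    return found


frontend_source = """
from dashboard_api import get_accounts
from crm_client.accounts import fetch_all
"""
violations = forbidden_imports(frontend_source, {"crm_client"})
```

Run over every file in the frontend package as part of the test suite, a non-empty result would fail the build. For a JavaScript or TypeScript frontend the same idea would typically be expressed as a lint rule on import paths instead.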

Feature Flag

enable_customer_dashboard_v1
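
One simple way to make this flag double as a kill switch is to read it from configuration on each request and default to off. The sketch below assumes the flag is stored in an environment variable named after the flag in upper case; the helper names `flag_enabled` and `dashboard_route` and the response strings are hypothetical.

```python
import os
from typing import Mapping, Optional

_TRUE = {"1", "true", "on", "yes"}
_FALSE = {"0", "false", "off", "no"}


def flag_enabled(name: str, env: Optional[Mapping[str, str]] = None,
                 default: bool = False) -> bool:
    """Read a flag; unset or unrecognized values fall back to the default."""
    env = os.environ if env is None else env
    raw = env.get(name.upper())
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    return default


def dashboard_route(env: Optional[Mapping[str, str]] = None) -> str:
    # Kill switch: turning the flag off disables the feature without a redeploy.
    if not flag_enabled("enable_customer_dashboard_v1", env):
        return "404: dashboard not available"
    return "200: customer dashboard v1"
```

Defaulting to off means a missing or mistyped value hides the new feature rather than exposing it, which is the safer failure mode for a launch.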

Cognitive Load Budget

Dashboard module split into:

Dashboard UI

Metrics service

CRM sync service

Account status component

Readiness Level

Starts at GVE-RL 2. Moves to GVE-RL 9 only after deployment, monitoring, and stable usage.

The New GVE Reference List

This is the simple “look it up later” list.

Premortem

Gary Klein, Performing a Project Premortem, Harvard Business Review, 2007

Assumptions

RAND, Assumption-Based Planning, James Dewar

Architecture Decisions

Michael Nygard, Documenting Architecture Decisions

Architecture Diagrams

Simon Brown, C4 Model

Reliability

Google SRE Book, Service Level Objectives

Architecture Enforcement

Neal Ford, Rebecca Parsons, Patrick Kua, Building Evolutionary Architectures

Checklists

Atul Gawande, The Checklist Manifesto

Haynes et al., WHO Surgical Safety Checklist, NEJM 2009

AI Security

OWASP Top 10 for LLM Applications

NIST AI Risk Management Framework

Readiness Levels

NASA, Technology Readiness Levels

Feature Flags

Martin Fowler, Feature Toggles

Mahdavi-Hezaveh, Dremann, Williams, Software Development with Feature Toggles

Cognitive Load

Matthew Skelton and Manuel Pais, Team Topologies

Final GVE 2.0 Summary

GVE started as a way to make vibe coding safer and more structured. With these 10 additions, it becomes a full development architecture. The upgraded system now does five major things:

Clarifies intent, so the AI does not build the wrong thing.

Exposes assumptions, so hidden guesses do not become expensive rewrites.

Predicts failure, so likely problems are handled before they happen.

Preserves memory, so the project does not lose context across iterations.

Enforces readiness, so “generated code” is not mistaken for production software.

The final version of GVE is:

A conversational software engineering system where AI guides the human from idea to production using intent capture, assumption tracking, premortems, living architecture, decision records, modular implementation, tests, security controls, reliability targets, deployment safeguards, and long-term project memory.

Simpler:

GVE keeps the vibe, but makes the engineering real.