Directed Vibe Engineering — DVE V2
Guided Vibe Engineering 2.0: A Full Architecture for Voice-to-Code Software Development With Premortems, Risk Control, Living Architecture, and Real-World Engineering References
- Executive Summary
Guided Vibe Engineering, or GVE, is a structured system for building software with AI through voice-to-code, vibe coding, or conversational development. The original GVE idea was:
Keep the speed and creativity of vibe coding, but add engineering discipline so the project does not collapse into confusion, rework, and technical debt.
This expanded version turns GVE into a fuller software development architecture. It keeps the original core ideas:
Intent capture
AI clarification questions
Living requirements
Architecture maps
Premortems
State flow diagrams
Debug history
Iteration logs
Testing
Deployment and rollback
But now it adds 10 major upgrades based on real-world documented practices:
Assumption Ledger
Architecture Decision Records
C4 Architecture Views
SLOs and Error Budgets
Architecture Fitness Functions
Checklist Gates
Secure AI Coding Threat Model
GVE Readiness Levels
Feature Flags and Kill Switches
Cognitive Load Budget
Together, these make GVE more than “AI that asks better questions.” They turn it into a repeatable operating system for AI-assisted software development.
- The Core Problem GVE Solves
Most vibe coding fails for one simple reason:
The AI starts building before the project is truly understood.
The user gives a rough prompt:
“Build me a dashboard.”
The AI immediately begins generating files, components, routes, database models, and styling. But the AI may not know:
Who the dashboard is for
What decision the dashboard supports
Where the data comes from
How often the data updates
Who can access it
What happens if data is missing
What happens if an API fails
What must be logged
What must be tested
How it will be deployed
What “done” means
This creates the central GVE problem:
AI can generate code faster than humans can clarify intent.
GVE fixes this by forcing the AI to become a guided engineering partner, not just a code generator.
- The New GVE Principle
The original GVE principle was:
Do not let the AI write code until it proves it understands the problem.
The expanded GVE 2.0 principle is:
Do not let the AI write, change, debug, or deploy code unless the intent, assumptions, risks, architecture, tests, and readiness level are visible and traceable.
That sounds heavier, but the user experience should still feel simple. GVE should not feel like:
“Fill out this 40-page requirements form.”
It should feel like:
“Before I build this, let me confirm the two things most likely to cause rework later.”
That is the heart of GVE.
- The Full GVE 2.0 Architecture
The new GVE architecture has 12 layers.
Layer 1: Conversation Layer
The human speaks naturally. The AI listens, summarizes, asks questions, and turns messy language into structured project understanding.
Layer 2: Intent Layer
The AI captures the purpose of the project.
Output:
Project Intent Brief
Target users
Success criteria
MVP definition
Constraints
Layer 3: Assumption Layer
The AI identifies what is being assumed but not yet proven.
Output:
Assumption Ledger
This is based on Assumption-Based Planning, a RAND-developed method that focuses on identifying important assumptions, vulnerabilities, signposts, and hedging or shaping actions.
Quick reference: “Assumption-Based Planning, RAND, Dewar.”
Layer 4: Premortem Layer
The AI imagines the project failed and works backward to identify likely reasons.
Output:
Premortem Register
Risk list
Prevention actions
Quick reference: “Gary Klein, Performing a Project Premortem, Harvard Business Review, 2007.”
Layer 5: Requirements Layer
The AI converts intent, assumptions, and premortem risks into requirements.
Output:
Living Requirements Document
Edge case list
Constraint list
Layer 6: Architecture Layer
The AI designs the system before coding.
Output:
C4 architecture diagrams
State flow diagram
Dependency graph
Data ownership map
Architecture Decision Records
The C4 model uses hierarchical architecture views: system context, containers, components, and code. It was created by Simon Brown as a developer-friendly way to visualize software architecture.
Quick reference: “Simon Brown, C4 Model.”
Layer 7: Decision Memory Layer
The AI records why major architecture choices were made.
Output:
Architecture Decision Records, or ADRs
ADRs are small records of individual architecture decisions and their context, tradeoffs, and consequences. The widely referenced article is Michael Nygard’s “Documenting Architecture Decisions.” Recent research has also explored whether LLMs can help generate ADRs, finding that LLMs can produce relevant architecture decisions but still require human oversight.
Quick reference: “Michael Nygard, Documenting Architecture Decisions.”
Layer 8: Implementation Layer
The AI builds modular code in controlled increments.
Output:
Code modules
Module contracts
Updated architecture map
Updated change log
Layer 9: Verification Layer
The AI checks whether the code still obeys the architecture.
Output:
Tests
Fitness functions
Checklist gates
Readiness score
Architecture fitness functions come from evolutionary architecture thinking. The basic idea is that architecture should have executable checks that verify whether the system is still meeting its design goals.
Quick reference: “Neal Ford, Rebecca Parsons, Patrick Kua, Building Evolutionary Architectures.”
Layer 10: Security and Governance Layer
The AI checks the risks of both the software being built and the AI agent doing the building.
Output:
AI threat model
Permission boundaries
Tool-use approvals
Audit logs
OWASP’s Top 10 for LLM Applications includes risks such as prompt injection, insecure output handling, sensitive information disclosure, excessive agency, and supply chain vulnerabilities. These are directly relevant to AI-assisted coding workflows.
Quick reference: “OWASP Top 10 for LLM Applications” and “NIST AI Risk Management Framework.”
Layer 11: Deployment and Reliability Layer
The AI plans how the software will safely go live.
Output:
Deployment plan
Rollback plan
Feature flag registry
SLOs
Error budgets
Observability plan
Google’s SRE framework defines SLIs, SLOs, and error budgets as a way to measure and manage service reliability around what users actually care about.
Quick reference: “Google SRE Book, Service Level Objectives.”
Layer 12: Memory and Iteration Layer
The AI preserves project history.
Output:
Iteration Change Log
Debug History Log
Premortem Register
Assumption Ledger
ADRs
Readiness history
Predicted-vs-actual failure record
This is what turns GVE from a one-time coding conversation into a long-term development system.
- The Updated GVE Flow
The full flow now looks like this:
Conversation → Intent → Assumptions → Premortem → Requirements → Architecture → Decisions → Implementation → Verification → Security → Deployment → Memory → Iteration
Or in plain English:
Understand what the user wants.
Identify hidden assumptions.
Imagine how the project could fail.
Convert risks into requirements.
Design the system.
Record why decisions were made.
Build in small modules.
Test against behavior and architecture.
Check security and AI-agent risk.
Deploy safely with rollback.
Track everything.
Improve without losing context.
- The Expanded GVE Artifacts
GVE 2.0 should maintain these living artifacts.
- Project Intent Brief: Defines the project’s purpose. Includes:
Problem being solved
Target users
Success criteria
MVP scope
Constraints
Non-goals
- Assumption Ledger: Tracks every important assumption. Includes:
Assumption
Source
Risk if false
Warning sign
Validation method
Status
- Premortem Register: Tracks imagined future failures. Includes:
Imagined failure
Likely cause
Risk score
Prevention action
Test needed
Status
Linked bug if it later happens
- Living Requirements Document: Tracks what the system must do. Includes:
Functional requirements
Non-functional requirements
Edge cases
User roles
Permissions
Failure behavior
- C4 Architecture Map: Shows the system at multiple levels. Includes:
Context diagram
Container diagram
Component diagram
Code-level view when needed
- State Flow Diagram: Shows where data comes from, where it goes, and where truth lives.
- Dependency Graph: Shows how modules depend on each other.
- Architecture Decision Records: Record why major decisions were made.
- Module Contract Registry: Defines what each module does, accepts, returns, and depends on.
- Architecture Fitness Function List: Defines automated architecture checks.
- GVE Checklist Set: Short checklists for phase transitions.
- Secure AI Coding Threat Model: Defines what the AI agent is allowed to do.
- Test Specification: Defines behavioral, integration, regression, and edge-case tests.
- Observability Plan: Defines logs, metrics, traces, alerts, and dashboards.
- Reliability Contract: Defines SLIs, SLOs, and error budgets.
- Feature Flag Registry: Defines flags, kill switches, rollout rules, owners, and cleanup dates.
- GVE Readiness Scorecard: Defines the maturity level of each feature or module.
- Cognitive Load Budget: Tracks whether the system is becoming too hard for humans to understand.
- Debug History Log: Records bugs, root causes, fixes, affected modules, and related premortem risks.
- Iteration Change Log: Tracks what changed, why, and what was affected.
- Upgrade 1: Assumption Ledger
Simple reference to look up later
Assumption-Based Planning, RAND, James Dewar
What it adds to GVE
The Assumption Ledger captures what the AI or human is assuming before those assumptions quietly become architecture. In vibe coding, the AI often fills gaps automatically. That is useful, but dangerous.
Example user prompt:
“Build a login system.”
The AI may assume:
Email/password login
No social login
No MFA
Users self-register
Password reset is needed
Sessions expire after a certain time
Admins exist
User roles exist
But the user may not have said any of that.
GVE implementation
After Intent Capture, the AI creates an Assumption Ledger. Each entry should include these fields (field: purpose):
ID: Unique assumption number
Assumption: What is being assumed
Source: User said it, AI inferred it, default convention, technical constraint
Confidence: Low, medium, high
Risk if false: What breaks if wrong
Warning sign: How we know it may be wrong
Validation question: What to ask or test
Status: Unvalidated, validated, invalidated, accepted
How it flows with GVE
The Assumption Ledger feeds:
Requirements
Premortem
Architecture
Test planning
Debugging
Example
Assumption:
Users only need one role: admin.
Risk if false:
Authorization model may need redesign later.
Validation question:
“Will this system ever need different user permissions, such as admin, manager, employee, or client?”
If the answer is yes, that changes the architecture before code begins.
Why it improves GVE
It stops invisible assumptions from becoming expensive rework.
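A ledger entry with the fields listed above could be sketched as a small data structure. This is a minimal illustration, not a prescribed GVE format: the class names, the `A-001` style IDs, the `open_questions` helper, and the warning-sign wording are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNVALIDATED = "unvalidated"
    VALIDATED = "validated"
    INVALIDATED = "invalidated"
    ACCEPTED = "accepted"


@dataclass
class Assumption:
    id: str
    assumption: str
    source: str       # e.g. "user said it", "AI inferred it"
    confidence: str   # "low", "medium", or "high"
    risk_if_false: str
    warning_sign: str
    validation_question: str
    status: Status = Status.UNVALIDATED


def open_questions(ledger):
    """Return the validation questions for assumptions not yet checked."""
    return [a.validation_question for a in ledger if a.status is Status.UNVALIDATED]


ledger = [
    Assumption(
        id="A-001",  # hypothetical ID scheme
        assumption="Users only need one role: admin.",
        source="AI inferred it",
        confidence="low",
        risk_if_false="Authorization model may need redesign later.",
        warning_sign="User mentions managers, employees, or clients.",  # illustrative
        validation_question="Will this system ever need different user permissions?",
    ),
    Assumption(
        id="A-002",
        assumption="Password reset is needed.",
        source="default convention",
        confidence="high",
        risk_if_false="Unused reset flow adds attack surface.",  # illustrative
        warning_sign="User says accounts are provisioned by an admin.",  # illustrative
        validation_question="Should users be able to reset their own passwords?",
        status=Status.VALIDATED,
    ),
]

for question in open_questions(ledger):
    print(question)
```

Keeping the status as an enum makes it easy for the AI to list only the assumptions that still need a question before code begins.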
- Upgrade 2: Architecture Decision Records
Simple reference to look up later
Michael Nygard, Documenting Architecture Decisions
What it adds to GVE
The living architecture map shows what the system looks like. ADRs explain why it looks that way. Without ADRs, future developers and future AI sessions will see the code but not the reasoning.
GVE implementation
Every meaningful architecture decision gets an ADR. Examples:
Use PostgreSQL instead of MongoDB
Use polling instead of WebSockets
Use server-side validation
Use background jobs for email
Use feature flags for risky launches
Use a monolith for MVP instead of microservices
ADR format
Each ADR should be short. Recommended structure:
Title
Status: proposed, accepted, superseded, deprecated
Context
Decision
Alternatives considered
Consequences
Linked assumptions
Linked premortem risks
Linked files/modules
How it flows with GVE
ADRs are created during:
Architecture design
Major implementation decisions
Debug fixes that change design
Iterations that alter module contracts
Example
Title:
ADR-003: Use polling instead of WebSockets for dashboard MVP
Context:
User wants “real-time” dashboard but accepts 30-second refresh.
Decision:
Use polling every 30 seconds for MVP.
Consequence:
Simpler deployment, lower complexity, but not true real-time.
Linked assumption:
Dashboard users do not require sub-second updates.
Linked premortem risk:
Users may perceive data as stale.
Why it improves GVE
It creates project memory. The AI can later say:
“We chose polling because you prioritized fast MVP delivery over true real-time updates.”
That prevents confusion and accidental reversal.
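The ADR structure above maps naturally onto a small record type that renders to short plain text. The sketch below is one possible shape, not Nygard's template or an official GVE schema. The `ADR` class, its `render` method, and the "accepted" status given to the example are assumptions for illustration.

```python
from dataclasses import dataclass, field

ALLOWED_STATUSES = {"proposed", "accepted", "superseded", "deprecated"}


@dataclass
class ADR:
    number: int
    title: str
    status: str
    context: str
    decision: str
    consequences: str
    alternatives: list = field(default_factory=list)
    linked_assumptions: list = field(default_factory=list)
    linked_risks: list = field(default_factory=list)

    def __post_init__(self):
        # Reject statuses outside the four listed in the recommended structure.
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown ADR status: {self.status}")

    def render(self) -> str:
        """Render the record as a short plain-text block."""
        lines = [
            f"ADR-{self.number:03d}: {self.title}",
            f"Status: {self.status}",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            f"Consequences: {self.consequences}",
        ]
        lines += [f"Alternative considered: {a}" for a in self.alternatives]
        lines += [f"Linked assumption: {a}" for a in self.linked_assumptions]
        lines += [f"Linked premortem risk: {r}" for r in self.linked_risks]
        return "\n".join(lines)


adr = ADR(
    number=3,
    title="Use polling instead of WebSockets for dashboard MVP",
    status="accepted",  # assumed for the example; the text does not state a status
    context="User wants a real-time dashboard but accepts 30-second refresh.",
    decision="Use polling every 30 seconds for MVP.",
    consequences="Simpler deployment, lower complexity, but not true real-time.",
    linked_assumptions=["Dashboard users do not require sub-second updates."],
    linked_risks=["Users may perceive data as stale."],
)
print(adr.render())
```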
- Upgrade 3: C4 Architecture Views
Simple reference to look up later
Simon Brown, C4 Model
What it adds to GVE
The original GVE architecture map is useful, but it needs structure. The C4 model gives GVE a clean diagram system:
Context
Containers
Components
Code
The official C4 model describes itself as a developer-friendly approach using hierarchical abstractions and diagrams, including software systems, containers, components, and code.
GVE implementation
Replace the generic “architecture map” with a GVE-C4 Architecture Map.
Level 1: Context
Shows:
Users
External systems
Business environment
Major data sources
Question:
“Who or what interacts with this system?”
Level 2: Containers
Shows:
Frontend app
Backend API
Database
Worker services
Cache
Queue
Third-party services
Question:
“What deployable or runtime pieces make up this system?”
Level 3: Components
Shows:
Auth component
Dashboard component
Reporting component
Notification component
Billing component
Question:
“What major parts exist inside each container?”
Level 4: Code
Used only when necessary. Shows:
Classes
Functions
Modules
Interfaces
Question:
“What code-level structure matters enough to document?”
How it flows with GVE
C4 views should be updated:
Before implementation
After major module creation
After architecture-changing debug fixes
Before deployment
After major iterations
Why it improves GVE
It lets GVE speak to different audiences:
Investor: context view
Client: container view
Developer: component view
Debugging AI: code view
That keeps the architecture understandable instead of overwhelming.
- Upgrade 4: SLOs and Error Budgets
Simple reference to look up later
Google SRE Book, Service Level Objectives
What it adds to GVE
GVE should not only ask:
“Does it work?”
It should ask:
“How well must it work for users to trust it?”
Google’s SRE guidance emphasizes defining service behavior around what users care about, using SLIs, SLOs, and error budgets.
GVE implementation
Add a Reliability Contract artifact.
Key terms
SLI: Service Level Indicator
What you measure. Examples:
Login success rate
API latency
Dashboard load time
File upload failure rate
AI response timeout rate
SLO: Service Level Objective
The target. Examples:
95% of dashboard loads complete in under 2 seconds
99% of login attempts return a response in under 1 second
File upload failure rate stays below 1%
Error Budget
How much failure is acceptable before development slows down to fix reliability. Example:
If 5% of dashboard loads can be slow, that 5% is the error budget.
How it flows with GVE
SLOs should be defined after architecture and before deployment. They feed:
Test plans
Observability
Rollback triggers
Feature flag rules
Maintenance priorities
Example
Feature:
Client reporting dashboard
SLI:
Dashboard load time
SLO:
95% of dashboard requests load in under 2 seconds
Error budget:
5% may exceed 2 seconds over a 30-day window
Rollback trigger:
If new deployment causes 15% of requests to exceed 2 seconds, roll back or disable the feature.
Why it improves GVE
It turns quality into a measurable agreement. GVE stops saying:
“Looks good.”
And starts saying:
“It meets the agreed reliability target.”
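The dashboard example can be turned into a small calculation. The sketch below assumes the SLI is the fraction of requests slower than 2 seconds, and it reuses the 95% target and 15% rollback trigger from the example. The function name `evaluate_window` and the request counts are made up for illustration.

```python
def evaluate_window(total_requests, slow_requests, slo_target=0.95, rollback_rate=0.15):
    """Compare one measurement window against the SLO and the rollback trigger."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    slow_rate = slow_requests / total_requests
    error_budget = 1.0 - slo_target  # share of requests allowed to be slow
    return {
        "slow_rate": slow_rate,
        "budget_consumed": slow_rate / error_budget,  # 1.0 means the budget is spent
        "slo_met": slow_rate <= error_budget,
        "rollback": slow_rate >= rollback_rate,
    }


# Hypothetical traffic: 10,000 dashboard requests, 300 of them slower than 2 seconds.
healthy = evaluate_window(10_000, 300)
print(healthy)

# Hypothetical bad deployment: 2,000 of 10,000 requests are slow.
degraded = evaluate_window(10_000, 2_000)
print(degraded)
```

In the first window about 60% of the error budget is used and the SLO holds. In the second the slow rate is 20%, which breaks the SLO and crosses the rollback trigger.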
- Upgrade 5: Architecture Fitness Functions
Simple reference to look up later
Building Evolutionary Architectures, Neal Ford, Rebecca Parsons, Patrick Kua
What it adds to GVE
The architecture map is useful, but it can become stale. Fitness functions make architecture enforceable. A fitness function is a test or check that verifies the system is still following its intended design.
GVE implementation
Create an Architecture Fitness Function List. Examples:
Frontend cannot directly access database.
All API routes must require authentication unless marked public.
Every external API call must include timeout handling.
No module may import from deprecated modules.
All database migrations must include rollback notes.
All new endpoints must include tests.
All high-risk modules must include logging.
No secrets may appear in source code.
Business logic should not live inside UI components.
How it flows with GVE
Fitness functions are created from:
ADRs
Premortem risks
Security rules
Architecture map
Module contracts
They run during:
Implementation
Pull request review
Debug fixes
Iteration
Pre-deployment checks
Example
Premortem risk:
AI may create frontend components that bypass backend authorization.
Fitness function:
No frontend component may call protected database functions directly.
Test/check:
Static scan for forbidden imports or database client usage in frontend directories.
Why it improves GVE
It prevents architectural drift. The AI no longer just documents architecture. It actively checks whether the code still obeys it.
- Upgrade 6: Checklist Gates
Simple reference to look up later
Atul Gawande, The Checklist Manifesto
WHO Surgical Safety Checklist, Haynes et al., NEJM 2009
What it adds to GVE
GVE should stay conversational, but it still needs short checklists at critical transitions. Checklists are powerful because they prevent obvious mistakes during complex work. The well-known WHO surgical checklist study reported meaningful reductions in complications and deaths after checklist implementation.
Quick reference: “A Surgical Safety Checklist to Reduce Morbidity and Mortality in a Global Population, Haynes, NEJM, 2009.”
GVE implementation
Add short checklist gates. These are not long forms. They are 5–8 item safety checks.
Before Coding Checklist
Intent confirmed?
MVP defined?
Assumptions logged?
Premortem completed?
Requirements drafted?
Architecture map created?
State ownership defined?
High-risk items addressed or accepted?
Before Module Build Checklist
Module responsibility clear?
Inputs defined?
Outputs defined?
Dependencies listed?
Failure behavior defined?
Tests planned?
Architecture impact known?
Before Debug Fix Checklist
Expected behavior defined?
Actual behavior captured?
Root cause identified?
Affected modules listed?
Regression risk checked?
Test added?
Debug log updated?
Before Deployment Checklist
Environment variables confirmed?
Secrets protected?
Migrations tested?
Rollback plan defined?
Monitoring active?
Feature flags configured?
Smoke tests passed?
High risks accepted or mitigated?
How it flows with GVE
Checklist gates sit between phases. They should not stop flow unless a major missing piece appears. The AI should say:
“We are ready to code except for one missing item: we have not defined who owns user state. Let’s settle that before I generate the module.”
Why it improves GVE
It protects fast-moving projects from skipping basics.
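A checklist gate can be as simple as comparing a fixed list against the items confirmed so far. The sketch below uses the Before Coding Checklist. The helper names `missing_items` and `gate_message` are hypothetical, and the message wording is only one way the AI might phrase the result.

```python
BEFORE_CODING = [
    "Intent confirmed",
    "MVP defined",
    "Assumptions logged",
    "Premortem completed",
    "Requirements drafted",
    "Architecture map created",
    "State ownership defined",
    "High-risk items addressed or accepted",
]


def missing_items(checklist, completed):
    """Return checklist items that are not yet confirmed, in checklist order."""
    return [item for item in checklist if item not in completed]


def gate_message(missing):
    """Phrase the gate result conversationally instead of as a form."""
    if not missing:
        return "We are ready to code."
    return "We are ready to code except for: " + "; ".join(missing) + "."


# Everything is confirmed except state ownership.
completed = set(BEFORE_CODING) - {"State ownership defined"}
missing = missing_items(BEFORE_CODING, completed)
print(gate_message(missing))
```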
- Upgrade 7: Secure AI Coding Threat Model
Simple reference to look up later
OWASP Top 10 for LLM Applications
NIST AI Risk Management Framework
What it adds to GVE
The original GVE protects the software project. This upgrade protects the development process itself. That matters because AI coding agents may:
Read files
Write files
Run commands
Install dependencies
Access secrets
Call APIs
Modify databases
Interpret untrusted input
Follow malicious instructions embedded in files
OWASP’s LLM Top 10 includes threats such as prompt injection and excessive agency. A 2026 empirical study also examined prompt injection and tool-poisoning attacks across AI-assisted development tools, showing that agentic coding workflows need explicit safeguards.
GVE implementation
Add a Secure AI Coding Threat Model artifact. It should define:
AI permissions
Can the AI read files?
Can it write files?
Can it delete files?
Can it run terminal commands?
Can it install packages?
Can it access environment variables?
Can it access production systems?
Trust boundaries
Which files are trusted?
Which files are user-provided?
Which inputs may contain prompt injection?
Which tool outputs are untrusted?
Approval rules
Require human approval for:
Deleting files
Installing packages
Changing auth/security code
Running migrations
Accessing secrets
Modifying deployment settings
Sending external communications
Touching production data
Audit requirements
Track:
Prompt
AI plan
Tool calls
Files changed
Commands run
Human approvals
Results
How it flows with GVE
This layer runs across the entire workflow. It should especially trigger when:
External files are uploaded
Dependencies are added
Auth code changes
Secrets are involved
Production resources are touched
The AI wants to execute commands
Why it improves GVE
It prevents the AI from becoming an uncontrolled developer with too much power. This is critical if GVE becomes a product or client service.
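The approval rules and audit requirements can be combined into one gate that every agent action passes through. The sketch below is a simplified illustration. The action names, the `authorize` function, and the audit entry fields are assumptions, and a real agent runtime would enforce this outside the model rather than in a helper the model can skip.

```python
# Actions that need a human in the loop, named after the approval rules above.
APPROVAL_REQUIRED = frozenset({
    "delete_file",
    "install_package",
    "change_auth_code",
    "run_migration",
    "access_secret",
    "modify_deployment",
    "send_external_message",
    "touch_production_data",
})


def authorize(action, target, approved_by=None, audit_log=None):
    """Allow low-risk actions; require a named approver for listed actions."""
    needs_approval = action in APPROVAL_REQUIRED
    allowed = (not needs_approval) or (approved_by is not None)
    if audit_log is not None:
        # Record every attempt, including the ones that were blocked.
        audit_log.append({
            "action": action,
            "target": target,
            "needs_approval": needs_approval,
            "approved_by": approved_by,
            "allowed": allowed,
        })
    return allowed


audit_log = []
read_ok = authorize("read_file", "src/app.py", audit_log=audit_log)
install_blocked = authorize("install_package", "example-package", audit_log=audit_log)
install_ok = authorize(
    "install_package", "example-package", approved_by="human reviewer", audit_log=audit_log
)
print(read_ok, install_blocked, install_ok)
```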
- Upgrade 8: GVE Readiness Levels
Simple reference to look up later
NASA Technology Readiness Levels
What it adds to GVE
AI often says a feature is “done” when it only means:
“The code was generated.”
NASA’s Technology Readiness Levels, or TRLs, are a 1–9 measurement system used to assess technology maturity, where TRL 1 is lowest and TRL 9 is highest. GVE needs its own readiness scale.
GVE implementation
Create GVE Readiness Levels, or GVE-RL.
GVE-RL 1: Idea
Feature has been mentioned but not clarified.
GVE-RL 2: Intent Confirmed
User, purpose, and success criteria are defined.
GVE-RL 3: Requirements Defined
Requirements, assumptions, and edge cases are documented.
GVE-RL 4: Architecture Designed
Architecture, state flow, dependencies, and ADRs exist.
GVE-RL 5: Built Locally
Code exists and runs in a development environment.
GVE-RL 6: Tested
Unit, integration, edge case, and regression tests exist.
GVE-RL 7: Observable
Logs, metrics, error handling, and monitoring are defined.
GVE-RL 8: Deployable
Deployment plan, rollback plan, feature flags, and checklist are complete.
GVE-RL 9: Production Proven
Feature is live, monitored, stable, and supported by debug/change history.
How it flows with GVE
Every feature and module gets a readiness level. Example:
“The reporting module is GVE-RL 5. It is built locally, but it is not production-ready because tests, observability, and rollback are incomplete.”
Why it improves GVE
It makes progress honest. GVE can stop claiming completion too early.
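Because the levels are ordered, they fit an integer enum that can be compared directly. The sketch below encodes the nine levels. Treating level 8 (Deployable) as the bar for "production-ready" is an assumption drawn from the reporting-module example, and the helper names are hypothetical.

```python
from enum import IntEnum


class ReadinessLevel(IntEnum):
    IDEA = 1
    INTENT_CONFIRMED = 2
    REQUIREMENTS_DEFINED = 3
    ARCHITECTURE_DESIGNED = 4
    BUILT_LOCALLY = 5
    TESTED = 6
    OBSERVABLE = 7
    DEPLOYABLE = 8
    PRODUCTION_PROVEN = 9


def remaining_levels(current):
    """List the level names a feature still has to reach, in order."""
    return [level.name for level in ReadinessLevel if level > current]


def is_production_ready(current):
    # Assumption: a feature may ship once it reaches DEPLOYABLE.
    return current >= ReadinessLevel.DEPLOYABLE


reporting_module = ReadinessLevel.BUILT_LOCALLY
print(f"GVE-RL {int(reporting_module)}: ready={is_production_ready(reporting_module)}")
print(remaining_levels(reporting_module))
```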
- Upgrade 9: Feature Flags and Kill Switches
Simple reference to look up later
Martin Fowler, Feature Toggles
Software Development with Feature Toggles, Mahdavi-Hezaveh, Dremann, Williams
What it adds to GVE
GVE already includes deployment and rollback. Feature flags make deployment safer by allowing features to be enabled, disabled, or rolled out gradually without redeploying everything. Research on feature toggles notes they are widely used for continuous integration and delivery, but improper use can cause complexity, dead code, and even system failure.
GVE implementation
Add a Feature Flag Registry. Each feature flag should include:
Flag name
Feature controlled
Owner
Default state
Rollout audience
Kill-switch behavior
Monitoring condition
Expiration date
Cleanup task
Related premortem risk
Related SLO
Types of flags
Release flag
Used to deploy code without releasing it to everyone.
Experiment flag
Used for A/B testing.
Permission flag
Used to enable features for certain users.
Ops flag
Used to disable risky behavior quickly.
Kill switch
Used to immediately shut off a feature if it causes harm.
How it flows with GVE
Feature flags should be considered when:
Premortem identifies launch risk
Feature is high-impact
Feature uses AI outputs
Feature changes user data
Feature affects billing, auth, reporting, or production workflows
Example
Feature:
AI-generated report summary
Flag:
enable_ai_report_summary
Kill switch:
Disable if hallucination reports exceed threshold or response errors exceed SLO.
Cleanup:
Remove flag after 30 days of stable production use.
Why it improves GVE
It lets GVE ship quickly without betting the whole system on one release.
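A registry entry with a kill switch and an expiration date might be sketched as below. This is an in-memory illustration only. The class names, the 1% error threshold, the owner, and the dates are assumptions, and a production system would normally use a dedicated flag service.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FeatureFlag:
    name: str
    feature: str
    owner: str
    enabled: bool
    max_error_rate: float  # hypothetical kill-switch threshold
    expires: date          # date by which the flag should be cleaned up


class FlagRegistry:
    def __init__(self):
        self._flags = {}

    def register(self, flag):
        self._flags[flag.name] = flag

    def is_enabled(self, name):
        # Unknown flags are treated as off, the safer default.
        flag = self._flags.get(name)
        return flag is not None and flag.enabled

    def report_error_rate(self, name, error_rate):
        """Kill switch: disable the flag once the error rate exceeds its limit."""
        flag = self._flags[name]
        if error_rate > flag.max_error_rate:
            flag.enabled = False
        return flag.enabled

    def expired(self, today):
        """Return names of flags that are due for cleanup."""
        return [f.name for f in self._flags.values() if f.expires <= today]


registry = FlagRegistry()
registry.register(FeatureFlag(
    name="enable_ai_report_summary",
    feature="AI-generated report summary",
    owner="reporting team",      # illustrative owner
    enabled=True,
    max_error_rate=0.01,         # illustrative threshold
    expires=date(2030, 1, 31),   # illustrative cleanup date
))

still_on = registry.report_error_rate("enable_ai_report_summary", 0.005)
after_spike = registry.report_error_rate("enable_ai_report_summary", 0.05)
print(still_on, after_spike, registry.expired(date(2030, 2, 1)))
```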
- Upgrade 10: Cognitive Load Budget
Simple reference to look up later
Team Topologies, Matthew Skelton and Manuel Pais
What it adds to GVE
AI can generate more code than humans can understand. That is a serious problem. A system can work technically but still be too mentally heavy for a team to maintain. Team Topologies emphasizes cognitive load as a key design principle for fast flow and team effectiveness.
Quick reference: “Team Topologies, Skelton and Pais.”
GVE implementation
Add a Cognitive Load Budget. Each module should be scored for understandability.
Score factors
Number of responsibilities
Number of dependencies
Number of external services
Number of edge cases
Amount of configuration
Amount of hidden state
Number of concepts needed to understand it
Clarity of documentation
Test coverage
Debug difficulty
Score levels
Low cognitive load
Easy to understand, test, and modify.
Medium cognitive load
Manageable, but needs clear docs and tests.
High cognitive load
Too complex; should be split, simplified, or documented more carefully.
How it flows with GVE
Cognitive load is checked during:
Architecture design
Module planning
Implementation
Debugging
Iteration review
Example
AI proposes a UserManager module that handles:
Login
Password reset
Role permissions
Billing profile
Email notifications
Audit logs
Session management
GVE should warn:
“This module has high cognitive load. It mixes authentication, authorization, billing, notifications, and auditing. I recommend splitting it into AuthService, UserProfileService, RoleService, and AuditLogService.”
Why it improves GVE
It keeps AI-generated software human-maintainable. This may be one of the most important additions because vibe coding’s biggest hidden risk is not broken code. It is too much code, too fast, with too little shared understanding.
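The text does not define a scoring formula, so the sketch below invents a deliberately crude one to show the shape of the check. It counts three of the listed factors (responsibilities, dependencies, external services) and applies made-up thresholds. The dependency and service lists for the two modules are also illustrative.

```python
def cognitive_load(responsibilities, dependencies, external_services=()):
    """Return (points, level) from a simple count of three score factors.

    The thresholds are placeholders for illustration, not calibrated values.
    """
    points = len(responsibilities) + len(dependencies) + len(external_services)
    if points <= 4:
        level = "low"
    elif points <= 8:
        level = "medium"
    else:
        level = "high"
    return points, level


user_manager = cognitive_load(
    responsibilities=[
        "login", "password reset", "role permissions", "billing profile",
        "email notifications", "audit logs", "session management",
    ],
    dependencies=["database"],
    external_services=["email provider", "billing provider"],
)

auth_service = cognitive_load(
    responsibilities=["login", "password reset", "session management"],
    dependencies=["database"],
)

print("UserManager:", user_manager)
print("AuthService:", auth_service)
```

Under these assumed thresholds the combined UserManager scores high, while a split-out AuthService scores low, which matches the recommendation to split the module.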
- The New GVE Phase-by-Phase Workflow
Now we integrate everything into the actual GVE lifecycle.
Phase 1: Conversation Intake
Goal
Capture the raw idea.
AI behavior
The AI listens, summarizes, and reflects.
Outputs
Raw idea summary
Initial intent
Open questions
New architecture elements used
Conversation Layer
Intent Layer
Phase 2: Intent Capture
Goal
Define what is being built and why.
AI questions
What problem are we solving?
Who is this for?
What does success look like?
What is the minimum useful version?
What should this not become?
Outputs
Project Intent Brief
Success criteria
MVP definition
Non-goals
New architecture elements used
Project Intent Brief
GVE-RL starts at level 1 or 2
Phase 3: Assumption Discovery
Goal
Make hidden assumptions visible.
AI questions
What am I assuming from your prompt?
Which assumptions are safe?
Which assumptions could break the project if wrong?
What needs validation?
Outputs
Assumption Ledger
New architecture elements used
Assumption Ledger
Reference
Assumption-Based Planning, RAND
Phase 4: Project Premortem
Goal
Imagine failure before building.
AI question
“Imagine this project failed three months after launch. What probably caused it?”
Outputs
Premortem Register
Top risks
Prevention actions
New architecture elements used
Premortem Register
Risk scoring
Reference
Gary Klein, Performing a Project Premortem
Phase 5: Requirements Definition
Goal
Turn intent, assumptions, and risks into requirements.
AI questions
What should the system do?
What should it never do?
What are the user roles?
What are the edge cases?
What are the failure states?
Outputs
Living Requirements Document
Edge case list
Requirement gap premortem
New architecture elements used
Checklist gate
Assumption-to-requirement conversion
Premortem-to-requirement conversion
Phase 6: Architecture Design
Goal
Design before coding.
AI questions
Where does truth live?
What are the main containers?
What are the major components?
What depends on what?
What could fail under real usage?
Outputs
C4 context diagram
C4 container diagram
C4 component diagram
State flow diagram
Dependency graph
Data ownership map
ADRs
New architecture elements used
C4 model
ADRs
Architecture failure premortem
Cognitive load budget
Security threat model draft
References
Simon Brown, C4 Model
Michael Nygard, ADRs
Phase 7: Reliability and Security Planning
Goal
Define trust, safety, and production behavior before implementation.
AI questions
What user-facing behavior must be reliable?
What should be measured?
What failure rate is acceptable?
What can the AI coding agent access?
What actions require approval?
What data is sensitive?
Outputs
Reliability Contract
SLI/SLO definitions
Error budget
Secure AI Coding Threat Model
Observability draft
New architecture elements used
SLOs
Error budgets
AI threat model
References
Google SRE Book
OWASP Top 10 for LLM Applications
NIST AI RMF
Phase 8: Module Planning
Goal
Prepare each module before coding.
AI questions
What is this module responsible for?
What does it accept?
What does it return?
What errors can it produce?
What depends on it?
What does it depend on?
How could this module fail?
Outputs
Module contract
Module-level premortem
Module tests
Cognitive load score
New architecture elements used
Module Contract Registry
Cognitive Load Budget
GVE-RL update
Phase 9: Controlled Implementation
Goal
Build code in small, traceable increments.
AI rules
No giant code dumps
Reference architecture before coding
Update architecture after meaningful changes
Add tests as code is created
Log decisions
Update readiness level
Outputs
Code
Tests
Updated architecture map
Updated dependency graph
Updated change log
New architecture elements used
Fitness functions
Checklist gates
ADRs if decisions change
Security threat model if permissions/tools are used
Phase 10: Verification
Goal
Prove the system works and still matches the architecture.
AI checks
Do tests pass?
Do fitness functions pass?
Are high-risk premortem items mitigated?
Does the architecture map match the code?
Are assumptions validated?
Is cognitive load acceptable?
Outputs
Test results
Fitness function results
Updated readiness score
Open risk list
New architecture elements used
Architecture fitness functions
Test coverage premortem
Checklist gates
Phase 11: Debugging
Goal
Fix root causes without creating new problems.
AI questions
What was expected?
What actually happened?
Was this risk predicted?
Which assumption failed?
Which module contract was violated?
What side effects could this fix cause?
What regression test should be added?
Outputs
Debug History Log
Fix premortem
Regression test
Architecture update if needed
Readiness downgrade/upgrade if needed
New architecture elements used
Premortem Register linkage
Assumption Ledger linkage
ADR update if design changes
Fitness function update if needed
Phase 12: Deployment Planning
Goal
Ship safely.
AI questions
Where will this run?
What could fail during deployment?
What is the rollback plan?
Do we need feature flags?
What SLOs must be watched?
What smoke tests are required?
Outputs
Deployment plan
Rollback plan
Feature Flag Registry
Launch premortem
Observability plan
Production checklist
New architecture elements used
Feature flags
Kill switches
SLOs
Error budgets
Checklist gates
References
Martin Fowler, Feature Toggles
Google SRE Book
Phase 13: Production Monitoring
Goal
See what is happening after launch.
AI/system checks
Are SLOs being met?
Is the error budget being consumed?
Are logs showing expected behavior?
Are users hitting failure states?
Do feature flags need adjustment?
Did premortem risks become real?
Outputs
Monitoring dashboard
Error budget report
Incident notes
Predicted-vs-actual failure comparison
New architecture elements used
Observability plan
Reliability Contract
Feature Flag Registry
Premortem Register
Phase 14: Iteration and Maintenance
Goal
Improve without losing the thread.
AI questions
Does this change support the original intent?
Does it create scope drift?
Does it break a module contract?
Does it increase cognitive load?
Does it require new ADRs?
Does it change readiness level?
Does the feature flag need cleanup?
Outputs
Iteration Change Log
Updated requirements
Updated architecture
Updated ADRs
Updated readiness levels
Updated cognitive load score
New architecture elements used
Iteration drift premortem
Cognitive load budget
ADRs
Feature flag cleanup
GVE-RL
- GVE Risk Scoring System
GVE should use a simple scoring model for assumptions, premortems, and security risks. Each risk gets three ratings:
Likelihood
1 = unlikely
2 = possible
3 = likely
Impact
1 = minor
2 = serious
3 = severe
Detectability
1 = easy to detect
2 = somewhat detectable
3 = hard to detect
Risk Score
Risk Score = Likelihood × Impact × Detectability
The minimum score is 1 and the maximum score is 27.
Risk levels
1–6: Low
7–14: Medium
15–27: High
High-risk items require one of four actions:
Mitigate
Test
Accept explicitly
Defer with reason
This keeps GVE fast while making risk visible.
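The scoring model above is small enough to sketch in code. The following is a minimal illustration, not a prescribed GVE implementation: the `Risk` class, the `unresolved_high_risks` helper, and the action names are hypothetical, chosen only to mirror the scales, bands, and four actions described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One scored risk; each factor uses the 1-3 scale defined above."""
    name: str
    likelihood: int
    impact: int
    detectability: int

    def __post_init__(self) -> None:
        # Reject values outside the 1-3 scale so scores stay comparable.
        for label in ("likelihood", "impact", "detectability"):
            value = getattr(self, label)
            if value not in (1, 2, 3):
                raise ValueError(f"{label} must be 1, 2, or 3, got {value!r}")

    @property
    def score(self) -> int:
        # Risk Score = Likelihood x Impact x Detectability (maximum 27).
        return self.likelihood * self.impact * self.detectability

    @property
    def level(self) -> str:
        # Bands from the text: 1-6 Low, 7-14 Medium, 15-27 High.
        if self.score <= 6:
            return "Low"
        if self.score <= 14:
            return "Medium"
        return "High"


# The four allowed responses to a High-risk item (names are assumptions).
HIGH_RISK_ACTIONS = {"mitigate", "test", "accept", "defer"}


def unresolved_high_risks(risks, actions):
    """Return names of High risks that have no recorded action.

    `actions` maps risk name -> chosen action (a hypothetical format).
    """
    return [
        r.name for r in risks
        if r.level == "High" and actions.get(r.name) not in HIGH_RISK_ACTIONS
    ]
```

For example, a risk rated likely (3), severe (3), and somewhat detectable (2) scores 18 and lands in the High band, so it would show up in `unresolved_high_risks` until one of the four actions is recorded for it.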
- GVE “Definition of Done”
In normal vibe coding, “done” often means:
The AI generated the code.
In GVE, that is not enough. A feature is done only when:
Intent is clear
Assumptions are logged
Premortem risks are handled
Requirements are documented
Architecture map is updated
ADRs exist for major decisions
Module contract is defined
Tests are passing
Fitness functions are passing
Security risks are reviewed
Cognitive load is acceptable
Observability exists where needed
Deployment plan is ready
Feature flag/kill switch exists if needed
Readiness level is appropriate
Change log is updated
Short version:
Done means working, understood, tested, observable, reversible, and traceable.
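One way to make this definition enforceable is to treat it as a checklist gate that reports what is still missing. The sketch below is an assumption-laden illustration: the criterion keys are hypothetical names for the sixteen items above, and a value of `True` is taken to mean "satisfied, or explicitly judged not needed" for the conditional items.

```python
# Hypothetical keys, one per item in the Definition of Done list above.
DONE_CRITERIA = [
    "intent_clear",
    "assumptions_logged",
    "premortem_risks_handled",
    "requirements_documented",
    "architecture_map_updated",
    "adrs_exist",
    "module_contract_defined",
    "tests_passing",
    "fitness_functions_passing",
    "security_risks_reviewed",
    "cognitive_load_acceptable",
    "observability_in_place",      # True also covers "not needed here"
    "deployment_plan_ready",
    "feature_flag_in_place",       # True also covers "not needed here"
    "readiness_level_appropriate",
    "change_log_updated",
]


def missing_done_criteria(status):
    """Return the criteria that are absent or False, in checklist order."""
    return [c for c in DONE_CRITERIA if not status.get(c, False)]


def is_done(status):
    """A feature is done only when every criterion is satisfied."""
    return not missing_done_criteria(status)
```

Returning the list of missing items, rather than a bare yes or no, lets the AI tell the user exactly which part of "done" is still open.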
- Example: How GVE 2.0 Handles “Build Me a Dashboard”
User says
“Build me a customer dashboard.”
Normal AI response
Starts coding a dashboard UI.
GVE response
“Got it. Before I build, I want to clarify the two things most likely to cause rework: who uses this dashboard, and what decision should it help them make?”
GVE creates
Intent Brief
Customer dashboard for account managers to track client status.
Assumption Ledger
Assumption: data comes from CRM
Assumption: refresh every 5 minutes is acceptable
Assumption: only internal employees access it
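As a rough illustration, the three assumptions above could be kept as structured ledger entries so that unvalidated ones stay visible. The field names and status values here are assumptions made for this sketch; the ledger format used in a real GVE project may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNVALIDATED = "unvalidated"
    VALIDATED = "validated"
    INVALIDATED = "invalidated"


@dataclass
class Assumption:
    """One ledger entry (hypothetical fields)."""
    text: str
    status: Status = Status.UNVALIDATED
    evidence: str = ""  # how the assumption was confirmed or refuted


# The dashboard example's assumptions, all unvalidated at the start.
ledger = [
    Assumption("Data comes from CRM"),
    Assumption("Refresh every 5 minutes is acceptable"),
    Assumption("Only internal employees access it"),
]


def open_assumptions(entries):
    """Return the text of every assumption nobody has validated yet."""
    return [a.text for a in entries if a.status is Status.UNVALIDATED]
```

Once the user confirms the data source, that entry's status changes to `VALIDATED` with a short evidence note, and `open_assumptions` shrinks accordingly.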
Premortem
Possible failure:
Dashboard fails because users do not trust stale data.
Prevention:
Define refresh rate and data timestamp.
Requirements
Show customer list
Show account status
Show last contact
Show open issues
Show data timestamp
C4 Map
User: account manager
Frontend: dashboard web app
Backend: dashboard API
Database: reporting database
External: CRM API
ADR
Decision:
Use polling every 5 minutes instead of WebSockets.
Reason:
User does not require sub-second real-time updates.
SLO
95% of dashboard loads under 2 seconds.
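To show how this SLO might be checked against real measurements, here is a minimal sketch. It assumes load times are collected in seconds over some window; the function name, the report keys, and the way the error budget is counted (as a number of tolerated slow loads) are illustrative choices, not something the SLO itself prescribes.

```python
def slo_report(load_times_s, threshold_s=2.0, target_percent=95):
    """Summarise load times against a 'target_percent under threshold_s' SLO."""
    total = len(load_times_s)
    if total == 0:
        raise ValueError("need at least one load-time measurement")

    # "Under 2 seconds" is read strictly: a load of exactly 2.0 s is slow.
    good = sum(1 for t in load_times_s if t < threshold_s)
    bad = total - good

    # Error budget: how many slow loads the SLO tolerates in this window.
    allowed_bad = total * (100 - target_percent) / 100

    return {
        "compliance": good / total,
        # Integer comparison avoids floating-point edge cases at the target.
        "slo_met": good * 100 >= target_percent * total,
        "slow_loads": bad,
        "error_budget_remaining": allowed_bad - bad,
    }
```

With 100 measured loads, the budget is 5 slow loads. If 4 loads are slow, the SLO is met with 1 load of budget left; if 10 are slow, the budget is overspent by 5 and the SLO is missed.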
Fitness Function
Dashboard frontend cannot call CRM API directly.
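A fitness function like this can be an automated check that fails the build when the rule is broken. The sketch below is a simple text scan and makes several assumptions: the frontend lives in one directory, direct CRM access is recognisable by a pattern such as a `crm_client` import or a CRM hostname (both hypothetical), and the file suffixes listed are the ones the frontend uses.

```python
import re
from pathlib import Path

# Hypothetical markers of a direct CRM call; adjust to the real project.
FORBIDDEN = re.compile(r"crm_client|crm\.example\.com")

# Hypothetical set of frontend source file types.
FRONTEND_SUFFIXES = (".ts", ".tsx", ".js", ".jsx")


def find_violations(frontend_dir, pattern=FORBIDDEN, suffixes=FRONTEND_SUFFIXES):
    """Return (file path, line number) pairs where frontend code matches
    the forbidden pattern, i.e. appears to reach the CRM directly."""
    violations = []
    for path in sorted(Path(frontend_dir).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                violations.append((str(path), lineno))
    return violations
```

In a test suite this would typically be wrapped in an assertion that `find_violations` returns an empty list for the frontend directory. A text scan is deliberately crude; a dependency-graph check would be stricter, but even this level catches the most common way the rule gets broken.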
Feature Flag
enable_customer_dashboard_v1
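A flag like this can double as a kill switch if the code checks it on every request and defaults to off. The following sketch assumes the flag is read from an environment variable with the upper-cased flag name; that storage choice, the accepted "on" values, and the `render_dashboard` placeholder are all assumptions made for illustration.

```python
import os

# Values treated as "on" (an assumption; pick one convention per project).
TRUTHY = {"1", "true", "on", "yes"}


def flag_enabled(name, env=None, default=False):
    """Return True if the named flag is switched on.

    Reads from `env` (a mapping) or, if omitted, the process environment.
    A missing flag falls back to `default`, which is off.
    """
    env = os.environ if env is None else env
    raw = env.get(name.upper())
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def render_dashboard(env=None):
    """Placeholder entry point showing where the flag gates the feature."""
    if not flag_enabled("enable_customer_dashboard_v1", env):
        # Kill-switch path: the feature is hidden without a redeploy.
        return "dashboard unavailable"
    return "customer dashboard v1"
```

Defaulting to off means a missing or mistyped configuration value hides the feature rather than exposing it, which is usually the safer failure mode for a new release.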
Cognitive Load Budget
Dashboard module split into:
Dashboard UI
Metrics service
CRM sync service
Account status component
Readiness Level
Starts at GVE-RL 2. Moves to GVE-RL 9 only after deployment, monitoring, and stable usage.
The New GVE Reference List
This is a simple list of sources to look up later.
Premortem
Gary Klein, Performing a Project Premortem, Harvard Business Review, 2007
Assumptions
James Dewar, Assumption-Based Planning, RAND
Architecture Decisions
Michael Nygard, Documenting Architecture Decisions
Architecture Diagrams
Simon Brown, C4 Model
Reliability
Google SRE Book, Service Level Objectives
Architecture Enforcement
Neal Ford, Rebecca Parsons, and Patrick Kua, Building Evolutionary Architectures
Checklists
Atul Gawande, The Checklist Manifesto
Haynes et al., WHO Surgical Safety Checklist, NEJM, 2009
AI Security
OWASP Top 10 for LLM Applications
NIST AI Risk Management Framework
Readiness Levels
NASA, Technology Readiness Levels
Feature Flags
Martin Fowler, Feature Toggles
Mahdavi-Hezaveh, Dremann, and Williams, Software Development with Feature Toggles
Cognitive Load
Matthew Skelton and Manuel Pais, Team Topologies
Final GVE 2.0 Summary
GVE started as a way to make vibe coding safer and more structured. With these 10 additions, it becomes a full development architecture. The upgraded system now does five major things:
Clarifies intent, so the AI does not build the wrong thing.
Exposes assumptions, so hidden guesses do not become expensive rewrites.
Predicts failure, so likely problems are handled before they happen.
Preserves memory, so the project does not lose context across iterations.
Enforces readiness, so “generated code” is not mistaken for production software.
The final version of GVE is:
A conversational software engineering system where AI guides the human from idea to production using intent capture, assumption tracking, premortems, living architecture, decision records, modular implementation, tests, security controls, reliability targets, deployment safeguards, and long-term project memory.
Simpler:
GVE keeps the vibe, but makes the engineering real.