The Persistence of Theory: Reevaluating Naur's "Programming as Theory Building" in the Generative AI Era
Introduction
Generative Artificial Intelligence (AI) is fundamentally changing how we build and maintain software. In 1985, computer scientist Peter Naur wrote a famous paper called "Programming as Theory Building." He argued that programming isn't just about writing code or creating documentation. Instead, the real work of programming is building a shared mental model—a "theory"—inside the developers' minds. This mental model is what actually allows a team to understand and manage a complex system.
For decades, this idea was mostly a philosophical talking point. But today's AI coding tools have made Naur's thesis urgently relevant. AI can write thousands of lines of code in minutes. Because of this, we can no longer measure a developer's productivity just by how much code they produce; doing so hides the true health of the software.
This report looks at Naur's theory in the age of generative AI. Evidence shows that his ideas are critical for diagnosing the hidden problems in modern software teams. Fast AI code generation removes the natural friction of programming, replacing obvious technical problems with invisible "cognitive" and "intent" debt.
The Foundations of Theory Building
To understand AI's impact, we first need to understand Naur's concept of a "theory." Drawing on philosopher Gilbert Ryle's work, Naur rejected the idea that programming is just a manufacturing process where specifications are blindly translated into code.
In this context, having a theory means "knowing how" to do something, not just "knowing that" a fact is true. A programmer with a strong theory of a system can explain why it works, justify their decisions, and respond constructively to unexpected changes. They hold a dynamic mental map that links the messy real world to the strict rules of the software. This map includes the system's goals, historical trade-offs, and overarching intent.
Crucially, Naur argued that this theory cannot be fully captured in text. Code and documentation explain what the system does and how, but they cannot fully capture why specific choices were made.
Naur illustrated this by comparing two teams building a compiler:
- Group A built it from scratch and developed a deep mental theory of how it worked.
- Group B was later handed the finished code and full documentation, yet the extensions it proposed worked against the compiler's existing structure because the group lacked the underlying theory.
Naur concluded that a program "dies" when the team holding its theory dissolves. Subsequent developers can only infer how the system works from the surviving text. When this happens, it is often cheaper to rebuild the system from scratch than to try to reconstruct the lost theory.
The Paradox of Code Abundance
AI models have changed how code is produced. AI can now write boilerplate, complex algorithms, and tests, with a success rate of over eighty percent on standard tasks. It can even refactor old code in seconds.
We can understand this shift using Fred Brooks' famous distinction between two types of software complexity:
- Accidental Complexity: The mechanical friction of writing code (e.g., fighting with syntax, fixing typos, and setting up files).
- Essential Complexity: The actual difficulty of solving the real-world problem and designing the right architecture.
AI is the ultimate accidental-complexity killer. It removes the friction of typing. However, the essential complexity remains untouched; deciding what to build is still a human responsibility.
This creates a paradox. Historically, the frustrating friction of writing code manually forced developers to build Naur's "theory" in their heads. Figuring out syntax errors and reading library docs built a localized mental model. When AI removes this friction, the code is produced instantly, but the theory is never built.
The Danger of "Vibecoding"
This frictionless environment has led to "vibecoding". Coined by AI researcher Andrej Karpathy, vibecoding involves chatting with an AI to build software without ever actually reading or understanding the generated code.
While vibecoding works for simple, standard apps, it violates the core principles of theory building. The developer ships code without reviewing the architectural choices. The AI guesses the logic based on statistics, leaving the code completely untethered from the specific business domain. Because the developer doesn't reverse-engineer the AI's logic, the codebase grows in a theoretical vacuum. If the project gets complex—like trying to build an operating system with a flawed data disk strategy—the developer becomes paralyzed and incapable of safely extending the system.
The Triple Debt Model
Because AI bypasses human theory building, we need a new way to measure software health. Historically, we only worried about "technical debt"—the compounding cost of messy code. Since AI can easily fix messy code, researchers like Margaret-Anne Storey propose the "Triple Debt Model". This model tracks three types of debt:
| Debt Type | Where it Lives | Definition | Symptoms | AI Impact |
|---|---|---|---|---|
| Technical Debt | The Code | Poorly written code, bad modularity, and rigid dependencies. | Slow builds, fragile tests, and localized bugs. | GenAI reduces this by automating refactoring and writing tests. |
| Cognitive Debt | The Human Mind | The erosion of shared team understanding. A gap between what the code does and what humans comprehend. | Unpredictable bugs from "safe" changes, slow onboarding, and "cognitive surrender". | GenAI severely accelerates this by writing code faster than humans can learn it. |
| Intent Debt | The Specifications | The loss of the why. Missing goals, constraints, and recorded decision history. | Useless features, ignored metrics, and AI hallucinating bad architecture. | GenAI makes this worse because AI confidently invents intent if humans don't explicitly write it down. |
Cognitive Debt and "Cognitive Surrender"
Cognitive debt is the silent loss of Naur's theory. Every time a developer accepts AI code without fully understanding it, they take out an invisible loan against their future ability to maintain that code.
In the past, taking a shortcut left an obvious mark, like a messy copy-paste from Stack Overflow or an apologetic code comment. Today, AI code looks flawless and perfectly matches the company's style guide. It easily passes automated tests, yet it lacks a human custodian.
This leads to "cognitive surrender"—where developers blindly trust AI outputs without thinking critically. The team only realizes they have cognitive debt when the system breaks in production and they discover nobody possesses the transactive memory needed to fix it.
Intent Debt
If cognitive debt is forgetting how the system works, intent debt is forgetting why it exists. When human programmers write code manually, they hold the business constraints in their heads. In an AI workflow, the AI doesn't know the business constraints unless they are explicitly written down in files like AGENTS.md or Architectural Decision Records.
Without written intent, AI guesses what the constraints should be, leading to plausible but catastrophic assumptions. For example, an AI might "optimize" code by quietly deleting a rare financial safeguard. Because AI confidently fills in answers to questions that humans never documented, the code steadily drifts away from the software's true purpose.
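One way to make such a safeguard harder to delete silently is to record the intent next to the code and back it with a test. The sketch below is purely illustrative: the transfer limit, the `transfer` function, and the `TransferRejected` exception are hypothetical names invented for this example, not taken from any system discussed here.

```python
from decimal import Decimal

# Hypothetical business rule, assumed for illustration only. It triggers rarely,
# so a refactoring pass could easily mistake it for dead code.
MAX_SINGLE_TRANSFER = Decimal("10000.00")


class TransferRejected(Exception):
    """Raised when a transfer violates a documented business constraint."""


def transfer(balance: Decimal, amount: Decimal) -> Decimal:
    """Return the new balance after a transfer, enforcing the safeguards."""
    if amount <= 0:
        raise TransferRejected("amount must be positive")
    # INTENT (hypothetical policy): large transfers require manual review.
    # This check is deliberate; do not remove it as an optimization.
    if amount > MAX_SINGLE_TRANSFER:
        raise TransferRejected("amount exceeds single-transfer limit")
    if amount > balance:
        raise TransferRejected("insufficient funds")
    return balance - amount


def test_large_transfer_is_rejected() -> None:
    """Executable record of intent: fails if the safeguard is deleted."""
    try:
        transfer(Decimal("50000.00"), Decimal("10000.01"))
    except TransferRejected:
        return
    raise AssertionError("safeguard missing: large transfer was accepted")


test_large_transfer_is_rejected()
```

The comment states why the check exists, and the test turns that statement into something a build can enforce, so a human or an AI agent that removes the check gets an immediate failure instead of a silent change in behavior.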
Empirical Evidence: Skill Collapse and Brain Impact
The dangers of cognitive debt are now supported by early empirical research. Studies suggest that over-relying on AI can degrade human learning and brain connectivity.
The Neurological Cost ("Your Brain on ChatGPT")
A study led by MIT Media Lab researcher Nataliya Kosmyna tracked 54 participants writing essays using ChatGPT, Google, or just their brains. Using EEG recordings, the researchers found that the AI-assisted group experienced a fifty-five percent drop in functional neural connectivity.
While AI made the task faster, it bypassed the "desirable difficulties", the mental struggle required to build dense neural pathways. Furthermore, eighty-three percent of the AI users couldn't remember or quote what they had just "written". Even weeks later, when asked to work without AI, their brains still showed lingering weaknesses. The findings suggest that heavy reliance on AI can cause cognitive atrophy and homogenize human thought.
Anthropic's Software Engineering Study
A similar pattern appears in programming. In January 2026, Anthropic tested 52 junior developers using an unfamiliar Python library (Trio) with and without AI.
While the AI group finished the task marginally faster, they did markedly worse on the subsequent comprehension and debugging test, scoring roughly fifty percent (17 percentage points lower than the manual coders). They struggled the most with debugging questions.
The study found that the tool itself wasn't the problem; the way developers interacted with it was:
| Interaction Pattern | How They Used AI | Comprehension Score | Impact on Skill |
|---|---|---|---|
| Blind Delegation | Asked AI to write everything, didn't read the code. | < 40% | Destructive. Creates severe cognitive debt and inability to debug. |
| Progressive Reliance | Started manually but gave up and let AI do the hard parts. | 40% - 60% | Stagnant. Prevents mastery. |
| Comprehension-First | Used AI to generate code, but asked it to explain the logic and typed parts manually. | > 60% | Constructive. Builds theory while leveraging AI speed. |
These results support Naur's thesis: generating text is not the same as generating understanding. Bypassing the struggle of coding appears to keep developers from building the understanding needed to govern systems safely. This threatens to hollow out the pipeline of future senior engineers.
The Impact on Team Dynamics
Software engineering is a deeply socio-technical discipline. Naur noted that a program's theory is a shared mental model distributed across the team. Engineering teams are essentially institutions built to transmit this theory from seniors to juniors.
Generative AI causes a profound degradation of these institutions. Traditionally, juniors learn by struggling, failing, and undergoing rigorous code reviews with senior engineers—this transfers tacit knowledge. When a junior instantly generates a solution with AI, this vital pipeline is short-circuited. The AI acts as a sycophantic partner; it never pushes back on bad ideas or forces the developer to explain their logic. The result is a flood of contextless code entering the system.
The industry impact is highly measurable:
- Code Churn: Code that is written and immediately rewritten nearly doubled, from 3.1% to 5.7%.
- Copy-Pasting: For the first time, the volume of poorly copied-and-pasted code exceeded properly refactored code.
- Stunted Growth: Internal tracking at firms like Infobip showed that junior technical skill growth dropped from +2.50 points pre-AI to just +1.56 points in the AI era.
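The figures in the list above can be put on a common footing with a few lines of arithmetic. The sketch below uses only the numbers quoted there; the helper names are made up for this example.

```python
def ratio(before: float, after: float) -> float:
    """Return the multiplicative change from before to after."""
    return after / before


def percent_change(before: float, after: float) -> float:
    """Return the relative change from before to after, in percent."""
    return (after - before) / before * 100


churn_ratio = ratio(3.1, 5.7)               # code churn, 3.1% -> 5.7%
growth_change = percent_change(2.50, 1.56)  # junior skill growth, +2.50 -> +1.56

print(f"Code churn grew {churn_ratio:.2f}x")                      # Code churn grew 1.84x
print(f"Junior skill growth changed by {growth_change:.1f}%")     # ... by -37.6%
```

So the churn figure is an increase of roughly 1.8 times, and the skill-growth figure is a drop of a little over a third.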
We are witnessing a redefinition of socio-technical systems. We are moving from a "tool-and-operator" model to a "manager-and-team" dynamic. A team's speed is no longer gated by typing, but by the human "verification cost". Validating AI output is exhausting; pull requests with AI code now take 5.3 times longer to review.
Can AI Agents Build the Theory Instead?
If humans are losing the theory, can advanced AI agents hold it for us?
Current agents (like Claude Code) do engage in a temporary form of theory building. When debugging, they use scientific methodology: forming hypotheses, running tests, and adjusting their mental models. They can sometimes rival human capability in isolating bugs.
However, agents suffer from a fatal flaw: they are amnesiacs. Because they operate within ephemeral context windows, they cannot retain their theories between sessions. They must rebuild their understanding from scratch every time they boot up. Even with automated summaries, these are lossy approximations. Written text simply cannot capture tacit knowledge.
Furthermore, when agents write code, they rely on the most plausible generalized abstraction from their training data, rather than the specific abstraction your business needs. If different agents work on different tickets, they will build subtly conflicting localized theories, leading to massive architectural drift and turning the system into an unmaintainable "big ball of mud". Human oversight remains entirely indispensable.
Architecting for Understanding
The generative AI revolution has permanently shifted the value of a human developer. AI has severed the link between conceptual design and syntactic coding. Because generating text is now abundant, cheap, and instantaneous, the true scarce asset is the intent—the human judgment required to govern the machine.
To survive, engineering organizations must deliberately reintroduce friction into their workflows. Thoughtworks' Technology Radar Volume 34 explicitly warns against "Codebase Cognitive Debt," noting that as permission-hungry agents move faster, traditional rigor is more important than ever.
Modern teams must adopt these advanced competencies:
- Intent Ledgers: Prompts must become rigorous specifications. Teams must document exact constraints and rules in files like AGENTS.md before letting AI write code.
- Verification: Engineers must review AI code with formal rigor. Since AI rarely makes syntax errors, the remaining bugs tend to be logical errors or domain violations. Catching them requires strict property-based testing and mutation testing.
- Architectural Fitness: Teams must build rigid guardrails (like strict API schemas and clean architecture) to restrict the AI's creativity and prevent architectural drift.
- Deliberate Theory Building: Organizations must fight invisible cognitive debt by forcing human engagement. This means mandating "AI-free" checkpoints for juniors, enforcing pair programming, and requiring developers to verbally explain AI-generated code to their peers.
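The verification item above names property-based testing and mutation testing. As a rough illustration of how the two fit together, the sketch below hand-rolls both ideas with the standard library rather than using a dedicated tool. The discount function, its mutant, and the property being checked are all hypothetical examples.

```python
import random


def apply_discount(price_cents: int, percent: int) -> int:
    """Discount a price by a whole percentage (0 to 100)."""
    return price_cents - (price_cents * percent) // 100


def apply_discount_mutant(price_cents: int, percent: int) -> int:
    """A hand-written mutant: the subtraction is flipped to an addition."""
    return price_cents + (price_cents * percent) // 100


def holds_for_many_inputs(func, trials: int = 500, seed: int = 0) -> bool:
    """Check the domain property 0 <= result <= price on many inputs."""
    rng = random.Random(seed)
    # Fixed boundary cases first, then randomly generated ones.
    cases = [(0, 0), (100, 0), (100, 50), (100, 100)]
    cases += [(rng.randint(0, 1_000_000), rng.randint(0, 100)) for _ in range(trials)]
    for price, percent in cases:
        result = func(price, percent)
        if not (0 <= result <= price):
            return False
    return True


print(holds_for_many_inputs(apply_discount))         # True
print(holds_for_many_inputs(apply_discount_mutant))  # False: the mutant is caught
```

The property expresses a domain rule (a discount never raises the price and never makes it negative) instead of a handful of expected values. The mutant plays the role that a mutation-testing tool automates: if a deliberately broken variant still passed, the test suite would not be checking what the team thinks it checks.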
As seen in complex projects—like writing an operating system kernel using an AI agent—the human must maintain absolute authority over the architecture. The AI handles the typing, but the unifying theory remains firmly anchored in the human mind.
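One way to keep architectural authority in human hands is to encode a rule as an automated check that every generated change must pass. The sketch below assumes a hypothetical layering rule (a domain layer that must not import certain packages) and uses Python's standard `ast` module to inspect source text without executing it. The package list and the sample module are invented for illustration.

```python
import ast

# Hypothetical layering rule: domain code must not import these packages.
FORBIDDEN_IN_DOMAIN = {"sqlalchemy", "requests", "flask"}


def forbidden_imports(source: str, forbidden: set[str]) -> list[str]:
    """Return the forbidden top-level packages imported by the source text."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in forbidden:
                found.add(top_level)
    return sorted(found)


# Stand-in for a module produced by an AI agent.
generated_module = """
import dataclasses
import requests
from sqlalchemy.orm import Session
"""

print(forbidden_imports(generated_module, FORBIDDEN_IN_DOMAIN))
# ['requests', 'sqlalchemy']
```

A check like this does not replace the human theory of the system, but it writes one piece of that theory down in a form that an agent cannot quietly ignore.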
Conclusion
Generative AI is the biggest disruption to software engineering in decades. Yet Peter Naur's 1985 thesis still holds. A program is not its code; it is the living, shared understanding of how that code maps to the chaotic complexities of the real world.
By eliminating the friction of writing code, AI has exacerbated the human problem of understanding. Skipping the cognitive struggle of implementation leads to a silent accumulation of cognitive and intent debt. Early empirical evidence suggests that blindly delegating work to AI degrades neural connectivity, comprehension, and the skill growth of junior engineers.
To thrive, the industry must cease conflating the automated production of text with the intellectual act of building software. The future isn't in autonomous delegation. It lies in hybrid systems where AI handles the mechanical syntax, while humans fiercely protect the essential complexity of theory building. Only by enforcing rigorous intent and maintaining the friction that fosters human learning can teams harness the extraordinary power of AI without losing control of their systems.
Sources
- Writing as theory-building - Vivian Qu
- Summary of "Programming as Theory Building" - Invent with Python
- Peter Naur, "Programming as Theory Building" (1985) - Computer Sciences User Pages
- Peter Naur's legacy: Mental models in the age of AI coding - Nutrient iOS
- Naur, Ehn, Musashi - Programming as Theory Building - Gwern.net
- Human-AI Collaboration and the Transformation of Software Engineering Work - arXiv
- Programming as theory building is true now more than ever - Isaac Clayton
- Programming as Theory Building, Part II: When Institutions Crumble - cekrem.github.io
- Agentic Coding Compresses Cognitive Effort - Dr. Leif Singer
- AI did not invent cognitive debt. It made it invisible. | Arkadiusz Kondas
- From Technical Debt to Cognitive and Intent Debt - ACM Queue
- From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI - arXiv
- Beyond Technical Debt: Architecting for Cognitive and Intent Clarity in the Age of AI-Generated Code | by Jusuf Topic | Jun, 2026 | Medium
- Peter Naur – Programming as Theory Building (1985) [pdf] - Hacker News
- Ian Cooper - Staccato Signals - Write.as
- Programming as Theory Building (1985) [pdf] - Hacker News
- Generative AI and Empirical Software Engineering: A Paradigm Shift - arXiv
- I built a Claude Code skill from Naur's Programming as Theory Building - Reddit
- Programmers Were Never Paid to Program. Even Before AI
- Read Programming as Theory Building | Hacker News
- The Theory of a Program | Creating.Software
- Programming (with AI agents) as theory building - sean goedecke
- Coding Is Dead, Long Live Programming - Ian Cooper - Staccato Signals
- Programming as theory building (1985) | Peter Naur | 276 Citations - SciSpace
- Responsible AI pair programming with GitHub Copilot
- "Can Simplicity Scale?": an examination of clean-and-simple vs. "real world" : r/programming
- Thoughts on Out of the Tar Pit - A Drop In Calm
- The Standard Model: Fundamental Forces of Software Engineering
- Review the Design, Not the Code
- My stance on generative models
- Vibecoders can't build for longevity - LessWrong
- Good Vibrations? A Qualitative Study of Co-Creation, Communication, Flow, and Trust in Vibe Coding - arXiv
- I Vibe Coded an Entire Operating System From Scratch — Here's What I Learned
- [2603.22106] From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI - arXiv
- The Intent Debt - AddyOsmani.com
- Let's read Peter Naur's “Programming as Theory Building” | by Maria Rey | Medium
- Cognitive Debt: The code nobody understands - VirtusLab
- AI coding creates two kinds of debt. You're only measuring one - LeadDev
- The accounting system is being built. It still will not work. | Arkadiusz Kondas
- KV the Apostate - ACM Queue
- Generative AI: Cognitive Debt, Dependency, and Critical Thinking | Observatory
- From Technical Debt to Cognitive and Intent Debt - ResearchGate
- Your Brain on ChatGPT with Nataliya Kosmyna - StarTalk Special Edition
- Your Brain On ChatGPT: Everything Educators Need To Know About MIT's AI Study
- Your brain on ChatGPT - PMC - NIH
- How cognitive debt is messing human minds because of ai apps like chatgpt and gemini? : r/cognitivescience - Reddit
- AI assistance speeds coding but may blunt learning, Anthropic study finds - YourStory.com
- The Developer Identity Crisis - When AI Split Programmers Into Two Tribes | Devmystify
- Give AI Something Worth Amplifying: Three Priorities for Technical Leaders
- Paying the Cognitive Debt: An Experiential Learning Framework for Integrating AI in Social Work Education - MDPI
- Heidi Seibold's research works | Institute for Globally Distributed Open Research and Education and other places - ResearchGate
- Climbing the Generative AI Mountain - ACM Queue
- More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems - arXiv
- Hi Reddit! We're Nimisha Asthagiri and Alessio Ferri from Thoughtworks. We're here to host an AMA about the Thoughtworks Technology Radar Vol.34 and to explore what it tells us about software engineering in 2026. Join us on May 13 and share your questions now.
- Programming as Theory Building [pdf] | Hacker News
- As AI Accelerates Software Complexity, Thoughtworks Technology Radar Urges a Return to Engineering Fundamentals to Combat Cognitive Debt - PR Newswire
- Thoughtworks Technology Radar
- The SPACE of AI - ACM Queue
Researched with Google Gemini Deep Research, prompted and edited by Giorgio Polvara.