The Persistence of Theory: Reevaluating Naur's "Programming as Theory Building" in the Generative AI Era

Jun 19, 2026 · 11 min read

Introduction

Generative Artificial Intelligence (AI) is fundamentally changing how we build and maintain software. In 1985, computer scientist Peter Naur wrote a famous paper called "Programming as Theory Building." He argued that programming isn't just about writing code or creating documentation. Instead, the real work of programming is building a shared mental model—a "theory"—inside the developers' minds. This mental model is what actually allows a team to understand and manage a complex system.

For decades, this idea was mostly a philosophical talking point. But today's AI coding tools have made Naur's thesis urgently relevant. AI can write thousands of lines of code in minutes. Because of this, we can no longer measure a developer's productivity just by how much code they produce; doing so hides the true health of the software.

This report looks at Naur's theory in the age of generative AI. Evidence shows that his ideas are critical for diagnosing the hidden problems in modern software teams. Fast AI code generation removes the natural friction of programming, replacing obvious technical problems with invisible "cognitive" and "intent" debt.

The Foundations of Theory Building

To understand AI's impact, we first need to understand Naur's concept of a "theory." Drawing on philosopher Gilbert Ryle's work, Naur rejected the idea that programming is just a manufacturing process where specifications are blindly translated into code.

In this context, having a theory means "knowing how" to do something, not just "knowing that" a fact is true. A programmer with a strong theory of a system can explain why it works, justify their decisions, and respond constructively to unexpected changes. They hold a dynamic mental map that links the messy real world to the strict rules of the software. This map includes the system's goals, historical trade-offs, and overarching intent.

Crucially, Naur pointed out that this theory is always lost when translated into text. Code and documentation explain what the system does and how, but they cannot fully capture why specific choices were made.

Naur illustrated this by comparing two teams building a compiler:

Group A built it from scratch and developed a deep mental theory of how it worked.
Group B was later handed the finished code and perfect documentation, but failed to extend the system safely because they lacked the underlying theory.

Naur concluded that a program "dies" the moment the original team dissolves. Subsequent developers can only guess how the system works by reading the surviving text. When this happens, it is often cheaper to rebuild the system from scratch than to try and guess the lost theory.

The Paradox of Code Abundance

AI models have changed how code is produced. They can now write boilerplate, complex algorithms, and tests, with a success rate above eighty percent on standard tasks, and they can refactor old code in seconds.

We can understand this shift using Fred Brooks' famous distinction between two types of software complexity:

Accidental Complexity: The mechanical friction of writing code (e.g., fighting with syntax, fixing typos, and setting up files).
Essential Complexity: The actual difficulty of solving the real-world problem and designing the right architecture.

AI is the ultimate accidental-complexity killer. It removes the friction of typing. However, the essential complexity remains untouched; deciding what to build is still a human responsibility.

This creates a paradox. Historically, the frustrating friction of writing code manually forced developers to build Naur's "theory" in their heads. Figuring out syntax errors and reading library docs built a localized mental model. When AI removes this friction, the code is produced instantly, but the theory is never built.

The Danger of "Vibecoding"

This frictionless environment has led to "vibecoding". Coined by AI researcher Andrej Karpathy, vibecoding involves chatting with an AI to build software without ever actually reading or understanding the generated code.

While vibecoding works for simple, standard apps, it violates the core principles of theory building. The developer ships code without reviewing the architectural choices. The AI guesses the logic based on statistics, leaving the code completely untethered from the specific business domain. Because the developer doesn't reverse-engineer the AI's logic, the codebase grows in a theoretical vacuum. If the project gets complex—like trying to build an operating system with a flawed data disk strategy—the developer becomes paralyzed and incapable of safely extending the system.

The Triple Debt Model

Because AI bypasses human theory building, we need a new way to measure software health. Historically, we only worried about "technical debt"—the compounding cost of messy code. Since AI can easily fix messy code, researchers like Margaret-Anne Storey propose the "Triple Debt Model". This model tracks three types of debt:

Debt Type	Where it Lives	Definition	Symptoms	AI Impact
Technical Debt	The Code	Poorly written code, bad modularity, and rigid dependencies.	Slow builds, fragile tests, and localized bugs.	GenAI reduces this by automating refactoring and writing tests.
Cognitive Debt	The Human Mind	The erosion of shared team understanding. A gap between what the code does and what humans comprehend.	Unpredictable bugs from "safe" changes, slow onboarding, and "cognitive surrender".	GenAI severely accelerates this by writing code faster than humans can learn it.
Intent Debt	The Specifications	The loss of the why. Missing goals, constraints, and recorded decision history.	Useless features, ignored metrics, and AI hallucinating bad architecture.	GenAI makes this worse because AI confidently invents intent if humans don't explicitly write it down.

Cognitive Debt and "Cognitive Surrender"

Cognitive debt is the silent loss of Naur's theory. Every time a developer accepts AI code without fully understanding it, they take out an invisible loan against their future ability to maintain that code.

In the past, taking a shortcut left an obvious mark, like a messy copy-paste from Stack Overflow or an apologetic code comment. Today, AI code looks flawless and perfectly matches the company's style guide. It easily passes automated tests, yet it lacks a human custodian.

This leads to "cognitive surrender"—where developers blindly trust AI outputs without thinking critically. The team only realizes they have cognitive debt when the system breaks in production and they discover nobody possesses the transactive memory needed to fix it.

Intent Debt

If cognitive debt is forgetting how the system works, intent debt is forgetting why it exists. When human programmers write code manually, they hold the business constraints in their heads. In an AI workflow, the AI doesn't know the business constraints unless they are explicitly written down in files like AGENTS.md or Architectural Decision Records.

Without written intent, AI guesses what the constraints should be, leading to plausible but catastrophic assumptions. For example, an AI might "optimize" code by quietly deleting a rare financial safeguard. Because AI readily supplies its own answers to questions humans never documented, the codebase steadily drifts away from the software's true purpose.
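One way to make such a safeguard harder to delete by accident is to record its intent next to the code and pin it with a test. The sketch below is illustrative only: the function name, the limit value, and the decision record ID "ADR-0007" are all hypothetical and do not come from any real system.

```python
from decimal import Decimal

# Hypothetical limit, invented for illustration only.
DAILY_WITHDRAWAL_LIMIT = Decimal("10000")


def approve_withdrawal(amount: Decimal, withdrawn_today: Decimal) -> bool:
    """Approve a withdrawal unless it breaches the daily limit.

    Intent (hypothetical decision record ADR-0007): the limit check is a
    fraud safeguard. It fires rarely, so it can look like dead code, but
    it must not be removed as an "optimization".
    """
    if amount <= 0:
        return False
    # Rare branch: only triggers when a customer nears the daily limit.
    if withdrawn_today + amount > DAILY_WITHDRAWAL_LIMIT:
        return False
    return True


def test_daily_limit_is_enforced() -> None:
    """Pins the intent: deleting the safeguard makes this test fail."""
    assert approve_withdrawal(Decimal("500"), Decimal("0"))
    assert not approve_withdrawal(Decimal("500"), Decimal("9800"))
    # Exactly reaching the limit is still allowed.
    assert approve_withdrawal(Decimal("200"), Decimal("9800"))


test_daily_limit_is_enforced()
```

The docstring tells a human or an agent why the branch exists, and the test turns that written intent into something a pipeline can enforce.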

Empirical Evidence: Skill Collapse and Brain Impact

The dangers of cognitive debt now have empirical support. Early studies suggest that over-relying on AI can degrade human learning and brain connectivity.

The Neurological Cost ("Your Brain on ChatGPT")

A study led by MIT Media Lab researcher Nataliya Kosmyna tracked 54 participants writing essays using ChatGPT, Google, or just their brains. Using EEG recordings, the researchers found that the AI-assisted group experienced a fifty-five percent drop in functional neural connectivity.

While AI made the task faster, it bypassed the "desirable difficulties"—the mental struggle required to build dense neural pathways. Furthermore, eighty-three percent of the AI users couldn't remember or quote what they had just "written". Even weeks later, when asked to work without AI, their brains still showed lingering weaknesses. Relying on AI causes cognitive atrophy and homogenizes human thought.

Anthropic's Software Engineering Study

This neurological reality maps perfectly to programming. In January 2026, Anthropic tested 52 junior developers using an unfamiliar Python library (Trio) with and without AI.

While the AI group finished the task marginally faster, they failed the subsequent comprehension and debugging test, scoring roughly fifty percent (17 percentage points lower than the manual coders). They struggled the most with debugging questions.

The study found that the tool itself wasn't the problem; the way developers interacted with it was:

Interaction Pattern	How They Used AI	Comprehension Score	Impact on Skill
Blind Delegation	Asked AI to write everything, didn't read the code.	< 40%	Destructive. Creates severe cognitive debt and inability to debug.
Progressive Reliance	Started manually but gave up and let AI do the hard parts.	40% - 60%	Stagnant. Prevents mastery.
Comprehension-First	Used AI to generate code, but asked it to explain the logic and typed parts manually.	> 60%	Constructive. Builds theory while leveraging AI speed.

This supports Naur's thesis: generating text is not the same as generating understanding. Bypassing the struggle of coding prevents developers from building the neural pathways needed to govern systems safely. This threatens to hollow out the pipeline of future senior engineers.

The Impact on Team Dynamics

Software engineering is a deeply socio-technical discipline. Naur noted that a program's theory is a shared mental model distributed across the team. Engineering teams are essentially institutions built to transmit this theory from seniors to juniors.

Generative AI causes a profound degradation of these institutions. Traditionally, juniors learn by struggling, failing, and undergoing rigorous code reviews with senior engineers—this transfers tacit knowledge. When a junior instantly generates a solution with AI, this vital pipeline is short-circuited. The AI acts as a sycophantic partner; it never pushes back on bad ideas or forces the developer to explain their logic. The result is a flood of contextless code entering the system.

The industry impact is highly measurable:

Code Churn: Code that is written and immediately rewritten nearly doubled, from 3.1% to 5.7%.
Copy-Pasting: For the first time, the volume of poorly copied-and-pasted code exceeded properly refactored code.
Stunted Growth: Internal tracking at firms like Infobip showed that junior technical skill growth dropped from +2.50 points pre-AI to just +1.56 points in the AI era.

We are witnessing a redefinition of socio-technical systems. We are moving from a "tool-and-operator" model to a "manager-and-team" dynamic. A team's speed is no longer gated by typing, but by the human "verification cost". Validating AI output is exhausting; pull requests with AI code now take 5.3 times longer to review.

Can AI Agents Build the Theory Instead?

If humans are losing the theory, can advanced AI agents hold it for us?

Current agents (like Claude Code) do engage in a temporary form of theory building. When debugging, they use scientific methodology: forming hypotheses, running tests, and adjusting their mental models. They can sometimes rival human capability in isolating bugs.

However, agents suffer from a fatal flaw: they are amnesiacs. Because they operate within ephemeral context windows, they cannot retain their theories between sessions. They must rebuild their understanding from scratch every time they boot up. Even with automated summaries, these are lossy approximations. Written text simply cannot capture tacit knowledge.

Furthermore, when agents write code, they rely on the most plausible generalized abstraction from their training data, rather than the specific abstraction your business needs. If different agents work on different tickets, they will build subtly conflicting localized theories, leading to massive architectural drift and turning the system into an unmaintainable "big ball of mud". Human oversight remains entirely indispensable.

Architecting for Understanding

The generative AI revolution has permanently shifted the value of a human developer. AI has severed the link between conceptual design and syntactic coding. Because generating text is now abundant, cheap, and instantaneous, the true scarce asset is the intent—the human judgment required to govern the machine.

To survive, engineering organizations must actively reintroduce friction into their workflows. Thoughtworks' Technology Radar Volume 34 explicitly warns against "Codebase Cognitive Debt," noting that as permission-hungry agents move faster, traditional rigor is more important than ever.

Modern teams must adopt these advanced competencies:

Intent Ledgers: Prompts must become rigorous specifications. Teams must document exact constraints and rules in files like AGENTS.md before letting AI write code.
Verification: Engineers must verify AI code rigorously rather than skim it. Since AI rarely makes syntax errors, the remaining bugs tend to be logic errors or domain violations. Catching them requires strict property-based testing and mutation testing.
Architectural Fitness: Teams must build rigid guardrails (like strict API schemas and clean architecture) to restrict the AI's creativity and prevent architectural drift.
Deliberate Theory Building: Organizations must fight invisible cognitive debt by forcing human engagement. This means mandating "AI-free" checkpoints for juniors, enforcing pair programming, and requiring developers to verbally explain AI-generated code to their peers.
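The verification item above can be sketched in a few lines. The example below is a hand-rolled illustration using only the standard library, not a real property-based or mutation testing framework. The installment-splitting function and its "mutant" are hypothetical, chosen because a rounding bug is a typical domain violation that reads cleanly and passes casual review.

```python
import random


def split_into_installments(total_cents: int, count: int) -> list[int]:
    """Split an amount into `count` installments that sum exactly to the total."""
    base, remainder = divmod(total_cents, count)
    # The first `remainder` installments carry one extra cent.
    return [base + 1 if i < remainder else base for i in range(count)]


def mutant_split(total_cents: int, count: int) -> list[int]:
    """A hand-written mutant: plausible-looking, but drops the leftover cents."""
    return [total_cents // count] * count


def holds_for_random_inputs(split, trials: int = 500, seed: int = 0) -> bool:
    """Check two domain properties on randomly generated inputs."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        total = rng.randint(0, 1_000_000)
        count = rng.randint(1, 36)
        parts = split(total, count)
        if sum(parts) != total:  # property 1: no money is created or lost
            return False
        if max(parts) - min(parts) > 1:  # property 2: installments stay even
            return False
    return True


print(holds_for_random_inputs(split_into_installments))  # True
print(holds_for_random_inputs(mutant_split))  # False
```

The properties encode what the business requires rather than what one example happens to return. If a deliberately broken variant still passed, that would signal the tests are too weak to guard the intent.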

As seen in complex projects—like writing an operating system kernel using an AI agent—the human must maintain absolute authority over the architecture. The AI handles the typing, but the unifying theory remains firmly anchored in the human mind.
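One way to make that architectural authority enforceable is an automated fitness check that rejects code crossing a layer boundary. The following is a minimal sketch under stated assumptions: the layer names ("domain", "infrastructure", "web") and the rule itself are hypothetical, and a real project would run such a check over files in its build pipeline rather than over inline strings.

```python
import ast

# Hypothetical layering rule: code in "domain" may not import these packages.
FORBIDDEN_FOR_DOMAIN = {"infrastructure", "web"}


def forbidden_imports(source: str, forbidden: set[str]) -> list[str]:
    """Return the forbidden top-level packages imported by `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in forbidden:
                found.append(top_level)
    return sorted(set(found))


# Hypothetical module contents, as an agent might generate them.
clean_module = "from decimal import Decimal\nimport domain.pricing\n"
drifting_module = "import infrastructure.db\nfrom web.handlers import render\n"

print(forbidden_imports(clean_module, FORBIDDEN_FOR_DOMAIN))     # []
print(forbidden_imports(drifting_module, FORBIDDEN_FOR_DOMAIN))  # ['infrastructure', 'web']
```

The source is only parsed, never executed, so the check can run on generated code before anyone trusts it. The rule lives in a file that humans own, which keeps the architectural decision out of the agent's hands.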

Conclusion

Generative AI is the biggest disruption to software engineering in decades. Yet, Peter Naur's 1985 thesis remains an immutable, enduring law. A program is not its code; it is the living, shared understanding of how that code maps to the chaotic complexities of the real world.

By eliminating the friction of writing code, AI has severely exacerbated the human problem of understanding. Skipping the cognitive struggle of implementation directly causes a silent, catastrophic accumulation of cognitive and intent debt. The empirical evidence to date indicates that blindly delegating work to AI degrades neural connectivity, comprehension, and the skill growth of junior engineers.

To thrive, the industry must cease conflating the automated production of text with the intellectual act of building software. The future isn't in autonomous delegation. It lies in hybrid systems where AI handles the mechanical syntax, while humans fiercely protect the essential complexity of theory building. Only by enforcing rigorous intent and maintaining the friction that fosters human learning can teams harness the extraordinary power of AI without losing control of their systems.

Sources

Researched with Google Gemini Deep Research, prompted and edited by Giorgio Polvara.