The Paradox of Technological Deflation: Historical Cost Trajectories and the Economic Scaling of Generative Artificial Intelligence

Mar 24, 2026 · 14 min read

AI
Economics

The historical progression of technological advancement is frequently characterized by a singular, seductive narrative: that through the relentless application of human ingenuity and industrial scale, capability inevitably becomes cheaper over time. This observation, grounded in the exponential growth of the semiconductor industry and the rapid democratization of digital communication, has shaped the strategic expectations of modern economies. However, an exhaustive analysis of cross-sector cost curves reveals that this deflationary path is far from universal. While digital logic has followed the predictable descent of Moore’s Law, other critical technologies—ranging from nuclear energy and pharmaceutical discovery to heavy transportation infrastructure—have exhibited "negative learning," where costs escalate despite technical maturation. As generative artificial intelligence (AI) transitions from an experimental novelty to the foundational architecture of global productivity, it stands at a precarious intersection of these two conflicting economic realities. By 2026, the industry faces an "inference iceberg," where the precipitous decline in per-token pricing is counteracted by a structural surge in physical resource demands, regulatory overhead, and the diminishing marginal returns of data acquisition.

The Theoretical Foundations of Technological Cost Reduction

To evaluate whether generative AI will follow the historical pattern of technology becoming cheaper, one must first deconstruct the mechanisms that drive deflation in other sectors. The primary drivers of cost reduction are typically categorized into two distinct but related empirical frameworks: Moore’s Law and Wright’s Law.

Moore’s Law and the Experience Curve in Digital Logic

Moore’s Law is the observation that the number of transistors on an integrated circuit (IC) doubles approximately every two years. Gordon Moore first articulated it in 1965 as an annual doubling, based on a log-linear relationship between device complexity and time, and revised the cadence to roughly two years in 1975. The observation is fundamentally an "experience curve" effect, quantifying efficiency gains from learned experience in production. Between 1960 and 1975, Moore calculated that components per chip increased by a factor of 65,000, driven by a combination of shrinking transistor dimensions, increased chip area, and "cleverness" in device and circuit design.

The economic implication of Moore’s Law was profound: as density increased, the cost per function declined as the inverse of the number of devices per chip, until limited by manufacturing yields. By the mid-1970s, David House deduced that computer chip performance would double every 18 months, not only increasing capability but also improving energy efficiency. However, this trend began to deviate from its historical cadence around 2010 due to escalating technical challenges and the end of Dennard scaling, which had previously ensured that power consumption per unit area remained constant as transistors shrank.
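The doubling arithmetic behind these figures can be sketched in a few lines. This is a minimal illustration, not a model from any cited source: the function names are invented here, and cost per function is assumed to be exactly the inverse of device count, ignoring the yield limits mentioned above.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` given a fixed doubling period."""
    return 2.0 ** (years / doubling_period_years)


def relative_cost_per_function(years: float, doubling_period_years: float) -> float:
    """Cost per function relative to year zero.

    Assumes cost scales as 1 / (devices per chip) and that manufacturing
    yield never becomes the binding limit (a simplification).
    """
    return 1.0 / growth_factor(years, doubling_period_years)


if __name__ == "__main__":
    # Sixteen annual doublings give 2**16 = 65536, the same order of
    # magnitude as the ~65,000x increase quoted for the early period.
    print(growth_factor(16, 1))
    # A two-year cadence over one decade gives 2**5 = 32x more devices.
    print(growth_factor(10, 2))
    # The same decade under an 18-month performance cadence.
    print(growth_factor(10, 1.5))
```

The gap between the two-year and 18-month cadences compounds quickly, which is why small changes in the assumed doubling period produce very different long-run forecasts.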

Wright’s Law and the Learning Rate

While Moore’s Law focuses on the passage of time, Wright’s Law (or the "learning curve" effect) posits that cost reduction is a function of cumulative production volume. Discovered by aeronautical engineer Theodore Paul Wright in 1936, the law observes that for every doubling of the total quantity of products produced, the unit cost falls by a fixed proportion. Wright initially observed this in aircraft production, where the labor required for each unit was reduced by approximately 20 percent with each doubling of experience.

Wright’s Law is considered more universally applicable than Moore’s Law because it accounts for the frequency of activity rather than mere chronological time. Technologies that follow Wright’s Law, such as solar panels, batteries, and semiconductors, exhibit a constant "learning rate" over decades.

Technology Category	Metric for Doubling	Empirical Learning Rate (%)	Persistence / Notes
Solar Photovoltaics (PV)	Cumulative Installed Capacity	20.2% - 24%	40+ Years
Lithium-ion Batteries	Cumulative Battery Production	7.5% - 19%	Driven by EVs
Utility-Scale Wind	Cumulative Installed Capacity	15%	Variable by epoch
Semiconductors (Transistors)	Cumulative Transistor Count	~40%	Moore's Law proxy
Internet Transit	Cumulative Traffic/Port Density	25% - 50%	Annual decline

The durability of these learning rates suggests that modular, mass-produced technologies are the most likely to become cheaper in the long term. However, the exact rate differs based on geographic location, time-span, and the chosen proxy for experience.
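Wright’s Law is usually written as a power law in cumulative output. The sketch below assumes that standard form, in which each doubling multiplies unit cost by (1 - learning rate); the function names and the example first-unit cost are illustrative and not taken from any source cited here.

```python
import math


def wright_unit_cost(first_unit_cost: float, cumulative_units: float,
                     learning_rate: float) -> float:
    """Unit cost once `cumulative_units` have been produced.

    Each doubling of cumulative output multiplies unit cost by
    (1 - learning_rate), so cost = first_unit_cost * n ** log2(1 - learning_rate).
    """
    exponent = math.log2(1.0 - learning_rate)
    return first_unit_cost * cumulative_units ** exponent


def doublings_to_reach(cost_ratio: float, learning_rate: float) -> float:
    """Doublings of cumulative output needed to cut cost to `cost_ratio`."""
    return math.log(cost_ratio) / math.log(1.0 - learning_rate)


if __name__ == "__main__":
    # Wright's aircraft observation: a 20 percent learning rate.
    print(wright_unit_cost(100.0, 2, 0.20))   # one doubling
    print(wright_unit_cost(100.0, 4, 0.20))   # two doublings
    # Doublings needed to halve cost at rates taken from the table above.
    for name, rate in [("solar PV", 0.202), ("wind", 0.15), ("transistors", 0.40)]:
        print(name, round(doublings_to_reach(0.5, rate), 2))
```

Because the input is cumulative volume rather than time, a technology with a high learning rate can still get cheaper slowly if its market stops doubling.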

Counter-Examples to the Deflationary Rule: Rock’s Law and Eroom’s Law

The assumption that all technology becomes cheaper is challenged by the "darker side" of Moore’s Law, known as Rock’s Law. This principle holds that as chips get denser, the cost of the manufacturing equipment and facilities required to produce them rises exponentially. While the price to the consumer falls, the capital expenditure required from the producer follows an opposite trend. Leading-edge fabrication facilities (fabs) now cost between $10 billion and $20 billion, with the high-NA EUV scanners at the heart of modern lithography costing north of $400 million each.

Eroom’s Law and the Crisis of Pharmaceutical Productivity

The most prominent example of technology becoming more expensive over time is found in the pharmaceutical industry. Eroom’s Law (Moore’s Law spelled backward) describes the observation that the cost of developing a new drug doubles approximately every nine years—a trend that has persisted since the 1950s. Despite exponential improvements in high-throughput screening, biotechnology, and computational drug design, fewer drugs make it to market per billion dollars spent. By 2024, the average cost to bring a new asset to market had risen to $2.23 billion.

The drivers of Eroom’s Law provide a sobering parallel for the future of AI:

The "Better than the Beatles" Problem: New innovations must compete against existing, highly effective products (such as generic atorvastatin, the off-patent version of Lipitor) that already have excellent safety records and low prices.
The "Cautious Regulator" Problem: Increasing risk intolerance by regulatory bodies (e.g., following safety crises like Thalidomide or Vioxx) has progressively raised the bar for approval, mandating larger and more expensive clinical trials.
The Exhaustion of Low-Hanging Fruit: Many of the most accessible drug targets have been exploited, forcing researchers to tackle increasingly complex and higher-risk biological pathways.

Baumol’s Cost Disease and the Service Sector Stagnation

A further constraint on cost reduction is Baumol’s Cost Disease, which explains why costs rise in labor-intensive sectors such as healthcare, education, and the performing arts. In these sectors, human labor is the end product; a string quartet still requires four musicians and nine minutes to perform Beethoven, just as it did in the 19th century. While manufacturing productivity explodes, these stagnant sectors must still increase wages to compete for labor, leading to costs that outpace inflation.

The Historical Cost Trajectory of Digital Infrastructure

To understand the context in which generative AI is scaling, it is necessary to examine the long-term price curves of its underlying "bones": cloud storage and internet bandwidth.

Cloud Storage: The 15-Year Descent of Amazon S3

When Amazon S3 (Simple Storage Service) launched in 2006, it offered a revolutionary price of 15 cents per gigabyte per month. Over the subsequent decade, intensive competition, economies of scale, and falling hardware costs drove prices down by approximately 85 percent.

Date	AWS S3 Tier	Storage Price ($/GB per month)	Cumulative Change
March 2006	Standard (Launch)	$0.150	-
Nov 2008	First 50TB	$0.150 (Tiers introduced)	-
Nov 2010	First 1TB	$0.140	-6.7%
Feb 2012	First 1TB	$0.125	-16.7%
April 2014	Standard	$0.030	-80%
Dec 2016	Standard	$0.023	-84.7%
Jan 2021	Standard	$0.023	-84.7% (unchanged)

Despite this dramatic decline, recent evidence suggests a "silicon plateau" in cloud storage. The price for S3 Standard has remained largely unchanged since the December 2016 cut. While the cost of underlying hard disk drives (HDDs) has continued to fall by approximately 13 percent annually, AWS has arguably lacked the competitive incentive to pass these savings to consumers, instead focusing on "Intelligent Tiering" to optimize margins.
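The cumulative-change column and the widening gap between list price and hardware cost can both be checked with simple arithmetic. The sketch below uses the prices from the table and the 13 percent annual HDD decline quoted above; treating that decline as perfectly steady, and the helper names themselves, are assumptions made for illustration.

```python
# Prices in USD per GB-month, copied from the table above.
S3_PRICES = [
    ("Mar 2006", 0.150),
    ("Nov 2010", 0.140),
    ("Feb 2012", 0.125),
    ("Apr 2014", 0.030),
    ("Dec 2016", 0.023),
    ("Jan 2021", 0.023),
]


def cumulative_change_pct(launch_price: float, price: float) -> float:
    """Percentage change relative to the launch price (negative = cheaper)."""
    return (price - launch_price) / launch_price * 100.0


def remaining_cost_fraction(annual_decline: float, years: float) -> float:
    """Fraction of the original cost left after a steady annual decline."""
    return (1.0 - annual_decline) ** years


if __name__ == "__main__":
    launch = S3_PRICES[0][1]
    for date, price in S3_PRICES:
        print(date, round(cumulative_change_pct(launch, price), 1))
    # Dec 2016 to Jan 2021 is roughly four years of flat list pricing.
    # At a steady 13 percent annual decline, drive cost per GB would fall
    # to roughly 57 percent of its starting level over the same period.
    print(round(remaining_cost_fraction(0.13, 4), 3))
```

Under these assumptions the flat list price implies a growing spread between what storage costs to provide and what it costs to buy, which is the margin argument made above.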

Internet Transit and Bandwidth

The cost of internet transit has followed a similar "gravitational pull" downward. In 1998, internet transit cost approximately $1,200 per Mbps; by 2015, this had fallen to $0.63 per Mbps. This deflation accompanied a massive increase in global traffic, from roughly 15 gigabytes per month across the entire network in 1984 to roughly 15 gigabytes per month per user by 2014.

Year	Transit Price (per Mbps)	Annual % Decline
1998	$1,200.00	-
2002	$200.00	50% (Max single year)
2006	$50.00	33%
2010	$5.00	44%
2014	$0.94	40%
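Because the table skips years, it can help to convert its endpoints into a single constant annual rate. The sketch below does that with a standard compound-rate formula; the function name is illustrative, and the result is an average that smooths over the year-to-year swings shown in the right-hand column.

```python
def annualized_decline(start_price: float, end_price: float, years: float) -> float:
    """Constant yearly decline rate that takes start_price to end_price."""
    return 1.0 - (end_price / start_price) ** (1.0 / years)


if __name__ == "__main__":
    # Endpoints taken from the table above (USD per Mbps).
    full_span = annualized_decline(1200.00, 0.94, 2014 - 1998)
    early_span = annualized_decline(1200.00, 200.00, 2002 - 1998)
    print(f"1998-2014 average decline: {full_span:.1%}")
    print(f"1998-2002 average decline: {early_span:.1%}")
```

On these endpoints both spans work out to an average decline of roughly 36 percent per year, which suggests the pace of deflation was fairly stable across the period rather than front-loaded.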

This consistent decline led to the commoditization of the CDN and transit markets, where margins tended toward zero, forcing providers to seek value in "add-on" services like reliability guarantees and consumption data analytics.

The Economic Architecture of Generative AI: 2023–2026

Generative AI is unique because it combines the extreme capital intensity of the semiconductor industry (training) with the variable operational costs of a utility (inference). As the landscape transitions from experimental prototyping in 2024 to sustained, industrial-scale deployment in 2026, the industry's economic model is undergoing a radical realignment.

The Training Factory vs. The Inference Engine

The "Training Factory" phase is defined by massive, one-time capital expenditures (CapEx) required to teach a Large Language Model (LLM) how to think. Estimates published in 2024 put training costs for frontier models like GPT-4 at $78 million to $100 million, while Google’s Gemini Ultra 1.0 cost approximately $191 million. Doubling a model's size more than doubles its training cost, because a larger model also needs proportionally more training data, longer convergence times, and more complex multi-GPU parallelism.
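One way to see why doubling model size more than doubles training cost is a back-of-envelope compute estimate. The sketch below relies on two rules of thumb that are not stated in this article and should be treated as assumptions: training compute of roughly 6 FLOPs per parameter per token, and a compute-optimal dataset of roughly 20 tokens per parameter.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token (assumed)."""
    return 6.0 * params * tokens


def compute_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Dataset size under an assumed fixed tokens-per-parameter ratio."""
    return tokens_per_param * params


def compute_multiplier(params_factor: float, tokens_per_param: float = 20.0) -> float:
    """How much training compute grows when model size grows by params_factor."""
    base_params = 1e9  # arbitrary reference size; the ratio does not depend on it
    before = training_flops(
        base_params, compute_optimal_tokens(base_params, tokens_per_param))
    scaled = base_params * params_factor
    after = training_flops(scaled, compute_optimal_tokens(scaled, tokens_per_param))
    return after / before


if __name__ == "__main__":
    # Doubling parameters also doubles the data, so compute grows about 4x.
    print(compute_multiplier(2.0))
    print(compute_multiplier(10.0))
```

Under these assumptions cost grows with the square of model size, before any parallelism overhead is counted.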

However, the industry’s focus is shifting to the "Inference Engine." By 2026, inference workloads—the ongoing operating cost of running AI in the real world—are projected to account for two-thirds of all compute. This shift is critical because training creates capability, but inference determines profitability.

Token Economics: The New KPI

In 2026, the primary metric for AI success has evolved from raw FLOPS (Floating Point Operations Per Second) to "Tokens Per Second per Dollar" (TPS/$). Cost per token represents the total cost required to generate a unit of AI output, capturing compute consumption, energy usage, cooling overhead, and infrastructure amortization.
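A cost-per-token figure can be derived from an all-in hourly infrastructure cost and a measured throughput. The sketch below is a simplification: the hourly cost is assumed to already bundle compute, energy, cooling, and amortization, the utilization parameter and all example numbers are hypothetical, and "TPS/$" is interpreted here as tokens per second per dollar of hourly cost.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """USD per one million generated tokens.

    hourly_cost_usd: all-in cost of the serving hardware per hour (assumed to
        include energy, cooling, and amortization).
    utilization: fraction of each hour spent serving real traffic.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def tokens_per_second_per_dollar(tokens_per_second: float,
                                 hourly_cost_usd: float) -> float:
    """Throughput per dollar of hourly cost (one possible reading of TPS/$)."""
    return tokens_per_second / hourly_cost_usd


if __name__ == "__main__":
    # Hypothetical accelerator: $4.00 per hour, 2,000 tokens per second.
    print(round(cost_per_million_tokens(4.00, 2000, utilization=1.0), 3))
    # The same hardware at 50 percent utilization costs twice as much per token.
    print(round(cost_per_million_tokens(4.00, 2000, utilization=0.5), 3))
    print(tokens_per_second_per_dollar(2000, 4.00))
```

The utilization term is the one most often left out of headline numbers: idle capacity is paid for whether or not tokens are generated.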

Model Class	Representative Model	Input Price ($/1M tokens)	Output Price ($/1M tokens)
Budget Tier	Gemini Flash-Lite 3.1	$0.075	$0.30
Budget Tier	Llama 3.2 3B	$0.06	$0.06
Mid-Tier	DeepSeek R1	$0.55	$2.19
Mid-Tier	Claude 3.5 Sonnet	$3.00	$15.00
Frontier	Claude Opus 4.5	$5.00	$25.00
Frontier (2023)	GPT-4 (Initial)	$30.00	$60.00

While headlines celebrate a 10x annual decline in token prices—faster than the deflation of PC compute or dotcom bandwidth—the total bills for enterprises are climbing. This is the "Token Consumption Paradox": as per-token prices drop, the number of tokens consumed by modern "reasoning" models is exploding. Models like the OpenAI o1 series may consume 100x more internal "thinking" tokens than they output, creating a scenario where cheaper unit prices lead to higher total invoices.
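The paradox is easy to reproduce with invoice arithmetic. In the sketch below the workload (one million requests) and the token counts per request are hypothetical; only the 10x price drop and the 100x token multiplier come from the paragraph above.

```python
def invoice_usd(requests: int, tokens_per_request: int,
                price_per_million_usd: float) -> float:
    """Total bill for a workload at a given per-million-token price."""
    return requests * tokens_per_request * price_per_million_usd / 1_000_000


if __name__ == "__main__":
    requests = 1_000_000  # hypothetical monthly request volume

    # Before: short direct answers at a high unit price.
    before = invoice_usd(requests, tokens_per_request=500,
                         price_per_million_usd=60.0)
    # After: unit price falls 10x, but a reasoning model emits 100x the tokens.
    after = invoice_usd(requests, tokens_per_request=50_000,
                        price_per_million_usd=6.0)

    print(before)          # 30000.0
    print(after)           # 300000.0
    print(after / before)  # 10.0
```

A 10x cheaper token combined with a 100x larger token budget yields a 10x larger invoice, so unit price alone says little about total spend.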

The Physical Wall: Energy and Infrastructure Constraints

The most significant threat to the continued cheapening of AI technology is the "Shift from Silicon to Watts." By 2026, the constraints on the AI boom have shifted from the availability of chips to the availability of electricity and grid capacity.

The Energy Shortfall and Grid Dysfunction

Modern AI data centers operate far more like industrial-scale power consumers than traditional office-server facilities. A single AI-focused data center can demand 50 to 100 megawatts of electricity—comparable to the load of a manufacturing plant or a small city.
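To put a megawatt figure into annual energy terms, multiply by the hours in a year. The sketch below assumes the facility draws its rated load continuously unless a lower load factor is supplied; the load factor parameter is an illustrative addition, not a figure from this article.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def annual_energy_gwh(load_mw: float, load_factor: float = 1.0) -> float:
    """Annual energy in GWh for a facility drawing `load_mw` on average.

    load_factor scales the rated load down to an average load (assumed 1.0).
    """
    return load_mw * HOURS_PER_YEAR * load_factor / 1000.0


if __name__ == "__main__":
    print(annual_energy_gwh(50))    # 438.0 GWh per year
    print(annual_energy_gwh(100))   # 876.0 GWh per year
```

At continuous load, a single 100 MW site consumes close to 0.9 TWh per year, which gives a sense of scale for the fleet-level totals in the table that follows.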

Metric	2024 Value	2026 Projection	2030 Projection
Global DC Power Use	~1.5% of total	~2% (>500 TWh)	-
US DC Power Demand	25 GW	45 GW	74 - 120 GW
Projected US Shortfall	-	Emerging	49 GW

The "temporal mismatch" between data center construction (under two years) and transmission infrastructure permitting/construction (15 to 30 years) has created grid dysfunction. In the PJM region of the U.S., data center demand has increased energy market costs by $9.3 billion, translating into an additional $18 per month on some household electricity bills. To avoid this "power cliff," hyperscalers like Microsoft, Google, and Meta are scrambling to secure long-term power purchase agreements (PPAs) and are increasingly pursuing a "self-generation model" (BYOG - Bring Your Own Generator) involving natural gas, solar, and nuclear power.

The Resource Entropy of Data

A second physical limit is the "entropy of internet text." Research suggests that internet text contains approximately 1.82 bits of information per token. As models improve, the gap between their current performance and this "irreducible loss" (E) shrinks, leading to diminishing returns. When model loss falls close to the entropy of the dataset, there is less signal available for the model to learn from, making further scaling exponentially more expensive in terms of both compute and data acquisition.
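The diminishing-returns argument can be illustrated with a simple power-law loss curve that flattens toward an entropy floor. In the sketch below only the 1.82 floor comes from the text; the functional form, the scale constant, and the exponent are hypothetical values chosen for illustration, not fitted parameters.

```python
ENTROPY_FLOOR = 1.82  # irreducible loss E, the figure quoted in the text
K = 10.0              # hypothetical scale constant
ALPHA = 0.1           # hypothetical compute exponent


def loss(compute: float, k: float = K, alpha: float = ALPHA) -> float:
    """Assumed loss curve: L(C) = E + k * C ** (-alpha)."""
    return ENTROPY_FLOOR + k * compute ** (-alpha)


def compute_for_loss(target_loss: float, k: float = K,
                     alpha: float = ALPHA) -> float:
    """Invert the curve: compute needed to reach `target_loss`."""
    gap = target_loss - ENTROPY_FLOOR
    if gap <= 0:
        raise ValueError("target loss must stay above the entropy floor")
    return (k / gap) ** (1.0 / alpha)


def compute_multiplier_to_halve_gap(alpha: float = ALPHA) -> float:
    """Extra compute factor needed to halve the remaining gap to the floor."""
    return 2.0 ** (1.0 / alpha)


if __name__ == "__main__":
    # With alpha = 0.1, each halving of the gap costs 2**10 = 1024x compute.
    print(compute_multiplier_to_halve_gap())
    print(compute_for_loss(2.32) / compute_for_loss(2.82))
```

Under a curve of this shape every successive halving of the gap costs the same large multiple of compute, so total compute grows exponentially in the number of halvings, which is the sense in which further scaling becomes exponentially more expensive.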

Software and Architectural Mitigation Strategies

In response to these physical and economic constraints, the industry is pivoting toward "advanced packaging" and software optimization to sustain performance gains.

Advanced Packaging and Chiplets

As conventional transistor scaling slows at the 3nm node, the industry has shifted from pure transistor shrinkage to a system-level approach. "Advanced packaging" technologies, such as TSMC's CoWoS (Chip-on-Wafer-on-Substrate), allow multiple specialized "chiplets" to be integrated side by side or stacked within a single package. This replaces slower board-level connections between separate chips with short in-package links, reducing latency while delivering a claimed performance scaling of up to 40x.

The P-KD-Q Optimization Sequence

Enterprises are increasingly adopting the "P-KD-Q" (Pruning → Knowledge Distillation → Quantization) sequence to reduce the Total Cost of Ownership (TCO) of AI deployments.

Pruning: Removes redundant parameters to achieve 50-60 percent sparsity with minimal accuracy loss.
Knowledge Distillation: A large "teacher" model trains a smaller "student" model to reproduce its outputs. Well-distilled models (7B–20B parameters) can solve up to 80-90 percent of reasoning queries previously sent to 70B+ models.
Quantization: Reduces the precision of weights (e.g., from FP16 to INT4). This can cut inference costs by 75 percent while maintaining 95 percent of model quality.
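Two of the three steps above, pruning and quantization, can be sketched on a plain list of weights. This is a toy illustration under stated assumptions: it uses magnitude-based pruning and symmetric per-tensor INT4 quantization, which are common choices but not necessarily what any given deployment uses, and it omits knowledge distillation because that step requires a training loop. All function names are illustrative.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed integers in [-7, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    quantized = [max(-7, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate floating-point weights."""
    return [q * scale for q in quantized]


def weight_memory_saving(bits_before: int, bits_after: int) -> float:
    """Fractional reduction in memory needed to store the weights."""
    return 1.0 - bits_after / bits_before


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
    pruned = magnitude_prune(weights, sparsity=0.5)
    quantized, scale = quantize_int4(pruned)
    print(pruned)
    print(quantized, scale)
    print(dequantize(quantized, scale))
    # Going from 16-bit to 4-bit weights stores 75 percent fewer bits.
    print(weight_memory_saving(16, 4))
```

One plausible reading of the 75 percent figure quoted above is this reduction in bits per weight; actual serving cost savings also depend on memory bandwidth, kernel support, and batch size.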

When combined with "speculative decoding"—using a smaller model to predict the likely output of a larger one—these techniques can reduce latency by 2x to 3x and energy usage by up to 73 percent.
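The control flow of speculative decoding can be shown with toy stand-ins for the two models. The sketch below is the greedy variant, in which the output is identical to decoding with the large model alone; production systems verify the draft in a single batched forward pass and often use a sampling-based acceptance rule, neither of which is modelled here. The toy models and all names are hypothetical.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token prefix to the next token


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, lookahead: int = 4) -> List[int]:
    """Greedy speculative decoding with a cheap draft and an expensive target."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model cheaply proposes a short continuation.
        proposal: List[int] = []
        for _ in range(min(lookahead, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks each proposed position in order.
        accepted: List[int] = []
        for guess in proposal:
            correct = target(tokens + accepted)
            accepted.append(correct)  # always keep the target's own token
            if guess != correct:
                break                 # first mismatch: discard the rest
        tokens.extend(accepted)
    return tokens


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Reference: ordinary one-token-at-a-time decoding."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(model(tokens))
    return tokens


def toy_target(prefix: List[int]) -> int:
    """Stand-in for the large model (a deterministic toy rule)."""
    return (prefix[-1] * 3 + 1) % 17


def toy_draft(prefix: List[int]) -> int:
    """Stand-in for the small model: agrees with the target most of the time."""
    guess = toy_target(prefix)
    return guess if prefix[-1] % 5 else (guess + 1) % 17


if __name__ == "__main__":
    fast = speculative_decode(toy_target, toy_draft, [2], max_new_tokens=12)
    slow = greedy_decode(toy_target, [2], 12)
    print(fast == slow)
```

The speedup comes from the fact that every accepted run of draft tokens replaces several sequential passes of the large model with one verification step, so the gain depends on how often the draft agrees with the target.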

Regulatory and Environmental Friction in 2026

The year 2026 marks the arrival of the "regulatory invoice" for AI. The European Union AI Act, which entered into force in August 2024, becomes generally applicable in August 2026, although some high-risk obligations are phased in later.

The EU AI Act Risk Hierarchy

The Act imposes a risk-based regulatory framework that significantly impacts operational costs:

Unacceptable Risk: Banned applications, including social scoring and harmful manipulation.
High Risk: Systems used in critical infrastructure, education, and employment. These must be registered in an EU database and undergo strict conformity assessments.
General-Purpose AI (GPAI): Foundation models like GPT-4 face specific transparency obligations and must report energy consumption and technical data.

Compliance Requirement	Sector	Timing / Cost Implication
AI Literacy Training	All EU Firms	Feb 2025 Start
High-Risk Registration	Infrastructure/Med	Operational Overhead
Energy Transparency	GPAI Providers	Mandatory Audits
Non-Compliance Penalty	All	Up to 7% of Turnover

The "Subsidy Cliff" and Energy Accountability

In the United States, legislation like the PRICE Act (New Jersey/Texas) proposes requiring data centers to generate their own renewable energy and transition to 100 percent carbon-free sources by 2040. State regulators are shifting from a model of "unconditional incentives" to "accountability," where data center developers must pay for the grid infrastructure upgrades their facilities necessitate. This transition creates a "subsidy cliff," fundamentally altering the internal rate of return (IRR) for new AI infrastructure projects.

Synthesis: Will Generative AI Follow the Deflationary Pattern?

The convergence of historical patterns and current trends suggests that generative AI is experiencing a bifurcation in its cost structure.

The Path of Digital Logic (Cheaper in the Long Term)

The "raw" unit of intelligence—the individual token generated—is following a classic deflationary path driven by Wright’s Law and intensive market competition. The 10x annual drop in token pricing suggests that for basic, high-volume tasks (summarization, simple coding, translation), AI will become as cheap and ubiquitous as internet bandwidth or cloud storage. The "democratization of intelligence" is real, as open-source models through providers like Together.ai achieve performance comparable to 2023's frontier models at 1/1000th of the cost.

The Path of Heavy Infrastructure (More Expensive in the Long Term)

Conversely, the "frontier" of AI capability is following the path of the nuclear industry and heavy infrastructure. The "Physical Wall" of energy, the "Rock’s Law" of semiconductor manufacturing, and the "Eroom-like" diminishing returns of data entropy mean that leading-edge capability is becoming structurally more expensive to produce.

The "Inference Iceberg": The total spend for enterprises is rising despite lower unit costs because "agentic" and "reasoning" workloads consume orders of magnitude more tokens per task.
The "Siphon Effect": Just as high-speed rail draws resources to major cities at the expense of rural counties, AI investment is concentrating in geographic "hotspots" where power is available, creating a new class of digital inequality.
The "Regulatory Paradox": Transparency and safety requirements, while necessary for "human-centric AI," introduce the same bureaucratic friction that has slowed the pharmaceutical and construction industries.

Conclusion: The New Economic Equilibrium

Technology does not always become cheaper; it only becomes cheaper when it is modular, mass-produced, and operating within a regime of high productivity gains. Generative AI is unusually dynamic because it acts as a bridge between the deflationary digital world and the stagnant physical world. In 2026, the industry is moving past the initial hype to a "fundamentals-based" economy. The "AI bill" will come due for customer experience leaders, who must navigate usage-based volatility and premium "gated" intelligence tiers. While the cost per token will likely continue its descent toward the marginal cost of energy, the "total cost of intelligence" for a meaningfully transformed enterprise will remain a substantial, and potentially escalating, capital commitment. Success in this new era will depend less on advances in pure computing and more on the ability to modernize the legal, institutional, and energy frameworks that underpin the global power system.

Sources

Researched with Google Gemini Deep Research, prompted and edited by Giorgio Polvara.