As I prepare for my teaching duties, I am delving into a seminal article in AI research: The Bitter Lesson, authored by Richard Sutton in 2019. My aim is to identify passages that strike me as prescient and to observe any areas where Sutton’s predictions may have faltered. Towards the end, I’ll explore the economic implications of his insights.
In his article, Sutton draws on decades of AI history to highlight a “bitter” lesson: researchers repeatedly assume that advances in intelligence will come from encoding specialized human knowledge, yet the methods that scale with computational power consistently end up outperforming those built on such expertise. In chess, for example, brute-force search running on specialized hardware proved superior to strategies based on human knowledge. Sutton cautions that many researchers resist this lesson, because incorporating human knowledge feels intuitively rewarding, yet the real breakthroughs have come from the relentless scaling of computation. In modern AI, that scaling means enlarging models and training them on more data with more compute.[1]
The essence of The Bitter Lesson lies not in any specific algorithm but in a kind of intellectual humility: progress in AI has come from acknowledging that general-purpose learning, persistently scaled, surpasses our best attempts to program intelligence by hand. That lesson is particularly timely as we continue navigating what Dwarkesh Patel has termed The Scaling Era.
Guests on EconTalk have speculated about AI’s future, with opinions ranging from it being a savior to a potential threat. These extreme predictions assume that AI capabilities will continue to advance. Although AI has indeed shown rapid improvement since Sutton’s writing in 2019, there is no inherent law dictating that this upward trajectory must persist. Some experts even report signs of a potential plateau in AI capabilities and note that hallucinations continue to be a problem in even the most advanced models.
If Sutton’s hypothesis holds, and scaling does yield greater intelligence, we should expect AI performance to keep improving as hardware investment grows. That hypothesis is now being tested at enormous expense: US private AI investment could surpass $100 billion annually, a monumental technological bet. Let’s weigh Sutton’s thesis against recent advancements in AI performance.
Three key pieces of evidence support Sutton’s claims about scaling. First, game-playing AI serves as a straightforward natural experiment. AlphaZero mastered chess and Go through self-play, without relying on human strategies, and outperformed previous systems built on domain expertise. Its success came from general learning algorithms running at massive computational scale, just as Sutton’s argument predicts.
Second, in natural language processing (NLP)—the AI branch that focuses on understanding and generating human language—the pattern is the same. Earlier NLP models prioritized language-based rules and symbolic structures. In contrast, OpenAI’s GPT-3 and its successors utilize generic architectures trained on vast datasets with considerable computational resources. Performance improvements correlate more reliably with scale than with architectural innovations.
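To make “scale” concrete: empirical scaling-law studies (for example, Kaplan et al., 2020) report that a language model’s test loss falls roughly as a power law in parameter count $N$, dataset size $D$, and training compute $C$. A sketch of that reported form, not a claim from Sutton’s essay:

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C},$$

where $N_c$, $D_c$, $C_c$ and the small positive exponents $\alpha$ are constants fitted to experiments. The particular constants differ across studies, but the qualitative message is the same: loss falls smoothly and predictably as any of the three inputs grows.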
The third example comes from computer vision. Until convolutional neural networks (CNNs)—which learn visual patterns automatically—were trained at scale, hand-engineered feature pipelines (where programmers crafted algorithms to detect edges and shapes) dominated. As more data and computational resources became available, accuracy improved significantly.
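The contrast can be sketched in a few lines of code (a minimal illustration of the idea, not anything from the article, and assuming PyTorch is installed):

```python
# Hand-engineered versus learned visual features (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

image = torch.randn(1, 1, 28, 28)  # a dummy single-channel image batch

# Old pipeline: a fixed, hand-crafted Sobel kernel that responds to vertical edges.
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).reshape(1, 1, 3, 3)
hand_features = F.conv2d(image, sobel_x, padding=1)  # no learning involved

# CNN approach: 16 filters start out random and are updated by gradient descent
# during training, so what counts as a useful pattern is learned from data.
learned_layer = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
learned_features = learned_layer(image)  # gets better with more data and compute
```

The hand-crafted filter is bounded by the programmer’s ingenuity; the learned filters are bounded mainly by data and compute, which is precisely the kind of method Sutton argues wins in the long run.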
While Sutton’s argument emphasizes the scalability of methods, that scalability only becomes visible when capital investment removes the computational bottlenecks.
The pace of AI advancement reflects not only what is technologically possible but also an exceptional mobilization of financial resources. Many casual users of tools like ChatGPT may not fully grasp what “scaling” entails, and a frequent misunderstanding about the rate of progress likely stems from underestimating the sheer scale of financial investment behind it.
This situation parallels the Manhattan Project. Skeptics doubted that effort not because it violated scientific principles but because it seemed prohibitively expensive; Niels Bohr purportedly said it would require “turning the whole country into a factory.” Yet the project succeeded, and we are undergoing a similar transformation today as we turn the country into a factory for AI. Without these financial commitments, technological progress would be far slower.
Nevertheless, both pessimistic and optimistic forecasts may fall short if we reach limits in either scaling capabilities or our capacity to sustain growth. Understanding whether Sutton’s bitter lesson can guide us through 2026 and beyond is crucial for addressing unemployment today and anticipating potential existential threats tomorrow.
Recent economic research presents a more nuanced perspective. In a January 2026 paper, economist Joshua Gans proposes a model of “artificial jagged intelligence.” He notes that generative AI systems perform inconsistently across tasks that seem closely related: they can excel at one prompt while confidently providing incorrect answers to another with only slight variations in wording. Anyone who has used ChatGPT for work-related tasks has likely encountered this inconsistency firsthand.
What makes Gans’s analysis compelling is his examination of scaling laws. His model illustrates that increased scale—measured by the density of known points within a knowledge landscape—reduces average gaps and enhances mean quality in a nearly linear fashion. This outcome aligns well with Sutton’s thesis that greater computational power leads to improved average performance. However, unpredictability and errors still persist. Scaling may elevate average performance but does not entirely eliminate surprising or significant failures.
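A toy simulation can make that intuition concrete. The sketch below is my own simplification, not Gans’s formal model: it scatters “known points” uniformly over a one-dimensional knowledge space and tracks how far a random query sits from the nearest known point as density grows.

```python
# Toy illustration (my simplification, not Gans's model): denser knowledge
# coverage shrinks the *average* gap to the nearest known point roughly in
# proportion to density, but the *largest* gap stays several times bigger,
# so occasional surprising failures persist even at scale.
import numpy as np

rng = np.random.default_rng(0)
queries = rng.uniform(0, 1, 10_000)          # tasks users might bring to the model

for n_known in (10, 100, 1_000, 10_000):     # "scale" = density of known points
    known = np.sort(rng.uniform(0, 1, n_known))
    # distance from each query to its nearest known point
    idx = np.searchsorted(known, queries).clip(1, n_known - 1)
    gaps = np.minimum(np.abs(queries - known[idx - 1]),
                      np.abs(queries - known[idx]))
    print(f"n={n_known:>6}  mean gap={gaps.mean():.5f}  max gap={gaps.max():.5f}")
```

In this toy world, mean quality improves steadily with scale while the worst case remains several times the average, which is exactly the jaggedness users experience.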
Gans portrays AI adoption as an information problem: users care about local reliability (can the AI assist me with my task?), but they typically observe only broad, global quality signals (benchmark scores). This mismatch creates economic frictions. A paralegal may come to trust an AI that accurately reviews 95% of contracts, only to be caught off guard by a confidently incorrect answer about a seemingly routine clause. As Gans illustrates, these experiences are intensified by the “inspection paradox,” whereby users tend to encounter errors at exactly the critical junctures when they most need assistance.
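A stylized calculation (mine, not Gans’s) hints at why errors concentrate where the stakes are highest: demanding work tends to chain many model calls, and the chance of hitting at least one error compounds with the number of calls. If each call errs independently with probability $p = 0.02$, a routine single-call task fails 2% of the time, but a high-stakes task that strings together $k = 30$ calls surfaces at least one error with probability

$$1 - (1 - p)^{k} = 1 - 0.98^{30} \approx 0.45.$$

The model’s average accuracy is unchanged; the user’s experience at the moments that matter most is dramatically worse.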
While Gans’s 2026 paper doesn’t explicitly reference or contradict Sutton’s work, it can be read as describing a structural limitation that persists even when the Bitter Lesson is heeded. Scaling works, but the economic returns to scaling can be partly undermined by the reliability that scaling alone has not delivered.
This limitation carries important implications for businesses adopting AI: they should not rely solely on benchmark performance but must also invest in human oversight and domain-specific evaluations. This reality suggests that AI won’t spell the end of human employment.
Sutton is right about the general direction of AI development, but his insights need to be appreciated in context. Scaling alone is insufficient, and indiscriminately increasing scale is unlikely to deliver superintelligence. Models still demand human insight and structure to deliver real value to organizations. Techniques like Reinforcement Learning from Human Feedback (RLHF), in which human evaluators assess AI outputs to guide learning, instill essential human values in models. Earlier architectures didn’t evolve into GPT-4 merely by increasing data volume.
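To make RLHF slightly more concrete: the reward model at its core is commonly trained on pairwise human preferences with a Bradley–Terry-style objective (a general sketch; exact setups vary across labs):

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right],$$

where $x$ is a prompt, $y_w$ and $y_l$ are the responses a human labeler preferred and rejected, $\sigma$ is the logistic function, and the learned reward $r_\theta$ is then used to fine-tune the language model. This is where human judgment enters the system, alongside the raw scaling.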
Moreover, we cannot expect to keep scaling indefinitely. Real-world constraints on energy and data exist. For AI to improve significantly, it must therefore incorporate efficiency gains and innovative algorithms, not just brute force. Human insight remains critically relevant, its focus shifting from encoding intelligence directly to shaping, constraining, and steering scaled learning systems.
In conclusion, Sutton deserves credit for his insight: scaling does yield results. But how much it yields depends on human ingenuity in structuring and deploying AI systems. Economists will recognize the pattern: capital and labor remain complements, even when the capital is GPUs and the labor is designing loss functions.
Gans’s contribution serves as a crucial economic addendum: while scaling raises average AI performance, the unpredictable variance around that average imposes real costs on users. Businesses and individuals alike must navigate a terrain where AI is increasingly competent yet persistently unreliable in unforeseen ways. The economic returns on AI investment depend not only on technological capability but also on developing the complementary institutions and expertise needed to manage this intrinsic variance.
The bitter lesson is that pure scaling is potent; the sweeter truth is that human innovation continues to play an essential role in guiding where that scaling takes us.
[1] Compute refers to the total computational power, typically measured in floating-point operations (FLOPs), used to train or run an AI model.
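As a rough worked example (using the common “6ND” rule of thumb for dense transformer training, not a figure from the article): a model with $N$ parameters trained on $D$ tokens takes roughly $C \approx 6ND$ FLOPs, so a 175-billion-parameter model trained on 300 billion tokens needs on the order of

$$C \approx 6 \times (1.75 \times 10^{11}) \times (3 \times 10^{11}) \approx 3 \times 10^{23}\ \text{FLOPs}.$$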