Understanding the AI Compute Crunch and Usage Limits on AI Tools

In late March, many enthusiastic users of Anthropic’s Claude large language models noticed an unusual and concerning trend: they were hitting their five-hour usage limits within just 20 minutes. Frustration grew as complaints surfaced on platforms like Reddit, GitHub, and X. In response, Anthropic informed its subscribers that their usage would deplete limits more rapidly during peak hours. Additionally, the company restricted access to some third-party tools, including OpenClaw, which were previously drawing from subscription limits. Just weeks earlier, Boris Cherny, head of Claude Code, indicated that the default setting for the model’s cognitive operations had been adjusted to a lower capacity.

Users were right to raise questions. Why was a paid AI service suddenly offering less value? Did the rapid growth of AI technology exceed the capabilities of the infrastructure supporting it?

The challenges faced by Anthropic are not unique. OpenAI has also begun to wind down Sora, its video-generation platform, as demand for its coding assistant, Codex, has surged to an impressive four million users per week. Investors and developers are now voicing concerns about a “compute crunch”—the notion that the demand for AI is accelerating faster than companies can expand their data center capacities and ensure sufficient power supply.

The ramifications extend beyond mere frustration for developers. As AI becomes integral to various fields—from coding and scientific research to education, healthcare, customer service, defense planning, and clerical work—access to computational resources translates to economic advantages. Currently, however, limitations are beginning to manifest in the tools people rely on.

The numbers underline the scale of the situation. In a July 2025 white paper, Anthropic projected that to maintain global AI leadership, the U.S. AI sector would require at least 50 gigawatts of electrical capacity by 2028, roughly the output of 50 large nuclear reactors. Meanwhile, the International Energy Agency forecasts that global electricity consumption by data centers is on track to double by 2030.

Computational resources aren’t new: each interaction with Claude or GPT relies on the same basic kind of hardware that handles tasks as varied as calculating spreadsheet figures and rendering video games. That hardware depends on silicon chips etched with billions of microscopic switches and organized into specialized processors. Training a cutting-edge model demands thousands of these processors operating for weeks or even months. Even after training, each user query continues to draw on computational power, amplifying demand across the entire supply chain. On January 15, Taiwan Semiconductor Manufacturing Company (TSMC), the leading manufacturer of advanced AI chips, announced plans to invest up to $56 billion this year to expand its capacity, driven by continued customer demand.

AI policy expert Lennart Heim studies this complex machinery. He previously led compute research at the RAND Center on AI, Security, and Technology and co-founded Epoch AI, an organization that monitors the resources behind frontier AI models. His expertise lies at the intersection of digital demand and physical infrastructure.

[An edited transcript of a telephone interview is included below.]

Developers are pointing out that the rate limits and restricted access to third-party tools suggest a compute crunch. What does this shortage actually entail?

When we refer to “compute,” we’re talking about computing power. For AI, the resources required to train a model increase with its size: larger neural networks demand more data, and more data requires more processing. Crucially, the same relationship applies to deployment. Running the model to answer users, known as inference, is resource-intensive because larger models need more compute for every response. So if the number of users multiplies and each user does more, the need for compute escalates dramatically. For example, if 10 times more users each use AI 10 times more intensively, you may need close to 100 times the computing power.
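
The multiplication in that example can be written out directly. The following is a minimal sketch, assuming inference demand is simply proportional to the number of users times the usage per user; the function name and that proportionality are illustrative assumptions, not figures from any provider.

```python
def relative_inference_demand(user_growth: float, intensity_growth: float) -> float:
    """Return compute demand relative to a baseline of 1.

    Assumption: demand scales linearly with both the number of users and
    the usage per user. Real systems deviate from this (batching, caching,
    different model sizes), which is why the interview says "close to" 100x.
    """
    return user_growth * intensity_growth


# 10x the users, each using the service 10x more intensively
print(relative_inference_demand(10, 10))
```

Under this simplified model the two growth factors compound, so moderate growth in both adoption and intensity produces a much larger jump in required compute.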

Why do flat-rate subscriptions fail for AI services in ways they didn’t for earlier internet offerings?

The internet typically operates on flat-rate subscriptions: you pay a monthly fee and get unlimited access. This model works when the variable cost per user is minimal; a power user of Google Workspace doesn’t significantly increase costs for Google. With AI, it’s different. The more intensively someone uses AI, the more it costs the provider. Paying per token tracks resource usage accurately, whereas on a flat fee a heavy user can easily consume more compute than $20 can buy. That is why monthly plans come with rate limits: some form of restriction is necessary.
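
To make the flat-fee problem concrete, here is a rough sketch comparing what a subscriber pays with what serving that subscriber might cost. The $20 fee comes from the interview; the per-token serving cost and the usage figures are hypothetical values chosen only for illustration.

```python
FLAT_FEE_USD = 20.0                 # monthly plan price mentioned in the interview
COST_PER_MILLION_TOKENS_USD = 5.0   # hypothetical serving cost, not a real figure


def monthly_serving_cost(tokens_used: int) -> float:
    """Provider's cost to serve one user for a month under the assumed rate."""
    return tokens_used / 1_000_000 * COST_PER_MILLION_TOKENS_USD


def breakeven_tokens() -> int:
    """Monthly token count at which serving cost equals the flat fee."""
    return int(FLAT_FEE_USD / COST_PER_MILLION_TOKENS_USD * 1_000_000)


# Hypothetical usage levels for a light and a heavy subscriber
for label, tokens in [("light user", 500_000), ("heavy user", 50_000_000)]:
    cost = monthly_serving_cost(tokens)
    print(f"{label}: cost ${cost:.2f}, margin ${FLAT_FEE_USD - cost:.2f}")

print(f"break-even at {breakeven_tokens():,} tokens per month")
```

With these assumed numbers, the light user is profitable and the heavy user costs far more than the fee, which is the gap a rate limit is meant to close.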

Aside from rate limits, what strategies can companies employ to manage user consumption of compute resources?

Companies have various strategies at their disposal. ChatGPT, for instance, puts users in a mode called Auto by default, which lets the system pick the most appropriate model for each query: a sophisticated model that computes for longer, or a lighter one that gives a quick answer to a simple question. Anthropic has opted to default users to Claude Sonnet, a smaller model that, while less powerful, is more cost-effective for both the company and the user.

Moreover, many users are not leveraging these tools as efficiently as possible. It’s akin to asking Albert Einstein for advice on opening a bottle of wine.

OpenAI’s Codex appears to provide better value compared to Claude Code. Is this sustainable, or will we see a shift toward restrictive plans in the future?

OpenAI currently holds a competitive edge because its higher valuation gives it access to more compute. Building data centers is complex and costly, and producing the chips that fill them is exceptionally hard. Even if OpenAI were to halt its development efforts today, it would still possess substantial computing capacity, which gives it significant influence.

In contrast, Anthropic faces the daunting costs of maintaining data centers, especially when reliant on NVIDIA components. If companies like Anthropic overbuild, they risk significant expenditure on underutilized capacity. The objective is to align production with actual need, but accurately forecasting demand is notoriously tricky.

In the foreseeable future, compute availability is likely to remain constrained, leading to market responses—such as price increases. However, at present, many companies prefer implementing rate limits to ensure a good user experience rather than immediately raising prices.

Can you outline the supply chain? What are the main bottlenecks preventing AI companies from simply expanding their computational capacity?

Historically, software companies could scale dramatically due to a lack of physical limitations—a hallmark of Silicon Valley’s ethos. However, if we were to suddenly increase the number of AI users by a factor of 100, there simply wouldn’t be enough compute resources available to meet that demand.

This challenge arises from the supply chain. TSMC, for instance, takes on financial risk whenever it builds a factory: if customer demand doesn’t fill it, the underutilized plant could push the company toward bankruptcy. So when high-profile figures like Sam Altman ask for a 100-fold increase in chip supply, meeting that request is more complicated than it sounds, and that contributes to the compute shortage.

This problem extends down the supply chain as well. Once you have the chips, you need to secure power. If you approach gas turbine manufacturers asking for a dramatic increase in output, they may respond with disbelief, because this industry has been stagnant for years. Here, the digital realm collides with the physical world. Currently, demand for memory exceeds supply because more of it is being allocated to AI chips, which drives memory prices up and raises costs for consumer electronics such as smartphones. Companies eager to expand memory production are limited by clean-room space. They rely on specialized factories called “fabs,” which only a select few companies worldwide can build, and those companies are all fully booked.

Are the demands of training models and responding to user queries competing for the same computational resources?

Companies are simultaneously striving to create larger, more capable models to attract more funding while also needing to generate income in the here and now. Inference demand increases when users are actively engaged, whereas training happens continuously.

A more nuanced perspective might focus on how resources are allocated between R&D computing and serving computing. Many reports suggest that a significant portion—about 60 percent—of compute resources is dedicated to research and development, underscoring the constant tension between improving product offerings and catering to user demands.

In conclusion, as AI technology continues to evolve, balancing the availability of computational resources with user demand becomes increasingly crucial. Understanding the underlying limitations will help shape the future of AI applications and ensure that valuable innovations can thrive.
