Key Takeaways

  • AWS will raise EC2 Capacity Block pricing by about 20% starting July 1, marking its second increase of 2026.
  • Enterprises face rising compute costs as hyperscalers compete for scarce Nvidia GPUs and high-bandwidth memory.
  • Industry analysts expect AI infrastructure demand to keep accelerating, tightening supply further.

Amazon Web Services (AWS) is preparing another price hike for its EC2 Capacity Blocks for ML. The service, which lets customers reserve GPU instances across Nvidia hardware and AWS custom silicon up to eight weeks in advance, saw roughly a 15% increase in January. AWS is adding another increase of about 20% beginning July 1.
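Because the second increase applies to an already-raised price, the two adjustments compound rather than add. The short sketch below works through that arithmetic using only the approximate percentages reported above; the variable names are illustrative, and actual per-instance, per-region prices may not follow these rounded figures exactly.

```python
# Approximate increases as reported in the article (rounded figures).
january_increase = 0.15  # roughly 15% in January
july_increase = 0.20     # about 20% beginning July 1

# Successive percentage increases multiply; they do not simply add.
cumulative_increase = (1 + january_increase) * (1 + july_increase) - 1

print(f"Cumulative increase: {cumulative_increase:.0%}")
```

Under these rounded figures, the combined effect is about 38% over the pre-January price, slightly more than the 35% a simple sum would suggest.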

For example, an H200-based p5e.48xlarge instance cost around $34.61 per hour before January. After the July 1 adjustment, the same instance will run about $39 in most global regions, and $49.74 in the U.S. West. AWS stated that prices are updated periodically based on supply and demand.

Prices for compute resources are moving upward as demand for AI training capacity surges. According to Gartner, global public cloud end-user spending is forecast to reach $1.19 trillion in 2027, driven by the need for scalable GPU and memory-intensive infrastructure. IDC projects that AI infrastructure spending, including GPUs, high-bandwidth memory, and accelerated cloud services, will reach $105 billion by 2027, up from $44 billion in 2024. Enterprises are shifting from on-premises hardware toward cloud-based AI training and inference environments to meet these demands.
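To put the IDC projection in perspective, the implied compound annual growth rate can be estimated from the two figures cited above. This is a rough sketch that assumes a three-year span (2024 to 2027) and steady year-over-year growth; neither assumption comes from IDC, and the variable names are illustrative.

```python
# Figures cited in the article (in billions of US dollars).
spend_2024 = 44.0
spend_2027 = 105.0

# Assumption: three compounding periods between 2024 and 2027.
years = 2027 - 2024

# Compound annual growth rate: (end / start)^(1 / years) - 1
cagr = (spend_2027 / spend_2024) ** (1 / years) - 1

print(f"Implied annual growth: {cagr:.1%}")
```

On these assumptions, the projection works out to roughly one-third growth per year.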

Even with cloud spending rising quickly, the availability of GPUs continues to lag. Major cloud providers, including AWS, Microsoft Azure, and Google Cloud, are competing for scarce Nvidia H100 and H200 processors, as well as upcoming B100 GPUs. Advanced accelerators remain constrained by tight DRAM production cycles and limited supply of high-bandwidth memory (HBM). Long lead times and higher reservation costs make it harder for enterprises to secure the compute capacity they need.

Kubernetes adoption has become central to orchestrating production AI workloads. Research from the Cloud Native Computing Foundation indicates that more than 60% of organizations now run production AI or ML workloads on Kubernetes-based cloud environments. This shift intensifies demand for scalable GPU pools. The combination of cloud-native architectures, AI model training, and inference at scale is creating capacity pressure across multiple clouds, leading to increased competition for block reservations and higher costs for guaranteed access.

McKinsey estimates generative AI could add between $2.6 trillion and $4.4 trillion in annual value globally, while noting that infrastructure and compute costs are among the top constraints on deployment economics. This tension is reflected directly in AWS's pricing actions as cloud providers navigate a constrained global supply chain.

As organizations adopt robust AI governance frameworks such as the NIST AI RMF, there is an increasing need to architect compliant and scalable environments. These environments depend heavily on accelerators and high-bandwidth memory. Enterprises with long-term AI roadmaps often rely on predictable reservation blocks, while other organizations are shifting toward mixed strategies that pair reserved GPU capacity with on-demand usage to mitigate rising costs.
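A mixed strategy of this kind can be reasoned about with a simple cost model: a reserved baseline is paid for every hour whether or not it is used, and any demand above that baseline spills over to on-demand capacity. The sketch below is a minimal illustration under stated assumptions. The function name, the demand profile, and both hourly rates are hypothetical placeholders, not AWS prices, and the model ignores availability limits and minimum reservation terms.

```python
def mixed_strategy_cost(hourly_demand, reserved_instances,
                        reserved_rate, on_demand_rate):
    """Total cost of a reserved baseline plus on-demand overflow.

    hourly_demand: instances needed in each hour (hypothetical profile).
    reserved_instances: instances reserved for every hour in the window.
    reserved_rate / on_demand_rate: hypothetical per-instance hourly prices.
    """
    total = 0.0
    for needed in hourly_demand:
        # Reserved capacity is billed even when it sits idle.
        total += reserved_instances * reserved_rate
        # Demand beyond the reservation is served on demand.
        overflow = max(0, needed - reserved_instances)
        total += overflow * on_demand_rate
    return total


# Hypothetical four-hour window with one spike above the reserved baseline.
demand = [2, 2, 4, 1]
cost = mixed_strategy_cost(demand, reserved_instances=2,
                           reserved_rate=39.0, on_demand_rate=50.0)
print(f"Total cost for the window: ${cost:.2f}")
```

Varying reserved_instances against a realistic demand profile shows where idle reserved hours start to cost more than the overflow they avoid.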

The July 1 price change demonstrates that cloud economics are closely tied to physical supply constraints. As AI demand continues to outpace GPU supply, organizations are adjusting budgets and re-evaluating multi-cloud strategies to secure necessary compute capacity amidst rising infrastructure costs.