New Accelerator from OpenAI and Broadcom Improves LLM Inference Performance Per Watt

Key Takeaways

OpenAI and Broadcom introduced Jalapeño, a custom inference accelerator designed from scratch for modern LLM workloads.
Early testing indicates substantially better performance per watt than current state-of-the-art accelerators.
The chip marks the start of a multi-generation roadmap aimed at gigawatt-scale deployments.

OpenAI and Broadcom have taken another step into the AI infrastructure race with the debut of Jalapeño, a purpose-built inference accelerator engineered around large language model workloads. It arrives as enterprises increasingly scrutinize the performance and operating costs of running LLMs at scale, and it fits the broader trend of hyperscalers designing their own silicon. According to a 2024 IDC report, custom accelerator programs from companies such as Google and AWS have steadily expanded as demand has intensified. Analysts frame this shift as a response both to economic pressure and to the need to control performance characteristics that general-purpose GPUs were never designed to optimize.

What sets the Jalapeño program apart is its speed. OpenAI and Broadcom went from initial design to tape-out in nine months, an unusually short cycle for high-performance semiconductors. The companies credit deep hardware-software co-development, plus the use of OpenAI's own models to accelerate design work: AI helping to build the next generation of AI hardware. Whether that design loop becomes a standard industry pattern remains an open question, but analysts at research firms such as Gartner note that more vendors are experimenting with similar approaches.

Jalapeño was architected as a clean-sheet design focused entirely on LLM inference. OpenAI emphasized that it is neither a general-purpose accelerator nor a repurposed older AI chip. Instead, it reflects the serving patterns and kernel behavior OpenAI observes daily across ChatGPT, Codex, its API, and early agentic products. That granular knowledge shaped decisions about memory movement, networking, and utilization so that the chip can run real workloads closer to its theoretical peak. Early testing indicates that Jalapeño delivers substantially better performance per watt than today's leading accelerators. The companies plan to publish a detailed technical report on performance in the coming months, once final numbers are available.

OpenAI frames Jalapeño as part of a full-stack infrastructure strategy in which chips, kernels, networking, scheduling, and product experience are developed in sync. Several infrastructure-focused research groups share that view. The IEEE has also highlighted the growing influence of system-level co-design in AI fabrics, particularly where high-speed Ethernet and advanced interconnects determine real-world performance as much as the silicon itself does. Broadcom's networking portfolio, including its Tomahawk Ethernet switch chips, adds an interconnect dimension to the partnership that many industry observers find notable.

Power consumption remains a critical variable in AI infrastructure. With AI data centers projected to account for up to 4% of global electricity consumption by 2030, according to IEA research, the performance-per-watt improvements OpenAI cites carry weight. Even incremental reductions in data movement or idle cycles can translate into substantial operational savings at scale. When OpenAI talks about enabling gigawatt-scale clusters over multiple generations, those efficiencies become more than a marketing point; they influence siting decisions, data center design, and capital allocation. That raises a broader question: how quickly will hardware efficiency gains offset the rising demand for larger and more capable models?
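To put the per-watt argument in rough numbers, here is a back-of-the-envelope sketch estimating what a given performance-per-watt gain would save in annual electricity cost for a fleet serving a fixed workload. Every input is a hypothetical placeholder (a 1 GW fleet, $0.08 per kWh, a 30% improvement), not a figure from OpenAI or Broadcom, which have not yet published final performance numbers. The sketch also assumes constant full-power draw and ignores cooling and other facility overhead.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours; leap years ignored


def annual_energy_cost_usd(avg_power_mw: float, price_per_kwh: float) -> float:
    """Annual electricity cost for a facility drawing avg_power_mw continuously."""
    kwh = avg_power_mw * 1000 * HOURS_PER_YEAR  # MW -> kW, then kW * h = kWh
    return kwh * price_per_kwh


def power_for_same_throughput(baseline_power_mw: float, perf_per_watt_gain: float) -> float:
    """Power needed to serve the same workload when performance per watt
    improves by a factor of perf_per_watt_gain (1.3 means 30% better)."""
    if perf_per_watt_gain <= 0:
        raise ValueError("perf_per_watt_gain must be positive")
    return baseline_power_mw / perf_per_watt_gain


# Hypothetical inputs for illustration only; none come from the announcement.
baseline_mw = 1000.0  # a 1 GW inference fleet
price = 0.08          # $/kWh
gain = 1.3            # 30% better performance per watt

base_cost = annual_energy_cost_usd(baseline_mw, price)
new_cost = annual_energy_cost_usd(power_for_same_throughput(baseline_mw, gain), price)

print(f"baseline: ${base_cost / 1e6:,.1f}M per year")
print(f"improved: ${new_cost / 1e6:,.1f}M per year")
print(f"savings:  ${(base_cost - new_cost) / 1e6:,.1f}M per year")
```

Because serving the same throughput at 1.3x performance per watt needs only 1/1.3 of the power, savings scale as 1 - 1/gain, roughly 23% in this hypothetical case, regardless of fleet size or electricity price.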

Better infrastructure tends to feed back into better model capabilities and more efficient serving. OpenAI describes this as a flywheel effect: improved compute efficiency enables new model generations, which then feed product innovation and user growth, funding the next round of infrastructure expansion. While this is a familiar pattern for hyperscale platforms, the difference now is that inference is where the majority of user interactions occur. Milliseconds matter, reliability is critical, and cost transparency is essential when developers and businesses depend on predictable API economics.

Jalapeño samples are already running ML workloads in OpenAI labs, including GPT-5.3-Codex-Spark at production-target frequency and power envelopes. Production-scale deployments will anchor a multi-generation roadmap spanning accelerators, networking, system integration, and partner data centers. Celestica’s role in board and rack design highlights the industrialization stage that follows chip development, a phase often less visible but just as consequential for scaling.

NVIDIA’s Hopper and Blackwell GPUs, AMD’s MI300 accelerators, and cloud chips from Google and AWS form the landscape Jalapeño will enter. Analysts at firms like McKinsey and Deloitte have observed that the accelerator market is expanding rapidly, with global spending expected to surpass $150 billion by 2027. The variety of architectures now competing for inference workloads suggests this is no longer a single-vendor market, but an increasingly heterogeneous one where system design choices can shift quickly.

Making advanced AI more accessible requires lowering the cost and raising the reliability of inference at massive scale. Every incremental gain in performance or efficiency can ripple outward as faster responses in ChatGPT, longer Codex task chains with less waiting, or more predictable API pricing for enterprises and small developers. Jalapeño is positioned as a building block for those outcomes. Whether it ultimately reshapes inference economics in the way OpenAI and Broadcom intend will depend on real-world benchmarks, supply chain execution, and how rapidly the broader ecosystem evolves ahead of initial deployments.
