Amazon Halts Internal AI Leaderboard After Employees Inflate Usage

Key Takeaways

Amazon removed its Kirorank dashboard after employees inflated AI activity to improve their standing
The company reiterated its shift toward measuring normalized deployments instead of token consumption
Rising infrastructure costs and a shift to consumption-based AI pricing models influenced the decision

Amazon shut down an internal leaderboard known as Kirorank after employees inflated their AI activity to improve their standing. The dashboard, which graded employees on how actively they used the Kiro platform's AI capabilities, inadvertently encouraged staff to push unnecessary workloads. For a workforce adapting to changes in how AI integrates into engineering processes, the removal of the service addresses the unintended consequences of gamified metrics.

Kirorank was initially designed to build excitement around internal AI tools. Once usage scores became visible, however, the dynamic shifted. Some employees assigned autonomous AI agents to carry out trivial or redundant tasks because the metric rewarded volume over functional utility. An Amazon senior vice-president told staff that the tool was created with good intentions but ultimately drove unnecessary spending, and reminded developers not to use AI simply for the sake of using it.

Amazon is expected to spend $200bn on capital expenditure in 2026, with the vast majority directed toward AI systems and supporting data centers. As these environments scale, unmanaged compute usage creates direct financial strain. Industry analysts, including researchers at Gartner, note that accelerated AI adoption often leads enterprises to discover hidden infrastructure costs later than expected. Because Amazon must serve its cloud customers alongside its internal operations, needless internal compute usage competes with customer workloads for the same infrastructure.

The broader AI ecosystem has also experienced a pricing transformation, with organizations shifting away from flat monthly access to consumption-based billing. This model frequently produces unexpected cost overruns for enterprise clients, a trend tracked by consulting firms like McKinsey. Because Amazon utilizes external models alongside its own for internal development, unnecessary token usage carries a direct financial cost, making gamified consumption metrics unsustainable.

Large organizations must encourage AI adoption without inadvertently creating a race for inflated metrics. Previous initiatives at Amazon reportedly set AI usage targets for more than 80% of its developers to drive broad participation. However, pairing mandates with a competitive leaderboard generated conflicting incentives. This behavior mirrors patterns seen at other tech firms: Meta employees have similarly sought to boost their positions on internal tables by driving up token consumption, a trend also highlighted in coverage by Reuters.

The shutdown of Kirorank came as employees continued experimenting with Kiro and MeshClaw, an in-house tool that allows staff to run agents locally. According to the Financial Times, staff used these tools to manufacture extra AI traffic, prompting the company to intervene and halt the practice.

Metrics fundamentally shape developer behavior. Amazon's shift toward evaluating normalized deployments rather than raw token consumption indicates a preference for measuring whether engineers use models to create functional code. In mature AI engineering cultures, competencies are defined by outcome quality rather than raw interaction frequency.

Balancing rapid AI integration with parallel cost reduction efforts remains an operational priority. Amazon has trimmed expenses to redirect capital toward its broader AI infrastructure strategy, meaning internal leaderboards that drive up compute spending directly conflict with the company's financial goals.

Because of Amazon's massive scale, its internal policy adjustments influence broader market practices. Cloud customers monitoring the company's cost optimization efforts can interpret the rollback of runaway token usage as guidance for their own development frameworks. As organizations struggle to build governance around agentic systems, establishing strict policies against inflating usage metrics provides a necessary starting point for controlling infrastructure expenses.

The executive's message delivered a clear directive to focus on building better products rather than inflating token counts. As enterprise AI tooling matures, corporate enthusiasm is shifting from unchecked experimentation to operational reliability, forcing companies to align their internal incentives with actual business value.

Deprecating Kirorank inside a $2.9tn organization highlights how tech giants are rebalancing priorities to manage the high costs of generative AI. By steering away from vanity metrics, Amazon is establishing stricter guardrails for its developers and prioritizing measurable value creation over raw platform activity.