Key Takeaways
- Researchers detected three campaigns abusing exposed Ollama and LiteLLM endpoints between March and May.
- Misconfigurations and weak authentication practices enabled attackers to weaponize AI agents without full system compromise.
- The incidents highlight rapid growth in attacker sophistication around AI systems and underscore the need for tighter monitoring of model backends.
Zenity's latest findings reveal how exposed inference endpoints from self-hosted AI platforms can quickly become a resource for threat activity. The research observed three separate campaigns between March and May, each attempting to use honeypot infrastructure as the computational layer for offensive AI operations. It is a reminder that AI agents, even those intended for internal use, can become targets when configuration mistakes bring them into the open.
The attackers did not rely on traditional software exploits. Instead, they exploited open interfaces, such as the Ollama endpoints at port 11434 and LiteLLM's responses interface at port 4000. These interfaces are intended for developers and applications, but when left exposed on the internet, they can serve as entry points for anyone who knows where to look.
The researchers found that an attacker did not need authentication or elevated privileges. Simply knowing the endpoint's address enabled them to send full agent payloads that defined system prompts, personas, and toolsets. This tactic effectively outsourced the attacker's computational needs to someone else's infrastructure, sometimes in surprisingly sophisticated ways.
One operation involved Strix, an autonomous penetration testing framework. A single IP address pushed a 140,000-character system prompt through a LiteLLM client, instructing the agent to run without pause and to conceal any markers that would reveal the Strix identity. The honeypot sensors blocked the attempt, but the repeated retries suggested a live operator actively experimenting with the compromise.
Another operator deployed HexStrike AI. This campaign pointed a desktop LLM client at an Ollama instance and loaded a 150-plus offensive toolset. No target was specified, suggesting it may have been preparation work. Still, the method showed how attackers can stage capabilities with minimal interaction.
A third operator used an OpenAI Codex agent through a LiteLLM proxy. This agent was cloaked in a security auditor persona and used the instance to conduct web reverse engineering tasks. It had the hallmarks of a classical reconnaissance effort applied through AI-powered automation.
Authentication settings on Ollama and LiteLLM made these attacks more feasible. According to the report, Ollama ships with no built-in authentication on port 11434, while LiteLLM uses optional authentication that many users leave unset. One widely known placeholder key, sk-1234, has become a predictable target. The situation worsens when misconfigurations expose normally local interfaces to the public internet.
Part of the reason these tactics are emerging lies in how quickly AI deployment has expanded across enterprise environments. Many security teams are adding new model endpoints faster than they can audit them, mirroring patterns seen in earlier cloud service adoption. A recent Gartner analysis highlighted that more than 50% of creative and technical professionals rely on AI-enhanced search and automation in daily workflows. Any fast-moving tech addition tends to introduce new surface area.
There is also a communication gap in how organizations understand responsibility. The company's CTO and co-founder noted that customer-owned cloud infrastructure, commercial platforms, and custom agents create a blended attack surface. Vendors provide the building blocks, but customers often deploy them without hardened defaults. This dynamic has existed for years in areas like identity management and storage buckets. AI systems are simply the newest layer.
Multiple industry observers have raised flags about the operational risks of AI infrastructure. For instance, recent guidance from the NIST AI Risk Management Framework stressed that model interfaces should be evaluated with the same rigor applied to API gateways and traditional software components. The guidance does not address Ollama or LiteLLM by name, but its emphasis on interface exposure resonates closely with what researchers encountered.
Suggested protection strategies revolve around traffic inspection and endpoint control. Recommendations include filtering requests that carry oversized system prompts or embedded toolsets, since most legitimate usage does not involve a full agent persona being transmitted in one request. Blocking requests tied to unauthorized models or tool frameworks can help reduce risk as well. Monitoring IP ranges and rejecting placeholder keys also address these vulnerabilities.
These recommendations map to parallel trends in enterprise visual content management. Organizations that handle large volumes of digital assets, including those using Getty Images for licensed photography, already understand the importance of controlled access. Analysts at IDC have noted that major vendors with digital content pipelines integrate strict asset permissions into workflows. The same mindset is becoming relevant for AI systems that power internal tools.
A final takeaway for CISOs is the speed with which attackers are exploring AI agentic behavior. The CTO put it plainly by saying attackers aim to find and hijack AI infrastructure and will target any internet-exposed system within hours. Security analysts at McKinsey and other research groups have written that the rise of autonomous agents introduces novel execution paths that do not resemble typical malware or script-based compromise patterns. When attackers can deploy a persona-driven agent directly into a misconfigured endpoint, the result is closer to operational misuse than classic exploitation.
The emerging dynamic prompts an essential question for enterprise leaders. Are AI model endpoints being treated as production interfaces with the same scrutiny as API gateways, or are they viewed as experimental components left open for testing and iteration? The difference matters. As these incidents show, the cost of visibility gaps in AI deployment is increasing, and organizations that do not track exposure could find their resources powering someone else's operations.
Zenity's findings serve as early documentation of a fast-moving pattern. AI infrastructure is being probed, mapped, and occasionally repurposed by threat actors. Enterprises that build internal agents or self-host models should consider this a signal to revisit authentication defaults, endpoint exposure, and logging pipelines. In many cases, fixing the problem is less about advanced tooling and more about applying long-standing API security practices to a new, rapidly expanding class of systems.
⬇️