Cisco’s Talos security research team has found over 1,100 Ollama servers exposed to the public internet, where miscreants can use them to do nasty things.
Ollama provides a framework that makes it possible to run large language models locally, on a desktop machine or server. Cisco decided to research it because, in the words of Senior Incident Response Architect Dr. Giannis Tziakouris, Ollama has “gained popularity for its ease of use and local deployment capabilities.”
Talos researchers used the Shodan search engine to find unsecured Ollama servers, and spotted over 1,100, around 20 percent of which are “actively hosting models susceptible to unauthorized access.” Cisco’s scan found over 1,000 exposed servers within 10 minutes of commencing its sweep of the internet.
Leaving an Ollama server dangling on the open internet means anyone who learns of its existence could query the LLM and use its API, perhaps consuming all its resources or running up a big bill for hosted systems. Targeted attacks are possible because Cisco found many of the servers expose information that makes it possible to identify hosts.
Cisco’s infosec investigators also worry about the following consequences:
Cisco classified 80 percent of the open Ollama servers it spotted as “dormant” because they were not running any models, meaning the above attacks would be futile. The bad news is that those servers “remain susceptible to exploitation via unauthorized model uploads or configuration manipulation.”
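As a rough illustration of how such a check works, here is a minimal sketch that queries a host's Ollama API and applies the dormant-versus-active distinction. It assumes Ollama's default port (11434) and its standard unauthenticated `/api/tags` endpoint, which lists installed models; the `probe` helper and classification labels are illustrative, not Cisco's methodology.

```python
import json
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default API port


def classify_exposure(tags_json: str) -> str:
    """Classify an exposed Ollama server from its /api/tags response body.

    A reachable, unauthenticated /api/tags endpoint already means the
    server is exposed; an empty model list corresponds to what Cisco
    describes as a "dormant" server.
    """
    models = json.loads(tags_json).get("models", [])
    return "hosting models" if models else "dormant"


def probe(host: str, timeout: float = 3.0) -> str:
    """Fetch /api/tags from a host and classify it (requires network access)."""
    url = f"http://{host}:{OLLAMA_PORT}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return classify_exposure(resp.read().decode())
```

The point of the sketch is how little is needed: no credentials, no exploit, just an HTTP GET against a well-known port.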
But Dr. Tziakouris warned “their exposed interfaces could still be leveraged in attacks involving resource exhaustion, denial of service, or lateral movement.”
The USA is home to most of the exposed servers (36.6 percent), followed by China (22.5 percent) and Germany (8.9 percent).
Tziakouris concluded the findings of the Cisco study “highlight a widespread neglect of fundamental security practices such as access control, authentication, and network isolation in the deployment of AI systems.” That’s often the case when organizations rush to adopt the new hotness, frequently without informing IT departments because they don’t want to be told to slow down and do security right.
He thinks things may get worse.
“The uniform adoption of OpenAI-compatible APIs further exacerbates the issue, enabling attackers to scale exploit attempts across platforms with minimal adaptation,” he wrote, before calling for development of “standardized security baselines, automated auditing tools, and improved deployment guidance for LLM infrastructure.”
He also acknowledged that Cisco’s Shodan scan cannot deliver a definitive view on LLM security, and called for work on tools that include adaptive fingerprinting and active probing techniques, and target other model-hosting frameworks including Hugging Face, Triton, and vLLM, to help researchers better understand the situation. ®