Background
Last month, Cisco researchers detected over 1,000 Ollama instances within the first 10 minutes of Shodan scanning on port 11434. Other services were also identified: vLLM/llama.cpp/LangChain on 8000, LM Studio on 1234, and GPT4All on 4891.
Later, Censys found 10.6K Ollama instances publicly available online, and 1.5K of them responded to prompts. That poses not only a serious security risk of RCE, injection, and poisoning, but also the possibility of exposing private chat memory via unauthorized prompting.
Therefore, if an AI instance is used for handling sensitive information, it should always be hosted locally and treated with roughly the same security level as a NAS. Ideally, run it inside a hardened network: behind a hardware firewall such as pfSense/OPNsense, with internet access blocked by a firewall rule, or compartmentalized inside a VLAN with a LAN-only ACL configuration.
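If you manage the server yourself, a host-level firewall already goes a long way. Here is a minimal sketch using ufw, assuming Ollama on port 11434 and a 192.168.1.0/24 LAN subnet (both are assumptions, adjust to your environment):
sudo ufw default deny incoming
sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp     # keep SSH reachable from the LAN
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp  # Ollama, LAN clients only
sudo ufw enable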
But that might be too much for most AI enthusiasts, who are usually less enthusiastic about networking, so I want to provide some easy solutions to raise their level of security and privacy.
As the mitigation sections of the articles above state, letting the service listen on 0.0.0.0 is the main cause. In most cases, as long as your server is behind a router, it should be shielded from the internet by default. However, depending on your specific network environment, the service can still be accidentally exposed in many ways.
So what we need is to change the listening address from accepting requests from anywhere to a restricted network, i.e. the LAN (192.168.x.x), or localhost (127.0.0.1) if you only use it on the server machine.
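Before changing anything, it is worth checking what your machine is actually listening on. A quick sketch (the ports are the defaults mentioned above; a 0.0.0.0 or [::] address means the service accepts connections on every interface):
ss -tlnp | grep -E '11434|8000|1234|4891|7860|8501'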
I will not cover local AI engines like LM Studio, GPT4All, llama.cpp, Koboldcpp, Jan… They are meant to be used on localhost even if they have the ability to act as a server. These tools can run offline and can be secured by application firewalls like OpenSnitch or Portmaster.
Ollama Service
First, let’s deal with Ollama. If you installed Ollama on Linux with the install script, you need to change the service configuration via sudo nano /etc/systemd/system/ollama.service.d/override.conf (you can also create this override with sudo systemctl edit ollama):
[Service]
Environment="OLLAMA_HOST=192.168.x.x"
Just replace 0.0.0.0 (or the 192.168.x.x placeholder above) with the actual IP address of the server, then apply the changes with
sudo systemctl daemon-reload
sudo systemctl restart ollama
Now, your Ollama service is only reachable within your LAN, and nobody from the internet can get access to it.
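To verify the change (a quick sketch, assuming you set OLLAMA_HOST to your server’s LAN IP): the root endpoint should answer from inside the LAN, and the listener should no longer show 0.0.0.0.
curl http://192.168.x.x:11434    # should reply "Ollama is running"
ss -tlnp | grep 11434            # should show 192.168.x.x:11434, not 0.0.0.0:11434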
Docker/Harbor
If your Ollama instance (or any other service) is hosted via Docker, simply change -p 11434:11434 to -p 192.168.x.x:11434:11434, using your server’s IP, when you start it up.
docker run -d --gpus=all -v ollama:/root/.ollama -p 192.168.x.x:11434:11434 --name ollama ollama/ollama
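If you prefer Compose, the same LAN-only binding can be expressed in docker-compose.yml. A minimal sketch, assuming the official ollama/ollama image and the same named volume as the command above (GPU options omitted for brevity):
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "192.168.x.x:11434:11434"   # publish on the LAN IP only, never 0.0.0.0
volumes:
  ollama: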
In contrast, Harbor is more secure by default. You have to explicitly configure tunnels in order to get internet exposure.
So, by default, Harbor sets its URLs as:
$ harbor url ollama
http://localhost:33821
$ harbor url --lan ollama
http://192.168.x.x:33821
$ harbor url -i ollama
http://harbor.ollama:11434
To see more details, run harbor config list | grep OLLAMA
OLLAMA_CACHE                   ~/.ollama
OLLAMA_HOST_PORT               33821
OLLAMA_VERSION                 latest
OLLAMA_INTERNAL_URL            http://ollama:11434
OLLAMA_DEFAULT_MODELS          mxbai-embed-large:latest
OLLAMA_CONTEXT_LENGTH          4096
HOLLAMA_HOST_PORT              33871
CHATUI_OLLAMA_MODEL            llama3.1:8b
Its internal URL points inside a Docker network, which is theoretically already isolated from the internet. Although we can override it with a LAN IP via harbor config set ollama.internal_url http://192.168.x.x:11434, I don’t think that is really needed. Please let me know if anybody finds that hardening Harbor services is necessary.
Gradio
Many Python projects use Gradio as their WebUI, like Stable Diffusion and GPT-SoVITS. They usually do some hardening by default, but not always.
I have noticed and addressed Gradio’s privacy intrusion for quite a long time. Many Gradio apps have data collection (analytics/telemetry) enabled by default, with no opt-in consent, which arguably violates GDPR.
So, here is the way to run Gradio securely and make it respect the user’s privacy:
GRADIO_SHARE=False GRADIO_SERVER_NAME=192.168.x.x GRADIO_SERVER_PORT=7860 GRADIO_ANALYTICS_ENABLED=False DISABLE_TELEMETRY=1 python app.py
Let’s break down the environment variables briefly.
- GRADIO_SHARE=False disables sharing a public URL like https://somethingrandom.gradio.live
- GRADIO_SERVER_NAME=192.168.x.x restricts the server to being reachable only through the LAN. Feel free to use 0.0.0.0 if your network is properly configured, and use the default value 127.0.0.1 for localhost-only usage.
- GRADIO_SERVER_PORT=7860 sets the default port. Change it to something else to avoid port scanning or for better port management.
- GRADIO_ANALYTICS_ENABLED=False disables the privacy-intrusive activities.
- DISABLE_TELEMETRY=1 does the same as above; it is deprecated.
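If you launch such apps often, it may be handier to wrap the variables in a small launcher script instead of retyping them. A sketch, where run_gradio.sh and app.py are placeholder names and the IP is yours to adjust:
#!/usr/bin/env bash
# run_gradio.sh - start a Gradio app bound to the LAN, with sharing and analytics disabled
export GRADIO_SHARE=False
export GRADIO_SERVER_NAME=192.168.x.x
export GRADIO_SERVER_PORT=7860
export GRADIO_ANALYTICS_ENABLED=False
export DISABLE_TELEMETRY=1
python app.py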
Streamlit
Another popular framework, used by many Whisper-based projects such as subsai and VideoLingo, is Streamlit. Many projects built on this WebUI have even worse default settings: they not only collect private data without consent, but also expose their service port publicly to the internet.
Here is what I have been using:
streamlit run app.py --browser.gatherUsageStats false --server.port 8501 --browser.serverAddress 192.168.x.x --server.headless false
These options are simpler to explain: browser.gatherUsageStats false disables the data collection, and server.port sets the port number.
By default, you will get an unwanted External URL exposed to the public internet:
Network URL: http://192.168.x.x:8501
External URL: http://xxx.xxx.xxx.xxx:8501
Pass either --browser.serverAddress 192.168.x.x or --server.headless false to mitigate the External URL problem. After hardening, you will get either a single URL or something like this:
Local URL: http://localhost:8501
Network URL: http://192.168.x.x:8501
Unfortunately, these options are not well documented, although you can also put them into the config.toml file.
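For reference, the same settings expressed in ~/.streamlit/config.toml would look roughly like this (option names match the flags above; adjust the address and port to your LAN):
[browser]
gatherUsageStats = false
serverAddress = "192.168.x.x"

[server]
port = 8501
headless = false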
System/Browser
Other tools like vLLM and huggingface_hub have more dedicated options such as VLLM_NO_USAGE_STATS=1 and HF_HUB_DISABLE_TELEMETRY=1, but they also support more universal ones like DO_NOT_TRACK=1.
When we don’t need the portability of passing arguments, we can set the common or essential ones as defaults for our shell via nano ~/.bashrc
export GRADIO_SHARE="False"
export GRADIO_ANALYTICS_ENABLED="False"
export GRADIO_SERVER_NAME="192.168.x.x"
export TRANSFORMERS_OFFLINE=1
export DISABLE_TELEMETRY=1
export DO_NOT_TRACK=1
export HF_HUB_OFFLINE=1
export HF_DATASETS_OFFLINE=1
export HF_HUB_DISABLE_IMPLICIT_TOKEN=1
export HF_HUB_DISABLE_TELEMETRY=1
Then run source ~/.bashrc to apply the changes.
Note: In some cases, you need to put these options into an overriding file like .env, .sh, .toml, .json, .bat…
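For example, a project that reads a .env file could carry the same defaults per project rather than shell-wide (a sketch; which variables a given project actually honors depends on that project):
# .env - per-project privacy defaults (no "export" keyword here)
GRADIO_ANALYTICS_ENABLED=False
DISABLE_TELEMETRY=1
DO_NOT_TRACK=1
HF_HUB_DISABLE_TELEMETRY=1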
Thanks to u/campingtroll for collecting all these privacy options in this thread. Based on that, I added some more to harden the security aspect.
This way, you can ensure the affected projects will respect your privacy and stay isolated from the internet. You can always turn things back on when needed.
In the end, always make sure the Linux server itself is hardened properly (at least to the bare minimum); otherwise, service/application-level hardening is nonsensical shenanigans.
On the client side, choose a trustworthy browser dedicated to local WebUI services. Here is a rule of thumb:
- Avoid proprietary browsers like Edge and Opera/Vivaldi (although they have some very nice features), and avoid piling on extensions/plugins. Less is more
- Avoid browsers backed by a private company, even if they claim to be privacy-focused or open source, like Brave, DuckDuckGo, SRWare Iron, Comodo Dragon…
- Avoid Google Chrome and vanilla Firefox; apply arkenfox or Betterfox if you stick with an existing Firefox
- Be cautious when choosing FOSS projects that are not well maintained, like Iridium, Mercury, r3dfox (although they are super cool and okay to use)
- Choose FOSS projects that are backed by an active community and are pro-privacy, like Ungoogled Chromium, LibreWolf, Waterfox, Thorium, Supermium
- Choose old-fashioned FOSS projects with fewer features/extensions like Pale Moon/Basilisk/SeaMonkey, IceCat, Otter, Midori (because they are awesome and most suitable for the purpose)