RabbitMQ Secure Architecture with TLS Termination at Proxy
I love contributing to the community by sharing real problems I’ve faced and the solutions I’ve learned along the way.
This article is based on a real-world issue I encountered, and I’m sharing it to help others avoid the same pitfalls. Please review and apply the ideas at your own discretion and risk.
All sensitive or confidential information has been carefully masked or removed. The focus is purely on the technical challenges, architectural decisions, and problem-solving process, not on any proprietary or internal details.
RabbitMQ: 3.13.7 Kubernetes: 1.33.1
Enabling TLS for RabbitMQ looks simple. In reality, once you run RabbitMQ on Kubernetes with clustering, external clients, browsers, and IoT devices, TLS becomes a trap full of sharp edges.
This post explains three critical TLS problems that cannot be solved with RabbitMQ configuration alone, and the only architecture that actually works in production today.
If you’re trying to combine:
- Erlang clustering
- Kubernetes DNS
- Vault PKI
- Let’s Encrypt
- mTLS for devices
- Browsers for humans
…you’ll likely hit the same wall we did.
The Goal
We wanted:
- Encrypted Erlang clustering
- Automatic certificate rotation
- Trusted HTTPS for browsers (Management UI)
- mTLS for IoT devices (MQTT)
- Clean separation of internal vs external trust
What we discovered:
RabbitMQ TLS, as we had it configured through a single tls block, is effectively global. You don’t get per-listener policies.
That single fact causes all the problems below.
Problem 1: Erlang Clustering Fails with TLS Hostname Verification
What Happened
We enabled TLS for RabbitMQ clustering using Vault-issued certificates:
tls:
  secretName: vault-rabbitmq-tls
  caSecretName: vault-ca-bundle
RabbitMQ pods immediately failed to cluster with this error:
{tls_alert,{handshake_failure,{bad_cert,hostname_check_failed}}}
Why This Fails
Erlang connects using pod-level DNS names, like:
xxx-rabbit-server-0.rabbitmq-nodes-svc.rabbitmq.svc.cluster.local
But our certificate only contained:
DNS: rabbitmq.rabbitmq.svc
Hostname mismatch → TLS handshake rejected → cluster never forms.
The Fix: Vault PKI Wildcard Certificates
You must include a wildcard DNS entry that matches every pod hostname.
Vault PKI role:
vault write pki/roles/rabbitmq-role \
  allowed_domains="*.rabbitmq-nodes-svc.rabbitmq.svc.cluster.local,rabbitmq.rabbitmq.svc" \
  allow_subdomains=true \
  allow_glob_domains=true
Certificate request:
altNames:
  - "*.rabbitmq-nodes-svc.rabbitmq.svc.cluster.local"
  - "rabbitmq.rabbitmq.svc"
Erlang hostname verification passes, and TLS clustering works.
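To make the mismatch concrete, here is a minimal Python sketch of leftmost-label wildcard matching, applied to the names from this section. It is a simplified model for illustration only, not the check Erlang actually runs, and the function name `san_matches` is made up for this example.

```python
def san_matches(hostname: str, san: str) -> bool:
    """Simplified SAN match: a leading '*' stands for exactly one DNS label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    san_labels = san.lower().rstrip(".").split(".")
    if len(host_labels) != len(san_labels):
        return False
    if san_labels[0] == "*":
        # The wildcard covers only the leftmost label; the rest must match exactly.
        return host_labels[1:] == san_labels[1:]
    return host_labels == san_labels


pod = "xxx-rabbit-server-0.rabbitmq-nodes-svc.rabbitmq.svc.cluster.local"
original_sans = ["rabbitmq.rabbitmq.svc"]
fixed_sans = [
    "*.rabbitmq-nodes-svc.rabbitmq.svc.cluster.local",
    "rabbitmq.rabbitmq.svc",
]

print(any(san_matches(pod, s) for s in original_sans))  # False
print(any(san_matches(pod, s) for s in fixed_sans))     # True
```

In this model the wildcard stands for a single label, so the entry matches each pod name under the headless service but not a name nested one level deeper.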
Problem 2: You Cannot Mix Let’s Encrypt and Vault CA
What We Wanted
- Let’s Encrypt for external access
- Vault CA for internal clustering
tls:
  secretName: rabbitmq-server-tls   # Let's Encrypt
  caSecretName: vault-ca-bundle     # Vault CA
Why This Fails (Always)
RabbitMQ does this internally:
- Node A connects to Node B
- Node B presents Let’s Encrypt cert
- Node A validates using Vault CA
- CA mismatch → {bad_cert, unknown_ca}
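The steps above can be modelled in a few lines of Python. This is a toy sketch under loud assumptions: real verification checks signatures along a certificate chain, while here a certificate is just a subject plus an issuer label. The `Cert` class, `verify_peer`, the CA labels, and the hostname `rabbitmq.example.com` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # label of the CA that signed this certificate


def verify_peer(peer_cert: Cert, trusted_cas: set[str]) -> str:
    """Return 'ok' if the peer's issuer is in the verifier's CA bundle."""
    return "ok" if peer_cert.issuer in trusted_cas else "unknown_ca"


# Hypothetical label standing in for the contents of vault-ca-bundle.
vault_bundle = {"vault-ca"}

lets_encrypt_cert = Cert(subject="rabbitmq.example.com", issuer="lets-encrypt")
vault_cert = Cert(subject="rabbitmq.rabbitmq.svc", issuer="vault-ca")

print(verify_peer(lets_encrypt_cert, vault_bundle))  # unknown_ca
print(verify_peer(vault_cert, vault_bundle))         # ok
```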
The Rule (Non-Negotiable)
The certificate and CA must come from the same authority.
If:
caSecretName: vault-ca-bundle
Then:
secretName: MUST be Vault-issued
You cannot mix CAs inside RabbitMQ.
Problem 3: TLS and mTLS Settings Are Global (This Breaks Everything)
RabbitMQ has global SSL options:
ssl_options.fail_if_no_peer_cert = true
When TLS is enabled:
- Same certificate
- Same CA
- Same mTLS policy
…apply to every listener:
Port     Purpose
25672    Erlang clustering
5671     AMQP
8883     MQTT
15671    Management UI
Why This Is a Disaster
Issue A: Browsers Don’t Trust Vault CA
If you add external DNS to Vault certs:
- Browsers show “Not Secure”
- Users lose trust instantly
Installing a private CA in every browser? Not realistic, and not scalable.
Issue B: mTLS Breaks the Management UI
If you enforce mTLS for IoT devices:
fail_if_no_peer_cert = true
Then:
- MQTT devices work
- Browsers fail (no client cert)
There is no way to disable mTLS only for the UI.
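As a rough illustration of the behavior described here, the following Python sketch applies one shared `fail_if_no_peer_cert` value to two listeners. It models only the policy decision, not RabbitMQ itself; the `Client` class, the `handshake` function, and the listener labels are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    name: str
    has_client_cert: bool


def handshake(client: Client, fail_if_no_peer_cert: bool) -> bool:
    """Accept the connection unless a client cert is required and missing."""
    return client.has_client_cert or not fail_if_no_peer_cert


# One setting shared by every listener, as in the setup described above.
GLOBAL_FAIL_IF_NO_PEER_CERT = True

listeners = {
    "mqtt:8883": Client("iot-device", has_client_cert=True),
    "management:15671": Client("browser", has_client_cert=False),
}

results = {
    listener: handshake(client, GLOBAL_FAIL_IF_NO_PEER_CERT)
    for listener, client in listeners.items()
}
print(results)  # {'mqtt:8883': True, 'management:15671': False}
```

Flipping the shared flag to `False` lets the browser in, but it also stops requiring certificates from devices, which is exactly the trade-off a single policy forces.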
Why RabbitMQ 4.x Doesn’t Save You
I read the RabbitMQ 4.2.2 documentation.
Result:
- Same global TLS limitation
- No per-listener SSL policies
Future versions might fix this, but production can’t wait on unstable releases.
The Only Practical Solution: TLS Termination Proxy
The fix is architectural, not configurational.
The Key Idea
Separate external TLS from the internal mTLS used for cluster formation.
Use:
- Let’s Encrypt at the edge
- Vault PKI internally
- Proxy to enforce per-service policies
Final Architecture
Erlang clustering over mTLS
External Traffic (Trusted by Everyone)
- Browser / IoT Device
- TLS terminated at Ingress / HAProxy / Nginx
- Let’s Encrypt certificate
- Optional mTLS enforced at the proxy
Client → HTTPS/MQTTS → Proxy → HTTP/MQTT → RabbitMQ
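The external flow can be sketched as a small route table in Python, where each route carries its own client-certificate policy. This is a conceptual model rather than a proxy configuration: the `Route` class, the route names, and the `admit` function are hypothetical, and a real deployment would express the same idea in HAProxy, Nginx, or Ingress settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    external: str              # protocol the client speaks to the proxy
    backend: str               # protocol the proxy speaks to RabbitMQ
    require_client_cert: bool  # per-route mTLS policy enforced at the proxy


# Hypothetical route table; names are illustrative only.
ROUTES = {
    "management-ui": Route("HTTPS", "HTTP", require_client_cert=False),
    "mqtt-devices": Route("MQTTS", "MQTT", require_client_cert=True),
}


def admit(route_name: str, has_client_cert: bool) -> str:
    """Apply the route's own mTLS policy, then forward to the backend."""
    route = ROUTES[route_name]
    if route.require_client_cert and not has_client_cert:
        return "rejected at proxy"
    return f"{route.external} -> proxy -> {route.backend} -> RabbitMQ"


print(admit("management-ui", has_client_cert=False))
print(admit("mqtt-devices", has_client_cert=True))
print(admit("mqtt-devices", has_client_cert=False))
```

The point of the design is that the policy lives on the route, so a browser without a client certificate and a device with one can both be served without sharing a setting.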
Internal Traffic (Trusted Only by RabbitMQ)
- Erlang clustering over TLS
- Vault-issued certificates
- Wildcard DNS
- Full mTLS between nodes
Why This Works
- Browsers see trusted certs
- IoT devices get mTLS
- Clustering is encrypted
- Certificates rotate automatically
- No RabbitMQ hacks
- Works on 3.13.x and 4.2.x
Key Takeaways
- RabbitMQ TLS is global
- You cannot mix CAs
- You cannot set per-listener TLS policies
- Vault PKI is perfect for internal trust
- Let’s Encrypt is perfect for external trust
- A TLS termination proxy is not optional at scale
Conclusion
This architecture isn’t fancy; it’s necessary.
If you’re running RabbitMQ on Kubernetes with real-world requirements (browsers, devices, security teams), TLS termination is the only sane solution.
Sometimes the right fix isn’t another config flag; it’s accepting the system’s limits and designing around them.
If this saved you a few days of debugging Erlang TLS errors, feel free to share it.