GitHub: https://github.com/InfiniteConsult/0005_cicd_part01_docker
TL;DR: In this first installment, we reject the fragility of default Docker environments. We build a permission-safe "Control Center" using Docker-out-of-Docker (DooD), construct a private "Road Network" (cicd-net) for internal DNS resolution, and pour the "Foundations" using a hybrid persistence strategy (Bind Mounts vs. Named Volumes) to ensure our data survives container restarts.
The Sovereign Software Factory Series:
- Part 01: Building a Sovereign Software Factory: Docker Networking & Persistence (You are here)
- Part 02: Building a Sovereign Software Factory: The Local Root CA & Trust Chains
- Part 03: Building a Sovereign Software Factory: Self-Hosted GitLab & Secrets Management
- Part 04: Building a Sovereign Software Factory: Jenkins Configuration as Code (JCasC)
- Part 05: Building a Sovereign Software Factory: Artifactory & The "Strict TLS" Trap
- Part 06: Building a Sovereign Software Factory: SonarQube Quality Gates
- Part 07: Building a Sovereign Software Factory: ChatOps with Mattermost
- Part 08: Building a Sovereign Software Factory: Observability with the ELK Stack
- Part 09: Building a Sovereign Software Factory: Monitoring with Prometheus & Grafana
- Part 10: Building a Sovereign Software Factory: The Python API Package (Capstone)
Chapter 1: Introduction - Rejecting Fragility
1.1 The Goal: Building the "Factory Floor"
The Problem: Default Docker is Isolated and Ephemeral
Before we can deploy a single CI/CD tool, we must first confront the "pain points" of a default Docker installation. We cannot simply run docker run gitlab and docker run jenkins and expect them to work. This is because a default setup is fundamentally fragile, suffering from two critical flaws:
- Network Isolation: By default, Docker containers are like isolated, soundproof "bubbles." They cannot find or communicate with each other. Our Jenkins container would have no way to find the GitLab container, making it useless for integration.
- Container Ephemerality: A container’s filesystem is ephemeral. This is the most dangerous flaw. It’s like an Etch A Sketch: the moment you stop and remove a container, all the data written inside it—your Git repositories, your build logs, your user accounts—is permanently destroyed.
This fragility is unacceptable for a stateful, interconnected stack like a CI/CD pipeline, which must communicate and must persist data.
The Analogy: "CI/CD City Planning"
This article is about "city planning." Before a city can build its first skyscraper (GitLab) or factory (Jenkins), the city planner must lay down the fundamental infrastructure. A good plan here makes the entire city function.
We will build:
- The "Control Center": A single, secure place from which to manage all construction.
- The "Roads": A custom network grid so all buildings can communicate.
- The "Foundations": Permanent, zoned land plots for each building to store its data.
The Solution: Our Three-Part Foundation
We will solve these problems sequentially, building our foundation layer by layer. This article will guide you through building the absolute minimum viable foundation for a professional, multi-service Docker environment.
1.2 The "Why": Choosing Docker as Our Foundation
Before we build our "city," we must ask a fundamental question: why are we building it with Docker? Why not install GitLab, Jenkins, and SonarQube directly on our host operating system?
The Problem: "Dependency Hell" and Server "Drift"
The traditional method of server setup is a high-stakes, one-way process. You would SSH into a server and run apt install ... for every service. This creates a fragile, unmanageable system.
- The "Dependency Hell" Pain Point: What happens when GitLab requires one version of PostgreSQL, but SonarQube requires a different, conflicting version? What happens when Jenkins requires Java 17, but Artifactory needs Java 11? You are now stuck in "dependency hell," trying to make incompatible tools coexist on one machine.
- The "Server Drift" Pain Point: Your development machine and your production server inevitably "drift" apart. The server has packages and configurations that your local machine doesn’t, leading to the most dreaded phrase in engineering: "But it works on my machine."
- The "Heavy VM" Problem: The old solution was to use Virtual Machines (VMs). You would run one full VM for GitLab, another for Jenkins, and so on. This provides isolation but is incredibly resource-intensive.
The Analogy: "Houses vs. Apartments"
To understand why Docker is the solution, we must contrast it with VMs using a "first principles" analogy.
A Virtual Machine is a separate House. To run 5 services, you must build 5 separate houses. Each house needs its own foundation, its own plumbing, its own electrical grid, and its own complete operating system. This is safe and isolated, but monumentally heavy, slow to build, and wastes resources.
A Docker Container is a private Apartment. You have one large apartment building (your Host OS) that provides shared, foundational infrastructure (the Linux Kernel). A container is a single, prefabricated apartment that is "dropped" into the building. It shares the building’s main plumbing (the kernel), but it is fully isolated with its own walls, door, and key.
The Solution: Isolation Without the Overhead
Docker gives us the "apartment" model, which is the perfect balance of isolation and efficiency. It achieves this by using two powerful, "first principles" features built directly into the Linux kernel:
- Namespaces: These are the "walls" of the apartment. They provide process isolation. A process inside a "GitLab" container cannot see or interact with processes inside a "Jenkins" container, even though they are on the same machine.
- Control Groups (cgroups): This is the "utility meter" for the apartment. It allows Docker to limit how much CPU and RAM each container is allowed to consume.
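If you want to see the "utility meter" in action, the standard docker run resource flags map directly onto cgroup limits. A minimal sketch you can run from any Docker-enabled shell (the container name limited-test is just an illustration):
# Cap the container at 512 MB of RAM and 1.5 CPUs (enforced by the kernel via cgroups)
docker run -d --name limited-test --memory=512m --cpus=1.5 debian:12 sleep 3600
# Read back the limits Docker recorded for this container
docker inspect -f 'Memory: {{.HostConfig.Memory}} bytes, NanoCpus: {{.HostConfig.NanoCpus}}' limited-test
# Clean up
docker rm -f limited-test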
We choose Docker as our foundation because it is:
- Reproducible: A Dockerfile is a precise, repeatable blueprint. The GitLab container you build is guaranteed to be identical to the one I build. This eliminates server drift.
- Lightweight: Containers share the host kernel. Services start in seconds, not the minutes it takes to boot a full VM.
- Clean: To "uninstall" GitLab, you don’t run a complex script. You just docker rm the container. Your host OS is left perfectly untouched, solving the "dependency hell" problem forever.
1.3 The "Control Center": Docker-out-of-Docker
The Problem: Where do ‘docker’ commands come from?
The "Why": We are now working inside our dev-container. This is our "Control Center." But how can we run docker commands from inside this container to create and manage other containers, like GitLab and Jenkins?
If we try, we’ll find a problem.
The Analogy: "The Master Remote Control"
The "What": We must contrast the two ways to solve this:
- Docker-in-Docker (DinD): This is the "heavy" way. It’s like building a tiny, new, fully-functional apartment building inside your existing apartment. It’s redundant, complex, and has security implications.
- Docker-out-of-Docker (DooD): This is the "smart" way. It’s like finding the building manager’s master remote control (the Docker socket) just outside your door. By bringing it inside, you can sit in your apartment and control every other door and light in the entire building.
We will implement the DooD pattern.
The Principle: Docker CLI vs. Docker Daemon
The "First Principles": To understand DooD, we must deconstruct how Docker works. It’s a client-server application:
- Docker Daemon (dockerd): This is the "engine." It’s the background service running on your host machine that manages images, containers, and networks.
- Docker CLI (docker): This is the "remote control." It’s a simple client that sends instructions (e.g., "run," "stop") to the daemon.
Crucially, the CLI communicates with the daemon via a socket file: /var/run/docker.sock.
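Because the socket is just an HTTP endpoint, you don’t even need the CLI to talk to the daemon. A quick sketch, assuming curl is installed and your user already has permission to read the socket:
# Ask the daemon for its version directly over the Unix socket
curl --unix-socket /var/run/docker.sock http://localhost/version
# The JSON response comes straight from dockerd - the CLI is just a
# convenient wrapper around API calls like this one.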
Our Solution:
- Install only the Docker CLI inside our dev-container.
- "Pass in" the Docker socket from the host by mounting it as a file.
- Grant our container user permission to use that socket.
1.4 The "How": A Pedagogical Example (Failure First)
We’ve established our goal: to run docker commands from inside our dev-container to control the host’s Docker daemon.
The "Pain Point": This is not as simple as just installing the docker client. The real, hidden "pain point" is permissions. The Docker socket on the host is a protected file. To access it, our container user must have the correct permissions, specifically, they must be part of a group that has the exact same Group ID (GID) as the host’s docker group.
Let’s prove this by demonstrating every way a "simple" setup fails, using a "blank slate" Debian container instead of our already-solved dev-container.
Example 1: The ‘docker’ command fails
First, we’ll run a new, temporary Debian container. The -it flag gives us an interactive shell, and --rm means the container will be deleted the moment we exit.
# On your host machine, run this command
docker run -it --rm debian:12 bash
You are now in a shell inside the Debian container. Now, let’s try to run a Docker command:
# (Inside debian container)
root@...:/# docker ps
Result:
bash: docker: command not found
Explanation: This is Failure #1. The "remote control" (the Docker CLI) is not installed in a standard container. This is the most obvious problem, but not the hardest one.
Example 2: The ‘permission denied’ failure (The Real Pain Point)
This is a more advanced example, but it is the correct one. We will simulate our dev-container setup by installing the CLI, creating a non-privileged user, and seeing why they fail to get permission.
First, exit the previous container. Now, run a new one, mounting the socket:
# On your host machine
# 1. Start the container, mounting the socket
docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock debian:12 bash
Now, inside the container as root, we will install the CLI and create our test user.
# (Inside debian container, as root)
# 2. Install prerequisites
apt update && apt install -y curl gpg ca-certificates sudo
# 3. Add Docker's GPG key and repository
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
# 4. Install the Docker CLI
apt update
apt install -y docker-ce-cli
# 5. Create a non-privileged user, just like our dev-container user
useradd -m -s /bin/bash tempuser
# 6. Switch to this new user
su - tempuser
Now that we are tempuser, let’s try to use Docker.
# (Inside debian container, as tempuser)
# 7. Try to use Docker.
tempuser@...:$ docker ps
Result (Failure #2):
Got permission denied while trying to connect to the Docker daemon socket...
...connect: permission denied
Explanation: This is logical. The tempuser is not root and is not part of any docker group.
Example 3: The Real GID Mismatch Failure
But what if we create the docker group? This is the most critical part of the lesson.
# (Inside debian container, as tempuser)
# 1. Go back to the root shell
tempuser@...:$ exit
# (Inside debian container, as root)
# 2. Create a 'docker' group. The container OS will assign it
# a GID (e.g., 1001) that is different from your host's.
root@...:/# groupadd docker
# 3. Add 'tempuser' to this new 'docker' group
root@...:/# usermod -aG docker tempuser
# 4. Switch back to 'tempuser'
root@...:/# su - tempuser
# (Inside debian container, as tempuser)
# 5. Try again. The user is now in a 'docker' group.
tempuser@...:$ docker ps
Result (Failure #3):
Got permission denied while trying to connect to the Docker daemon socket...
...connect: permission denied
Explanation: This is the most important takeaway. We’ve proved that being in a docker group inside the container is not enough.
The problem is a GID Mismatch. The host’s socket file (/var/run/docker.sock) is protected by the host’s docker group GID (e.g., 998). The docker group we created inside the container has a totally different, random GID (e.g., 1001).
As far as the host’s kernel is concerned, our tempuser is a member of group 1001, not 998, so it is denied access. This is the exact problem our Dockerfile and build-dev.sh script are designed to solve.
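If you want to see the mismatch for yourself, compare the GID that protects the socket with the groups the container user actually holds. A quick check (the GIDs 998 and 1001 below are illustrative; yours will differ):
# On your HOST machine: which GID owns the socket?
stat -c '%g %G' /var/run/docker.sock   # e.g. "998 docker"
getent group docker                    # e.g. "docker:x:998:youruser"
# (Inside debian container, as root): which GID did the container assign?
getent group docker                    # e.g. "docker:x:1001:tempuser"
id tempuser                            # lists 1001, not 998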
1.5 The "Action Plan": Implementing DooD
We have successfully proven our "pain point" by demonstrating that a standard, non-privileged user in a fresh container cannot access the Docker daemon, even when the CLI is installed and the socket is mounted.
Now, we will implement the correct, robust solution in our dev-container environment. We will modify our Dockerfile and build-dev.sh scripts to fix these problems at the image level, ensuring the fix is permanent and works for all sessions, including SSH.
Step 1: Install the Docker CLI (The "Remote Control")
First, we must add the Docker "remote control" (docker-ce-cli) to our Dockerfile blueprint.
Action: Open your Dockerfile. We need to modify the main RUN apt update \ ... block. This change, which is already reflected in the latest version of the Dockerfile in this repository, performs the same steps we just did in our temporary Debian container:
- Install Prerequisites: It ensures curl, gpg, and ca-certificates are installed.
- Add Docker’s GPG Key: It adds Docker’s official GPG key to establish trust.
- Add Docker Repository: It adds the Docker APT repository to our container’s "phone book".
- Update and Install: It runs apt update again to load that new repository and then adds docker-ce-cli to our apt install -y list.
Code:
The main RUN layer in your Dockerfile should look like this:
# (Inside Dockerfile)
RUN apt update \
&& apt install -y \
build-essential ca-certificates cmake curl flex fontconfig \
fonts-liberation git git-lfs gnupg2 iproute2 \
less libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 \
libatspi2.0-0 libbz2-dev libcairo2 libcups2 libdbus-1-3 \
libffi-dev libfl-dev libfl2 libgbm1 libgdbm-compat-dev \
libgdbm-dev libglib2.0-0 libgtk-3-0 liblzma-dev libncurses5-dev \
libnss3 libnss3-dev libpango-1.0-0 libreadline-dev libsqlite3-dev \
libssl-dev libu2f-udev libx11-xcb1 libxcb-dri3-0 libxcomposite1 \
libxdamage1 libxfixes3 libxkbcommon0 libxrandr2 libxshmfence1 \
libxss1 libzstd-dev libzstd1 lzma m4 \
nano netbase openssh-client openssh-server openssl \
patch pkg-config procps python3-dev python3-full \
python3-pip python3-tk sudo tmux tzdata \
uuid-dev wget xvfb zlib1g-dev \
linux-perf bpftrace bpfcc-tools tcpdump ethtool linuxptp hwloc numactl strace \
ltrace \
&& apt upgrade -y \
&& install -m 0755 -d /etc/apt/keyrings \
&& curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc \
&& chmod a+r /etc/apt/keyrings/docker.asc \
&& echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null \
&& apt update \
&& apt install -y docker-ce-cli \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false
Step 2: Grant Permission (The "GID Mismatch" Fix)
This is the most critical part, where we solve the GID Mismatch problem from our third "failure" example. We will do this by passing the host’s docker group GID into the build and permanently modifying the container’s group database.
Action 2a: Pass the Host GID During the Build
We must modify our "construction manager" script, build-dev.sh, to find the host’s docker GID and pass it to the docker build command as a --build-arg.
Action: Open build-dev.sh and ensure it looks like this.
#!/usr/bin/env bash
source ./dev.conf
USERNAME="$USER"
USER_ID=$(id -u)
USER_GID=$(id -g)
# 1. Find the GID of the 'docker' group on the HOST
DOCKER_GID=$(getent group docker | cut -d: -f3)
# 2. Check that it was found, otherwise exit
if [ -z "$DOCKER_GID" ]; then
echo "Error: 'docker' group not found on host."
echo "Please run 'sudo groupadd docker && sudo usermod -aG docker $USER'"
echo "Then, log out and log back in before re-running this script."
exit 1
fi
# ... (ssh key logic) ...
SSH_DIR_HOST=~/.ssh
cp -r $SSH_DIR_HOST .
SSH_DIR_CONTEXT=$(basename $SSH_DIR_HOST)
# 3. Pass the host's GID as a build argument (HOST_DOCKER_GID below)
docker build --progress=plain \
--build-arg SSH_DIR="$SSH_DIR_CONTEXT" \
--build-arg INSTALL_CUDA_IN_CONTAINER="$INSTALL_CUDA_IN_CONTAINER" \
--build-arg USERNAME="$USERNAME" \
--build-arg USER_UID="$USER_ID" \
--build-arg USER_GID="$USER_GID" \
--build-arg HOST_DOCKER_GID="$DOCKER_GID" \
-f Dockerfile -t dev-container:latest .
# ... (cleanup logic) ...
rm -rf $SSH_DIR_CONTEXT
Action 2b: Use the GID in the Dockerfile
Now we will modify our Dockerfile to use that build argument.
Action: Add ARG HOST_DOCKER_GID near the top of your Dockerfile, and then replace the entire RUN command for user setup with this robust version.
Deconstruction: This command:
- Declares the HOST_DOCKER_GID build argument.
- Checks if the docker group name already exists in the container (which it usually won’t, since docker-ce-cli doesn’t create the group for us).
- Checks if its GID matches the host’s GID.
- If they don’t match, it modifies the container’s docker group GID using groupmod.
- Finally, it creates our $USERNAME and adds them to this now-correct docker group.
Code:
# (Inside Dockerfile, near the top)
ARG USERNAME
ARG USER_UID
ARG USER_GID
ARG SSH_DIR
ARG HOST_DOCKER_GID # <-- ADD THIS LINE
ARG INSTALL_CUDA_IN_CONTAINER="false"
# ... (skip apt install and CUDA blocks) ...
# (Inside Dockerfile, after CUDA block)
# This is the robust command to fix GID mismatch
RUN echo "--- Setting up user and Docker GID ---" \
&& if getent group docker >/dev/null 2>&1; then \
if [ $(getent group docker | cut -d: -f3) -ne $HOST_DOCKER_GID ]; then \
echo "--- Modifying container 'docker' group GID to match host ($HOST_DOCKER_GID) ---"; \
groupmod --gid $HOST_DOCKER_GID docker; \
else \
echo "--- Container 'docker' group GID already matches host ($HOST_DOCKER_GID) ---"; \
fi \
else \
echo "--- Creating 'docker' group (GID: $HOST_DOCKER_GID) ---"; \
groupadd --gid $HOST_DOCKER_GID docker; \
fi \
\
&& groupadd --gid $USER_GID $USERNAME \
&& useradd --uid $USER_UID --gid $USER_GID -G docker -m $USERNAME \
\
&& sed -i "s/#PubkeyAuthentication yes/PubkeyAuthentication yes/g" /etc/ssh/sshd_config \
&& echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
&& chmod 0440 /etc/sudoers.d/$USERNAME \
&& echo 'export GPG_TTY=$(tty)' >> /home/$USERNAME/.bashrc
Step 3: Mount the Socket (The "Control Port")
This is the final piece of the puzzle. Our build-dev.sh script is now passing the GID, and our Dockerfile is using it to grant permanent permission.
Now, we must modify our dev-container.sh script to only do what it’s supposed to: run the container and mount the socket.
Action: Open dev-container.sh.
- Remove all logic related to DOCKER_GID and --group-add. It is no longer needed here.
- Add the bind mount flag -v /var/run/docker.sock:/var/run/docker.sock to pass in the "remote control."
Code:
Your dev-container.sh script should now look like this:
#!/usr/bin/env bash
source ./dev.conf
USERNAME="$USER"
GPU_FLAG=""
# Conditionally add the --gpus flag
if [ "$ENABLE_GPU_SUPPORT" = "true" ]; then
GPU_FLAG="--gpus all"
fi
# ... (mkdir logic) ...
mkdir -p repos data articles viewer
# 1. The docker.sock bind mount below is the new, critical line
docker run -it \
--name "dev-container" \
--restart always \
--cap-add=SYS_NICE \
--cap-add=SYS_PTRACE \
$GPU_FLAG \
-v /var/run/docker.sock:/var/run/docker.sock \
-v "$(pwd)/articles:/home/$USERNAME/articles" \
-v "$(pwd)/viewer:/home/$USERNAME/viewer" \
-v "$(pwd)/data:/home/$USERNAME/data" \
-v "$(pwd)/repos:/home/$USERNAME/repos" \
-p 127.0.0.1:10200:22 \
-p 127.0.0.1:10201:8888 \
-p 127.0.0.1:10202:8889 \
dev-container:latest
Step 4: Rebuild and Recreate the Environment
Now, you must run the full rebuild and restart process to apply all these changes.
Action: From your host terminal:
Build the new image. This will be slow: the modified RUN layer and the new build argument invalidate Docker's layer cache, forcing the large apt layer to rebuild (ensure your CPU governor is set to 'performance' for the build):
./build-dev.sh
Stop and remove your old container:
docker stop dev-container && docker rm dev-container
Start the new container using the modified dev-container.sh script:
./dev-container.sh
1.6 The "Verification": The "Success Second"
Our Dockerfile and build scripts have been modified. We have installed the docker-ce-cli package, passed the host’s docker GID into the build, and used it to create a docker group with the correct GID, adding our user to it. We have also mounted the Docker socket in our dev-container.sh script.
Now, let’s verify that our solution works.
Example: The ‘docker’ command succeeds
First, enter your rebuilt container using docker exec:
# On your host machine
docker exec -it dev-container bash
Now, from inside the container, run the docker ps command:
# (Inside dev-container)
docker ps
Result:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<hash> dev-container:latest "/entrypoint.sh" ... Up ... 127.0.0.1:10200-10202->... dev-container
This is our first success. The main docker exec shell now has the correct permissions. But the real test is whether a new SSH session—which gets a fresh login—also has these permissions.
Let’s test it. From your host machine’s terminal, SSH into the container:
# On your host machine
ssh -p 10200 $USER@127.0.0.1 "docker ps"
Result:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<hash> dev-container:latest "/entrypoint.sh" ... Up ... 127.0.0.1:10200-10202->... dev-container
Explanation: This is the critical success. We have proven that our "GID Mismatch" problem is permanently solved.
By modifying the Dockerfile to create the docker group with the correct HOST_DOCKER_GID and adding our user to it at build time, we have "baked" the correct permissions into the container’s user database.
This ensures that any new session, whether from docker exec or sshd, will correctly identify our user as a member of the docker group, granting it access to the mounted socket.
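As an optional extra check, you can confirm both facts at once from a single SSH login: the user's group list and the group that owns the mounted socket. A quick sketch; the GID printed will match your host's docker group:
# On your host machine
ssh -p 10200 $USER@127.0.0.1 "id; stat -c '%g %G' /var/run/docker.sock"
# 'id' should list the docker group, and the GID printed by 'stat' should
# be the same number - proof that the socket is readable by our user.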
We are now inside our "Control Center," and we can use its "remote control" to manage the host’s Docker daemon. We are ready to build the rest of our CI/CD "city."
1.7 The "City Plan": Our 10-Article Stack
Now that we have successfully built our "Control Center", you have the power to manage Docker from within a stable, permission-safe environment.
But this is just the first step. The "Control Center" is where the "city planners" work, but we still need to build the city. The upcoming chapters of this article will lay the foundational "roads" (networking) and "land plots" (persistence).
After that, we will use our "Control Center" to build our complete CI/CD "city," one service at a time. Here is the 10-article blueprint of the stack we are building:
Article 1: Docker Foundations (This Article)
- Role: The "Control Center" and "City Foundations" (which we will build in the next chapters).
Article 2: Local Certificate Authority (CA)
- Role: The "Identity & Security Office." It will issue a unique, trusted ID (an HTTPS certificate) to every service we deploy, ensuring all communication is secure and encrypted.
Article 3: GitLab (Source Code Management)
- Role: The "Central Library." This is the "single source of truth" where all our project’s "blueprints" (our source code) will be stored, versioned, and managed.
Article 4: Jenkins (CI/CD Orchestrator)
- Role: The "Automated Factory Foreman." This is the "brain" of our operation. It will automatically pull blueprints from GitLab, run our build and test "assembly line," and tell other tools what to do.
Article 5: Artifactory (Artifact Manager)
- Role: The "Secure Warehouse." After the factory (Jenkins) builds a finished product (a .jar, .whl, or .so file), it sends it to this warehouse for secure, versioned storage.
Article 6: SonarQube (Code Quality)
- Role: The "Quality Assurance Inspector." This service automatically scans our blueprints (source code) to find bugs, security vulnerabilities, and "code smells," stopping the assembly line if quality standards are not met.
Article 7: Mattermost (ChatOps)
- Role: The "Public Address System." This is our central chat hub where the "Factory Foreman" (Jenkins) can announce, in real-time, "Build 125 has passed!" or "Build 126 has failed!"
Article 8: ELK Stack (Logging)
- Role: The "Central Investigation Office." With so many services, debugging is a nightmare. This stack collects all logs from all services into one searchable database.
Article 9: Prometheus & Grafana (Monitoring)
- Role: The "Performance Dashboard." This stack provides the "health monitors" for our city, showing us in real-time which services are busy, which are slow, and which might be running out of memory.
Article 10: The Python API Package (Capstone)
- Role: The "Master Control Package." Throughout the series, we will build a professional Python library from within our "Control Center". Each new article will add a module to this package for controlling that component’s API (e.g., adding users to GitLab, creating jobs in Jenkins). This capstone article will showcase our finished package, using it to automate the entire stack and perform complex, cross-service operations.
Chapter 2: Docker Networking (The "Roads")
2.1 The "Default Isolation" Problem
We have successfully built our "Control Center". We now possess a permission-safe, reproducible environment from which we can send docker commands to our host’s daemon. We are the "city planner" in our central office, ready to build our "city."
But we immediately face our next fundamental "pain point." If we simply run our services, they will be completely isolated from each other. A default Docker container is a "black box," and by design, it cannot see or speak to its neighbors. This is useless for our CI/CD stack. Our Jenkins container must be able to find and communicate with our GitLab container, which must be able to send notifications to our Mattermost container.
To build a functioning stack, we must first understand why this isolation exists and then build a private "phone system" to connect our services.
The Analogy: "The Private Hotel Room"
A new Docker container is like a soundproof, private hotel room. It has a main door to the "outside world" (the internet), which is why you can apt update or curl google.com from inside a new container. But it has no phone and no adjoining doors to the other rooms in the hallway. You, in the "Jenkins" room (Room 101), have no way to find the "GitLab" room (Room 102). You can’t even tell if Room 102 exists, let alone call it by its name. We must lay the "wiring" for an internal phone system.
2.2 The Default bridge Network
When you install Docker on your Linux host, it creates a virtual Ethernet bridge called docker0. You can see this on your host machine by running the ip a command. This docker0 interface acts as a simple virtual switch. By default, every container you run is "plugged into" this switch with a virtual cable, allowing them to communicate if they know each other’s exact IP address.
This default network, however, is a legacy component. By design, it does not include an embedded DNS server. It was built for an older, deprecated linking system, not for modern, automatic service discovery. This is a deliberate design choice to maintain backward compatibility, and it’s the source of our "pain point." Containers on this network cannot find each other by name.
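Before the experiment, you can take a quick look at this default network from the "Control Center" (the subnet shown will vary from host to host):
# (Inside dev-container)
# List Docker's built-in networks: bridge, host, and none
docker network ls
# Inspect the default bridge - note there is no name-resolution "phone book" here
docker network inspect bridge --format 'driver={{.Driver}} subnet={{range .IPAM.Config}}{{.Subnet}}{{end}}'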
Let’s prove this from our "Control Center". We will run two simple debian:12 containers on the default network.
# (Inside dev-container)
# 1. Run two simple Debian containers on the default network
docker run -d --name helper-a debian:12 sleep 3600
docker run -d --name helper-b debian:12 sleep 3600
# 2. Install 'ping' in the 'helper-b' container
# We suppress output with -qq for a cleaner log
docker exec -it helper-b apt update -qq
docker exec -it helper-b apt install -y -qq iputils-ping
Now that both containers are running and helper-b has the ping command, let’s try to have helper-b contact helper-a using its name.
# (Inside dev-container)
# 3. Try to ping 'helper-a' by its name
docker exec -it helper-b ping helper-a
Result:
ping: bad address 'helper-a'
This failure is the key takeaway. Because the default bridge has no DNS, helper-b has no way to resolve the name helper-a to an IP address. This makes the default network useless for our stack.
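To confirm that only name resolution is missing (connectivity itself works, as long as you know the IP), look up helper-a's address and ping it directly. A quick optional check before cleaning up; the IP 172.17.0.2 is illustrative:
# (Inside dev-container)
# Find helper-a's IP on the default bridge
docker inspect -f '{{.NetworkSettings.IPAddress}}' helper-a
# Ping that IP from helper-b - this succeeds, proving the gap is DNS, not connectivity
docker exec -it helper-b ping -c 2 172.17.0.2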
Let’s clean up our failed experiment.
# (Inside dev-container)
docker rm -f helper-a helper-b
2.3 The Solution: Custom bridge Networks
This is the best practice for all modern Docker applications. A user-defined bridge network is functionally similar to the default one, but it adds one critical, game-changing feature: automatic DNS resolution based on container names.
The "First Principles" of Embedded DNS
When you create a custom bridge network, the Docker daemon (dockerd) itself provides a built-in, lightweight DNS server for that network only.
Here’s how it works:
- Docker automatically configures every container on that custom network to use this special DNS server. It does this by mounting a virtual /etc/resolv.conf file inside the container that points to nameserver 127.0.0.11.
- This 127.0.0.11 address is a special loopback IP within the container’s namespace. The Docker daemon intercepts all DNS queries sent to this address.
- The daemon maintains a "phone book" (a lookup table) for that specific network, instantly mapping container names (like gitlab) to their internal IP addresses.
The Analogy: "The Private Office VLAN"
Creating a custom bridge network is like putting all your servers on a private office network that comes with its own internal phone directory (the embedded DNS). The default bridge is a network without this directory.
Pedagogical Example: The Custom Bridge Success
Let’s repeat our experiment, but this time we’ll create our own "phone system."
# (Inside dev-container)
# 1. Create the network
docker network create my-test-net
# 2. Run containers attached to the new network
docker run -d --network my-test-net --name test-a debian:12 sleep 3600
docker run -d --network my-test-net --name test-b debian:12 sleep 3600
# 3. Install 'ping' in the 'test-b' container
docker exec -it test-b apt update -qq
docker exec -it test-b apt install -y -qq iputils-ping
Now, let’s try the same ping command that failed before.
# (Inside dev-container)
# 4. Try to ping by name again (this will succeed)
docker exec -it test-b ping test-a
Result:
PING test-a (172.19.0.2): 56(84) bytes of data.
64 bytes from test-a.my-test-net (172.19.0.2): icmp_seq=1 ttl=64 time=0.100 ms
...
This success is the foundation of our entire CI/CD stack. The embedded DNS server on my-test-net successfully resolved the name test-a to its internal IP address.
Let’s prove the "magic" by inspecting the DNS configuration inside the test-b container.
# (Inside dev-container)
# 5. Look at the DNS configuration file
docker exec -it test-b cat /etc/resolv.conf
Result:
nameserver 127.0.0.11
options ndots:0
This confirms our "first principles" explanation. The container is configured to use the internal 127.0.0.11 resolver, which is how it found test-a.
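You can also look at the daemon's "phone book" for this network directly. A quick optional check (the addresses will differ on your machine):
# (Inside dev-container)
# Show the name-to-IP mapping the embedded DNS serves for this network
docker network inspect my-test-net --format '{{range .Containers}}{{println .Name .IPv4Address}}{{end}}'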
Cleanup:
# (Inside dev-container)
docker rm -f test-a test-b
docker network rm my-test-net
2.4 Driver 2: The host Network (No Isolation)
The host driver is the most extreme option. It provides the highest possible network performance by completely removing all network isolation between the container and the host. The container effectively "tears down its own walls" and attaches directly to your host machine’s network stack.
The Analogy: "The Open-Plan Office"
Using the host network is like putting your container not in a private room, but at a desk right next to your host OS in an open-plan office. It shares the same network connection, it can hear all the "conversations" (network traffic), and all the host’s ports are its ports.
This approach is fundamentally insecure and creates immediate, tangible risks. A process inside the container can:
- Access localhost Services: It can connect directly to any service running on your host’s localhost or 127.0.0.1, such as a database or web server you thought was private.
- Cause Port Conflicts: If your host is running a service on port 8080, and you try to start a host network container that also wants port 8080, the container will fail to start.
- Sniff Host Traffic: A compromised container can potentially monitor all network traffic on your host machine.
Let’s prove the localhost access risk. This experiment requires two terminals.
Terminal 1 (Host Machine):
First, on your host machine’s terminal (not inside the dev-container), start a simple Python web server.
# (Run on HOST)
# This requires Python 3 to be installed on your host
python3 -m http.server 8000
Result:
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
This server is now running on your host, bound to localhost:8000.
Terminal 2 (dev-container):
Now, from your dev-container "Control Center," run a new temporary debian:12 container using the --network host flag.
# (Inside dev-container)
# 1. Run a temporary container on the 'host' network
docker run -it --rm --network host debian:12 bash
# (Inside debian container)
# 2. Install curl
root@...:/# apt update -qq && apt install -y -qq curl
# 3. Try to access the host's localhost
root@...:/# curl http://localhost:8000
Result: You will immediately see the HTML directory listing from the Python server that is running on your host.
Explanation: This proves the container has full access to the host’s network stack. It’s powerful for niche, high-performance applications, but for our CI/CD stack, this lack of isolation is an unacceptable security risk and a source of future port conflicts.
Cleanup:
- In the debian container, type exit.
- In your host terminal, press Ctrl+C to stop the Python server.
2.5 Driver 3: The none Network (Total Isolation)
This driver provides the most extreme form of isolation. When you attach a container to the none network, Docker creates the container with only a loopback interface (lo). It has no eth0 interface and no "virtual cable" plugging it into any switch. It cannot communicate with other containers or the outside world.
The Analogy: "Solitary Confinement"
A container on the none network is in a room with no doors and no windows. It can only talk to itself (via localhost).
This is not a mistake; it’s a powerful security feature. This is the perfect driver for secure, sandboxed batch jobs. Imagine a container that only needs to read a file from a mounted volume, perform a complex calculation on it, and write a result back to a volume. By attaching it to the none network, you can guarantee that this process has zero network access, eliminating an entire class of potential vulnerabilities.
Let’s verify this total isolation.
# (Inside dev-container)
# 1. Run a temporary container on the 'none' network
docker run -it --rm --network none debian:12 bash
Now, let’s try to do anything network-related, starting with updating the package manager.
# (Inside debian container)
# 2. Try to update apt
root@...:/# apt update -qq
Result:
W: Failed to fetch http://deb.debian.org/debian/dists/bookworm/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/bookworm-updates/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://deb.debian.org/debian-security/dists/bookworm-security/InRelease Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Explanation: This failure is the perfect proof. The container has no network stack, so it can’t even resolve the DNS for deb.debian.org to find its package repositories. This also means we can’t install tools like ping or iproute2 to investigate further.
# (Inside debian container)
# 3. Try to use common network tools (which aren't installed)
root@...:/# ip a
bash: ip: command not found
root@...:/# ping -c 1 8.8.8.8
bash: ping: command not found
Explanation: We are in "solitary confinement." We can’t reach the outside world to install new tools. This is clearly not useful for our interconnected CI/CD services, but it’s a critical tool for security-hardening.
Cleanup:
# (Inside debian container)
root@...:/# exit
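If you would like to actually see the interface list (impossible from inside our bare Debian container, since nothing can be installed), one option is an image that ships its own networking tools, such as busybox. A quick sketch, assuming you don't mind pulling that small extra image:
# (Inside dev-container)
# busybox includes an 'ip' applet, so no installation is needed
docker run --rm --network none busybox ip addr
# Expected: only the loopback interface 'lo' appears - there is no eth0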
2.6 Advanced Drivers: macvlan and ipvlan
Finally, there are advanced drivers for niche use cases where containers need to appear as if they are physically on your local network.
The Analogy: "A Physical Mailbox"
Instead of sharing the apartment building’s mailroom (the host’s IP), these drivers give a container its own physical street address (a unique IP on your LAN). Your home router will see the container as just another device, like your phone or laptop.
These drivers are powerful but complex. The fundamental difference between them is:
- macvlan (Layer 2): This gives the container its own unique MAC address (a physical hardware address). It truly appears as a separate physical device on the network.
- ipvlan (Layer 3): This is a more subtle approach. All containers share the host’s MAC address, but the kernel routes traffic to the correct container based on its unique IP address.
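For reference only, creating a macvlan network looks roughly like the sketch below. The subnet, gateway, and parent interface are placeholders for your own LAN, and we will not use this in our stack:
# Sketch only - adjust the subnet, gateway, and parent NIC to your LAN before trying this
docker network create -d macvlan \
  --subnet=192.168.1.0/24 \
  --gateway=192.168.1.1 \
  -o parent=eth0 \
  my-macvlan-net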
The macvlan "Wi-Fi" Pain Point
macvlan is notoriously fragile and fails on almost all Wi-Fi networks. This is a common "gotcha" for developers trying to use it on a laptop.
The reason is a "first principles" security feature of Wi-Fi. A Wi-Fi access point is designed to allow only one MAC address (your laptop’s) to communicate per connection. When macvlan tries to send packets from new virtual MAC addresses, the access point sees this as a spoofing attack and drops the packets.
Interestingly, ipvlan often works on Wi-Fi because it cleverly uses the host’s single, approved MAC address for all its packets.
These drivers are for legacy applications that must be on the physical network or for complex network segmentation. This is far more complexity than we need for our self-contained stack.
2.7 Chapter 2 Conclusion: Our Choice
We’ve explored the four main types of Docker networking. We proved that the default bridge network is useless for our stack because it lacks DNS. We saw that host is insecure, none is too isolated, and macvlan/ipvlan are unnecessarily complex.
Our choice is clear: the Custom bridge Network is the only one that provides the perfect balance of isolation from the host and service discovery (DNS) between our containers.
In our final "Action Plan," we will create one single, permanent, custom bridge network named cicd-net that all our services will share.
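The command itself is a one-liner, shown here only as a preview of that action plan:
# (Inside dev-container) - preview of the action plan
docker network create cicd-net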
Chapter 3: Docker Persistence (The "Foundations")
3.1 The "Ephemeral Container" Problem
We have successfully built our "Control Center" and laid down the "roads" (cicd-net) for our services to communicate. Now, we must solve the second fundamental flaw of a default Docker setup: containers are ephemeral. They have no long-term memory.
A container’s filesystem is like an Etch A Sketch. You can do complex work inside it—install software, write files, run a database—but the moment you docker rm that container, the screen is "shaken clean." All of your data, your configuration, your repositories, and your build logs are permanently destroyed.
This is unacceptable for our stack. GitLab must persist Git repositories, Jenkins must persist job configurations, and Artifactory must persist binary artifacts.
The "First Principle": Copy-on-Write
This ephemerality is not a bug; it’s a core design feature that makes containers fast and lightweight. It’s achieved through a mechanism called Copy-on-Write (CoW).
Here is how it works:
- Image Layers (Read-Only): A Docker image (like debian:12) is a stack of read-only layers. Think of these as a set of transparent blueprint sheets. They are immutable and are never changed.
- Container Layer (Writable): When you docker run an image, Docker adds a single, thin, writable layer on top of the read-only stack. This is the "Etch A Sketch" screen.
- The "Copy": When you read a file, you are just looking down through the transparent layers. But when you modify a file (or write a new one), the storage driver performs a "copy-on-write." It copies the file from the read-only layer "up" into your top writable layer and then modifies it.
- The "Deletion": When you run docker rm, Docker doesn’t delete the massive, multi-gigabyte image layers. It only deletes your thin, top-level writable layer. This is why removing a container is instantaneous, and it’s also why all your data vanishes.
Let’s prove this.
Pedagogical Example: Proving Data is Lost
We will now prove this "Etch A Sketch" behavior. From your dev-container "Control Center," run the following commands.
First, let’s create a temporary debian:12 container, give it a name, and get a shell inside it.
# (Inside dev-container)
# 1. Run a container and create a file
docker run -it --name ephemeral-test debian:12 bash
Now, from inside this new ephemeral-test container, we will create a file in its filesystem.
# (Run inside 'ephemeral-test' container)
root@...:/# echo "My secret data" > /mydata.txt
# Verify the file was created
root@...:/# cat /mydata.txt
Result:
My secret data
The file exists. Now, exit the container and return to your dev-container shell.
# (Run inside 'ephemeral-test' container)
root@...:/# exit
Back in your dev-container, the ephemeral-test container is stopped. Let’s remove it, which "shakes the Etch A Sketch."
# (Inside dev-container)
# 2. Remove the container (this deletes the writable layer)
docker rm ephemeral-test
Now, let’s create a new container with the exact same name and image.
# (Inside dev-container)
# 3. Run a new container with the same name
docker run -it --name ephemeral-test debian:12 bash
Finally, from inside this new container, let’s look for the file we created.
# (Run inside 'ephemeral-test' container)
# 4. Look for the file
root@...:/# cat /mydata.txt
Result:
cat: /mydata.txt: No such file or directory
Explanation: This is the proof. The data was permanently destroyed along with the first container’s writable layer. This proves we need an external storage solution that exists outside this ephemeral lifecycle.
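If you want to peek at the writable layer itself, docker diff lists exactly what a container has added (A), changed (C), or deleted (D) relative to its image. A quick optional sketch (exit the ephemeral-test container first; the name cow-demo is just an illustration):
# (Inside dev-container)
docker run -d --name cow-demo debian:12 bash -c "echo hi > /demo.txt && sleep 3600"
docker diff cow-demo
# Expected output includes: A /demo.txt  (the file lives only in the writable layer)
docker rm -f cow-demo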
3.2 The Solution: Persistent Storage
To solve this problem, we must store our data outside the container’s ephemeral writable layer, in a location that persists independently. Docker provides three primary mechanisms for this:
- Docker-Managed Volumes: The modern, preferred solution.
- Bind Mounts: A powerful tool, but with significant side effects.
- Tmpfs Mounts: A special-case, in-memory (non-persistent) option.
We will explore all three to build our professional, hybrid strategy.
3.3 Solution 1: Docker-Managed Volumes (The "Best Practice")
This is the modern, recommended way to persist data generated by a container. A volume is a "black box" of storage that is created and managed directly by the Docker daemon.
The Analogy: "The Smart Storage Locker"
Think of a volume as a smart storage locker that you rent from Docker.
- You ask Docker to create one, giving it a name (e.g., docker volume create gitlab-data). This is like renting a new locker and getting a key.
- You tell your container, "Mount the locker gitlab-data at the path /var/opt/gitlab."
- The container now writes all its application data into this "locker."
The key insight is that you don’t know or care where in the warehouse Docker physically placed your locker (it’s tucked away in a deep system directory). You just use the "key" (the volume name), and Docker handles all the plumbing. When you docker rm the container, you are just throwing away the key, not the locker itself, which remains safe and sound in Docker’s warehouse, ready to be handed to the next container.
De-mystifying the "Black Box"
This "storage locker" isn’t magic. It’s just a directory on your host’s filesystem that Docker manages for you, and we can prove it. The Docker CLI gives us the tools to inspect this "black box."
From your dev-container, let’s create a volume:
# (Inside dev-container)
# 1. Create a new, named volume
docker volume create my-app-data
Now, let’s use the inspect command to find out where Docker physically put this "locker" on the host machine.
# (Inside dev-container)
# 2. Inspect the volume
docker volume inspect my-app-data
Result:
[
{
"CreatedAt": "2025-10-31T14:54:21Z",
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/my-app-data/_data",
"Name": "my-app-data",
"Options": {},
"Scope": "local"
}
]
Explanation: The "magic" is gone. The Mountpoint field tells us exactly where this volume lives on our host: /var/lib/docker/volumes/my-app-data/_data. Docker just handles the management of this directory so we don’t have to.
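You can even peek inside that directory from the host (it typically requires root, since Docker owns it). Just a quick look; we will fill it with real data in the next experiment:
# On your HOST machine (not the dev-container)
sudo ls -la /var/lib/docker/volumes/my-app-data/_data
# The directory is empty for now - the next experiment writes persistent.txt into it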
Pedagogical Example: Proving Volume Persistence
Now let’s re-run our failed "Etch A Sketch" experiment, but this time, we’ll attach our "storage locker."
# (Inside dev-container)
# 1. Run a container, mounting the volume to a path
# Note: --rm automatically deletes the container on exit
docker run -it --rm --name volume-test-1 \
-v my-app-data:/app/data \
debian:12 bash
Inside this new container, the directory /app/data is not part of the ephemeral filesystem; it’s a portal to our "storage locker."
# (Run inside 'volume-test-1' container)
# 2. Create a file inside the mounted volume
echo "I am persistent" > /app/data/persistent.txt
cat /app/data/persistent.txt
Result:
I am persistent
Now, exit the container. The --rm flag ensures it is immediately destroyed.
# (Run inside 'volume-test-1' container)
exit
The container is gone. Let’s run a brand-new container and attach the same "storage locker."
# (Inside dev-container)
# 3. Run a *new* container and mount the *same* volume
docker run -it --rm --name volume-test-2 \
-v my-app-data:/app/data \
debian:12 bash
Now, let’s look for the file created by the first, long-gone container.
# (Run inside 'volume-test-2'