Docker makes deploying applications easy. Monitoring them? Not so much.

Monitoring containers across multiple servers is challenging. Manually logging into each host to check whether containers are running, inspecting logs, tracking memory usage, or verifying disk space quickly becomes tedious and error prone. These are not tasks a DevOps engineer should be doing by hand. What's needed is centralized visibility, real-time metrics, and proactive alerts.

So how do we solve this?

We set up a central monitoring server that continuously collects metrics and logs from all Docker hosts. Instead of checking each server manually, the monitoring system automatically scrapes the data and presents it in a unified view.

This centralised approach provides:

- A single source of truth for metrics and logs
- Real-time visibility across all servers
- Alerts before issues impact users

To achieve this, we'll use the following open-source tools:

- Prometheus: collects and stores metrics using a pull-based model
- Grafana: visualizes metrics and logs through dashboards
- cAdvisor: exposes container-level resource metrics
- Node Exporter: collects host-level metrics (CPU, memory, disk, network)
- Promtail: ships container and system logs
- Loki: stores and queries logs efficiently
- Alertmanager: handles alerts and notifications

Together, these components form a complete observability stack for Docker-based environments.

Here's the link to the full source code: clone the repo and follow along with the rest of the blog.

Open ports on the central server (services will be accessible at):

- Prometheus: 9090
- Grafana: 3000
- Loki: 3100
- Alertmanager: 9093

Architecture overview

We'll have a centralised server where all scraped metrics and logs are collected; they are pulled from the Docker host servers where the containers are running.

Promtail, cAdvisor, and Node Exporter run alongside the other Docker containers on each host:

- cAdvisor: collects container-level metrics such as CPU, memory, filesystem, and network usage and exposes them for Prometheus to scrape
- Node Exporter: exposes host-level metrics including CPU load, memory usage, disk I/O, and network statistics
- Promtail: tails container and system logs and ships them to Loki

These components run alongside your Docker containers and require minimal resources, making them ideal for deployment on every host.

On the central server we'll run Prometheus, Grafana, Loki, and Alertmanager.

Prometheus, running on the central monitoring server:

- Periodically scrapes metrics from the cAdvisor and Node Exporter endpoints
- Stores metrics as time-series data
- Evaluates alerting rules to detect abnormal behavior

This pull-based approach ensures reliability and consistency across all monitored hosts.

Loki: Promtail pushes logs to Loki, which:

- Indexes logs using labels (such as container name, host, and namespace)
- Stores logs efficiently without full-text indexing
- Enables fast querying and correlation with metrics

Grafana acts as the unified visualization layer:

- Queries Prometheus for metrics and Loki for logs
- Displays dashboards showing real-time and historical insights

Alertmanager: when Prometheus detects threshold breaches or anomalies:

- Alerts are sent to Alertmanager
- Alertmanager routes notifications to configured channels such as Slack, email, or PagerDuty
- Alert grouping and deduplication prevent alert fatigue

You can add your own custom alerts, which we'll do later in this setup.

Setting up the central monitoring server

We start by setting up a central monitoring server that will host Prometheus, Grafana, Loki, Alertmanager, and supporting components.

First, create a dedicated directory for the monitoring stack, then the required folders and configuration files:

```bash
mkdir central-monitor
cd central-monitor

mkdir prometheus alertmanager
touch loki-config.yaml
touch promtail-config.yaml
```

The resulting layout (the files inside prometheus/ and alertmanager/ are created in the steps below):

```
central-monitor/
├── docker-compose.yml
├── prometheus/
│   ├── prometheus.yml
│   └── alertingrules.yaml
├── alertmanager/
│   └── alertmanager.yml
├── loki-config.yaml
└── promtail-config.yaml
```
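Because Prometheus pulls metrics over the network, it is worth confirming early that the central server can reach the exporter ports on each Docker host. A minimal sketch, assuming the host-side setup described later in this post (cAdvisor published on 8080, Node Exporter on 9100); `<docker-host-ip>` is a placeholder for one of your hosts:

```bash
# Run from the central monitoring server once the host-side containers
# (see the Docker host section below) are up.
curl -s http://<docker-host-ip>:8080/metrics | grep -m 3 container_memory_usage_bytes
curl -s http://<docker-host-ip>:9100/metrics | grep -m 1 node_memory_MemAvailable_bytes
```

If both commands return metric lines, the firewall and exporter ports are open and Prometheus will be able to scrape them.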
You can get the full docker-compose file from GitHub.

The docker-compose.yml file is the core of the central monitoring stack. It defines all services, networking, volumes, and how the components communicate. It:

- Runs Prometheus, Grafana, Loki, and Alertmanager
- Runs cAdvisor and Node Exporter (for local host monitoring)
- Ensures all services are on the same Docker network

```yaml
version: "3"

networks:
  monitor-net:

volumes:
  prometheus_data: {}
  grafana_data: {}
```

- monitor-net: creates a bridge network that allows the containers to communicate with each other
- prometheus_data / grafana_data: persist Prometheus and Grafana data if the containers are restarted

Prometheus

```yaml
prometheus:
  image: prom/prometheus:v2.38.0
  container_name: prometheus
  volumes:
    - ./prometheus:/etc/prometheus
    - prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--storage.tsdb.retention.time=200h'
    - '--web.enable-lifecycle'
  ports:
    - 9090:9090
  networks:
    - monitor-net
```

- Scrapes metrics from the exporters
- Exposes Prometheus on port 9090
- prometheus_data is the volume we created; the ./prometheus directory is mounted into the container
- Uses monitor-net as the network

Alertmanager

```yaml
alertmanager:
  image: prom/alertmanager:v0.27.0
  container_name: alertmanager
  ports:
    - 9093:9093
  volumes:
    - ./alertmanager/:/etc/alertmanager/
  networks:
    - monitor-net
```

- Mounts the ./alertmanager directory as a volume
- Handles alert routing, grouping, and notifications

cAdvisor (Container Metrics)

```yaml
cadvisor:
  image: gcr.io/cadvisor/cadvisor:latest
  ports:
    - 8080:8080
  volumes:
    - /:/rootfs:ro
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
  networks:
    - monitor-net
```

Exposes container-level metrics like CPU, memory, filesystem, and network usage for the containers running on the central server.

Node Exporter (Host Metrics)

```yaml
node-exporter:
  image: prom/node-exporter:latest
  ports:
    - 9100:9100
  volumes:
    - /proc:/host/proc:ro
    - /:/rootfs:ro
  networks:
    - monitor-net
```

Provides host-level metrics such as CPU, RAM, disk, and network stats.

Loki

```yaml
loki:
  image: grafana/loki:3.4.1
  ports:
    - 3100:3100
  volumes:
    - ./loki-config.yaml:/etc/loki/local-config.yaml
  command: -config.file=/etc/loki/local-config.yaml
  networks:
    - monitor-net
```

Centralized, label-based log storage optimized for Grafana.

Promtail

```yaml
promtail:
  image: grafana/promtail:3.4.1
  volumes:
    - /var/log:/var/log:ro
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
    - /var/run/docker.sock:/var/run/docker.sock
    - ./promtail-config.yaml:/etc/promtail/config.yml:ro
  depends_on:
    - loki
  networks:
    - monitor-net
```

Collects container and system logs and pushes them to Loki.

Grafana

```yaml
grafana:
  image: grafana/grafana:latest
  ports:
    - 3000:3000
  volumes:
    - grafana_data:/var/lib/grafana
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin
  depends_on:
    - prometheus
  networks:
    - monitor-net
```

Visualization layer for metrics and logs.

Now let's understand the config files.

prometheus/prometheus.yml

Defines the metrics sources (local and remote):

```yaml
global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - "alertingrules.yaml"
```

The scrape configs cover:

- Local services (Prometheus, cAdvisor, Node Exporter)
- Remote Docker hosts by IP

```yaml
scrape_configs:
  - job_name: 'local-cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'stage-remote-cadvisors'
    static_configs:
      - targets:
          # add your stage Docker host IPs here
        labels:
          environment: 'stage'

  - job_name: 'prod-node-exporters'
    static_configs:
      - targets:
          # add your production Docker host IPs here
        labels:
          environment: 'production'
```

Remote servers are grouped by environment (dev, stage, production), making dashboards and alerts environment-aware.

prometheus/alertingrules.yaml

Defines container health and performance alerts, for example (excerpt):

```yaml
- alert: ContainerHighMemoryUsage
  expr: (memory_usage / memory_limit * 100) > 80
  labels:
    severity: warning
```

These rules help detect both failures and inefficiencies.

Alertmanager Configuration

alertmanager/alertmanager.yml handles alert routing and notifications:

```yaml
route:
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: ' '  # add the recipient address
        smarthost: 'smtp.office365.com:587'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
```

Prevents alert noise by suppressing lower-severity alerts.
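Before moving on, it can help to lint the alerting rules, bring the central stack up, and confirm Prometheus sees its targets. A minimal sketch, assuming the directory layout above; running promtool through the Prometheus image and the use of jq are my additions, not part of the original setup:

```bash
# Validate the alerting rules with promtool from the same Prometheus image.
docker run --rm -v "$PWD/prometheus:/etc/prometheus" \
  --entrypoint promtool prom/prometheus:v2.38.0 \
  check rules /etc/prometheus/alertingrules.yaml

# Start the central stack (use `docker-compose up -d` on older installs).
docker compose up -d

# List scrape targets and their health via Prometheus' HTTP API (requires jq).
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

# Loki readiness check.
curl -s http://localhost:3100/ready
```

Targets should report "up"; remote hosts will only appear healthy once the host-side agents from the next section are running.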
Loki Configuration (loki-config.yaml)

Loki uses the filesystem as its storage backend, and the retention window is set in limits_config:

```yaml
limits_config:
  retention_period: 168h
```

Logs are retained for 168 hours (7 days).

promtail-config.yaml

Handles log discovery and shipping on the central server:

- Docker container logs via the Docker socket
- System logs from /var/log

```yaml
docker_sd_configs:
  - host: unix:///var/run/docker.sock
```

Each log stream is labeled with container metadata for efficient querying.

To summarize the central server:

- Metrics are scraped centrally by Prometheus
- Logs are aggregated in Loki
- Dashboards and alerts are handled by Grafana
- Notifications are sent via Alertmanager

Now let's set up the Docker host servers.

Each Docker host runs a lightweight monitoring stack responsible for exposing metrics and shipping logs to the central monitoring server. On every host, we deploy:

- cAdvisor → container-level metrics
- Node Exporter → host-level metrics
- Promtail → log collection and shipping

On each Docker host, create a simple directory and the required files:

```bash
mkdir monitor
cd monitor

# Create the required files:
touch docker-compose.yaml
touch promtail-config.yaml
```

```
monitor/
├── docker-compose.yaml
└── promtail-config.yaml
```

docker-compose.yaml (Host-Level Services)

This docker-compose.yaml runs only the agents required on Docker hosts. No Prometheus, Grafana, or Loki runs here; everything is sent to the central server.

```yaml
version: "3"

networks:
  monitor-net:
```

A dedicated bridge network keeps monitoring traffic isolated.

cAdvisor (Container Metrics)

```yaml
cadvisor:
  image: gcr.io/cadvisor/cadvisor:latest
  ports:
    - 8080:8080
  volumes:
    - /var/run:/var/run:rw
    - /var/lib/docker/:/var/lib/docker:ro
  networks:
    - monitor-net
  command: ["-disable_metrics=percpu", "-docker_only=true", "-raw_cgroup_prefix_whitelist=/docker"]
```

- Automatically detects running Docker containers
- Exposes container-level CPU, memory, filesystem, and network metrics
- Metrics are scraped remotely by Prometheus
- The mounted volumes give cAdvisor access to Docker and host internals

Node Exporter (Host Metrics)

```yaml
node-exporter:
  image: prom/node-exporter:latest
  container_name: node-exporter
  ports:
    - 9100:9100
  volumes:
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    - /:/rootfs:ro
  command:
    - '--path.procfs=/host/proc'
    - '--path.rootfs=/rootfs'
    - '--path.sysfs=/host/sys'
    - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
  networks:
    - monitor-net
```

- Collects host-level metrics: CPU load, memory usage, disk and filesystem usage, and network stats
- Helps identify whether an issue is host-related or container-related

Promtail (Log Collection)

```yaml
promtail:
  image: grafana/promtail:3.4.1
  volumes:
    - /var/log:/var/log:ro
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
    - /var/run/docker.sock:/var/run/docker.sock
    - ./promtail-config.yaml:/etc/promtail/config.yml:ro
  networks:
    - monitor-net
```

- Collects Docker container logs
- Collects system logs from /var/log
- Ships logs to Loki running on the central server

promtail-config.yaml (Log Shipping Configuration)

This file defines how Promtail discovers logs, labels them, and sends them to Loki.

```yaml
server:
  http_listen_port: 9080
```

Promtail exposes a small HTTP endpoint for health checks and metrics.

```yaml
clients:
  - url: http://<central-server-ip>:3100/loki/api/v1/push
```

Sends logs to the central Loki instance. Replace <central-server-ip> with your monitoring server's address.

Container Logs (Docker Service Discovery)

```yaml
scrape_configs:
  - job_name: container_logs
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
```

Promtail dynamically discovers Docker containers via the Docker socket.

```yaml
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        target_label: 'container'
```

Adds the container name as a label for efficient filtering in Grafana.

Server & Environment Labels

```yaml
      - target_label: 'server'
        replacement: 'test'
      - target_label: 'environment'
        replacement: 'production'
```

These static labels enable environment-based dashboards (dev, stage, prod).

System Logs

```yaml
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          environment: 'production'
          __path__: /var/log/*log  # path glob for the log files to tail
```

Collects system-level logs from files under /var/log.

Data Flow Summary (Host Perspective)

- cAdvisor → exposes container metrics
- Node Exporter → exposes host metrics
- Prometheus (on the central server) → scrapes the metrics
- Promtail → pushes logs to Loki
- Grafana → visualizes everything
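To confirm the log path end to end, you can check Promtail's own metrics on the host and then query Loki from the central server. A minimal sketch; the container selector (`nginx`) is hypothetical and `<central-server-ip>` is a placeholder, adjust both to your environment:

```bash
# On the Docker host: Promtail's metrics endpoint shows whether entries are being sent.
curl -s http://localhost:9080/metrics | grep -m 1 promtail_sent_entries_total

# On the central server: list the labels Loki has received so far.
curl -s http://<central-server-ip>:3100/loki/api/v1/labels

# Query recent logs for one container with LogQL (adjust the selector).
curl -s -G http://<central-server-ip>:3100/loki/api/v1/query_range \
  --data-urlencode 'query={container="nginx", environment="production"}' \
  --data-urlencode 'limit=10'
```

The same LogQL selector works in Grafana's Explore view once Loki is added as a data source.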
With this setup on every Docker host, you get:

- No manual SSH or log inspection
- Centralized metrics and logs
- Environment-aware dashboards
- Faster debugging and alerting