How to Monitor Docker Containers using Nagios


Can Nagios monitor Docker containers? Yes. In this tutorial, you will learn how to monitor Docker containers using Nagios. Monitoring your IT infrastructure is an important process, as it helps you identify issues that might adversely affect your productivity. IT infrastructure monitoring revolves around checking the health status, availability, performance and resource usage of your servers, containers, virtual machines, etc. In this tutorial, we will focus on monitoring Docker containers with a Nagios Core server.

Monitor Docker Containers using Nagios

So, how can you monitor Docker containers using Nagios?

Well, there are multiple scripts on the Internet that have been written for the very purpose of monitoring Docker containers using Nagios.

One example is check_docker, a Python script written by timdaman.

You can download the script as follows;

curl -o /usr/local/bin/check_docker \
https://raw.githubusercontent.com/timdaman/check_docker/master/check_docker/check_docker.py

Or simply use wget;

wget -O /usr/local/bin/check_docker \
https://raw.githubusercontent.com/timdaman/check_docker/master/check_docker/check_docker.py

This script lets you check a number of Docker container metrics and properties, such as;

  • memory consumption in absolute units (bytes, kb, mb, gb) and as a percentage (0-100%) of the container limit.
  • CPU usage as a percentage (0-100%) of the container limit.
  • automatic restarts performed by the Docker daemon
  • container status, i.e. is it running?
  • container health checks, i.e. are they passing?
  • uptime, i.e. is it able to stay running for a long enough time?
  • the presence of a container or containers matching specified names
  • image version, does the running image match that in the remote registry?
  • image age, when was the image built the last time?

Once the script is on your Docker host, make it executable;

chmod +x /usr/local/bin/check_docker

To see how to use the script to run various Docker container checks, simply execute it without any command line options;

/usr/local/bin/check_docker

Sample output;

usage: check_docker [-h] [--connection [/<path to>/docker.socket|<ip/host address>:<port>] | --secure-connection [<ip/host address>:<port>]]
                    [--binary_units | --decimal_units] [--timeout TIMEOUT] [--containers CONTAINERS [CONTAINERS ...]] [--present] [--threads THREADS] [--cpu WARN:CRIT]
                    [--memory WARN:CRIT:UNITS] [--status STATUS] [--health] [--uptime WARN:CRIT] [--image-age WARN:CRIT] [--version]
                    [--insecure-registries INSECURE_REGISTRIES [INSECURE_REGISTRIES ...]] [--restarts WARN:CRIT] [--no-ok] [--no-performance] [-V]
Check docker containers.
options:
  -h, --help            show this help message and exit
  --connection [/<path to>/docker.socket|<ip/host address>:<port>]
                        Where to find docker daemon socket. (default: /var/run/docker.sock)
  --secure-connection [<ip/host address>:<port>]
                        Where to find TLS protected docker daemon socket.
  --binary_units        Use a base of 1024 when doing calculations of KB, MB, GB, & TB (This is default)
  --decimal_units       Use a base of 1000 when doing calculations of KB, MB, GB, & TB
  --timeout TIMEOUT     Connection timeout in seconds. (default: 10.0)
  --containers CONTAINERS [CONTAINERS ...]
                        One or more RegEx that match the names of the container(s) to check. If omitted all containers are checked. (default: ['all'])
  --present             Modifies --containers so that each RegEx must match at least one container.
  --threads THREADS     This + 1 is the maximum number of concurent threads/network connections. (default: 10)
  --cpu WARN:CRIT       Check cpu usage percentage taking into account any limits.
  --memory WARN:CRIT:UNITS
                        Check memory usage taking into account any limits. Valid values for units are %,B,KB,MB,GB.
  --status STATUS       Desired container status (running, exited, etc).
  --health              Check container's health check status
  --uptime WARN:CRIT    Minimum container uptime in seconds. Use when infrequent crashes are tolerated.
  --image-age WARN:CRIT
                        Maximum image age in days.
  --version             Check if the running images are the same version as those in the registry. Useful for finding stale images. Does not support login.
  --insecure-registries INSECURE_REGISTRIES [INSECURE_REGISTRIES ...]
                        List of registries to connect to with http(no TLS). Useful when using "--version" with images from insecure registries.
  --restarts WARN:CRIT  Container restart thresholds.
  --no-ok               Make output terse suppressing OK messages. If all checks are OK return a single OK.
  --no-performance      Suppress performance data. Reduces output when performance data is not being used.
  -V                    show program's version number and exit
UNKNOWN: No checks specified.

You can use this script to check Docker container status, memory and CPU usage, and more, as described by the command line options above.

For example, to check the CPU usage status of all Docker containers, simply execute;

/usr/local/bin/check_docker --cpu WARN:CRIT

For example, to warn at 70% and go critical at 80% CPU usage;

/usr/local/bin/check_docker --cpu 70:80

Sample output;

OK: dozzle cpu is 0%; OK: nagios-core-4.4.9 cpu is 0%|dozzle_cpu=0;70;80;0;100 nagios-core-4.4.9_cpu=0;70;80;0;100
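Both numbers in --cpu 70:80 are percentages: the first is the WARNING threshold and the second the CRITICAL one, and the plugin's exit code (0, 1 or 2) is what Nagios turns into the OK, WARNING or CRITICAL state. To illustrate that usual Nagios convention, here is a minimal Bash sketch; threshold_state is a hypothetical helper, not code from check_docker, and check_docker's own boundary handling (for example, whether a value exactly equal to a threshold already alerts) may differ.

```shell
#!/bin/bash
# Hypothetical helper illustrating WARN:CRIT threshold semantics;
# this is NOT code taken from check_docker.
# threshold_state VALUE WARN CRIT
# Prints the Nagios state and returns its exit code:
#   above CRIT -> CRITICAL (2), above WARN -> WARNING (1), otherwise OK (0)
threshold_state() {
    local value=$1 warn=$2 crit=$3
    # awk compares the values numerically, so decimals such as 72.5 work
    if awk -v v="$value" -v t="$crit" 'BEGIN { exit !((v + 0) > (t + 0)) }'; then
        echo "CRITICAL"
        return 2
    elif awk -v v="$value" -v t="$warn" 'BEGIN { exit !((v + 0) > (t + 0)) }'; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}
```

For example, threshold_state 72.5 70 80 prints WARNING and returns 1, while threshold_state 0 70 80 prints OK, matching the 0% readings in the sample output above.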

If you want to check a single Docker container, pass the --containers option with the container name;

/usr/local/bin/check_docker --cpu 70:80 --containers dozzle

To check whether Docker containers are running;

/usr/local/bin/check_docker --status running
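Conceptually, a status check like this compares each container's state with "running" and goes CRITICAL on any mismatch. The sketch below illustrates that idea in Bash; it is not check_docker's implementation, containers_running is a hypothetical name, and it assumes it is fed "NAME STATE" lines, for example from docker ps -a --format '{{.Names}} {{.State}}' (the .State placeholder needs a reasonably recent Docker release).

```shell
#!/bin/bash
# Hypothetical helper: reads "NAME STATE" lines on stdin and reports
# CRITICAL if any container is not in the running state.
containers_running() {
    local name state seen=0
    local -a bad=()
    while read -r name state; do
        [[ -z $name ]] && continue   # skip blank lines
        seen=$((seen + 1))
        [[ $state == running ]] || bad+=("$name ($state)")
    done
    if (( seen == 0 )); then
        echo "UNKNOWN: no containers found"
        return 3
    elif (( ${#bad[@]} > 0 )); then
        echo "CRITICAL: not running: ${bad[*]}"
        return 2
    fi
    echo "OK: $seen container(s) running"
    return 0
}
```

Usage on the Docker host: docker ps -a --format '{{.Names}} {{.State}}' | containers_running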

Create Nagios Bash Script Plugin to Monitor Docker Containers

Alternatively, you can write your own Bash script plugin to monitor Docker containers using Nagios. In this tutorial, we will create a Bash script that monitors Docker container status, RAM usage and CPU usage.

We placed the script under /usr/local/nagios/libexec/ as check_docker on the Docker host;

cat /usr/local/nagios/libexec/check_docker
#!/bin/bash
# Nagios plugin: check a Docker container's status, memory or CPU usage.
# Handles the --help|-h, --memory, --cpu and --status flags.
# Exit codes follow the Nagios convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.

USAGE_MSG="Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>"

# Succeed if $1 > $2. awk compares the values numerically, so decimals such as
# 12.50 work (Bash arithmetic only handles integers).
gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) > (b + 0)) }'
}

# check_percent LABEL FIELD WARN:CRIT CONTAINER
# FIELD is the docker stats placeholder to read (MemPerc or CPUPerc).
check_percent() {
    local LABEL=$1 FIELD=$2 CONTAINER=$4 WARN CRIT USAGE
    # split warning and critical thresholds
    IFS=':' read -r WARN CRIT <<< "$3"
    if [[ -z $WARN || -z $CRIT ]]; then
        echo "UNKNOWN: thresholds must be given as WARN:CRIT"
        exit 3
    fi
    # get the container's usage and strip the trailing % sign
    USAGE=$(docker stats --no-stream --format "{{.$FIELD}}" "$CONTAINER" 2>/dev/null | sed 's/%$//')
    if [[ -z $USAGE ]]; then
        echo "UNKNOWN: could not read $LABEL usage of container $CONTAINER"
        exit 3
    fi
    # compare usage against thresholds
    if gt "$USAGE" "$CRIT"; then
        echo "CRITICAL: $LABEL usage of container $CONTAINER is at $USAGE%"
        exit 2
    elif gt "$USAGE" "$WARN"; then
        echo "WARNING: $LABEL usage of container $CONTAINER is at $USAGE%"
        exit 1
    else
        echo "OK: $LABEL usage of container $CONTAINER is at $USAGE%"
        exit 0
    fi
}

if [[ $1 == "-h" || $1 == "--help" ]]; then
    echo "$USAGE_MSG"
    exit 3
elif [[ $1 == "--memory" ]]; then
    if [[ -z $2 || $3 != "--container" || -z $4 ]]; then
        echo "Missing argument for --memory option. Usage: check_docker --memory WARN:CRIT --container <name>"
        exit 3
    fi
    check_percent "Memory" "MemPerc" "$2" "$4"
elif [[ $1 == "--cpu" ]]; then
    if [[ -z $2 || $3 != "--container" || -z $4 ]]; then
        echo "Missing argument for --cpu option. Usage: check_docker --cpu WARN:CRIT --container <name>"
        exit 3
    fi
    check_percent "CPU" "CPUPerc" "$2" "$4"
elif [[ $1 == "--status" ]]; then
    if [[ $2 != "--container" || -z $3 ]]; then
        echo "Missing argument for --status option. Usage: check_docker --status --container <name>"
        exit 3
    fi
    CONTAINER=$3
    STATUS=$(docker inspect --format "{{.State.Status}}" "$CONTAINER" 2>/dev/null)
    if [[ -z $STATUS ]]; then
        echo "UNKNOWN: container $CONTAINER not found"
        exit 3
    elif [[ $STATUS == "running" ]]; then
        echo "OK: container $CONTAINER is running"
        exit 0
    else
        echo "CRITICAL: container $CONTAINER is $STATUS"
        exit 2
    fi
else
    echo "Invalid flag. $USAGE_MSG"
    exit 3
fi

Make the script executable;

chmod +x /usr/local/nagios/libexec/check_docker

You can then run the script to check a Docker container's status, RAM usage or CPU usage.

If you don’t supply any arguments;

/usr/local/nagios/libexec/check_docker

Sample output;

Invalid flag. Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>

To see usage;

/usr/local/nagios/libexec/check_docker --help

Sample output;

Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>

To check status of the Docker container;

/usr/local/nagios/libexec/check_docker --status --container <name>

For example;

/usr/local/nagios/libexec/check_docker --status --container dozzle

Sample output;

OK: container dozzle is running

To check Docker container memory usage;

/usr/local/nagios/libexec/check_docker --memory WARN:CRIT --container <name>

To check Docker container CPU usage;

/usr/local/nagios/libexec/check_docker --cpu WARN:CRIT --container <name>
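Waiting for a container to actually reach 80% CPU is an awkward way to test the thresholds. Because the script calls plain docker, one option is to put a stub docker command first on PATH that prints a canned value. The helper below is a minimal sketch of that idea; with_fake_docker and the FAKE_DOCKER_OUTPUT variable are hypothetical names, and the stub ignores its arguments, so it stands in for both docker stats and docker inspect.

```shell
#!/bin/bash
# Hypothetical test helper: with_fake_docker OUTPUT COMMAND [ARGS...]
# Runs COMMAND with a stub `docker` first on PATH. The stub ignores its
# arguments and prints OUTPUT, standing in for docker stats/inspect output.
with_fake_docker() {
    local output=$1 stubdir rc
    shift
    stubdir=$(mktemp -d) || return 3
    # Quoted heredoc delimiter: the stub body is written out literally
    cat > "$stubdir/docker" <<'EOF'
#!/bin/sh
printf '%s\n' "$FAKE_DOCKER_OUTPUT"
EOF
    chmod +x "$stubdir/docker"
    # Only this one command sees the modified PATH and the canned output
    FAKE_DOCKER_OUTPUT=$output PATH="$stubdir:$PATH" "$@"
    rc=$?
    rm -rf "$stubdir"
    return "$rc"
}
```

For example, with_fake_docker "85.30%" /usr/local/nagios/libexec/check_docker --cpu 70:80 --container dozzle; echo $? lets you compare the reported state and exit code against what the 70:80 thresholds should produce, and with_fake_docker "exited" ... --status --container dozzle does the same for the status check.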

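With this in place, you can install the NRPE agent on the Docker host and configure it to check Docker container status, RAM or CPU usage using the script above. NRPE runs the script as the nagios user, so that user must be able to talk to the Docker daemon, for example by adding it to the docker group (usermod -aG docker nagios). Bear in mind that docker group membership is effectively equivalent to root access on the host.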

See our sample nrpe.cfg configuration;

cat /usr/local/nagios/etc/nrpe.cfg
log_facility=daemon
debug=0
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.59.48
dont_blame_nrpe=1
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_container_status]=/usr/local/nagios/libexec/check_docker --status --container $ARG1$
command[check_container_memory]=/usr/local/nagios/libexec/check_docker --memory $ARG1$:$ARG2$ --container $ARG3$
command[check_container_cpu]=/usr/local/nagios/libexec/check_docker --cpu $ARG1$:$ARG2$ --container $ARG3$

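These are the three lines that check the container status, memory and CPU usage. They take arguments ($ARG1$, $ARG2$, ...), which NRPE only accepts when dont_blame_nrpe=1 is set, as in the configuration above;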

command[check_container_status]=/usr/local/nagios/libexec/check_docker --status --container $ARG1$
command[check_container_memory]=/usr/local/nagios/libexec/check_docker --memory $ARG1$:$ARG2$ --container $ARG3$
command[check_container_cpu]=/usr/local/nagios/libexec/check_docker --cpu $ARG1$:$ARG2$ --container $ARG3$

Restart the NRPE agent;

systemctl restart nrpe

Or, where NRPE was installed from the Debian/Ubuntu packages;

systemctl restart nagios-nrpe-server
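Before adding anything on the Nagios server side, it can help to call the new NRPE commands by hand from the Nagios server with check_nrpe. A minimal sketch follows; nrpe_check is a hypothetical helper, and its defaults come from this tutorial (check_nrpe under /usr/local/nagios/libexec, Docker host 192.168.59.49). Override them through the CHECK_NRPE and NRPE_HOST environment variables if your setup differs.

```shell
#!/bin/bash
# Hypothetical helper: nrpe_check COMMAND [ARGS...]
# Runs an NRPE command on the Docker host and prints its output and exit code.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}
NRPE_HOST=${NRPE_HOST:-192.168.59.49}

nrpe_check() {
    local cmd=$1 out rc
    shift
    # -n disables SSL, as in this tutorial's command definitions
    local -a args=(-n -H "$NRPE_HOST" -c "$cmd")
    # -a sends the remaining values to NRPE as $ARG1$, $ARG2$, ...
    if (( $# > 0 )); then
        args+=(-a "$@")
    fi
    out=$("$CHECK_NRPE" "${args[@]}")
    rc=$?
    printf '%s [exit %d]: %s\n' "$cmd" "$rc" "$out"
    return "$rc"
}
```

Usage on the Nagios server, e.g. nrpe_check check_container_status dozzle, nrpe_check check_container_memory 50 70 dozzle or nrpe_check check_container_cpu 70 80 dozzle.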

Next, add the host, the relevant services and the check commands to your Nagios server;

See our sample configs on the Nagios server;

Command Definitions;

vim commands.cfg
...
define command {
    command_name    check_docker_status
    command_line    /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
}
define command {
    command_name    check_docker_metrics
    command_line    /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$
}

Host/Hostgroup definitions;

vim host-hostgroups.cfg
...
define host {
    use                     linux-server
    host_name               east02-docker-node
    alias                   East02-Docker-NODE
    address                 192.168.59.49
}

Service Definitions;

vim services.cfg
define service {
    use                     local-service
    host_name               east02-docker-node
    service_description     Check Dozzle Container Status
    check_command           check_docker_status!check_container_status!dozzle
}
define service {
    use                     local-service
    host_name               east02-docker-node
    service_description     Current Dozzle Container RAM Usage
    check_command           check_docker_metrics!check_container_memory!50!70!dozzle
}
define service {
    use                     local-service
    host_name               east02-docker-node
    service_description     Current Dozzle Container CPU Usage
    check_command           check_docker_metrics!check_container_cpu!70!80!dozzle
}
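Each value after a ! in check_command becomes $ARG1$, $ARG2$, ... in the matching command definition, and NRPE then substitutes the values sent with check_nrpe -a into the $ARGn$ macros of its own command[...] lines. To make the two hops concrete, here is a tiny hypothetical Bash helper that performs only the $ARGn$ substitution; it leaves other macros such as $HOSTADDRESS$ untouched, since Nagios resolves those itself.

```shell
#!/bin/bash
# Hypothetical helper: expand_args TEMPLATE ARG1 [ARG2 ...]
# Replaces $ARG1$, $ARG2$, ... in TEMPLATE with the given values, mimicking
# the $ARGn$ substitution done by Nagios and NRPE (other macros are left as-is).
expand_args() {
    local template=$1 arg i=1
    shift
    for arg in "$@"; do
        # quoted pattern and replacement: match and insert literally
        template=${template//"\$ARG$i\$"/"$arg"}
        i=$((i + 1))
    done
    printf '%s\n' "$template"
}
```

Tracing the RAM service (check_docker_metrics!check_container_memory!50!70!dozzle): on the Nagios server, expand_args '/usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$' check_container_memory 50 70 dozzle gives /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c check_container_memory -a 50 70 dozzle; on the Docker host, expand_args '/usr/local/nagios/libexec/check_docker --memory $ARG1$:$ARG2$ --container $ARG3$' 50 70 dozzle gives /usr/local/nagios/libexec/check_docker --memory 50:70 --container dozzle.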

Check the Nagios configuration for errors and restart the Nagios service;

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Sample output;

Nagios Core 4.4.9
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2022-11-16
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
	Checked 14 services.
	Checked 3 hosts.
	Checked 1 host groups.
	Checked 0 service groups.
	Checked 1 contacts.
	Checked 1 contact groups.
	Checked 27 commands.
	Checked 5 time periods.
	Checked 0 host escalations.
	Checked 0 service escalations.
Checking for circular paths...
	Checked 3 hosts
	Checked 0 service dependencies
	Checked 0 host dependencies
	Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

After restarting Nagios (for example, with systemctl restart nagios), confirm the Docker container checks on the Nagios web interface;


That brings us to the end of our tutorial on how to monitor Docker containers using Nagios.

