Can Nagios monitor Docker containers? Yes. In this tutorial, you will learn how to monitor Docker containers using Nagios. Monitoring your IT infrastructure is important because it helps you identify issues that might adversely affect your productivity. Infrastructure monitoring revolves around checking the health status, availability, performance and resource usage of your servers, containers, virtual machines, etc. This tutorial focuses on Docker container monitoring using a Nagios Core server.
Monitoring Docker Containers using Nagios
Using Community Written Nagios Scripts to Monitor Docker container
There are multiple scripts on the Internet written for the very purpose of monitoring Docker containers using Nagios.
One example of such Python scripts is check_docker, created by timdaman.
You can download the script as follows;
curl -o /usr/local/bin/check_docker \
https://raw.githubusercontent.com/timdaman/check_docker/master/check_docker/check_docker.py
Or simply use wget;
wget -O /usr/local/bin/check_docker \
https://raw.githubusercontent.com/timdaman/check_docker/master/check_docker/check_docker.py
This script allows you to check a number of Docker container metrics and properties, such as;
- memory consumption in absolute units (bytes, kb, mb, gb) and as a percentage (0-100%) of the container limit.
- CPU usage as a percentage (0-100%) of the container limit.
- automatic restarts performed by the docker daemon
- container status, i.e. is it running?
- container health checks are passing?
- uptime, i.e. is it able to stay running for a long enough time?
- the presence of a container or containers matching specified names
- image version, does the running image match that in the remote registry?
- image age, when was the image built the last time?
You can download the script to your Docker host and make it executable;
chmod +x /usr/local/bin/check_docker
To see how to use the script to run various Docker container checks, simply execute it without any command line options;
/usr/local/bin/check_docker
Sample output;
usage: check_docker [-h] [--connection [//docker.socket|:] | --secure-connection [:]]
[--binary_units | --decimal_units] [--timeout TIMEOUT] [--containers CONTAINERS [CONTAINERS ...]] [--present] [--threads THREADS] [--cpu WARN:CRIT]
[--memory WARN:CRIT:UNITS] [--status STATUS] [--health] [--uptime WARN:CRIT] [--image-age WARN:CRIT] [--version]
[--insecure-registries INSECURE_REGISTRIES [INSECURE_REGISTRIES ...]] [--restarts WARN:CRIT] [--no-ok] [--no-performance] [-V]
Check docker containers.
options:
-h, --help show this help message and exit
--connection [//docker.socket|:]
Where to find docker daemon socket. (default: /var/run/docker.sock)
--secure-connection [:]
Where to find TLS protected docker daemon socket.
--binary_units Use a base of 1024 when doing calculations of KB, MB, GB, & TB (This is default)
--decimal_units Use a base of 1000 when doing calculations of KB, MB, GB, & TB
--timeout TIMEOUT Connection timeout in seconds. (default: 10.0)
--containers CONTAINERS [CONTAINERS ...]
One or more RegEx that match the names of the container(s) to check. If omitted all containers are checked. (default: ['all'])
--present Modifies --containers so that each RegEx must match at least one container.
--threads THREADS This + 1 is the maximum number of concurent threads/network connections. (default: 10)
--cpu WARN:CRIT Check cpu usage percentage taking into account any limits.
--memory WARN:CRIT:UNITS
Check memory usage taking into account any limits. Valid values for units are %,B,KB,MB,GB.
--status STATUS Desired container status (running, exited, etc).
--health Check container's health check status
--uptime WARN:CRIT Minimum container uptime in seconds. Use when infrequent crashes are tolerated.
--image-age WARN:CRIT
Maximum image age in days.
--version Check if the running images are the same version as those in the registry. Useful for finding stale images. Does not support login.
--insecure-registries INSECURE_REGISTRIES [INSECURE_REGISTRIES ...]
List of registries to connect to with http(no TLS). Useful when using "--version" with images from insecure registries.
--restarts WARN:CRIT Container restart thresholds.
--no-ok Make output terse suppressing OK messages. If all checks are OK return a single OK.
--no-performance Suppress performance data. Reduces output when performance data is not being used.
-V show program's version number and exit
UNKNOWN: No checks specified.
You can use this script to check Docker container status, memory and CPU usage, etc., as described in the command line options above.
For example, to check the CPU usage of all Docker containers, simply execute;
/usr/local/bin/check_docker --cpu WARN:CRIT
For example;
/usr/local/bin/check_docker --cpu 70:80
Sample output;
OK: dozzle cpu is 0%; OK: nagios-core-4.4.9 cpu is 0%|dozzle_cpu=0;70;80;0;100 nagios-core-4.4.9_cpu=0;70;80;0;100
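The part of the output after the | character is Nagios performance data, where each item has the form label=value;warn;crit;min;max. As a rough sketch, the shell function below splits such output into one line per metric. The name parse_perfdata is made up for this illustration and is not part of check_docker;

```shell
# Print one line per performance-data item found in a plugin's output.
# parse_perfdata is a hypothetical helper, not part of check_docker.
parse_perfdata() {
    # Keep only the text after the first "|", then put one metric per line
    printf '%s\n' "${1#*|}" | tr ' ' '\n' |
        awk -F'[=;]' 'NF >= 4 { printf "%s value=%s warn=%s crit=%s\n", $1, $2, $3, $4 }'
}

# Usage, with the sample output shown above:
# parse_perfdata 'OK: dozzle cpu is 0%; OK: nagios-core-4.4.9 cpu is 0%|dozzle_cpu=0;70;80;0;100 nagios-core-4.4.9_cpu=0;70;80;0;100'
# dozzle_cpu value=0 warn=70 crit=80
# nagios-core-4.4.9_cpu value=0 warn=70 crit=80
```

Here the 70 and 80 are simply the WARN:CRIT thresholds passed on the command line, echoed back in the performance data.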
To check a single Docker container, pass the --containers option with the specific container name;
/usr/local/bin/check_docker --cpu 70:80 --containers dozzle
To check that the Docker containers are running;
/usr/local/bin/check_docker --status running
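Nagios decides the state of a check from the plugin's exit code rather than from the text it prints: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The following is a minimal sketch for seeing which state a given command would produce. The function name nagios_state is hypothetical and used for illustration only;

```shell
# Run a command and print the Nagios state that matches its exit code.
# nagios_state is a hypothetical helper name, not a Nagios tool.
nagios_state() {
    rc=0
    "$@" > /dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "exit code $rc is outside the 0-3 range" ;;
    esac
}

# Usage, on a host where check_docker is installed:
# nagios_state /usr/local/bin/check_docker --status running
```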
Create Nagios Bash Script Plugin to Monitor Docker Containers
You can also create your own bash script for monitoring Docker containers using Nagios. In this tutorial, we will create a bash script that can be used to monitor Docker container status, RAM and CPU usage.
We placed the script under /usr/local/nagios/libexec/ as check_docker on the Docker host;
cat /usr/local/nagios/libexec/check_docker
#!/bin/bash
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
# check for --help|-h or --memory or --cpu or --status flag
if [[ $1 == "-h" || $1 == "--help" ]]; then
    echo "Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>"
elif [[ $1 == "--memory" ]]; then
    if [[ -z $2 || -z $3 || -z $4 ]]; then
        echo "Missing argument for --memory option. Usage: check_docker --memory WARN:CRIT --container <name>"
        exit 3
    fi
    # split warning and critical thresholds
    IFS=':' read -ra THRESHOLDS <<< "$2"
    WARN=${THRESHOLDS[0]}
    CRIT=${THRESHOLDS[1]}
    # get container name
    CONTAINER=$4
    # get container memory usage as a percentage, stripping the trailing % sign
    USAGE=$(docker stats --no-stream --format "{{.MemPerc}}" "$CONTAINER" | sed 's/.$//')
    if [[ -z $USAGE ]]; then
        echo "UNKNOWN: could not read memory usage of container $CONTAINER"
        exit 3
    fi
    # compare usage against thresholds; awk does the floating-point comparison
    if (( $(awk -v u="$USAGE" -v t="$CRIT" 'BEGIN {print (u + 0 > t + 0)}') )); then
        echo "CRITICAL: Memory usage of container $CONTAINER is at $USAGE%"
        exit 2
    elif (( $(awk -v u="$USAGE" -v t="$WARN" 'BEGIN {print (u + 0 > t + 0)}') )); then
        echo "WARNING: Memory usage of container $CONTAINER is at $USAGE%"
        exit 1
    else
        echo "OK: Memory usage of container $CONTAINER is at $USAGE%"
        exit 0
    fi
elif [[ $1 == "--cpu" ]]; then
    if [[ -z $2 || -z $3 || -z $4 ]]; then
        echo "Missing argument for --cpu option. Usage: check_docker --cpu WARN:CRIT --container <name>"
        exit 3
    fi
    # split warning and critical thresholds
    IFS=':' read -ra THRESHOLDS <<< "$2"
    WARN=${THRESHOLDS[0]}
    CRIT=${THRESHOLDS[1]}
    # get container name
    CONTAINER=$4
    # get container CPU usage as a percentage, stripping the trailing % sign
    USAGE=$(docker stats --no-stream --format "{{.CPUPerc}}" "$CONTAINER" | sed 's/.$//')
    if [[ -z $USAGE ]]; then
        echo "UNKNOWN: could not read CPU usage of container $CONTAINER"
        exit 3
    fi
    # compare usage against thresholds; awk does the floating-point comparison
    if (( $(awk -v u="$USAGE" -v t="$CRIT" 'BEGIN {print (u + 0 > t + 0)}') )); then
        echo "CRITICAL: CPU usage of container $CONTAINER is at $USAGE%"
        exit 2
    elif (( $(awk -v u="$USAGE" -v t="$WARN" 'BEGIN {print (u + 0 > t + 0)}') )); then
        echo "WARNING: CPU usage of container $CONTAINER is at $USAGE%"
        exit 1
    else
        echo "OK: CPU usage of container $CONTAINER is at $USAGE%"
        exit 0
    fi
elif [[ $1 == "--status" ]]; then
    if [[ -z $2 || -z $3 ]]; then
        echo "Missing argument for --status option. Usage: check_docker --status --container <name>"
        exit 3
    fi
    CONTAINER=$3
    # an empty status means docker inspect could not find the container
    STATUS=$(docker inspect --format "{{.State.Status}}" "$CONTAINER")
    if [[ $STATUS == "running" ]]; then
        echo "OK: container $CONTAINER is running"
        exit 0
    else
        echo "CRITICAL: container $CONTAINER is ${STATUS:-not found}"
        exit 2
    fi
else
    echo "Invalid flag. Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>"
    exit 3
fi
Make the script executable;
chmod +x /usr/local/nagios/libexec/check_docker
You can then run the script to monitor Docker containers status, RAM and CPU usage.
If you don’t supply any arguments;
/usr/local/nagios/libexec/check_docker
Sample output;
Invalid flag. Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>
To see usage;
check_docker [--help|-h]
Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>
To check status of the Docker container;
/usr/local/nagios/libexec/check_docker --status --container <name>
E.g;
/usr/local/nagios/libexec/check_docker --status --container dozzle
Sample output;
OK: container dozzle is running
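The status check treats every state other than running as CRITICAL. Docker reports several other states (created, paused, restarting, removing, exited, dead), so a finer-grained mapping is possible if you prefer one. The sketch below shows one such policy. Both the function name docker_state_to_nagios and the choice to treat paused and restarting as WARNING are assumptions made for this example, not something the script above does;

```shell
# Map a Docker container state to a Nagios state and exit code.
# The policy (paused/restarting = WARNING) is an assumption for illustration.
docker_state_to_nagios() {
    case $1 in
        running)           echo "OK";       return 0 ;;
        paused|restarting) echo "WARNING";  return 1 ;;
        "")                echo "UNKNOWN";  return 3 ;;
        *)                 echo "CRITICAL"; return 2 ;;
    esac
}

# Usage, on the Docker host:
# docker_state_to_nagios "$(docker inspect --format '{{.State.Status}}' dozzle)"
```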
To check Docker Container memory usage;
/usr/local/nagios/libexec/check_docker --memory WARN:CRIT --container <name>
To check Docker container CPU usage;
/usr/local/nagios/libexec/check_docker --cpu WARN:CRIT --container <name>
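To make the WARN:CRIT semantics concrete, the threshold comparison can be tried out without Docker. The following is a small sketch of the same idea, where classify_usage is a hypothetical helper name. A usage value above CRIT is CRITICAL, a value above WARN is WARNING, and anything else is OK;

```shell
# Classify a usage percentage against WARN:CRIT thresholds.
# classify_usage is a hypothetical helper, not part of the plugin itself.
classify_usage() {
    # $1 = usage percentage (a trailing % is tolerated), $2 = WARN:CRIT
    awk -v usage="${1%\%}" -v warn="${2%%:*}" -v crit="${2##*:}" 'BEGIN {
        # awk is used because bash arithmetic cannot compare decimals
        if (usage + 0 > crit + 0)      { print "CRITICAL"; exit 2 }
        else if (usage + 0 > warn + 0) { print "WARNING";  exit 1 }
        print "OK"
    }'
}

# Usage:
# classify_usage 12.5 50:70    prints OK and returns 0
# classify_usage 55.2% 50:70   prints WARNING and returns 1
# classify_usage 91 50:70      prints CRITICAL and returns 2
```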
Configure NRPE Agents to Check Docker Containers
With this in place, you can install the NRPE agent on the Docker host and configure it to check Docker container status, RAM or CPU usage using the script above.
See our sample nrpe.cfg configuration;
cat /usr/local/nagios/etc/nrpe.cfg
log_facility=daemon
debug=0
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.59.48
dont_blame_nrpe=1
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_container_status]=/usr/local/nagios/libexec/check_docker --status --container $ARG1$
command[check_container_memory]=/usr/local/nagios/libexec/check_docker --memory $ARG1$:$ARG2$ --container $ARG3$
command[check_container_cpu]=/usr/local/nagios/libexec/check_docker --cpu $ARG1$:$ARG2$ --container $ARG3$
See the three lines for checking the status, memory and cpu usage of the Docker containers;
command[check_container_status]=/usr/local/nagios/libexec/check_docker --status --container $ARG1$
command[check_container_memory]=/usr/local/nagios/libexec/check_docker --memory $ARG1$:$ARG2$ --container $ARG3$
command[check_container_cpu]=/usr/local/nagios/libexec/check_docker --cpu $ARG1$:$ARG2$ --container $ARG3$
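Because dont_blame_nrpe=1 is set, NRPE accepts arguments from the Nagios server and substitutes them for $ARG1$, $ARG2$ and so on before running the command. The sketch below only imitates that substitution so you can see the resulting command line. expand_nrpe_command is a hypothetical function written for this example and is not how NRPE itself is implemented;

```shell
# Imitate NRPE's $ARGn$ substitution for a command template.
# expand_nrpe_command is a hypothetical helper for illustration only.
expand_nrpe_command() {
    template=$1
    shift
    i=1
    for arg in "$@"; do
        macro="\$ARG${i}\$"
        # Replace the first occurrence of the macro, if it is present
        case $template in
            *"$macro"*)
                prefix=${template%%"$macro"*}
                suffix=${template#*"$macro"}
                template=$prefix$arg$suffix
                ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$template"
}

# Usage:
# expand_nrpe_command '/usr/local/nagios/libexec/check_docker --memory $ARG1$:$ARG2$ --container $ARG3$' 50 70 dozzle
# /usr/local/nagios/libexec/check_docker --memory 50:70 --container dozzle
```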
Restart the NRPE agent;
systemctl restart nrpe
Or, depending on your distribution;
systemctl restart nagios-nrpe-server
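One thing worth verifying at this point: NRPE runs the checks as the nagios user (nrpe_user=nagios above), and the script calls the docker command. On a default Docker installation the daemon socket is typically accessible only to root and members of the docker group, so the nagios user may need to be in that group. The sketch below, which assumes that default setup and uses a made-up function name user_in_group, tests group membership;

```shell
# Check whether a user belongs to a group.
# Returns 0 if yes, 1 if no, 2 if the user cannot be looked up.
# user_in_group is a hypothetical helper name.
user_in_group() {
    memberships=$(id -nG "$1" 2>/dev/null) || return 2
    case " $memberships " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage, as root on the Docker host (assumes the group is named docker):
# user_in_group nagios docker || usermod -aG docker nagios
```

Keep in mind that membership of the docker group effectively gives a user root-level control of the host, and restart the NRPE agent after changing group membership so the new group takes effect.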
Add Docker Container Host to Nagios Server for Monitoring
Next, add the host, the relevant services and the check commands to your Nagios server.
See the sample configs from our Nagios server below.
Command definitions for Docker container monitoring;
vim commands.cfg
...
define command {
command_name check_docker_status
command_line /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
}
define command {
command_name check_docker_metrics
command_line /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$
}
Host/Hostgroup definitions;
vim host-hostgroups.cfg
...
define host {
use linux-server
host_name east02-docker-node
alias East02-Docker-NODE
address 192.168.59.49
}
Service Definitions;
vim services.cfg
define service {
use local-service
host_name east02-docker-node
service_description Check Dozzle Container Status
check_command check_docker_status!check_container_status!dozzle
}
define service {
use local-service
host_name east02-docker-node
service_description Current Dozzle Container RAM Usage
check_command check_docker_metrics!check_container_memory!50!70!dozzle
}
define service {
use local-service
host_name east02-docker-node
service_description Current Dozzle Container CPU Usage
check_command check_docker_metrics!check_container_cpu!70!80!dozzle
}
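In a check_command, the ! character separates the command name from its arguments, which Nagios makes available as $ARG1$, $ARG2$ and so on in the command_line. As an illustrative sketch (split_check_command is a hypothetical name, and this only mimics the splitting), the function below shows how one of the values above breaks down;

```shell
# Show how a check_command value splits into a command name and arguments.
# split_check_command is a hypothetical helper for illustration only.
split_check_command() {
    rest=$1
    echo "command_name=${rest%%!*}"
    i=1
    # Keep consuming "name!" prefixes while a "!" is still present
    while [ "$rest" != "${rest#*!}" ]; do
        rest=${rest#*!}
        echo "ARG${i}=${rest%%!*}"
        i=$((i + 1))
    done
}

# Usage:
# split_check_command 'check_docker_metrics!check_container_memory!50!70!dozzle'
# command_name=check_docker_metrics
# ARG1=check_container_memory
# ARG2=50
# ARG3=70
# ARG4=dozzle
```

With the check_docker_metrics definition above, $ARG1$ becomes the NRPE command name passed to -c, and $ARG2$ to $ARG4$ become the values passed after -a.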
Verify Nagios Configuration
Check the Nagios configuration for errors and, if none are reported, restart the Nagios service;
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Sample output;
Nagios Core 4.4.9
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2022-11-16
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 14 services.
Checked 3 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 27 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 3 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
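If you want to make sure Nagios is only restarted when the pre-flight check passes, the two steps can be chained. The following is a minimal sketch, assuming that the service is named nagios under systemd and that nagios -v exits with a non-zero status when it finds errors. restart_if_valid is a hypothetical function name;

```shell
# Restart Nagios only if the configuration check succeeds.
# Assumptions: systemd service name "nagios"; restart_if_valid is hypothetical.
restart_if_valid() {
    nagios_bin=${1:-/usr/local/nagios/bin/nagios}
    nagios_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
    if "$nagios_bin" -v "$nagios_cfg" > /dev/null; then
        systemctl restart nagios
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Usage, as root on the Nagios server:
# restart_if_valid
```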
Confirm Nagios Docker Container Monitoring
Finally, confirm on the Nagios web interface that the Docker container checks are reporting their status.
And that is how you can use Nagios to monitor Docker containers, which brings us to the end of this tutorial.