Can Nagios monitor Docker containers? Yes, and in this tutorial you will learn how. Monitoring your IT infrastructure is important because it helps you identify issues that might adversely affect your productivity. Infrastructure monitoring revolves around checking the health status, availability, performance and resource usage of your servers, containers, virtual machines, etc. This tutorial focuses on Docker container monitoring using a Nagios Core server.
Monitor Docker Containers using Nagios
So, how can you monitor Docker containers using Nagios?
Well, there are multiple scripts on the Internet written for the very purpose of monitoring Docker containers using Nagios.
One such Python script is check_docker, created by timdaman.
You can download the script as follows;
curl -o /usr/local/bin/check_docker \
https://raw.githubusercontent.com/timdaman/check_docker/master/check_docker/check_docker.py
Or simply use wget;
wget -O /usr/local/bin/check_docker \
https://raw.githubusercontent.com/timdaman/check_docker/master/check_docker/check_docker.py
This script lets you check a number of Docker container metrics, such as;
- memory consumption in absolute units (bytes, KB, MB, GB) and as a percentage (0-100%) of the container limit
- CPU usage as a percentage (0-100%) of the container limit
- automatic restarts performed by the Docker daemon
- container status, i.e. is it running?
- container health checks, i.e. are they passing?
- uptime, i.e. is it able to stay running for a long enough time?
- the presence of a container or containers matching specified names
- image version, i.e. does the running image match that in the remote registry?
- image age, i.e. when was the image last built?
Once the script is on your Docker host, make it executable;
chmod +x /usr/local/bin/check_docker
To see how to use the script to run various Docker container checks, simply execute it without any command line options;
/usr/local/bin/check_docker
Sample output;
usage: check_docker [-h] [--connection [//docker.socket| : ] | --secure-connection [ : ]]
                    [--binary_units | --decimal_units] [--timeout TIMEOUT]
                    [--containers CONTAINERS [CONTAINERS ...]] [--present] [--threads THREADS]
                    [--cpu WARN:CRIT] [--memory WARN:CRIT:UNITS] [--status STATUS] [--health]
                    [--uptime WARN:CRIT] [--image-age WARN:CRIT] [--version]
                    [--insecure-registries INSECURE_REGISTRIES [INSECURE_REGISTRIES ...]]
                    [--restarts WARN:CRIT] [--no-ok] [--no-performance] [-V]

Check docker containers.

options:
  -h, --help            show this help message and exit
  --connection [/ /docker.socket| : ]
                        Where to find docker daemon socket. (default: /var/run/docker.sock)
  --secure-connection [ : ]
                        Where to find TLS protected docker daemon socket.
  --binary_units        Use a base of 1024 when doing calculations of KB, MB, GB, & TB (This is default)
  --decimal_units       Use a base of 1000 when doing calculations of KB, MB, GB, & TB
  --timeout TIMEOUT     Connection timeout in seconds. (default: 10.0)
  --containers CONTAINERS [CONTAINERS ...]
                        One or more RegEx that match the names of the container(s) to check. If omitted all containers are checked. (default: ['all'])
  --present             Modifies --containers so that each RegEx must match at least one container.
  --threads THREADS     This + 1 is the maximum number of concurent threads/network connections. (default: 10)
  --cpu WARN:CRIT       Check cpu usage percentage taking into account any limits.
  --memory WARN:CRIT:UNITS
                        Check memory usage taking into account any limits. Valid values for units are %,B,KB,MB,GB.
  --status STATUS       Desired container status (running, exited, etc).
  --health              Check container's health check status
  --uptime WARN:CRIT    Minimum container uptime in seconds. Use when infrequent crashes are tolerated.
  --image-age WARN:CRIT
                        Maximum image age in days.
  --version             Check if the running images are the same version as those in the registry. Useful for finding stale images. Does not support login.
  --insecure-registries INSECURE_REGISTRIES [INSECURE_REGISTRIES ...]
                        List of registries to connect to with http(no TLS). Useful when using "--version" with images from insecure registries.
  --restarts WARN:CRIT  Container restart thresholds.
  --no-ok               Make output terse suppressing OK messages. If all checks are OK return a single OK.
  --no-performance      Suppress performance data. Reduces output when performance data is not being used.
  -V                    show program's version number and exit

UNKNOWN: No checks specified.
You can use this script to check Docker container status, memory and CPU usage, etc., as described in the command line options above.
For example, to check the CPU usage status of all Docker containers, simply execute;
/usr/local/bin/check_docker --cpu WARN:CRIT
For example, with a warning threshold of 70% and a critical threshold of 80%;
/usr/local/bin/check_docker --cpu 70:80
Sample output;
OK: dozzle cpu is 0%; OK: nagios-core-4.4.9 cpu is 0%|dozzle_cpu=0;70;80;0;100 nagios-core-4.4.9_cpu=0;70;80;0;100
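In the sample output, everything after the | character is Nagios performance data, one label=value;warn;crit;min;max item per container. The following is a minimal bash sketch of splitting such a line into its fields; parse_perfdata is a hypothetical helper name used only for illustration and is not part of check_docker.

```shell
#!/bin/bash
# Sketch: split a Nagios plugin output line into its performance data items.
# parse_perfdata is a hypothetical helper name, not part of check_docker.
parse_perfdata() {
    local line=$1
    # Without a "|" the line carries no performance data.
    [[ $line == *"|"* ]] || return 0
    local perfdata=${line#*|}   # everything after the first "|"
    local item label value warn crit min max
    for item in $perfdata; do   # items are separated by spaces
        label=${item%%=*}
        IFS=';' read -r value warn crit min max <<< "${item#*=}"
        echo "$label value=$value warn=$warn crit=$crit min=$min max=$max"
    done
}

# Sample line taken from the check_docker output above.
parse_perfdata 'OK: dozzle cpu is 0%; OK: nagios-core-4.4.9 cpu is 0%|dozzle_cpu=0;70;80;0;100 nagios-core-4.4.9_cpu=0;70;80;0;100'
```

The warn and crit fields echo the 70:80 thresholds passed on the command line, which is what lets Nagios graphing add-ons draw threshold lines.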
To check a single Docker container, pass the --containers option with the name of that container;
/usr/local/bin/check_docker --cpu 70:80 --containers dozzle
To check that the Docker containers are in the running state;
/usr/local/bin/check_docker --status running
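Nagios decides the state of a service from the exit code of the plugin: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. As a rough sketch, a small wrapper can show which state a given check would report; nagios_state_name and run_check are hypothetical helper names, not part of Nagios or check_docker.

```shell
#!/bin/bash
# Sketch: run any check command and print the Nagios state implied by its exit code.
# nagios_state_name and run_check are hypothetical helper names.
nagios_state_name() {
    case $1 in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 and any unexpected code
    esac
}

run_check() {
    local output rc
    # Capture the plugin output and remember its exit code.
    if output=$("$@" 2>&1); then
        rc=0
    else
        rc=$?
    fi
    echo "state=$(nagios_state_name "$rc") exit=$rc"
    if [[ -n $output ]]; then
        echo "$output"
    fi
    return "$rc"
}
```

For example, run_check /usr/local/bin/check_docker --status running prints the state line followed by the plugin's own output.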
Create Nagios Bash Script Plugin to Monitor Docker Containers
You can also create your own bash script to monitor Docker containers using Nagios. In this tutorial, we will create a bash script that can be used to monitor Docker container status, RAM and CPU usage.
We placed the script under /usr/local/nagios/libexec/ as check_docker on the Docker host;
cat /usr/local/nagios/libexec/check_docker
#!/bin/bash
# Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
# check for --help|-h or --memory or --cpu or --status flag
if [[ $1 == "-h" || $1 == "--help" ]]; then
    echo "Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>"
elif [[ $1 == "--memory" ]]; then
    if [[ -z $2 || -z $3 || -z $4 ]]; then
        echo "Missing argument for --memory option. Usage: check_docker --memory WARN:CRIT --container <name>"
        exit 3
    fi
    # split warning and critical thresholds
    IFS=':' read -ra THRESHOLDS <<< "$2"
    WARN=${THRESHOLDS[0]}
    CRIT=${THRESHOLDS[1]}
    # get container name
    CONTAINER=$4
    # get container memory usage as a percentage, dropping the trailing % sign
    #USAGE=$(docker stats --no-stream --format "{{.MemUsage}}" $CONTAINER | awk '{print $1}' | numfmt --to=iec-i)
    USAGE=$(docker stats --no-stream --format "{{.MemPerc}}" "$CONTAINER" | sed 's/.$//')
    if [[ -z $USAGE ]]; then
        echo "UNKNOWN: could not read memory usage of container $CONTAINER"
        exit 3
    fi
    # compare usage against thresholds; awk compares the decimal values numerically
    if awk -v u="$USAGE" -v t="$CRIT" 'BEGIN { exit !(u + 0 > t + 0) }'; then
        echo "CRITICAL: Memory usage of container $CONTAINER is at $USAGE%"
        exit 2
    elif awk -v u="$USAGE" -v t="$WARN" 'BEGIN { exit !(u + 0 > t + 0) }'; then
        echo "WARNING: Memory usage of container $CONTAINER is at $USAGE%"
        exit 1
    else
        echo "OK: Memory usage of container $CONTAINER is at $USAGE%"
        exit 0
    fi
elif [[ $1 == "--cpu" ]]; then
    if [[ -z $2 || -z $3 || -z $4 ]]; then
        echo "Missing argument for --cpu option. Usage: check_docker --cpu WARN:CRIT --container <name>"
        exit 3
    fi
    # split warning and critical thresholds
    IFS=':' read -ra THRESHOLDS <<< "$2"
    WARN=${THRESHOLDS[0]}
    CRIT=${THRESHOLDS[1]}
    # get container name
    CONTAINER=$4
    # get container CPU usage as a percentage, dropping the trailing % sign
    USAGE=$(docker stats --no-stream --format "{{.CPUPerc}}" "$CONTAINER" | sed 's/.$//')
    if [[ -z $USAGE ]]; then
        echo "UNKNOWN: could not read CPU usage of container $CONTAINER"
        exit 3
    fi
    # compare usage against thresholds; awk compares the decimal values numerically
    if awk -v u="$USAGE" -v t="$CRIT" 'BEGIN { exit !(u + 0 > t + 0) }'; then
        echo "CRITICAL: CPU usage of container $CONTAINER is at $USAGE%"
        exit 2
    elif awk -v u="$USAGE" -v t="$WARN" 'BEGIN { exit !(u + 0 > t + 0) }'; then
        echo "WARNING: CPU usage of container $CONTAINER is at $USAGE%"
        exit 1
    else
        echo "OK: CPU usage of container $CONTAINER is at $USAGE%"
        exit 0
    fi
elif [[ $1 == "--status" ]]; then
    if [[ -z $2 || -z $3 ]]; then
        echo "Missing argument for --status option. Usage: check_docker --status --container <name>"
        exit 3
    fi
    CONTAINER=$3
    STATUS=$(docker inspect --format "{{.State.Status}}" "$CONTAINER")
    if [ "$STATUS" == "running" ]; then
        echo "OK: container $CONTAINER is running"
        exit 0
    else
        echo "CRITICAL: container $CONTAINER is $STATUS"
        exit 2
    fi
else
    echo "Invalid flag. Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>"
    exit 3
fi
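The threshold comparison at the heart of such a plugin can be tried on its own, without Docker. The sketch below maps a percentage like the ones printed by docker stats to a Nagios state for a given WARN:CRIT pair. classify_usage is a hypothetical helper name, and awk is used because bash arithmetic only handles integers.

```shell
#!/bin/bash
# Sketch: map a decimal usage percentage to a Nagios state using WARN:CRIT thresholds.
# classify_usage is a hypothetical helper name used for illustration only.
classify_usage() {
    local usage=${1%\%}   # drop a trailing % sign, e.g. "12.34%" becomes "12.34"
    local warn crit
    IFS=':' read -r warn crit <<< "$2"
    # "+ 0" forces a numeric rather than a string comparison in awk.
    if awk -v u="$usage" -v t="$crit" 'BEGIN { exit !(u + 0 > t + 0) }'; then
        echo "CRITICAL"
        return 2
    elif awk -v u="$usage" -v t="$warn" 'BEGIN { exit !(u + 0 > t + 0) }'; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

classify_usage "12.34%" "50:70" || true   # prints OK
classify_usage "65.5%" "50:70" || true    # prints WARNING
classify_usage "9.5%" "1.5:8" || true     # prints CRITICAL
```

The return value follows the Nagios exit code convention, so a plugin could end with exit $? right after calling the helper.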
Make the script executable;
chmod +x /usr/local/nagios/libexec/check_docker
You can then run the script to monitor Docker container status, RAM and CPU usage.
If you don’t supply any arguments;
/usr/local/nagios/libexec/check_docker
Sample output;
Invalid flag. Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>
To see usage;
check_docker [--help|-h]
Usage: check_docker [[--memory WARN:CRIT|--cpu WARN:CRIT|--status]] --container <name>
To check the status of a Docker container;
/usr/local/nagios/libexec/check_docker --status --container <name>
E.g;
/usr/local/nagios/libexec/check_docker --status --container dozzle
Sample output;
OK: container dozzle is running
To check Docker container memory usage;
/usr/local/nagios/libexec/check_docker --memory WARN:CRIT --container <name>
To check Docker container CPU usage;
/usr/local/nagios/libexec/check_docker --cpu WARN:CRIT --container <name>
With this in place, you can install the NRPE agent on the Docker host and configure it to check Docker container status, RAM or CPU usage using the script above.
See our sample nrpe.cfg configuration;
cat /usr/local/nagios/etc/nrpe.cfg
log_facility=daemon
debug=0
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.59.48
dont_blame_nrpe=1
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_container_status]=/usr/local/nagios/libexec/check_docker --status --container $ARG1$
command[check_container_memory]=/usr/local/nagios/libexec/check_docker --memory $ARG1$:$ARG2$ --container $ARG3$
command[check_container_cpu]=/usr/local/nagios/libexec/check_docker --cpu $ARG1$:$ARG2$ --container $ARG3$
Note the three lines for checking the container status, memory and CPU usage;
command[check_container_status]=/usr/local/nagios/libexec/check_docker --status --container $ARG1$
command[check_container_memory]=/usr/local/nagios/libexec/check_docker --memory $ARG1$:$ARG2$ --container $ARG3$
command[check_container_cpu]=/usr/local/nagios/libexec/check_docker --cpu $ARG1$:$ARG2$ --container $ARG3$
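Restart the NRPE agent;
systemctl restart nrpe
Or, if the service is named nagios-nrpe-server on your system (as with the Debian/Ubuntu packages);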
systemctl restart nagios-nrpe-server
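Before defining anything on the Nagios server, it can help to query the new NRPE commands by hand from that server. The sketch below wraps check_nrpe with the same -n, -H, -c and -a options that this tutorial uses later in its command definitions. nrpe_check is a hypothetical helper name, and the check_nrpe path is assumed to match the libexec directory used throughout this tutorial.

```shell
#!/bin/bash
# Sketch: query an NRPE command by hand from the Nagios server.
# nrpe_check is a hypothetical helper; override CHECK_NRPE if the plugin lives elsewhere.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

nrpe_check() {
    local host=$1 command=$2
    shift 2
    # -n disables SSL/TLS, as in the command definitions used in this tutorial.
    # Everything after -a is handed to the remote command as $ARG1$, $ARG2$, ...
    "$CHECK_NRPE" -n -H "$host" -c "$command" -a "$@"
}
```

For example, nrpe_check 192.168.59.49 check_container_status dozzle should report whether the dozzle container is running, and nrpe_check 192.168.59.49 check_container_memory 50 70 dozzle should report its memory usage against the 50 and 70 percent thresholds.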
Next, add the host, the relevant services and the check commands to your Nagios server.
See the sample configs from our Nagios server below;
Command Definitions;
vim commands.cfg
...
define command {
    command_name    check_docker_status
    command_line    /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
}

define command {
    command_name    check_docker_metrics
    command_line    /usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$
}
Host/Hostgroup definitions;
vim host-hostgroups.cfg
...
define host {
    use         linux-server
    host_name   east02-docker-node
    alias       East02-Docker-NODE
    address     192.168.59.49
}
Service Definitions;
vim services.cfg
define service {
    use                     local-service
    host_name               east02-docker-node
    service_description     Check Dozzle Container Status
    check_command           check_docker_status!check_container_status!dozzle
}

define service {
    use                     local-service
    host_name               east02-docker-node
    service_description     Current Dozzle Container RAM Usage
    check_command           check_docker_metrics!check_container_memory!50!70!dozzle
}

define service {
    use                     local-service
    host_name               east02-docker-node
    service_description     Current Dozzle Container CPU Usage
    check_command           check_docker_metrics!check_container_cpu!70!80!dozzle
}
Check the Nagios configs for any errors, then restart the service;
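In a check_command value, the ! character separates the command name from its arguments, and Nagios substitutes those arguments for $ARG1$, $ARG2$ and so on in the matching command_line. The bash sketch below imitates that substitution for the values used in this tutorial. expand_check_command is a hypothetical helper written for illustration; real Nagios also handles escaped ! characters and many more macros.

```shell
#!/bin/bash
# Sketch: imitate how Nagios fills $ARGn$ macros from a check_command value.
# expand_check_command is a hypothetical helper, not part of Nagios.
expand_check_command() {
    local check_command=$1 command_line=$2 host_address=$3
    local -a parts
    local i
    # parts[0] is the command name, parts[1..] are $ARG1$, $ARG2$, ...
    IFS='!' read -ra parts <<< "$check_command"
    command_line=${command_line//'$HOSTADDRESS$'/$host_address}
    for (( i = 1; i < ${#parts[@]}; i++ )); do
        command_line=${command_line//"\$ARG$i\$"/${parts[i]}}
    done
    echo "$command_line"
}

# Values taken from the service and command definitions in this tutorial.
expand_check_command \
    'check_docker_metrics!check_container_memory!50!70!dozzle' \
    '/usr/local/nagios/libexec/check_nrpe -n -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$' \
    '192.168.59.49'
```

The printed command line shows that 50, 70 and dozzle reach the NRPE agent as its own $ARG1$, $ARG2$ and $ARG3$, which is the order the check_container_memory entry in nrpe.cfg expects.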
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Sample output;
Nagios Core 4.4.9
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2022-11-16
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
    Checked 14 services.
    Checked 3 hosts.
    Checked 1 host groups.
    Checked 0 service groups.
    Checked 1 contacts.
    Checked 1 contact groups.
    Checked 27 commands.
    Checked 5 time periods.
    Checked 0 host escalations.
    Checked 0 service escalations.
Checking for circular paths...
    Checked 3 hosts
    Checked 0 service dependencies
    Checked 0 host dependencies
    Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
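The validation and the restart can be chained so that Nagios is only restarted when the pre-flight check passes. This is a minimal sketch under two assumptions: that nagios -v exits with a non-zero status when it finds errors, and that the service is managed by systemd under the name nagios. validate_and_restart is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: restart Nagios only when the pre-flight check passes.
# The service name "nagios" in RESTART_CMD is an assumption; adjust it to your install.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-systemctl restart nagios}"

validate_and_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /dev/null; then
        # Left unquoted on purpose so the command and its arguments are split.
        $RESTART_CMD
    else
        echo "Nagios configuration check failed; not restarting." >&2
        return 1
    fi
}
```

Calling validate_and_restart after editing the object files keeps a broken configuration from taking the monitoring server down.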
Confirm Docker container checks on Nagios;
And that is how you can easily use Nagios to monitor Docker containers.
That brings us to the end of our tutorial on how to monitor Docker Containers using Nagios.