Step-by-Step Guide: Kubernetes Monitoring with Prometheus and Grafana


In this tutorial, we provide a step-by-step guide to Kubernetes monitoring with Prometheus and Grafana. Monitoring a Kubernetes cluster is fundamental to ensuring its health, performance and scalability. Prometheus and Grafana give you real-time visibility into your cluster’s resource usage metrics. With real-time monitoring, you can identify bottlenecks and optimize resource utilization in the cluster.

Step-by-Step Guide: Monitoring Kubernetes with Prometheus and Grafana

Setup Kubernetes Cluster

Of course, you cannot monitor a cluster that is not set up yet. If you still need to set up a Kubernetes cluster, check the guide below;

Setup Kubernetes Cluster on Ubuntu 22.04/20.04

Install Helm on Kubernetes Cluster

There are different methods you can use to install Kubernetes cluster monitoring tools, including;

  1. Creating separate YAML files for Kubernetes application resources such as deployments, services, pods, etc.
  2. Using Helm charts. Helm charts are packages of Kubernetes resources created by the Helm community to make installation of various K8s packages easy and convenient.
  3. Using Kubernetes operators to automate application deployment.

In this tutorial, we will be using Helm charts to deploy Prometheus and Grafana. For that, you need to have the Helm client installed.

Follow the guide below to learn how to install Helm on a Kubernetes cluster.

How to Install Helm on Kubernetes Cluster
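Before moving on, it may help to confirm that both client tools used in this guide are on your PATH. The snippet below is only a sketch; require_tools is a hypothetical helper name, not part of Helm or kubectl.

```shell
# require_tools: hypothetical helper that reports any command missing from PATH.
# Returns 0 when every named tool is found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_tools helm kubectl && helm version --short prints the Helm client version only when both tools are installed.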

Install Prometheus and Grafana Helm Charts Repositories

In order to install the Prometheus and Grafana charts on a Kubernetes cluster, you first need to add their community Helm chart repositories.

Add the Prometheus community Helm charts repository;

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

Add the Grafana Helm charts repository;

helm repo add grafana https://grafana.github.io/helm-charts

Confirm that the repos are in place;

helm repo list
NAME                	URL                                               
prometheus-community	https://prometheus-community.github.io/helm-charts
grafana             	https://grafana.github.io/helm-charts             
bitnami             	https://charts.bitnami.com/bitnami 

You can search for any Kubernetes Helm chart on the Artifact Hub.
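You can also search the repositories you have just added from the command line. The sketch below refreshes the local chart index and then looks up the chart used later in this guide; refresh_chart_index is a hypothetical wrapper name.

```shell
# refresh_chart_index: hypothetical wrapper. Refreshes the local index of all
# added repositories, then lists the kube-prometheus-stack chart found in it.
refresh_chart_index() {
  helm repo update &&
    helm search repo prometheus-community/kube-prometheus-stack
}
```

Running refresh_chart_index before an install helps make sure Helm is not working from a stale index.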

Install Prometheus and Grafana on Kubernetes Cluster

Prometheus can collect and store metrics from a variety of sources while Grafana helps you visualize the metrics collected by Prometheus.

There are different charts related to Prometheus/Grafana on the Artifact Hub that offer different functionality.

In this guide, we will install the kube-prometheus-stack chart, which offers a complete monitoring solution for a K8s cluster. kube-prometheus-stack installs the following components;

  1. Prometheus Operator: The Prometheus Operator is a Kubernetes-native operator that manages and automates the lifecycle of Prometheus and related monitoring components. It simplifies the deployment, configuration, and management of Prometheus instances in a Kubernetes environment.
  2. Highly available Prometheus which scrapes metrics from various endpoints in the cluster.
  3. Highly available Alertmanager: The Alertmanager is responsible for processing and managing alerts generated by Prometheus. It allows you to define alerting rules and configure how alerts are sent and handled. The highly available Alertmanager component ensures the reliability and availability of the alerting system.
  4. Prometheus node-exporter: The Prometheus node-exporter is an agent that runs on each Kubernetes node and exposes system-level metrics, such as CPU usage, memory usage, disk utilization, and network statistics, which can be scraped by Prometheus.
  5. The Prometheus Adapter, which allows you to use custom and external metrics collected by Prometheus in Kubernetes Horizontal Pod Autoscaling (HPA) and other scaling mechanisms. It enables the Kubernetes API server to retrieve metric values from Prometheus and make scaling decisions based on those metrics. Note that no adapter pod appears among the resources listed after the installation below, so you may need to install it separately if you want it.
  6. kube-state-metrics: a component that exposes metrics about the state of Kubernetes objects, such as pods, deployments, services, and nodes, providing insights into the current state and health of your Kubernetes resources.
  7. Grafana, the visualization tool itself.

To install the kube-prometheus-stack chart, run the command below;

helm install prometheus prometheus-community/kube-prometheus-stack

Once the installation is complete, you will see output similar to the following;


NAME: prometheus
LAST DEPLOYED: Tue May 23 19:30:09 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace default get pods -l "release=prometheus"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
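The install above places everything in the default namespace and uses whatever chart version is newest in your local index. As an alternative to that command (not in addition to it), the sketch below installs the same chart into a dedicated namespace with a pinned chart version. install_monitoring_stack is a hypothetical wrapper, and the namespace name is yours to choose.

```shell
# install_monitoring_stack RELEASE NAMESPACE CHART_VERSION
# Hypothetical wrapper around the install command used in this guide.
# --create-namespace creates NAMESPACE if it does not exist yet.
install_monitoring_stack() {
  helm install "$1" prometheus-community/kube-prometheus-stack \
    --namespace "$2" --create-namespace \
    --version "$3"
}
```

For example, install_monitoring_stack prometheus monitoring 46.1.0 would use the chart version that appears in the service labels later in this guide. If you go this route, remember to add -n monitoring to every kubectl command that follows, since the rest of the guide assumes the default namespace.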

So, what resources does the Prometheus stack install? You can list them using the kubectl get command as follows;

kubectl get all --selector release=prometheus

The command will display the pods, services, daemonsets, deployments, replicasets and statefulsets related to Prometheus.


NAME                                                       READY   STATUS    RESTARTS   AGE
pod/prometheus-kube-prometheus-operator-54cfc96db7-r6k6k   1/1     Running   0          3m34s
pod/prometheus-kube-state-metrics-5f5f8b8fdd-nzwsp         1/1     Running   0          3m34s
pod/prometheus-prometheus-node-exporter-92wtx              1/1     Running   0          3m34s
pod/prometheus-prometheus-node-exporter-c2zdq              1/1     Running   0          3m34s
pod/prometheus-prometheus-node-exporter-f6867              1/1     Running   0          3m34s
pod/prometheus-prometheus-node-exporter-wf8qd              1/1     Running   0          3m34s

NAME                                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/prometheus-kube-prometheus-alertmanager   ClusterIP   10.98.95.240     <none>        9093/TCP   3m35s
service/prometheus-kube-prometheus-operator       ClusterIP   10.101.213.133   <none>        443/TCP    3m35s
service/prometheus-kube-prometheus-prometheus     ClusterIP   10.110.12.91     <none>        9090/TCP   3m35s
service/prometheus-kube-state-metrics             ClusterIP   10.98.72.100     <none>        8080/TCP   3m35s
service/prometheus-prometheus-node-exporter       ClusterIP   10.107.130.247   <none>        9100/TCP   3m35s

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-prometheus-node-exporter   4         4         4       4            4           <none>          3m34s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           3m34s
deployment.apps/prometheus-kube-state-metrics         1/1     1            1           3m34s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-kube-prometheus-operator-54cfc96db7   1         1         1       3m34s
replicaset.apps/prometheus-kube-state-metrics-5f5f8b8fdd         1         1         1       3m34s

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager   1/1     3m21s
statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus       1/1     3m21s
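Rather than re-running the listing until every pod shows Running, you can block until the pods report Ready. This is a minimal sketch; wait_for_release is a hypothetical helper and the 300s default timeout is an arbitrary choice.

```shell
# wait_for_release RELEASE [TIMEOUT]
# Hypothetical helper: blocks until every pod labelled release=RELEASE in the
# current namespace reports the Ready condition, or the timeout expires.
wait_for_release() {
  kubectl wait pods \
    --selector "release=$1" \
    --for=condition=Ready \
    --timeout="${2:-300s}"
}
```

Call it as wait_for_release prometheus. It only covers pods that carry the release label, which, going by the listing above, does not include the Prometheus and Alertmanager pods created by the statefulsets.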

Some resource types are not included in the all category. In this case, you can list individual resource types, e.g.;

kubectl get pods -n <namespace>
kubectl get svc -n <namespace>
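If you would rather not guess which resource types exist, one option is to ask the API server for every namespaced type that can be listed and query each one with the release label. The sketch below does that; list_labelled_resources is a hypothetical helper name.

```shell
# list_labelled_resources SELECTOR NAMESPACE
# Hypothetical helper: walks every namespaced resource type that supports
# "list" and prints the objects matching SELECTOR, skipping empty types.
list_labelled_resources() {
  kubectl api-resources --verbs=list --namespaced -o name |
    while read -r kind; do
      kubectl get "$kind" --selector "$1" --namespace "$2" \
        --ignore-not-found --show-kind
    done
}
```

For example, list_labelled_resources release=prometheus default. This issues one request per resource type, so it can take a little while on clusters with many custom resource definitions.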

You can also check the ConfigMaps related to Prometheus using the command below. ConfigMaps are used to store and manage non-confidential configuration data for your applications.

kubectl get configmaps --selector release=prometheus

NAME                                                           DATA   AGE
prometheus-kube-prometheus-alertmanager-overview               1      6m3s
prometheus-kube-prometheus-apiserver                           1      6m3s
prometheus-kube-prometheus-cluster-total                       1      6m3s
prometheus-kube-prometheus-controller-manager                  1      6m3s
prometheus-kube-prometheus-etcd                                1      6m3s
prometheus-kube-prometheus-grafana-datasource                  1      6m3s
prometheus-kube-prometheus-grafana-overview                    1      6m3s
prometheus-kube-prometheus-k8s-coredns                         1      6m3s
prometheus-kube-prometheus-k8s-resources-cluster               1      6m3s
prometheus-kube-prometheus-k8s-resources-multicluster          1      6m3s
prometheus-kube-prometheus-k8s-resources-namespace             1      6m3s
prometheus-kube-prometheus-k8s-resources-node                  1      6m3s
prometheus-kube-prometheus-k8s-resources-pod                   1      6m3s
prometheus-kube-prometheus-k8s-resources-workload              1      6m3s
prometheus-kube-prometheus-k8s-resources-workloads-namespace   1      6m3s
prometheus-kube-prometheus-kubelet                             1      6m3s
prometheus-kube-prometheus-namespace-by-pod                    1      6m3s
prometheus-kube-prometheus-namespace-by-workload               1      6m3s
prometheus-kube-prometheus-node-cluster-rsrc-use               1      6m3s
prometheus-kube-prometheus-node-rsrc-use                       1      6m3s
prometheus-kube-prometheus-nodes                               1      6m3s
prometheus-kube-prometheus-nodes-darwin                        1      6m3s
prometheus-kube-prometheus-persistentvolumesusage              1      6m3s
prometheus-kube-prometheus-pod-total                           1      6m3s
prometheus-kube-prometheus-prometheus                          1      6m3s
prometheus-kube-prometheus-proxy                               1      6m3s
prometheus-kube-prometheus-scheduler                           1      6m3s
prometheus-kube-prometheus-workload-total                      1      6m3s

There are also ServiceMonitors, which the Prometheus Operator uses to define the scraping configuration required to monitor services and endpoints within your Kubernetes cluster.

kubectl get servicemonitor --selector release=prometheus

NAME                                                 AGE
prometheus-kube-prometheus-alertmanager              13m
prometheus-kube-prometheus-apiserver                 13m
prometheus-kube-prometheus-coredns                   13m
prometheus-kube-prometheus-kube-controller-manager   13m
prometheus-kube-prometheus-kube-etcd                 13m
prometheus-kube-prometheus-kube-proxy                13m
prometheus-kube-prometheus-kube-scheduler            13m
prometheus-kube-prometheus-kubelet                   13m
prometheus-kube-prometheus-operator                  13m
prometheus-kube-prometheus-prometheus                13m
prometheus-kube-state-metrics                        13m
prometheus-prometheus-node-exporter                  13m
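To make the idea of a ServiceMonitor more concrete, here is a sketch of what one for an application of your own might look like. Everything passed to the helper is a hypothetical value, and the release: prometheus label is an assumption: it mirrors the label carried by the ServiceMonitors listed above, which is what this release's Prometheus is expected to select on by default.

```shell
# print_service_monitor NAME APP_LABEL PORT_NAME
# Hypothetical helper: prints a ServiceMonitor manifest to stdout.
# All three arguments describe an application of your own.
print_service_monitor() {
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: $1
  labels:
    release: prometheus   # assumed to match this release's ServiceMonitor selector
spec:
  selector:
    matchLabels:
      app: $2             # label on the Service to scrape
  endpoints:
  - port: $3              # name of the metrics port on that Service
    path: /metrics
    interval: 30s
EOF
}
```

You could review the output first and then apply it with print_service_monitor my-app my-app http-metrics | kubectl apply -f -, where my-app and http-metrics stand in for your own service label and port name.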

Accessing Prometheus Outside K8S cluster

You can check the services related to Prometheus in the default namespace;

kubectl get svc --selector release=prometheus

NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
prometheus-kube-prometheus-alertmanager   ClusterIP   10.98.95.240     <none>        9093/TCP   10m
prometheus-kube-prometheus-operator       ClusterIP   10.101.213.133   <none>        443/TCP    10m
prometheus-kube-prometheus-prometheus     ClusterIP   10.110.12.91     <none>        9090/TCP   10m
prometheus-kube-state-metrics             ClusterIP   10.98.72.100     <none>        8080/TCP   10m
prometheus-prometheus-node-exporter       ClusterIP   10.107.130.247   <none>        9100/TCP   10m

As you can see, the Prometheus services are only meant for internal access within the cluster, as indicated by the service type ClusterIP.

For example, the service prometheus-kube-prometheus-prometheus exposes Prometheus on port 9090/tcp on an internal cluster IP address only.

To be able to access Prometheus from outside the cluster, we need to change the service type to NodePort. This exposes the service on a static port on every node in the cluster, so that it becomes reachable on each node’s IP address at that port.

Thus, edit the service;

kubectl edit service prometheus-kube-prometheus-prometheus

By default, this is what the service manifest looks like;


# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2023-05-23T19:30:49Z"
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 46.1.0
    chart: kube-prometheus-stack-46.1.0
    heritage: Helm
    release: prometheus
    self-monitor: "true"
  name: prometheus-kube-prometheus-prometheus
  namespace: default
  resourceVersion: "124975"
  uid: 177fb969-6d22-46f5-8e39-0b3c451b4da2
spec:
  clusterIP: 10.108.24.96
  clusterIPs:
  - 10.108.24.96
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-web
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: prometheus-kube-prometheus-prometheus
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

We will edit this file and change type: ClusterIP to type: NodePort. We will also bind it to a static NodePort that is not currently in use, 30002 (by default, NodePorts must fall within the range 30000-32767).


  ports:
  - name: http-web
    port: 9090
    protocol: TCP
    targetPort: 9090
    nodePort: 30002
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: prometheus-kube-prometheus-prometheus
  sessionAffinity: None
  type: NodePort

Save and exit the file.
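If you prefer a repeatable command over an interactive edit, the same change can be sketched with kubectl patch. expose_as_nodeport is a hypothetical helper; it relies on the default strategic merge patch, which matches entries in the ports list by their port number.

```shell
# expose_as_nodeport SERVICE PORT NODE_PORT
# Hypothetical helper: switches SERVICE to type NodePort and pins the entry
# whose "port" equals PORT to NODE_PORT, leaving other fields untouched.
expose_as_nodeport() {
  kubectl patch service "$1" --type strategic --patch \
    "{\"spec\":{\"type\":\"NodePort\",\"ports\":[{\"port\":$2,\"nodePort\":$3}]}}"
}
```

For the Prometheus service in this guide that would be expose_as_nodeport prometheus-kube-prometheus-prometheus 9090 30002. Keep in mind that, like the manual edit, this changes the live object only, so a later helm upgrade of the release may reset the service to the chart's values.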

Confirm the changes;

kubectl get svc prometheus-kube-prometheus-prometheus
NAME                                    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
prometheus-kube-prometheus-prometheus   NodePort   10.108.24.96   <none>        9090:30002/TCP   12m

You should now be able to access Prometheus interface on any node IP via the address http://<NodeIP>:30002.
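You can also check reachability from a terminal before opening a browser. The sketch below calls Prometheus' health and readiness endpoints through the NodePort; check_prometheus is a hypothetical helper, and the node IP is whatever address your nodes have.

```shell
# check_prometheus NODE_IP [NODE_PORT]
# Hypothetical helper: succeeds only if both the health and the readiness
# endpoints answer with a successful HTTP status within 5 seconds each.
check_prometheus() {
  base="http://$1:${2:-30002}"
  curl --silent --fail --max-time 5 "$base/-/healthy" &&
    curl --silent --fail --max-time 5 "$base/-/ready"
}
```

For example, check_prometheus 192.168.1.10 && echo reachable, with 192.168.1.10 replaced by one of your node IPs.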

kubernetes prometheus web interface

You may notice that some targets report connection refused errors on their metrics endpoints.

To resolve this, we need to change the bind addresses for kube-controller-manager, etcd, kube-scheduler and kube-proxy.

They all listen on the loopback interface by default, as you can confirm on a control plane node;

ss -altnp | grep -E  "10257|2381|10249|10259"

We will update the configurations and bind the addresses to 0.0.0.0. Please note that this opens up access to these services from any interface. Be sure to put proper firewall rules in place to prevent unauthorized access.

  • Update Kube Proxy bind address (cm is the short name for configmaps)

kubectl edit cm kube-proxy -n kube-system

Under the section config.conf: |-, change metricsBindAddress: "" to metricsBindAddress: "0.0.0.0:10249".

Save and exit.
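The same edit can be scripted, which is handy if you rebuild clusters often. The sketch below is a plain text filter over the ConfigMap YAML; set_metrics_bind_address is a hypothetical helper, and it only has an effect when the current value is exactly the empty string shown above.

```shell
# set_metrics_bind_address: hypothetical filter. Reads a kube-proxy ConfigMap
# on stdin and rewrites an empty metricsBindAddress to listen on all
# interfaces on port 10249. Lines that do not match pass through unchanged.
set_metrics_bind_address() {
  sed 's/metricsBindAddress: ""/metricsBindAddress: "0.0.0.0:10249"/'
}
```

One way to use it is kubectl get cm kube-proxy -n kube-system -o yaml | set_metrics_bind_address | kubectl replace -f -. It is worth running the first two stages on their own first and reading the result before replacing the live object.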

To apply the changes, delete all kube-proxy pods to recreate new ones with updated bind address;

kubectl delete pods -l k8s-app=kube-proxy -n kube-system

Check if Kube proxy pods have been recreated;

kubectl get pod -l k8s-app=kube-proxy -n kube-system
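As an alternative to deleting the pods by hand and then polling for new ones, a rolling restart does both steps in order. This sketch assumes kube-proxy runs as a DaemonSet named kube-proxy in kube-system, as it does on kubeadm-built clusters; restart_kube_proxy is a hypothetical helper and the 120s timeout is arbitrary.

```shell
# restart_kube_proxy: hypothetical helper. Triggers a rolling restart of the
# kube-proxy DaemonSet and waits until the new pods are rolled out.
restart_kube_proxy() {
  kubectl rollout restart daemonset/kube-proxy -n kube-system &&
    kubectl rollout status daemonset/kube-proxy -n kube-system --timeout=120s
}
```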

Check the port;

ss -altnp | grep 10249

  • Change ETCD Metrics bind address

Edit the configuration file used by etcd and change the metrics bind address as follows;

sudo vim /etc/kubernetes/manifests/etcd.yaml

Change the line;

--listen-metrics-urls=http://127.0.0.1:2381

to;

--listen-metrics-urls=http://0.0.0.0:2381

Save and exit the file.

The kubelet detects the change to the manifest and automatically restarts the etcd pod with the bind address set to 0.0.0.0.

  • Change Kube Scheduler bind address

Edit the configuration file used by scheduler and change the bind address as follows;

sudo vim /etc/kubernetes/manifests/kube-scheduler.yaml

Change the line;

--bind-address=127.0.0.1

to;

--bind-address=0.0.0.0

Save and exit the file.

The kubelet detects the change to the manifest and automatically restarts the kube-scheduler pod with the bind address set to 0.0.0.0.

  • Change Kube controller manager bind address

Edit the manifest configuration file used by controller manager and change the bind address as follows;

sudo vim /etc/kubernetes/manifests/kube-controller-manager.yaml

Change the line;

--bind-address=127.0.0.1

to;

--bind-address=0.0.0.0

Save and exit the file.

The kubelet detects the change to the manifest and automatically restarts the kube-controller-manager pod with the bind address set to 0.0.0.0.
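If you have several control plane nodes, repeating these manifest edits in an editor gets tedious. The sketch below applies the etcd, scheduler and controller manager substitutions described in this section to a given manifest file. open_bind_addresses is a hypothetical helper; it deliberately leaves no backup copy next to the manifest, since the kubelet may try to load extra files it finds in the manifests directory.

```shell
# open_bind_addresses MANIFEST
# Hypothetical helper: rewrites loopback bind addresses in a static pod
# manifest to 0.0.0.0. The edited text is staged in a temp file elsewhere and
# then written back, so the manifest keeps its owner and permissions.
open_bind_addresses() {
  tmp="$(mktemp)" || return 1
  sed -e 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' \
      -e 's#--listen-metrics-urls=http://127\.0\.0\.1:2381#--listen-metrics-urls=http://0.0.0.0:2381#' \
      "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

The manifests are owned by root, so define and run the function in a root shell, for example open_bind_addresses /etc/kubernetes/manifests/kube-scheduler.yaml. Lines that do not match either pattern are left as they are.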

Confirm the ports;

sudo lsof -i :2381,10249,10257,10259

COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kube-sche 1239 root    3u  IPv6  22234      0t0  TCP *:10259 (LISTEN)
etcd      1245 root   14u  IPv6  22268      0t0  TCP *:2381 (LISTEN)
kube-cont 1261 root    3u  IPv6  22936      0t0  TCP *:10257 (LISTEN)
kube-prox 1866 root   11u  IPv6  24294      0t0  TCP *:10249 (LISTEN)

This should resolve the kube-prometheus-stack connection refused errors.

Accessing Grafana Outside K8S Cluster

The kube-prometheus-stack Helm chart also installs Grafana. By default, Grafana can only be accessed within the cluster, via port 80/TCP.

In order to access it externally, we will edit the service and change the service type to NodePort, as we did above for Prometheus.

kubectl edit svc prometheus-grafana

spec:
  clusterIP: 10.111.147.239
  clusterIPs:
  - 10.111.147.239
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: service
    port: 80
    protocol: TCP
    targetPort: 3000
    nodePort: 30003
  selector:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  #type: ClusterIP
  type: NodePort
..

Save and exit the file, then confirm the changes;

kubectl get svc
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
...
grafana                                   NodePort    10.111.147.239    <none>        80:30003/TCP                 21m
...
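Manual edits to a service can be lost the next time the release is upgraded. If you want the NodePort to survive upgrades, one approach is to set it through the chart's values instead. The sketch below assumes the Grafana subchart exposes service.type and service.nodePort under the grafana key; please confirm this against helm show values for your chart version. expose_grafana_via_helm is a hypothetical wrapper.

```shell
# expose_grafana_via_helm RELEASE NODE_PORT CHART_VERSION
# Hypothetical wrapper: re-renders the release with Grafana's service set to
# NodePort. --reuse-values keeps any values set earlier; --version avoids an
# unintended jump to a newer chart.
expose_grafana_via_helm() {
  helm upgrade "$1" prometheus-community/kube-prometheus-stack \
    --version "$3" \
    --reuse-values \
    --set grafana.service.type=NodePort \
    --set grafana.service.nodePort="$2"
}
```

For the release in this guide that would be expose_grafana_via_helm prometheus 30003 46.1.0, where 46.1.0 is the chart version shown in the Prometheus service labels earlier.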

Login to Grafana Web Interface

You should now be able to access Grafana from outside the cluster on any cluster node’s IP address on port 30003.

grafana web ui login page

You can retrieve the Grafana admin user password by running the command below;

kubectl get secret prometheus-grafana \
-o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Sample output;

prom-operator
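The login form also needs a username. The sketch below reads a single key from the same secret, so it can return either the password or, assuming the secret also carries an admin-user key, the username. grafana_credential is a hypothetical helper.

```shell
# grafana_credential KEY [SECRET]
# Hypothetical helper: prints the decoded value of KEY from the Grafana
# secret (prometheus-grafana unless another secret name is given).
grafana_credential() {
  kubectl get secret "${2:-prometheus-grafana}" \
    -o jsonpath="{.data.$1}" | base64 --decode
  echo
}
```

For example, grafana_credential admin-user followed by grafana_credential admin-password. It is a good idea to change the password after the first login.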

The default dashboard;

kubernetes grafana dashboard

Grafana Prometheus Datasource

The stack comes preconfigured. The Prometheus data source has already been added.

kubernetes prometheus grafana datasource

Grafana Kubernetes Dashboards

The stack also comes with some dashboards preconfigured.

Monitoring Kubernetes with Prometheus and Grafana

Let’s check one of the dashboards;

Kubernetes Monitoring with Prometheus and Grafana

Explore other dashboards.

Update everything to suit your needs!

That concludes our guide on monitoring Kubernetes with Prometheus and Grafana.

Kifarunix