How to Safely Reboot OpenStack Compute Node

How can you safely reboot an OpenStack compute node? In an OpenStack environment, compute nodes are the workhorses that run virtual machines and manage compute resources. At times you will need to reboot one for maintenance or troubleshooting, and doing so carelessly can disrupt your cloud services. This blog post walks through the steps to safely reboot an OpenStack compute node: disable scheduling on the node, migrate its instances elsewhere, stop the services, reboot, and then bring the node back into service.

Reboot OpenStack Compute Node Safely

Disable Instance Scheduling on Compute Node

In an active environment where you cannot control who is creating OpenStack instances, you won’t want new instances being launched on the node that you have marked for reboot.

As such, you need to disable instance scheduling on that node.

You can do this by disabling the nova-compute service on the host, either from the CLI or from OpenStack Horizon.

You can list the services using the command below;

openstack compute service list --host compute02
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
| ID                                   | Binary       | Host      | Zone | Status  | State | Updated At                 |
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
| 464698d3-0da5-44cb-ba91-7d6782b2cff9 | nova-compute | compute02 | nova | enabled | up    | 2023-11-07T21:15:50.000000 |
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+

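To disable instance scheduling on a compute node from the CLI, use the openstack compute service set command. Load your environment and admin credentials first; the general syntax of the command is shown on the last line below;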

(Note that we deployed our OpenStack using Kolla-Ansible)

source $HOME/kolla-ansible/bin/activate
source /etc/kolla/admin-openrc.sh
openstack compute service set [-h] [--enable | --disable] [--disable-reason <reason>] [--up | --down] <host> <service>

For example, to disable nova compute service on compute02 node;

openstack compute service set --disable --disable-reason for-safe-reboot compute02 nova-compute

You can do the same from Horizon, Admin > Compute > Hypervisors > Select the Host > Actions > Disable service.
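
As a minimal sketch, the disable and enable commands can be wrapped in one small helper so that the host, the action and the reason are always passed the same way. The function name set_scheduling and the DRY_RUN variable are hypothetical names introduced for this example, not part of the OpenStack CLI. With DRY_RUN=1, the default in this sketch, the command is only printed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the OpenStack CLI): print (DRY_RUN=1,
# the default here) or run (DRY_RUN=0) "openstack compute service set".
set_scheduling() {
    host="$1"      # compute host name, e.g. compute02
    action="$2"    # "disable" or "enable"
    reason="$3"    # optional, only used with "disable"

    # Build the command in the function's positional parameters.
    case "$action" in
        disable)
            set -- openstack compute service set --disable
            if [ -n "$reason" ]; then
                set -- "$@" --disable-reason "$reason"
            fi
            ;;
        enable)
            set -- openstack compute service set --enable
            ;;
        *)
            echo "usage: set_scheduling <host> enable|disable [reason]" >&2
            return 2
            ;;
    esac
    set -- "$@" "$host" nova-compute

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # dry run: only show what would be executed
    else
        "$@"
    fi
}

# Example (prints the command; set DRY_RUN=0 to actually run it):
#   set_scheduling compute02 disable for-safe-reboot
```

Printing the command first makes it easy to review exactly what will be sent to the cloud before running it for real.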

Migrate Instances/VMs to Other Compute Nodes

Cold Instance Migration

Cold migration, or non-live migration, involves shutting down a running instance before migrating it from the source compute node to the destination compute node. Cold migration necessitates a brief interruption in the instance’s operation. The migrated instance retains access to its original volumes and IP addresses.

Live Instance Migration

Live migration seamlessly shifts the instance from the source Compute node to the destination Compute node without any need for a shutdown, all while preserving state consistency throughout the process.

If your OpenStack environment supports live migration (check the Feature Support Matrix to see which hypervisors support it), consider migrating the VM instances from the compute node you plan to reboot to other available compute nodes. Live migration lets VMs keep running during the process, minimizing downtime.

Live migrations can be categorized based on how they handle instance storage:

  1. Shared Storage-Based Live Migration: This type of migration involves instances with ephemeral disks stored on shared storage accessible to both the source and destination hosts. This method is faster and more efficient than block live migration because the instance’s disk images are already accessible to the destination host.
  2. Block Live Migration (Block Migration): Block migration is used when instances have ephemeral disks (e.g. instances booting from image) that are not shared between the source and destination hosts. It’s important to note that block migration is not compatible with read-only devices like CD-ROMs and Configuration Drive (config_drive). This method is slower and more resource-intensive than shared storage-based live migration.
  3. Volume-Backed Live Migration: In this scenario, instances use volumes for storage instead of ephemeral disks. This method is faster than block live migration because the disk images do not need to be copied. However, it is still slower than shared storage-based live migration because the block storage volumes need to be attached to the destination host. Volumes are provided by the Block Storage service (Cinder), and storage backends such as Ceph and GlusterFS support volume-backed live migration.

These classifications help determine the method of live migration suitable for your specific instance and storage setup.

You can perform the migration from the Horizon dashboard or from the command line.

Get a List of Running Instances on Compute Node to Reboot

To begin with, get a list of all instances running on the compute node you need to reboot. For example, below is a list of instances running on our compute02 node;

openstack server list --host compute02 --all-projects

Sample output;

+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
| ID                                   | Name            | Status | Networks                | Image  | Flavor  |
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
| ee54d242-4fdd-4a3b-8ee5-30b3171e1df6 | gracious_turing | ACTIVE | DEMO_NET=192.168.50.123 | cirros | custom1 |
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
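
If you want to feed this list into later steps, you need only the instance IDs. The sketch below pulls them out of a saved copy of the table above. The function name ids_from_table is a hypothetical name used for illustration. When you query the cloud directly, the CLI's own formatter options (-f value -c ID) give the same result without any parsing, as noted in the comments.

```shell
#!/bin/sh
# Hypothetical helper: read the table printed by "openstack server list"
# on stdin and print one instance ID per line.
ids_from_table() {
    # Data rows start with "|"; the ID is the second "|"-separated field.
    # The header row is skipped by keeping only UUID-shaped values.
    awk -F'|' '
        /^[|]/ {
            id = $2
            gsub(/ /, "", id)
            if (id ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/) print id
        }'
}

# Examples:
#   openstack server list --host compute02 --all-projects | ids_from_table
# or, without parsing, using the CLI formatter options:
#   openstack server list --host compute02 --all-projects -f value -c ID
```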

Get a List of Compute Nodes

Similarly, you can list the available compute nodes. This is useful if you want to explicitly specify the node to migrate an instance to; otherwise, when you have multiple compute nodes, the nova scheduler decides where to place the migrated instance.

openstack hypervisor list
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| ID                                   | Hypervisor Hostname | Hypervisor Type | Host IP         | State |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| 6aa76044-d456-4c3b-8f28-fcfc7e79b658 | compute01           | QEMU            | 192.168.200.202 | up    |
| 7365f5eb-62e1-477e-bf45-8f77ea98802a | compute02           | QEMU            | 192.168.200.203 | up    |
+--------------------------------------+---------------------+-----------------+-----------------+-------+

Migrate OpenStack Instances to Other Compute Nodes

Once you have this information, you can proceed to migrate the instances.

As already mentioned, depending on how critical the services handled by an instance are, you can choose cold or live migration.

OpenStack instances are migrated using the openstack server migrate command;

openstack server migrate --help
usage: openstack server migrate [-h] [--live-migration] [--host <hostname>] [--shared-migration | --block-migration] [--disk-overcommit | --no-disk-overcommit]
                                [--wait]
                                <server>

Migrate server to different host. A migrate operation is implemented as a resize operation using the same flavor as the old server. This means that, like resize, migrate
works by creating a new server using the same flavor and copying the contents of the original disk into a new one. As with resize, the migrate operation is a two-step
process for the user: the first step is to perform the migrate, and the second step is to either confirm (verify) success and release the old server, or to declare a
revert to release the new server and restart the old one.

positional arguments:
  <server>              Server (name or ID)

options:
  -h, --help            show this help message and exit
  --live-migration      Live migrate the server; use the ``--host`` option to specify a target host for the migration which will be validated by the scheduler
  --host <hostname>
                        Migrate the server to the specified host. (supported with --os-compute-api-version 2.30 or above when used with the --live-migration option)
                        (supported with --os-compute-api-version 2.56 or above when used without the --live-migration option)
  --shared-migration    Perform a shared live migration (default before --os-compute-api-version 2.25, auto after)
  --block-migration     Perform a block live migration (auto-configured from --os-compute-api-version 2.25)
  --disk-overcommit     Allow disk over-commit on the destination host(supported with --os-compute-api-version 2.24 or below)
  --no-disk-overcommit  Do not over-commit disk on the destination host (default)(supported with --os-compute-api-version 2.24 or below)
  --wait                Wait for migrate to complete

So, let’s live migrate our instance, gracious_turing, which has the UUID ee54d242-4fdd-4a3b-8ee5-30b3171e1df6.

Note that the instance boots from an image and there is no shared storage, hence we will do a block live migration;

openstack server migrate --live-migration --block-migration gracious_turing --wait
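
When a node hosts many instances, the same command can be run in a loop. The following is a minimal sketch, assuming block live migration suits every instance on the node, as it does in our setup with no shared storage. The function name live_migrate_all and the DRY_RUN variable are hypothetical. With DRY_RUN=1, the default here, the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: read instance IDs (one per line) on stdin and live
# migrate each one in turn. DRY_RUN=1 (default here) only prints commands.
live_migrate_all() {
    while IFS= read -r id; do
        [ -n "$id" ] || continue          # skip blank lines
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "openstack server migrate --live-migration --block-migration --wait $id"
        else
            # </dev/null keeps the CLI from consuming the ID list on stdin
            openstack server migrate --live-migration --block-migration --wait "$id" </dev/null \
                || echo "migration failed for $id" >&2
        fi
    done
}

# Example:
#   openstack server list --host compute02 --all-projects -f value -c ID | live_migrate_all
```

Because --wait is used, each migration finishes (or fails) before the next one starts, which keeps the load on the migration network predictable.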

If you check Horizon, under Instances, you will see the instance status as migrating.

After a short while, the instance migration should be completed. Since I have only two compute nodes, the instance should have been migrated to compute01.

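If you want to do a cold migration instead, run the same command without the --live-migration option. As the help text above explains, the instance is shut down while it is moved, and you then confirm or revert the migration.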

Stop All OpenStack Services Running on the Compute Node

Once the instances on the compute node have been migrated, you can log in to the compute node and stop all OpenStack services.
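
Before stopping anything, it may be worth confirming that the node really is empty. The sketch below is one way to do that. The function name host_is_empty is a hypothetical name for this example. It reads a list of instance IDs, one per line, and succeeds only when the list is empty.

```shell
#!/bin/sh
# Hypothetical helper: read instance IDs (one per line) on stdin and
# succeed only if there are none, i.e. the host no longer runs instances.
host_is_empty() {
    # grep -c prints the number of non-empty lines; "|| true" hides the
    # non-zero exit status grep returns when that number is 0.
    remaining=$(grep -c . || true)
    if [ "$remaining" -eq 0 ]; then
        echo "no instances left on host"
        return 0
    fi
    echo "$remaining instance(s) still on host" >&2
    return 1
}

# Example:
#   openstack server list --host compute02 --all-projects -f value -c ID | host_is_empty \
#       && echo "safe to stop services on compute02"
```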

If you are using Ansible, you can use it to check and stop the services on the compute node.

For example, let’s list, from the controller/Ansible node, all the OpenStack service containers running on compute02;

ansible -i multinode -m raw -a "docker ps" compute02
compute02 | CHANGED | rc=0 >>
CONTAINER ID   IMAGE                                                                     COMMAND                  CREATED      STATUS                PORTS     NAMES
efe871ef9fbf   quay.io/openstack.kolla/zun-cni-daemon:2023.1-ubuntu-jammy                "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             zun_cni_daemon
f6155141547b   quay.io/openstack.kolla/zun-compute:2023.1-ubuntu-jammy                   "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             zun_compute
143e53a3b9de   quay.io/openstack.kolla/ceilometer-compute:2023.1-ubuntu-jammy            "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             ceilometer_compute
da3bb6f8f71b   quay.io/openstack.kolla/kuryr-libnetwork:2023.1-ubuntu-jammy              "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             kuryr
7fa1016b0acf   quay.io/openstack.kolla/neutron-openvswitch-agent:2023.1-ubuntu-jammy     "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             neutron_openvswitch_agent
98016d47c4d6   quay.io/openstack.kolla/openvswitch-vswitchd:2023.1-ubuntu-jammy          "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             openvswitch_vswitchd
2676319cfbdc   quay.io/openstack.kolla/openvswitch-db-server:2023.1-ubuntu-jammy         "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             openvswitch_db
8b750f8dc593   quay.io/openstack.kolla/nova-compute:2023.1-ubuntu-jammy                  "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             nova_compute
84397013842c   quay.io/openstack.kolla/nova-libvirt:2023.1-ubuntu-jammy                  "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             nova_libvirt
3768d9da5ab7   quay.io/openstack.kolla/nova-ssh:2023.1-ubuntu-jammy                      "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             nova_ssh
ec5a5dd65cb4   quay.io/openstack.kolla/iscsid:2023.1-ubuntu-jammy                        "dumb-init --single-…"   3 days ago   Up 3 days                       iscsid
f4185c0884ae   quay.io/openstack.kolla/prometheus-libvirt-exporter:2023.1-ubuntu-jammy   "dumb-init --single-…"   3 days ago   Up 3 days                       prometheus_libvirt_exporter
d9942be630fa   quay.io/openstack.kolla/prometheus-cadvisor:2023.1-ubuntu-jammy           "dumb-init --single-…"   3 days ago   Up 3 days                       prometheus_cadvisor
04fec61c5671   quay.io/openstack.kolla/prometheus-node-exporter:2023.1-ubuntu-jammy      "dumb-init --single-…"   3 days ago   Up 3 days                       prometheus_node_exporter
221098bf97e7   quay.io/openstack.kolla/cron:2023.1-ubuntu-jammy                          "dumb-init --single-…"   3 days ago   Up 3 days                       cron
36fc2702d398   quay.io/openstack.kolla/kolla-toolbox:2023.1-ubuntu-jammy                 "dumb-init --single-…"   3 days ago   Up 3 days                       kolla_toolbox
80f42d83c6f7   quay.io/openstack.kolla/fluentd:2023.1-ubuntu-jammy                       "dumb-init --single-…"   3 days ago   Up 3 days                       fluentd

Since we deployed our OpenStack using Kolla-Ansible, all of these services run as Docker containers, so the easiest way to stop them is to stop the Docker service itself;

ansible -i multinode -m raw -a "sudo systemctl stop docker.service docker.socket" compute02

If you are not using configuration management tools such as Ansible, be sure to stop at least nova-compute and the Neutron agent running on the node (neutron-openvswitch-agent in our setup, or neutron-linuxbridge-agent if you use Linux bridge).

Reboot OpenStack Compute Node

Next, reboot the compute node. Again, we will use Ansible in our setup;

ansible -i multinode -m raw -a "sudo systemctl reboot -i" compute02
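
The node will be unreachable for a while, so it can help to poll until it answers again. Below is a minimal retry sketch. The function name wait_until is a hypothetical name, and the attempt count and delay in the example are arbitrary values you should adjust to how long your hardware takes to boot.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <tries> times, sleeping <delay>
# seconds between attempts, until it succeeds.
wait_until() {
    tries="$1"
    delay="$2"
    shift 2                    # the remaining arguments are the command
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "succeeded after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        if [ "$attempt" -le "$tries" ]; then
            sleep "$delay"     # no need to sleep after the last attempt
        fi
    done
    echo "gave up after $tries attempt(s)" >&2
    return 1
}

# Example: try every 10 seconds, up to 30 times, until compute02 responds:
#   wait_until 30 10 ansible -i multinode -m raw -a "uptime" compute02
```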

Start OpenStack Services

If you are using configuration management tools such as Ansible, chances are high that the OpenStack services will be started automatically after the reboot. You can confirm this by listing the containers again;

ansible -i multinode -m raw -a "docker ps" compute02
compute02 | CHANGED | rc=0 >>
CONTAINER ID   IMAGE                                                                     COMMAND                  CREATED      STATUS                     PORTS     NAMES
efe871ef9fbf   quay.io/openstack.kolla/zun-cni-daemon:2023.1-ubuntu-jammy                "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               zun_cni_daemon
f6155141547b   quay.io/openstack.kolla/zun-compute:2023.1-ubuntu-jammy                   "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               zun_compute
143e53a3b9de   quay.io/openstack.kolla/ceilometer-compute:2023.1-ubuntu-jammy            "dumb-init --single-…"   3 days ago   Up 3 minutes (unhealthy)             ceilometer_compute
da3bb6f8f71b   quay.io/openstack.kolla/kuryr-libnetwork:2023.1-ubuntu-jammy              "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               kuryr
7fa1016b0acf   quay.io/openstack.kolla/neutron-openvswitch-agent:2023.1-ubuntu-jammy     "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               neutron_openvswitch_agent
98016d47c4d6   quay.io/openstack.kolla/openvswitch-vswitchd:2023.1-ubuntu-jammy          "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               openvswitch_vswitchd
2676319cfbdc   quay.io/openstack.kolla/openvswitch-db-server:2023.1-ubuntu-jammy         "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               openvswitch_db
8b750f8dc593   quay.io/openstack.kolla/nova-compute:2023.1-ubuntu-jammy                  "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               nova_compute
84397013842c   quay.io/openstack.kolla/nova-libvirt:2023.1-ubuntu-jammy                  "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               nova_libvirt
3768d9da5ab7   quay.io/openstack.kolla/nova-ssh:2023.1-ubuntu-jammy                      "dumb-init --single-…"   3 days ago   Up 3 minutes (healthy)               nova_ssh
ec5a5dd65cb4   quay.io/openstack.kolla/iscsid:2023.1-ubuntu-jammy                        "dumb-init --single-…"   3 days ago   Up 3 minutes                         iscsid
f4185c0884ae   quay.io/openstack.kolla/prometheus-libvirt-exporter:2023.1-ubuntu-jammy   "dumb-init --single-…"   3 days ago   Up 3 minutes                         prometheus_libvirt_exporter
d9942be630fa   quay.io/openstack.kolla/prometheus-cadvisor:2023.1-ubuntu-jammy           "dumb-init --single-…"   3 days ago   Up 3 minutes                         prometheus_cadvisor
04fec61c5671   quay.io/openstack.kolla/prometheus-node-exporter:2023.1-ubuntu-jammy      "dumb-init --single-…"   3 days ago   Up 3 minutes                         prometheus_node_exporter
221098bf97e7   quay.io/openstack.kolla/cron:2023.1-ubuntu-jammy                          "dumb-init --single-…"   3 days ago   Up 3 minutes                         cron
36fc2702d398   quay.io/openstack.kolla/kolla-toolbox:2023.1-ubuntu-jammy                 "dumb-init --single-…"   3 days ago   Up 3 minutes                         kolla_toolbox
80f42d83c6f7   quay.io/openstack.kolla/fluentd:2023.1-ubuntu-jammy                       "dumb-init --single-…"   3 days ago   Up 3 minutes                         fluentd

If you are not using any configuration management, be sure to start all the OpenStack services manually.
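
In the sample output above, ceilometer_compute reports an unhealthy status a few minutes after the reboot. With many containers this is easy to miss, so here is a small sketch that prints only the names of unhealthy containers from docker ps output. The function name unhealthy_containers is a hypothetical name for this example. Docker can also filter this itself, as shown in the comments.

```shell
#!/bin/sh
# Hypothetical helper: read "docker ps" output on stdin and print the names
# of containers whose STATUS column reports "(unhealthy)".
unhealthy_containers() {
    # NAMES is the last whitespace-separated field of each row.
    awk '/\(unhealthy\)/ { print $NF }'
}

# Examples:
#   ansible -i multinode -m raw -a "docker ps" compute02 | unhealthy_containers
# or let Docker do the filtering on the node itself:
#   docker ps --filter health=unhealthy --format '{{.Names}}'
```

An unhealthy container shortly after boot may simply still be starting, so it is reasonable to check again after a few minutes before investigating further.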

Re-Enable Instance Scheduling on the Compute Node

Once the node is back up, confirm that the hypervisor and its compute service are up;

openstack hypervisor list
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| ID                                   | Hypervisor Hostname | Hypervisor Type | Host IP         | State |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| 6aa76044-d456-4c3b-8f28-fcfc7e79b658 | compute01           | QEMU            | 192.168.200.202 | up    |
| 7365f5eb-62e1-477e-bf45-8f77ea98802a | compute02           | QEMU            | 192.168.200.203 | up    |
+--------------------------------------+---------------------+-----------------+-----------------+-------+

Then check the status of the compute service on the node;

openstack compute service list --host compute02
+--------------------------------------+--------------+-----------+------+----------+-------+----------------------------+
| ID                                   | Binary       | Host      | Zone | Status   | State | Updated At                 |
+--------------------------------------+--------------+-----------+------+----------+-------+----------------------------+
| 464698d3-0da5-44cb-ba91-7d6782b2cff9 | nova-compute | compute02 | nova | disabled | up    | 2023-11-08T20:32:50.000000 |
+--------------------------------------+--------------+-----------+------+----------+-------+----------------------------+

Next, re-enable instance scheduling on the node;

openstack compute service set --enable compute02 nova-compute
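
If you script this step, you may want to re-enable the service only when it is actually disabled. The sketch below extracts the Status column of the nova-compute row from the table format shown above. The function name nova_compute_status is a hypothetical name for this example, and the CLI formatter options in the comments return the same value without parsing.

```shell
#!/bin/sh
# Hypothetical helper: read the table printed by
# "openstack compute service list --host <host>" on stdin and print the
# Status value ("enabled" or "disabled") of the nova-compute row.
nova_compute_status() {
    # Columns: | ID | Binary | Host | Zone | Status | State | Updated At |
    # With "|" as the separator, Binary is field 3 and Status is field 6.
    awk -F'|' '
        $3 ~ /nova-compute/ {
            status = $6
            gsub(/ /, "", status)
            print status
        }'
}

# Examples:
#   status=$(openstack compute service list --host compute02 | nova_compute_status)
#   if [ "$status" = "disabled" ]; then
#       openstack compute service set --enable compute02 nova-compute
#   fi
# or read the value directly with the CLI formatter options:
#   openstack compute service list --host compute02 --service nova-compute -f value -c Status
```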

You can also re-enable the service from the Horizon dashboard, in the same Hypervisors view used earlier to disable it.

[Optional] Migrate Instances Back to Original Nodes

Once the node is up and running, you can choose to migrate the instances back to it or simply let new instances be scheduled on it.

If you want to migrate them back, you can do so from the CLI as shown above, or from Horizon.

For example, to live migrate our instance, gracious_turing, back to compute02 from Horizon, choose the live migrate action for the instance, select compute02 as the new host, enable block migration, and submit to begin the migration process.

And voila, the instance is now running on the original compute node.

And that completes our guide on how to safely reboot an OpenStack compute node.

Kifarunix
Linux Certified Engineer, with a passion for open-source technology and a strong understanding of Linux systems. With experience in system administration, troubleshooting, and automation, I am skilled in maintaining and optimizing Linux infrastructure.
