How can I safely reboot an OpenStack compute node? In an OpenStack environment, the compute nodes are the workhorses responsible for running virtual machines and managing compute resources. However, there may be situations where you need to reboot a compute node for maintenance or troubleshooting. Performing this task safely is crucial to avoid disrupting your cloud services. This blog post will guide you through the steps to safely reboot an OpenStack compute node, minimizing downtime while maintaining the integrity of your cloud infrastructure.
Reboot OpenStack Compute Node Safely
Disable Instance Scheduling on Compute Node
In an active environment where you cannot control who is creating OpenStack instances, you won't want new instances being launched on the node that you have marked for reboot.
As such, you need to disable instance scheduling on that node before you do anything else.
You can do this by disabling the nova-compute service on that host, either from the CLI or from OpenStack Horizon.
You can list the services using the command below;
openstack compute service list --host compute02
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
| 464698d3-0da5-44cb-ba91-7d6782b2cff9 | nova-compute | compute02 | nova | enabled | up | 2023-11-07T21:15:50.000000 |
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
To disable instance scheduling on a compute node from the CLI, run the command below;
(Note that we deployed our OpenStack using Kolla-Ansible)
source $HOME/kolla-ansible/bin/activate
source /etc/kolla/admin-openrc.sh
openstack compute service set [-h] [--enable | --disable] [--disable-reason <reason>] [--up | --down] <host> <service>
For example, to disable the nova-compute service on the compute02 node;
openstack compute service set --disable --disable-reason for-safe-reboot compute02 nova-compute
You can do the same from Horizon, Admin > Compute > Hypervisors > Select the Host > Actions > Disable service.
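A small wrapper can make the disable step harder to get wrong by reading the status back afterwards. The sketch below assumes the admin credentials are already sourced as shown above. The function name `disable_compute` is made up for illustration; it only calls the `openstack compute service` commands already used in this post, together with the client's standard `-f value -c <column>` output options.

```shell
# Hypothetical helper: disable nova-compute on a host and confirm the result.
# Assumes admin credentials are already loaded (e.g. admin-openrc.sh).
disable_compute() (
    host=$1
    reason=${2:-for-safe-reboot}
    openstack compute service set --disable --disable-reason "$reason" \
        "$host" nova-compute || exit 1
    # Read back only the Status column for this host's nova-compute service.
    status=$(openstack compute service list --host "$host" \
        --service nova-compute -f value -c Status)
    if [ "$status" = "disabled" ]; then
        echo "$host: scheduling disabled"
    else
        echo "$host: unexpected status '$status'" >&2
        exit 1
    fi
)
```

Calling `disable_compute compute02` would then fail loudly if the service is still reported as enabled, which is easier to spot in a maintenance script than a table.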
Migrate Instances/VMs to Other Compute Nodes
Cold Instance Migration
Cold migration, or non-live migration, involves shutting down a running instance before migrating it from the source compute node to the destination compute node. Cold migration necessitates a brief interruption in the instance’s operation. The migrated instance retains access to its original volumes and IP addresses.
Live Instance Migration
Live migration seamlessly shifts the instance from the source Compute node to the destination Compute node without any need for a shutdown, all while preserving state consistency throughout the process.
If your OpenStack environment supports live migration (the Nova Feature Support Matrix lists which hypervisors support it), consider migrating VM instances from the compute node you plan to reboot to other available compute nodes. Live migration allows VMs to remain running during the process, minimizing downtime.
Live migrations can be categorized based on how they handle instance storage:
- Shared Storage-Based Live Migration: This type of migration involves instances with ephemeral disks stored on shared storage accessible to both the source and destination hosts. This method is faster and more efficient than block live migration because the instance’s disk images are already accessible to the destination host.
- Block Live Migration (Block Migration): Block migration is used when instances have ephemeral disks (e.g. instances booting from image) that are not shared between the source and destination hosts. It’s important to note that block migration is not compatible with read-only devices like CD-ROMs and Configuration Drive (config_drive). This method is slower and more resource-intensive than shared storage-based live migration.
- Volume-Backed Live Migration: In this scenario, instances use volumes for storage instead of ephemeral disks. This method is faster than block live migration because the disk images do not need to be copied. However, it can still be slower than shared storage-based live migration because the block storage volumes need to be attached to the destination host. Volumes served by Cinder, the block storage service, from backends such as Ceph or GlusterFS support volume-backed live migration.
These classifications help determine the method of live migration suitable for your specific instance and storage setup.
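As a rough illustration of how these three categories map onto the CLI, the sketch below returns the `openstack server migrate` flag that matches each one. The labels `shared`, `block` and `volume` and the function name `migration_flag` are invented for this example. With compute API version 2.25 or later, the client help indicates the mode is chosen automatically, so the flag matters mostly on older API versions.

```shell
# Hypothetical helper: map a storage layout to the matching migrate flag.
# "shared", "block" and "volume" are labels invented for this sketch.
migration_flag() {
    case "$1" in
        shared) echo "--shared-migration" ;;  # ephemeral disks on shared storage
        block)  echo "--block-migration" ;;   # ephemeral disks local to the host
        volume) echo "" ;;                    # volume-backed: no disk copy flag
        *)      echo "unknown storage type: $1" >&2; return 1 ;;
    esac
}
```

For example, `openstack server migrate --live-migration $(migration_flag block) <server>` expands to a block live migration.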
You can do the migration from the Horizon dashboard or from the command line.
Get a List of Running Instances on Compute Node to Reboot
To begin with, get a list of all instances running on the compute node you need to reboot. For example, below is a list of instances running on our compute02 node;
openstack server list --host compute02 --all-projects
Sample output;
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
| ee54d242-4fdd-4a3b-8ee5-30b3171e1df6 | gracious_turing | ACTIVE | DEMO_NET=192.168.50.123 | cirros | custom1 |
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
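On a busier node it helps to turn this listing into a plan, because live migration only applies to instances that are running (ACTIVE or PAUSED in Nova), while stopped instances have to be cold migrated. The following is a minimal sketch under that assumption. `plan_migrations` is a hypothetical name, and the `-f value -c <column>` options are the client's generic output formatting flags.

```shell
# Hypothetical helper: print "<id> live" or "<id> cold" for each instance on a host.
# Assumes admin credentials are loaded; -f value -c prints bare columns.
plan_migrations() {
    openstack server list --host "$1" --all-projects -f value -c ID -c Status |
    while read -r id status; do
        case "$status" in
            ACTIVE|PAUSED) echo "$id live" ;;  # candidates for live migration
            *)             echo "$id cold" ;;  # e.g. SHUTOFF: cold migrate instead
        esac
    done
}
```

Running `plan_migrations compute02` gives one line per instance that you can review before migrating anything.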
Get a List of Compute Nodes
Similarly, you can list the available compute nodes. This is useful if you want to explicitly specify which node to migrate an instance to; otherwise, the nova scheduler decides where to place the instance being migrated.
openstack hypervisor list
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| 6aa76044-d456-4c3b-8f28-fcfc7e79b658 | compute01 | QEMU | 192.168.200.202 | up |
| 7365f5eb-62e1-477e-bf45-8f77ea98802a | compute02 | QEMU | 192.168.200.203 | up |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
Migrate OpenStack Instances to Other Compute Nodes
Once you have the information, you can now proceed to migrate an instance.
As already mentioned, depending on the criticality of the operations/services handled by an instance, you can choose to go the cold or the live way.
OpenStack instances can be migrated using the openstack server migrate command.
openstack server migrate --help
usage: openstack server migrate [-h] [--live-migration] [--host <hostname>] [--shared-migration | --block-migration] [--disk-overcommit | --no-disk-overcommit]
[--wait] <server>
Migrate server to different host. A migrate operation is implemented as a resize operation using the same flavor as the old server. This means that, like resize, migrate
works by creating a new server using the same flavor and copying the contents of the original disk into a new one. As with resize, the migrate operation is a two-step
process for the user: the first step is to perform the migrate, and the second step is to either confirm (verify) success and release the old server, or to declare a
revert to release the new server and restart the old one.
positional arguments:
<server> Server (name or ID)
options:
-h, --help show this help message and exit
--live-migration Live migrate the server; use the ``--host`` option to specify a target host for the migration which will be validated by the scheduler
--host <hostname> Migrate the server to the specified host. (supported with --os-compute-api-version 2.30 or above when used with the --live-migration option)
(supported with --os-compute-api-version 2.56 or above when used without the --live-migration option)
--shared-migration Perform a shared live migration (default before --os-compute-api-version 2.25, auto after)
--block-migration Perform a block live migration (auto-configured from --os-compute-api-version 2.25)
--disk-overcommit Allow disk over-commit on the destination host(supported with --os-compute-api-version 2.24 or below)
--no-disk-overcommit Do not over-commit disk on the destination host (default)(supported with --os-compute-api-version 2.24 or below)
--wait Wait for migrate to complete
So, let's live migrate my instance, gracious_turing, with the UUID ee54d242-4fdd-4a3b-8ee5-30b3171e1df6.
Note that the instance boots from an image and there is no shared storage, hence we will do a block live migration;
openstack server migrate --live-migration --block-migration gracious_turing --wait
If you check in Horizon, under Instances, you will see the instance status as Migrating.
After a short while, the instance migration should be completed. Since I have only two compute nodes, the instance should have been migrated to compute01.
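You can also confirm the new location from the CLI instead of Horizon. The sketch below wraps the migrate command used above and then reads the instance's host attribute. The function name `live_migrate_and_check` is hypothetical, and `OS-EXT-SRV-ATTR:host` is a field that is normally only visible to admin users, so treat this as an outline rather than a finished tool.

```shell
# Hypothetical helper: live migrate one instance and report where it landed.
# Assumes admin credentials; OS-EXT-SRV-ATTR:host is an admin-only field.
live_migrate_and_check() (
    server=$1
    source_host=$2
    openstack server migrate --live-migration --block-migration --wait "$server" || exit 1
    new_host=$(openstack server show "$server" -f value -c OS-EXT-SRV-ATTR:host)
    if [ "$new_host" = "$source_host" ]; then
        echo "$server is still on $source_host" >&2
        exit 1
    fi
    echo "$server now runs on $new_host"
)
```

For the instance in this post, the call would be `live_migrate_and_check gracious_turing compute02`.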
If you want to do a cold migration instead, you can shut down the instance and then migrate it by running openstack server migrate without the --live-migration option.
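As the help text above explains, a cold migration is a two-step operation: migrate, then confirm or revert. A minimal sketch of that flow follows. It assumes a client recent enough to provide `openstack server resize confirm`, and the function name `cold_migrate` is made up for this example. Some deployments confirm automatically after a configured window, in which case the status check here would simply report that there is nothing to confirm.

```shell
# Hypothetical helper: cold migrate an instance, then confirm the move.
# Assumes a client that provides "openstack server resize confirm".
cold_migrate() (
    server=$1
    openstack server migrate --wait "$server" || exit 1
    # A migrated instance waits in VERIFY_RESIZE until confirmed or reverted.
    status=$(openstack server show "$server" -f value -c status)
    if [ "$status" = "VERIFY_RESIZE" ]; then
        openstack server resize confirm "$server" || exit 1
        echo "$server: cold migration confirmed"
    else
        echo "$server: status is '$status', not confirming" >&2
        exit 1
    fi
)
```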
Stop All OpenStack Services Running on the Compute Node
Once the instances on the compute node are migrated, you can log in to the compute node and stop all OpenStack services.
If you are using Ansible, then you can use it to check and stop the services on the compute node.
For example, let's verify, from the controller/Ansible node, all OpenStack services running on our compute02;
ansible -i multinode -m raw -a "docker ps" compute02
compute02 | CHANGED | rc=0 >>
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
efe871ef9fbf quay.io/openstack.kolla/zun-cni-daemon:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) zun_cni_daemon
f6155141547b quay.io/openstack.kolla/zun-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) zun_compute
143e53a3b9de quay.io/openstack.kolla/ceilometer-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) ceilometer_compute
da3bb6f8f71b quay.io/openstack.kolla/kuryr-libnetwork:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) kuryr
7fa1016b0acf quay.io/openstack.kolla/neutron-openvswitch-agent:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) neutron_openvswitch_agent
98016d47c4d6 quay.io/openstack.kolla/openvswitch-vswitchd:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) openvswitch_vswitchd
2676319cfbdc quay.io/openstack.kolla/openvswitch-db-server:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) openvswitch_db
8b750f8dc593 quay.io/openstack.kolla/nova-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) nova_compute
84397013842c quay.io/openstack.kolla/nova-libvirt:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) nova_libvirt
3768d9da5ab7 quay.io/openstack.kolla/nova-ssh:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) nova_ssh
ec5a5dd65cb4 quay.io/openstack.kolla/iscsid:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days iscsid
f4185c0884ae quay.io/openstack.kolla/prometheus-libvirt-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days prometheus_libvirt_exporter
d9942be630fa quay.io/openstack.kolla/prometheus-cadvisor:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days prometheus_cadvisor
04fec61c5671 quay.io/openstack.kolla/prometheus-node-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days prometheus_node_exporter
221098bf97e7 quay.io/openstack.kolla/cron:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days cron
36fc2702d398 quay.io/openstack.kolla/kolla-toolbox:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days kolla_toolbox
80f42d83c6f7 quay.io/openstack.kolla/fluentd:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days fluentd
Since we deployed our OpenStack using Kolla-Ansible, all of these services run as Docker containers, so the easiest way to stop them is to simply stop the Docker service;
ansible -i multinode -m raw -a "sudo systemctl stop docker.service docker.socket" compute02
If you are not using configuration management tools such as Ansible, be sure to stop nova-compute and the Neutron agent running on the node (for example, neutron-linuxbridge-agent) when you stop the services.
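For a package-based node where the services run as systemd units rather than containers, the stop step could look like the sketch below. This is an assumption about such a setup, not something tested in this post: unit names differ between distributions, so the function takes them as arguments, and `stop_compute_units` is a hypothetical name.

```shell
# Hypothetical helper for package-based (non-Kolla) compute nodes.
# Unit names vary by distribution; pass the ones your node actually runs.
stop_compute_units() {
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "stopping $unit"
            sudo systemctl stop "$unit" || return 1
        else
            echo "skipping $unit (not active)"
        fi
    done
}
```

An example call, assuming Ubuntu-style unit names, would be `stop_compute_units nova-compute neutron-linuxbridge-agent`.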
Reboot OpenStack Compute Node
Next, reboot the compute node. Again, we will use Ansible in our setup;
ansible -i multinode -m raw -a "sudo systemctl reboot -i" compute02
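After issuing the reboot, you may want to wait until the node answers again before moving on. The sketch below polls with Ansible's `ping` module against the same `multinode` inventory. The function name `wait_for_node` and the retry defaults are arbitrary choices for illustration, and the `ping` module needs Python on the target, so swap in a `raw` call if that is not available. Note that a node can still answer for a few seconds before it actually goes down.

```shell
# Hypothetical helper: poll until the rebooted node answers Ansible's ping module.
# "multinode" is the inventory file used throughout this post.
wait_for_node() (
    host=$1
    tries=${2:-30}   # number of attempts (arbitrary default)
    delay=${3:-10}   # seconds between attempts (arbitrary default)
    i=1
    while [ "$i" -le "$tries" ]; do
        if ansible -i multinode -m ping "$host" >/dev/null 2>&1; then
            echo "$host is reachable after $i attempt(s)"
            exit 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "$host did not come back after $tries attempts" >&2
    exit 1
)
```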
Start OpenStack Services
If you are using configuration management tools such as Ansible, chances are high that the OpenStack services will be started automatically after the reboot. You can confirm this by listing the containers again;
ansible -i multinode -m raw -a "docker ps" compute02
compute02 | CHANGED | rc=0 >>
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
efe871ef9fbf quay.io/openstack.kolla/zun-cni-daemon:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) zun_cni_daemon
f6155141547b quay.io/openstack.kolla/zun-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) zun_compute
143e53a3b9de quay.io/openstack.kolla/ceilometer-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (unhealthy) ceilometer_compute
da3bb6f8f71b quay.io/openstack.kolla/kuryr-libnetwork:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) kuryr
7fa1016b0acf quay.io/openstack.kolla/neutron-openvswitch-agent:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) neutron_openvswitch_agent
98016d47c4d6 quay.io/openstack.kolla/openvswitch-vswitchd:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) openvswitch_vswitchd
2676319cfbdc quay.io/openstack.kolla/openvswitch-db-server:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) openvswitch_db
8b750f8dc593 quay.io/openstack.kolla/nova-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) nova_compute
84397013842c quay.io/openstack.kolla/nova-libvirt:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) nova_libvirt
3768d9da5ab7 quay.io/openstack.kolla/nova-ssh:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes (healthy) nova_ssh
ec5a5dd65cb4 quay.io/openstack.kolla/iscsid:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes iscsid
f4185c0884ae quay.io/openstack.kolla/prometheus-libvirt-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes prometheus_libvirt_exporter
d9942be630fa quay.io/openstack.kolla/prometheus-cadvisor:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes prometheus_cadvisor
04fec61c5671 quay.io/openstack.kolla/prometheus-node-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes prometheus_node_exporter
221098bf97e7 quay.io/openstack.kolla/cron:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes cron
36fc2702d398 quay.io/openstack.kolla/kolla-toolbox:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes kolla_toolbox
80f42d83c6f7 quay.io/openstack.kolla/fluentd:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 minutes fluentd
If you are not using any configuration management tool, be sure to start all OpenStack services manually.
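In the listing above, ceilometer_compute reports (unhealthy) a few minutes after boot, which is easy to miss in a long table. A short check like the one below can surface such containers. It relies on Docker's `health=unhealthy` filter for `docker ps`; the function names are hypothetical, and it is meant to be run on the compute node itself.

```shell
# Hypothetical helper: list containers whose health check is failing.
unhealthy_containers() {
    docker ps --filter health=unhealthy --format '{{.Names}}'
}

# Print a one-line summary and return non-zero if anything is unhealthy.
report_health() {
    bad=$(unhealthy_containers)
    if [ -z "$bad" ]; then
        echo "all containers healthy or without health checks"
    else
        # Unquoted on purpose: joins the names onto a single line.
        echo "unhealthy:" $bad
        return 1
    fi
}
```

A container that stays unhealthy is worth a look at its logs before you put the node back into service.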
Re-Enable Instance Scheduling on the Compute Node
Once the node and all its services are up, confirm that the hypervisor and the nova-compute service both report an up state;
openstack hypervisor list
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| 6aa76044-d456-4c3b-8f28-fcfc7e79b658 | compute01 | QEMU | 192.168.200.202 | up |
| 7365f5eb-62e1-477e-bf45-8f77ea98802a | compute02 | QEMU | 192.168.200.203 | up |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
openstack compute service list --host compute02
+--------------------------------------+--------------+-----------+------+----------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------------------------+--------------+-----------+------+----------+-------+----------------------------+
| 464698d3-0da5-44cb-ba91-7d6782b2cff9 | nova-compute | compute02 | nova | disabled | up | 2023-11-08T20:32:50.000000 |
+--------------------------------------+--------------+-----------+------+----------+-------+----------------------------+
Next, re-enable instance scheduling on the node;
openstack compute service set --enable compute02 nova-compute
You can also re-enable the service from the Horizon dashboard, in the same place where you disabled it.
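To avoid enabling scheduling on a node whose nova-compute service has not come back yet, the two checks can be combined. This is a minimal sketch that reuses the commands shown above. `enable_compute` is a hypothetical name, and it assumes the admin credentials are already sourced.

```shell
# Hypothetical helper: re-enable scheduling only once nova-compute reports "up".
enable_compute() (
    host=$1
    state=$(openstack compute service list --host "$host" \
        --service nova-compute -f value -c State)
    if [ "$state" != "up" ]; then
        echo "$host: nova-compute state is '$state', not enabling" >&2
        exit 1
    fi
    openstack compute service set --enable "$host" nova-compute || exit 1
    echo "$host: scheduling enabled"
)
```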
[Optional] Migrate Instances Back to Original Nodes
Once the node is up and running, you can choose to migrate the instances back to it or simply let new instances be scheduled on it.
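If you want to migrate them back, you can use the CLI as done above, or do it from Horizon.
For example, to live migrate our instance, gracious_turing, back to compute02 from Horizon, choose the live migrate action for the instance, select compute02 as the target host and enable block migration.
Submit to begin the migration process.
And voila, the instance is now running on the original compute node.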
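The CLI equivalent needs an explicit target host. According to the help text shown earlier, `--host` combined with `--live-migration` requires compute API version 2.30 or above, and from 2.25 onwards the block or shared mode is selected automatically. The wrapper below is a sketch under those assumptions, with `migrate_back` as a made-up name.

```shell
# Hypothetical helper: live migrate an instance back to a named host.
# --host with --live-migration needs compute API 2.30 or later (per the help text).
migrate_back() {
    openstack --os-compute-api-version 2.30 server migrate \
        --live-migration --host "$2" --wait "$1"
}
```

For this post's instance, that would be `migrate_back gracious_turing compute02`.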
And that completes our guide on how to reboot an OpenStack compute node in a safe way.