How can I safely remove a compute node from an OpenStack deployment? When managing an OpenStack deployment, removing a compute node safely is a crucial task. Whether it’s for scaling down resources or performing maintenance, the process requires care to prevent disruptions to ongoing operations. In this guide, we’ll go through the steps you can take to safely remove a compute node, minimizing potential impacts and maintaining the stability of your cloud infrastructure.
Remove Compute Node Safely from OpenStack Deployment
Disable Instance Scheduling on Compute Node
In an active environment where you cannot control who is creating OpenStack instances, you won’t want new instances being launched on the node that you have marked for removal.
As a result, disable instance scheduling on that node.
You can do so by disabling the nova-compute service on that host, either from the CLI or from OpenStack Horizon.
Note: We are using an OpenStack deployed using Kolla-Ansible.
Hence, activate the virtual environment and load the admin credentials.
source $HOME/kolla-ansible/bin/activate
source /etc/kolla/admin-openrc.sh
You can list the services using the command below;
openstack compute service list --host compute02
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
| 464698d3-0da5-44cb-ba91-7d6782b2cff9 | nova-compute | compute02 | nova | enabled | up | 2023-11-07T21:15:50.000000 |
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
To disable instance scheduling on a compute node from the CLI, run the command below;
openstack compute service set [-h] [--enable | --disable] [--disable-reason <reason>] [--up | --down] <host> <service>
For example, to disable nova compute service on compute02 node;
openstack compute service set --disable --disable-reason for-safe-removal compute02 nova-compute
You can do the same from Horizon, Admin > Compute > Hypervisors > Select the Host > Actions > Disable service.
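If you prefer to script this step, the sketch below wraps the two commands shown above and checks that the service really reports as disabled afterwards. The function name disable_scheduling and the default reason string are our own choices for illustration, not part of the OpenStack CLI.

```shell
# Disable scheduling on a compute host and confirm the change took effect.
# disable_scheduling is a hypothetical helper name, not an OpenStack command.
disable_scheduling() {
    ds_host="$1"
    ds_reason="${2:-for-safe-removal}"

    openstack compute service set --disable --disable-reason "$ds_reason" \
        "$ds_host" nova-compute || return 1

    # -f value -c Status prints only the Status column of the matching service.
    ds_status=$(openstack compute service list --host "$ds_host" \
        --service nova-compute -f value -c Status)

    # Succeed only if the service now reports as disabled.
    [ "$ds_status" = "disabled" ]
}
```

For example, disable_scheduling compute02 returns a non-zero exit status if the service is still enabled after the call.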
Migrate Instances/VMs to Other Compute Nodes
There are different ways in which you can migrate your OpenStack instances.
Cold Instance Migration
Cold migration, or non-live migration, involves shutting down a running instance before migrating it from the source compute node to the destination compute node. Cold migration necessitates a brief interruption in the instance’s operation. The migrated instance retains access to its original volumes and IP addresses.
Live Instance Migration
Live migration seamlessly shifts the instance from the source Compute node to the destination Compute node without any need for a shutdown, all while preserving state consistency throughout the process.
If your OpenStack environment supports live migration (you can check Feature Support Matrix to determine which hypervisors support live-migration), consider migrating VM instances from the compute node you plan to remove to other available compute nodes. Live migration allows VMs to remain running during the process, minimizing downtime.
Live migrations can be categorized based on how they handle instance storage:
- Shared Storage-Based Live Migration: This type of migration involves instances with ephemeral disks stored on shared storage accessible to both the source and destination hosts. This method is faster and more efficient than block live migration because the instance’s disk images are already accessible to the destination host.
- Block Live Migration (Block Migration): Block migration is used when instances have ephemeral disks (e.g. instances booting from image) that are not shared between the source and destination hosts. It’s important to note that block migration is not compatible with read-only devices like CD-ROMs and Configuration Drive (config_drive). This method is slower and more resource-intensive than shared storage-based live migration.
- Volume-Backed Live Migration: In this scenario, instances use volumes for storage instead of ephemeral disks. This method is faster than block live migration because the disk images do not need to be copied. However, it is still slower than shared storage-based live migration because the block storage volumes need to be attached to the destination host. Volumes served by Cinder from backends such as Ceph or GlusterFS support volume-backed live migration.
These classifications help determine the method of live migration suitable for your specific instance and storage setup.
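As a rough illustration of how these three cases map onto the openstack server migrate options, here is a small sketch. The keywords shared, local and volume are labels we made up for this example, and the mapping assumes you want to pass the migration type explicitly rather than rely on auto-detection.

```shell
# Map a storage layout to `openstack server migrate` options.
# The layout keywords (shared, local, volume) are invented for this sketch.
live_migration_flags() {
    case "$1" in
        # Disks are already reachable from the destination: nothing to copy.
        shared|volume) echo "--live-migration --shared-migration" ;;
        # Ephemeral disks live only on the source host: copy them over.
        local)         echo "--live-migration --block-migration" ;;
        *)             echo "unknown storage layout: $1" >&2; return 1 ;;
    esac
}
```

Depending on your compute API version, the client may be able to work out the migration type on its own, in which case --live-migration alone could be enough.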
You can do the migration from the Horizon dashboard or from the command line.
Kindly note that OpenStack instance migration is a proactive and planned operation. There are situations where a compute node may experience emergencies such as hardware failures. In such situations, you might want to use the evacuate process instead.
Get a List of Running Instances on Compute Node to Remove
To begin with, get a list of all instances running on the compute node you need to remove. For example, below is a list of instances running on our compute02 node;
openstack server list --host compute02 --all-projects
Sample output;
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
| 9eaa3419-47cf-40bd-a981-92517c81e2c7 | gracious_turing | ACTIVE | DEMO_NET=192.168.50.128 | cirros | custom1 |
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
Get a List of Compute Nodes
Similarly, you can list the available compute nodes. This is useful if you want to explicitly specify which node to migrate an instance to. Otherwise, if you have multiple compute nodes, the nova scheduler decides where to place the instance being migrated.
openstack hypervisor list
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| 6aa76044-d456-4c3b-8f28-fcfc7e79b658 | compute01 | QEMU | 192.168.200.202 | up |
| 7365f5eb-62e1-477e-bf45-8f77ea98802a | compute02 | QEMU | 192.168.200.203 | up |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
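If you want to pick a target host in a script, a minimal sketch like the one below could filter the hypervisor list down to nodes that are up and are not the node being removed. The helper name candidate_hosts is hypothetical, and the sketch assumes hostnames contain no spaces.

```shell
# Print hypervisors that are up, excluding the node being removed.
# candidate_hosts is a hypothetical helper name.
candidate_hosts() {
    ch_skip="$1"
    # Ask for just the hostname and state columns, one hypervisor per line.
    openstack hypervisor list -f value -c "Hypervisor Hostname" -c State |
        awk -v skip="$ch_skip" '$1 != skip && $2 == "up" { print $1 }'
}
```

In our two-node setup, candidate_hosts compute02 would leave only compute01 as a possible target.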
Migrate OpenStack Instances to Other Compute Nodes
Once you have the information about compute nodes, you can now proceed to migrate your instances.
As already mentioned, depending on the criticality of the operations/services handled by an instance, you can choose to go the cold or the live migration way.
OpenStack instances can be migrated using the openstack server migrate command.
openstack server migrate --help
usage: openstack server migrate [-h] [--live-migration] [--host <hostname>] [--shared-migration | --block-migration] [--disk-overcommit | --no-disk-overcommit]
[--wait] <server>
Migrate server to different host. A migrate operation is implemented as a resize operation using the same flavor as the old server. This means that, like resize, migrate
works by creating a new server using the same flavor and copying the contents of the original disk into a new one. As with resize, the migrate operation is a two-step
process for the user: the first step is to perform the migrate, and the second step is to either confirm (verify) success and release the old server, or to declare a
revert to release the new server and restart the old one.
positional arguments:
<server> Server (name or ID)
options:
-h, --help show this help message and exit
--live-migration Live migrate the server; use the ``--host`` option to specify a target host for the migration which will be validated by the scheduler
--host <hostname>
Migrate the server to the specified host. (supported with --os-compute-api-version 2.30 or above when used with the --live-migration option)
(supported with --os-compute-api-version 2.56 or above when used without the --live-migration option)
--shared-migration Perform a shared live migration (default before --os-compute-api-version 2.25, auto after)
--block-migration Perform a block live migration (auto-configured from --os-compute-api-version 2.25)
--disk-overcommit Allow disk over-commit on the destination host(supported with --os-compute-api-version 2.24 or below)
--no-disk-overcommit Do not over-commit disk on the destination host (default)(supported with --os-compute-api-version 2.24 or below)
--wait Wait for migrate to complete
So, let’s live migrate my instance, gracious_turing, with the UUID 9eaa3419-47cf-40bd-a981-92517c81e2c7.
Note that the instance boots from an image and there is no shared storage, hence we will do a block live migration;
openstack server migrate --live-migration --block-migration gracious_turing --wait
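If you check on Horizon, under Instances, you will see the instance status as migrating.
If you want to do a cold migration instead, run the same command without the --live-migration option. As the help output above explains, a cold migration is a two-step process: perform the migrate, then either confirm it to release the old server or revert it.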
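When a node hosts more than a handful of instances, migrating them one by one gets tedious. The following is a minimal sketch that live migrates every instance off a host, one after the other. It assumes block live migration suits all of them, as in our setup, and the name drain_host is our own.

```shell
# Live migrate every instance off the given host, one at a time.
# drain_host is a hypothetical helper; it assumes block live migration
# is appropriate for all instances on the host.
drain_host() {
    dh_host="$1"
    dh_failed=0
    for dh_id in $(openstack server list --host "$dh_host" --all-projects -f value -c ID); do
        echo "Live migrating $dh_id off $dh_host"
        # --wait blocks until the migration finishes; count non-zero exits.
        openstack server migrate --live-migration --block-migration --wait "$dh_id" ||
            dh_failed=$((dh_failed + 1))
    done
    # Succeed only if every migrate command exited cleanly.
    [ "$dh_failed" -eq 0 ]
}
```

Running the migrations sequentially keeps the load on the network predictable, at the cost of taking longer than parallel migrations would.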
Verify Instance Migration
After a short while, the instance migration should be completed. Since I have only two compute nodes, the instance should have been migrated to compute01.
You can also check the instances from the command line;
openstack server list --all-projects --long
+--------------------------------------+-------------------+--------+------------+-------------+-------------------------+------------+--------------------------------------+---------+-------------------+-----------+------------+-------------+
| ID | Name | Status | Task State | Power State | Networks | Image Name | Image ID | Flavor | Availability Zone | Host | Properties | Host Status |
+--------------------------------------+-------------------+--------+------------+-------------+-------------------------+------------+--------------------------------------+---------+-------------------+-----------+------------+-------------+
| 9eaa3419-47cf-40bd-a981-92517c81e2c7 | gracious_turing | ACTIVE | None | Running | DEMO_NET=192.168.50.128 | cirros | 25dead1a-874c-4f19-b0b5-8ea739a15796 | custom1 | nova | compute01 | | UP |
| 6ea369b3-27f1-44d2-93aa-6f6e94533e6d | peaceful_hamilton | ACTIVE | None | Running | DEMO_NET=192.168.50.113 | cirros | 25dead1a-874c-4f19-b0b5-8ea739a15796 | custom1 | nova | compute01 | | UP |
| c4f95fa1-d5ed-4765-8305-04b2c559dd83 | vibrant_torvalds | ACTIVE | None | Running | DEMO_NET=192.168.50.150 | cirros | 25dead1a-874c-4f19-b0b5-8ea739a15796 | custom1 | nova | compute01 | | UP |
+--------------------------------------+-------------------+--------+------------+-------------+-------------------------+------------+--------------------------------------+---------+-------------------+-----------+------------+-------------+
As you can see, all instances are running on compute01 node now.
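Before moving on, it may be worth confirming in a script that nothing is left on the node. A minimal sketch, with the hypothetical helper name host_is_empty:

```shell
# Succeed only if no instances remain on the given host.
# host_is_empty is a hypothetical helper name.
host_is_empty() {
    hie_left=$(openstack server list --host "$1" --all-projects -f value -c ID)
    if [ -n "$hie_left" ]; then
        echo "Instances still on $1:" >&2
        echo "$hie_left" >&2
        return 1
    fi
}
```

You could then guard the later steps with something like host_is_empty compute02 || exit 1, so that services are never stopped while instances are still running on the node.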
Migrate Volumes (If Applicable)
If the instances on the compute node have volumes attached, and those volumes live on storage that goes away with the node, then you need to migrate the volumes as well.
Use the openstack volume migrate command to move a volume to another storage host.
openstack volume migrate --help
usage: openstack volume migrate [-h] --host <host> [--force-host-copy] [--lock-volume] <volume>
Migrate volume to a new host
positional arguments:
<volume> Volume to migrate (name or ID)
options:
-h, --help show this help message and exit
--host <host>
Destination host (takes the form: host@backend-name#pool)
--force-host-copy Enable generic host-based force-migration, which bypasses driver optimizations
--lock-volume If specified, the volume state will be locked and will not allow a migration to be aborted (possibly by another operation)
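Since the destination must follow the host@backend-name#pool form, a small wrapper can catch a malformed value before the request is sent. This is only a sketch: migrate_volume is a name we made up, and the destination storage01@lvm#pool1 used in the example is a placeholder rather than a host from this deployment.

```shell
# Check the destination format, then migrate the volume.
# migrate_volume is a hypothetical helper name.
migrate_volume() {
    mv_volume="$1"
    mv_dest="$2"
    case "$mv_dest" in
        # Require non-empty host, backend and pool parts.
        ?*@?*"#"?*) ;;
        *) echo "destination must look like host@backend-name#pool" >&2
           return 1 ;;
    esac
    openstack volume migrate --host "$mv_dest" "$mv_volume"
}
```

For example, migrate_volume my-volume 'storage01@lvm#pool1' passes the check, while migrate_volume my-volume storage01 is rejected without calling the API. Remember to quote the destination, since # starts a comment in the shell when it begins a word.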
Stop all OpenStack services running on the compute node
Once the instances on the compute node are migrated, you can log in to the compute node and stop all OpenStack services.
If you are using Ansible, then you can use it to check and stop the services on the compute node.
For example, let’s verify, from the controller/Ansible node, all OpenStack services running on our compute02;
ansible -i multinode -m raw -a "docker ps" compute02
compute02 | CHANGED | rc=0 >>
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
efe871ef9fbf quay.io/openstack.kolla/zun-cni-daemon:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) zun_cni_daemon
f6155141547b quay.io/openstack.kolla/zun-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) zun_compute
143e53a3b9de quay.io/openstack.kolla/ceilometer-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) ceilometer_compute
da3bb6f8f71b quay.io/openstack.kolla/kuryr-libnetwork:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) kuryr
7fa1016b0acf quay.io/openstack.kolla/neutron-openvswitch-agent:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) neutron_openvswitch_agent
98016d47c4d6 quay.io/openstack.kolla/openvswitch-vswitchd:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) openvswitch_vswitchd
2676319cfbdc quay.io/openstack.kolla/openvswitch-db-server:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) openvswitch_db
8b750f8dc593 quay.io/openstack.kolla/nova-compute:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) nova_compute
84397013842c quay.io/openstack.kolla/nova-libvirt:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) nova_libvirt
3768d9da5ab7 quay.io/openstack.kolla/nova-ssh:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days (healthy) nova_ssh
ec5a5dd65cb4 quay.io/openstack.kolla/iscsid:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days iscsid
f4185c0884ae quay.io/openstack.kolla/prometheus-libvirt-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days prometheus_libvirt_exporter
d9942be630fa quay.io/openstack.kolla/prometheus-cadvisor:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days prometheus_cadvisor
04fec61c5671 quay.io/openstack.kolla/prometheus-node-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days prometheus_node_exporter
221098bf97e7 quay.io/openstack.kolla/cron:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days cron
36fc2702d398 quay.io/openstack.kolla/kolla-toolbox:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days kolla_toolbox
80f42d83c6f7 quay.io/openstack.kolla/fluentd:2023.1-ubuntu-jammy "dumb-init --single-…" 3 days ago Up 3 days fluentd
Since we deployed our OpenStack using Kolla-Ansible, the easiest way to stop these containers is the kolla-ansible stop command;
kolla-ansible -i <inventory> stop --yes-i-really-really-mean-it [ --limit <limit> ]
So, to stop all the OpenStack services on compute02;
source $HOME/kolla-ansible/bin/activate
source /etc/kolla/admin-openrc.sh
kolla-ansible -i multinode stop --yes-i-really-really-mean-it --limit compute02
If you are not using a configuration management tool such as Ansible, be sure to stop nova-compute and the Neutron agent (neutron-openvswitch-agent in our case, or neutron-linuxbridge-agent if you use Linux bridge) when you stop the services.
Remove OpenStack Compute Node Compute Service
Next, remove the compute service of the compute node from the database.
You can execute these commands from the control node.
List the compute services;
openstack compute service list
+--------------------------------------+----------------+--------------+----------+----------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------------------------+----------------+--------------+----------+----------+-------+----------------------------+
| 67db62aa-58a2-4e66-9a8b-bb1c85bd23e2 | nova-scheduler | controller01 | internal | enabled | up | 2023-11-09T18:07:18.000000 |
| b9520af1-490d-43b7-98ba-a55b0349b38c | nova-conductor | controller01 | internal | enabled | up | 2023-11-09T18:07:18.000000 |
| 5fdae690-ddbf-4dc3-a41e-61866858054b | nova-compute | compute01 | nova | enabled | up | 2023-11-09T18:07:17.000000 |
| 464698d3-0da5-44cb-ba91-7d6782b2cff9 | nova-compute | compute02 | nova | disabled | down | 2023-11-09T18:04:07.000000 |
+--------------------------------------+----------------+--------------+----------+----------+-------+----------------------------+
So, we want to remove the compute service on compute02. Hence, obtain the ID of the compute service on the node to be removed and delete it;
openstack compute service delete 464698d3-0da5-44cb-ba91-7d6782b2cff9
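Deleting the wrong service ID is an easy mistake to make. As a safeguard, the sketch below looks up the ID by host name and refuses to delete unless the service is both disabled and down, which is the state compute02 shows in the listing above. The helper name delete_compute_service is hypothetical.

```shell
# Delete the nova-compute service of a host, but only if it is
# already disabled and down. delete_compute_service is a hypothetical name.
delete_compute_service() {
    dcs_host="$1"
    dcs_line=$(openstack compute service list --host "$dcs_host" \
        --service nova-compute -f value -c ID -c Status -c State)

    # Split "ID Status State" into positional parameters.
    set -- $dcs_line
    if [ "$#" -ne 3 ]; then
        echo "expected exactly one nova-compute service on $dcs_host" >&2
        return 1
    fi
    if [ "$2" != "disabled" ] || [ "$3" != "down" ]; then
        echo "refusing to delete: service on $dcs_host is $2/$3" >&2
        return 1
    fi
    openstack compute service delete "$1"
}
```

With this in place, delete_compute_service compute02 would do the lookup and the delete in one go, and a typo in the host name ends in an error instead of a deleted service.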
Remove OpenStack Compute Node Neutron Agents
Next, remove the Neutron agents on the compute node.
You can list the agents as follows;
openstack network agent list --host <compute-node>
For example;
openstack network agent list --host compute02
+--------------------------------------+--------------------+-----------+-------------------+-------+-------+---------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+--------------------+-----------+-------------------+-------+-------+---------------------------+
| 313cd889-08d0-423f-befa-0254bd3bdefc | Open vSwitch agent | compute02 | None | XXX | UP | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-----------+-------------------+-------+-------+---------------------------+
Delete the agent (openstack network agent delete <agent_id>);
openstack network agent delete 313cd889-08d0-423f-befa-0254bd3bdefc
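A node may run more than one Neutron agent, depending on your networking setup. A minimal sketch that removes every agent registered for a host could look like this, where delete_host_agents is a name of our own choosing:

```shell
# Delete every Neutron agent registered for the given host.
# delete_host_agents is a hypothetical helper name.
delete_host_agents() {
    for dha_id in $(openstack network agent list --host "$1" -f value -c ID); do
        echo "Deleting agent $dha_id"
        # Stop at the first failure so problems are not silently skipped.
        openstack network agent delete "$dha_id" || return 1
    done
}
```

For our node this would be delete_host_agents compute02. If the host has no agents left, the loop simply does nothing.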
Remove the hosts from the Ansible inventory
If you are using Kolla-Ansible, it is now time to delete the compute node from the inventory.
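You can edit the inventory by hand, or script it. The sketch below assumes an INI-style inventory such as our multinode file, where each host sits on its own line with the hostname as the first word. It keeps a .bak copy and drops every line that starts with the given hostname. The function name remove_from_inventory is hypothetical.

```shell
# Remove a host from an INI-style Ansible inventory, keeping a backup.
# remove_from_inventory is a hypothetical helper; it assumes the hostname
# is the first word on each line that refers to the host.
remove_from_inventory() {
    rfi_file="$1"
    rfi_host="$2"
    cp "$rfi_file" "$rfi_file.bak" || return 1
    # Keep every line whose first field is not the host being removed.
    awk -v host="$rfi_host" '$1 != host' "$rfi_file.bak" > "$rfi_file"
}
```

For example, remove_from_inventory multinode compute02. Review the result against multinode.bak before running any further kolla-ansible commands, since inventories that use host ranges or group variables may need manual edits.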
And that completes our guide on how to safely remove a compute node from an OpenStack deployment.