Safely Remove Compute Node from OpenStack Deployment


How can you safely remove a compute node from an OpenStack deployment? Whether you are scaling down resources or performing maintenance, removing a compute node requires careful planning to prevent disruptions to running workloads. In this guide, we’ll go through the steps you can take to safely remove a compute node while minimizing the impact on your cloud infrastructure.

Remove Compute Node Safely from OpenStack Deployment

Disable Instance Scheduling on Compute Node

In an active environment where you cannot control who is creating OpenStack instances, you won’t want instances being launched on the node that you have marked for removal.

As a result, disable instance scheduling on such a node.

You can disable instance scheduling on the respective compute node by disabling the nova-compute service on that host, either from the CLI or from OpenStack Horizon;

Note: We are using an OpenStack deployed using Kolla-Ansible.

Hence, activate the virtual env and load the credentials.

source $HOME/kolla-ansible/bin/activate
source /etc/kolla/admin-openrc.sh

You can list the services using the command below;

openstack compute service list --host compute02
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
| ID                                   | Binary       | Host      | Zone | Status  | State | Updated At                 |
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+
| 464698d3-0da5-44cb-ba91-7d6782b2cff9 | nova-compute | compute02 | nova | enabled | up    | 2023-11-07T21:15:50.000000 |
+--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+

To disable instance scheduling on a compute node from the CLI, run the command below;

openstack compute service set [-h] [--enable | --disable] [--disable-reason <reason>] [--up | --down] <host> <service>

For example, to disable nova compute service on compute02 node;

openstack compute service set --disable --disable-reason for-safe-removal compute02 nova-compute

You can do the same from Horizon, Admin > Compute > Hypervisors > Select the Host > Actions > Disable service.
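
As a rough sketch, the disable step and a follow-up check can be wrapped in a small shell function. The function name disable_compute_host and the default reason string are our own choices for illustration, not part of OpenStack; the -f value -c Status options simply trim the service list output to the Status column. It assumes the admin credentials are already loaded as shown above.

```shell
# Disable scheduling on a compute host, then print the resulting service status.
# disable_compute_host is a hypothetical helper name used for this sketch.
disable_compute_host() {
    local host="$1"
    local reason="${2:-for-safe-removal}"

    # Stop the scheduler from placing new instances on this host
    openstack compute service set --disable --disable-reason "$reason" \
        "$host" nova-compute || return 1

    # Print only the Status column for the nova-compute service on this host
    openstack compute service list --host "$host" --service nova-compute \
        -f value -c Status
}

# Example: disable_compute_host compute02 for-safe-removal
```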

Migrate Instances/VMs to Other Compute Nodes

There are different ways in which you can migrate your OpenStack instances.

Cold Instance Migration

Cold migration, or non-live migration, involves shutting down a running instance before migrating it from the source compute node to the destination compute node. Cold migration necessitates a brief interruption in the instance’s operation. The migrated instance retains access to its original volumes and IP addresses.

Live Instance Migration

Live migration seamlessly shifts the instance from the source Compute node to the destination Compute node without any need for a shutdown, all while preserving state consistency throughout the process.

If your OpenStack environment supports live migration (you can check Feature Support Matrix to determine which hypervisors support live-migration), consider migrating VM instances from the compute node you plan to remove to other available compute nodes. Live migration allows VMs to remain running during the process, minimizing downtime.

Live migrations can be categorized based on how they handle instance storage:

  1. Shared Storage-Based Live Migration: This type of migration involves instances with ephemeral disks stored on shared storage accessible to both the source and destination hosts. This method is faster and more efficient than block live migration because the instance’s disk images are already accessible to the destination host.
  2. Block Live Migration (Block Migration): Block migration is used when instances have ephemeral disks (e.g. instances booting from image) that are not shared between the source and destination hosts. It’s important to note that block migration is not compatible with read-only devices like CD-ROMs and Configuration Drive (config_drive). This method is slower and more resource-intensive than shared storage-based live migration.
  3. Volume-Backed Live Migration: In this scenario, instances use volumes for storage instead of ephemeral disks. This method is faster than block live migration because the disk images do not need to be copied. However, it is still slower than shared storage-based live migration because the block storage volumes need to be attached to the destination host. Cinder (block storage) backends such as Ceph and GlusterFS support volume-backed live migration.

These classifications help determine the method of live migration suitable for your specific instance and storage setup.

You can do the migration from the horizon dashboard or from the command line.

Kindly note that OpenStack instance migration is a proactive, planned operation. There are situations where a compute node may experience emergencies such as hardware failures. In such situations, you might want to use the evacuate process instead.

Get a List of Running Instances on Compute Node to Remove

To begin with, get a list of all instances running on the compute node you need to remove. For example, below is a list of instances running on our compute02 node;

openstack server list --host compute02 --all-projects

Sample output;

+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
| ID                                   | Name            | Status | Networks                | Image  | Flavor  |
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
| 9eaa3419-47cf-40bd-a981-92517c81e2c7 | gracious_turing | ACTIVE | DEMO_NET=192.168.50.128 | cirros | custom1 |
+--------------------------------------+-----------------+--------+-------------------------+--------+---------+
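
If you plan to script the migration, it may help to capture only the instance IDs instead of the full table. A minimal sketch, assuming a helper name of our own (instances_on_host is not an OpenStack command) and the standard -f value -c ID output options:

```shell
# Print one instance ID per line for everything hosted on the given compute node.
# instances_on_host is a hypothetical helper name used for this sketch.
instances_on_host() {
    openstack server list --host "$1" --all-projects -f value -c ID
}

# Example: instances_on_host compute02
```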

Get a List of Compute Nodes

Similarly, you can list the available compute nodes, in case you want to explicitly specify which node to migrate an instance to. Otherwise, if you have multiple compute nodes, the nova scheduler decides where to place the instance being migrated.

openstack hypervisor list
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| ID                                   | Hypervisor Hostname | Hypervisor Type | Host IP         | State |
+--------------------------------------+---------------------+-----------------+-----------------+-------+
| 6aa76044-d456-4c3b-8f28-fcfc7e79b658 | compute01           | QEMU            | 192.168.200.202 | up    |
| 7365f5eb-62e1-477e-bf45-8f77ea98802a | compute02           | QEMU            | 192.168.200.203 | up    |
+--------------------------------------+---------------------+-----------------+-----------------+-------+

Migrate OpenStack Instances to Other Compute Nodes

Once you have the information about compute nodes, you can now proceed to migrate your instances.

As already mentioned, depending on the criticality of the operations/services handled by an instance, you can choose to go the cold or the live migration way.

OpenStack instances can be migrated using the command, openstack server migrate.

openstack server migrate --help
usage: openstack server migrate [-h] [--live-migration] [--host <hostname>] [--shared-migration | --block-migration] [--disk-overcommit | --no-disk-overcommit]
                                [--wait]
                                <server>

Migrate server to different host. A migrate operation is implemented as a resize operation using the same flavor as the old server. This means that, like resize, migrate
works by creating a new server using the same flavor and copying the contents of the original disk into a new one. As with resize, the migrate operation is a two-step
process for the user: the first step is to perform the migrate, and the second step is to either confirm (verify) success and release the old server, or to declare a
revert to release the new server and restart the old one.

positional arguments:
  <server>              Server (name or ID)

options:
  -h, --help            show this help message and exit
  --live-migration      Live migrate the server; use the ``--host`` option to specify a target host for the migration which will be validated by the scheduler
  --host <hostname>
                        Migrate the server to the specified host. (supported with --os-compute-api-version 2.30 or above when used with the --live-migration option)
                        (supported with --os-compute-api-version 2.56 or above when used without the --live-migration option)
  --shared-migration    Perform a shared live migration (default before --os-compute-api-version 2.25, auto after)
  --block-migration     Perform a block live migration (auto-configured from --os-compute-api-version 2.25)
  --disk-overcommit     Allow disk over-commit on the destination host(supported with --os-compute-api-version 2.24 or below)
  --no-disk-overcommit  Do not over-commit disk on the destination host (default)(supported with --os-compute-api-version 2.24 or below)
  --wait                Wait for migrate to complete

So, let’s live migrate my instance, gracious_turing, with the UUID 9eaa3419-47cf-40bd-a981-92517c81e2c7.

Note that the instance boots from an image and there is no shared storage, hence we will do a block-based live migration;

openstack server migrate --live-migration --block-migration gracious_turing --wait

If you check on Horizon, under Instances, you will see the instance status as Migrating.
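
When a node hosts many instances, the same command can be run in a loop. The sketch below is only an illustration under our setup (image-backed instances, no shared storage, hence --block-migration); the function name live_migrate_all is our own. Instances that are not running cannot be live migrated, so a failure is reported and the loop moves on.

```shell
# Live migrate every instance found on the given compute host, one at a time.
# live_migrate_all is a hypothetical helper name used for this sketch.
live_migrate_all() {
    local host="$1"
    local id

    openstack server list --host "$host" --all-projects -f value -c ID |
    while read -r id; do
        echo "Live migrating $id"
        # </dev/null keeps the command from consuming the ID list on stdin
        openstack server migrate --live-migration --block-migration --wait "$id" \
            < /dev/null || echo "Migration of $id failed" >&2
    done
}

# Example: live_migrate_all compute02
```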

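If you want to do a cold migration instead, run openstack server migrate without the --live-migration option. The instance is shut down while it is moved, and, as the help text above notes, you then confirm or revert the migration.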
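
A cold migration could look roughly like the sketch below. This is an assumption-laden example: the cold_migrate function name is ours, and it relies on the server resize confirm subcommand found in recent python-openstackclient releases (check openstack server resize confirm --help on your version). If your deployment confirms migrations automatically, the confirm step may be unnecessary or may return an error.

```shell
# Cold migrate a single instance and confirm the migration afterwards.
# cold_migrate is a hypothetical helper name used for this sketch.
cold_migrate() {
    local server="$1"

    # Without --live-migration, the instance is stopped and moved by the scheduler
    openstack server migrate --wait "$server" || return 1

    # Confirm the migration so the copy on the old host is released
    openstack server resize confirm "$server"
}

# Example: cold_migrate gracious_turing
```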

Verify Instance Migration

After a short while, the instance migration should be complete. Since I have only two compute nodes, the instance should have been migrated to compute01.

You can also check instances from the command line;

openstack server list --all-projects --long
+--------------------------------------+-------------------+--------+------------+-------------+-------------------------+------------+--------------------------------------+---------+-------------------+-----------+------------+-------------+
| ID                                   | Name              | Status | Task State | Power State | Networks                | Image Name | Image ID                             | Flavor  | Availability Zone | Host      | Properties | Host Status |
+--------------------------------------+-------------------+--------+------------+-------------+-------------------------+------------+--------------------------------------+---------+-------------------+-----------+------------+-------------+
| 9eaa3419-47cf-40bd-a981-92517c81e2c7 | gracious_turing   | ACTIVE | None       | Running     | DEMO_NET=192.168.50.128 | cirros     | 25dead1a-874c-4f19-b0b5-8ea739a15796 | custom1 | nova              | compute01 |            | UP          |
| 6ea369b3-27f1-44d2-93aa-6f6e94533e6d | peaceful_hamilton | ACTIVE | None       | Running     | DEMO_NET=192.168.50.113 | cirros     | 25dead1a-874c-4f19-b0b5-8ea739a15796 | custom1 | nova              | compute01 |            | UP          |
| c4f95fa1-d5ed-4765-8305-04b2c559dd83 | vibrant_torvalds  | ACTIVE | None       | Running     | DEMO_NET=192.168.50.150 | cirros     | 25dead1a-874c-4f19-b0b5-8ea739a15796 | custom1 | nova              | compute01 |            | UP          |
+--------------------------------------+-------------------+--------+------------+-------------+-------------------------+------------+--------------------------------------+---------+-------------------+-----------+------------+-------------+

As you can see, all instances are running on compute01 node now.
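
Before moving on, it is worth confirming that nothing is left on the node being removed. A minimal sketch of such a check, assuming a helper name of our own (host_is_empty):

```shell
# Succeed only if no instances remain on the given compute host.
# host_is_empty is a hypothetical helper name used for this sketch.
host_is_empty() {
    local host="$1"
    local remaining

    remaining=$(openstack server list --host "$host" --all-projects -f value -c ID) || return 2

    if [ -n "$remaining" ]; then
        echo "Instances still on $host:" >&2
        echo "$remaining" >&2
        return 1
    fi
    echo "No instances left on $host"
}

# Example: host_is_empty compute02 && echo "safe to continue"
```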

Migrate Volumes (If Applicable)

If the instances had volumes attached and those volumes are stored on the node being removed (for example, the node also provides a Cinder storage backend), then you need to migrate the volumes as well.

Use the openstack volume migrate command to move a volume from one storage host to another.

openstack volume migrate --help
usage: openstack volume migrate [-h] --host <host> [--force-host-copy] [--lock-volume] <volume>

Migrate volume to a new host

positional arguments:
  <volume>              Volume to migrate (name or ID)

options:
  -h, --help            show this help message and exit
  --host <host>
                        Destination host (takes the form: host@backend-name#pool)
  --force-host-copy     Enable generic host-based force-migration, which bypasses driver optimizations
  --lock-volume         If specified, the volume state will be locked and will not allow a migration to be aborted (possibly by another operation)
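
As an illustration only, a volume migration might be wrapped as below. The function name migrate_volume, the volume name and the destination string in the example comment are hypothetical; the destination must follow the host@backend-name#pool form from the help text. The os-vol-host-attr:host field is normally visible only to admin users.

```shell
# Show where a volume currently lives, then ask Cinder to migrate it.
# migrate_volume is a hypothetical helper name used for this sketch.
migrate_volume() {
    local volume="$1"
    local dest="$2"
    local current

    # os-vol-host-attr:host holds the current host@backend#pool of the volume
    current=$(openstack volume show "$volume" -f value -c os-vol-host-attr:host) || return 1
    echo "Current host of $volume: $current"

    openstack volume migrate --host "$dest" "$volume"
}

# Example (hypothetical names): migrate_volume my-volume 'compute01@lvm-1#lvm-1'
```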

Stop All OpenStack Services Running on the Compute Node

Once the instances on the compute node are migrated, you can log in to the compute node and stop all OpenStack services.

If you are using Ansible, then you can use it to check and stop the services on the compute node.

For example, let’s verify, from the controller/Ansible node, all OpenStack services running on our compute02;

ansible -i multinode -m raw -a "docker ps" compute02
compute02 | CHANGED | rc=0 >>
CONTAINER ID   IMAGE                                                                     COMMAND                  CREATED      STATUS                PORTS     NAMES
efe871ef9fbf   quay.io/openstack.kolla/zun-cni-daemon:2023.1-ubuntu-jammy                "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             zun_cni_daemon
f6155141547b   quay.io/openstack.kolla/zun-compute:2023.1-ubuntu-jammy                   "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             zun_compute
143e53a3b9de   quay.io/openstack.kolla/ceilometer-compute:2023.1-ubuntu-jammy            "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             ceilometer_compute
da3bb6f8f71b   quay.io/openstack.kolla/kuryr-libnetwork:2023.1-ubuntu-jammy              "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             kuryr
7fa1016b0acf   quay.io/openstack.kolla/neutron-openvswitch-agent:2023.1-ubuntu-jammy     "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             neutron_openvswitch_agent
98016d47c4d6   quay.io/openstack.kolla/openvswitch-vswitchd:2023.1-ubuntu-jammy          "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             openvswitch_vswitchd
2676319cfbdc   quay.io/openstack.kolla/openvswitch-db-server:2023.1-ubuntu-jammy         "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             openvswitch_db
8b750f8dc593   quay.io/openstack.kolla/nova-compute:2023.1-ubuntu-jammy                  "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             nova_compute
84397013842c   quay.io/openstack.kolla/nova-libvirt:2023.1-ubuntu-jammy                  "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             nova_libvirt
3768d9da5ab7   quay.io/openstack.kolla/nova-ssh:2023.1-ubuntu-jammy                      "dumb-init --single-…"   3 days ago   Up 3 days (healthy)             nova_ssh
ec5a5dd65cb4   quay.io/openstack.kolla/iscsid:2023.1-ubuntu-jammy                        "dumb-init --single-…"   3 days ago   Up 3 days                       iscsid
f4185c0884ae   quay.io/openstack.kolla/prometheus-libvirt-exporter:2023.1-ubuntu-jammy   "dumb-init --single-…"   3 days ago   Up 3 days                       prometheus_libvirt_exporter
d9942be630fa   quay.io/openstack.kolla/prometheus-cadvisor:2023.1-ubuntu-jammy           "dumb-init --single-…"   3 days ago   Up 3 days                       prometheus_cadvisor
04fec61c5671   quay.io/openstack.kolla/prometheus-node-exporter:2023.1-ubuntu-jammy      "dumb-init --single-…"   3 days ago   Up 3 days                       prometheus_node_exporter
221098bf97e7   quay.io/openstack.kolla/cron:2023.1-ubuntu-jammy                          "dumb-init --single-…"   3 days ago   Up 3 days                       cron
36fc2702d398   quay.io/openstack.kolla/kolla-toolbox:2023.1-ubuntu-jammy                 "dumb-init --single-…"   3 days ago   Up 3 days                       kolla_toolbox
80f42d83c6f7   quay.io/openstack.kolla/fluentd:2023.1-ubuntu-jammy                       "dumb-init --single-…"   3 days ago   Up 3 days                       fluentd

Since we deployed our OpenStack using Kolla-Ansible, the easiest way to stop these containers is with the kolla-ansible stop command;

kolla-ansible -i <inventory> stop --yes-i-really-really-mean-it [ --limit <limit> ]

So, to stop all the OpenStack services on compute02;

source $HOME/kolla-ansible/bin/activate
source /etc/kolla/admin-openrc.sh
kolla-ansible -i multinode stop --yes-i-really-really-mean-it  --limit compute02

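If you are not using a configuration management tool such as Ansible, be sure to stop nova-compute and the Neutron L2 agent on the node (neutron-openvswitch-agent in this deployment, or neutron-linuxbridge-agent if you use Linux bridge) when you stop the services.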

Remove OpenStack Compute Node Compute Service

Next, remove the compute service of the compute node from the database.

You can execute these commands from the control node.

List the compute services;

openstack compute service list
+--------------------------------------+----------------+--------------+----------+----------+-------+----------------------------+
| ID                                   | Binary         | Host         | Zone     | Status   | State | Updated At                 |
+--------------------------------------+----------------+--------------+----------+----------+-------+----------------------------+
| 67db62aa-58a2-4e66-9a8b-bb1c85bd23e2 | nova-scheduler | controller01 | internal | enabled  | up    | 2023-11-09T18:07:18.000000 |
| b9520af1-490d-43b7-98ba-a55b0349b38c | nova-conductor | controller01 | internal | enabled  | up    | 2023-11-09T18:07:18.000000 |
| 5fdae690-ddbf-4dc3-a41e-61866858054b | nova-compute   | compute01    | nova     | enabled  | up    | 2023-11-09T18:07:17.000000 |
| 464698d3-0da5-44cb-ba91-7d6782b2cff9 | nova-compute   | compute02    | nova     | disabled | down  | 2023-11-09T18:04:07.000000 |
+--------------------------------------+----------------+--------------+----------+----------+-------+----------------------------+

So, we want to remove compute service on compute02. Hence, obtain the ID of the compute service on the respective node to be removed and proceed to remove the compute service from the node;

openstack compute service delete 464698d3-0da5-44cb-ba91-7d6782b2cff9
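
If you prefer not to copy the ID by hand, the lookup and the delete can be combined. A minimal sketch, assuming a helper name of our own (delete_compute_service) and the --host and --service filters of openstack compute service list:

```shell
# Look up the nova-compute service ID for a host and delete that service record.
# delete_compute_service is a hypothetical helper name used for this sketch.
delete_compute_service() {
    local host="$1"
    local id

    id=$(openstack compute service list --host "$host" --service nova-compute \
        -f value -c ID) || return 1

    if [ -z "$id" ]; then
        echo "No nova-compute service found for $host" >&2
        return 1
    fi

    echo "Deleting nova-compute service $id on $host"
    openstack compute service delete "$id"
}

# Example: delete_compute_service compute02
```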

Remove OpenStack Compute Node Neutron Agents

Next, remove the Neutron agents on the compute node.

You can list the agents as follows;

openstack network agent list --host <compute-node>

For example;

openstack network agent list --host compute02
+--------------------------------------+--------------------+-----------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type         | Host      | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+--------------------+-----------+-------------------+-------+-------+---------------------------+
| 313cd889-08d0-423f-befa-0254bd3bdefc | Open vSwitch agent | compute02 | None              | XXX   | UP    | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-----------+-------------------+-------+-------+---------------------------+

Delete the Agent (openstack network agent delete <agent_id>);

openstack network agent delete 313cd889-08d0-423f-befa-0254bd3bdefc
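
A node can carry more than one Neutron agent depending on the deployment, so a loop can be handy. The sketch below assumes a helper name of our own (delete_host_agents) and deletes every agent that the list command reports for the host:

```shell
# Delete every Neutron agent registered for the given host.
# delete_host_agents is a hypothetical helper name used for this sketch.
delete_host_agents() {
    local host="$1"
    local id

    openstack network agent list --host "$host" -f value -c ID |
    while read -r id; do
        echo "Deleting agent $id"
        # </dev/null keeps the command from consuming the ID list on stdin
        openstack network agent delete "$id" < /dev/null
    done
}

# Example: delete_host_agents compute02
```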

Remove the Host from the Ansible Inventory

If you are using Kolla-Ansible, it is now time to delete the compute node from the inventory.
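
After editing the inventory, a quick search can confirm that no line still references the removed host. This is a minimal sketch; host_in_inventory is our own helper name, and multinode is the inventory file used earlier in this guide.

```shell
# Print inventory lines that still mention the host (with line numbers).
# Returns non-zero when the host no longer appears in the file.
# host_in_inventory is a hypothetical helper name used for this sketch.
host_in_inventory() {
    local host="$1"
    local inventory="$2"

    # -w matches the name as a whole word only; -n prefixes line numbers
    grep -nw -- "$host" "$inventory"
}

# Example: host_in_inventory compute02 multinode || echo "compute02 is gone"
```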

And that completes our guide on how to safely remove a compute node from an OpenStack deployment.

Re-add a Compute Node into OpenStack

If you want to add a new compute node into OpenStack, check our guide below;

Add Compute Nodes into OpenStack using Kolla-Ansible

Kifarunix