Is it possible to add controller nodes to an existing OpenStack cluster using Kolla-Ansible? Yes, it is. In this blog post, you will learn how to add controller nodes to an existing OpenStack cluster to spread the control-plane load and provide high availability. Kolla-Ansible is a tool that streamlines the deployment and management of OpenStack services using Docker containers. If you deployed your OpenStack cloud using Kolla-Ansible, then this guide is for you!
Use Kolla-Ansible to Add Controller Nodes into Existing OpenStack Cluster
What is a Controller Node and Why is it Important?
In an OpenStack environment, a controller node plays a crucial role in managing and coordinating various services that make up the cloud infrastructure. It acts as the central hub, overseeing functions like authentication, API requests, and overall orchestration of resources.
Here’s why a controller node is important:
- Centralized Management: The controller node serves as a centralized point for managing core OpenStack services like Nova (compute), Neutron (networking), Keystone (identity), Glance (image), and others. This centralization simplifies administration and control.
- Orchestration: It coordinates actions among different services to ensure they work together seamlessly. For example, when you launch a new virtual machine, the controller node manages the process, instructing the compute nodes to carry out the request.
- High Availability: Deploying multiple controller nodes allows for high availability configurations. If one controller node fails, others can take over, ensuring that essential services remain operational, minimizing downtime.
- Scalability: As your OpenStack cloud grows, adding more controller nodes becomes important for distributing the management load. This scalability ensures efficient handling of increased workloads and resources.
- Security and Identity Management: The controller node is responsible for user authentication and authorization through Keystone. This centralized identity management enhances security by ensuring consistent access controls across the OpenStack environment.
What is the Recommended Number of Controller Nodes in OpenStack?
The recommended number of controller nodes for an OpenStack deployment depends on various factors, including the size of the cloud environment, the anticipated workload, and the desired level of high availability and redundancy. While there isn’t a one-size-fits-all answer, here are a few key points to note:
- Minimum Configuration: In a small or development environment, a single controller node might be sufficient for running all the required OpenStack services. This, however, lacks redundancy and high availability.
- High Availability (HA) Configuration: For production environments where high availability is crucial, a minimum of three controller nodes is often recommended. This allows for redundancy, ensuring that if one controller node fails, the others can continue to operate. In a Kolla-Ansible deployment, this is typically achieved with HAProxy, which load-balances the API services, together with Keepalived, which manages the virtual IP address.
- Scalability: As your OpenStack environment grows, you can scale the number of controller nodes to handle increased load and provide better performance. Adding more controller nodes allows for a distributed and load-balanced architecture.
- Separation of Services: In larger deployments, it’s common to separate services onto different controller nodes based on their functions. For example, Keystone (identity service) and Horizon (dashboard) might be on one controller node, while Nova (compute) and Neutron (networking) are on another. This separation can help manage resources more efficiently.
- Resource Considerations: Ensure that each controller node has sufficient resources (CPU, RAM, and disk space) to handle the services running on it. The specific resource requirements depend on the size of your deployment and the services you’re running.
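The case for three controllers can be made concrete with a little arithmetic. Clustered services that run on the controllers, such as the MariaDB Galera cluster, need a majority of their members to stay up. The sketch below is illustrative only; the function names quorum and tolerated_failures are made up for this example and are not part of Kolla-Ansible.

```shell
# Majority needed for a cluster of N members: floor(N/2) + 1.
quorum() {
    local n=$1
    echo $(( n / 2 + 1 ))
}

# Number of members that can fail while a majority still remains.
tolerated_failures() {
    local n=$1
    echo $(( n - (n / 2 + 1) ))
}

for n in 1 2 3 5; do
    echo "controllers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two controllers tolerate no more failures than one does, which is why three is the usual minimum for an HA control plane.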
Ideal Timing for Adding Controller Nodes in a Production Environment:
So, what is the ideal time to expand your OpenStack infrastructure?
- Low Traffic Window: Plan the addition of new controller nodes during a low-traffic window or a maintenance window to minimize the impact on users.
- Off-Peak Hours: If possible, schedule the addition during off-peak hours when user activity is minimal. This helps reduce the impact on ongoing operations.
- Backup and Snapshot: Before making any changes, take a backup or snapshot of critical components, including databases and configurations. This ensures a quick recovery in case of unexpected issues.
- Communication: Notify users and stakeholders in advance about the planned maintenance window, highlighting potential disruptions and assuring them of the temporary nature of the changes.
- Monitoring and Testing: Implement robust monitoring tools to keep a close eye on the existing infrastructure during the addition of new controller nodes. Test the changes in a staging environment before applying them in production.
- Rolling Upgrade: If your OpenStack deployment method supports it, consider a rolling upgrade approach. This involves adding new nodes, ensuring they work seamlessly, and then gradually migrating services to the new nodes without disrupting the entire environment.
- Verify High Availability: If your OpenStack deployment is designed for high availability, make sure that the addition of new controller nodes aligns with the HA configuration. Verify that services can fail over and operate as expected.
- Rollback Plan: Have a well-defined rollback plan in case issues arise during the addition of controller nodes. This includes reverting to the previous state and ensuring minimal impact on users.
- Post-Deployment Verification: After the new controller nodes are added, perform thorough testing to ensure that OpenStack services are functioning correctly. Monitor the environment to identify and address any issues promptly.
- Documentation: Update documentation to reflect the changes made, including details about the new controller nodes. This helps maintain a clear record for future reference.
So, how can you add controller nodes to an existing OpenStack cluster using Kolla-Ansible?
Prepare the Nodes for Addition into OpenStack
When you use Kolla-Ansible for deployment, it takes care of most of the prerequisites.
You do, however, need to perform a fresh OS installation, assign the initial IP addresses, set the hostname and create the first user accounts on each node (you can automate this if you want).
Also, I would recommend that you use the same OS version across the cluster for uniformity and easier management. We are running Ubuntu 22.04 LTS.
Based on our basic deployment architecture;
------------------+---------------------------------------------+--------------------------------+----------------------------------+
| | | |
+-----------------+-------------------------+ +-------------+-------------+ +------------+--------------+ +-------------+-------------+
| [ Controller Node ] | | [ Compute01 Node ] | | [ Storage01 Node ] | | [ Compute02 Node ] |
| | | | | | | |
| br0: VIP and Mgt IP | | enp1s0: 192.168.200.202 | | enp1s0: 192.168.200.201 | | enp1s0: 192.168.200.203 |
| VIP: 192.168.200.254 | | enp2s0: 10.100.0.110/24 | | | | enp2s0: 10.100.0.111/24 |
| Mgt IP: 192.168.200.200 | +---------------------------+ +---------------------------+ +---------------------------+
| br-ex: Provider Network |
| 10.100.0.100/24 |
+-------------------------------------------+
In this guide, we have added two controller nodes, assigned their IP addresses and created the required user account with the required sudo rights. Our basic architecture now looks like this;
------------------+---------------------------------------------+--------------------------------+----------------------------------+
| | | |
+-----------------+-------------------------+ +-------------+-------------+ +------------+--------------+ +-------------+-------------+
| [ Controller 01 ] | | [ Compute01 Node ] | | [ Storage01 Node ] | | [ Compute02 Node ] |
| | | | | | | |
| br0: VIP and Mgt IP | | enp1s0: 192.168.200.202 | | enp1s0: 192.168.200.201 | | enp1s0: 192.168.200.203 |
| VIP: 192.168.200.254 | | enp2s0: 10.100.0.110/24 | | | | enp2s0: 10.100.0.111/24 |
| Mgt IP: 192.168.200.200 | +---------------------------+ +---------------------------+ +---------------------------+
| br-ex: Provider Network |
| 10.100.0.100/24 |
+-----------------+-------------------------+
|
+-----------------+-------------------------+
| [ Controller 02 ] |
| |
| br0: Mgt IP: 192.168.200.204 |
| br-ex: Provider Network |
| 10.100.0.101/24 |
+-----------------+-------------------------+
|
+-----------------+-------------------------+
| [ Controller 03 ] |
| |
| br0: Mgt IP: 192.168.200.205 |
| br-ex: Provider Network |
| 10.100.0.102/24 |
+-----------------+-------------------------+
Check our controller nodes network configuration.
Are you using Shared Storage for Glance Images?
Chances are you are using shared storage for your Glance images. If that is the case in your environment, you need to ensure that all controller nodes have access to the shared storage.
For example, in our demo environment, we are using an NFS share for storing Glance images;
(kolla-ansible) kifarunix@controller01:~$ docker inspect glance_api
{
"Type": "bind",
"Source": "/mnt/glance",
"Destination": "/var/lib/glance",
"Mode": "rw",
"RW": true,
"Propagation": "rprivate"
}
],
df -hT -P /mnt/glance
Filesystem Type Size Used Avail Use% Mounted on
192.168.200.201:/mnt/glance nfs4 100G 747M 100G 1% /mnt/glance
Hence, configure the new controller nodes to access the share as well by updating their /etc/fstab files accordingly.
sudo vim /etc/fstab
Add the line to mount the appropriate NFS share;
192.168.200.201:/mnt/glance /mnt/glance nfs _netdev,defaults 0 0
Where:
- 192.168.200.201:/mnt/glance: This part indicates the NFS server’s IP address (192.168.200.201) and the exported path (/mnt/glance) on the NFS server.
- /mnt/glance: This is the local mount point on the current machine where the NFS share will be mounted.
- nfs: This specifies the file system type to be mounted, in this case, NFS (Network File System).
- _netdev: This option indicates that the filesystem is a network device and should be mounted after the network has been enabled. This is typically used for network file systems like NFS.
- defaults: This includes a set of default mount options. The specific default options depend on the operating system, but they typically include options for read/write access, user/group permissions, etc.
- 0: The dump parameter. It is used by the dump utility to determine whether the file system needs to be backed up.
- 0: The pass parameter. It is used by the fsck utility to determine the order in which file systems should be checked.
Save and exit the file.
Ensure that the NFS client packages are installed on your system.
On Debian/Ubuntu, you might need to install the nfs-common package:
sudo apt update
sudo apt install nfs-common
On Red Hat-based systems, you might need to install the nfs-utils package:
sudo yum install nfs-utils
Then mount the share;
sudo mount -a
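Before relying on the share, it can help to confirm that the entry actually landed in the file and that the share is mounted. This is a minimal sketch, assuming the same mount point as above; fstab_has_mount is a hypothetical helper name, and the file path is a parameter so the check can also be tried against a copy of the file.

```shell
# Succeeds if FILE has a non-comment entry whose mount point (2nd field)
# is MOUNTPOINT and whose filesystem type (3rd field) starts with "nfs".
fstab_has_mount() {
    local file=$1 mountpoint=$2
    awk -v mp="$mountpoint" '
        /^[ \t]*#/ { next }                  # skip comment lines
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example (run on each new controller):
#   fstab_has_mount /etc/fstab /mnt/glance && echo "fstab entry present"
#   mountpoint -q /mnt/glance && echo "share is mounted"
```

The mountpoint command in the example comes from util-linux and only reports whether something is mounted at that path, not whether it is the expected NFS export; the df command shown earlier covers that.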
Copy Deployment User SSH Keys from Control Node to New Controller Node
Your control node is the node where you run Kolla-Ansible. In our setup, we run Kolla-Ansible on our controller01 node.
As for the deployment user, if you are not using SSH keys, you need to define the username and password for each new controller node in the multinode inventory file (for example, with the ansible_user and ansible_password inventory variables) so that Kolla-Ansible knows how to log in to that node.
In this guide, we are using SSH keys, which we already generated while creating the Kolla-Ansible deployment user account.
Hence, let’s just copy the SSH keys to the new controller nodes.
First of all, let’s ensure the new controller nodes are reachable via their hostnames from the control node.
sudo tee -a /etc/hosts << EOL
192.168.200.204 controller02
192.168.200.205 controller03
EOL
Let’s confirm reachability;
ping controller02 -c 4
PING controller02 (192.168.200.204) 56(84) bytes of data.
64 bytes from controller02 (192.168.200.204): icmp_seq=1 ttl=64 time=0.260 ms
64 bytes from controller02 (192.168.200.204): icmp_seq=2 ttl=64 time=0.286 ms
64 bytes from controller02 (192.168.200.204): icmp_seq=3 ttl=64 time=0.334 ms
64 bytes from controller02 (192.168.200.204): icmp_seq=4 ttl=64 time=0.312 ms
--- controller02 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3056ms
rtt min/avg/max/mdev = 0.260/0.298/0.334/0.027 ms
ping controller03 -c 4
PING controller03 (192.168.200.205) 56(84) bytes of data.
64 bytes from controller03 (192.168.200.205): icmp_seq=1 ttl=64 time=0.452 ms
64 bytes from controller03 (192.168.200.205): icmp_seq=2 ttl=64 time=0.397 ms
64 bytes from controller03 (192.168.200.205): icmp_seq=3 ttl=64 time=0.398 ms
64 bytes from controller03 (192.168.200.205): icmp_seq=4 ttl=64 time=0.487 ms
--- controller03 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3076ms
rtt min/avg/max/mdev = 0.397/0.433/0.487/0.038 ms
Next, copy the keys;
for i in 02 03; do ssh-copy-id kifarunix@controller$i; done
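Once the keys are copied, a quick non-interactive login confirms that key-based authentication works before Ansible depends on it. The sketch below is an illustration, not part of Kolla-Ansible; check_key_login is a hypothetical helper name, and the user and hostnames in the example are the ones used in this guide.

```shell
# Try a non-interactive login to each host. BatchMode=yes makes ssh fail
# instead of prompting for a password, so success means key auth works.
check_key_login() {
    local user=$1 host rc=0
    shift
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true; then
            echo "$host: key-based login OK"
        else
            echo "$host: key-based login FAILED"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   check_key_login kifarunix controller02 controller03
```

The function returns a non-zero status if any host fails, so it can gate later steps in a script.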
Update Kolla-Ansible Inventory
Next, you need to add the new controller nodes to the inventory file. Since we are running a multinode deployment, open the multinode inventory and add your new controller nodes.
This is what our multinode inventory looks like before adding the new controller nodes;
cat multinode
# These initial groups are the only groups required to be modified. The
# additional groups are for more control of the environment.
[control]
controller01 ansible_connection=local neutron_external_interface=vethext
# The above can also be specified as follows:
#control[01:03] ansible_user=kolla
# The network nodes are where your l3-agent and loadbalancers will run
# This can be the same as a host in the control group
[network]
controller01 ansible_connection=local neutron_external_interface=vethext network_interface=br0
[compute]
compute01 neutron_external_interface=enp2s0 network_interface=enp1s0
compute02 neutron_external_interface=enp2s0 network_interface=enp1s0
[monitoring]
controller01 ansible_connection=local neutron_external_interface=vethext
# When compute nodes and control nodes use different interfaces,
# you need to comment out "api_interface" and other interfaces from the globals.yml
# and specify like below:
#compute01 neutron_external_interface=eth0 api_interface=em1 tunnel_interface=em1
[storage]
storage01 neutron_external_interface=enp10s0 network_interface=enp1s0
[deployment]
localhost ansible_connection=local
[baremetal:children]
control
network
compute
storage
monitoring
[tls-backend:children]
control
# You can explicitly specify which hosts run each project by updating the
# groups in the sections below. Common services are grouped together.
[common:children]
control
network
compute
storage
monitoring
[collectd:children]
compute
[grafana:children]
monitoring
[etcd:children]
control
[influxdb:children]
monitoring
[prometheus:children]
monitoring
[kafka:children]
control
[telegraf:children]
compute
control
monitoring
network
storage
[hacluster:children]
control
[hacluster-remote:children]
compute
[loadbalancer:children]
network
[mariadb:children]
control
[rabbitmq:children]
control
[outward-rabbitmq:children]
control
[monasca-agent:children]
compute
control
monitoring
network
storage
[monasca:children]
monitoring
[storm:children]
monitoring
[keystone:children]
control
[glance:children]
control
[nova:children]
control
[neutron:children]
network
[openvswitch:children]
network
compute
manila-share
[cinder:children]
control
[cloudkitty:children]
control
[freezer:children]
control
[memcached:children]
control
[horizon:children]
control
[swift:children]
control
[barbican:children]
control
[heat:children]
control
[murano:children]
control
[solum:children]
control
[ironic:children]
control
[magnum:children]
control
[sahara:children]
control
[mistral:children]
control
[manila:children]
control
[ceilometer:children]
control
[aodh:children]
control
[cyborg:children]
control
compute
[gnocchi:children]
control
[tacker:children]
control
[trove:children]
control
[senlin:children]
control
[vitrage:children]
control
[watcher:children]
control
[octavia:children]
control
[designate:children]
control
[placement:children]
control
[bifrost:children]
deployment
[zookeeper:children]
control
[zun:children]
control
[skyline:children]
control
[redis:children]
control
[blazar:children]
control
[venus:children]
monitoring
# Additional control implemented here. These groups allow you to control which
# services run on which hosts at a per-service level.
#
# Word of caution: Some services are required to run on the same host to
# function appropriately. For example, neutron-metadata-agent must run on the
# same host as the l3-agent and (depending on configuration) the dhcp-agent.
# Common
[cron:children]
common
[fluentd:children]
common
[kolla-logs:children]
common
[kolla-toolbox:children]
common
[opensearch:children]
control
# Opensearch dashboards
[opensearch-dashboards:children]
opensearch
# Glance
[glance-api:children]
glance
# Nova
[nova-api:children]
nova
[nova-conductor:children]
nova
[nova-super-conductor:children]
nova
[nova-novncproxy:children]
nova
[nova-scheduler:children]
nova
[nova-spicehtml5proxy:children]
nova
[nova-compute-ironic:children]
nova
[nova-serialproxy:children]
nova
# Neutron
[neutron-server:children]
control
[neutron-dhcp-agent:children]
neutron
[neutron-l3-agent:children]
neutron
[neutron-metadata-agent:children]
neutron
[neutron-ovn-metadata-agent:children]
compute
network
[neutron-bgp-dragent:children]
neutron
[neutron-infoblox-ipam-agent:children]
neutron
[neutron-metering-agent:children]
neutron
[ironic-neutron-agent:children]
neutron
[neutron-ovn-agent:children]
compute
network
# Cinder
[cinder-api:children]
cinder
[cinder-backup:children]
storage
[cinder-scheduler:children]
cinder
[cinder-volume:children]
storage
# Cloudkitty
[cloudkitty-api:children]
cloudkitty
[cloudkitty-processor:children]
cloudkitty
# Freezer
[freezer-api:children]
freezer
[freezer-scheduler:children]
freezer
# iSCSI
[iscsid:children]
compute
storage
ironic
[tgtd:children]
storage
# Manila
[manila-api:children]
manila
[manila-scheduler:children]
manila
[manila-share:children]
network
[manila-data:children]
manila
# Swift
[swift-proxy-server:children]
swift
[swift-account-server:children]
storage
[swift-container-server:children]
storage
[swift-object-server:children]
storage
# Barbican
[barbican-api:children]
barbican
[barbican-keystone-listener:children]
barbican
[barbican-worker:children]
barbican
# Heat
[heat-api:children]
heat
[heat-api-cfn:children]
heat
[heat-engine:children]
heat
# Murano
[murano-api:children]
murano
[murano-engine:children]
murano
# Monasca
[monasca-agent-collector:children]
monasca-agent
[monasca-agent-forwarder:children]
monasca-agent
[monasca-agent-statsd:children]
monasca-agent
[monasca-api:children]
monasca
[monasca-log-persister:children]
monasca
[monasca-log-metrics:children]
monasca
[monasca-thresh:children]
monasca
[monasca-notification:children]
monasca
[monasca-persister:children]
monasca
# Storm
[storm-worker:children]
storm
[storm-nimbus:children]
storm
# Ironic
[ironic-api:children]
ironic
[ironic-conductor:children]
ironic
[ironic-inspector:children]
ironic
[ironic-tftp:children]
ironic
[ironic-http:children]
ironic
# Magnum
[magnum-api:children]
magnum
[magnum-conductor:children]
magnum
# Sahara
[sahara-api:children]
sahara
[sahara-engine:children]
sahara
# Solum
[solum-api:children]
solum
[solum-worker:children]
solum
[solum-deployer:children]
solum
[solum-conductor:children]
solum
[solum-application-deployment:children]
solum
[solum-image-builder:children]
solum
# Mistral
[mistral-api:children]
mistral
[mistral-executor:children]
mistral
[mistral-engine:children]
mistral
[mistral-event-engine:children]
mistral
# Ceilometer
[ceilometer-central:children]
ceilometer
[ceilometer-notification:children]
ceilometer
[ceilometer-compute:children]
compute
[ceilometer-ipmi:children]
compute
# Aodh
[aodh-api:children]
aodh
[aodh-evaluator:children]
aodh
[aodh-listener:children]
aodh
[aodh-notifier:children]
aodh
# Cyborg
[cyborg-api:children]
cyborg
[cyborg-agent:children]
compute
[cyborg-conductor:children]
cyborg
# Gnocchi
[gnocchi-api:children]
gnocchi
[gnocchi-statsd:children]
gnocchi
[gnocchi-metricd:children]
gnocchi
# Trove
[trove-api:children]
trove
[trove-conductor:children]
trove
[trove-taskmanager:children]
trove
# Multipathd
[multipathd:children]
compute
storage
# Watcher
[watcher-api:children]
watcher
[watcher-engine:children]
watcher
[watcher-applier:children]
watcher
# Senlin
[senlin-api:children]
senlin
[senlin-conductor:children]
senlin
[senlin-engine:children]
senlin
[senlin-health-manager:children]
senlin
# Octavia
[octavia-api:children]
octavia
[octavia-driver-agent:children]
octavia
[octavia-health-manager:children]
octavia
[octavia-housekeeping:children]
octavia
[octavia-worker:children]
octavia
# Designate
[designate-api:children]
designate
[designate-central:children]
designate
[designate-producer:children]
designate
[designate-mdns:children]
network
[designate-worker:children]
designate
[designate-sink:children]
designate
[designate-backend-bind9:children]
designate
# Placement
[placement-api:children]
placement
# Zun
[zun-api:children]
zun
[zun-wsproxy:children]
zun
[zun-compute:children]
compute
[zun-cni-daemon:children]
compute
# Skyline
[skyline-apiserver:children]
skyline
[skyline-console:children]
skyline
# Tacker
[tacker-server:children]
tacker
[tacker-conductor:children]
tacker
# Vitrage
[vitrage-api:children]
vitrage
[vitrage-notifier:children]
vitrage
[vitrage-graph:children]
vitrage
[vitrage-ml:children]
vitrage
[vitrage-persistor:children]
vitrage
# Blazar
[blazar-api:children]
blazar
[blazar-manager:children]
blazar
# Prometheus
[prometheus-node-exporter:children]
monitoring
control
compute
network
storage
[prometheus-mysqld-exporter:children]
mariadb
[prometheus-haproxy-exporter:children]
loadbalancer
[prometheus-memcached-exporter:children]
memcached
[prometheus-cadvisor:children]
monitoring
control
compute
network
storage
[prometheus-alertmanager:children]
monitoring
[prometheus-openstack-exporter:children]
monitoring
[prometheus-elasticsearch-exporter:children]
opensearch
[prometheus-blackbox-exporter:children]
monitoring
[prometheus-libvirt-exporter:children]
compute
[prometheus-msteams:children]
prometheus-alertmanager
[masakari-api:children]
control
[masakari-engine:children]
control
[masakari-hostmonitor:children]
control
[masakari-instancemonitor:children]
compute
[ovn-controller:children]
ovn-controller-compute
ovn-controller-network
[ovn-controller-compute:children]
compute
[ovn-controller-network:children]
network
[ovn-database:children]
control
[ovn-northd:children]
ovn-database
[ovn-nb-db:children]
ovn-database
[ovn-sb-db:children]
ovn-database
[venus-api:children]
venus
[venus-manager:children]
venus
So, we will update the [control], [network] and [monitoring] groups to add our new nodes such that the configuration looks like the one below;
vim multinode
See the controller[02:03] entries.
# These initial groups are the only groups required to be modified. The
# additional groups are for more control of the environment.
[control]
controller01 ansible_connection=local neutron_external_interface=vethext
controller[02:03] neutron_external_interface=vethext
# The network nodes are where your l3-agent and loadbalancers will run
# This can be the same as a host in the control group
[network]
controller01 ansible_connection=local neutron_external_interface=vethext network_interface=br0
controller[02:03] neutron_external_interface=vethext network_interface=br0
[compute]
compute01 neutron_external_interface=enp2s0 network_interface=enp1s0
compute02 neutron_external_interface=enp2s0 network_interface=enp1s0
[monitoring]
controller01 ansible_connection=local neutron_external_interface=vethext
controller[02:03] neutron_external_interface=vethext
# When compute nodes and control nodes use different interfaces,
# you need to comment out "api_interface" and other interfaces from the globals.yml
# and specify like below:
#compute01 neutron_external_interface=eth0 api_interface=em1 tunnel_interface=em1
[storage]
storage01 neutron_external_interface=enp10s0 network_interface=enp1s0
[deployment]
localhost ansible_connection=local
[baremetal:children]
control
network
compute
storage
monitoring
[tls-backend:children]
control
# You can explicitly specify which hosts run each project by updating the
# groups in the sections below. Common services are grouped together.
[common:children]
control
network
compute
storage
monitoring
...
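A typo in a host pattern is easy to miss in a file this long, so it is worth listing what each edited group actually contains. The helper below is a rough sketch for that purpose; inventory_group_entries is a hypothetical name, and it only reads the host lines written directly under a group header, without expanding ranges or resolving :children groups.

```shell
# Print the host entries (first field of each line) listed directly under
# [GROUP] in an INI-style inventory FILE. Host ranges such as
# controller[02:03] are printed as written, not expanded.
inventory_group_entries() {
    local file=$1 group=$2
    awk -v target="[$group]" '
        /^[ \t]*(#|$)/ { next }                      # skip comments and blank lines
        /^\[/ { in_group = ($1 == target); next }    # entering a new section
        in_group { print $1 }
    ' "$file"
}

# Example:
#   for g in control network monitoring; do
#       echo "== $g"; inventory_group_entries multinode "$g"
#   done
```

For the fully expanded view as Ansible sees it, ansible-inventory -i multinode --graph control prints the hosts that the control group resolves to.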
Enable HAProxy for High Availability and Load Balancing
The globals.yml file in Kolla-Ansible is a configuration file where you can set various global parameters for your OpenStack deployment. The enable_haproxy parameter in globals.yml specifies whether HAProxy should be enabled as part of the deployment.
Set the value of enable_haproxy to yes so that HAProxy provides load balancing and high availability for your OpenStack services across the controller nodes.
vim /etc/kolla/globals.yml
...
enable_haproxy: "yes"
enable_keepalived: "{{ enable_haproxy | bool }}"
...
Save and exit the file.
The above configuration basically ties the value of enable_keepalived to enable_haproxy: if HAProxy is enabled (enable_haproxy is “yes”), enable_keepalived will be True; otherwise, it will be False.
While Keepalived manages virtual IP addresses associated with the active controller node using VRRP (Virtual Router Redundancy Protocol), HAProxy load-balances traffic to service backends.
HAProxy regularly checks the health of each controller node by sending health-check requests. If a node becomes unreachable or fails these health checks, HAProxy stops directing traffic to that node.
Keepalived continuously monitors the health of the active controller node. If it detects a failure (for example, if the controller node becomes unreachable), Keepalived triggers a failover process and automatically transfers the VIP to one of the standby controller nodes that are still healthy based on their priority setting.
You can check the configuration of HAProxy in /etc/kolla/haproxy/haproxy.cfg and the configuration of Keepalived in /etc/kolla/keepalived/keepalived.conf.
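Since Keepalived places the VIP on one controller at a time, a simple way to see which node currently holds it is to look for the address in that node's interface list. The snippet below is a small sketch; has_address is a hypothetical helper name, and the VIP in the example is the one from this guide's architecture.

```shell
# Reads the output of `ip -o -4 addr show` on stdin and succeeds if the
# given IPv4 address is configured on any interface.
has_address() {
    local addr=$1
    grep -qF "inet $addr/"
}

# Example (run on each controller):
#   ip -o -4 addr show | has_address 192.168.200.254 \
#       && echo "this node currently holds the VIP"
```

Matching on "inet ADDRESS/" keeps a shorter address from matching as a prefix of a longer one.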
Activate Kolla-Ansible Virtual Environment
Activate your respective virtual environment;
source ~/kolla-ansible/bin/activate
Test Connectivity to the Node
Execute the Ansible command below to check the reachability of the new nodes in your inventory using the Ansible ping module.
ansible -i multinode -m ping controller02,controller03
Sample output;
controller03 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"ping": "pong"
}
controller02 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"ping": "pong"
}
Bootstrap the new Controller Nodes
You need to bootstrap the controller nodes with kolla deploy dependencies by running the command below.
kolla-ansible -i <inventory> bootstrap-servers [ --limit <limit> ]
Replace <inventory> with your inventory file. When adding controller nodes, set <limit> to the controller nodes group, in this case control, so that every controller is prepared consistently. Note that bootstrap-servers itself only prepares the hosts by installing the Kolla deployment dependencies. The Fernet keys that Keystone uses to encrypt authentication tokens must be consistent across all the Keystone hosts/controller nodes, and they are handled by the Keystone deployment tasks, so keep the whole control group in scope for the later deployment command as well.
Be cautious about re-bootstrapping a cloud that has already been bootstrapped. See some considerations for re-bootstrapping.
Thus, our command will look like;
kolla-ansible -i multinode bootstrap-servers --limit control
This command may restart Docker containers on other nodes. Hence, also consider the ideal timing for adding controller nodes to a production deployment, as outlined above.
If, at some point during the deployment, MariaDB gets stuck in the starting state after a restart;
docker ps | grep mariadb
fa28f34cbad9 quay.io/openstack.kolla/mariadb-server:2023.1-ubuntu-jammy "dumb-init -- kolla_…" 3 hours ago Up 24 seconds (health: starting) mariadb
And errors such as these appear in the logs;
tail -f /var/log/kolla/mariadb/mariadb.log
2023-11-12 8:19:18 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50534S), skipping check
2023-11-12 8:19:47 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2023-11-12 8:19:47 0 [Note] WSREP: view((empty))
2023-11-12 8:19:47 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at ./gcomm/src/pc.cpp:connect():160
2023-11-12 8:19:47 0 [ERROR] WSREP: ./gcs/src/gcs_core.cpp:gcs_core_open():221: Failed to open backend connection: -110 (Connection timed out)
2023-11-12 8:19:48 0 [ERROR] WSREP: ./gcs/src/gcs.cpp:gcs_open():1669: Failed to open channel 'openstack' at 'gcomm://192.168.200.200:4567,192.168.200.204:4567,192.168.200.205:4567': -110 (Connection timed out)
2023-11-12 8:19:48 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2023-11-12 8:19:48 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.200.200:4567,192.168.200.204:4567,192.168.200.205:4567) failed: 7
2023-11-12 8:19:48 0 [ERROR] Aborting
231112 08:19:48 mysqld_safe mysqld from pid file /var/lib/mysql/mariadb.pid ended
You can run the database recovery;
kolla-ansible -i multinode mariadb_recovery
Once that is completed, you can re-run the deployment command.
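After a recovery, it is reasonable to confirm that every database node has rejoined the Galera cluster before moving on. The sketch below parses the wsrep_cluster_size status value; galera_size_is is a hypothetical helper name, and the example assumes the container is named mariadb (as in the docker ps output above), that the mysql client is available inside it, and that DB_PASS holds your database root password.

```shell
# Reads tab-separated "name<TAB>value" status rows on stdin and succeeds
# if wsrep_cluster_size equals the expected number of database nodes.
galera_size_is() {
    local expected=$1 size
    size=$(awk '$1 == "wsrep_cluster_size" { print $2 }')
    [ "$size" = "$expected" ]
}

# Example (run on a controller; 3 is the number of controllers in this guide):
#   docker exec mariadb mysql -uroot -p"$DB_PASS" -N -B \
#       -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | galera_size_is 3 \
#       && echo "all Galera nodes have joined"
```

A value lower than the number of controllers means at least one node has not joined yet, which matches the connection timeouts shown in the log above.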
Run Pre-Deployment Checks on the New Controller Nodes
Next, run the pre-deployment checks on the new nodes;
kolla-ansible -i multinode prechecks --limit controller02,controller03
Deploy Required Services Docker Containers on the New Controller Nodes
Next, deploy Docker containers for the required services on the new controller nodes.
To begin with, download the container images onto the new hosts;
kolla-ansible -i multinode pull --limit controller02,controller03
When the command completes, you can list the container images on the node.
You can log in to the node and list the images;
docker images
or just list them using ansible from the control node;
ansible -i multinode -m raw -a "sudo docker images" controller02
Sample output;
controller02 | CHANGED | rc=0 >>
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/openstack.kolla/cinder-scheduler 2023.1-ubuntu-jammy 41d2c120dcaf 5 hours ago 1.33GB
quay.io/openstack.kolla/cinder-api 2023.1-ubuntu-jammy dc89b0b74a64 5 hours ago 1.33GB
quay.io/openstack.kolla/neutron-server 2023.1-ubuntu-jammy dd92163f698e 5 hours ago 1.05GB
quay.io/openstack.kolla/neutron-l3-agent 2023.1-ubuntu-jammy 47e8b6290070 5 hours ago 1.05GB
quay.io/openstack.kolla/aodh-listener 2023.1-ubuntu-jammy 4f103cd2aa96 5 hours ago 892MB
quay.io/openstack.kolla/aodh-notifier 2023.1-ubuntu-jammy 577c5461b901 5 hours ago 892MB
quay.io/openstack.kolla/aodh-api 2023.1-ubuntu-jammy 80311624cca1 5 hours ago 892MB
quay.io/openstack.kolla/aodh-evaluator 2023.1-ubuntu-jammy 82850f3f9b82 5 hours ago 892MB
quay.io/openstack.kolla/neutron-dhcp-agent 2023.1-ubuntu-jammy 3c9ab4533eb5 5 hours ago 1.04GB
quay.io/openstack.kolla/keystone 2023.1-ubuntu-jammy d4c445793a75 5 hours ago 942MB
quay.io/openstack.kolla/keystone-ssh 2023.1-ubuntu-jammy 946e8484c8a5 5 hours ago 948MB
quay.io/openstack.kolla/neutron-openvswitch-agent 2023.1-ubuntu-jammy 8c8468ded47c 5 hours ago 1.04GB
quay.io/openstack.kolla/neutron-metadata-agent 2023.1-ubuntu-jammy 87d457f01029 5 hours ago 1.04GB
quay.io/openstack.kolla/keystone-fernet 2023.1-ubuntu-jammy 0d92e47e97b5 5 hours ago 946MB
quay.io/openstack.kolla/nova-novncproxy 2023.1-ubuntu-jammy 14934aa7795f 5 hours ago 1.22GB
quay.io/openstack.kolla/placement-api 2023.1-ubuntu-jammy fb19639b8875 5 hours ago 894MB
quay.io/openstack.kolla/horizon 2023.1-ubuntu-jammy 2c893a67ae9f 5 hours ago 1.11GB
quay.io/openstack.kolla/heat-api-cfn 2023.1-ubuntu-jammy 7ec11e8fefb9 5 hours ago 960MB
quay.io/openstack.kolla/heat-api 2023.1-ubuntu-jammy e925a6c8b000 5 hours ago 960MB
quay.io/openstack.kolla/heat-engine 2023.1-ubuntu-jammy 048d6ebe056e 5 hours ago 960MB
quay.io/openstack.kolla/nova-scheduler 2023.1-ubuntu-jammy ab1b447c7021 5 hours ago 1.11GB
quay.io/openstack.kolla/nova-conductor 2023.1-ubuntu-jammy ca809956e88b 5 hours ago 1.11GB
quay.io/openstack.kolla/nova-api 2023.1-ubuntu-jammy 137618290a77 5 hours ago 1.11GB
quay.io/openstack.kolla/glance-api 2023.1-ubuntu-jammy 4e47347129e3 5 hours ago 1.04GB
quay.io/openstack.kolla/zun-wsproxy 2023.1-ubuntu-jammy 6004d566ab20 5 hours ago 1.01GB
quay.io/openstack.kolla/zun-api 2023.1-ubuntu-jammy f93535599eba 5 hours ago 1.01GB
quay.io/openstack.kolla/gnocchi-api 2023.1-ubuntu-jammy 4e8e97ea301a 5 hours ago 1.07GB
quay.io/openstack.kolla/gnocchi-metricd 2023.1-ubuntu-jammy c8cefb46f796 5 hours ago 1.07GB
quay.io/openstack.kolla/gnocchi-statsd 2023.1-ubuntu-jammy b6b3c5404e55 5 hours ago 1.07GB
quay.io/openstack.kolla/ceilometer-central 2023.1-ubuntu-jammy 52e724bda206 5 hours ago 895MB
quay.io/openstack.kolla/ceilometer-notification 2023.1-ubuntu-jammy 24affe5317f4 5 hours ago 895MB
quay.io/openstack.kolla/kolla-toolbox 2023.1-ubuntu-jammy 47fb2da898a6 5 hours ago 819MB
quay.io/openstack.kolla/mariadb-server 2023.1-ubuntu-jammy 08f31f386d7b 5 hours ago 605MB
quay.io/openstack.kolla/mariadb-clustercheck 2023.1-ubuntu-jammy df9693cf6795 5 hours ago 322MB
quay.io/openstack.kolla/prometheus-blackbox-exporter 2023.1-ubuntu-jammy 3786777ba92f 5 hours ago 278MB
quay.io/openstack.kolla/prometheus-alertmanager 2023.1-ubuntu-jammy 9dc45a653636 5 hours ago 314MB
quay.io/openstack.kolla/prometheus-node-exporter 2023.1-ubuntu-jammy 29854ef51590 5 hours ago 276MB
quay.io/openstack.kolla/prometheus-memcached-exporter 2023.1-ubuntu-jammy bccdd901d45b 5 hours ago 271MB
quay.io/openstack.kolla/prometheus-mysqld-exporter 2023.1-ubuntu-jammy 142d34487ae1 5 hours ago 272MB
quay.io/openstack.kolla/prometheus-openstack-exporter 2023.1-ubuntu-jammy 00708fff486b 5 hours ago 267MB
quay.io/openstack.kolla/prometheus-cadvisor 2023.1-ubuntu-jammy 57e64a35a36d 5 hours ago 295MB
quay.io/openstack.kolla/prometheus-haproxy-exporter 2023.1-ubuntu-jammy 8da898f5509f 5 hours ago 272MB
quay.io/openstack.kolla/prometheus-v2-server 2023.1-ubuntu-jammy 34d5a456575e 5 hours ago 469MB
quay.io/openstack.kolla/openvswitch-vswitchd 2023.1-ubuntu-jammy da575b11b8d8 5 hours ago 274MB
quay.io/openstack.kolla/grafana 2023.1-ubuntu-jammy 9485136dab77 5 hours ago 673MB
quay.io/openstack.kolla/openvswitch-db-server 2023.1-ubuntu-jammy 45b86c2748cb 5 hours ago 274MB
quay.io/openstack.kolla/fluentd 2023.1-ubuntu-jammy f0b67ea51417 5 hours ago 529MB
quay.io/openstack.kolla/keepalived 2023.1-ubuntu-jammy 34c6f0412dca 5 hours ago 269MB
quay.io/openstack.kolla/rabbitmq 2023.1-ubuntu-jammy a9e21182aae7 5 hours ago 314MB
quay.io/openstack.kolla/cron 2023.1-ubuntu-jammy f0bc61f82d64 5 hours ago 258MB
quay.io/openstack.kolla/memcached 2023.1-ubuntu-jammy 366a6292a66e 5 hours ago 259MB
quay.io/openstack.kolla/etcd 2023.1-ubuntu-jammy 37bc7fc289fe 5 hours ago 297MB
quay.io/openstack.kolla/haproxy 2023.1-ubuntu-jammy b64892cee984 5 hours ago 266MB
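If you want a quick sanity check rather than reading the whole table, you can count the images that carry the expected tag and compare the number with what an existing controller reports. The helper below is only a sketch: count_images_with_tag is a name made up for this example, and it assumes the default docker images table layout shown above, where the tag is the second column.

```shell
# Count the lines on stdin whose second column equals the expected image tag.
# Works on the default `docker images` table: the header row and the Ansible
# status line are skipped automatically because their second field differs.
#
# Example (from the control node, using the same ad-hoc command as above):
#   ansible -i multinode -m raw -a "sudo docker images" controller02 \
#     | count_images_with_tag 2023.1-ubuntu-jammy
count_images_with_tag() {
    expected_tag="$1"
    awk -v tag="$expected_tag" '$2 == tag { n++ } END { print n + 0 }'
}
```

Running the same pipeline against controller01 gives you a number to compare with.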
Deploy containers on the new controller nodes. Again, you have to specify all the controller nodes here, hence --limit control.
kolla-ansible -i multinode deploy --limit control
Ensure the command completes without errors. If any occur, fix them before you proceed.
Verify New Node Addition to OpenStack
You can now verify that the new controller nodes have been successfully added to the OpenStack cluster.
To begin with, you can list Docker containers running on the node (the command below is executed from the control node);
ansible -i multinode -m raw -a "sudo docker ps" controller02
Sample output;
controller02 | CHANGED | rc=0 >>
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ae81cdd54526 quay.io/openstack.kolla/zun-wsproxy:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) zun_wsproxy
559251e9067c quay.io/openstack.kolla/zun-api:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) zun_api
703c89f0d130 quay.io/openstack.kolla/grafana:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours grafana
d55aa3751b1d quay.io/openstack.kolla/aodh-notifier:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) aodh_notifier
b7f1ade8cf50 quay.io/openstack.kolla/aodh-listener:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) aodh_listener
5a2f03854727 quay.io/openstack.kolla/aodh-evaluator:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) aodh_evaluator
ca5054ffcb1c quay.io/openstack.kolla/aodh-api:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) aodh_api
b3fe5de9c0d7 quay.io/openstack.kolla/ceilometer-central:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (unhealthy) ceilometer_central
96e41a027685 quay.io/openstack.kolla/ceilometer-notification:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) ceilometer_notification
407a3b083aa9 quay.io/openstack.kolla/gnocchi-statsd:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) gnocchi_statsd
8847afd527c6 quay.io/openstack.kolla/gnocchi-metricd:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) gnocchi_metricd
6409e89ac7c9 quay.io/openstack.kolla/gnocchi-api:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) gnocchi_api
9f89e8ea58cd quay.io/openstack.kolla/horizon:2023.1-ubuntu-jammy "dumb-init --single-…" 2 hours ago Up 2 hours (healthy) horizon
853c7bbaea53 quay.io/openstack.kolla/mariadb-server:2023.1-ubuntu-jammy "dumb-init -- kolla_…" 2 hours ago Up 2 hours (healthy) mariadb
76651604bdf0 quay.io/openstack.kolla/heat-engine:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) heat_engine
c6ef2875a2b4 quay.io/openstack.kolla/heat-api-cfn:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) heat_api_cfn
12dd70306101 quay.io/openstack.kolla/heat-api:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) heat_api
2cddac565e6d quay.io/openstack.kolla/neutron-metadata-agent:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) neutron_metadata_agent
ae50f1adecbe quay.io/openstack.kolla/neutron-l3-agent:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) neutron_l3_agent
a14026c69184 quay.io/openstack.kolla/neutron-dhcp-agent:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) neutron_dhcp_agent
803595b10a5f quay.io/openstack.kolla/neutron-openvswitch-agent:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) neutron_openvswitch_agent
bc1c0f6b268d quay.io/openstack.kolla/neutron-server:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) neutron_server
c0cd45da353d quay.io/openstack.kolla/openvswitch-vswitchd:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) openvswitch_vswitchd
4d452111b3ba quay.io/openstack.kolla/openvswitch-db-server:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) openvswitch_db
33cb79ec0a55 quay.io/openstack.kolla/nova-novncproxy:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) nova_novncproxy
6ff84d56ce4c quay.io/openstack.kolla/nova-conductor:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) nova_conductor
a6d168bca4c9 quay.io/openstack.kolla/nova-api:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) nova_api
e338faeeeee0 quay.io/openstack.kolla/nova-scheduler:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) nova_scheduler
9988ebb9618a quay.io/openstack.kolla/placement-api:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) placement_api
9a6e5585150f quay.io/openstack.kolla/cinder-scheduler:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) cinder_scheduler
77d6e4115252 quay.io/openstack.kolla/cinder-api:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) cinder_api
e0590534d6c8 quay.io/openstack.kolla/glance-api:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) glance_api
e08bb362be08 quay.io/openstack.kolla/keystone:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) keystone
21534f07df6a quay.io/openstack.kolla/keystone-fernet:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) keystone_fernet
06726634c6eb quay.io/openstack.kolla/keystone-ssh:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) keystone_ssh
8fba633812b7 quay.io/openstack.kolla/etcd:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours etcd
bce5a05dcbba quay.io/openstack.kolla/rabbitmq:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) rabbitmq
f1ff17d3b41a quay.io/openstack.kolla/prometheus-blackbox-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_blackbox_exporter
5d900639936d quay.io/openstack.kolla/prometheus-openstack-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_openstack_exporter
0f5bd3e873b0 quay.io/openstack.kolla/prometheus-alertmanager:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_alertmanager
aa9ef3408ee0 quay.io/openstack.kolla/prometheus-cadvisor:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_cadvisor
85443d572966 quay.io/openstack.kolla/prometheus-memcached-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_memcached_exporter
77df867d2b80 quay.io/openstack.kolla/prometheus-haproxy-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_haproxy_exporter
f4e0e360bdad quay.io/openstack.kolla/prometheus-mysqld-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_mysqld_exporter
1394ba79ee7c quay.io/openstack.kolla/prometheus-node-exporter:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_node_exporter
43136ca82078 quay.io/openstack.kolla/prometheus-v2-server:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours prometheus_server
87595629490c quay.io/openstack.kolla/memcached:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) memcached
184454c7d93b quay.io/openstack.kolla/mariadb-clustercheck:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours mariadb_clustercheck
27690e469dc2 quay.io/openstack.kolla/keepalived:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours keepalived
0d61e2f1edc6 quay.io/openstack.kolla/haproxy:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours (healthy) haproxy
b9477be14773 quay.io/openstack.kolla/cron:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours cron
291f972ce16c quay.io/openstack.kolla/kolla-toolbox:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours kolla_toolbox
9e3bffb135fa quay.io/openstack.kolla/fluentd:2023.1-ubuntu-jammy "dumb-init --single-…" 7 hours ago Up 2 hours fluentd
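With this many containers, it is easy to miss one that is not healthy; in the sample output above, ceilometer_central reports (unhealthy). A small filter such as the following can surface those. It is a sketch that assumes the default docker ps table, where the container name is the last column, and list_unhealthy is a hypothetical helper name. On the node itself, docker ps --filter health=unhealthy should give a similar result.

```shell
# Print the names of containers whose status column reports "(unhealthy)".
# Input: the default `docker ps` table on stdin (container name is the last field).
#
# Example (from the control node):
#   ansible -i multinode -m raw -a "sudo docker ps" controller02 | list_unhealthy
list_unhealthy() {
    # Strip a trailing carriage return first, in case the output came through a TTY.
    awk '{ sub(/\r$/, "") } /\(unhealthy\)/ { print $NF }'
}
```

An empty result means no container on that node is currently flagged as unhealthy.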
I am not aware of a dedicated command for listing the controller nodes. However, the Nova services that run on them are listed under the internal availability zone.
Activate the virtual environment and load the credentials;
source $HOME/kolla-ansible/bin/activate
source /etc/kolla/admin-openrc.sh
Let’s see;
openstack availability zone list --compute --long
+-----------+-------------+---------------+--------------+----------------+----------------------------------------+
| Zone Name | Zone Status | Zone Resource | Host Name | Service Name | Service Status |
+-----------+-------------+---------------+--------------+----------------+----------------------------------------+
| internal | available | | controller02 | nova-scheduler | enabled :-) 2023-11-14T15:22:10.000000 |
| internal | available | | controller02 | nova-conductor | enabled :-) 2023-11-14T15:22:14.000000 |
| internal | available | | controller03 | nova-scheduler | enabled :-) 2023-11-14T15:22:11.000000 |
| internal | available | | controller03 | nova-conductor | enabled :-) 2023-11-14T15:22:11.000000 |
| internal | available | | controller01 | nova-scheduler | enabled :-) 2023-11-14T15:22:13.000000 |
| internal | available | | controller01 | nova-conductor | enabled :-) 2023-11-14T15:22:12.000000 |
| nova | available | | compute02 | nova-compute | enabled :-) 2023-11-14T15:22:09.000000 |
| nova | available | | compute01 | nova-compute | enabled :-) 2023-11-14T15:22:07.000000 |
+-----------+-------------+---------------+--------------+----------------+----------------------------------------+
If you also list the compute services, you will see the scheduler and conductor services running on every controller node!
openstack compute service list
Output;
+--------------------------------------+----------------+--------------+----------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------------------------+----------------+--------------+----------+---------+-------+----------------------------+
| b4d36484-cd27-4f5b-bf18-c93d7184890d | nova-scheduler | controller01 | internal | enabled | up | 2023-11-12T15:42:45.000000 |
| 7be8d02c-8a76-48a0-bfa0-3edfc583bc9c | nova-scheduler | controller02 | internal | enabled | up | 2023-11-12T15:42:44.000000 |
| 24253619-07d6-479f-975c-b2876c81d12f | nova-scheduler | controller03 | internal | enabled | up | 2023-11-12T15:42:42.000000 |
| 5efddcac-fdf4-4ce3-8843-43e1784dc8d2 | nova-conductor | controller01 | internal | enabled | up | 2023-11-12T15:42:48.000000 |
| 77ea6f87-5144-476b-9685-ebb6f1765b09 | nova-compute | compute02 | nova | enabled | up | 2023-11-12T15:42:42.000000 |
| 546d891d-04f8-41f7-b2a6-714569e4bc52 | nova-compute | compute01 | nova | enabled | up | 2023-11-12T15:42:46.000000 |
| 3d1fa2a8-0637-4c39-9941-d80a011a6be1 | nova-conductor | controller03 | internal | enabled | up | 2023-11-12T15:42:42.000000 |
| 22df4a17-2d4b-4413-b81e-32856c5ba2c5 | nova-conductor | controller02 | internal | enabled | up | 2023-11-12T15:42:42.000000 |
+--------------------------------------+----------------+--------------+----------+---------+-------+----------------------------+
You can see that nova-scheduler and nova-conductor now run on all three controller nodes.
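To turn this table into a quick pass/fail check, you can ask the client for plain values and print only the services that are not up. The following is a minimal sketch; services_not_up is a hypothetical helper, and it assumes the three columns arrive in the order Binary, Host, State.

```shell
# Print "binary@host" for every compute service whose state is not "up".
# Input: one "binary host state" triple per line.
#
# Example (with the admin credentials loaded):
#   openstack compute service list -f value -c Binary -c Host -c State \
#     | services_not_up
services_not_up() {
    awk '$3 != "up" { print $1 "@" $2 }'
}
```

No output means every scheduler, conductor and compute service reported in as up.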
Testing Virtual IP (VIP) High Availability on Controller Nodes
At this point, you might want to simulate a scenario in which the active node (the one holding the VIP) fails, and verify that the failover works as expected, with minimal or no disruption to the services.
Each of our three controller nodes has its priority number defined.
What is a priority number in Keepalived? In Keepalived, the priority number determines which node is preferred in a High Availability (HA) setup. The node with the highest priority is typically elected as the master (active) node, and if it fails, the node with the next highest priority becomes the master. The priority is an integer; Keepalived accepts values from 1 to 255, and the node with the highest value is the most preferred for the master role.
Let’s check the priorities of our nodes;
Controller01;
sudo grep priority /etc/kolla/keepalived/keepalived.conf
priority 1
Controller02;
sudo grep priority /etc/kolla/keepalived/keepalived.conf
priority 2
Controller03;
sudo grep priority /etc/kolla/keepalived/keepalived.conf
priority 3
From the command output above, controller03 is the most preferred node and, in our case, it currently has the VIP assigned;
root@controller03:~# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
enp1s0 UP
enp7s0 UP
br-ex UP 10.100.0.102/24 fe80::c866:d5ff:feba:3cf8/64
br0 UP 192.168.200.205/24 192.168.200.254/32 fe80::f04e:aaff:fe5c:90f9/64
vethext@vethint UP fe80::44ea:9aff:fe7f:b1e0/64
vethint@vethext UP
ovs-system DOWN
br-int DOWN
br-tun DOWN
root@controller03:~#
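Rather than scanning the interface list by eye, you can test for the VIP directly. The helper below is a sketch under a few assumptions: holds_vip is a hypothetical name, the input is ip -br addr output in which addresses start at the third column, and 192.168.200.254 is the VIP used in this setup.

```shell
# Exit 0 if the given VIP is assigned to any interface in `ip -br addr`
# output read from stdin, and exit 1 otherwise.
#
# Example (on a controller node):
#   ip -br addr | holds_vip 192.168.200.254 && echo "this node holds the VIP"
holds_vip() {
    vip="$1"
    awk -v vip="$vip" '
        {
            # Fields 1 and 2 are the interface name and state; the rest are addresses.
            for (i = 3; i <= NF; i++) {
                split($i, parts, "/")      # drop the prefix length, e.g. "/32"
                if (parts[1] == vip) found = 1
            }
        }
        END { exit !found }
    '
}
```

Comparing the address without its prefix length avoids false matches on addresses that merely start with the same digits.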
There are multiple ways to simulate the failover here. For example, you can pause the node from the virtualization host, take down its network interface, or do anything else that makes it temporarily unreachable. If you are doing this in a production environment, be cautious!
So, before I take down controller03, I will tail the logs on controller01 and controller02;
Controller02;
root@controller02:~# docker logs --tail 10 -f keepalived
Controller01;
kifarunix@controller01:~$ docker logs --tail 10 -f keepalived
Now, temporarily take down controller03.
Logs on controller02, which has a higher priority than controller01 (see the last line, Sun Nov 12 16:46:39 2023: (kolla_internal_vip_51) Entering MASTER STATE);
Sun Nov 12 12:58:27 2023: Reset ARP config counter 0
Sun Nov 12 12:58:27 2023: Original arp_ignore 0
Sun Nov 12 12:58:27 2023: Original arp_filter 0
Sun Nov 12 12:58:27 2023: Original promote_secondaries 1
Sun Nov 12 12:58:27 2023: Reset promote_secondaries counter 0
Sun Nov 12 12:58:27 2023: Script `check_alive` now returning 1
Sun Nov 12 12:58:27 2023: VRRP_Script(check_alive) failed (exited with status 1)
Sun Nov 12 12:58:31 2023: Script `check_alive` now returning 0
Sun Nov 12 12:58:49 2023: VRRP_Script(check_alive) succeeded
Sun Nov 12 12:58:49 2023: (kolla_internal_vip_51) Entering BACKUP STATE
Sun Nov 12 16:46:39 2023: (kolla_internal_vip_51) Entering MASTER STATE
Nothing much happens on controller01, which stays in the backup state.
Hence, controller02 should now hold the VIP;
root@controller02:~# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
enp1s0 UP
enp7s0 UP
br-ex UP 10.100.0.101/24 fe80::5cfc:faff:fe5e:9a93/64
br0 UP 192.168.200.204/24 192.168.200.254/32 fe80::b04a:bbff:fece:1c6/64
vethext@vethint UP fe80::d458:43ff:fe9c:23a2/64
vethint@vethext UP
ovs-system DOWN
br-int DOWN
br-tun DOWN
vxlan_sys_4789 UNKNOWN fe80::98c7:18ff:fe88:4f14/64
And all services are working as expected for me! Hence, at this point, I believe it is reasonable to conclude that the three-node controller cluster is working as expected.
When controller03, which has the highest priority, is back up, it assumes the master state again.
Controller02 Keepalived logs;
Sun Nov 12 16:56:52 2023: (kolla_internal_vip_51) Master received advert from 192.168.200.205 with higher priority 3, ours 2
Sun Nov 12 16:56:52 2023: (kolla_internal_vip_51) Entering BACKUP STATE
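Keepalived logs quite a lot around a failover, so it can help to keep only the state transitions. A minimal sketch follows; vrrp_transitions is a hypothetical helper name, and the pattern is based on the log lines shown above plus the FAULT state, which I am assuming is logged in the same format.

```shell
# Keep only the VRRP state transitions from Keepalived log lines on stdin.
#
# Example (on a controller node; docker logs may write to stderr, hence 2>&1):
#   docker logs keepalived 2>&1 | vrrp_transitions
vrrp_transitions() {
    grep -E 'Entering (MASTER|BACKUP|FAULT) STATE'
}
```

Run on each controller, this gives a short timeline of which node was master and when.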
Heads up! Before you can conclude that your cluster is working as expected, perform thorough testing!
Define Preferred Master Controller Node
To define a preferred master controller node in a Keepalived setup, you typically need to adjust the priority configuration for the nodes. In Keepalived, the node with the highest priority is elected as the master.
Here is how to define preferred master controller node:
Identify the Keepalived configuration file on each controller node. In our Kolla-Ansible deployment, the Keepalived configuration is found under /etc/kolla; the file is /etc/kolla/keepalived/keepalived.conf.
Edit the Keepalived configuration file on the node you want to set as the preferred master. Look for the vrrp_instance section, and specifically, the priority parameter.
vim /etc/kolla/keepalived/keepalived.conf
Change the priority value to a higher number for the preferred master node. Nodes with higher priority values will be preferred for the master role. We set the priority of controller02 to 20.
vrrp_script check_alive {
    script "/check_alive.sh"
    interval 2
    fall 2
    rise 10
}
vrrp_instance kolla_internal_vip_51 {
    state BACKUP
    nopreempt
    interface br0
    virtual_router_id 51
    priority 20
    advert_int 1
    virtual_ipaddress {
        192.168.200.254 dev br0
    }
    authentication {
        auth_type PASS
        auth_pass 9MC1BOSy764sBcZFHxzniiLwMrBz6iz3HiRcOLWv
    }
    track_script {
        check_alive
    }
}
Save the changes to the configuration file.
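If you would rather script the change than edit the file by hand, something like the following could work. Treat it as a sketch: set_keepalived_priority is a hypothetical helper, it assumes a single priority line as in the configuration above, and it needs to run as a user who can write to the file. Also keep in mind that the files under /etc/kolla are generated by Kolla-Ansible, so a later deploy or reconfigure run will most likely overwrite a manual change.

```shell
# Set the VRRP priority in a Keepalived configuration file.
# A copy of the original file is kept next to it with a .bak suffix.
#
# Example (as root on the controller node):
#   set_keepalived_priority /etc/kolla/keepalived/keepalived.conf 20
set_keepalived_priority() {
    conf="$1"
    new_priority="$2"

    # Refuse anything that is not a plain non-negative integer.
    case "$new_priority" in
        ''|*[!0-9]*)
            echo "priority must be a number" >&2
            return 1
            ;;
    esac

    cp "$conf" "$conf.bak" || return 1
    # Rewrite only the number on the "priority" line, preserving indentation.
    sed -E "s/^([[:space:]]*priority[[:space:]]+)[0-9]+/\1${new_priority}/" \
        "$conf.bak" > "$conf"
}
```

Writing the result back through a redirect keeps the original file's ownership and permissions, and the .bak copy gives you a way back if the new value turns out to be wrong.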
Restart Keepalived on the node where you made the configuration changes:
sudo docker restart keepalived
Check the logs or status to confirm that Keepalived has successfully restarted.
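Because the configuration uses nopreempt, a node with a higher priority does not take the VIP away from a running master. To hand over the master role, restart Keepalived on the node that currently holds the VIP.
Check that the controller node with the higher priority has assumed the master role. Sample logs on my controller node;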
docker logs --tail 5 -f keepalived
Wed Nov 15 05:19:20 2023: Original arp_filter 0
Wed Nov 15 05:19:20 2023: Original promote_secondaries 1
Wed Nov 15 05:19:20 2023: Reset promote_secondaries counter 0
Wed Nov 15 05:19:20 2023: VRRP_Script(check_alive) succeeded
Wed Nov 15 05:19:20 2023: (kolla_internal_vip_51) Entering BACKUP STATE
Wed Nov 15 05:19:39 2023: (kolla_internal_vip_51) Entering MASTER STATE
Check the Virtual IP (VIP) to ensure it is associated with the preferred master node:
root@controller02:~# ip a
...
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
link/ether 52:54:00:0c:c4:bf brd ff:ff:ff:ff:ff:ff
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-ex state UP group default qlen 1000
link/ether 52:54:00:5c:50:31 brd ff:ff:ff:ff:ff:ff
4: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 5e:fc:fa:5e:9a:93 brd ff:ff:ff:ff:ff:ff
inet 10.100.0.101/24 brd 10.100.0.255 scope global br-ex
valid_lft forever preferred_lft forever
inet6 fe80::5cfc:faff:fe5e:9a93/64 scope link
valid_lft forever preferred_lft forever
5: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether b2:4a:bb:ce:01:c6 brd ff:ff:ff:ff:ff:ff
inet 192.168.200.204/24 brd 192.168.200.255 scope global br0
valid_lft forever preferred_lft forever
inet 192.168.200.254/32 scope global br0
valid_lft forever preferred_lft forever
inet6 fe80::b04a:bbff:fece:1c6/64 scope link
valid_lft forever preferred_lft forever
...
Remember to repeat these steps for each controller node where you want to set or adjust the priority. Adjust the priority values based on your desired preference.
And that concludes our guide on how to add controller nodes to an existing OpenStack cluster using Kolla-Ansible.