Extend OpenShift CoreOS /sysroot Root Filesystem

In this tutorial, you will learn how to extend the OpenShift CoreOS /sysroot root filesystem. Extending the /sysroot filesystem is a critical task for administrators running OpenShift clusters. OpenShift CoreOS (RHCOS) uses an immutable root filesystem (/sysroot), which can pose challenges when additional storage is needed for applications, logs, or system updates. This guide provides a step-by-step process to safely extend the /sysroot filesystem in a KVM-based virtualized test OpenShift cluster environment, ensuring minimal downtime and data integrity.

⚠️ Caution!
We are running a virtualized OpenShift cluster on KVM. This procedure is therefore intended for test environments running as virtual machines. For production environments, proceed with extreme care and thorough validation. If running OpenShift on bare metal, consult Red Hat documentation or Red Hat support for guidance.

Why Extend the /sysroot Filesystem?

Running OpenShift in a test environment offers flexibility for experimenting with virtualized clusters, but the default disk size allocated to RHCOS VMs may become insufficient. Common scenarios requiring filesystem extension include:

  • Logs: Growing application or system logs.
  • Workloads: Additional containers or services.
  • Updates: CoreOS updates needing temporary storage.

In KVM test environments, extending the /sysroot filesystem involves resizing the virtual disk, growing the partition, and expanding the filesystem. This process must account for RHCOS’s immutable nature and KVM’s virtual disk management. For bare-metal deployments, consult Red Hat support to ensure compatibility and safety.

Identifying Signs of Low Disk Space on Nodes

Before you reach the point of needing to extend /sysroot, it’s critical to detect when disk space on nodes is running low. Common signs include:

  • Service Failures: Pods or services fail to start or take too long to schedule, staying stuck in the ContainerCreating or Pending state. Sample logs;
oc describe pod <pod-name>
Status:           Pending
SeccompProfile:   RuntimeDefault
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/mariadb-7d794d9ccd
Containers:
  mariadb:
    Image:      image-registry.openshift-image-registry.svc:5000/openshift/mariadb@sha256:b11ca823cfb0ef506cd3ff5d0d76eea6b23d61ab254e00bf0cc9dea3e0954795
    Port:       3306/TCP
    Host Port:  0/TCP
    Environment:
      MYSQL_ROOT_PASSWORD:    pass
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m4675 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  kube-api-access-m4675:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  9m59s                default-scheduler  0/6 nodes are available: 1 Too many pods, 2 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 No preemption victims found for incoming pod, 5 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  72s (x2 over 6m13s)  default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 2 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  • Node Disk Pressure in Logs: Check OpenShift logs for disk pressure events indicating low /sysroot space. Sample logs
oc describe node <node-name>

Sample logs;

...
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  k8s-wk-03.ocp.kifarunix-demo.com
  AcquireTime:     
  RenewTime:       Wed, 14 May 2025 18:24:44 +0000
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----             ------    -----------------                 ------------------                ------              -------
  MemoryPressure   Unknown   Wed, 14 May 2025 18:24:57 +0000   Wed, 14 May 2025 18:25:44 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   Wed, 14 May 2025 18:24:57 +0000   Wed, 14 May 2025 18:25:44 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   Wed, 14 May 2025 18:24:57 +0000   Wed, 14 May 2025 18:25:44 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   Wed, 14 May 2025 18:24:57 +0000   Wed, 14 May 2025 18:25:44 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.122.215
  Hostname:    k8s-wk-03.ocp.kifarunix-demo.com
  ...
  Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                745m (21%)    210m (6%)
  memory             3649Mi (33%)  8Gi (75%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
Events:
  Type     Reason                Age   From             Message
  ----     ------                ----  ----             -------
  Warning  EvictionThresholdMet  50m   kubelet          Attempting to reclaim ephemeral-storage
  Normal   NodeHasDiskPressure   50m   kubelet          Node k8s-wk-03.ocp.kifarunix-demo.com status is now: NodeHasDiskPressure
  Normal   NodeNotReady          48m   node-controller  Node k8s-wk-03.ocp.kifarunix-demo.com status is now: NodeNotReady
  • Debugging Issues: Inability to debug nodes due to a full /sysroot. Commands like oc debug node/<node-name> fail, reporting insufficient space.
  • Node Info Metrics: Use OpenShift’s monitoring tools (e.g., Prometheus) to track disk availability. You can run the query in the OpenShift web console under Observe > Metrics: node_filesystem_avail_bytes{mountpoint="/sysroot"}

Proactively monitor these signs to prevent cluster disruptions. Configure alerts in OpenShift’s monitoring stack for low disk space thresholds on /sysroot.
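
If your cluster allows custom alerting rules, you can codify such a threshold as a PrometheusRule. Below is a minimal sketch, not taken from this guide: the rule name, namespace, and 10% threshold are assumptions, and the exact namespace and labels required for custom platform alerting rules depend on your OpenShift version and monitoring configuration.

cat <<'EOF' | oc apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sysroot-low-space        # hypothetical name
  namespace: openshift-monitoring
spec:
  groups:
  - name: node-disk
    rules:
    - alert: SysrootLowSpace
      # Fire when less than 10% of /sysroot is free for 15 minutes.
      expr: node_filesystem_avail_bytes{mountpoint="/sysroot"} / node_filesystem_size_bytes{mountpoint="/sysroot"} < 0.10
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "/sysroot free space is below 10% on {{ $labels.instance }}"
EOF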

Prerequisites

Before proceeding in a test environment, ensure you have:

  • Administrative access to the KVM hypervisor and OpenShift cluster.
  • A backup of the VM’s disk image to prevent data loss. (A must)
  • Familiarity with Linux storage concepts (LVM, partitions, filesystems).
  • Tools such as virsh available on the KVM host; growpart and xfs_growfs are already available on the RHCOS nodes.
  • The oc command-line tool for OpenShift management.

Warning: Test this in a non-production environment first. Production changes require extensive validation.

Important: Note that the OpenShift Container Platform worker nodes are required to have the same storage type and size attached to each node. As such, if you extend storage for one worker node, it must be extended for the rest of the worker nodes as well! You can quickly compare the attached disk sizes across the worker nodes as shown below.
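
As a convenience sketch (assuming the worker nodes use /dev/vda, as in this guide), you can check the current disk size on every worker node with a small loop:

# List the size of the vda disk on each worker node.
for node in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
  echo "== ${node} =="
  oc debug "${node}" -- chroot /host lsblk -d -o NAME,SIZE /dev/vda 2>/dev/null
done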

Step-by-Step Guide to Extend /sysroot Filesystem

1. Do We Need to Cordon/Drain the Affected Node?

In this guide, we will use online KVM disk expansion: virsh blockresize at the hypervisor level and growpart on the node itself to extend the drive. As such, cordoning/draining is not strictly required when performing a disk resize using these tools on a KVM-based node.

  • The virsh blockresize command resizes the virtual disk at the hypervisor level without interrupting disk I/O.
  • Inside the VM, growpart and filesystem resizing (resize2fs or xfs_growfs) are also safe to run on mounted filesystems — including the root filesystem.
  • These operations are non-disruptive and do not require unmounting or rebooting the node.
⚠️ Precaution:
While these tools are designed for live environments, it’s good practice to:
  • Perform the operation during a maintenance window or low activity period
  • Ensure recent backups or snapshots are available
  • Monitor disk I/O closely if the node runs critical workloads

2. Back Up the Virtual Disk

Create a clone, snapshot, or backup of the VM’s disk image (e.g., QCOW2), provided you have enough space on the hypervisor host.
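
As a quick sketch (the disk image path below assumes a default libvirt setup, and the domain name follows the naming used later in this guide), you can locate the backing image with virsh domblklist and copy it. For a fully consistent copy, take the backup while the VM is shut down, or use a snapshot instead:

# Find the disk image backing the worker node's vda device.
sudo virsh domblklist ocp-node-wk-01

# Copy the QCOW2 image to a backup file (path is an assumption; adjust to yours).
sudo cp /var/lib/libvirt/images/ocp-node-wk-01.qcow2 /var/lib/libvirt/images/ocp-node-wk-01.qcow2.bak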

3. Resize the Virtual Disk Live

We are doing an online disk resize, which does not require the VM to be shut down.

Note
If using ESXi, you can expand the virtual disk directly from the VMware vSphere interface or esxcli by increasing the VMDK size. Ensure sufficient space is available on the respective datastore, then proceed from step 5 to resize the partition and filesystem within the VM.

For KVM users, this has been extensively illustrated in our previous guide. In summary, these are the logical steps you need to take;

For example, this was the size of one of my worker nodes before expanding it on the hypervisor host.

sudo virsh domblkinfo ocp-node-wk-01 vda --human

Sample command output

Capacity:       53687091200
Allocation:     53689724928
Physical:       53689450496

Note that all three worker nodes had the same storage.
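
To grow the virtual disks live on the KVM host, run virsh blockresize against each worker node’s vda target. A short sketch, assuming the domain names follow the ocp-node-wk-XX naming used above and a new size of 100 GiB:

# Resize the vda disk of each running worker node VM to 100 GiB.
for vm in ocp-node-wk-01 ocp-node-wk-02 ocp-node-wk-03; do
  sudo virsh blockresize "$vm" vda 100G
done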

After extending the VM disk size, this is our new size (for all three worker nodes);

sudo virsh domblkinfo ocp-node-wk-03 vda --human
Capacity:       100.000 GiB
Allocation:     49.986 GiB
Physical:       49.985 GiB

4. Verify the New Disk Size Inside the VM

Verify the new size inside the VM (replace ssh-key and worker-node with your SSH key and the respective worker node address):

ssh -i ssh-key core@worker-node lsblk

Sample command output;

NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
vda    252:0    0  100G  0 disk 
├─vda1 252:1    0    1M  0 part 
├─vda2 252:2    0  127M  0 part 
├─vda3 252:3    0  384M  0 part /boot
└─vda4 252:4    0 49.5G  0 part /var/opt/pwx/oci
                                /var
                                /sysroot/ostree/deploy/rhcos/var
                                /usr
                                /etc
                                /
                                /sysroot

As you can see, the drive is now 100G, yet the root partition still shows the old size from before it was expanded.

See the current state:

df -hT

Sample command output;

[core@k8s-wk-03 ~]$ df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  4.0M     0  4.0M   0% /dev
tmpfs          tmpfs     5.9G   92K  5.9G   1% /dev/shm
tmpfs          tmpfs     2.4G   74M  2.3G   4% /run
/dev/vda4      xfs        50G   31G   19G  62% /sysroot
tmpfs          tmpfs     5.9G  4.0K  5.9G   1% /tmp
/dev/vda3      ext4      350M  112M  216M  35% /boot
tmpfs          tmpfs      64M     0   64M   0% /var/lib/osd/lttng
tmpfs          tmpfs     1.2G     0  1.2G   0% /run/user/1000

So, proceed to step 5!

5. Expand the /sysroot Filesystem Inside the VM

So, log in to each VM and expand the /sysroot filesystem.

As you can see from the output of the df -hT command, /sysroot is on the fourth partition of drive vda, /dev/vda4. Hence, use the growpart command to extend the partition so that it takes up all the available space. Replace the drive and partition number accordingly.

sudo growpart /dev/vda 4

Sample command output;

CHANGED: partition=4 start=1050624 old: size=103806943 end=104857566 new: size=208664543 end=209715166

Do the same on all the remaining worker nodes.
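
If you prefer to do this in one pass, a small SSH loop works too. This is a sketch assuming the worker hostnames and SSH key used elsewhere in this guide:

# Grow partition 4 of /dev/vda on each worker node over SSH.
for node in k8s-wk-01 k8s-wk-02 k8s-wk-03; do
  ssh -i ssh-key core@${node}.ocp.kifarunix-demo.com "sudo growpart /dev/vda 4"
done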

Check the partition to verify the resize;

lsblk

Sample output;

NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
vda    252:0    0  100G  0 disk 
├─vda1 252:1    0    1M  0 part 
├─vda2 252:2    0  127M  0 part 
├─vda3 252:3    0  384M  0 part /boot
└─vda4 252:4    0 99.5G  0 part /var/lib/kubelet/pods/bc3243dd-6ece-467f-8517-bd3e7dbe8c86/volume-subpaths/nginx-conf/monitoring-plugin/1
                                /var/lib/kubelet/pods/e5eb4a59-6f8e-4b59-b216-422378e7d91f/volume-subpaths/nginx-conf/networking-console-plugin/1
                                /var/opt/pwx/oci
                                /var
                                /sysroot/ostree/deploy/rhcos/var
                                /usr
                                /etc
                                /
                                /sysroot

Now proceed to expand the filesystem. Note that the /sysroot partition uses an XFS filesystem, as shown in the df -hT output above.

To extend an XFS filesystem, run the xfs_growfs command against the mount point of the respective partition.

But here comes another issue: RHCOS mounts the /sysroot filesystem in read-only mode by default. This means write operations, such as growing the filesystem, will not work unless we first get write access to the mount.

If you are logged into the node, you can confirm this by running the command below.

mount | grep "on /sysroot "

Sample command output;

/dev/vda4 on /sysroot type xfs (ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)

As you can see, the ro (read-only) option is listed among the mount options.

So, how do we proceed?

Well, you have to remount /sysroot in read-write mode to be able to expand it. To do so, create a separate mount namespace, isolated from the rest of the system, that lets you make temporary mount changes (like remounting /sysroot) without affecting the global system. This can be achieved using the unshare command.

For example, run the unshare command with the --mount option to create a separate mount namespace.

sudo unshare --mount

Once you are in the new mount namespace, remount /sysroot in RW mode.

mount -o remount,rw /sysroot

You should now be able to expand the /sysroot filesystem by running the command below;

xfs_growfs /sysroot

Sample command output;

meta-data=/dev/vda4              isize=512    agcount=68, agsize=191744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=12975867, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 12975867 to 26083067

Once the resize is complete, exit the new mount namespace:

exit

Then verify the same with df -hT command;

df -hT

Sample output;

Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  4.0M     0  4.0M   0% /dev
tmpfs          tmpfs     5.9G   92K  5.9G   1% /dev/shm
tmpfs          tmpfs     2.4G   74M  2.3G   4% /run
/dev/vda4      xfs       100G   31G   70G  31% /sysroot
tmpfs          tmpfs     5.9G  4.0K  5.9G   1% /tmp
/dev/vda3      ext4      350M  112M  216M  35% /boot
tmpfs          tmpfs      64M     0   64M   0% /var/lib/osd/lttng
tmpfs          tmpfs     1.2G     0  1.2G   0% /run/user/1000

As you can see, the /sysroot is expanded successfully.
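
You can run the same check across all the worker nodes in one go; a convenience sketch assuming the hostnames and SSH key used earlier:

# Show /sysroot size and usage on each worker node.
for node in k8s-wk-01 k8s-wk-02 k8s-wk-03; do
  echo "== ${node} =="
  ssh -i ssh-key core@${node}.ocp.kifarunix-demo.com "df -hT /sysroot"
done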

Confirm the node is ready:

oc get nodes

Sample nodes output;

NAME                               STATUS   ROLES                  AGE   VERSION
k8s-ms-01.ocp.kifarunix-demo.com   Ready    control-plane,master   87d   v1.30.7
k8s-ms-02.ocp.kifarunix-demo.com   Ready    control-plane,master   87d   v1.30.7
k8s-ms-03.ocp.kifarunix-demo.com   Ready    control-plane,master   87d   v1.30.7
k8s-wk-01.ocp.kifarunix-demo.com   Ready    worker                 87d   v1.30.7
k8s-wk-02.ocp.kifarunix-demo.com   Ready    worker                 87d   v1.30.7
k8s-wk-03.ocp.kifarunix-demo.com   Ready    worker                 87d   v1.30.7

Confirm that the nodes no longer report disk pressure.

oc describe node <node-name>

Sample output;

...
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 14 May 2025 20:54:38 +0000   Thu, 08 May 2025 14:15:52 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 14 May 2025 20:54:38 +0000   Wed, 14 May 2025 20:52:46 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 14 May 2025 20:54:38 +0000   Thu, 08 May 2025 14:15:52 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 14 May 2025 20:54:38 +0000   Thu, 08 May 2025 14:17:27 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.122.214
  Hostname:    k8s-wk-02.ocp.kifarunix-demo.com
...
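
To check the DiskPressure condition on every node at once rather than describing nodes one by one, a jsonpath one-liner like this (a convenience sketch) can help:

# Print each node name and its DiskPressure condition status.
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'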

And you have successfully expanded the Red Hat OpenShift CoreOS /sysroot filesystem.

Alternatives to Disk Expansion

What if there were other methods to recover disk space on Red Hat OpenShift CoreOS nodes? Well, before extending /sysroot, consider housekeeping to reclaim space:

  • Evict Unused Pods: Identify and delete idle, terminated, or completed pods that were used for one-off jobs. For example:
oc get pods -A --no-headers | grep -E "Terminat|Completed|Error|Crash|StatusUnknown" | awk '{print $1, $2}' | while read -r ns pod; do oc delete pod "$pod" -n "$ns"; done
  • Clean Up Orphaned or Old Images: Remove unused container images. You can manually prune the images from the node or utilize the Cluster Image Registry Operator for automatic pruning.

    For manual image pruning, debug into the node;
oc debug node/<node-name> -- chroot /host crictl rmi --prune
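
To run the prune across every node in one pass, you can loop over them; a small convenience sketch:

# Remove unused container images on each node.
for node in $(oc get nodes -o name); do
  echo "== ${node} =="
  oc debug "${node}" -- chroot /host crictl rmi --prune
done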

You can also use the oc adm prune images command to prune images. You will need a user token to run it, and the user whose token you use must have the system:image-pruner cluster role or greater.
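
For example, a hedged sketch of a manual prune run (the retention values here are arbitrary choices; adjust them to your needs and pass --token if you are not already logged in as a suitably privileged user):

# Prune images, keeping the 3 most recent tag revisions and anything newer than 60 minutes.
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm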

OpenShift also ships with an ImagePruner resource called cluster that helps with automatic image pruning.

oc get imagepruner

Read more on basic OpenShift pruning operations.

  • Clear Logs: Truncate or rotate large log files. For example, to truncate all log files larger than 100M;
oc debug node/<node-name> -- chroot /host find /var/log -size +100M -exec truncate -s 0 {} \;
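
Rotated system journals under /var/log/journal can also consume significant space. As an additional hedged example, you can cap them with journalctl's vacuum option (the 200M cap is an arbitrary choice):

# Shrink persistent journals on the node to at most 200M.
oc debug node/<node-name> -- chroot /host journalctl --vacuum-size=200M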

And many other housekeeping tasks.

These steps can delay or avoid disk expansion, especially in test environments. Monitor their impact to ensure sufficient space is reclaimed.

Conclusion

Extending the OpenShift CoreOS /sysroot filesystem with minimal downtime is achievable using live resizing tools like virsh blockresize in KVM. Similarly, housekeeping alternatives like evicting unused pods and cleaning old images can further optimize space. For production, validate all the steps thoroughly. This ensures reliable containerized workloads in your cluster.

