Part 3: How to Restore and Validate Application Backups in OpenShift with OADP


In this guide, you will learn how to restore and validate application backups in OpenShift with OADP. This is Part 3 of a three-part series on backing up and restoring applications in OpenShift 4 with OADP (Velero). If you are arriving here directly, we recommend starting with Part 1, where we installed and configured OADP, and Part 2, where we created backups and verified the data landed in object storage.

In this part, we put the backup to the test. We will delete the demo-app namespace entirely, simulating an accidental deletion or a disaster scenario, then restore it from the backup created in Part 2 and verify that the MySQL database comes back with all data intact.


Step 10: Verify the Scheduled Backup

Before testing a restore, confirm the scheduled backup actually ran and completed successfully. Do not assume it worked because no alerts fired.

Check that the schedule produced a backup:

oc get backup.velero.io -n openshift-adp

Sample output:

NAME                                   AGE
demo-app-backup-20260221-1808          13h
demo-app-daily-backup-20260221183509   12h
demo-app-daily-backup-20260222020046   4h36m

Look for a backup named demo-app-daily-backup-<timestamp> with Phase: Completed. Describe the latest one to check its status:

oc describe backup.velero.io demo-app-daily-backup-20260222020046 -n openshift-adp | grep -A20 ^Status:
Status:
  Backup Item Operations Attempted:  1
  Backup Item Operations Completed:  1
  Completion Timestamp:              2026-02-22T02:01:31Z
  Expiration:                        2026-03-01T02:00:46Z
  Format Version:                    1.1.0
  Hook Status:
  Phase:  Completed
  Progress:
    Items Backed Up:  26
    Total Items:      26
  Start Timestamp:    2026-02-22T02:00:46Z
  Version:            1
Events:               <none>

Then confirm the DataUpload completed:

oc get datauploads.velero.io -n openshift-adp

BYTES DONE must match TOTAL BYTES. If it does not, the PVC data never made it to object storage and that backup cannot be used for a restore.

NAME                                         STATUS      STARTED   BYTES DONE   TOTAL BYTES   STORAGE LOCATION   AGE     NODE
demo-app-backup-20260221-1808-cdm2t          Completed   13h       113057808    113057808     default            13h     wk-03.ocp.comfythings.com
demo-app-daily-backup-20260221183509-m4cqh   Completed   12h       113057808    113057808     default            12h     wk-03.ocp.comfythings.com
demo-app-daily-backup-20260222020046-kkhjf   Completed   4h44m     113057808    113057808     default            4h45m   wk-03.ocp.comfythings.com
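
To check this programmatically rather than by eye, here is a sketch assuming jq is installed and that the printed columns map to .status.progress (they do in current Velero versions, but verify on yours):

# Prints nothing when every DataUpload is fully uploaded
oc get datauploads.velero.io -n openshift-adp -o json \
  | jq -r '.items[]
      | select(.status.progress.bytesDone != .status.progress.totalBytes)
      | .metadata.name + ": incomplete upload"'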

Also spot-check MinIO directly. Navigate to velero/backups/ and confirm the timestamped folder exists with the expected files.


Then check velero/kopia/demo-app/ and confirm pack files are present.

incremental backup files kopia

Note that after the first full backup, Kopia uploads only newly changed data chunks. If your PVC data hasn’t changed since the last backup, Kopia has no new chunks to upload, so you won’t see new pack (p*) files created in the repository. Instead, Kopia simply creates a new snapshot that references the existing stored chunks.

In short, Kopia uses content-defined chunking with block-level deduplication, meaning unchanged data is never re-uploaded, saving both time and storage.
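
If you prefer the CLI to the MinIO console for this check, the MinIO client works too; a sketch assuming an mc alias named minio already points at your MinIO instance:

# List objects in the Kopia repository; pack files appear with a p prefix, per the note above
mc ls --recursive minio/velero/kopia/demo-app/ | head -20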

Therefore, only once you have confirmed the backup is genuinely there should you proceed to testing the restore.

We will run two restore scenarios:

  • First, a full namespace deletion and restore to the original namespace, which simulates accidental deletion or complete application loss.
  • Second, a cross-namespace restore to a separate namespace, which is how you validate a backup without touching the running application at all.

Step 11: Simulate a Disaster and Restore to Original Namespace

A backup strategy that has never been tested is not a backup strategy. It is a document that might make you feel better right up until the moment you need it. This step is not optional. Running a restore in a controlled environment, before you ever need it in anger, is what separates a genuine recovery capability from a false sense of security.

Simulate a Disaster: Delete the namespace

To simulate a disaster, we will delete the entire demo-app namespace, representing complete loss of the application. Everything goes: the Deployment, the Service, the PVC definition, and the PersistentVolume that held our MySQL data.

oc delete namespace demo-app

Confirm it is gone:

oc get namespace demo-app
Error from server (NotFound): namespaces "demo-app" not found

Create the Restore CR targeting the Deleted Namespace

A Restore CR tells Velero which backup to recover from and what to include. It is the mirror of the Backup CR. You specify the backup by name, the namespaces to restore, and whether to recover persistent volumes. OADP will recreate the namespace and all its Kubernetes resources, and trigger Data Mover to pull the volume data back from MinIO (or your respective object storage).

Before initiating a restore, identify which backup you want to recover from. This decision is driven by your Recovery Point Objective (RPO): how far back in time you are willing to go.

From our verification step above, we will restore from the most recent backup taken before the disaster, demo-app-daily-backup-20260222020046.

Choose the backup that represents the recovery point you need and set it:

BACKUP_NAME="demo-app-daily-backup-20260222020046"
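
If you would rather derive the most recent Completed backup for the schedule automatically, here is a sketch assuming jq is installed:

BACKUP_NAME=$(oc get backup.velero.io -n openshift-adp -o json \
  | jq -r '[.items[]
      | select(.status.phase == "Completed")
      | select(.metadata.name | startswith("demo-app-daily-backup"))]
      | sort_by(.metadata.creationTimestamp) | last | .metadata.name')
echo "Restoring from: $BACKUP_NAME"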

Then create the Restore CR using $BACKUP_NAME:

cat <<EOF | oc apply -f -
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: ${BACKUP_NAME}-r1
  namespace: openshift-adp
spec:
  backupName: $BACKUP_NAME
  includedNamespaces:
    - demo-app
  restorePVs: true
EOF

This CR will:

  • Create a Velero Restore resource in the openshift-adp namespace
  • Use the backup named $BACKUP_NAME as the restore source
  • Restore only the demo-app namespace
  • Restore the associated PersistentVolumes (restorePVs: true)

Confirm the restore was created:

oc get restore.velero.io $BACKUP_NAME-r1 -n openshift-adp
NAME                                      AGE
demo-app-daily-backup-20260222020046-r1   11s

Then check the restoration status:

watch "oc -n openshift-adp get restore $BACKUP_NAME-r1 -o json | jq .status"

Sample output for a successful restore:


{
  "completionTimestamp": "2026-02-22T10:18:35Z",
  "hookStatus": {},
  "phase": "Completed",
  "progress": {
    "itemsRestored": 24,
    "totalItems": 24
  },
  "restoreItemOperationsAttempted": 1,
  "restoreItemOperationsCompleted": 1,
  "startTimestamp": "2026-02-22T10:17:33Z",
  "warnings": 5
}
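
If you are scripting the drill rather than watching interactively, you can block until the restore reaches the Completed phase; a sketch, assuming a reasonably recent oc that supports --for=jsonpath:

# Exits non-zero on timeout, e.g. if the restore fails or partially fails
oc wait restore.velero.io/"${BACKUP_NAME}-r1" -n openshift-adp \
  --for=jsonpath='{.status.phase}'=Completed --timeout=15m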

Describe the restore for full details, events, and any warnings:

oc describe restore.velero.io $BACKUP_NAME-r1 -n openshift-adp

Restore completed.

If the restore fails and, on describing the restore object, you see an error like:

Failure Reason: found a restore with status "InProgress" during the server starting, mark it as "Failed"

Then you are most likely running out of resources. The Velero pod is hitting its memory limit and getting OOMKilled mid-restore, causing it to restart. When it comes back up it finds the restore stuck in InProgress and immediately marks it Failed.

Check the limits on your Velero pod:

oc describe pod -n openshift-adp -l app.kubernetes.io/name=velero | grep -A 6 "Limits:"

If memory is at 256Mi or lower, that might be your problem. Update the resource allocations in your DPA accordingly and rerun the restore.
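
As a sketch, the relevant DPA excerpt looks like this (resource values are illustrative; size them for your environment):

spec:
  configuration:
    velero:
      podConfig:
        resourceAllocations:
          requests:
            cpu: 500m
            memory: 256Mi
          limits:
            cpu: "1"
            memory: 1Gi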

Also check the logs:

oc logs -n openshift-adp -l app.kubernetes.io/name=velero --previous 2>/dev/null | tail -50

With the restore reporting success, verify that everything came back: not just the namespace, but the actual application data.

Namespace exists again:

oc get namespace demo-app
NAME       STATUS   AGE
demo-app   Active   7m6s

All resources are back:

oc get all -n demo-app
NAME                         READY   STATUS    RESTARTS   AGE
pod/mysql-6d9765ddd8-7dk4r   1/1     Running   0          8m59s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   1/1     1            1           8m58s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-6d9765ddd8   1         1         1       8m59s

Confirm PVC is bound:

oc get pvc -n demo-app
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
mysql-data   Bound    pvc-450e733e-d296-4037-b3ca-c10706d1024b   5Gi        RWO            ocs-storagecluster-ceph-rbd   <unset>                 11m

Once the MySQL pod is Running, run the critical verification: data integrity.

MYSQL_POD=$(oc get pod -n demo-app -l app=mysql -o jsonpath='{.items[0].metadata.name}')
oc exec -n demo-app $MYSQL_POD -- bash -c \
  "mysql -u demouser -pdemopassword demodb -e 'SELECT * FROM customers;'"
mysql: [Warning] Using a password on the command line interface can be insecure.
id	name	email
1	Alice Nyundo	[email protected]
2	Brian Waters	[email protected]
3	Carol Ben	[email protected]

All three rows are back, exactly as they were before the namespace was deleted. The PV data survived the complete destruction and reconstruction of the application namespace.
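
Eyeballing three rows works for a demo. For larger datasets, record a checksum before the backup and compare it after the restore; a sketch using MySQL's built-in CHECKSUM TABLE:

# Run before the backup and again after the restore; the values must match
oc exec -n demo-app $MYSQL_POD -- bash -c \
  "mysql -u demouser -pdemopassword demodb -e 'CHECKSUM TABLE customers;'"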

You may notice that the restore above completed with warnings. This is acceptable. A Completed restore that shows a non-zero warning count is normal in real clusters. Common benign warnings include skipped ClusterRoleBindings, which are outside OADP’s namespace scope by default, and resources that already exist in the target.

PartiallyFailed is the status to investigate. Use the command below to see which resources failed and why:

oc describe restore.velero.io $BACKUP_NAME-r1 -n openshift-adp

Step 12: Cross-Namespace Restore for DR Drills

The previous restore validated disaster recovery by deleting the source namespace first. But in a production environment, you often want to validate a backup without touching the running application at all. OADP gives you two powerful options here:

  • Same cluster, different namespace: restore the backup into a new namespace on the same cluster, leaving the running application completely untouched
  • Different cluster entirely: point a second cluster’s OADP instance at the same MinIO backup storage location, and restore from there. This is your true cross-cluster disaster recovery and migration path, something etcd backup cannot do at all.

The namespaceMapping field in the Restore CR handles the namespace remapping in both cases. It maps the source namespace from the backup to a different target namespace on whichever cluster you are restoring to.

In this demo, we will restore to the same cluster but into a different namespace. Before applying the manifest, get your available backups and identify the one you want to restore from:

oc get backup.velero.io -n openshift-adp

Note the name of your target backup and set the BACKUP_NAME variable accordingly:

BACKUP_NAME="demo-app-daily-backup-20260222020046"

Then apply:

cat <<EOF | oc apply -f -
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: demo-app-dr-drill
  namespace: openshift-adp
spec:
  backupName: $BACKUP_NAME
  includedNamespaces:
    - demo-app
  namespaceMapping:
    demo-app: demo-app-dr-test
  restorePVs: true
EOF

This creates a full, independent replica of demo-app: all Kubernetes resources and all PV data, in a new namespace demo-app-dr-test. Your production demo-app is completely unaffected. You can query the restored database, check application endpoints, and confirm everything is consistent, all without touching production.

You can watch the restore:

watch "oc -n openshift-adp get restore demo-app-dr-drill -o json | jq .status"

Sample output:

{
  "completionTimestamp": "2026-02-22T10:50:15Z",
  "hookStatus": {},
  "phase": "Completed",
  "progress": {
    "itemsRestored": 24,
    "totalItems": 24
  },
  "restoreItemOperationsAttempted": 1,
  "restoreItemOperationsCompleted": 1,
  "startTimestamp": "2026-02-22T10:49:26Z",
  "warnings": 4
}

Note: Routes after namespace restore. OADP restores Routes using the base domain of the restore cluster. Routes without an explicit spec.host (that is, auto-generated routes) receive a new hostname based on the destination cluster’s wildcard domain. This is expected and correct. You would not want a restored DR copy answering on the same hostname as production.

Verify the DR copy and confirm the data:

oc get all -n demo-app-dr-test
NAME                         READY   STATUS    RESTARTS   AGE
pod/mysql-6d9765ddd8-7dk4r   1/1     Running   0          4m11s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   1/1     1            1           4m10s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-6d9765ddd8   1         1         1       4m11s
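
You can also compare the resource inventory of the DR copy against the source namespace; a rough sketch (restored pod and ReplicaSet names typically match the source, since the objects are restored verbatim):

diff <(oc get all -n demo-app -o name | sort) \
     <(oc get all -n demo-app-dr-test -o name | sort) \
  && echo "Resource inventories match"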

Check the DB:

MYSQL_POD_DR=$(oc get pod -n demo-app-dr-test -l app=mysql -o jsonpath='{.items[0].metadata.name}')
oc exec -n demo-app-dr-test $MYSQL_POD_DR -- bash -c \
  "mysql -u demouser -pdemopassword demodb -e 'SELECT * FROM customers;'"
mysql: [Warning] Using a password on the command line interface can be insecure.
id	name	email
1	Alice Nyundo	[email protected]
2	Brian Waters	[email protected]
3	Carol Ben	[email protected]

Clean up after the drill:

oc delete namespace demo-app-dr-test
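
Optionally, remove the Restore CR record as well once the drill is documented. Deleting the CR removes only the restore record in the cluster, not any restored or production resources:

oc delete restore.velero.io demo-app-dr-drill -n openshift-adp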

Make this drill part of your operational calendar. Run it monthly, after any major cluster change, and before any large maintenance window. A restore you have never practiced is a restore you cannot rely on.

Step 13: Monitor Backup Health with Prometheus Alerts

A backup strategy without monitoring is incomplete. You will not know a backup failed until the moment you need to restore, and by then your protection window may have been silently broken for days. Alerting on backup health is what closes that gap.

Velero exposes Prometheus metrics automatically via a service called openshift-adp-velero-metrics-svc on port 8085, which gets created when the DPA is configured. However, metrics are not scraped automatically. Making these metrics visible to OpenShift’s monitoring stack requires two things:

  • user workload monitoring must be enabled, and
  • a ServiceMonitor must be created manually to point at the Velero metrics service.

Enable User Workload Monitoring

OpenShift’s built-in monitoring stack can collect these metrics, but user workload monitoring must be enabled on your cluster first, since openshift-adp is not one of the core namespaces the platform monitoring stack scrapes by default. If you have not done this yet, follow our guide before proceeding:

Enable User Workload Monitoring in OpenShift 4.20
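
If you just need the gist, enabling it comes down to one ConfigMap in openshift-monitoring (a minimal sketch; the full guide covers retention and persistent storage options):

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF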

Verify the user workload monitoring pods are running:

oc get pods -n openshift-user-workload-monitoring

You should see prometheus-user-workload and thanos-ruler-user-workload pods in Running state.

Once user workload monitoring is enabled, you also need to label the openshift-adp namespace so the monitoring stack knows to scrape it:

oc label namespace openshift-adp openshift.io/cluster-monitoring="true"

Verify the label is applied:

oc get namespace openshift-adp --show-labels

For a deeper understanding of how OpenShift alerting works end to end, including Alertmanager configuration and routing rules, refer to our related guides.

Confirm the Velero metrics service exists

Run the command below to check that the Velero metrics service exists:

oc get svc -n openshift-adp -l app.kubernetes.io/name=velero
NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
openshift-adp-velero-metrics-svc   ClusterIP   172.30.5.161   <none>        8085/TCP   22h

Create the ServiceMonitor

A ServiceMonitor is a Custom Resource that tells the Prometheus Operator which services to scrape for metrics, how often to scrape them, and on which port and path. Without it, Prometheus has no way of knowing that the openshift-adp-velero-metrics-svc service exists or that it exposes metrics worth collecting. It must be created in the openshift-adp namespace and must select the openshift-adp-velero-metrics-svc service using the app.kubernetes.io/name=velero label:

Update the manifest below accordingly and deploy it on your cluster:

cat <<EOF | oc apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: oadp-service-monitor
  name: oadp-service-monitor
  namespace: openshift-adp
spec:
  endpoints:
    - interval: 30s
      path: /metrics
      targetPort: 8085
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: "velero"
EOF

Verify the ServiceMonitor was created:

oc get servicemonitor -n openshift-adp

Confirm metrics are being scraped

In the OpenShift web console, navigate to Observe > Targets. Use the Label filter and select: service=openshift-adp-velero-metrics-svc

You should see the oadp-service-monitor target listed in the openshift-adp namespace. Within a few minutes, the Status column should show Up, confirming that Prometheus is successfully scraping the Velero metrics endpoint.

monitor OADP velero backup metrics

Confirm the metrics are available under Observe > Metrics. Type velero_ in the expression field and the autocomplete will list all available Velero metrics.

velero metrics
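
A useful first expression: minutes since the last successful backup, per schedule (illustrative; the 26-hour overdue alert below builds on the same metric):

(time() - max by (schedule) (velero_backup_last_successful_timestamp)) / 60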

Create the PrometheusRule for Alerting

With that in place, we create a PrometheusRule that alerts on a number of Velero backup conditions.

Here is our sample PrometheusRule for these alerts:

cat oadp-backup-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: oadp-backup-alerts
  namespace: openshift-adp
  labels:
    openshift.io/prometheus-rule-evaluation-scope: leaf-prometheus
spec:
  groups:
    - name: oadp.backup.rules
      rules:

        # Backup Failures
        - alert: OADPBackupFailed
          expr: |
            sum by (schedule) (
              increase(velero_backup_failure_total{job="openshift-adp-velero-metrics-svc"}[2h])
            ) > 0
          for: 5m
          labels:
            severity: critical
            component: oadp
          annotations:
            summary: "OADP backup failed"
            description: "A backup failure was detected in the last 2 hours for schedule {{ $labels.schedule }}. Check Velero logs: oc logs deployment/velero -n openshift-adp"

        # Partial Backup Failures
        - alert: OADPBackupPartiallyFailed
          expr: |
            sum by (schedule) (
              increase(velero_backup_partial_failure_total{job="openshift-adp-velero-metrics-svc"}[2h])
            ) > 0
          for: 5m
          labels:
            severity: warning
            component: oadp
          annotations:
            summary: "OADP backup partially failed"
            description: "Partial backup failure for schedule {{ $labels.schedule }}. Inspect with: oc describe backup.velero.io -n openshift-adp"

        # Backup Validation Failures
        - alert: OADPBackupValidationFailed
          expr: |
            sum by (schedule) (
              increase(velero_backup_validation_failure_total{job="openshift-adp-velero-metrics-svc"}[2h])
            ) > 0
          for: 5m
          labels:
            severity: warning
            component: oadp
          annotations:
            summary: "OADP backup validation failed"
            description: "Backup validation failed for schedule {{ $labels.schedule }}. Check BackupStorageLocation availability: oc get backupstoragelocation -n openshift-adp"

        # Backup Overdue
        - alert: OADPBackupOverdue
          expr: |
            (time() - max by (schedule) (
              velero_backup_last_successful_timestamp{job="openshift-adp-velero-metrics-svc"}
            )) > 93600
          for: 5m
          labels:
            severity: critical
            component: oadp
          annotations:
            summary: "OADP scheduled backup overdue"
            description: "No successful backup in over 26 hours for schedule {{ $labels.schedule }}. Verify schedule and BackupStorageLocation availability."

        # No Backup Ever
        - alert: OADPNoBackupEver
          expr: |
            absent(velero_backup_last_successful_timestamp{job="openshift-adp-velero-metrics-svc"})
          for: 10m
          labels:
            severity: critical
            component: oadp
          annotations:
            summary: "No successful OADP backup recorded"
            description: "Velero has never successfully completed a backup. Ensure OADP is configured and a backup has been run."

        # Backup Storage Location Unavailable
        - alert: OADPBackupStorageLocationUnavailable
          expr: |
            velero_backup_storage_location_available{job="openshift-adp-velero-metrics-svc"} == 0
          for: 5m
          labels:
            severity: critical
            component: oadp
          annotations:
            summary: "BackupStorageLocation unavailable"
            description: "BackupStorageLocation {{ $labels.backup_storage_location }} is unavailable. Velero cannot write backups. Check MinIO/cloud credentials in openshift-adp."

        # Backup Taking Too Long
        - alert: OADPBackupTakingTooLong
          expr: |
            (
              sum by (schedule) (
                rate(velero_backup_duration_seconds_sum{job="openshift-adp-velero-metrics-svc"}[1h])
              )
              /
              sum by (schedule) (
                rate(velero_backup_duration_seconds_count{job="openshift-adp-velero-metrics-svc"}[1h])
              )
            ) > 3600
          for: 5m
          labels:
            severity: warning
            component: oadp
          annotations:
            summary: "OADP backup taking too long"
            description: "Rolling 1-hour average backup duration > 1 hour for schedule {{ $labels.schedule }}. Check PVC size, node-agent resources, or storage throughput."

        # Volume Snapshot Failures
        - alert: OADPVolumeSnapshotFailed
          expr: |
            sum by (schedule) (
              increase(velero_volume_snapshot_failure_total{job="openshift-adp-velero-metrics-svc"}[2h])
            ) > 0
          for: 5m
          labels:
            severity: critical
            component: oadp
          annotations:
            summary: "OADP volume snapshot failed"
            description: "A volume snapshot failure was detected for schedule {{ $labels.schedule }}. Verify VolumeSnapshotClass configuration and ODF health."

        # Node Agent Pods Not Ready
        - alert: OADPNodeAgentDown
          expr: |
            kube_daemonset_status_number_ready{daemonset="node-agent", namespace="openshift-adp"} 
            < 
            kube_daemonset_status_desired_number_scheduled{daemonset="node-agent", namespace="openshift-adp"}
          for: 5m
          labels:
            severity: critical
            component: oadp
          annotations:
            summary: "OADP Node Agent pods not ready"
            description: "One or more node-agent pods in openshift-adp are not ready. PV data on those nodes will be skipped. Check pods: oc get pods -n openshift-adp -l component=velero"

Where:

  • OADPBackupFailed: Fires when a full backup fails in the last 2 hours (velero_backup_failure_total); severity: critical.
  • OADPBackupPartiallyFailed: Fires when a backup partially fails (some resources/volumes not captured) (velero_backup_partial_failure_total); severity: warning.
  • OADPBackupValidationFailed: Fires when backup validation fails before the backup runs (velero_backup_validation_failure_total); severity: warning.
  • OADPBackupOverdue: Fires when no successful backup has occurred for over 26 hours for a schedule (velero_backup_last_successful_timestamp); severity: critical.
  • OADPNoBackupEver: Fires when no backup has ever completed successfully (metric absent) (velero_backup_last_successful_timestamp); severity: critical.
  • OADPBackupStorageLocationUnavailable: Fires when BackupStorageLocation is unreachable or unavailable (velero_backup_storage_location_available); severity: critical.
  • OADPBackupTakingTooLong: Fires when rolling 1-hour average backup duration exceeds 1 hour (velero_backup_duration_seconds_sum / velero_backup_duration_seconds_count); severity: warning.
  • OADPVolumeSnapshotFailed: Fires when a non-CSI volume snapshot fails (velero_volume_snapshot_failure_total); severity: critical.
  • OADPNodeAgentDown: Fires when one or more Node Agent pods are not ready (kube_daemonset_status_number_ready < kube_daemonset_status_desired_number_scheduled); severity: critical.

Let’s apply the rule:

oc apply -f oadp-backup-alerts.yaml

Verify the rule is created:

oc get prometheusrule -n openshift-adp
NAME                 AGE
oadp-backup-alerts   18s

Validating Prometheus Alerts for OADP

Creating a PrometheusRule is not enough. You need to confirm the alerts actually fire when things go wrong. This section walks through a controlled simulation that deliberately breaks the backup storage endpoint and then runs a backup against it.

The goal is to trigger real alert conditions and confirm they surface correctly in OpenShift’s Alerting UI before you ever need them in a real incident.

Step 1: Break the BackupStorageLocation

Point the DPA to a non-existent object storage endpoint.

Get the DPA:

oc get dpa -n openshift-adp
NAME         RECONCILED   AGE
oadp-minio   True         23h

Edit and set an invalid endpoint:

oc edit dpa oadp-minio -n openshift-adp

Change the value of s3Url to an invalid endpoint.
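
Here is an illustrative excerpt of the DPA with a deliberately bogus endpoint (note your original s3Url first so you can restore it in Step 4):

  backupLocations:
    - velero:
        config:
          profile: default
          region: minio
          s3ForcePathStyle: "true"
          s3Url: http://minio-does-not-exist.example.com:9000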

Save and exit the file.

Confirm the BSL flips to Unavailable within 30 seconds:

oc get backupstoragelocation.velero.io -n openshift-adp

Expected:

NAME      PHASE         LAST VALIDATED   AGE   DEFAULT
default   Unavailable   32s              23h   true

Step 2: Trigger a manual backup against the broken BSL

Create a manual backup to test the alerting:

cat <<EOF | oc apply -f -
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: demo-app-backup-alert-test
  namespace: openshift-adp
spec:
  includedNamespaces:
    - demo-app
  defaultVolumesToFsBackup: true
  storageLocation: default
  ttl: 1h0m0s
EOF

Watch the backup attempt fire and fail against the broken endpoint:

watch "oc get backup.velero.io demo-app-backup-alert-test \
  -n openshift-adp -o json | jq '.status'"
{
  "expiration": "2026-02-22T15:41:24Z",
  "formatVersion": "1.1.0",
  "phase": "FailedValidation",
  "validationErrors": [
    "backup can't be created because BackupStorageLocation default is in Unavailable status. please create a new backup after the BSL becomes available"
  ],
  "version": 1
}

Check backups:

oc get backup.velero.io -n openshift-adp

Step 3: Confirm alerts are firing

In the OpenShift web console navigate to Observe > Alerting and filter by OADP. Within 5 minutes you should see alerts:

velero backup failed alerts

And here are the corresponding alerts routed to a Slack channel:

velero backup alerts on slack channel

You can also list the configured alerts from the CLI:

oc get prometheusrule oadp-backup-alerts -n openshift-adp -o yaml | grep -A 3 "alert:"

Step 4: Restore everything after the test

After the demo, restore the correct object storage endpoint:

oc edit dpa oadp-minio -n openshift-adp

Confirm the BSL is Available again:

oc get bsl -n openshift-adp
NAME      PHASE       LAST VALIDATED   AGE   DEFAULT
default   Available   12s              27h   true

Once the BSL is back to Available, the OADPBackupValidationFailed alert will resolve automatically. No manual intervention on the alerts is needed.

Conclusion

This guide completed the full OADP backup and restore cycle on OpenShift with ODF and MinIO. Here is what you now have in place:

  • A verified backup pipeline: CSI snapshots, Data Mover, Kopia deduplication, and MinIO as the final backup target
  • A tested restore process: full namespace deletion and recovery with data integrity confirmed at the database level
  • A DR drill workflow: cross-namespace restore that validates backups without touching production
  • Prometheus alerting: real failure conditions surfaced in OpenShift’s Alerting UI before they become incidents

A backup that has never been restored is not a backup. The MySQL rows coming back intact after a full namespace deletion is the only proof that matters.

Make the DR drill a scheduled activity. Run it monthly, after OADP upgrades, and before major cluster changes. The difference between a practiced restore and an untested one shows up at the worst possible moment.
