In this tutorial, you will learn how to start, stop, or restart Ceph services. Ceph is a distributed storage system that provides object storage, block storage, and file storage capabilities. It comprises several services that work together to manage and store data across a cluster of nodes. The key Ceph services include:
- MON Service (Monitor Service): Monitors the health and status of the Ceph cluster.
- MGR Service (Manager Service): Provides a management interface for the Ceph cluster.
- OSD Service (Object Storage Daemon): Manages storage devices for storing and retrieving data as objects.
- RGW Service (RADOS Gateway Service – Object Gateway): Offers a RESTful API gateway interface for Ceph's object storage.
- MDS Service (Metadata Server): Manages metadata for the Ceph File System (CephFS).
- NFS Service: Provides Network File System (NFS) access to Ceph storage, using the NFS-Ganesha server.
- RBD Service (RADOS Block Device): Manages block storage devices within the Ceph cluster. Utilizes the rbd component and interacts with the underlying RADOS layer.
Read more on Ceph service management.
To start, stop, or restart Ceph services, proceed as follows.
How to Start, Stop, or Restart Ceph Services
In a Ceph cluster, services are organized and managed at different levels:
List Ceph Services
You can get a list of Ceph services using the ceph orch command.
To get a general overview of the Ceph services, run the command;
sudo ceph orch ls
NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager   ?:9093,9094  1/1      6m ago     2d   count:1
ceph-exporter               4/4      10m ago    2d   *
crash                       4/4      10m ago    2d   *
grafana        ?:3000       1/1      6m ago     2d   count:1
mgr                         2/2      6m ago     2d   count:2
mon                         4/5      10m ago    2d   count:5
node-exporter  ?:9100       4/4      10m ago    2d   *
osd                         3        10m ago    -
prometheus     ?:9095       1/1      6m ago     2d   count:1
To get a detailed listing of the Ceph services, run the command;
sudo ceph orch ps
NAME                      HOST        PORTS             STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
alertmanager.ceph-admin   ceph-admin  *:9093,9094       running (2d)   9m ago     2d   15.1M    -        0.25.0   c8568f914cd2  8c12c81552e5
ceph-exporter.ceph-admin  ceph-admin                    running (2d)   9m ago     2d   22.5M    -        18.2.0   10237bca3285  1ca71e41cd22
ceph-exporter.ceph-mon    ceph-mon                      running (8h)   9m ago     2d   17.5M    -        18.2.0   10237bca3285  939f1001e611
ceph-exporter.ceph-osd1   ceph-osd1                     running (2d)   3m ago     2d   18.0M    -        18.2.0   10237bca3285  a8bb422e2a79
ceph-exporter.ceph-osd2   ceph-osd2                     running (2d)   2m ago     2d   17.7M    -        18.2.0   10237bca3285  deaaa5c586d1
crash.ceph-admin          ceph-admin                    running (2d)   9m ago     2d   7432k    -        18.2.0   10237bca3285  fac0c03abfa2
crash.ceph-mon            ceph-mon                      running (8h)   9m ago     2d   7084k    -        18.2.0   10237bca3285  c6ad83687a9d
crash.ceph-osd1           ceph-osd1                     running (2d)   3m ago     2d   7119k    -        18.2.0   10237bca3285  f2c57cbaaf3d
crash.ceph-osd2           ceph-osd2                     running (2d)   2m ago     2d   7107k    -        18.2.0   10237bca3285  bf23fe62a3a6
grafana.ceph-admin        ceph-admin  *:3000            running (32h)  9m ago     2d   88.1M    -        9.4.7    2c41d148cca3  d3f2f3edc8e8
mgr.ceph-admin.ykkdly     ceph-admin  *:9283,8765,8443  running (2d)   9m ago     2d   646M     -        18.2.0   10237bca3285  9b395d873cf5
mgr.ceph-mon.grwzmv       ceph-mon    *:8443,9283,8765  running (8h)   9m ago     2d   431M     -        18.2.0   10237bca3285  a40257127c4f
mon.ceph-admin            ceph-admin                    running (2d)   9m ago     2d   497M     2048M    18.2.0   10237bca3285  39a5c79ebe49
mon.ceph-mon              ceph-mon                      running (8h)   9m ago     2d   226M     2048M    18.2.0   10237bca3285  69af76467894
mon.ceph-osd1             ceph-osd1                     running (2d)   3m ago     2d   442M     2048M    18.2.0   10237bca3285  48e379303841
mon.ceph-osd2             ceph-osd2                     running (2d)   2m ago     2d   446M     2048M    18.2.0   10237bca3285  1a5ac19d09c2
node-exporter.ceph-admin  ceph-admin  *:9100            running (2d)   9m ago     2d   9940k    -        1.5.0    0da6a335fe13  f8a22cdbc222
node-exporter.ceph-mon    ceph-mon    *:9100            running (8h)   9m ago     2d   8991k    -        1.5.0    0da6a335fe13  bc7bd68616a8
node-exporter.ceph-osd1   ceph-osd1   *:9100            running (2d)   3m ago     2d   9564k    -        1.5.0    0da6a335fe13  0e26f9a5cd1e
node-exporter.ceph-osd2   ceph-osd2   *:9100            running (2d)   2m ago     2d   9075k    -        1.5.0    0da6a335fe13  b557f82a9e1d
osd.0                     ceph-mon                      running (8h)   9m ago     2d   77.4M    4096M    18.2.0   10237bca3285  fbbb2be86316
osd.1                     ceph-osd1                     running (2d)   3m ago     2d   99.0M    2356M    18.2.0   10237bca3285  4c930eb2c71e
osd.2                     ceph-osd2                     running (2d)   2m ago     2d   96.3M    2356M    18.2.0   10237bca3285  94551a6b5b94
prometheus.ceph-admin     ceph-admin  *:9095            running (2d)   9m ago     2d   256M     -        2.43.0   a07b618ecd1d  3b63ed00c55e
You can list the services running on a specific node only by specifying the node name;
ceph orch ps HOST
ceph orch ps ceph-osd1
NAME                     HOST       PORTS   STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
ceph-exporter.ceph-osd1  ceph-osd1          running (2d)  6m ago     2d   18.0M    -        18.2.0   10237bca3285  a8bb422e2a79
crash.ceph-osd1          ceph-osd1          running (2d)  6m ago     2d   7119k    -        18.2.0   10237bca3285  f2c57cbaaf3d
mon.ceph-osd1            ceph-osd1          running (2d)  6m ago     2d   442M     2048M    18.2.0   10237bca3285  48e379303841
node-exporter.ceph-osd1  ceph-osd1  *:9100  running (2d)  6m ago     2d   9564k    -        1.5.0    0da6a335fe13  0e26f9a5cd1e
osd.1                    ceph-osd1          running (2d)  6m ago     2d   99.0M    2356M    18.2.0   10237bca3285  4c930eb2c71e
To check a specific service on a specific node;
ceph orch ps ceph-osd1 --service_name mon
NAME           HOST       PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mon.ceph-osd1  ceph-osd1         running (3m)  3m ago     2d   19.6M    2048M    18.2.0   10237bca3285  b9bf54ad48d9
Manage Ceph Services at a Cluster Level
The cluster level represents the overall orchestration and coordination of all nodes and services to create a unified storage infrastructure. At the cluster level, Ceph services collaborate to ensure data redundancy, fault tolerance, and efficient storage management. Key activities at the cluster level include maintaining cluster maps, distributing data across OSDs, handling failover scenarios, and managing the overall health of the Ceph storage system.
To start, stop, or restart Ceph services at a cluster level, you use the ceph orch command.
The command syntax to start, stop, or restart a cluster service is;
ceph orch <start|stop|restart> <service_name>
For example, to stop, start, or restart all OSDs in the cluster;
ceph orch stop osd
ceph orch start osd
ceph orch restart osd
Note that you cannot stop the mgr or mon services for the entire cluster. Stopping these services cluster-wide would make the cluster inaccessible. You can, however, issue the restart command, which schedules a node-by-node restart.
Manage Ceph Services at a Node Level
At the node level, Ceph services are associated with individual servers or nodes in the cluster. Each node typically runs multiple daemons, which collaborate to provide the necessary storage services. Nodes can host OSDs, MONs, MGRs, RGWs, or MDSs, depending on the specific role assigned to them in the Ceph cluster.
You can use the systemctl command to start, stop, or restart Ceph services at a node level.
List the Ceph systemd services running on a specific node;
sudo systemctl list-units "*ceph*"
Sample output on my ceph-osd1 node;
UNIT                                                                       LOAD   ACTIVE SUB     DESCRIPTION
ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@ceph-exporter.ceph-osd1.service  loaded active running Ceph ceph-exporter.ceph-osd1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@crash.ceph-osd1.service          loaded active running Ceph crash.ceph-osd1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@mon.ceph-osd1.service            loaded active running Ceph mon.ceph-osd1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@node-exporter.ceph-osd1.service  loaded active running Ceph node-exporter.ceph-osd1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@osd.1.service                    loaded active running Ceph osd.1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
system-ceph\x2d70d227de\x2d83e3\x2d11ee\x2d9dda\x2dff8b7941e415.slice      loaded active active  Slice /system/ceph-70d227de-83e3-11ee-9dda-ff8b7941e415
ceph-70d227de-83e3-11ee-9dda-ff8b7941e415.target                           loaded active active  Ceph cluster 70d227de-83e3-11ee-9dda-ff8b7941e415
ceph.target                                                                loaded active active  All Ceph clusters and services

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

8 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
From the output above, the UNIT field shows the service names in the format ceph-FSID@SERVICE_TYPE.ID.service, where:
- FSID is the cluster's unique identifier (fsid).
- SERVICE_TYPE.ID is the Ceph systemd service name, which corresponds to the NAME field of the ceph orch ps command.
So, you can control and manage each individual Ceph service on each node using the systemctl command;
systemctl stop ceph-FSID@SERVICE_TYPE.ID
systemctl start ceph-FSID@SERVICE_TYPE.ID
systemctl restart ceph-FSID@SERVICE_TYPE.ID
systemctl status ceph-FSID@SERVICE_TYPE.ID
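As a sketch, the unit name can be composed from the cluster FSID and the daemon name; the values below are taken from the example cluster in this guide, so substitute your own:

```shell
# Compose the systemd unit name for a single Ceph daemon.
FSID="70d227de-83e3-11ee-9dda-ff8b7941e415"   # from 'sudo ceph fsid'
DAEMON="osd.1"                                # NAME column of 'ceph orch ps'
UNIT="ceph-${FSID}@${DAEMON}.service"
echo "$UNIT"   # prints ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@osd.1.service
# sudo systemctl restart "$UNIT"
```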
If you want to manage all the Ceph services on a node, across all clusters, then use the ceph.target service unit.
systemctl stop ceph.target
systemctl start ceph.target
systemctl restart ceph.target
systemctl status ceph.target
If you are running multiple clusters, the services associated with each cluster carry their respective cluster FSIDs. So, if you want to manage all services for a specific cluster on a node, then;
systemctl <start|stop|restart|status> ceph-FSID.target
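For illustration, the per-cluster target name is just the FSID wrapped in the ceph- prefix and .target suffix; the FSID below is the example cluster's, so substitute your own:

```shell
# Compose the per-cluster systemd target name from the cluster FSID.
FSID="70d227de-83e3-11ee-9dda-ff8b7941e415"   # from 'sudo ceph fsid'
TARGET="ceph-${FSID}.target"
echo "$TARGET"   # prints ceph-70d227de-83e3-11ee-9dda-ff8b7941e415.target
# sudo systemctl restart "$TARGET"
```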
Manage Ceph Services at a Daemon Level
Ceph employs a decentralized architecture where various components, called daemons, work together to provide different storage services. These daemons are responsible for specific tasks within the Ceph cluster. Here are some key Ceph daemons:
- OSD (Object Storage Daemon): Manages the storage devices and is responsible for storing and retrieving data as objects.
- MON (Monitor Daemon): Maintains maps of the cluster state, including OSD maps and monitor maps. Monitors communicate with each other to reach a consensus on the state of the cluster.
- MGR (Manager Daemon): Provides a management interface for the Ceph cluster, offering RESTful APIs and a web-based dashboard for monitoring and managing the cluster.
- RGW (RADOS Gateway Daemon): Facilitates access to Ceph object storage through S3 and Swift-compatible APIs.
- MDS (Metadata Server Daemon): Manages metadata for Ceph File System (CephFS), facilitating file access and directory operations.
The ceph orch daemon command in the Ceph orchestrator is used to start, stop, or restart Ceph services at a daemon level. It allows you to interact with and perform various operations on individual Ceph daemons deployed in the cluster, and provides subcommands for tasks such as starting, stopping, restarting, and reconfiguring daemons.
Thus, the command syntax is;
ceph orch daemon <start|stop|restart> SERVICE_NAME
You can get the SERVICE_NAME from the NAME column of the ceph orch ps command output. For example, to restart the Grafana daemon;
ceph orch daemon restart grafana.ceph-admin
For more options, check;
ceph orch daemon -h
How to Gracefully Stop and Start Whole Ceph Cluster for Maintenance
HEADS UP! POTENTIAL DATA LOSS/CORRUPTION! PROCEED AT YOUR OWN RISK!
Stopping the entire Ceph cluster involves stopping all Ceph daemon services across the MONs (Monitors), OSDs (Object Storage Daemons), MGRs (Managers), and other components.
Be cautious when stopping a Ceph cluster, especially in production environments, to avoid potential data loss or corruption. Ensure that you have proper backups, and the cluster is not serving critical workloads.
The specific steps might depend on how Ceph was deployed in your environment (e.g. using cephadm, manual deployment, or other methods). You can check our Ceph cluster deployment guides.
If you are sure you want to proceed, then proceed as follows.
Verify healthy cluster state
Before initiating the shutdown process, ensure that the Ceph cluster is in a healthy state. Check for any ongoing maintenance tasks, data replication issues, or OSD failures.
sudo ceph -s

  cluster:
    id:     70d227de-83e3-11ee-9dda-ff8b7941e415
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum ceph-admin,ceph-mon,ceph-osd1,ceph-osd2 (age 102m)
    mgr: ceph-admin.ykkdly(active, since 2d), standbys: ceph-mon.grwzmv
    osd: 3 osds: 3 up (since 38m), 3 in (since 2d)

  data:
    pools:   2 pools, 33 pgs
    objects: 45 objects, 14 MiB
    usage:   191 MiB used, 300 GiB / 300 GiB avail
    pgs:     33 active+clean
Backup your Data
Ensure that you have a backup of your data in case of any unexpected issues during the shutdown process.
Stop data writes on Ceph Cluster
Stop any applications or processes that are writing data to the Ceph cluster. This prevents new data from being written while the cluster is shutting down, reducing the risk of data loss or corruption.
If you have any clients using the cluster, stop or power them off before proceeding.
Prepare the Object Storage Devices (OSDs) for Shutdown
Modify configuration parameters of OSDs in the Ceph cluster in preparation for cluster shutdown.
Prevent OSDs from being marked out of the cluster (useful during maintenance);
ceph osd set noout
This means that OSDs will not be marked as "out" even if they stop responding during the shutdown.
Disable backfill operations in the cluster;
ceph osd set nobackfill
This command sets the nobackfill flag cluster-wide, which prevents OSDs from performing backfill (bulk data copy) operations.
Disable cluster OSD recovery operations;
ceph osd set norecover
This command sets the norecover flag cluster-wide, which prevents OSDs from running recovery operations.
Disable Ceph cluster rebalance operations;
ceph osd set norebalance
This command sets the norebalance flag cluster-wide, which prevents OSDs from participating in rebalancing operations.
Prevent OSDs from being marked as "down", to avoid unnecessary cluster adjustments;
ceph osd set nodown
Stop Ceph cluster read and write operations;
ceph osd set pause
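The six flags above can also be set with a small loop; this is a sketch that only prints the commands (drop the echo to actually apply them to the cluster):

```shell
# Print the 'ceph osd set' commands for all maintenance flags, in the
# same order used above; remove 'echo' to run them for real.
set_flags() {
  for flag in noout nobackfill norecover norebalance nodown pause; do
    echo "ceph osd set $flag"
  done
}
set_flags
```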
You can verify that all these have been effected on Ceph OSDs by checking Ceph cluster status;
sudo ceph -s

  cluster:
    id:     70d227de-83e3-11ee-9dda-ff8b7941e415
    health: HEALTH_WARN
            pauserd,pausewr,nodown,noout,nobackfill,norebalance flag(s) set

  services:
    mon: 4 daemons, quorum ceph-admin,ceph-mon,ceph-osd1,ceph-osd2 (age 2h)
    mgr: ceph-admin.ykkdly(active, since 2d), standbys: ceph-mon.grwzmv
    osd: 3 osds: 3 up (since 109m), 3 in (since 2d)
         flags pauserd,pausewr,nodown,noout,nobackfill,norebalance

  data:
    pools:   2 pools, 33 pgs
    objects: 45 objects, 14 MiB
    usage:   191 MiB used, 300 GiB / 300 GiB avail
    pgs:     33 active+clean
Shut down the Ceph Cluster Nodes
Log in to each Ceph cluster node and shut the nodes down in the following order (ensure the IP addresses are assigned permanently to the nodes);
- Ceph Service nodes: If you are running separate nodes for services such as RGW or other special services, shut them down first: systemctl poweroff
- Ceph OSD nodes: Log in to each OSD node and gracefully shut it down: systemctl poweroff
- Ceph MON nodes: Log in to each MON node and gracefully shut it down: systemctl poweroff
- Ceph MGR nodes: Log in to each MGR node and gracefully shut it down: systemctl poweroff
Bring the Ceph Cluster Back Up
After the maintenance, it is now time to bring up the cluster.
To begin with, power up the Ceph cluster nodes in the reverse of the order in which you shut them down above.
- Power on Ceph MGR Nodes.
- Power on Ceph MON nodes.
- Power on Ceph OSD nodes.
- Power on Ceph Service nodes.
Once the nodes are up, ensure that the time is synchronized across all the nodes (NTP can be used).
Unset all the flags set above, in the reverse order;
ceph osd unset pause
ceph osd unset nodown
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout
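Mirroring the earlier set loop, the unset commands in reverse order can be sketched as follows (drop the echo to actually apply them):

```shell
# Print the 'ceph osd unset' commands in the reverse of the order the
# flags were set, so 'pause' is lifted first and 'noout' last.
unset_flags() {
  for flag in pause nodown norebalance norecover nobackfill noout; do
    echo "ceph osd unset $flag"
  done
}
unset_flags
```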
Once all is done, confirm the health of your cluster.
sudo ceph -s

  cluster:
    id:     70d227de-83e3-11ee-9dda-ff8b7941e415
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum ceph-admin,ceph-mon,ceph-osd1,ceph-osd2 (age 60s)
    mgr: ceph-admin.ykkdly(active, since 2d), standbys: ceph-mon.grwzmv
    osd: 3 osds: 3 up (since 56s), 3 in (since 2d)

  data:
    pools:   2 pools, 33 pgs
    objects: 45 objects, 14 MiB
    usage:   125 MiB used, 300 GiB / 300 GiB avail
    pgs:     33 active+clean
Verify and validate everything to ensure your cluster is up and running as expected.
That concludes our guide on how to start, stop, and restart Ceph cluster services.