How to Start, Stop, or Restart Ceph Services

In this tutorial, you will learn how to start, stop, or restart Ceph services. Ceph is a distributed storage system that provides object, block, and file storage capabilities. It comprises several services that work together to manage and store data across a cluster of nodes. The key Ceph services include:

  1. MON Service (Monitor Service):
    • Monitors the health and status of the Ceph cluster.
  2. MGR Service (Manager Service):
    • Provides a management interface for the Ceph cluster.
  3. OSD Service (Object Storage Daemon):
    • Manages storage devices for storing and retrieving data as objects.
  4. RGW Service (RADOS Gateway Service – Object Gateway):
    • Offers a RESTful API gateway interface for Ceph’s object storage.
  5. MDS Service (Metadata Server):
    • Manages metadata for the Ceph File System (CephFS).
  6. NFS Service:
    • Provides Network File System (NFS) access to Ceph storage.
    • Utilizes the nfs-ganesha daemon.
  7. RBD Service (RADOS Block Device):
    • Manages block storage devices within the Ceph cluster.
    • Utilizes the rbd component and interacts with the rados and ceph-osd daemons.

Read more on Ceph service management.

To start, stop, or restart Ceph services, proceed as follows.

How to Start, Stop, or Restart Ceph Services

In a Ceph cluster, services are organized and managed at different levels:

  • cluster
  • node
  • daemon

List Ceph Services

You can get a list of Ceph services using the ceph orch command.

To get a general overview of the Ceph services, run the command;

sudo ceph orch ls
NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT    
alertmanager   ?:9093,9094      1/1  6m ago     2d   count:1      
ceph-exporter                   4/4  10m ago    2d   *            
crash                           4/4  10m ago    2d   *            
grafana        ?:3000           1/1  6m ago     2d   count:1      
mgr                             2/2  6m ago     2d   count:2      
mon                             4/5  10m ago    2d   count:5      
node-exporter  ?:9100           4/4  10m ago    2d   *            
osd                               3  10m ago    -      
prometheus     ?:9095           1/1  6m ago     2d   count:1 
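
You can also narrow the listing down to a single service type, passed as a positional argument (this should be supported on recent cephadm releases). For example, to list only the OSD service;

sudo ceph orch ls osd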

To get a detailed listing of the Ceph services, run the command;

sudo ceph orch ps
NAME                      HOST        PORTS             STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
alertmanager.ceph-admin   ceph-admin  *:9093,9094       running (2d)      9m ago   2d    15.1M        -  0.25.0   c8568f914cd2  8c12c81552e5  
ceph-exporter.ceph-admin  ceph-admin                    running (2d)      9m ago   2d    22.5M        -  18.2.0   10237bca3285  1ca71e41cd22  
ceph-exporter.ceph-mon    ceph-mon                      running (8h)      9m ago   2d    17.5M        -  18.2.0   10237bca3285  939f1001e611  
ceph-exporter.ceph-osd1   ceph-osd1                     running (2d)      3m ago   2d    18.0M        -  18.2.0   10237bca3285  a8bb422e2a79  
ceph-exporter.ceph-osd2   ceph-osd2                     running (2d)      2m ago   2d    17.7M        -  18.2.0   10237bca3285  deaaa5c586d1  
crash.ceph-admin          ceph-admin                    running (2d)      9m ago   2d    7432k        -  18.2.0   10237bca3285  fac0c03abfa2  
crash.ceph-mon            ceph-mon                      running (8h)      9m ago   2d    7084k        -  18.2.0   10237bca3285  c6ad83687a9d  
crash.ceph-osd1           ceph-osd1                     running (2d)      3m ago   2d    7119k        -  18.2.0   10237bca3285  f2c57cbaaf3d  
crash.ceph-osd2           ceph-osd2                     running (2d)      2m ago   2d    7107k        -  18.2.0   10237bca3285  bf23fe62a3a6  
grafana.ceph-admin        ceph-admin  *:3000            running (32h)     9m ago   2d    88.1M        -  9.4.7    2c41d148cca3  d3f2f3edc8e8  
mgr.ceph-admin.ykkdly     ceph-admin  *:9283,8765,8443  running (2d)      9m ago   2d     646M        -  18.2.0   10237bca3285  9b395d873cf5  
mgr.ceph-mon.grwzmv       ceph-mon    *:8443,9283,8765  running (8h)      9m ago   2d     431M        -  18.2.0   10237bca3285  a40257127c4f  
mon.ceph-admin            ceph-admin                    running (2d)      9m ago   2d     497M    2048M  18.2.0   10237bca3285  39a5c79ebe49  
mon.ceph-mon              ceph-mon                      running (8h)      9m ago   2d     226M    2048M  18.2.0   10237bca3285  69af76467894  
mon.ceph-osd1             ceph-osd1                     running (2d)      3m ago   2d     442M    2048M  18.2.0   10237bca3285  48e379303841  
mon.ceph-osd2             ceph-osd2                     running (2d)      2m ago   2d     446M    2048M  18.2.0   10237bca3285  1a5ac19d09c2  
node-exporter.ceph-admin  ceph-admin  *:9100            running (2d)      9m ago   2d    9940k        -  1.5.0    0da6a335fe13  f8a22cdbc222  
node-exporter.ceph-mon    ceph-mon    *:9100            running (8h)      9m ago   2d    8991k        -  1.5.0    0da6a335fe13  bc7bd68616a8  
node-exporter.ceph-osd1   ceph-osd1   *:9100            running (2d)      3m ago   2d    9564k        -  1.5.0    0da6a335fe13  0e26f9a5cd1e  
node-exporter.ceph-osd2   ceph-osd2   *:9100            running (2d)      2m ago   2d    9075k        -  1.5.0    0da6a335fe13  b557f82a9e1d  
osd.0                     ceph-mon                      running (8h)      9m ago   2d    77.4M    4096M  18.2.0   10237bca3285  fbbb2be86316  
osd.1                     ceph-osd1                     running (2d)      3m ago   2d    99.0M    2356M  18.2.0   10237bca3285  4c930eb2c71e  
osd.2                     ceph-osd2                     running (2d)      2m ago   2d    96.3M    2356M  18.2.0   10237bca3285  94551a6b5b94  
prometheus.ceph-admin     ceph-admin  *:9095            running (2d)      9m ago   2d     256M        -  2.43.0   a07b618ecd1d  3b63ed00c55e

You can list the services running on a specific node by specifying the node name.

ceph orch ps HOST

For example;

ceph orch ps ceph-osd1
NAME                     HOST       PORTS   STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
ceph-exporter.ceph-osd1  ceph-osd1          running (2d)     6m ago   2d    18.0M        -  18.2.0   10237bca3285  a8bb422e2a79  
crash.ceph-osd1          ceph-osd1          running (2d)     6m ago   2d    7119k        -  18.2.0   10237bca3285  f2c57cbaaf3d  
mon.ceph-osd1            ceph-osd1          running (2d)     6m ago   2d     442M    2048M  18.2.0   10237bca3285  48e379303841  
node-exporter.ceph-osd1  ceph-osd1  *:9100  running (2d)     6m ago   2d    9564k        -  1.5.0    0da6a335fe13  0e26f9a5cd1e  
osd.1                    ceph-osd1          running (2d)     6m ago   2d    99.0M    2356M  18.2.0   10237bca3285  4c930eb2c71e

To check a specific service on a specific node;

ceph orch ps ceph-osd1 --service_name mon
NAME           HOST       PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
mon.ceph-osd1  ceph-osd1         running (3m)     3m ago   2d    19.6M    2048M  18.2.0   10237bca3285  b9bf54ad48d9
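
You can also filter by daemon type alone, across all nodes. For example, to list all OSD daemons in the cluster (the --daemon_type flag should be available on recent cephadm releases);

ceph orch ps --daemon_type osd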

Manage Ceph Services at a Cluster Level

The cluster level represents the overall orchestration and coordination of all nodes and services to create a unified storage infrastructure. At the cluster level, Ceph services collaborate to ensure data redundancy, fault tolerance, and efficient storage management. Key activities at the cluster level include maintaining cluster maps, distributing data across OSDs, handling failover scenarios, and managing the overall health of the Ceph storage system.

To start, stop, or restart Ceph services at the cluster level, use the ceph orch command.

The command syntax to start, stop, or restart a cluster service is;

ceph orch <start|stop|restart> <service_name>

For example, to stop, start, or restart all OSDs in the cluster;

ceph orch stop osd
ceph orch start osd
ceph orch restart osd
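
Similarly, to restart the monitor service across the cluster (using the service name shown under NAME in the ceph orch ls output above);

ceph orch restart mon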

Note that you cannot stop the mgr or mon services for the entire cluster; stopping these services cluster-wide would make the cluster inaccessible. You can, however, issue the restart command to schedule a node-by-node restart.

Manage Ceph Services at a Node Level

At the node level, Ceph services are associated with individual servers or nodes in the cluster. Each node typically runs multiple daemons, which collaborate to provide the necessary storage services. Nodes can host OSDs, MONs, MGRs, RGWs, or MDSs, depending on the specific role assigned to them in the Ceph cluster.

You can use the systemctl command to start, stop, or restart Ceph services at the node level.

List the Ceph systemd services running on a specific node;

sudo systemctl list-units "*ceph*"

Sample output on my ceph-osd1 node;

  UNIT                                                                      LOAD   ACTIVE SUB     DESCRIPTION                                                          
  ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@ceph-exporter.ceph-osd1.service loaded active running Ceph ceph-exporter.ceph-osd1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
  ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@crash.ceph-osd1.service         loaded active running Ceph crash.ceph-osd1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
  ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@mon.ceph-osd1.service          loaded active running Ceph mon.ceph-osd1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
  ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@node-exporter.ceph-osd1.service loaded active running Ceph node-exporter.ceph-osd1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
  ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@osd.1.service                  loaded active running Ceph osd.1 for 70d227de-83e3-11ee-9dda-ff8b7941e415
  system-ceph\x2d70d227de\x2d83e3\x2d11ee\x2d9dda\x2dff8b7941e415.slice     loaded active active  Slice /system/ceph-70d227de-83e3-11ee-9dda-ff8b7941e415              
  ceph-70d227de-83e3-11ee-9dda-ff8b7941e415.target                          loaded active active  Ceph cluster 70d227de-83e3-11ee-9dda-ff8b7941e415
  ceph.target                                                               loaded active active  All Ceph clusters and services

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
8 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

From the output above, the UNIT field shows the service names in the format;

ceph-FSID@SERVICE_TYPE.ID.service

Where;

  • FSID is the Ceph File System Identifier, a unique identifier for the cluster.
  • SERVICE_TYPE.ID is the Ceph systemd service name, which corresponds to the NAME field of the ceph orch ps output.

So, you can manage each individual Ceph service on a node using the systemctl command;

systemctl stop ceph-FSID@SERVICE_TYPE.ID
systemctl start ceph-FSID@SERVICE_TYPE.ID
systemctl restart ceph-FSID@SERVICE_TYPE.ID
systemctl status ceph-FSID@SERVICE_TYPE.ID
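
For example, using the cluster FSID and the daemon names from the listing above, you would restart osd.1 on the ceph-osd1 node as follows;

sudo systemctl restart ceph-70d227de-83e3-11ee-9dda-ff8b7941e415@osd.1.service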

If you want to manage all Ceph cluster services on a node, use the ceph.target unit.

systemctl stop ceph.target
systemctl start ceph.target
systemctl restart ceph.target
systemctl status ceph.target

If you are running multiple clusters, the services associated with each cluster carry that cluster's FSID. So, to manage all services of a specific cluster on a node;

systemctl <start|stop|restart|status> ceph-FSID.target
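
For example, to restart all services belonging to the cluster used in this guide;

sudo systemctl restart ceph-70d227de-83e3-11ee-9dda-ff8b7941e415.target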

Manage Ceph Services at a Daemon Level

Ceph employs a decentralized architecture where various components, called daemons, work together to provide different storage services. These daemons are responsible for specific tasks within the Ceph cluster. Here are some key Ceph daemons:

  1. OSD (Object Storage Daemon): Manages the storage devices and is responsible for storing and retrieving data as objects.
  2. MON (Monitor Daemon): Maintains maps of the cluster state, including OSD maps and monitor maps. Monitors communicate with each other to reach a consensus on the state of the cluster.
  3. MGR (Manager Daemon): Provides a management interface for the Ceph cluster, offering RESTful APIs and a web-based dashboard for monitoring and managing the cluster.
  4. RGW (RADOS Gateway Daemon): Facilitates access to Ceph object storage through S3 and Swift-compatible APIs.
  5. MDS (Metadata Server Daemon): Manages metadata for Ceph File System (CephFS), facilitating file access and directory operations.

The ceph orch daemon command in the Ceph orchestrator is used to start, stop, or restart Ceph services at the daemon level. It lets you interact with and perform operations on individual Ceph daemons deployed in the cluster, with subcommands for tasks such as starting, stopping, restarting, and reconfiguring daemons.

Thus, the command syntax is;

ceph orch daemon <start|stop|restart> DAEMON_NAME

You can get the DAEMON_NAME from the NAME column of the ceph orch ps output.

ceph orch daemon restart grafana.ceph-admin
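
Similarly, to restart a single OSD daemon, for example osd.1 seen in the ceph orch ps output above;

ceph orch daemon restart osd.1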

For more options, check;

ceph orch daemon -h

How to Gracefully Stop and Start Whole Ceph Cluster for Maintenance

HEADS UP! POTENTIAL DATA LOSS/CORRUPTION! PROCEED AT YOUR OWN RISK!

Stopping the entire Ceph cluster involves stopping all Ceph daemon services across the MONs (Monitors), OSDs (Object Storage Daemons), MGRs (Managers), and other components.

Be cautious when stopping a Ceph cluster, especially in production environments, to avoid potential data loss or corruption. Ensure that you have proper backups, and the cluster is not serving critical workloads.

The specific steps might depend on how Ceph was deployed in your environment (e.g. using cephadm, manual deployment, or other methods). You can check our Ceph cluster deployment guides.

If you are sure you want to proceed, follow the steps below.

Verify healthy cluster state

Before initiating the shutdown process, ensure that the Ceph cluster is in a healthy state. Check for any ongoing maintenance tasks, data replication issues, or OSD failures.

ceph -s
  cluster:
    id:     70d227de-83e3-11ee-9dda-ff8b7941e415
    health: HEALTH_OK
 
  services:
    mon: 4 daemons, quorum ceph-admin,ceph-mon,ceph-osd1,ceph-osd2 (age 102m)
    mgr: ceph-admin.ykkdly(active, since 2d), standbys: ceph-mon.grwzmv
    osd: 3 osds: 3 up (since 38m), 3 in (since 2d)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 45 objects, 14 MiB
    usage:   191 MiB used, 300 GiB / 300 GiB avail
    pgs:     33 active+clean

Or simply;

ceph health
HEALTH_OK
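
If you are scripting the maintenance, a minimal pre-shutdown gate like the sketch below (assuming the ceph CLI and an admin keyring are available on the node) can stop you from proceeding with an unhealthy cluster;

#!/bin/bash
# Pre-shutdown gate (sketch): proceed only if the cluster reports HEALTH_OK.
if [ "$(ceph health)" != "HEALTH_OK" ]; then
  echo "Cluster is not HEALTH_OK; resolve issues before shutting down." >&2
  exit 1
fi
echo "Cluster healthy; safe to continue with maintenance."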

Back Up Your Data

Ensure that you have a backup of your data in case of any unexpected issues during the shutdown process.

Stop data writes on Ceph Cluster

Stop any applications or processes that are writing data to the Ceph cluster. This prevents new data from being written while the cluster is shutting down, reducing the risk of data loss or corruption.

If you have any clients using the cluster, stop or power them off before proceeding.

Prepare the Object Storage Devices (OSDs) for Shutdown

Set the following OSD flags on the Ceph cluster in preparation for the shutdown.

Prevent OSDs from being treated as out of the cluster (useful during maintenance);

ceph osd set noout

This means that OSDs will not be marked "out" of the cluster even if they stop responding during the shutdown.

Disable backfill operations in the cluster;

ceph osd set nobackfill

This sets the cluster-wide nobackfill flag, which suspends backfill (bulk data copy) operations between OSDs.

Disable cluster recovery operations;

ceph osd set norecover

This sets the cluster-wide norecover flag, which suspends recovery of degraded objects.

Disable Ceph cluster rebalance operations;

ceph osd set norebalance

This sets the cluster-wide norebalance flag, which suspends data rebalancing across OSDs.

Prevent OSDs from being marked as "down", to avoid unnecessary cluster adjustments;

ceph osd set nodown

Stop Ceph cluster read and write operations;

ceph osd set pause
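
If you prefer, you can set all of the above flags in one short loop (a convenience sketch using the same flags, in the same order);

for flag in noout nobackfill norecover norebalance nodown pause; do
  sudo ceph osd set $flag
done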

You can verify that all these flags have taken effect by checking the Ceph cluster status;

ceph -s
  cluster:
    id:     70d227de-83e3-11ee-9dda-ff8b7941e415
    health: HEALTH_WARN
            pauserd,pausewr,nodown,noout,nobackfill,norebalance flag(s) set
 
  services:
    mon: 4 daemons, quorum ceph-admin,ceph-mon,ceph-osd1,ceph-osd2 (age 2h)
    mgr: ceph-admin.ykkdly(active, since 2d), standbys: ceph-mon.grwzmv
    osd: 3 osds: 3 up (since 109m), 3 in (since 2d)
         flags pauserd,pausewr,nodown,noout,nobackfill,norebalance
 
  data:
    pools:   2 pools, 33 pgs
    objects: 45 objects, 14 MiB
    usage:   191 MiB used, 300 GiB / 300 GiB avail
    pgs:     33 active+clean

Shut Down the Ceph Cluster Nodes

Log in to the Ceph cluster nodes and shut them down in the following order;

(Ensure the IP addresses are assigned permanently to the nodes)

  1. Ceph service nodes: If you are running separate nodes for services such as RGW or other special services, shut them down first: systemctl poweroff
  2. Ceph OSD nodes: Log in to each OSD node and gracefully shut it down: systemctl poweroff
  3. Ceph MON nodes: Log in to each MON node and gracefully shut it down: systemctl poweroff
  4. Ceph MGR nodes: Log in to each MGR node and gracefully shut it down: systemctl poweroff

Bring the Ceph Cluster Back Up

After the maintenance, it is now time to bring up the cluster.

To begin with, power up the Ceph cluster nodes in the reverse order to that in which you shut them down:

  1. Power on Ceph MGR Nodes.
  2. Power on Ceph MON nodes.
  3. Power on OSD nodes.
  4. Power on the service nodes.

Once the nodes are up, ensure that the time is synchronized across all the nodes (NTP can be used).

date
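
On systemd-based nodes, you can also quickly confirm that the clock is NTP-synchronized with timedatectl (the exact output wording varies by systemd version);

timedatectl | grep -i synchronized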

Unset all the flags set above, in the reverse order;

ceph osd unset pause
ceph osd unset nodown
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout
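
As with setting the flags, you can unset them in a single loop (same flags, reverse order);

for flag in pause nodown norebalance norecover nobackfill noout; do
  sudo ceph osd unset $flag
done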

Once all is done, confirm the health of your cluster.

ceph -s
  cluster:
    id:     70d227de-83e3-11ee-9dda-ff8b7941e415
    health: HEALTH_OK
 
  services:
    mon: 4 daemons, quorum ceph-admin,ceph-mon,ceph-osd1,ceph-osd2 (age 60s)
    mgr: ceph-admin.ykkdly(active, since 2d), standbys: ceph-mon.grwzmv
    osd: 3 osds: 3 up (since 56s), 3 in (since 2d)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 45 objects, 14 MiB
    usage:   125 MiB used, 300 GiB / 300 GiB avail
    pgs:     33 active+clean

Verify and validate everything to ensure your cluster is up and running as expected.

That concludes our guide on how to start, stop, and restart Ceph cluster services.
