Setting up a Ceph cluster with Rook on a Raspberry Pi k3s cluster
Introduction
Due to the low IOPS of the Raspberry Pis’ microSD cards, I decided to switch to external NVMe SSDs. In addition to the boot partition, each SSD has a large unused partition that I use to provision Kubernetes Persistent Volumes through a Rook Ceph cluster. I chose this approach as a learning opportunity to explore Kubernetes-native storage. Since my cluster isn’t running any critical applications, I don’t need data replication, so Local Persistent Volumes would have been sufficient. However, in production environments, the following requirements often arise:
- High availability and fault tolerance
- Dynamic scaling
- Unified storage (object, block, and file)
- Self-healing
Local Persistent Volumes cannot meet these needs, making Kubernetes-native storage solutions the preferred choice for production workloads. In this post, I’ll share the challenges I encountered while setting up the latest version of Rook Ceph on Raspberry Pis and how I overcame them. I hope you find it both informative and helpful.
Setup
Before installing Rook and deploying a Ceph cluster, a few prerequisites must be met. The most important ones are:
- An unformatted partition or logical volume. In my setup, I used the unformatted /dev/sda3 partition on each of my SSDs.
- The rbd kernel module, which Ceph uses to map RADOS block device images. At the time of writing, the kernel (v6.6.31) shipped with Raspberry Pi OS Lite 64-bit did not include the rbd module, so I used the rpi-update utility to update to the latest kernel build, which has it enabled:
$ sudo rpi-update rpi-6.6.y
$ modprobe rbd
$ lsmod | grep rbd
rbd                    81920  4
libceph               352256  1 rbd
When I ran the commands above, I already had a Ceph cluster running, which is why rbd shows active users; rbd in turn depends on libceph, which handles the communication between the Linux kernel and the distributed Ceph cluster.
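To make the module load automatically after a reboot, it can also be declared under /etc/modules-load.d/ (my addition; the file name is arbitrary):
$ echo rbd | sudo tee /etc/modules-load.d/rbd.conf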
Another issue you are likely to encounter if you use external SSDs attached via USB is Ceph ignoring them. A workaround is to create a udev rule that changes ID_BUS from usb to scsi. Create a new rule file under /etc/udev/rules.d/ with the following contents:
ACTION=="add", ENV{ID_TYPE}=="disk", ENV{ID_BUS}=="usb", ENV{ID_BUS}="scsi"
ACTION=="change", ENV{ID_TYPE}=="disk", ENV{ID_BUS}=="usb", ENV{ID_BUS}="scsi"
ACTION=="online", ENV{ID_TYPE}=="disk", ENV{ID_BUS}=="usb", ENV{ID_BUS}="scsi"
Running the command below will show the new value for ID_BUS:
$ udevadm info --query=property /dev/sda3 | grep -i id_bus
ID_BUS=scsi
Installation
I preferred to deploy Rook with the manifests from this location instead of using Helm. I opted for this method because Rook is already complex, and adding another layer of abstraction with Helm would make it harder to understand the individual components and how they work together. I used the kustomization.yaml below, reconciled by a Flux Kustomization (a sketch of which follows the manifest), to deploy the manifests:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- crds.yaml
- common.yaml
- operator.yaml
- cluster.yaml
- toolbox.yaml
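For reference, the Flux Kustomization resource that reconciles this directory could look roughly like the sketch below; the GitRepository name, path, and interval are assumptions, since they depend on how the repository is laid out:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: rook-ceph
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/rook-ceph   # assumed path to the directory holding the manifests above
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                # assumed name of the Git source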
I faced one issue with the OSD pods crashing with the latest Ceph image. After further troubleshooting, I came across this GitHub issue, which highlighted the same problem on ARM64 CPUs. Changing the image version to image: quay.io/ceph/ceph:v18.2.2 in the cluster.yaml manifest solved the issue.
The main changes I made to the default cluster.yaml manifest are highlighted below:
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.2
  dashboard:
    enabled: true
  storage:
    nodes:
      - name: "master"
        devices:
          - name: "sda3"
      - name: "worker1"
        devices:
          - name: "sda3"
      - name: "worker2"
        devices:
          - name: "sda3"
Once the Flux Kustomization finished reconciling, the Rook operator and the cluster were successfully deployed. The entire deployment process takes about 5 minutes to complete. I also included the Ceph toolbox in the Kustomization, which is useful for troubleshooting. As shown below, an OSD has been created on each of my SSDs:
$ k exec -it rook-ceph-tools-767b99dbdd-t6nt9 -- bin/sh
sh-5.1$ ceph osd status
ID  HOST      USED   AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  worker1    592M   364G       0        0       0        0  exists,up
 1  worker2    762M   364G       1    52.0k       0        0  exists,up
 2  master    1018M   364G       0     5120       0        0  exists,up
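The toolbox can run any other Ceph CLI command as well, which makes it handy for a quick health and capacity overview; two commonly used checks (output omitted here):
sh-5.1$ ceph status   # overall health, mon/mgr quorum and OSD summary
sh-5.1$ ceph df       # raw and per-pool capacity usage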
Now that the cluster is deployed, I needed to create a StorageClass. Below you can find the manifest for deploying a CephBlockPool and a StorageClass that uses the rook-ceph.rbd.csi.ceph.com provisioner. Since I don’t need replication, the pool size is 1; requireSafeReplicaSize must then be set to false, as Ceph otherwise refuses a pool with a single replica:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: osd
  replicated:
    size: 1
    requireSafeReplicaSize: false
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
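Once Flux applies the manifest, the pool can be verified from the toolbox pod; a check along these lines should list replicapool with a replica size of 1 (the exact output depends on the Ceph version):
sh-5.1$ ceph osd pool ls detail | grep replicapool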
With the StorageClass now in place, we can provision volumes dynamically. Below you can find a manifest I used for my Jellyfin config block storage:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-config
  namespace: streaming-server
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: "rook-ceph-block"
When I applied it, I could see that the PV and PVC were created successfully:
$ k get pvc jellyfin-config
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
jellyfin-config Bound pvc-63921c06-bbd1-426d-9c41-ee98abd32e63 5Gi RWO rook-ceph-block <unset> 70s
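A workload consumes the volume simply by referencing the claim. The following is a minimal sketch rather than my actual Jellyfin deployment; the container image and mount path are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: jellyfin
  namespace: streaming-server
spec:
  containers:
    - name: jellyfin
      image: jellyfin/jellyfin:latest   # placeholder image
      volumeMounts:
        - name: config
          mountPath: /config            # placeholder mount path
  volumes:
    - name: config
      persistentVolumeClaim:
        claimName: jellyfin-config      # the PVC created above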
Describing the PV shows the filesystem type together with the CSI driver and the pool used to provision it:
$ k describe pv pvc-63921c06-bbd1-426d-9c41-ee98abd32e63
Source:
    Type:               CSI (a Container Storage Interface (CSI) volume source)
    Driver:             rook-ceph.rbd.csi.ceph.com
    FSType:             ext4
    VolumeHandle:       0001-0009-rook-ceph-0000000000000002-73bf4b79-3cc5-48b8-90a9-e12d705ffdce
    ReadOnly:           false
    VolumeAttributes:   clusterID=rook-ceph
                        imageFeatures=layering
                        imageFormat=2
                        imageName=csi-vol-73bf4b79-3cc5-48b8-90a9-e12d705ffdce
                        journalPool=replicapool
                        pool=replicapool
                        storage.kubernetes.io/csiProvisionerIdentity=1727085615996-2398-rook-ceph.rbd.csi.ceph.com
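The imageName above corresponds to an RBD image in the replicapool pool, so it should also be visible from the toolbox pod; something along these lines, where the image name matches whatever the CSI driver generated:
sh-5.1$ rbd ls --pool replicapool
csi-vol-73bf4b79-3cc5-48b8-90a9-e12d705ffdce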
I exposed the dashboard via a NodePort service.
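I won’t reproduce my exact manifest, but a NodePort service for the dashboard can be sketched as below; the 8443 port assumes the default SSL-enabled dashboard, and the mgr_role: active selector is meant to target only the active manager:
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
spec:
  type: NodePort
  ports:
    - name: dashboard
      port: 8443
      targetPort: 8443
      protocol: TCP
  selector:
    app: rook-ceph-mgr
    mgr_role: active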
The dashboard shows the OSDs and the hosts in the cluster:
[Ceph dashboard screenshots: OSD and host views]
Conclusion
Rook has excelled in simplifying the management of a Ceph cluster. What would typically demand a deep dive into Ceph’s complexities and a steep learning curve is now achievable with minimal effort, even for someone like me with less expertise in storage systems. Thanks to Rook’s automation, I was able to get a fully functional distributed storage solution across all my SSDs, which is exactly what I set out to do. That said, while this experience was successful, I realize that mastering storage management would require a much deeper investment of time and learning.