Setting up a Ceph cluster with Rook on a Raspberry Pi k3s cluster
Introduction
Due to the low IOPS of the Raspberry Pis’ microSD cards, I decided to switch to external NVMe SSDs. In addition to the boot partition, each SSD has a large unused partition that I use to provision Kubernetes Persistent Volumes through a Rook Ceph cluster. I chose this approach as a learning opportunity to explore Kubernetes-native storage. Since my cluster isn’t running any critical applications, I don’t need data replication, so Local Persistent Volumes would have been sufficient. However, in production environments, the following requirements often arise:
- High availability and fault tolerance
- Dynamic scaling
- Unified storage (object, block, and file)
- Self-healing
Local Persistent Volumes cannot meet these needs, making Kubernetes-native storage solutions the preferred choice for production workloads. In this post, I’ll share the challenges I encountered while setting up the latest version of Rook Ceph on Raspberry Pis and how I overcame them. I hope you find it both informative and helpful.
Setup
Before installing Rook and deploying a Ceph cluster, a few prerequisites must be met. The most important ones are:
- An unformatted partition or logical volume. In my setup, I used the unformatted /dev/sda3 partition on each of my SSDs.
- The rbd kernel module, which Ceph uses to map RADOS block device images. At the time of writing, the kernel (v6.6.31) shipped with Raspberry Pi OS Lite 64-bit did not include the rbd module, so I used the rpi-update utility to update to the latest kernel build, which has it enabled:
$ sudo rpi-update rpi-6.6.y
$ modprobe rbd
$ lsmod | grep rbd
rbd                    81920  4
libceph               352256  1 rbd
When I ran the commands above, I already had a Ceph cluster running, which is why rbd shows active users; rbd in turn depends on libceph, which handles the communication between the Linux kernel and the distributed Ceph cluster.
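To make the module load automatically after a reboot, it can also be declared under /etc/modules-load.d/ (my addition; the file name is arbitrary):
$ echo rbd | sudo tee /etc/modules-load.d/rbd.conf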
Another issue you are likely to encounter if you use external SSDs attached via USB is Ceph ignoring them. A workaround is to create a udev rule that changes ID_BUS from usb to scsi. Create a new rule file under /etc/udev/rules.d/ with the following contents:
ACTION=="add", ENV{ID_TYPE}=="disk", ENV{ID_BUS}=="usb", ENV{ID_BUS}="scsi"
ACTION=="change", ENV{ID_TYPE}=="disk", ENV{ID_BUS}=="usb", ENV{ID_BUS}="scsi"
ACTION=="online", ENV{ID_TYPE}=="disk", ENV{ID_BUS}=="usb", ENV{ID_BUS}="scsi"
Running the command below will show the new value for ID_BUS:
$ udevadm info --query=property /dev/sda3 | grep -i id_bus
ID_BUS=scsi
Installation
I preferred to deploy Rook with the manifests from this location instead of using Helm. I opted for this method because Rook is already complex, and adding another layer of abstraction with Helm would make it harder to understand the individual components and how they work together. I used the kustomization.yaml below, reconciled by a Flux Kustomization (a sketch of which follows the manifest), to deploy the manifests:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- crds.yaml
- common.yaml
- operator.yaml
- cluster.yaml
- toolbox.yaml
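For reference, the Flux Kustomization resource that reconciles this directory could look roughly like the sketch below; the GitRepository name, path, and interval are assumptions, since they depend on how the repository is laid out:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: rook-ceph
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/rook-ceph   # assumed path to the directory holding the manifests above
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                # assumed name of the Git source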
I faced one issue with the OSD pods crashing with the latest Ceph image. After further troubleshooting, I came across this GitHub issue, which highlighted the same problem on ARM64 CPUs. Changing the image version to image: quay.io/ceph/ceph:v18.2.2 in the cluster.yaml manifest solved the issue.
The main changes I made to the default cluster.yaml manifest are highlighted below:
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.2
  dashboard:
    enabled: true
  storage:
    nodes:
      - name: "master"
        devices:
          - name: "sda3"
      - name: "worker1"
        devices:
          - name: "sda3"
      - name: "worker2"
        devices:
          - name: "sda3"
Once the Flux Kustomization finished reconciling, the Rook operator and the cluster were successfully deployed. The entire deployment process takes about 5 minutes to complete. I also included the Ceph toolbox in the Kustomization, which is useful for troubleshooting. As shown below, an OSD has been created on each of my SSDs:
$ k exec -it rook-ceph-tools-767b99dbdd-t6nt9 -- bin/sh
sh-5.1$ ceph osd status
ID  HOST      USED   AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  worker1    592M   364G       0        0       0        0  exists,up
 1  worker2    762M   364G       1    52.0k       0        0  exists,up
 2  master    1018M   364G       0     5120       0        0  exists,up
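The toolbox can run any other Ceph CLI command as well, which makes it handy for a quick health and capacity overview; two commonly used checks (output omitted here):
sh-5.1$ ceph status   # overall health, mon/mgr quorum and OSD summary
sh-5.1$ ceph df       # raw and per-pool capacity usage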
Now that the cluster is deployed, I needed to create a StorageClass. Below you can find the manifest for deploying a CephBlockPool and a StorageClass that uses the rook-ceph.rbd.csi.ceph.com provisioner. Since I don’t need replication, the pool size is 1; requireSafeReplicaSize must then be set to false, as Ceph otherwise refuses a pool with a single replica:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: osd
  replicated:
    size: 1
    requireSafeReplicaSize: false
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
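Once Flux applies the manifest, the pool can be verified from the toolbox pod; a check along these lines should list replicapool with a replica size of 1 (the exact output depends on the Ceph version):
sh-5.1$ ceph osd pool ls detail | grep replicapool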
With the StorageClass now in place, we can provision volumes dynamically. Below you can find a manifest I used for my Jellyfin config block storage:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-config
  namespace: streaming-server
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: "rook-ceph-block"
When I applied it, I could see that the PV and PVC were created successfully:
$ k get pvc jellyfin-config
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
jellyfin-config Bound pvc-63921c06-bbd1-426d-9c41-ee98abd32e63 5Gi RWO rook-ceph-block <unset> 70s
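A workload consumes the volume simply by referencing the claim. The following is a minimal sketch rather than my actual Jellyfin deployment; the container image and mount path are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: jellyfin
  namespace: streaming-server
spec:
  containers:
    - name: jellyfin
      image: jellyfin/jellyfin:latest   # placeholder image
      volumeMounts:
        - name: config
          mountPath: /config            # placeholder mount path
  volumes:
    - name: config
      persistentVolumeClaim:
        claimName: jellyfin-config      # the PVC created above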
Describing the PV shows the filesystem type together with the CSI driver and the pool used to provision it:
$ k describe pv pvc-63921c06-bbd1-426d-9c41-ee98abd32e63
Source:
    Type:               CSI (a Container Storage Interface (CSI) volume source)
    Driver:             rook-ceph.rbd.csi.ceph.com
    FSType:             ext4
    VolumeHandle:       0001-0009-rook-ceph-0000000000000002-73bf4b79-3cc5-48b8-90a9-e12d705ffdce
    ReadOnly:           false
    VolumeAttributes:   clusterID=rook-ceph
                        imageFeatures=layering
                        imageFormat=2
                        imageName=csi-vol-73bf4b79-3cc5-48b8-90a9-e12d705ffdce
                        journalPool=replicapool
                        pool=replicapool
                        storage.kubernetes.io/csiProvisionerIdentity=1727085615996-2398-rook-ceph.rbd.csi.ceph.com
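The imageName above corresponds to an RBD image in the replicapool pool, so it should also be visible from the toolbox pod; something along these lines, where the image name matches whatever the CSI driver generated:
sh-5.1$ rbd ls --pool replicapool
csi-vol-73bf4b79-3cc5-48b8-90a9-e12d705ffdce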
I exposed the dashboard via a NodePort service.
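I won’t reproduce my exact manifest, but a NodePort service for the dashboard can be sketched as below; the 8443 port assumes the default SSL-enabled dashboard, and the mgr_role: active selector is meant to target only the active manager:
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
spec:
  type: NodePort
  ports:
    - name: dashboard
      port: 8443
      targetPort: 8443
      protocol: TCP
  selector:
    app: rook-ceph-mgr
    mgr_role: active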
The dashboard shows the OSDs and the hosts in the cluster:
[Ceph dashboard screenshots: OSD and host views]
Conclusion
Rook has excelled in simplifying the management of a Ceph cluster. What would typically demand a deep dive into Ceph’s complexities and a steep learning curve is now achievable with minimal effort, even for someone like me with less expertise in storage systems. Thanks to Rook’s automation, I was able to get a fully functional distributed storage solution across all my SSDs, which is exactly what I set out to do. That said, while this experience was successful, I realize that mastering storage management would require a much deeper investment of time and learning.