# 📚 Complete Ceph CSI Deployment & Troubleshooting Guide

This guide details the preparation and configuration necessary for successful dynamic provisioning of Ceph RBD (RWO) and CephFS (RWX) volumes in a Kubernetes cluster running on **MicroK8s**, backed by a Proxmox VE (PVE) Ceph cluster.

---

## 1. ⚙️ Ceph Cluster Preparation (Proxmox VE)

These steps ensure the Ceph backend has the necessary pools and structure.

* **Create Dedicated Pools:** Create OSD pools for data, e.g., **`k8s_rbd`** (for RWO), plus **`k8s_data`** and **`k8s_metadata`** (for CephFS).
* **Create CephFS Metadata Servers (MDS):** Deploy **at least two** Metadata Server (MDS) instances.
* **Create CephFS Filesystem:** Create the Ceph filesystem (e.g., named **`k8s`**), linking the metadata and data pools.
* **Create Subvolume Group (Mandatory Fix):** Create the dedicated subvolume group **`csi`** inside your CephFS. This is required by the CSI driver's default configuration and fixes the "No such file or directory" error during provisioning.
  * **CLI Command:** `ceph fs subvolumegroup create k8s csi`

---

## 2. 🔑 Ceph User and Authorization (The Permission Fix)

This addresses the persistent "Permission denied" errors during provisioning.

* **Create and Configure Ceph User:** Create the user (`client.kubernetes`) and set permissions for all services. The **wildcard MGR cap** (`mgr 'allow *'`) is critical for volume creation.
* **Final Correct Caps Command:**

  ```bash
  sudo ceph auth caps client.kubernetes \
    mon 'allow r' \
    mgr 'allow *' \
    mds 'allow rw' \
    osd 'allow class-read object_prefix rbd_children, allow pool k8s_rbd rwx, allow pool k8s_metadata rwx, allow pool k8s_data rwx'
  ```

* **Export Key to Kubernetes Secrets:** Create two Secrets containing the user key, placed in the correct CSI provisioner namespaces:
  * **RBD Secret:** `csi-rbd-secret` (in the RBD provisioner namespace).
  * **CephFS Secret:** `csi-cephfs-secret` (in the CephFS provisioner namespace).

  **Each Secret must contain the keys `userID` and `userKey`. The `userID` value must omit the `client.` prefix from the Ceph output** (i.e., `kubernetes`, not `client.kubernetes`).

---

## 3. 🌐 Network Configuration and Bi-Directional Routing

These steps ensure stable, bidirectional communication for volume staging and mounting.

### A. PVE Host Firewall Configuration

The PVE firewall must explicitly **allow inbound traffic** from the entire Kubernetes Pod network to the Ceph service ports.

| Protocol | Port(s) | Source | Purpose |
| :--- | :--- | :--- | :--- |
| **TCP** | **6789** | K8s Pod network CIDR (e.g., `10.1.0.0/16`) | Monitor connection. |
| **TCP** | **6800-7300** | K8s Pod network CIDR | OSD/MDS/MGR data transfer. |

Alternatively, the PVE firewall may offer a `ceph` macro; if it is available, use the macro instead of adding these rules individually.

### B. PVE Host Static Routing (Ceph > K8s)

Add **persistent static routes** on **all PVE Ceph hosts** so Ceph can send responses back to the Pod network.

* **Action:** Edit `/etc/network/interfaces` on each PVE host:

  ```ini
  # Syntax: post-up ip route add <pod-network-cidr> via <k8s-node-ip> dev <bridge>
  # e.g., post-up ip route add 10.1.0.0/16 via 172.35.100.40 dev vmbr0
  ```

### C. K8s Node IP Forwarding (Gateway Function)

Enable IP forwarding on **all Kubernetes nodes** so they can route incoming Ceph traffic to the correct Pods.

* **Action:** Run on all K8s nodes:

  ```bash
  sudo sysctl net.ipv4.ip_forward=1
  sudo sh -c 'echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/99-sysctl.conf'
  ```
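With the firewall rules, static routes, and IP forwarding in place, it is worth verifying reachability before deploying the CSI driver. The sketch below is a minimal check using the placeholder addresses from this guide (Pod network `10.1.0.0/16`, Ceph Monitor `10.11.12.1`, K8s node `172.35.100.40`); substitute your own values.

```bash
# On a PVE host: confirm a return route to the Pod network exists
# (10.1.0.5 stands in for any Pod IP; expect the K8s node, e.g. 172.35.100.40, as the gateway)
ip route get 10.1.0.5

# On a K8s node: confirm IP forwarding is active (expect: net.ipv4.ip_forward = 1)
sysctl net.ipv4.ip_forward

# From a K8s node (or a debug pod): confirm the Ceph Monitor port is reachable
nc -zv 10.11.12.1 6789
```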
### D. K8s Static Routing (K8s > Ceph) - Conditional/Advanced ⚠️

This routing is **only required** if the **Ceph public network** (the network the Ceph Monitors/OSDs listen on) is **not reachable** via your Kubernetes nodes' **default gateway**.

* **Action:** Implement this via a **Netplan configuration** on the Kubernetes nodes, using multiple routes with different metrics so that one gateway is preferred and the others serve as failover paths.
* **Example Netplan Configuration (`/etc/netplan/99-ceph-routes.yaml`):**

  ```yaml
  network:
    version: 2
    renderer: networkd
    ethernets:
      eth0:  # Replace with your primary K8s network interface
        routes:
          # Route 1: Directs traffic destined for the first Ceph Monitor IP (10.11.12.1)
          # through three different PVE hosts (172.35.100.x) as gateways.
          # The lowest metric (10) is preferred.
          - to: 10.11.12.1/32
            via: 172.35.100.10
            metric: 10
          - to: 10.11.12.1/32
            via: 172.35.100.20
            metric: 100
          - to: 10.11.12.1/32
            via: 172.35.100.30
            metric: 100
          # Route 2: Directs traffic destined for the second Ceph Monitor IP (10.11.12.2)
          # with a similar failover strategy.
          - to: 10.11.12.2/32
            via: 172.35.100.20
            metric: 10
          - to: 10.11.12.2/32
            via: 172.35.100.10
            metric: 100
          - to: 10.11.12.2/32
            via: 172.35.100.30
            metric: 100
  ```

Use route metrics (a lower metric means a higher priority) to prefer the most direct path while still offering alternative gateways into the Ceph network where needed.

---

## 4. 🧩 MicroK8s CSI Driver Configuration (The Path Fix) - Conditional/Advanced ⚠️

This adjustment is **only required** if your Kubernetes deployment runs on **MicroK8s**; other Kubernetes distributions may need **different changes**. It resolves the **`staging path does not exist on node`** error for the Node Plugin.

* **Update `kubeletDir`:** When deploying the CSI driver (via Helm or YAML), the `kubeletDir` parameter must be set to the MicroK8s-specific path.

  ```yaml
  # Correct path for MicroK8s Kubelet root directory
  kubeletDir: /var/snap/microk8s/common/var/lib/kubelet
  ```
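Once the driver is deployed with the corrected `kubeletDir`, a quick way to confirm end-to-end provisioning is to create a small test PVC and check that it reaches the `Bound` state. The sketch below assumes an RBD-backed StorageClass named `csi-rbd-sc`; that name is illustrative and should be replaced with whatever StorageClass you created for the `k8s_rbd` pool.

```bash
# Create a 1Gi test claim against the (assumed) RBD StorageClass
microk8s kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-rbd-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc   # assumed name; use your own StorageClass
EOF

# The claim should reach "Bound" within a few seconds if provisioning works
microk8s kubectl get pvc csi-rbd-test

# Clean up the test claim afterwards
microk8s kubectl delete pvc csi-rbd-test
```

If the claim stays `Pending`, `microk8s kubectl describe pvc csi-rbd-test` and the provisioner Pod logs usually point back to one of the permission, subvolume group, or routing fixes covered above.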