Pavel Rykov

July 31, 2023 ・ Kubernetes

Installing Ceph as a Distributed Storage Solution and Connecting it to Kubernetes

In this guide, we'll walk you through the process of setting up a Ceph cluster and integrating it with a Kubernetes cluster. Ceph is a scalable and resilient distributed storage system that can handle large amounts of data. It's a perfect fit for Kubernetes, which can leverage Ceph's capabilities for storage-intensive applications.

Creating Virtual Machines with Vagrant and VirtualBox

To familiarize yourself with Ceph, you may opt to create a local cluster of virtual machines (VMs) using Vagrant and VirtualBox. This approach provides a controlled environment for learning and experimentation, without the need for dedicated hardware.

First, generate the SSH key pair on your host machine:

mkdir .vagrant
ssh-keygen -t rsa -N '' -f .vagrant/vagrant_rsa

This will generate the private key .vagrant/vagrant_rsa and the public key .vagrant/vagrant_rsa.pub.

Next, create a Vagrantfile. Here is a sample to get you started:

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/bionic64"

  nodes = {
    "ceph1" => "192.168.60.11",
    "ceph2" => "192.168.60.12"
  }

  config.vm.provider "virtualbox" do |v|
    v.memory = 2048
    v.cpus = 2
  end

  nodes.each do |nodeName, ip|
    config.vm.define nodeName do |node|
      node.vm.hostname = nodeName
      node.vm.network "private_network", ip: ip

      # Copy public and private keys to the VM
      node.vm.provision "file", source: ".vagrant/vagrant_rsa.pub", destination: "~/.ssh/id_rsa.pub.extra"
      node.vm.provision "file", source: ".vagrant/vagrant_rsa", destination: "~/.ssh/id_rsa"

      node.vm.provision "shell", inline: <<-SHELL
        chmod 700 /home/vagrant/.ssh
        chown vagrant:vagrant /home/vagrant/.ssh
        cat /home/vagrant/.ssh/id_rsa.pub.extra >> /home/vagrant/.ssh/authorized_keys
        chmod 600 /home/vagrant/.ssh/authorized_keys
        chmod 600 /home/vagrant/.ssh/id_rsa
        chown vagrant:vagrant /home/vagrant/.ssh/authorized_keys
        chown vagrant:vagrant /home/vagrant/.ssh/id_rsa
      SHELL

      # Update /etc/hosts file for each node
      nodes.each do |otherNodeName, otherNodeIp|
        node.vm.provision "shell", inline: <<-SHELL
          echo "#{otherNodeIp} #{otherNodeName}" | sudo tee -a /etc/hosts
        SHELL
      end

    end
  end
end

This Vagrantfile will create two VMs, which will be used to run Ceph. You can start the VMs by running vagrant up in the terminal.
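
Once provisioning completes, you can confirm that both machines came up, for example with:

vagrant status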

Remember, this setup is intended for testing and learning purposes. It creates a controlled environment that is ideal for understanding how to install and use Ceph. However, if you're planning to deploy Ceph in a production environment, you'll likely be working with physical servers instead of virtual machines. The steps to install and configure Ceph would remain largely the same, but you can skip the virtual machine setup part.

In a real-world scenario, you would also likely have more nodes in your Ceph cluster for redundancy and capacity. Moreover, you'd need to consider factors such as network configuration, security, and data recovery, which are beyond the scope of this introductory guide.

Installing Ceph

After you've created your VMs, the next step is to install Ceph on them. Here is a step-by-step process on how to do this:

SSH into each VM

You can SSH into each VM using the vagrant ssh command:

vagrant ssh ceph1
vagrant ssh ceph2
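
Because the Vagrantfile copied the same key pair to both nodes, you can also verify the passwordless SSH access that ceph-deploy relies on, for example from ceph1:

ssh ceph2 hostname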

Update the System

Before you install Ceph, it's a good practice to update your system's package repository:

sudo apt-get update
sudo apt-get upgrade -y

Install Ceph

Install the Ceph deployment tool, ceph-deploy, and other necessary packages on both nodes:

sudo apt-get install ceph-deploy ceph ceph-common
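
To confirm that the packages installed correctly, you can check the installed version on each node:

ceph --version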

Create a Ceph Cluster

You'll create a new Ceph cluster on the first node. Before you do this, create a new directory for the cluster configuration:

mkdir ~/ceph-cluster
cd ~/ceph-cluster

Then, use ceph-deploy to create a new cluster:

ceph-deploy new ceph1

This command will create a configuration file called ceph.conf in the current directory.

Modify the Ceph Configuration File

Open the ceph.conf file with a text editor like nano and add the following lines under the [global] section:

osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 128
osd pool default pgp num = 128

These lines configure the storage settings for the Ceph cluster.
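
Because each Vagrant VM has two network interfaces (the default NAT interface plus the private network), it can also help to pin Ceph to the private network explicitly. Assuming the 192.168.60.0/24 addresses from the Vagrantfile above, add:

public network = 192.168.60.0/24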

Install Ceph on All Nodes

Now you'll install Ceph on all nodes. Run the following command:

ceph-deploy install ceph1 ceph2

Create the Initial Monitor(s) and Gather Keys

Run the following command to create the initial monitors and gather the keys:

ceph-deploy mon create-initial
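
If this step succeeds, ceph-deploy writes several keyrings (such as ceph.client.admin.keyring and the bootstrap keyrings) into the working directory, which you can confirm with:

ls ~/ceph-cluster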

Add the OSDs

Finally, you'll add the OSDs (Object Storage Daemons). The OSDs handle data storage, replication, and recovery. You'll need to specify the disk that Ceph can use to store data. If you're using a VM, you can point an OSD at a directory instead of a raw disk (older ceph-deploy releases accept this host:directory syntax; newer ones expect a block device). Here's how you can do it:

sudo mkdir /var/local/osd0
ceph-deploy osd create ceph1:/var/local/osd0

Repeat the process for the second node. The backing directory has to exist on ceph2, while ceph-deploy itself is still run from the first node:

ssh ceph2 sudo mkdir /var/local/osd1
ceph-deploy osd create ceph2:/var/local/osd1
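
Before moving on, it's worth checking that the cluster reports a healthy state. One way to do this is to push the configuration and admin keyring to the nodes with ceph-deploy, then query the status:

ceph-deploy admin ceph1 ceph2
sudo ceph -s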

This completes the installation of the Ceph cluster on your VMs. Now, you can move on to integrating Ceph with Kubernetes.

Integrating Ceph with Kubernetes

Once your Ceph cluster is set up, you can integrate it with your Kubernetes cluster.

First, you'll need to install the ceph-csi CephFS driver in your Kubernetes cluster. This driver allows Kubernetes to interact with Ceph. The command below applies the provisioner manifest; note that the same directory in the ceph-csi repository also contains companion manifests (node plugin, RBAC, and config map) that a complete deployment needs:

kubectl apply -f https://github.com/ceph/ceph-csi/raw/devel/deploy/cephfs/kubernetes/csi-cephfsplugin-provisioner.yaml

Next, you'll need to create a Secret in your Kubernetes cluster that contains your Ceph admin user's key:

kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" --from-literal=key='AQD9o0Fd6hQRChAAt7fMaSZXduT3NWEqylNpmg==' --namespace=default

Replace 'AQD9o0Fd6hQRChAAt7fMaSZXduT3NWEqylNpmg==' with your Ceph admin user's key.
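
If you don't have the key at hand, it can be read on the Ceph admin node with:

sudo ceph auth get-key client.admin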

Finally, you'll need to create a StorageClass in Kubernetes that references your Ceph cluster:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: ceph.com/cephfs
parameters:
  monitors: 192.168.60.11:6789  # replace with your Ceph monitor address(es)
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: "default"

Replace 192.168.60.11:6789 with the address of your Ceph monitor; in this walkthrough the only monitor runs on ceph1 (192.168.60.11).

Then, save this manifest as ceph-storageclass.yaml and apply it with the command:

kubectl apply -f ceph-storageclass.yaml

This will create a new StorageClass named 'cephfs' that your Kubernetes cluster can use to dynamically provision Persistent Volumes (PVs) that use your Ceph storage.
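
You can verify that the class was registered with:

kubectl get storageclass cephfs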

With this setup, you're now ready to create Persistent Volume Claims (PVCs) against the Ceph StorageClass, allowing you to utilize the power and resilience of Ceph storage for your Kubernetes applications. Here is a sample PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: cephfs

This PVC requests a PV of 1Gi from the 'cephfs' StorageClass, which would be dynamically provisioned by Ceph. Any Pods that use this PVC would have access to the provisioned storage.
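
As a quick illustration, here is a minimal Pod (the Pod name and image are placeholders) that mounts this claim:

apiVersion: v1
kind: Pod
metadata:
  name: ceph-test-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: ceph-pvc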

This guide provided a brief overview of setting up a Ceph cluster and integrating it with Kubernetes. Keep in mind that in a real-world scenario, you would likely have multiple Ceph nodes for redundancy, and you might also want to consider additional security measures. Always refer to the official Ceph and Kubernetes documentation for the most detailed and up-to-date information.

Conclusion

Distributed storage systems like Ceph play a crucial role in managing and scaling storage needs in a distributed environment. Ceph can be particularly useful when working with Kubernetes, which is designed to automate deploying, scaling, and operating application containers.

In this guide, we walked through the process of setting up a Ceph cluster on multiple virtual machines using Vagrant and VirtualBox. We then looked at how to integrate this setup with a Kubernetes cluster, enabling Kubernetes to leverage Ceph's robust and resilient storage capabilities. With Ceph now installed and connected to Kubernetes, you're well on your way to deploying applications that require distributed storage. But remember, this guide is a basic setup meant for learning and experimenting. A production setup might require more nodes, additional security measures, and more complex configurations.

Continually learning and improving your understanding of these technologies is key to effectively managing and maintaining modern distributed systems. Always refer to the official Ceph and Kubernetes documentation for the most detailed and up-to-date information.

Happy experimenting with your new distributed storage setup!

  • Kubernetes
  • Infrastructure