How to Assign Storage for Each User in JupyterHub on Kubernetes

If you run JupyterHub on Kubernetes, you have probably come across the need to store data for the user pods it spawns. Persisting those files is critical for data scientists, because anything written only inside a pod is lost when the pod goes away.


JupyterHub can serve a single user or multiple users. When a user starts a server, JupyterHub spawns a new pod for that user, and all of the user's work runs in that pod.

Pre-requisites

  • JupyterHub is installed on Kubernetes
  • You can locate the ConfigMap that stores the JupyterHub configuration
  • You know the basics of KubeSpawner

Let's see how we can persist data for JupyterHub in single-user and multi-user mode.

Single-user mode

For single-user mode, JupyterHub can store data in a single persistent volume provided by Kubernetes. To do that, configure KubeSpawner as follows.

Step 1: Set the working directory for the spawned pod, so that the pod always starts in this directory.

c.KubeSpawner.working_dir = "/home/jovyan"

Step 2: Set pvc_name_template for KubeSpawner in the ConfigMap.

c.KubeSpawner.pvc_name_template = 'claim-jupyterhub-single'

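Step 3: Set storage_class and storage_capacity for the persistent volume claim (PVC) that is created when the pod is spawned. The PVC requests the amount of storage given in storage_capacity, and storage_class must name a storage class that exists in your cluster.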

c.KubeSpawner.storage_class = 'default'
c.KubeSpawner.storage_capacity = '2Gi'

Step 4: Enable storage_pvc_ensure so that KubeSpawner creates the PVC before it creates the pod.

# Ensure the PVC exists before the pod is spawned
c.KubeSpawner.storage_pvc_ensure = True

Step 5: Define a volume backed by that PVC using the volumes directive.

c.KubeSpawner.volumes = [{
    "name": "claim-jupyterhub-single",
    "persistentVolumeClaim": {
        "claimName": "claim-jupyterhub-single"
    }
}]

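Step 6: Mount the volume into the spawned pod using the volume_mounts directive, at the working directory set in Step 1. Note that subPath mounts only the named subdirectory of the volume ("value" in this example) rather than the volume root.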

c.KubeSpawner.volume_mounts = [{
    "name": "claim-jupyterhub-single",
    "subPath": "value",
    "mountPath": "/home/jovyan"
}]

Step 7: Restart the JupyterHub pod so that these settings take effect the next time a user pod is spawned.

Now try spawning a user pod. You should see the claim-jupyterhub-single PVC created and mounted at the working directory you specified.
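
Steps 5 and 6 only work when the volume name in volumes matches the name in volume_mounts, and when claimName matches the PVC name from Step 2. A minimal sketch of a helper that builds both entries from one claim name is shown here. The function name build_volume_and_mount is hypothetical and not part of KubeSpawner; it only produces the plain dictionaries used above.

```python
def build_volume_and_mount(claim_name, mount_path, sub_path=None):
    """Return (volumes, volume_mounts) lists that share one volume name.

    Hypothetical helper, not part of KubeSpawner: it only builds the plain
    dictionaries shown in Steps 5 and 6.
    """
    volume = {
        "name": claim_name,
        "persistentVolumeClaim": {"claimName": claim_name},
    }
    mount = {"name": claim_name, "mountPath": mount_path}
    if sub_path is not None:
        # Mount only this subdirectory of the volume instead of its root.
        mount["subPath"] = sub_path
    return [volume], [mount]


volumes, volume_mounts = build_volume_and_mount(
    "claim-jupyterhub-single", "/home/jovyan", sub_path="value"
)

# In the JupyterHub configuration these would be assigned as:
#   c.KubeSpawner.volumes = volumes
#   c.KubeSpawner.volume_mounts = volume_mounts
```

Building both lists from one argument removes the most common mistake here, a typo that makes the mount refer to a volume that was never defined.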

Multi-user mode

In multi-user mode you need a separate persistent volume for each user pod, assigned to the pod dynamically. KubeSpawner supports template variables in its configuration to make this possible.

Step 1: Set storage_class and storage_capacity for the PVC. Keep in mind that a separate PVC is created for each user, so plan storage_capacity accordingly.

c.KubeSpawner.storage_class = 'default'
c.KubeSpawner.storage_capacity = '200Mi'
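
Because every user gets a separate PVC, the total storage requested grows with the number of users. The following is a rough sizing sketch; the helper names and the figure of 50 users are assumptions for illustration, and only the binary suffixes Ki, Mi, Gi and Ti are handled.

```python
# Binary suffixes used by Kubernetes resource quantities (subset).
_BINARY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity):
    """Convert a quantity such as '200Mi' or '2Gi' to bytes.

    Only the binary suffixes above are handled; this is a simplification
    of the full Kubernetes quantity syntax.
    """
    for suffix, factor in _BINARY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


def total_storage_gib(storage_capacity, user_count):
    """Total storage requested if every user gets one PVC of this size."""
    return quantity_to_bytes(storage_capacity) * user_count / 1024**3


# Assumed figure for illustration: 50 users, 200Mi each.
print(total_storage_gib("200Mi", 50))
```

With these assumed numbers the cluster needs a little under 10 GiB of provisionable storage in the chosen storage class.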

Step 2: Set the working directory for the pod.

c.KubeSpawner.working_dir = "/home/jovyan"

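Step 3: Set pvc_name_template using the {username} and {servername} variables that KubeSpawner provides. For a user named user1 with a named server jupyterhub, this yields a PVC name along the lines of 'claim-user1--jupyterhub' (the exact name depends on how your KubeSpawner version escapes these values).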

c.KubeSpawner.pvc_name_template = 'claim-{username}--{servername}'
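
To get a feel for the names such a template produces, here is a simplified stand-in for the expansion. It is not KubeSpawner's own code: expand_template and is_valid_pvc_name are hypothetical helpers, plain str.format is used, and stripping trailing hyphens for an empty server name is an assumption made for this sketch. The real expansion also escapes characters that are not valid in Kubernetes names and differs between KubeSpawner versions.

```python
import re

# Kubernetes PVC names must be valid DNS subdomain names: dot-separated
# labels of lowercase alphanumerics and '-', each starting and ending with
# an alphanumeric character, at most 253 characters in total.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
_DNS_SUBDOMAIN = re.compile(rf"{_LABEL}(\.{_LABEL})*")


def expand_template(template, username, servername=""):
    """Simplified stand-in for KubeSpawner's template expansion.

    Assumption: plain str.format plus stripping trailing '-' so that an
    empty server name still gives a usable name.
    """
    return template.format(username=username, servername=servername).rstrip("-")


def is_valid_pvc_name(name):
    """Check a name against the DNS subdomain rules described above."""
    return len(name) <= 253 and _DNS_SUBDOMAIN.fullmatch(name) is not None


template = "claim-{username}--{servername}"
named = expand_template(template, "user1", "jupyterhub")
default = expand_template(template, "user1")
print(named, is_valid_pvc_name(named))
print(default, is_valid_pvc_name(default))
```

A check like this is useful when usernames come from an external identity provider, since uppercase letters or symbols in a username would not be valid in a PVC name without escaping.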

Step 4: Define the volume using the same template and assign it to the pod.

c.KubeSpawner.volumes = [{
    "name": "claim-{username}--{servername}",
    "persistentVolumeClaim": {
        "claimName": "claim-{username}--{servername}"
    }
}]

Step 5: Mount the dynamic volume at the working directory set in Step 2.

c.KubeSpawner.volume_mounts = [{
    "name": "claim-{username}--{servername}",
    "subPath": "value",
    "mountPath": "/home/jovyan"
}]

Step 6: Restart the JupyterHub pod so that these settings take effect the next time a user pod is spawned.

A PVC is not removed when a user pod stops. To also keep the PVC when the user or named server is deleted from JupyterHub, set delete_pvc to False.

c.KubeSpawner.delete_pvc = False

For more on KubeSpawner parameters, see: https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html

JupyterHub pod spawning flow:

  1. Server start is triggered for the user
  2. storage_pvc_ensure makes sure the PVC exists before the pod is created; if it does not, a PVC is created with the configured name and storage_capacity
  3. The pod starts, and every volume listed in volume_mounts is mounted into it
  4. When the pod is terminated, the PVC and its data remain; with delete_pvc set to False, the PVC is also kept when the user or named server is deleted
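
The flow above can be sketched as a toy in-memory model. Nothing here talks to Kubernetes or uses KubeSpawner: ToyCluster, its methods and the jupyter-{user} pod name are hypothetical, and the model assumes that delete_pvc matters when a user is deleted rather than when a pod merely stops.

```python
class ToyCluster:
    """In-memory stand-in for a cluster; it only tracks names."""

    def __init__(self):
        self.pvcs = {}  # PVC name -> requested capacity
        self.pods = {}  # pod name -> list of mounted PVC names

    def start_server(self, user, pvc_name, capacity, pvc_ensure=True):
        # Create the PVC only if it does not exist yet.
        if pvc_ensure and pvc_name not in self.pvcs:
            self.pvcs[pvc_name] = capacity
        if pvc_name not in self.pvcs:
            raise RuntimeError(f"PVC {pvc_name} is missing")
        # Start the pod with the volume mounted.
        self.pods[f"jupyter-{user}"] = [pvc_name]

    def stop_server(self, user):
        # Stopping removes the pod only; the PVC and its data stay.
        self.pods.pop(f"jupyter-{user}", None)

    def delete_user(self, user, pvc_name, delete_pvc):
        # On deletion, the PVC is removed only if delete_pvc is True.
        self.stop_server(user)
        if delete_pvc:
            self.pvcs.pop(pvc_name, None)


cluster = ToyCluster()
cluster.start_server("user1", "claim-user1", "200Mi")
cluster.stop_server("user1")
cluster.start_server("user1", "claim-user1", "200Mi")  # reuses the PVC
cluster.delete_user("user1", "claim-user1", delete_pvc=False)
print(cluster.pvcs)
```

The second start_server call finds the existing claim and reuses it, which is why a user's files are still there after a restart.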

Conclusion

Using KubeSpawner, we can run JupyterHub in single-user or multi-user mode, create separate storage for each user through dynamic PVC naming, and store each user's data independently.