Storage
By default, the operator configures Pods to store data on emptyDir volumes which aren’t persisted when the Pods are redeployed. To maintain data across deployments and version upgrades, you can configure persistent storage for Prometheus, Alertmanager and ThanosRuler resources.
Kubernetes supports several kinds of storage volumes. The Prometheus Operator works with PersistentVolumeClaims, which allow the underlying PersistentVolume to be provisioned when requested.
This document assumes a basic understanding of PersistentVolumes, PersistentVolumeClaims, and their provisioning.
Storage Provisioning on AWS
Automatic provisioning of storage requires a StorageClass.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```

Note: Make sure that AWS as a cloud provider is properly configured with your cluster, or storage provisioning will not work.
For best results, use volumes that have high I/O throughput. These examples use SSD EBS volumes. Read the Kubernetes Persistent Volumes documentation to adapt this StorageClass to your needs.
The StorageClass that was created can be specified in the storage section of the Prometheus resource (note that if you're using kube-prometheus, then instead of making the following change to your Prometheus resource, see the prometheus-pvc.jsonnet example).
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: persisted
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: ssd
        resources:
          requests:
            storage: 40Gi
```

The full documentation of the storage field can be found in the API reference.
When the Prometheus object is created, a PersistentVolumeClaim is created for each Pod in the StatefulSet, and the storage should automatically be provisioned, mounted, and used.
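If you want to confirm that provisioning worked, one way is to list the claims created from the template and the volumes backing them. This is a minimal check, assuming the Prometheus resource named persisted from the example above and the operator.prometheus.io/name label that is also used in the resizing steps later in this document:

```bash
# List the PVCs created for the "persisted" Prometheus and check that they are Bound.
kubectl get pvc -l operator.prometheus.io/name=persisted

# Inspect the dynamically provisioned PersistentVolumes backing those claims.
kubectl get pv
```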
The same approach should work with other cloud providers (GCP, Azure, …) and any Kubernetes storage provider supporting dynamic provisioning.
Manual storage provisioning
The Prometheus CRD specification allows you to support arbitrary storage through a PersistentVolumeClaim.
The easiest way to use a volume that cannot be automatically provisioned (for whatever reason) is to use a label selector alongside a manually created PersistentVolume.
For example, using an NFS volume might be accomplished with the following manifests:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: my-example-prometheus-name
  labels:
    prometheus: example
spec:
  replicas: 1
  storage:
    volumeClaimTemplate:
      spec:
        selector:
          matchLabels:
            app.kubernetes.io/name: my-example-prometheus
        resources:
          requests:
            storage: 50Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv-name
  labels:
    app.kubernetes.io/name: my-example-prometheus
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteOnce # required
  nfs:
    server: myServer
    path: "/path/to/prom/db"
```

Using hostPath volumes
Using a hostPath volume requires ensuring that the container has the appropriate permissions to access and modify files at the specified path on the host machine. For example:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 1
  storage:
    volumeClaimTemplate:
      spec:
        selector:
          matchLabels:
            app.kubernetes.io/name: example
        resources:
          requests:
            storage: 50Gi
  securityContext:
    fsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv-name
  labels:
    app.kubernetes.io/name: example
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data
```

Disabling Default StorageClasses
To manually provision volumes (as of Kubernetes 1.6.0), you may need to disable the default StorageClass that is automatically created for certain Cloud Providers. Default StorageClasses are pre-installed on Azure, AWS, GCE, OpenStack, and vSphere.
The default StorageClass behavior will override manual storage provisioning, preventing PersistentVolumeClaims from automatically binding to manually created PersistentVolumes.
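One way to check whether your cluster has a default StorageClass is to list the StorageClasses; the default one, if any, is marked with (default) next to its name:

```bash
# The default StorageClass (if any) is shown with "(default)" after its name.
kubectl get storageclass
```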
To override this behavior, you must explicitly create the same resource, but set it to not be the default (see the changelog for more information).
For example, to disable default StorageClasses on a Google Container Engine cluster, create the following StorageClass:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
  annotations:
    # disable this default storage class by setting this annotation to false.
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  zone: us-east1-d
```

Resizing volumes
Even if the StorageClass supports resizing, Kubernetes doesn't yet support volume expansion through StatefulSets. This means that when you update the storage requests in the spec.storage field of a custom resource such as Prometheus, the operator has to delete/recreate the underlying StatefulSet and the associated PVCs aren't expanded (more details in the KEP issue).
It is still possible to fix the situation manually.
First check that the storage class allows volume expansion:
```bash
$ kubectl get storageclass -o custom-columns=NAME:.metadata.name,ALLOWVOLUMEEXPANSION:.allowVolumeExpansion
NAME      ALLOWVOLUMEEXPANSION
gp2-csi   true
gp3-csi   true
```

Next, update the spec.paused field to true (to prevent the operator from recreating the StatefulSet) and update the storage request in the spec.storage field of the custom resource. Assuming a Prometheus resource named example for which you want to increase the storage size to 10Gi:
```bash
kubectl patch prometheus/example --patch '{"spec": {"paused": true, "storage": {"volumeClaimTemplate": {"spec": {"resources": {"requests": {"storage":"10Gi"}}}}}}}' --type merge
```

Next, patch every PVC with the updated storage request (10Gi in this example):
```bash
for p in $(kubectl get pvc -l operator.prometheus.io/name=example -o jsonpath='{range .items[*]}{.metadata.name} {end}'); do \
  kubectl patch pvc/${p} --patch '{"spec": {"resources": {"requests": {"storage":"10Gi"}}}}'; \
done
```

Next, delete the underlying StatefulSet using the orphan deletion strategy:
```bash
kubectl delete statefulset -l operator.prometheus.io/name=example --cascade=orphan
```

Last, change the spec.paused field of the custom resource back to false:
```bash
kubectl patch prometheus/example --patch '{"spec": {"paused": false}}' --type merge
```

The operator should recreate the StatefulSet immediately. Thanks to the orphan deletion strategy there is no service disruption, and the volumes mounted in the Pods should have the updated size.
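As a final sanity check, you can confirm that the claims report the new capacity. A minimal example, assuming the same example Prometheus and the 10Gi request used above:

```bash
# Each PVC should report the expanded capacity once the resize has completed.
kubectl get pvc -l operator.prometheus.io/name=example \
  -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
```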