Storage Classes
This document describes the concept of a StorageClass in Kubernetes. Familiarity with volumes and persistent volumes is suggested.
Introduction
A StorageClass provides a way for administrators to describe the "classes" of storage they offer. Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators. Kubernetes itself is unopinionated about what classes represent. This concept is sometimes called "profiles" in other storage systems.
The StorageClass Resource
Each StorageClass contains the fields provisioner, parameters, and reclaimPolicy, which are used when a PersistentVolume belonging to the class needs to be dynamically provisioned.
The name of a StorageClass object is significant, and is how users can request a particular class. Administrators set the name and other parameters of a class when first creating StorageClass objects, and the objects cannot be updated once they are created.
Administrators can specify a default StorageClass for PVCs that don't request any particular class to bind to: see the PersistentVolumeClaim section for details.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - debug
volumeBindingMode: Immediate
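The default-class behavior mentioned above is usually enabled by annotating a StorageClass. The following is a minimal sketch, not a definitive configuration: the class name standard-default is hypothetical, and it assumes the storageclass.kubernetes.io/is-default-class annotation is honored by your cluster version.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-default   # hypothetical name, chosen for illustration
  annotations:
    # Marks this class as the default for PVCs that do not request a class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```

Only one class should normally carry this annotation at a time, so that the default is unambiguous.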
Provisioner
Each StorageClass has a provisioner that determines what volume plugin is used for provisioning PVs. This field must be specified.
Volume Plugin | Internal Provisioner | Config Example |
---|---|---|
AWSElasticBlockStore | ✓ | AWS EBS |
AzureFile | ✓ | Azure File |
AzureDisk | ✓ | Azure Disk |
CephFS | - | - |
Cinder | ✓ | OpenStack Cinder |
FC | - | - |
FlexVolume | - | - |
GCEPersistentDisk | ✓ | GCE PD |
Glusterfs | ✓ | Glusterfs |
iSCSI | - | - |
NFS | - | NFS |
RBD | ✓ | Ceph RBD |
VsphereVolume | ✓ | vSphere |
PortworxVolume | ✓ | Portworx Volume |
Local | - | Local |
You are not restricted to specifying the "internal" provisioners listed here (whose names are prefixed with "kubernetes.io" and shipped alongside Kubernetes). You can also run and specify external provisioners, which are independent programs that follow a specification defined by Kubernetes. Authors of external provisioners have full discretion over where their code lives, how the provisioner is shipped, how it needs to be run, what volume plugin it uses (including Flex), etc. The repository kubernetes-sigs/sig-storage-lib-external-provisioner houses a library for writing external provisioners that implements the bulk of the specification. Some external provisioners are listed under the repository kubernetes-sigs/sig-storage-lib-external-provisioner.
For example, NFS doesn't provide an internal provisioner, but an external provisioner can be used. There are also cases when 3rd party storage vendors provide their own external provisioner.
Reclaim Policy
PersistentVolumes that are dynamically created by a StorageClass will have the reclaim policy specified in the reclaimPolicy field of the class, which can be either Delete or Retain. If no reclaimPolicy is specified when a StorageClass object is created, it will default to Delete.
PersistentVolumes that are created manually and managed via a StorageClass will have whatever reclaim policy they were assigned at creation.
Allow Volume Expansion
Kubernetes v1.11 [beta]
PersistentVolumes can be configured to be expandable. When allowVolumeExpansion is set to true, users can resize the volume by editing the corresponding PVC object.
The following types of volumes support volume expansion when the underlying StorageClass has the field allowVolumeExpansion set to true.
Volume type | Required Kubernetes version |
---|---|
gcePersistentDisk | 1.11 |
awsElasticBlockStore | 1.11 |
Cinder | 1.11 |
glusterfs | 1.11 |
rbd | 1.11 |
Azure File | 1.11 |
Azure Disk | 1.11 |
Portworx | 1.11 |
FlexVolume | 1.13 |
CSI | 1.14 (alpha), 1.16 (beta) |
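As a sketch of the resize flow described above, a user raises the storage request on an existing claim. The claim name data-claim and both sizes are assumptions made for illustration, and the class standard is assumed to have allowVolumeExpansion set to true.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim              # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard    # assumed to set allowVolumeExpansion: true
  resources:
    requests:
      storage: 20Gi             # raised from an assumed original 10Gi to request expansion
```

Expansion only grows a volume; lowering the request is not expected to shrink it.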
Mount Options
PersistentVolumes that are dynamically created by a StorageClass will have the mount options specified in the mountOptions field of the class.
If the volume plugin does not support mount options but mount options are specified, provisioning will fail. Mount options are not validated on either the class or PV. If a mount option is invalid, the PV mount fails.
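To illustrate, the sketch below attaches mount options to a class backed by an external NFS provisioner. The provisioner name example.com/external-nfs, the class name, and the specific options are assumptions; because options are not validated up front, a typo here would only surface when the PV mount fails.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-nfs-hard        # hypothetical name
provisioner: example.com/external-nfs   # placeholder external provisioner
parameters:
  server: nfs-server.example.com
  path: /share
mountOptions:
  # Passed through to the mount as-is; not checked when the class is created.
  - hard
  - nfsvers=4.1
```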
Volume Binding Mode
The volumeBindingMode field controls when volume binding and dynamic provisioning should occur. When unset, "Immediate" mode is used by default.
The Immediate mode indicates that volume binding and dynamic provisioning occur once the PersistentVolumeClaim is created. For storage backends that are topology-constrained and not globally accessible from all Nodes in the cluster, PersistentVolumes will be bound or provisioned without knowledge of the Pod's scheduling requirements. This may result in unschedulable Pods.
A cluster administrator can address this issue by specifying the WaitForFirstConsumer mode, which will delay the binding and provisioning of a PersistentVolume until a Pod using the PersistentVolumeClaim is created. PersistentVolumes will be selected or provisioned conforming to the topology that is specified by the Pod's scheduling constraints. These include, but are not limited to, resource requirements, node selectors, pod affinity and anti-affinity, and taints and tolerations.
The following plugins support WaitForFirstConsumer with dynamic provisioning:
The following plugins support WaitForFirstConsumer with pre-created PersistentVolume binding:
- All of the above
- Local
Kubernetes v1.17 [stable]
If you choose to use WaitForFirstConsumer, do not use nodeName in the Pod spec to specify node affinity. If nodeName is used in this case, the scheduler will be bypassed and the PVC will remain in the Pending state.
Instead, you can use a node selector on the hostname label, as shown below.
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  nodeSelector:
    kubernetes.io/hostname: kube-01
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage
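The Pod above references a claim named task-pv-claim. A sketch of what such a claim might look like follows; the class name delayed-binding is hypothetical and is assumed to set volumeBindingMode: WaitForFirstConsumer, and the requested size is illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: delayed-binding   # hypothetical class using WaitForFirstConsumer
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi                    # illustrative size
```

With delayed binding, this claim is expected to stay unbound until the Pod is scheduled, at which point the node selector constrains where the volume is bound or provisioned.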
Allowed Topologies
When a cluster operator specifies the WaitForFirstConsumer volume binding mode, it is no longer necessary to restrict provisioning to specific topologies in most situations. However, if still required, allowedTopologies can be specified.
This example demonstrates how to restrict the topology of provisioned volumes to specific zones and should be used as a replacement for the zone and zones parameters for the supported plugins.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b
Parameters
Storage Classes have parameters that describe volumes belonging to the storage class. Different parameters may be accepted depending on the provisioner. For example, the value io1, for the parameter type, and the parameter iopsPerGB are specific to EBS. When a parameter is omitted, some default is used.
There can be at most 512 parameters defined for a StorageClass. The total length of the parameters object including its keys and values cannot exceed 256 KiB.
AWS EBS
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "10"
  fsType: ext4
- type: io1, gp2, sc1, st1. See AWS docs for details. Default: gp2.
- zone (Deprecated): AWS zone. If neither zone nor zones is specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node. zone and zones parameters must not be used at the same time.
- zones (Deprecated): A comma separated list of AWS zone(s). If neither zone nor zones is specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node. zone and zones parameters must not be used at the same time.
- iopsPerGB: only for io1 volumes. I/O operations per second per GiB. The AWS volume plugin multiplies this by the size of the requested volume to compute the IOPS of the volume and caps it at 20 000 IOPS (the maximum supported by AWS, see AWS docs). A string is expected here, i.e. "10", not 10.
- fsType: fsType that is supported by kubernetes. Default: "ext4".
- encrypted: denotes whether the EBS volume should be encrypted or not. Valid values are "true" or "false". A string is expected here, i.e. "true", not true.
- kmsKeyId: optional. The full Amazon Resource Name of the key to use when encrypting the volume. If none is supplied but encrypted is true, a key is generated by AWS. See AWS docs for valid ARN value.
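The parameters above can be combined as in the following sketch of an encrypted io1 class. The class name is hypothetical and the kmsKeyId value is a placeholder ARN, not a real key; omit kmsKeyId to let AWS generate a key as described above.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-encrypted          # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "50"               # string, not integer
  fsType: ext4
  encrypted: "true"             # string, not boolean
  # Placeholder ARN for illustration only; substitute a key from your own account.
  kmsKeyId: arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
```

Under the rule described for iopsPerGB, a 100 GiB claim against this class would be provisioned with 50 × 100 = 5000 IOPS, while a 500 GiB claim (25 000 computed) would be capped at 20 000.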
GCE PD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  fstype: ext4
  replication-type: none
- type: pd-standard or pd-ssd. Default: pd-standard
- zone (Deprecated): GCE zone. If neither zone nor zones is specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node. zone and zones parameters must not be used at the same time.
- zones (Deprecated): A comma separated list of GCE zone(s). If neither zone nor zones is specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node. zone and zones parameters must not be used at the same time.
- fstype: ext4 or xfs. Default: ext4. The defined filesystem type must be supported by the host operating system.
- replication-type: none or regional-pd. Default: none.
If replication-type is set to none, a regular (zonal) PD will be provisioned.
If replication-type is set to regional-pd, a Regional Persistent Disk will be provisioned. It's highly recommended to have volumeBindingMode: WaitForFirstConsumer set, in which case when you create a Pod that consumes a PersistentVolumeClaim which uses this StorageClass, a Regional Persistent Disk is provisioned with two zones. One zone is the same as the zone that the Pod is scheduled in. The other zone is randomly picked from the zones available to the cluster. Disk zones can be further constrained using allowedTopologies.
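The recommendation above might be expressed as in the following sketch, which combines regional-pd replication with delayed binding and a zone restriction. The class name and the two zone names are assumptions chosen for illustration.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-standard       # hypothetical name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
# Delay provisioning so one replica zone can match the Pod's zone.
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a             # illustrative zones
    - us-central1-b
```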
Glusterfs
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://127.0.0.1:8081"
  clusterid: "630372ccdc720a92c681fb928f27b53f"
  restauthenabled: "true"
  restuser: "admin"
  secretNamespace: "default"
  secretName: "heketi-secret"
  gidMin: "40000"
  gidMax: "50000"
  volumetype: "replicate:3"
- resturl: Gluster REST service/Heketi service url which provisions gluster volumes on demand. The general format should be IPaddress:Port and this is a mandatory parameter for the GlusterFS dynamic provisioner. If the Heketi service is exposed as a routable service in an openshift/kubernetes setup, this can have a format similar to http://heketi-storage-project.cloudapps.mystorage.com where the fqdn is a resolvable Heketi service url.
- restauthenabled: Gluster REST service authentication boolean that enables authentication to the REST server. If this value is "true", restuser and restuserkey or secretNamespace + secretName have to be filled. This option is deprecated; authentication is enabled when any of restuser, restuserkey, secretName or secretNamespace is specified.
- restuser: Gluster REST service/Heketi user who has access to create volumes in the Gluster Trusted Pool.
- restuserkey: Gluster REST service/Heketi user's password which will be used for authentication to the REST server. This parameter is deprecated in favor of secretNamespace + secretName.
- secretNamespace, secretName: Identification of the Secret instance that contains the user password to use when talking to the Gluster REST service. These parameters are optional; an empty password will be used when both secretNamespace and secretName are omitted. The provided secret must have type "kubernetes.io/glusterfs", for example created in this way:
      kubectl create secret generic heketi-secret \
        --type="kubernetes.io/glusterfs" --from-literal=key='opensesame' \
        --namespace=default
  Example of a secret can be found in glusterfs-provisioning-secret.yaml.
- clusterid: 630372ccdc720a92c681fb928f27b53f is the ID of the cluster which will be used by Heketi when provisioning the volume. It can also be a list of clusterids, for example: "8452344e2becec931ece4e33c4674e4e,42982310de6c63381718ccfa6d8cf397". This is an optional parameter.
- gidMin, gidMax: The minimum and maximum value of the GID range for the StorageClass. A unique value (GID) in this range (gidMin-gidMax) will be used for dynamically provisioned volumes. These are optional values. If not specified, the volume will be provisioned with a value between 2000-2147483647, which are the defaults for gidMin and gidMax respectively.
- volumetype: The volume type and its parameters can be configured with this optional value. If the volume type is not mentioned, it's up to the provisioner to decide the volume type. For example:
  - Replica volume: volumetype: replicate:3 where '3' is replica count.
  - Disperse/EC volume: volumetype: disperse:4:2 where '4' is data and '2' is the redundancy count.
  - Distribute volume: volumetype: none
  For available volume types and administration options, refer to the Administration Guide.
  For further reference information, see How to configure Heketi.
  When persistent volumes are dynamically provisioned, the Gluster plugin automatically creates an endpoint and a headless service in the name gluster-dynamic-<claimname>. The dynamic endpoint and service are automatically deleted when the persistent volume claim is deleted.
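For clusters managed declaratively, the Heketi secret created with kubectl above could instead be written as a manifest. This is a sketch under the same assumptions as the command (name heketi-secret, namespace default, password opensesame); it uses stringData so the value can be given in plain text rather than base64.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: heketi-secret
  namespace: default
type: kubernetes.io/glusterfs   # type required by the Glusterfs provisioner
stringData:
  # Plain-text value; the API server stores it base64-encoded under data.
  key: opensesame
```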
NFS
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-nfs
provisioner: example.com/external-nfs
parameters:
  server: nfs-server.example.com
  path: /share
  readOnly: "false"
- server: Server is the hostname or IP address of the NFS server.
- path: Path that is exported by the NFS server.
- readOnly: A flag indicating whether the storage will be mounted as read only (default false).
Kubernetes doesn't include an internal NFS provisioner. You need to use an external provisioner to create a StorageClass for NFS. Here are some examples:
OpenStack Cinder
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gold
provisioner: kubernetes.io/cinder
parameters:
  availability: nova
- availability: Availability Zone. If not specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node.
Kubernetes v1.11 [deprecated]
This internal provisioner of OpenStack is deprecated. Please use the external cloud provider for OpenStack.
vSphere
There are two types of provisioners for vSphere storage classes:
- CSI provisioner: csi.vsphere.vmware.com
- vCP provisioner: kubernetes.io/vsphere-volume
In-tree provisioners are deprecated. For more information on the CSI provisioner, see Kubernetes vSphere CSI Driver and vSphereVolume CSI migration.
CSI Provisioner
The vSphere CSI StorageClass provisioner works with Tanzu Kubernetes clusters. For an example, refer to the vSphere CSI repository.
vCP Provisioner
The following examples use the VMware Cloud Provider (vCP) StorageClass provisioner.
- Create a StorageClass with a user specified disk format.
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: fast
      provisioner: kubernetes.io/vsphere-volume
      parameters:
        diskformat: zeroedthick
  diskformat: thin, zeroedthick and eagerzeroedthick. Default: "thin".
- Create a StorageClass with a disk format on a user specified datastore.
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: fast
      provisioner: kubernetes.io/vsphere-volume
      parameters:
        diskformat: zeroedthick
        datastore: VSANDatastore
  datastore: The user can also specify the datastore in the StorageClass. The volume will be created on the datastore specified in the StorageClass, which in this case is VSANDatastore. This field is optional. If the datastore is not specified, then the volume will be created on the datastore specified in the vSphere config file used to initialize the vSphere Cloud Provider.
- Storage Policy Management inside kubernetes
  - Using existing vCenter SPBM policy
    One of the most important features of vSphere for Storage Management is policy based Management. Storage Policy Based Management (SPBM) is a storage policy framework that provides a single unified control plane across a broad range of data services and storage solutions. SPBM enables vSphere administrators to overcome upfront storage provisioning challenges, such as capacity planning, differentiated service levels and managing capacity headroom.
    The SPBM policies can be specified in the StorageClass using the storagePolicyName parameter.
  - Virtual SAN policy support inside Kubernetes
    vSphere Infrastructure (VI) Admins will have the ability to specify custom Virtual SAN Storage Capabilities during dynamic volume provisioning. You can now define storage requirements, such as performance and availability, in the form of storage capabilities during dynamic volume provisioning. The storage capability requirements are converted into a Virtual SAN policy which is then pushed down to the Virtual SAN layer when a persistent volume (virtual disk) is being created. The virtual disk is distributed across the Virtual SAN datastore to meet the requirements.
    You can see Storage Policy Based Management for dynamic provisioning of volumes for more details on how to use storage policies for persistent volumes management.
There are a few vSphere examples which you can try out for persistent volume management inside Kubernetes for vSphere.
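As a sketch of the storagePolicyName parameter described above, the class below refers to an SPBM policy by name. The policy name gold is an assumption and would need to match a policy that already exists in vCenter.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: zeroedthick
  storagePolicyName: gold       # assumed name of an existing SPBM policy
```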
Ceph RBD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.16.153.105:6789
  adminId: kube
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-user
  userSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
- monitors: Ceph monitors, comma delimited. This parameter is required.
- adminId: Ceph client ID that is capable of creating images in the pool. Default is "admin".
- adminSecretName: Secret Name for adminId. This parameter is required. The provided secret must have type "kubernetes.io/rbd".
- adminSecretNamespace: The namespace for adminSecretName. Default is "default".
- pool: Ceph RBD pool. Default is "rbd".
- userId: Ceph client ID that is used to map the RBD image. Default is the same as adminId.
- userSecretName: The name of Ceph Secret for userId to map RBD image. It must exist in the same namespace as PVCs. This parameter is required. The provided secret must have type "kubernetes.io/rbd", for example created in this way:
      kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" \
        --from-literal=key='QVFEQ1pMdFhPUnQrSmhBQUFYaERWNHJsZ3BsMmNjcDR6RFZST0E9PQ==' \
        --namespace=kube-system
- userSecretNamespace: The namespace for userSecretName.
- fsType: fsType that is supported by kubernetes. Default: "ext4".
- imageFormat: Ceph RBD image format, "1" or "2". Default is "2".
- imageFeatures: This parameter is optional and should only be used if you set imageFormat to "2". Currently, layering is the only supported feature. Default is "", and no features are turned on.
Azure Disk
Azure Unmanaged Disk storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/azure-disk
parameters:
  skuName: Standard_LRS
  location: eastus
  storageAccount: azure_storage_account_name
- skuName: Azure storage account Sku tier. Default is empty.
- location: Azure storage account location. Default is empty.
- storageAccount: Azure storage account name. If a storage account is provided, it must reside in the same resource group as the cluster, and location is ignored. If a storage account is not provided, a new storage account will be created in the same resource group as the cluster.
Azure Disk storage class (starting from v1.7.2)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: managed
- storageaccounttype: Azure storage account Sku tier. Default is empty.
- kind: Possible values are shared, dedicated, and managed (default). When kind is shared, all unmanaged disks are created in a few shared storage accounts in the same resource group as the cluster. When kind is dedicated, a new dedicated storage account will be created for the new unmanaged disk in the same resource group as the cluster. When kind is managed, all managed disks are created in the same resource group as the cluster.
- resourceGroup: Specify the resource group in which the Azure disk will be created. It must be an existing resource group name. If it is unspecified, the disk will be placed in the same resource group as the current Kubernetes cluster.
- Premium VM can attach both Standard_LRS and Premium_LRS disks, while Standard VM can only attach Standard_LRS disks.
- Managed VM can only attach managed disks and unmanaged VM can only attach unmanaged disks.
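The resourceGroup parameter and the Premium/Standard note above could be combined as in this sketch. The class name and the resource group disks-rg are hypothetical; the group is assumed to exist already, and the consuming nodes are assumed to be Premium, managed VMs.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-managed         # hypothetical name
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Premium_LRS   # attachable only to Premium VMs
  kind: managed
  resourceGroup: disks-rg       # hypothetical, must be an existing resource group
```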
Azure File
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Standard_LRS
  location: eastus
  storageAccount: azure_storage_account_name
- skuName: Azure storage account Sku tier. Default is empty.
- location: Azure storage account location. Default is empty.
- storageAccount: Azure storage account name. Default is empty. If a storage account is not provided, all storage accounts associated with the resource group are searched to find one that matches skuName and location. If a storage account is provided, it must reside in the same resource group as the cluster, and skuName and location are ignored.
- secretNamespace: the namespace of the secret that contains the Azure Storage Account Name and Key. Default is the same as the Pod.
- secretName: the name of the secret that contains the Azure Storage Account Name and Key. Default is azure-storage-account-<accountName>-secret
- readOnly: a flag indicating whether the storage will be mounted as read only. Defaults to false which means a read/write mount. This setting will impact the ReadOnly setting in VolumeMounts as well.
During storage provisioning, a secret named by secretName is created for the mounting credentials. If the cluster has enabled both RBAC and Controller Roles, add the create permission of resource secret for clusterrole system:controller:persistent-volume-binder.
In a multi-tenancy context, it is strongly recommended to set the value for secretNamespace explicitly, otherwise the storage account credentials may be read by other users.
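Following the multi-tenancy recommendation above, a class might pin the credentials secret to a dedicated namespace. This is a sketch only; the class name and the namespace storage-secrets are hypothetical, and access to that namespace is assumed to be restricted by the administrator.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-isolated      # hypothetical name
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Standard_LRS
  # Keep storage account credentials out of tenant namespaces.
  secretNamespace: storage-secrets   # hypothetical, access-restricted namespace
  readOnly: "false"
```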
Portworx Volume
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-io-priority-high
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "1"
  snap_interval: "70"
  priority_io: "high"
- fs: filesystem to be laid out: none/xfs/ext4 (default: ext4).
- block_size: block size in Kbytes (default: 32).
- repl: number of synchronous replicas to be provided in the form of replication factor 1..3 (default: 1). A string is expected here i.e. "1" and not 1.
- priority_io: determines whether the volume will be created from higher performance or a lower priority storage high/medium/low (default: low).
- snap_interval: clock/time interval in minutes for when to trigger snapshots. Snapshots are incremental based on difference with the prior snapshot, 0 disables snaps (default: 0). A string is expected here i.e. "70" and not 70.
- aggregation_level: specifies the number of chunks the volume would be distributed into, 0 indicates a non-aggregated volume (default: 0). A string is expected here i.e. "0" and not 0.
- ephemeral: specifies whether the volume should be cleaned-up after unmount or should be persistent. The emptyDir use case can set this value to true, and the persistent volumes use case, such as for databases like Cassandra, should set it to false. Valid values are true/false (default false). A string is expected here i.e. "true" and not true.
Local
Kubernetes v1.14 [stable]
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
Local volumes do not currently support dynamic provisioning, however a StorageClass should still be created to delay volume binding until Pod scheduling. This is specified by the WaitForFirstConsumer volume binding mode.
Delaying volume binding allows the scheduler to consider all of a Pod's scheduling constraints when choosing an appropriate PersistentVolume for a PersistentVolumeClaim.
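Because local volumes are not dynamically provisioned, an administrator pre-creates PersistentVolumes that reference the class. The following is a sketch of such a volume; the PV name, capacity, disk path, and node name kube-01 are assumptions for illustration.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv        # hypothetical name
spec:
  capacity:
    storage: 100Gi              # illustrative size
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1       # assumed mount point of the disk on the node
  # Ties the volume to the node that physically holds the disk.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - kube-01       # assumed node name
```

A claim that names local-storage would then stay unbound until a consuming Pod is scheduled, letting the scheduler account for this node affinity.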