Start adding a KEP for csi volume resizing #780

Merged 1 commit on Feb 2, 2019
170 changes: 170 additions & 0 deletions keps/sig-storage/20190129-csi-volume-resizing.md
@@ -0,0 +1,170 @@
---
title: Support for CSI volume resizing
authors:
- "@gnufied"
owning-sig: sig-storage
participating-sigs:
- sig-storage
reviewers:
- "@saad-ali"
- "@jsafrane"
approvers:
- "@saad-ali"
- "@childsb"
creation-date: 2019-01-29
last-updated: 2019-01-29
status: implementable
see-also:
- "[Kubernetes Volume expansion](https://github.com/kubernetes/enhancements/issues/284)"
- "[Online resizing design](https://github.com/kubernetes/enhancements/pull/737)"
replaces:
superseded-by:
---

# Support for CSI volume resizing

## Table of Contents

* [Support for CSI volume resizing](#support-for-csi-volume-resizing)
  * [Table of Contents](#table-of-contents)
  * [Summary](#summary)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non-Goals](#non-goals)
  * [Proposal](#proposal)
    * [External resize controller](#external-resize-controller)
    * [Expansion on Kubelet](#expansion-on-kubelet)
      * [Offline volume resizing on kubelet:](#offline-volume-resizing-on-kubelet)
      * [Online volume resizing on kubelet:](#online-volume-resizing-on-kubelet)
    * [Risks and Mitigations](#risks-and-mitigations)
  * [Test Plan](#test-plan)
  * [Graduation Criteria](#graduation-criteria)
  * [Implementation History](#implementation-history)


## Summary

To bring CSI volumes to feature parity with in-tree volumes, we need to implement support for resizing CSI volumes.

## Motivation

We recently added volume resizing support to the CSI spec. This proposal implements the feature for Kubernetes.
Any CSI volume plugin that implements the necessary parts of the CSI spec will become resizable.

### Goals

To enable expansion of CSI volumes used by `PersistentVolumeClaim`s when the CSI plugin supports volume expansion as a plugin capability.

### Non-Goals

The expansion capability of a CSI plugin will not be validated with a CSI RPC call when a user edits the PVC (i.e., the existing resize admission controller will not make a CSI RPC call).
The responsibility for actually enabling expansion for certain StorageClasses still falls on the Kubernetes admin.

## Proposal

The design of CSI volume resizing has two parts: an external resize controller and expansion on kubelet.


### External resize controller

To support resizing of CSI volumes, an external resize controller will monitor all PVCs. If a PVC meets the following criteria for resizing, it will be added to
Member

nit: monitor all changes to PVCs

the controller's workqueue:

- The driver name discovered from the PVC should match the name of the driver currently known to the external resize controller (obtained by querying driver info via a CSI RPC call).
Member

How does it get the driver name? From StorageClass?

Member Author

From PV.

Member

Let's clarify that.

- The controller notices that the PVC has been updated and, by comparing the old and new PVC objects, determines that the user has requested more space.

Once a PVC is picked from the workqueue, the controller will also compare the requested PVC size with the actual size of the volume in the `PersistentVolume`
Member

What does this check do?

Member Author

We also need to check the PV size before actually doing the controller resize because it is possible that the volume has already been resized and the controller is just resyncing PVCs.

object. Once the PVC passes all these checks, the controller will make a CSI `ControllerExpandVolume` call, provided the CSI plugin implements the `ControllerExpandVolume`
RPC call.
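
The checks described above can be sketched in Go as follows. This is a minimal illustration, not the controller's actual code: the `claim` type, its field names, the function names, and the driver name `example.csi.driver` are all hypothetical, and sizes are plain byte counts rather than Kubernetes quantities.

```go
package main

import "fmt"

// claim is a hypothetical, simplified stand-in for the PVC/PV fields that the
// checks read; all sizes are in bytes.
type claim struct {
	driverName    string // driver name discovered for the volume
	requestedSize int64  // pvc.spec.resources.requests.storage
	pvCapacity    int64  // pv.Spec.Capacity
}

// shouldEnqueue mirrors the two workqueue criteria: the driver must match the
// one this controller serves, and the update must request more space.
func shouldEnqueue(controllerDriver string, oldPVC, newPVC claim) bool {
	if newPVC.driverName != controllerDriver {
		return false
	}
	return newPVC.requestedSize > oldPVC.requestedSize
}

// needsControllerExpand is the check made after dequeueing: if the PV already
// reports the requested size, the volume was resized earlier and the
// controller is only resyncing.
func needsControllerExpand(c claim) bool {
	return c.requestedSize > c.pvCapacity
}

func main() {
	const gib = int64(1) << 30
	oldPVC := claim{driverName: "example.csi.driver", requestedSize: 1 * gib, pvCapacity: 1 * gib}
	newPVC := oldPVC
	newPVC.requestedSize = 2 * gib

	fmt.Println("enqueue:", shouldEnqueue("example.csi.driver", oldPVC, newPVC))
	fmt.Println("expand:", needsControllerExpand(newPVC))

	// Simulate a resync after the PV has already been resized.
	newPVC.pvCapacity = 2 * gib
	fmt.Println("expand after resync:", needsControllerExpand(newPVC))
}
```

Keeping the PV size check separate from the enqueue criteria means a resync of an already-resized volume passes through the workqueue without triggering a second `ControllerExpandVolume` call.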

If the `ControllerExpandVolume` call is successful and the plugin implements `NodeExpandVolume`:
Member

(i.e. the CSI driver has a EXPAND_VOLUME node capability)

Member Author

yeah, I did not use capabilities in this document on purpose because I thought "plugin implements NodeExpandVolume" is synonymous with "plugin has EXPAND_VOLUME node capability".

- If `ControllerExpandVolumeResponse` returns `true` in `node_expansion_required`, then the `FileSystemResizePending` condition will be added to the PVC and a `NodeExpandVolume` operation will be queued on kubelet. The volume size reported by the PV will also be updated to the new value.
Member

So kubelet will wait until the FileSystemResizePending condition is applied to PVC? If so, this smells fishy. My understanding was that conditions were informative for end user and not to be used to trigger behavior in other components?

Member Author (@gnufied, Feb 13, 2019)

No, kubelet does not check for the FileSystemResizePending condition, saad. From the proposal:

When a pod that is using the PVC is started, kubelet will compare pvc.spec.resources.requests.storage and pvc.Status.Capacity. It also compares PVC's size with pv.Spec.Capacity and if it detects PV is reporting same size as pvc's spec but PVC's status is still reporting smaller value then it determines - a volume expansion is pending on the node. At this point if plugin implements NodeExpandVolume RPC call then, kubelet will call it

- If `ControllerExpandVolumeResponse` returns `false` in `node_expansion_required`, then the volume resize operation will be marked as finished and both `pvc.Status.Capacity` and `pv.Spec.Capacity` will report the updated value.

If the plugin does not implement `NodeExpandVolume`, then the volume resize operation will be marked as finished and both `pvc.Status.Capacity` and `pv.Spec.Capacity` will report the updated value after successful completion of the `ControllerExpandVolume` RPC call.
Member

If ControllerExpandVolume call is successful and the plugin does not implement...
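
The outcomes of a successful `ControllerExpandVolume` call described in the proposal can be summarized in a small Go sketch. The `resizeState` type and `afterControllerExpand` function are hypothetical names introduced only for illustration; the sketch models the bookkeeping, not the real API objects.

```go
package main

import "fmt"

// resizeState is a hypothetical summary of what the controller records after
// a successful ControllerExpandVolume call; sizes are in bytes.
type resizeState struct {
	pvCapacity              int64 // pv.Spec.Capacity
	pvcStatusCapacity       int64 // pvc.Status.Capacity
	fileSystemResizePending bool  // FileSystemResizePending condition on the PVC
	finished                bool  // resize operation marked as finished
}

// afterControllerExpand applies the rules from the text. pluginHasNodeExpand
// says whether the plugin implements NodeExpandVolume; nodeExpansionRequired
// is the node_expansion_required field of ControllerExpandVolumeResponse.
func afterControllerExpand(s resizeState, newSize int64, pluginHasNodeExpand, nodeExpansionRequired bool) resizeState {
	// The PV reports the new size in every success case.
	s.pvCapacity = newSize
	if pluginHasNodeExpand && nodeExpansionRequired {
		// Node-side work is still pending; pvc.Status.Capacity is left
		// untouched until kubelet finishes NodeExpandVolume.
		s.fileSystemResizePending = true
		return s
	}
	// No node-side work: the resize is complete.
	s.pvcStatusCapacity = newSize
	s.finished = true
	return s
}

func main() {
	const gib = int64(1) << 30
	start := resizeState{pvCapacity: 1 * gib, pvcStatusCapacity: 1 * gib}

	fmt.Printf("%+v\n", afterControllerExpand(start, 2*gib, true, true))
	fmt.Printf("%+v\n", afterControllerExpand(start, 2*gib, true, false))
	fmt.Printf("%+v\n", afterControllerExpand(start, 2*gib, false, false))
}
```

Leaving `pvc.Status.Capacity` at the old value in the first case is what later lets kubelet detect that a node expansion is pending.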


If the `ControllerExpandVolume` call fails:
- The PVC will retain the `Resizing` condition and will have appropriate events added to it.
- The controller will retry the resize operation with exponential backoff, on the assumption that the failure is transient.

A general mechanism for recovering from resize failure will be implemented via: https://github.com/kubernetes/kubernetes/issues/73036

### Expansion on Kubelet

A CSI volume may require expansion on the node to finish volume resizing. In some cases, the entire resizing operation can happen on the node and
the plugin may choose not to implement the `ControllerExpandVolume` CSI RPC call at all.

Kubernetes currently supports two modes of performing volume resize on kubelet. We describe each mode here. For more information, please refer to the original volume resize proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md


#### Offline volume resizing on kubelet:

This is the default mode. In this mode, `NodeExpandVolume` will only be called when the volume is being mounted on the node. In other words, the pod that was using the volume must be re-created for expansion on the node to happen.
Member

What causes kubelet to start this process of calling NodeExpandVolume? Based on above it seems like some component (mount operation?) watches the PVC and triggers this?

Member Author

Yeah - offline resizing is done as part of MountVolume operation (in operation_executor.go). The actual checks we perform are outlined below. I am going to update this document to be more explicit.


When a pod that is using the PVC is started, kubelet will compare `pvc.spec.resources.requests.storage` and `pvc.Status.Capacity`. It also compares the PVC's size with `pv.Spec.Capacity`. If it detects that the PV is reporting the same size as the PVC's spec but the PVC's status is still reporting a smaller value, then it determines that
a volume expansion is pending on the node. At this point, if the plugin implements the `NodeExpandVolume` RPC call, kubelet will call it and:

If `NodeExpandVolume` is successful:
- It will update `pvc.Status.Capacity` with the latest value and remove all resizing-related conditions from the PVC.

If `NodeExpandVolume` fails:
- It will add an event to both the PVC and the Pod about the failed resize, and the resize operation will be retried. This will prevent the pod from starting up.


#### Online volume resizing on kubelet:

More details about online resizing can be found in [Online resizing design](https://github.com/kubernetes/enhancements/pull/737), but essentially, if the
`ExpandInUsePersistentVolumes` feature is enabled, kubelet will periodically poll all PVCs that are in use on the node, compare `pvc.spec.resources.requests.storage` and `pvc.Status.Capacity` (and also `pv.Spec.Capacity`), and make a similar determination about whether node expansion is required for the volume.

In this mode, `NodeExpandVolume` can be called while the pod is running and the volume is in use. If, using the aforementioned check, kubelet determines that
volume expansion is needed on the node and the plugin implements the `NodeExpandVolume` RPC call, kubelet will call it (provided the volume has already been node staged and published on the node) and:

If `NodeExpandVolume` is successful:
- It will update `pvc.Status.Capacity` with the latest value and remove all resizing-related conditions from the PVC.

If `NodeExpandVolume` fails:
- It will add an event to both the PVC and the Pod about the failed resize, and the resize operation will be retried.
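
A single pass of the online polling check might look like the following Go sketch. The `inUseVolume` type and `volumesToExpand` function are hypothetical, and the sketch only selects candidates; it does not model the feature gate, the polling interval, or the RPC itself.

```go
package main

import "fmt"

// inUseVolume is a hypothetical, simplified record of a PVC-backed volume
// that is in use on the node; sizes are in bytes.
type inUseVolume struct {
	name               string
	pvcRequest         int64 // pvc.spec.resources.requests.storage
	pvcStatusCapacity  int64 // pvc.Status.Capacity
	pvCapacity         int64 // pv.Spec.Capacity
	stagedAndPublished bool  // volume has been node staged and published
}

// volumesToExpand performs one polling pass and returns the names of volumes
// for which NodeExpandVolume should be called while the pod keeps running.
func volumesToExpand(vols []inUseVolume, pluginHasNodeExpand bool) []string {
	var out []string
	if !pluginHasNodeExpand {
		return out
	}
	for _, v := range vols {
		// Same determination as in the offline case, following the
		// proposal's wording.
		pending := v.pvCapacity == v.pvcRequest && v.pvcStatusCapacity < v.pvcRequest
		if pending && v.stagedAndPublished {
			out = append(out, v.name)
		}
	}
	return out
}

func main() {
	const gib = int64(1) << 30
	vols := []inUseVolume{
		{name: "data", pvcRequest: 2 * gib, pvcStatusCapacity: 1 * gib, pvCapacity: 2 * gib, stagedAndPublished: true},
		{name: "logs", pvcRequest: 2 * gib, pvcStatusCapacity: 1 * gib, pvCapacity: 2 * gib, stagedAndPublished: false},
		{name: "cache", pvcRequest: 1 * gib, pvcStatusCapacity: 1 * gib, pvCapacity: 1 * gib, stagedAndPublished: true},
	}
	fmt.Println(volumesToExpand(vols, true))
}
```

Volumes that are pending but not yet staged and published are simply skipped in this pass and picked up by a later one.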

### Risks and Mitigations

Before this feature goes GA, we need to handle recovery from resize failures: https://github.com/kubernetes/kubernetes/issues/73036.

## Test Plan

* Unit tests for the external resize controller.
* Add e2e tests in Kubernetes that use the csi-mock driver for volume resizing.
  - (positive) Given a plugin that supports both control plane and node side resize, a CSI volume should be resizable and the resize should complete successfully.
  - (positive) Given a plugin that only requires control plane resize, a CSI volume should be resizable and the resize should complete successfully.
  - (positive) Given a plugin that only requires node side resize, a CSI volume should be resizable and the resize should complete successfully.
  - (positive) Given a plugin that supports online resizing, a CSI volume should be resizable and the online resize operation should complete successfully.
  - (negative) If control plane resize fails, the PVC should have appropriate events.
  - (negative) If node side resize fails, both the pod and the PVC should have appropriate events.

## Graduation Criteria

Once implemented, CSI volumes should be resizable and in line with the current in-tree implementation of volume resizing.

- *Alpha*: Initial support for CSI volume resizing. Released code will include an external CSI volume resize controller and changes to kubelet. The implementation will have unit tests and csi-mock driver e2e tests.
- *Beta*: More robust support for CSI volume resizing, including recovery from resize failures. Add e2e tests that use real drivers (`gce-pd` and `ebs` at minimum). Add metrics for volume resize operations.
- *GA*: CSI resizing in general will only go GA after the existing [Volume expansion](https://github.com/kubernetes/enhancements/issues/284) feature goes GA. Online resizing of CSI volumes depends on the [Online resizing](https://github.com/kubernetes/enhancements/pull/737) feature and will be available as a GA feature only when that feature goes GA.

Hopefully the content previously contained in [umbrella issues][] will be tracked in the `Graduation Criteria` section.

[umbrella issues]: https://github.com/kubernetes/kubernetes/issues/62096

## Implementation History
Member

Please update the Implementation History to clarify what has been implemented.


Major milestones in the life cycle of a KEP should be tracked in `Implementation History`.
Major milestones might include

- the `Summary` and `Motivation` sections being merged signaling SIG acceptance
- the `Proposal` section being merged signaling agreement on a proposed design
- the date implementation started
- the first Kubernetes release where an initial version of the KEP was available
- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded