
scalability issues with zvol_create_minors(pool)  #2217

Closed
@bprotopopov

Description


zvol_create_minors() is a generic function that recursively scans the pool namespace and invokes a callback for each object set found. The goal is to create device nodes for all the zvols, and for their snapshots if the snapdev property is set to 'visible' on the corresponding zvols.

zvol_create_minors() is invoked at spa first open/import, and at zvol, clone, and zvol snapshot create time. A similar function (zvol_remove_minors()) is invoked at last close/export time to remove the device nodes created earlier. At snapshot create time, it should only be invoked on the snapshots being processed, whereas currently it is called on the whole pool at the end of zfs_ioc_snapshot(). This invocation should be removed.

However, other issues still remain. Specifically, if there are many zvols and snapshots, all of them will have to be visited, and device nodes created/removed (which includes interaction with udev and dynamic creation/removal of the /dev nodes). One issue here is that regardless of the value of the snapdev property ('visible' or 'hidden'), all the snapshots are visited. As the total number of snapshots can be much larger than the total number of zvols/clones, this can present a scalability problem.

The reason this happens is that zvol_create_minors() uses a generic pool namespace traversal routine (dmu_objset_find()) that has no way of filtering or pruning its traversal of the objset namespace. It does not interact with the callback it invokes on the objsets under traversal in any way other than aborting the entire traversal if a callback returns a non-zero value. As a result, if the DS_FIND_SNAPSHOTS flag is passed, the callback will be invoked on all the snapshots of a zvol, whether or not this zvol has snapdev='visible' set (and each callback will get the zvol name and will check the snapdev property value).
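To illustrate the cost, here is a minimal toy model of an unfiltered traversal. It is only a sketch and does not use the real ZFS data structures or the dmu_objset_find() signature: all `toy_*` names, the struct layout, and the use of a snapshot index of -1 for the zvol itself are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not the real ZFS structures. */
enum toy_snapdev { TOY_SNAPDEV_HIDDEN, TOY_SNAPDEV_VISIBLE };

struct toy_zvol {
	const char *name;
	enum toy_snapdev snapdev;
	int nsnaps;		/* number of snapshots under this zvol */
};

/* snap_idx == -1 means "the zvol itself"; >= 0 means one of its snapshots. */
typedef int (*toy_cb_t)(const struct toy_zvol *zv, int snap_idx, void *arg);

/*
 * Unfiltered traversal: every zvol and every snapshot is visited.
 * The only control the callback has is to abort everything by
 * returning a non-zero value.
 */
int
toy_find_all(const struct toy_zvol *zvols, size_t n, toy_cb_t cb, void *arg)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		int err = cb(&zvols[i], -1, arg);
		if (err != 0)
			return (err);
		for (int s = 0; s < zvols[i].nsnaps; s++) {
			err = cb(&zvols[i], s, arg);
			if (err != 0)
				return (err);
		}
	}
	return (0);
}

struct toy_stats {
	int calls;	/* how many times the callback ran */
	int minors;	/* how many device nodes would be created */
};

/* Callback that checks snapdev itself, once per snapshot visited. */
int
toy_create_minor_cb(const struct toy_zvol *zv, int snap_idx, void *arg)
{
	struct toy_stats *st = (struct toy_stats *)arg;

	st->calls++;
	if (snap_idx < 0 || zv->snapdev == TOY_SNAPDEV_VISIBLE)
		st->minors++;
	return (0);
}
```

In this model, a zvol with snapdev='hidden' and 1000 snapshots costs 1001 callback invocations to create a single device node.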

This can be addressed in one of two ways. The first is to amend the objset namespace traversal to filter (by object type) and/or prune (if a zvol does not expose its snapshots, don't iterate over them). The second is to filter out the zvols with snapdev='hidden' in the callback, if the name of the zvol is passed in the (currently unused) argument to the callback. The latter approach cannot eliminate the unneeded snapshot traversal, but it can make the callbacks very fast.
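The pruning variant could look roughly like the sketch below. Again, this is a toy model rather than a proposed patch: the `sk_*` names, the struct layout, and the extra `descend` predicate are hypothetical and are not part of the existing dmu_objset_find() interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not the real ZFS structures. */
enum sk_snapdev { SK_SNAPDEV_HIDDEN, SK_SNAPDEV_VISIBLE };

struct sk_zvol {
	const char *name;
	enum sk_snapdev snapdev;
	int nsnaps;		/* number of snapshots under this zvol */
};

/* snap_idx == -1 means "the zvol itself"; >= 0 means one of its snapshots. */
typedef int (*sk_cb_t)(const struct sk_zvol *zv, int snap_idx, void *arg);

/* Predicate asked once per zvol: should its snapshots be visited? */
typedef int (*sk_descend_t)(const struct sk_zvol *zv);

/*
 * Traversal with pruning. If 'descend' is NULL, the behavior is the
 * unfiltered one: all snapshots of all zvols are visited.
 */
int
sk_find_pruned(const struct sk_zvol *zvols, size_t n, sk_cb_t cb,
    sk_descend_t descend, void *arg)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		int err = cb(&zvols[i], -1, arg);
		if (err != 0)
			return (err);
		if (descend != NULL && !descend(&zvols[i]))
			continue;	/* prune: skip this zvol's snapshots */
		for (int s = 0; s < zvols[i].nsnaps; s++) {
			err = cb(&zvols[i], s, arg);
			if (err != 0)
				return (err);
		}
	}
	return (0);
}

/* Descend only into zvols that expose their snapshots. */
int
sk_snaps_visible(const struct sk_zvol *zv)
{
	return (zv->snapdev == SK_SNAPDEV_VISIBLE);
}

struct sk_stats {
	int calls;	/* how many times the callback ran */
	int minors;	/* how many device nodes would be created */
};

int
sk_count_cb(const struct sk_zvol *zv, int snap_idx, void *arg)
{
	struct sk_stats *st = (struct sk_stats *)arg;

	st->calls++;
	if (snap_idx < 0 || zv->snapdev == SK_SNAPDEV_VISIBLE)
		st->minors++;
	return (0);
}
```

With one hidden zvol holding 1000 snapshots and one visible zvol holding 2, passing `sk_snaps_visible` brings the callback count down from 1004 to 4 in this model, while the number of device nodes created stays the same. Filtering inside the callback instead would keep all 1004 invocations and only make each of them cheap.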

To summarize, the zvol_create_minors(poolname) call at the end of zfs_ioc_snapshot() should be removed, and the overhead during spa first open/import should be controlled by filtering/pruning the objset namespace scan to avoid scanning zvol snapshots if the snapdev property of the parent zvol is set to 'hidden'.

Unfortunately, unless one keeps a history of changes to the snapdev property for the volumes, one is still likely to have to scan all the snapshots on export/last close to make sure /dev is cleaned up.
