
vdev_id regression: slot remapping doesn't work anymore #11951

Closed
@ZNikke

Description


System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  20.04.2 LTS
Linux Kernel          5.4.0-72-generic
Architecture          x86_64
ZFS Version           2.1.0-rc4
SPL Version           2.1.0-rc4

Describe the problem you're observing

It looks to me like slot remapping stopped working sometime between zfs-0.8.3 (actually 0.8.3-1ubuntu12.8) and zfs-2.1.0-rc4.

Our disks end up at different IDs depending on the enclosure model, none of which match the physical 1-12 numbering on the enclosure, so we want to remap them all to match that 1-12 numbering.

Our previously working vdev_id.conf still seems to map the channels, but the disks end up with whatever slot numbers the enclosures report. Since the documentation and examples for vdev_id.conf suggest this should still work, I think it's a regression.

Describe how to reproduce the problem

Given this vdev_id.conf:

# Settings; specify them even if they are the same as the defaults, as the
# vdev_id mapping script has some interesting corner cases if they're missing.
multipath       no
topology        sas_direct
phys_per_port   4
slot            bay

# PCI-Express Slot 1
channel 1b:00.0 0 enc9d
# PCI-Express Slot 2
channel 20:00.0 0 enc10d

## ZFS log
# storcli64 /c0/v1 show all|grep Id
alias c0v1 wwn-0x600507604097171827cf79423d452f1f

## ZFS cache
# PCI-Express Slot 1
channel 1b:00.0 1 c3bay
# PCI-Express Slot 2
channel 20:00.0 1 c4bay

# Slot remapping
#    Linux      Mapped
#    Slot       Slot    Channel

# Drives in internal bays
slot 4           8      c3bay
slot 5           9      c3bay
slot 6          10      c3bay
slot 7          11      c3bay
slot 4          12      c4bay
slot 5          13      c4bay
slot 6          14      c4bay
slot 7          15      c4bay

# Map slot numbering in HP D2600 enclosure to match the 1-12 legend on box
slot 13          1      enc9d
slot 14          2      enc9d
slot 15          3      enc9d
slot 16          4      enc9d
slot 17          5      enc9d
slot 18          6      enc9d
slot 19          7      enc9d
slot 20          8      enc9d
slot 21          9      enc9d
slot 22         10      enc9d
slot 23         11      enc9d
slot 24         12      enc9d

# Default mapping for our DL180G6 "plåtsax" enclosures; they are off by one
# compared to the 1-12 numbering printed on the label.
slot  0          1
slot  1          2
slot  2          3
slot  3          4
slot  4          5
slot  5          6
slot  6          7
slot  7          8
slot  8          9
slot  9         10
slot 10         11
slot 11         12
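
After editing vdev_id.conf, the links under /dev/disk/by-vdev/ need to be regenerated before checking the result. Assuming the stock 69-vdev_id.rules shipped with ZFS is installed, re-triggering udev for block devices is typically enough (the exact invocation below is just a sketch and may vary):

# udevadm control --reload-rules            # not strictly needed for config-only edits
# udevadm trigger --subsystem-match=block   # re-run the rules, incl. vdev_id, for all block devices
# udevadm settle                            # wait for the resulting by-vdev symlinks to appear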

We get this mapping:

# ls --hide="*-part*" /dev/disk/by-vdev/
c0v1    c3bay7  c4bay7    enc10d11  enc10d5  enc10d9  enc9d16  enc9d20  enc9d24
c3bay4  c4bay4  enc10d0   enc10d2   enc10d6  enc9d13  enc9d17  enc9d21
c3bay5  c4bay5  enc10d1   enc10d3   enc10d7  enc9d14  enc9d18  enc9d22
c3bay6  c4bay6  enc10d10  enc10d4   enc10d8  enc9d15  enc9d19  enc9d23

I.e. no slot remapping was done! The expected result of the slot mapping would be for c3bay/c4bay to have their disks in the 8..15 sequence, and for the enc9d/enc10d enclosures to have theirs in the 1..12 sequence.
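
It's also possible to see what alias vdev_id itself computes for a single disk by running it by hand against this config. A sketch (the script path and the sdX name are examples and differ per system; vdev_id prints the ID_VDEV variables that the udev rule then uses to create the by-vdev symlink):

# /lib/udev/vdev_id -c /etc/zfs/vdev_id.conf -d sdx   # sdx = any disk behind enc9d/enc10d
ID_VDEV=<alias>
ID_VDEV_PATH=disk/by-vdev/<alias>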

Things get extra interesting if I remove the slot bay assignment: then only c0v1 gets mapped in by-vdev/. The vdev_id.conf man page says that bay is the default, so it really shouldn't be necessary to specify it. On this subject it's interesting to note that vdev_id.conf.sas_direct.example doesn't mention slot bay, but the example in the vdev_id.conf man page does. I personally think the double use of the slot keyword is a mistake; it's very confusing...
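
For reference, my reading of the vdev_id.conf man page is that the slot keyword has two unrelated forms, which is exactly what makes it confusing (sketched below; the exact set of allowed attribute names may vary between versions):

# Form 1: global setting choosing which SAS attribute supplies the slot number
#         (one of bay|phy|port|id|lun according to the man page)
slot            bay

# Form 2: per-channel remapping: slot <physical slot> <mapped slot> [channel]
slot            4       8       c3bay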

Include any warning/errors/backtraces from the system logs
