Skip to content

mantle/platform/azure: Add support for Azure Shared Image Gallery (SIG) and other enhancements #4109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 29, 2025

Conversation

marmijo
Copy link
Member

@marmijo marmijo commented May 17, 2025

This PR introduces support for building and running Azure Shared Image Gallery (SIG) images, also known as "gallery images", alongside traditional managed images. Both image types can now be used with either Hyper-V Generation 1 or Generation 2. Both images types now default to using Hyper-V Generation 2.

Additional enhancements include:

  • Support for overriding the instance type per kola test via external test configuration (useful for tests requiring specific VM sizes).

  • Ability to attach additional data disks to Azure instances, as specified in kola tests.

This enables testing the Azure disk udev rules that were added in coreos/fedora-coreos-config#3378, as NVMe support is only available on Gen2 Gallery Images.

See: https://issues.redhat.com/browse/COS-3125

EDIT: a commit was added to remove the option to set the Hyper-V generation. All images will now default to Gen2.


commit 89c9f8b3c8de67d723f8a90bedf40813c085e70c
Author: Michael Armijo <[email protected]>
Date:   Wed May 28 15:08:36 2025 -0600

    azure: only build and run Hyper-V Gen2 images
    
    Remove the option to specify the Hyper-V Generation during image
    creation and when running instances through kola. Instead, only
    build traditional and gallery images using Hyper-V Gen2.


commit 270e9e4c5cf175efe62381dc0bf57226c29a8586
Author: Michael Armijo <[email protected]>
Date:   Fri May 16 17:11:02 2025 -0600

    mantle/kola: add InstanceType to PlatformOptions for external tests
    
    Add an `InstanceType` field  to `PlatformOptions` to allow external
    tests to override the instance type used in `kola run`. This is useful
    for cases where a specific test needs to run on a different (potentially
    more expensive) instance type. Support for this is currently limited
    to the Azure Platform.
    
    Also, fix the `MultiPathDisk` for the qemu-iso platform. The check is
    now correctly performed in the `NewMachineWithOptions` function, since
    `MultiPathDisk` is part of `platform.MachineOptions`.

commit 386e133a741e41a8969951b4f5a8ce5789ec72e6
Author: Michael Armijo <[email protected]>
Date:   Wed May 14 17:58:40 2025 -0600

    ore/azure: add options to create and delete Shared Image Gallery Images
    
    Add a new `create-gallery-image` ore command to Support creating images
    within Azure Shared Image Galleries (Gallery Images). The command
    creates image definitions and versions in Azure Shared Image Galleries.
    Gallery images can be created from either a blob URL or an existing
    managed image. A `--azure-publisher` flag is added to assign a publisher
    to the gallery image.
    
    `delete-gallery-image` is also added to delete individual gallery images
    or an entire Shared Image Gallery.

commit 0be49d77209a89a8af65414d4ddec5cd099cb9e2
Author: Michael Armijo <[email protected]>
Date:   Wed May 14 17:26:58 2025 -0600

    azure: add --azure-hyper-v-generation flag
    
    Add a new flag to allow specifying the Hyper-V generation (V1 or V2)
    when creating Azure images. This enables support for both Gen1 and Gen2
    image creation.


commit 54658f96aa716dc17f8da26a48d8cf08b3ef4f9c
Author: Michael Armijo <[email protected]>
Date:   Wed May 14 16:49:34 2025 -0600

    platform/azure: support additional data disks
    
    Add support for attaching additional data disks to instances created in
    Azure. Disks are defined through the machines options as the Size in GB
    and the 'sku', or storage type, e.g. '["100G:sku=UltraSSD_LRS"]' for
    NVMe disks.

@marmijo marmijo marked this pull request as draft May 17, 2025 01:10
@marmijo marmijo force-pushed the azure-gen2-images branch from b76e3db to 8c1c513 Compare May 21, 2025 01:21
@marmijo marmijo marked this pull request as ready for review May 21, 2025 23:25
@marmijo marmijo force-pushed the azure-gen2-images branch from 8c1c513 to 18626cb Compare May 21, 2025 23:33
marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request May 22, 2025
Introduce a new test verifies that udev rules[1] for Azure Managed NVMe
disks correctly create symlinks in `/dev/disk/azure`. It only runs on
Azure and uses new kola functionality[2] to override the instance type,
enabling the use of a Hyper-V Gen2 VM (Standard_M16bds_v3) with a
UltraSSD_LRS (NVMe) data disk attached.

The test checks for the presence of expected symlinks:
  - /dev/disk/azure/os
  - /dev/disk/azure/data/by-lun/0

[1]: coreos#3378
[2]: coreos/coreos-assembler#4109
marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request May 22, 2025
Introduce a new test which verifies that udev rules[1] for Azure
Managed NVMe disks correctly create symlinks in `/dev/disk/azure`.
It only runs on Azure and uses new kola functionality[2] to override
the instance type, enabling the use of a Hyper-V Gen2 VM
(Standard_M16bds_v3) with a UltraSSD_LRS (NVMe) data disk attached.

The test checks for the presence of expected symlinks:
  - /dev/disk/azure/os
  - /dev/disk/azure/data/by-lun/0

[1]: coreos#3378
[2]: coreos/coreos-assembler#4109
@marmijo marmijo force-pushed the azure-gen2-images branch from 18626cb to 9131ca7 Compare May 22, 2025 17:05
marmijo added a commit to marmijo/fedora-coreos-pipeline that referenced this pull request May 22, 2025
Expand the job to build Hyper-V Gen2 gallery images and run kola tests
on both traditional managed images and new Gen2 gallery images[1]. The new
NVMe test[2] is explicitly denylisted from managed image testing since
since it requires Gen2 support. This will essentially allow all Azure
kola tests to be validated on both kinds of images.

A pipeline config variable, `test_gallery`, is introduced to the
Azure section to specify the gallery used to create Gen2 images.

[1]: coreos/coreos-assembler#4109
[2]: coreos/fedora-coreos-config#3519
marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request May 27, 2025
Introduce a new test which verifies that udev rules[1] for Azure
Managed NVMe disks correctly create symlinks in `/dev/disk/azure`.
It only runs on Azure and uses new kola functionality[2] to override
the instance type, enabling the use of a Hyper-V Gen2 VM
(Standard_M16bds_v3) with a UltraSSD_LRS (NVMe) data disk attached.

The test checks for the presence of expected symlinks:
  - /dev/disk/azure/os
  - /dev/disk/azure/data/by-lun/0

[1]: coreos#3378
[2]: coreos/coreos-assembler#4109
@marmijo
Copy link
Member Author

marmijo commented May 27, 2025

/retest rhcos

Copy link

openshift-ci bot commented May 27, 2025

@marmijo: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test images
/test rhcos

Use /test all to run all jobs.

In response to this:

/retest rhcos

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@marmijo
Copy link
Member Author

marmijo commented May 27, 2025

/test rhcos

marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request May 27, 2025
Introduce a new test which verifies that udev rules[1] for Azure
Managed NVMe disks correctly create symlinks in `/dev/disk/azure`.
It only runs on Azure and uses new kola functionality[2] to override
the instance type, enabling the use of a Hyper-V Gen2 VM
(Standard_M16bds_v3) with a UltraSSD_LRS (NVMe) data disk attached.

The test checks for the presence of OS and data disk symlinks, and
that they point to a valid target.

[1]: coreos#3378
[2]: coreos/coreos-assembler#4109
@marmijo marmijo force-pushed the azure-gen2-images branch from 9131ca7 to a5edc64 Compare May 27, 2025 20:50
}

// Create a Gallery Image Version
versionName := "1.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely will want to update this when we start publishing images for other people to use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit tricky. Azure only allows version names to match MajorVersion.MinorVersion.Patch, so our naming convention for FCOS wont be valid because we have four numeric segments instead of three. We'll need to discuss how we want to structure our image galleries to preserve the whole version name elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss this over in coreos/fedora-coreos-tracker#148 (comment)

We can mutate our version slightly to fit this mold if we must, but let's see how that conversation goes too.

Comment on lines +120 to +116
versionPager := a.galImgVerClient.NewListByGalleryImagePager(resourceGroup, galleryName, imageName, nil)
for versionPager.More() {
versionPage, err := versionPager.NextPage(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we'll definitely want to be able to delete a specific version of an image from an image gallery.

i.e. we'd call ore from cosa coreos-prune code to delete old versions of CoreOS we don't want to keep around any longer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment about the naming of image versions. We can add this to that discussion as well.

marmijo added 2 commits May 28, 2025 14:19
Add support for attaching additional data disks to instances created in
Azure. Disks are defined through the machines options as the Size in GB
and the 'sku', or storage type, e.g. '["100G:sku=UltraSSD_LRS"]' for
NVMe disks.
Add a new flag to allow specifying the Hyper-V generation (V1 or V2)
when creating Azure images. This enables support for both Gen1 and Gen2
image creation.
@marmijo marmijo force-pushed the azure-gen2-images branch 3 times, most recently from 89c9f8b to 702cf8a Compare May 28, 2025 23:04
dustymabe
dustymabe previously approved these changes May 29, 2025
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

marmijo added 3 commits May 29, 2025 10:44
Add a new `create-gallery-image` ore command to Support creating images
within Azure Shared Image Galleries (Gallery Images). The command
creates image definitions and versions in Azure Shared Image Galleries.
Gallery images can be created from either a blob URL or an existing
managed image. A `--azure-publisher` flag is added to assign a publisher
to the gallery image.

`delete-gallery-image` is also added to delete individual gallery images
or an entire Shared Image Gallery.
Add an `InstanceType` field  to `PlatformOptions` to allow external
tests to override the instance type used in `kola run`. This is useful
for cases where a specific test needs to run on a different (potentially
more expensive) instance type. Support for this is currently limited
to the Azure Platform.

Also, fix the `MultiPathDisk` check for the qemu-iso platform. The check
is now correctly performed in the `NewMachineWithOptions` function, since
`MultiPathDisk` is part of `platform.MachineOptions`.
Remove the option to specify the Hyper-V Generation during image
creation and when running instances through kola. Instead, only
build traditional and gallery images using Hyper-V Gen2.
@marmijo
Copy link
Member Author

marmijo commented May 29, 2025

I made one last force push (hopefully) to also delete the azure managed image when calling ore azure delete-gallery-image.

marmijo added a commit to marmijo/fedora-coreos-pipeline that referenced this pull request May 29, 2025
Modify the job to build Hyper-V Gen2 gallery images and run kola tests
exclusively on those new images[1]. The new Azure NVMe test[2] requires
NVMe support, which is only available with Gen2 gallery images. Since
there are no constraints requiring Gen1 images, we simplify things
by testing only Gen2.

A pipeline config variable, `test_gallery`, is introduced to the
Azure section to specify the gallery used to create Gen2 images.

[1]: coreos/coreos-assembler#4109
[2]: coreos/fedora-coreos-config#3519
@dustymabe
Copy link
Member

Feel free to merge whenever (or enable auto-merge).

marmijo added a commit to marmijo/fedora-coreos-pipeline that referenced this pull request May 29, 2025
Modify the job to build Hyper-V Gen2 gallery images and run kola tests
exclusively on those new images[1]. The new Azure NVMe test[2] requires
NVMe support, which is only available with Gen2 gallery images. Since
there are no constraints requiring Gen1 images, we simplify things
by testing only Gen2.

A pipeline config variable, `test_gallery`, is introduced to the
Azure section to specify the gallery used to create Gen2 images.

[1]: coreos/coreos-assembler#4109
[2]: coreos/fedora-coreos-config#3519
@marmijo marmijo enabled auto-merge (rebase) May 29, 2025 17:36
@marmijo marmijo merged commit 592d922 into coreos:main May 29, 2025
5 checks passed
marmijo added a commit to coreos/fedora-coreos-pipeline that referenced this pull request May 29, 2025
Modify the job to build Hyper-V Gen2 gallery images and run kola tests
exclusively on those new images[1]. The new Azure NVMe test[2] requires
NVMe support, which is only available with Gen2 gallery images. Since
there are no constraints requiring Gen1 images, we simplify things
by testing only Gen2.

A pipeline config variable, `test_gallery`, is introduced to the
Azure section to specify the gallery used to create Gen2 images.

[1]: coreos/coreos-assembler#4109
[2]: coreos/fedora-coreos-config#3519
marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request May 30, 2025
Introduce a new test which verifies that udev rules[1] for Azure
Managed NVMe disks correctly create symlinks in `/dev/disk/azure`.
It only runs on Azure and uses new kola functionality[2] to override
the instance type, enabling the use of a Hyper-V Gen2 VM
(Standard_M16bds_v3) with a UltraSSD_LRS (NVMe) data disk attached.

The test checks for the presence of OS and data disk symlinks, and
that they point to a valid target.

[1]: coreos#3378
[2]: coreos/coreos-assembler#4109
marmijo added a commit to marmijo/coreos-assembler that referenced this pull request May 30, 2025
The default size of Standard_D2_v2 does not support Hyper-V Gen2 VMs.
Now that we are only producing Gen2[1], update the default to
Standard_D2_v3, which does support Gen2 VMs.

[1]: coreos#4109
dustymabe pushed a commit that referenced this pull request May 30, 2025
The default size of Standard_D2_v2 does not support Hyper-V Gen2 VMs.
Now that we are only producing Gen2[1], update the default to
Standard_D2_v3, which does support Gen2 VMs.

[1]: #4109
marmijo added a commit to coreos/fedora-coreos-config that referenced this pull request May 30, 2025
Introduce a new test which verifies that udev rules[1] for Azure
Managed NVMe disks correctly create symlinks in `/dev/disk/azure`.
It only runs on Azure and uses new kola functionality[2] to override
the instance type, enabling the use of a Hyper-V Gen2 VM
(Standard_M16bds_v3) with a UltraSSD_LRS (NVMe) data disk attached.

The test checks for the presence of OS and data disk symlinks, and
that they point to a valid target.

[1]: #3378
[2]: coreos/coreos-assembler#4109
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants