VMNetwork adapter 'vEthernet (nat)*' not found #4445

Open
@RomaricKanyamibwa

Summary

Much like issue 2416, there seems to be a problem with the Windows_Server-2022-English-Full-ECS_Optimized AMIs: the ECS-Agent sometimes fails to connect to the ECS cluster because of a virtual hardware problem (the VMNetwork adapter cannot be found). As with the other issue, this appears to be random and occurs sporadically on our Windows image.

Description

Using Packer, we create our own AMIs based on the Windows_Server-2022-English-Full-ECS_Optimized AMIs. On the AMI we install SSH, then pull our Windows Docker images, and finish by installing EC2Launch v2. Once the AMI is ready, we use it in our ECS cluster with the following user data:

# configure ecs cluster
[Environment]::SetEnvironmentVariable("ECS_CLUSTER", "cluster-x86_64-windows","Machine")
[Environment]::SetEnvironmentVariable("ECS_IMAGE_PULL_BEHAVIOR","prefer-cached","Machine")
[Environment]::SetEnvironmentVariable("ECS_AWSVPC_BLOCK_IMDS","true ","Machine")
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE","true","Machine")
# init ecs agent
Import-Module ECSTools
Initialize-ECSAgent -EnableTaskIAMRole -EnableTaskENI -LoggingDrivers "['json-file','awslogs']"

Periodically one of the instances in the ASG fails to get attached to the ECS Cluster with the following errors:

2024-11-25T10:07:52Z - [INFO]:ScheduledTask Initialize-ECSHostReboot created.
2024-11-25T10:07:52Z - [INFO]:Configuring ECS Host for Task IAM Roles...
2024-11-25T10:07:52Z - [INFO]:Server Edition: Microsoft Windows Server 2022 Datacenter
2024-11-25T10:07:55Z - [INFO]:Attempt#: 10, Adapters:

2024-11-25T10:07:55Z - [INFO]:VMNetwork adapter 'vEthernet (nat)*' not found
2024-11-25T10:07:55Z - [INFO]:Retrying after sleeping 1sec

This error makes the instance unusable to the cluster, so the ASG launches a new one while the old one is left dangling unused.

Expected Behavior

The ECS-Agent reliably connects to the ECS cluster without errors.

Observed Behavior

The ECS-Agent sometimes fails; the instance is then not attached to the ECS cluster and simply keeps running. Rebooting the instance fixes the issue, and the agent no longer produces the error.

Before the reboot we get:

PS C:\Windows\system32> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Ethernet 4                Amazon Elastic Network Adapter #2             8 Up           06-D5-5D-A5-67-E1       5.0 Gbps

After the reboot, once it starts working:

PS C:\Windows\system32> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Ethernet 4                Amazon Elastic Network Adapter #2             6 Up           06-4C-57-A5-89-89       5.0 Gbps
vEthernet (nat)           Hyper-V Virtual Ethernet Adapter             12 Up           00-15-5D-03-19-C0        10 Gbps
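
The difference between the two listings (the missing `vEthernet (nat)` adapter) could be detected from a script before the agent is initialized. The following is only a minimal sketch under stated assumptions: `Wait-NatAdapter` is a hypothetical helper, not part of ECSTools, and its parameter names and defaults are illustrative. It polls `Get-NetAdapter` for a name matching `vEthernet (nat)*` and returns `$null` if nothing shows up.

```powershell
# Hypothetical helper (not part of ECSTools): poll until an adapter whose name
# matches $NamePattern appears, or give up after $MaxAttempts tries.
function Wait-NatAdapter {
    param(
        [string]$NamePattern = 'vEthernet (nat)*',
        [int]$MaxAttempts = 10,
        [int]$DelaySeconds = 1,
        # Injectable so the logic can be exercised without real adapters;
        # defaults to the cmdlet used in the listings above.
        [scriptblock]$GetAdapters = { Get-NetAdapter }
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # -like treats '*' as a wildcard; the parentheses are matched literally.
        $match = @(& $GetAdapters) |
            Where-Object { $_.Name -like $NamePattern } |
            Select-Object -First 1

        if ($match) { return $match }

        # Sleep only between attempts, not after the last one.
        if ($attempt -lt $MaxAttempts) { Start-Sleep -Seconds $DelaySeconds }
    }

    return $null
}
```

In user data, one could call `Wait-NatAdapter` and, when it returns `$null`, log a warning or trigger the reboot that resolved the problem for us. Whether an automatic reboot is acceptable is a judgment call, and this does not address the root cause.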

Environment Details:

PS C:\Windows\system32> docker info
Client:
 Version:    25.0.6.m
 Context:    default
 Debug Mode: false

Server:
ERROR: error during connect: in the default daemon configuration on Windows, the docker client must be run with elevated privileges to connect: Get "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.44/info": open //./pipe/docker_engine: The system cannot find the file specified.
errors pretty printing info

PS C:\Windows\system32>  Invoke-WebRequest -Uri http://localhost:51678/v1/metadata -UseBasicParsing


StatusCode        : 200
StatusDescription : OK
Content           : {"Cluster":"x86_64-windows-2022","ContainerInstanceArn":"arn:aws:ecs:eu-west-1:123456789011:container-instance/cluster-x86_64-windows-2022/a4c4329a0392450
                    ba9e659b6b...
RawContent        : HTTP/1.1 200 OK
                    Content-Length: 259
                    Content-Type: application/json
                    Date: Mon, 02 Dec 2024 10:02:18 GMT

                    {"Cluster":"x86_64-windows-2022","ContainerInstanceArn":"arn:aws:ecs:...
Forms             :
Headers           : {[Content-Length, 259], [Content-Type, application/json], [Date, Mon, 02 Dec 2024 10:02:18 GMT]}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        :
RawContentLength  : 259

Supporting Log Snippets

UserScript.ps1.log
output.log
err.log
