Description
The documentation says that ${ECS_CONTAINER_METADATA_URI_V4}/task/stats returns Docker ContainerStats, which includes CPU throttling metrics at cpu_stats.throttling_data. While those fields are present in the response, I have experimentally verified that they are never incremented from 0, even while throttling is occurring.
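For reference, this is the shape of the response in question. Below is a minimal sketch (not part of ecs_exporter) that queries the task stats endpoint from inside a task and prints the throttling_data fields for each container; the struct models only the fields relevant here:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// Only the throttling-related fields of Docker ContainerStats are modeled.
type containerStats struct {
	CPUStats struct {
		ThrottlingData struct {
			Periods          uint64 `json:"periods"`
			ThrottledPeriods uint64 `json:"throttled_periods"`
			ThrottledTime    uint64 `json:"throttled_time"`
		} `json:"throttling_data"`
	} `json:"cpu_stats"`
}

func main() {
	base := os.Getenv("ECS_CONTAINER_METADATA_URI_V4")
	resp, err := http.Get(base + "/task/stats")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The endpoint returns a JSON object keyed by Docker container ID.
	var stats map[string]containerStats
	if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
		panic(err)
	}
	for id, s := range stats {
		t := s.CPUStats.ThrottlingData
		fmt.Printf("%s periods=%d throttled_periods=%d throttled_time=%d\n",
			id, t.Periods, t.ThrottledPeriods, t.ThrottledTime)
	}
}
```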
Quoting that comment here:
I ran tasks with both ecs_exporter and an alpine sidecar running ["/bin/sh", "-c", "yes > /dev/null"] (i.e. burning as much CPU as it could get) on both Fargate and EC2. Both had less than 1 vCPU allocated: at the task level on Fargate and at the container level on EC2. The CPU-seconds metrics for both were clearly increasing more slowly than wall-clock time passed, indicating that throttling was occurring. The built-in CloudWatch graphs in the AWS console also showed these services using all available CPU.
But the throttling fields in the container stats remained at 0. I'm not sure why, but regardless I think this is a dead end without action from AWS.
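To make the "increasing slower than real time" observation concrete, here is an illustrative calculation under assumed numbers (the 0.5 vCPU limit, the 60-second window, and the total_usage samples are all made up): a busy loop wants a full CPU, so accruing only the allocation's worth of CPU-seconds over the window means the quota was the limiting factor, even though throttling_data stayed at 0.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const allocatedVCPU = 0.5 // e.g. a 512 CPU-unit task (assumed for illustration)

	var (
		interval          = 60 * time.Second
		usageStart uint64 = 1_000_000_000_000 // cpu_usage.total_usage at t0, in ns (invented)
		usageEnd   uint64 = 1_030_000_000_000 // cpu_usage.total_usage at t0+interval (invented)
	)

	cpuSeconds := float64(usageEnd-usageStart) / 1e9 // 30 CPU-seconds accrued
	limit := allocatedVCPU * interval.Seconds()      // 30 CPU-seconds available

	// A busy loop would use ~60 CPU-seconds over this window if it could;
	// getting only the allocation's worth implies the CFS quota kicked in.
	fmt.Printf("used %.1f of %.1f available CPU-seconds over %v\n",
		cpuSeconds, limit, interval)
}
```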
This data would be valuable for making service operators aware that, for example, their Fargate task CPU size is too low. Any sampling of instantaneous CPU utilization (as you can do with CloudWatch metrics, or with the parts of ContainerStats that do work) can miss throttling events entirely, because the samples may land only at moments when throttling was not occurring, whereas these metrics would provide a definitive record that throttling has occurred.
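If the endpoint ever did report real values, the throttling_data fields would map naturally onto Prometheus counters. A hypothetical sketch of such a collector follows; the metric names and the getTaskThrottling helper are invented for illustration and are not ecs_exporter's actual code:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// throttlingStats holds the two throttling_data fields worth exporting.
type throttlingStats struct {
	ThrottledPeriods uint64
	ThrottledTimeNS  uint64
}

// getTaskThrottling is a hypothetical helper; it would wrap the /task/stats
// call shown earlier and return throttling data keyed by container ID.
func getTaskThrottling() map[string]throttlingStats { return nil }

type throttleCollector struct {
	periods *prometheus.Desc
	seconds *prometheus.Desc
}

func newThrottleCollector() *throttleCollector {
	return &throttleCollector{
		// Metric names are hypothetical, not existing ecs_exporter metrics.
		periods: prometheus.NewDesc("ecs_cpu_throttled_periods_total",
			"CFS periods in which the container was throttled.",
			[]string{"container"}, nil),
		seconds: prometheus.NewDesc("ecs_cpu_throttled_seconds_total",
			"Total time the container spent throttled.",
			[]string{"container"}, nil),
	}
}

func (c *throttleCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.periods
	ch <- c.seconds
}

func (c *throttleCollector) Collect(ch chan<- prometheus.Metric) {
	for id, s := range getTaskThrottling() {
		ch <- prometheus.MustNewConstMetric(c.periods, prometheus.CounterValue,
			float64(s.ThrottledPeriods), id)
		ch <- prometheus.MustNewConstMetric(c.seconds, prometheus.CounterValue,
			float64(s.ThrottledTimeNS)/1e9, id)
	}
}

func main() {
	prometheus.MustRegister(newThrottleCollector())
}
```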