-
Notifications
You must be signed in to change notification settings - Fork 204
feat: add RAPL zone filtering support through configuration #2031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add RAPL zone filtering support through configuration #2031
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## reboot #2031 +/- ##
==========================================
+ Coverage 93.26% 93.35% +0.08%
==========================================
Files 21 21
Lines 1099 1143 +44
==========================================
+ Hits 1025 1067 +42
- Misses 58 60 +2
Partials 16 16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
config/config_rapl_test.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we keep all config related tests inside config.go
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, there is no hard requirement like that. It needs to be under the same package. Thats all
cmd/kepler/main.go
Outdated
@@ -256,5 +256,18 @@ func createCPUMeter(logger *slog.Logger, cfg *config.Config) (device.CPUPowerMet | |||
if fake := cfg.Dev.FakeCpuMeter; fake.Enabled { | |||
return device.NewFakeCPUMeter(fake.Zones, device.WithFakeLogger(logger)) | |||
} | |||
return device.NewCPUPowerMeter(cfg.Host.SysFS) | |||
|
|||
if !cfg.Rapl.Enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be the behaviour when rapl is disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kepler should continue to monitor power of other meters. E.g. say there is platform power and gpu ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I disable rapl I still can see Kepler producing node metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kepler should continue to monitor power of other meters. E.g. say there is platform power and GPU ...
For now it should only produce metrics like CPU info and build info right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vprashar2929, on second thought, I am removing enable.rapl
flag for now since it requires other changes to monitor (more than the scope of this work). Lets revisit this when we have platform power as well.
77fad70
to
c337cdd
Compare
hack/config.yaml
Outdated
@@ -6,6 +6,10 @@ host: | |||
sysfs: /sys # Path to sysfs filesystem (default: /sys) | |||
procfs: /proc # Path to procfs filesystem (default: /proc) | |||
|
|||
rapl: | |||
enabled: false # Enable RAPL (default: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should set it true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
677bb3a
to
9295f52
Compare
hack/config.yaml
Outdated
# WARN DO NOT ENABLE THIS IN PRODUCTION - for development / testing only | ||
dev: | ||
fake-cpu-meter: | ||
enabled: false | ||
enabled: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should stay false
hack/config.yaml
Outdated
@@ -6,8 +6,12 @@ host: | |||
sysfs: /sys # Path to sysfs filesystem (default: /sys) | |||
procfs: /proc # Path to procfs filesystem (default: /proc) | |||
|
|||
rapl: | |||
enabled: true # Enable RAPL (default: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We removed enable from rapl so no need to expose here until we add platform etc right?
9295f52
to
072c902
Compare
cmd/kepler/main.go
Outdated
return device.NewCPUPowerMeter(cfg.Host.SysFS) | ||
|
||
if len(cfg.Rapl.Zones) > 0 { | ||
logger.Info("Filtering RAPL zones", "zones", cfg.Rapl.Zones) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cfg.Rapl.Zones
are the zones we want to keep, not the ones we want to filter out.
better say it as enabled zones, like mentioned in comment below
# zones to be enabled, empty enables all default zones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment was not just about t he log but about replacing the word filter
with enable
or select
.
instead of device.WithZoneFilter
we have device.WithSelectedZones
or device.WithEnabledZones
instead of filterZones
we say selectZones
or enabledZones
but this is a minor comment. we can change it in a follow-up PR if needed.
return nil, fmt.Errorf("no RAPL zones found after filtering") | ||
} | ||
|
||
// filter out non-standard zones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if a non-standard zone is configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-standard zones only differ in path and not names. At least for the purpose of current implementation, non-standard only differ in its path like kepler_node_rapl_zone{index="0",name="package",path="/sys/class/powercap/intel-rapl-mmio:0"} 1
By that, it will be preserved / filtered in
Changes include: * Add RAPL configuration for filtering only required zones * Add zone filtering in raplPowerMeter; log excluded and included zones * Update createCPUMeter to use RAPL config and filter zones. * Test for RAPL config and zone filtering. Signed-off-by: Sunil Thaha <[email protected]>
072c902
to
6d73db7
Compare
34894b2
into
sustainable-computing-io:reboot
Changes include: