Skip to content

feat: add RAPL zone filtering support through configuration #2031

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

sthaha
Copy link
Collaborator

@sthaha sthaha commented Apr 28, 2025

Changes include:

  • Add RAPL zone filtering option in config (rapl/zones).
  • Add zone filtering in raplPowerMeter and log excluded and included zones
  • Update createCPUMeter to use RAPL config and filter zones.
  • Test for RAPL config and zone filtering.

Copy link

codecov bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 91.11111% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.35%. Comparing base (d6e67ac) to head (6d73db7).
Report is 4 commits behind head on reboot.

Files with missing lines Patch % Lines
internal/device/rapl_sysfs_power_meter.go 88.88% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           reboot    #2031      +/-   ##
==========================================
+ Coverage   93.26%   93.35%   +0.08%     
==========================================
  Files          21       21              
  Lines        1099     1143      +44     
==========================================
+ Hits         1025     1067      +42     
- Misses         58       60       +2     
  Partials       16       16              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sthaha sthaha requested a review from vimalk78 April 28, 2025 04:35
@vprashar2929 vprashar2929 added the feat A new feature or enhancement label Apr 28, 2025
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we keep all config related tests inside config.go ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, there is no hard requirement like that. It needs to be under the same package. Thats all

@@ -256,5 +256,18 @@ func createCPUMeter(logger *slog.Logger, cfg *config.Config) (device.CPUPowerMet
if fake := cfg.Dev.FakeCpuMeter; fake.Enabled {
return device.NewFakeCPUMeter(fake.Zones, device.WithFakeLogger(logger))
}
return device.NewCPUPowerMeter(cfg.Host.SysFS)

if !cfg.Rapl.Enabled {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be the behaviour when rapl is disabled?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kepler should continue to monitor power of other meters. E.g. say there is platform power and gpu ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I disable rapl I still can see Kepler producing node metrics

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kepler should continue to monitor power of other meters. E.g. say there is platform power and GPU ...

For now it should only produce metrics like CPU info and build info right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vprashar2929, on second thought, I am removing enable.rapl flag for now since it requires other changes to monitor (more than the scope of this work). Lets revisit this when we have platform power as well.

@sthaha sthaha force-pushed the feat-filter-rapl-zones branch from 77fad70 to c337cdd Compare April 28, 2025 08:34
hack/config.yaml Outdated
@@ -6,6 +6,10 @@ host:
sysfs: /sys # Path to sysfs filesystem (default: /sys)
procfs: /proc # Path to procfs filesystem (default: /proc)

rapl:
enabled: false # Enable RAPL (default: true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should set it true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sthaha sthaha force-pushed the feat-filter-rapl-zones branch 2 times, most recently from 677bb3a to 9295f52 Compare April 28, 2025 11:26
hack/config.yaml Outdated
# WARN DO NOT ENABLE THIS IN PRODUCTION - for development / testing only
dev:
fake-cpu-meter:
enabled: false
enabled: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should stay false

hack/config.yaml Outdated
@@ -6,8 +6,12 @@ host:
sysfs: /sys # Path to sysfs filesystem (default: /sys)
procfs: /proc # Path to procfs filesystem (default: /proc)

rapl:
enabled: true # Enable RAPL (default: true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We removed enable from rapl so no need to expose here until we add platform etc right?

@sthaha sthaha force-pushed the feat-filter-rapl-zones branch from 9295f52 to 072c902 Compare April 28, 2025 13:03
return device.NewCPUPowerMeter(cfg.Host.SysFS)

if len(cfg.Rapl.Zones) > 0 {
logger.Info("Filtering RAPL zones", "zones", cfg.Rapl.Zones)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfg.Rapl.Zones are the zones we want to keep, not the ones we want to filter out.
better say it as enabled zones, like mentioned in comment below

 # zones to be enabled, empty enables all default zones

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment was not just about t he log but about replacing the word filter with enable or select.
instead of device.WithZoneFilter we have device.WithSelectedZones or device.WithEnabledZones
instead of filterZones we say selectZones or enabledZones

but this is a minor comment. we can change it in a follow-up PR if needed.

return nil, fmt.Errorf("no RAPL zones found after filtering")
}

// filter out non-standard zones
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if a non-standard zone is configured.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-standard zones only differ in path and not names. At least for the purpose of current implementation, non-standard only differ in its path like kepler_node_rapl_zone{index="0",name="package",path="/sys/class/powercap/intel-rapl-mmio:0"} 1

By that, it will be preserved / filtered in

Changes include:
* Add RAPL configuration for filtering only required zones
* Add zone filtering in raplPowerMeter; log excluded and included zones
* Update createCPUMeter to use RAPL config and filter zones.
* Test for RAPL config and zone filtering.

Signed-off-by: Sunil Thaha <[email protected]>
@sthaha sthaha force-pushed the feat-filter-rapl-zones branch from 072c902 to 6d73db7 Compare April 28, 2025 14:19
@vimalk78 vimalk78 merged commit 34894b2 into sustainable-computing-io:reboot Apr 28, 2025
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat A new feature or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants