
Add basic relay support to Pbench Agent #3460


Merged: 8 commits, Jun 21, 2023

Conversation

dbutenhof (Member)

PBENCH-1142

This changes the Pbench Agent `results` mechanism to support a new mode, refactoring the `CopyResults` class into a hierarchy with `CopyResultToServer` and `CopyResultToRelay` and an overloaded `push` to do the work. The `--token` option is now required only when `--relay` is not specified.

In addition to `--relay <relay>` to push a manifest and tarball to a Relay, I added a `--server <server>` to override the default config file value, which should allow us to deploy a containerized Pbench Agent without needing to map in a customized config file just to set the Pbench Server URI.

The agent "man pages" have been updated with the new options, and some general cleanup left over from #3442.

_NOTE_: with this change we have a full end-to-end relay mechanism, but it's simplistic. You need to start a relay server manually from the file relay repo at `distributed-system-analysis/file-relay`, and supply that URI to the `pbench-results-move --relay <relay>` command. In the future we'd like to package the relay and allow management and hosting through Pbench Agent commands within a standard container.
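The class hierarchy described above might be sketched roughly as follows. Only the names `CopyResults`, `CopyResultToServer`, `CopyResultToRelay`, and the overloaded `push` come from the PR description; the constructor parameters and transfer details are illustrative assumptions, not the actual implementation:

```python
# Sketch of the refactored results-copy hierarchy. The class names and the
# overloaded push() are from the PR description; everything else (constructor
# arguments, error handling) is an illustrative assumption.
from pathlib import Path
from typing import Optional


class CopyResults:
    """Base class: each destination subclass overrides push()."""

    def push(self, tarball: Path, token: Optional[str] = None) -> None:
        raise NotImplementedError


class CopyResultToServer(CopyResults):
    """Push a tarball directly to a Pbench Server (API token required)."""

    def __init__(self, server_uri: str):
        self.uri = server_uri

    def push(self, tarball: Path, token: Optional[str] = None) -> None:
        if not token:
            raise ValueError("--token is required when pushing to a server")
        # PUT the tarball to the server's upload API here.


class CopyResultToRelay(CopyResults):
    """Push a manifest and tarball to a file relay (no token needed)."""

    def __init__(self, relay_uri: str):
        self.uri = relay_uri

    def push(self, tarball: Path, token: Optional[str] = None) -> None:
        # PUT the manifest and tarball to the relay URI here.
        pass
```

On the command line this split corresponds to `pbench-results-move --relay <relay>` (no token) versus the default server push, which still requires `--token`.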

@webbnh (Member) left a comment

Since this is a draft, I'm obviously not reviewing it. 😉

However, thanks for doing the awesome update of the man pages! I've posted a number of comments on that.

@dbutenhof (Member, Author) commented Jun 12, 2023

Well, so much for the CVE: the Python 3.6 tests (which I'd neglected to run manually) apparently can't find a `requests` version above 2.27. Although at least that's high enough to satisfy the `requests.exceptions.JSONDecodeError` requirement...

@webbnh (Member) left a comment

Thanks for the doc updates!

@dbutenhof dbutenhof marked this pull request as ready for review June 12, 2023 18:22
@webbnh (Member) left a comment

I still have the tests to review, but here's the first installment.

Comment on lines 486 to 487:

> Once the upload is complete, the result directories are, by default, removed from the local system.
@webbnh (Member), Jun 13, 2023

> Messing with option defaults based on other options

Agreed (although, I think Click has ways of doing it which aren't atrocious).

> It's a callback, which moves the defaulting logic out of the decorator

I concur that using the callback is one approach. But there are a few others. For instance, after the parsing is done, I think the code can use click.Context.get_parameter_source() to determine whether the value of the --delete/--no-delete option was specified explicitly on the command line; if not, then the code can decide to use a value of "no delete" if --relay was specified. (And, that behavior is not that hard to explain in the documentation.)
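The `get_parameter_source()` approach could be sketched like this (a minimal sketch assuming Click >= 8.0; the option names mirror `pbench-results-move`, but the command body is purely illustrative and not the actual Agent code):

```python
# Sketch: after Click has parsed the options, check whether --delete came
# from the command line or from its default; if it was defaulted and --relay
# was given, flip it to "no delete".  Option names mirror the discussion;
# the command itself is an illustrative stand-in, not real Pbench code.
import click
from click.core import ParameterSource


@click.command()
@click.option("--relay", default=None, help="Relay URI to push to.")
@click.option("--delete/--no-delete", default=True,
              help="Remove local results after upload.")
@click.pass_context
def results_move(ctx, relay, delete):
    # ParameterSource.DEFAULT means the user did not specify the option.
    if relay and ctx.get_parameter_source("delete") == ParameterSource.DEFAULT:
        delete = False  # don't delete local results by default when relaying
    click.echo(f"relay={relay} delete={delete}")
```

With this, `--relay <uri>` alone implies `--no-delete`, while an explicit `--delete` still wins, which matches the "not that hard to explain" behavior described above.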

> Since the file relay doesn't have any persistent context other than the designated file directory, I assume that if it were to fail you could just restart with the same settings and all would be well.

My assumption is that the file relay, and its local storage, is entirely transient. I imagine one option being to run it in a cloud allocation, which evaporates entirely once it's shut down (e.g., in an on-demand, "serverless" execution, like AWS Lambda). So, I'm working from the assumption that the results need to persist on the host with the Agent until they have been safely conveyed all the way to the Pbench Server.

Also, I think it would be beneficial in most cases to hide (or abstract-away) the existence of the relay (to the maximum extent feasible). For instance, we should limit the coupling between the Pbench Server and the relay to just the URIs which the Agent supplies directly to the user and to the Pbench Server via the manifest file; likewise, we should see if we can arrange it so that the user doesn't actually (have to) interact with the relay on the Agent side, either (i.e., unless specifically requested by the user, we can have the Agent create and shut down the relay automatically and invisibly to the user). (And, I would like to avoid precluding the possibility of using a fixed service, like Amazon S3 as the relay.)

> We're building Relay integration into the server; so unless you're arguing that we shouldn't do that, I don't really see how you can argue that it shouldn't be able to delete the resource it pulled.

What I think/hope we're building into the Pbench Server is the ability to "pull" results as well as having them "pushed". That ability is built on top of some sort of relay support, but I would stop short of saying that we're "building relay integration into the server" -- the integration that I'm expecting us to provide will be with the Agent.

I'm expecting that, in the typical case, the Agent will stand up the relay when the user wants to push a result, and, therefore, it makes a certain amount of sense that the onus should be on the Agent to tear it down when the transfer is complete. Yes, there is an alternative model where the Agent exits when the results have been uploaded to the relay, and the relay exits when the results have been uploaded to the Server, but I'm not sure that that model is better than keeping the Agent running until the full transfer is complete. So, if the relay is run under the auspices of the Agent, then there is no need for the Pbench Server to offer the capability of removing results from it -- the Agent can do that if we want it done (although, having the Server remove them as a signal to the Agent that the transfer is complete remains an interesting idea...but we don't need a Server API to drive that).

webbnh previously approved these changes Jun 14, 2023

@webbnh (Member) left a comment

Looks generally good. I just have questions and suggestions.

webbnh previously approved these changes Jun 16, 2023

@webbnh (Member) left a comment

I'm approving, although I really think that pbench-results-move --relay should not delete the local result tree by default until we have some mechanism which allows the command to determine that the result has made it to the Pbench Server.

Also, I found a nit and a couple of doc-thingies for your consideration.

@dbutenhof dbutenhof requested a review from webbnh June 16, 2023 19:44
@webbnh (Member) left a comment

Looks good.

(Except for the --delete default when --relay is used. 😇)

@dbutenhof (Member, Author)

> (Except for the --delete default when --relay is used. 😇)

I really don't like the inconsistency, and I'm not convinced it makes sense. When we figure out the rest of the Agent workflow we want to support, maybe this will become either obvious or obviously pointless. Either way, I didn't want to complicate things by throwing it in now.

@siddardh-ra (Member) left a comment

Looks good!

@dbutenhof dbutenhof merged commit c2b7472 into distributed-system-analysis:main Jun 21, 2023
@dbutenhof dbutenhof deleted the torelay branch June 21, 2023 12:01
webbnh added a commit to webbnh/pbench that referenced this pull request Jun 21, 2023
webbnh added a commit to webbnh/pbench that referenced this pull request Jun 22, 2023