
Add basic relay support to Pbench Agent #3460


Merged: 8 commits, Jun 21, 2023

Conversation

dbutenhof (Member)

PBENCH-1142

This changes the Pbench Agent `results` mechanism to support a new mode, refactoring the `CopyResults` class into a hierarchy with `CopyResultToServer` and `CopyResultToRelay` and an overloaded `push` to do the work. The `--token` option is now required only when `--relay` is not specified.

In addition to `--relay <relay>` to push a manifest and tarball to a Relay, I added a `--server <server>` to override the default config file value, which should allow us to deploy a containerized Pbench Agent without needing to map in a customized config file just to set the Pbench Server URI.

The agent "man pages" have been updated with the new options, and some general cleanup left over from #3442.

_NOTE_: with this change we have a full end-to-end relay mechanism, but it's simplistic. You need to start a relay server manually from the file relay repo at `distributed-system-analysis/file-relay`, and supply that URI to the `pbench-results-move --relay <relay>` command. In the future we'd like to package the relay and allow management and hosting through Pbench Agent commands within a standard container.
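The class hierarchy described above might be sketched roughly as follows. Only the names `CopyResults`, `CopyResultToServer`, `CopyResultToRelay`, and the overloaded `push` come from the PR description; the constructor parameters and transfer details are illustrative assumptions, not the actual implementation:

```python
# Sketch of the refactored results-copy hierarchy. The class names and the
# overloaded push() are from the PR description; everything else (constructor
# arguments, error handling) is an illustrative assumption.
from pathlib import Path
from typing import Optional


class CopyResults:
    """Base class: each destination subclass overrides push()."""

    def push(self, tarball: Path, token: Optional[str] = None) -> None:
        raise NotImplementedError


class CopyResultToServer(CopyResults):
    """Push a tarball directly to a Pbench Server (API token required)."""

    def __init__(self, server_uri: str):
        self.uri = server_uri

    def push(self, tarball: Path, token: Optional[str] = None) -> None:
        if not token:
            raise ValueError("--token is required when pushing to a server")
        # PUT the tarball to the server's upload API here.


class CopyResultToRelay(CopyResults):
    """Push a manifest and tarball to a file relay (no token needed)."""

    def __init__(self, relay_uri: str):
        self.uri = relay_uri

    def push(self, tarball: Path, token: Optional[str] = None) -> None:
        # PUT the manifest and tarball to the relay URI here.
        pass
```

On the command line this split corresponds to `pbench-results-move --relay <relay>` (no token) versus the default server push, which still requires `--token`.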

@webbnh (Member) left a comment

Since this is a draft, I'm obviously not reviewing it. 😉

However, thanks for doing the awesome update of the man pages! I've posted a number of comments on that.

@dbutenhof (Member, Author) commented Jun 12, 2023

Well, so much for the CVE: the Python 3.6 tests (which I'd neglected to run manually) apparently can't find a `requests` version above 2.27. Although at least that's high enough to satisfy the `requests.exceptions.JSONDecodeError` requirement...

@webbnh (Member) left a comment

Thanks for the doc updates!

@dbutenhof dbutenhof marked this pull request as ready for review June 12, 2023 18:22
@webbnh (Member) left a comment

I still have the tests to review, but here's the first installment.

Comment on lines 486 to 487:

> Once the upload is complete, the result directories are, by default, removed from the local system.
@webbnh (Member), Jun 13, 2023

> Messing with option defaults based on other options

Agreed (although, I think Click has ways of doing it which aren't atrocious).

> It's a callback, which moves the defaulting logic out of the decorator

I concur that using the callback is one approach. But there are a few others. For instance, after the parsing is done, I think the code can use click.Context.get_parameter_source() to determine whether the value of the --delete/--no-delete option was specified explicitly on the command line; if not, then the code can decide to use a value of "no delete" if --relay was specified. (And, that behavior is not that hard to explain in the documentation.)
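The `get_parameter_source()` approach could be sketched like this (a minimal sketch assuming Click >= 8.0; the option names mirror `pbench-results-move`, but the command body is purely illustrative and not the actual Agent code):

```python
# Sketch: after Click has parsed the options, check whether --delete came
# from the command line or from its default; if it was defaulted and --relay
# was given, flip it to "no delete".  Option names mirror the discussion;
# the command itself is an illustrative stand-in, not real Pbench code.
import click
from click.core import ParameterSource


@click.command()
@click.option("--relay", default=None, help="Relay URI to push to.")
@click.option("--delete/--no-delete", default=True,
              help="Remove local results after upload.")
@click.pass_context
def results_move(ctx, relay, delete):
    # ParameterSource.DEFAULT means the user did not specify the option.
    if relay and ctx.get_parameter_source("delete") == ParameterSource.DEFAULT:
        delete = False  # don't delete local results by default when relaying
    click.echo(f"relay={relay} delete={delete}")
```

With this, `--relay <uri>` alone implies `--no-delete`, while an explicit `--delete` still wins, which matches the "not that hard to explain" behavior described above.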

> Since the file relay doesn't have any persistent context other than the designated file directory, I assume that if it were to fail you could just restart with the same settings and all would be well.

My assumption is that the file relay, and its local storage, is entirely transient. I imagine one option being to run it in a cloud allocation, which evaporates entirely once it's shut down (e.g., in an on-demand, "serverless" execution, like AWS Lambda). So, I'm working from the assumption that the results need to persist on the host with the Agent until they have been safely conveyed all the way to the Pbench Server.

Also, I think it would be beneficial in most cases to hide (or abstract-away) the existence of the relay (to the maximum extent feasible). For instance, we should limit the coupling between the Pbench Server and the relay to just the URIs which the Agent supplies directly to the user and to the Pbench Server via the manifest file; likewise, we should see if we can arrange it so that the user doesn't actually (have to) interact with the relay on the Agent side, either (i.e., unless specifically requested by the user, we can have the Agent create and shut down the relay automatically and invisibly to the user). (And, I would like to avoid precluding the possibility of using a fixed service, like Amazon S3 as the relay.)

> We're building Relay integration into the server; so unless you're arguing that we shouldn't do that, I don't really see how you can argue that it shouldn't be able to delete the resource it pulled.

What I think/hope we're building into the Pbench Server is the ability to "pull" results as well as having them "pushed". That ability is built on top of some sort of relay support, but I would stop short of saying that we're "building relay integration into the server" -- the integration that I'm expecting us to provide will be with the Agent.

I'm expecting that, in the typical case, the Agent will stand up the relay when the user wants to push a result, and, therefore, it makes a certain amount of sense that the onus should be on the Agent to tear it down when the transfer is complete. Yes, there is an alternative model where the Agent exits when the results have been uploaded to the relay, and the relay exits when the results have been uploaded to the Server, but I'm not sure that that model is better than keeping the Agent running until the full transfer is complete. So, if the relay is run under the auspices of the Agent, then there is no need for the Pbench Server to offer the capability of removing results from it -- the Agent can do that if we want it done (although, having the Server remove them as a signal to the Agent that the transfer is complete remains an interesting idea...but we don't need a Server API to drive that).

webbnh previously approved these changes Jun 14, 2023

@webbnh (Member) left a comment

Looks generally good. I just have questions and suggestions.

webbnh previously approved these changes Jun 16, 2023

@webbnh (Member) left a comment

I'm approving, although I really think that pbench-results-move --relay should not delete the local result tree by default until we have some mechanism which allows the command to determine that the result has made it to the Pbench Server.

Also, I found a nit and a couple of doc-thingies for your consideration.

@dbutenhof dbutenhof requested a review from webbnh June 16, 2023 19:44
@webbnh (Member) left a comment

Looks good.

(Except for the --delete default when --relay is used. 😇)

@dbutenhof (Member, Author)

> (Except for the --delete default when --relay is used. 😇)

I really don't like the inconsistency, and I'm not convinced it makes sense. When we figure out the rest of the Agent workflow we want to support, maybe this will become either obvious or obviously pointless. Either way, I didn't want to complicate things by throwing it in now.

@siddardh-ra (Member) left a comment

Looks good!

@dbutenhof dbutenhof merged commit c2b7472 into distributed-system-analysis:main Jun 21, 2023
@dbutenhof dbutenhof deleted the torelay branch June 21, 2023 12:01
webbnh added a commit to webbnh/pbench that referenced this pull request Jun 21, 2023
webbnh added a commit to webbnh/pbench that referenced this pull request Jun 22, 2023