-
Notifications
You must be signed in to change notification settings - Fork 108
Add basic relay support to Pbench Agent #3460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic relay support to Pbench Agent #3460
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a draft, I'm obviously not reviewing it. 😉
However, thanks for doing the awesome update of the man pages! I've posted a number of comments on that.
Well so much for the CVE: the Python 3.6 tests (which I'd neglected to run manually) apparently can't find above 2.27. Although at least that's high enough to satisfy the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc updates!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have the tests to review, but here's the first installment.
docs/Agent/user-guide/man_page.md
Outdated
Once the upload is complete, the result directories | ||
are, by default, removed from the local system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Messing with option defaults based on other options
Agreed (although, I think Click has ways of doing it which aren't atrocious).
It's a callback, which moves the defaulting logic out of the decorator
I concur that using the callback is one approach. But there are a few others. For instance, after the parsing is done, I think the code can use click.Context.get_parameter_source()
to determine whether the value of the --delete/--no-delete
option was specified explicitly on the command line; if not, then the code can decide to use a value of "no delete" if --relay
was specified. (And, that behavior is not that hard to explain in the documentation.)
Since the file relay doesn't have any persistent context other than the designated file directory, I assume that if it were to fail you could just restart with the same settings and all would be well.
My assumption is that the file relay, and its local storage, is entirely transient. I imagine one option being to run it in a cloud allocation, which evaporates entirely once it's shut down (e.g., in an on-demand, "serverless" execution, like AWS Lambda). So, I'm working from the assumption that the results need to persist on the host with the Agent until they have been safely conveyed all the way to the Pbench Server.
Also, I think it would be beneficial in most cases to hide (or abstract-away) the existence of the relay (to the maximum extent feasible). For instance, we should limit the coupling between the Pbench Server and the relay to just the URIs which the Agent supplies directly to the user and to the Pbench Server via the manifest file; likewise, we should see if we can arrange it so that the user doesn't actually (have to) interact with the relay on the Agent side, either (i.e., unless specifically requested by the user, we can have the Agent create and shut down the relay automatically and invisibly to the user). (And, I would like to avoid precluding the possibility of using a fixed service, like Amazon S3 as the relay.)
We're building Relay integration into the server; so unless you're arguing that we shouldn't do that, I don't really see how you can argue that it shouldn't be able to delete the resource it pulled.
What I think/hope we're building into the Pbench Server is the ability to "pull" results as well as having them "pushed". That ability is built on top of some sort of relay support, but I would stop short of saying that we're "building relay integration into the server" -- the integration that I'm expecting us to provide will be with the Agent.
I'm expecting that, in the typical case, the Agent will stand up the relay when the user wants to push a result, and, therefore, it makes a certain amount of sense that the onus should be on the Agent to tear it down when the transfer is complete. Yes, there is an alternative model where the Agent exits when the results have been uploaded to the relay, and the relay exits when the results have been uploaded to the Server, but I'm not sure that that model is better than keeping the Agent running until the full transfer is complete. So, if the relay is run under the auspices of the Agent, then there is no need for the Pbench Server to offer the capability of removing results from it -- the Agent can do that if we want it done (although, having the Server remove them as a signal to the Agent that the transfer is complete remains an interesting idea...but we don't need a Server API to drive that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally good. I just have questions and suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving, although I really think that pbench-results-move --relay
should not delete the local result tree by default until we have some mechanism which allows the command to determine that the result has made it to the Pbench Server.
Also, I found a nit and a couple of doc-thingies for your consideration.
PBENCH-1142 This changes the Pbench Agent `results` mechanism to support a new mode, refactoring the `CopyResults` class into a hierarchy with `CopyResultToServer` and `CopyResultToRelay` and an overloaded `push` to do the work. The `--token` option is now required only when `--relay` is not specified. In addition to `--relay <relay>` to push a manifest and tarball to a Relay, I added a `--server <server>` to override the default config file value, which should allow us to deploy a containerized Pbench Agent without needing to map in a customized config file just to set the Pbench Server URI. The agent "man pages" have been updated with the new options, and some general cleanup left over from distributed-system-analysis#3442. _NOTE_: with this change we have a full end-to-end relay mechanism, but it's simplistic. You need to start a relay server manually from the file relay repo at `distributed-system-analysis/file-relay`, and supply that URI to the `pbench-results-move --relay <relay>` command. In the future we'd like to package the relay and allow management and hosting through Pbench Agent commands within a standard container.
(Yeah, this is Agent side...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
(Except for the --delete
default when --relay
is used. 😇)
I really don't like the inconsistency, and I'm not convinced it makes sense. When we figure out the rest of the Agent workflow we want to support maybe this will become either obvious or obviously pointless. Either way I didn't want to complicate things by throwing it in now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
PBENCH-1142
This changes the Pbench Agent
results
mechanism to support a new mode, refactoring theCopyResults
class into a hierarchy withCopyResultToServer
andCopyResultToRelay
and an overloadedpush
to do the work. The--token
option is now required only when--relay
is not specified.In addition to
--relay <relay>
to push a manifest and tarball to a Relay, I added a--server <server>
to override the default config file value, which should allow us to deploy a containerized Pbench Agent without needing to map in a customized config file just to set the Pbench Server URI.The agent "man pages" have been updated with the new options, and some general cleanup left over from #3442.
NOTE: with this change we have a full end-to-end relay mechanism, but it's simplistic. You need to start a relay server manually from the file relay repo at
distributed-system-analysis/file-relay
, and supply that URI to thepbench-results-move --relay <relay>
command. In the future we'd like to package the relay and allow management and hosting through Pbench Agent commands within a standard container.