Skip to content

Fix the custom function documentation #962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks
cdavernas opened this issue Aug 7, 2024 · 14 comments · Fixed by #969
Closed
3 tasks

Fix the custom function documentation #962

cdavernas opened this issue Aug 7, 2024 · 14 comments · Fixed by #969
Assignees
Labels
area: spec Changes in the Specification change: documentation Improvements or additions to documentation. It won't impact a version change. change: fix Something isn't working. Impacts in a minor version change.
Milestone

Comments

@cdavernas
Copy link
Member

What would you like to be added:

Fix the documentation for custom functions.

Namely, we need to:

  • Remove top level name in function.yaml, as the name is defined by the function's directory
  • Indicate that custom functions can be referenced using the URI pointing at the repository directory that defines the function.yaml OR using the concatenation {name}:{version}. This is however only be supported for cataloged functions.
  • Indicate that runtime may define mechanisms to register additional function catalogs, in which case the above concatenation can also be used for functions defined in those.

Why is this needed:

The documentation drifted away from latest changes made to the SW Catalog.

@cdavernas cdavernas added change: fix Something isn't working. Impacts in a minor version change. change: documentation Improvements or additions to documentation. It won't impact a version change. labels Aug 7, 2024
@cdavernas cdavernas added this to the v1.0.0-alpha3 milestone Aug 7, 2024
@cdavernas
Copy link
Member Author

@ricardozanini @matthias-pichler-warrify @JBBianchi @fjtirado Should we maybe also add a new catalogs property in the top-level use?

This would allow users to easily reference functions from imported catalogs, using the convenience concatenation feature.

@JBBianchi
Copy link
Member

JBBianchi commented Aug 7, 2024

@ricardozanini @matthias-pichler-warrify @JBBianchi @fjtirado Should we maybe also add a new catalogs property in the top-level use?

This would allow users to easily reference functions from imported catalogs, using the convenience concatenation feature.

I think it might be useful indeed. I would still recommend the runtimes to also have a global configuration for catalogs (optional).

For instance, if at the global level the runtime is configured with:

https://catalogA.com/
https://catalogB.com/
https://catalogC.com/

All workflows, by default, will be able to lookup those catalogs.

Now, if the authors specifies the following in a definition:

use
  catalogs:
  - https://catalogB.com/
  - https://catalogC.com/
  - https://catalogZ.com/

It takes over the global configuration in favor of those.

As I see it, the lookup order should be:

  • Custom functions defined in the definition itself
  • (Local catalog, if any -- or a way to qualify it in use.catalogs)
  • Catalogs defined in the definition itsefl
  • Catalogs defined globally, if any

@ricardozanini
Copy link
Member

So for every function, I'll have to fetch all the catalogs to find the one? Custom functions must have a direct link to their catalog. I'd say an URI in this case fits better, like we do on GHA.

function: mycatalog.com/repo/function-name

@cdavernas
Copy link
Member Author

cdavernas commented Aug 7, 2024

If you added a catalog, yes.

But you'd only need to fetch all cataloged functions only once at workflow start, so IMHO it's not really an issue.

@cdavernas
Copy link
Member Author

I'd say an URI in this case fits better

That's the default behavior indeed, as inspired by GHA 😉

@JBBianchi
Copy link
Member

So for every function, I'll have to fetch all the catalogs to find the one? Custom functions must have a direct link to their catalog. I'd say an URI in this case fits better, like we do on GHA.

function: mycatalog.com/repo/function-name

I'm not sure what you mean by "fetch all the catalogs" but you'll have to consecutively lookup the function. For each custom function, it can lead to 0 request, if the function is declared within the workflow definition (or hosted locally), up to N requests, N being the number of catalogs configured (so in my examples above, up to 3 requests per custom function). In the end, nothing prevents the runtime from having a local cache for already resolved functions, but I don't think it needs to be covered by the spec.

(On a side note, I don't find the doc in GHA about the use of fully qualified URI, it seems to either be {owner}/{repo}(/path)@{ref} or ./path/to/dir (or a Docker variant) (cf. jobs.<job_id>.steps[*].uses. Maybe it's located somewhere else with Enterprise docs or self-hosted runners...)

@cdavernas
Copy link
Member Author

@JBBianchi You are right, and we should probably do the same than GH to ensure that the referenced function always reside in a (GH) repository.

So, good point, I believe we should only allow URIs relative to GH.

@ricardozanini
Copy link
Member

So, good point, I believe we should only allow URIs relative to GH.

I'm not so sure. Why can't a user have an internal catalog to share privately in the company? Or use gitlab (👿 )?

Regarding the uses, I know an implementation can use cache. That holds for direct calls as well.

An import in the DSL complicates things; the cost/benefit here doesn't seem right. I don't see why not just using the URI reference straight to the function would be a problem. We should focus on keeping things simple. Adding more ways of doing the same is the opposite. Once we have it well established and from usage feedback, we understand that it's important to have such a feature, and we can easily add it to the spec later.

Let's reference a free URI, not even defaulting to GH, to avoid bad practices like in container tags where people won't specify the docker.io as the default registry.

@cdavernas
Copy link
Member Author

cdavernas commented Aug 7, 2024

Why can't a user have an internal catalog to share privately in the company?

Let's reference a free URI

We cannot fo that, because a custom function, just like in GHA, is a definition file, namely function.yaml, possibly accompanied by relative resources. Therefore, the URI MUST reference a folder structure. Whereas GitLab is a possible alternative that would address the privately hosted repo use case, it is the only one that comes to mind, aside maybe from a file-system based one.

In other words, and while I think leaving free form URI reference is a dangerous and error prone, I think you are right in saying we need a broader use case than just GH.

An import in the DSL complicates things

I disagree. That's a feature we introduced in v0.8 (external resources) and it really simplifies the life of authors. Instead of having a potentially huge url to write, they can choose to import a catalog and drastically shorten references. IMHO, we should focus on that exactly: convenience for authors. Plus, if one doesn't like the overhead, she can choose to just not use that additional property in use.

To conclude that feature is imho really cool and painless to implement, and could drastically shorten some definitions, so I think we should include it.

@ricardozanini
Copy link
Member

Can we include use and make a reference in the function? So that we index the function name with the uses, avoiding iterating over the imports.

Something like:

uses github/serverlessworkflow/catalog as catalog

function:
    call: catalog/log

@cdavernas
Copy link
Member Author

@ricardozanini yes, that is IMHO a great idea!

@JBBianchi
Copy link
Member

JBBianchi commented Aug 8, 2024

Can we include use and make a reference in the function? So that we index the function name with the uses, avoiding iterating over the imports.

Something like:

uses github/serverlessworkflow/catalog as catalog

function:
    call: catalog/log

I understand it might have an impact on execution time but I think iteration is a fundamental feature, not a limitation. As far as I know, most package managers operate this way: you have a list of registries, and when you install a package, the system iterates through them.

Let me try to sum-up the different possibilities we mentioned so far, therefore we can maybe list the pros and cons of each approach.

Embedded function declaration

Use a custom function declared in the definition:

document:
  use:
    functions:
      myLogFn: //...
do:
  - callCustomFn:
      call: myLogFn

Load a remote function directly from an URI

Load the function directly from an URI:

do:
  - callCustomFn:
      call: https://server.com/my-functions/my-log-fn/function.yaml

Load a function using the default catalog

We assume runtime implement https://github.com/serverlessworkflow/catalog as the default catalog.

do:
  - callCustomFn:
      call: myLogFn:1.0.0

The runtime will fetch https://github.com/serverlessworkflow/catalog/myLogFn/1.0.0/function.yaml

* Raises the question of the possibility to override the default and/or add other global catalogs.

Load a function using a named catalog

Load the function from a specific catalog:

document:
  use:
    catalogs:
      myPrivateCatalog: https://private-catalog.com
do:
  - callCustomFn:
      call: myPrivateCatalog/myLogFn:1.0.0

The runtime will fetch (and only?*) https://private-catalog.com/myLogFn/1.0.0/function.yaml

* What if there is another global catalog with the same name ? In all cases, the one in the definition have the precedence. But should we ignore the global one or fallback onto it ? (I'd rather go for the fallback to be consistent with the next approach)

Load a function using fallbacks

We assume runtime implement https://github.com/serverlessworkflow/catalog as the default catalog.

document:
  use:
    catalogs:
      myPrivateCatalog: https://private-catalog.com
do:
  - callCustomFn:
      call: myLogFn:1.0.0

The runtime will fetch first https://private-catalog.com/myLogFn/1.0.0/function.yaml.
If it fails, it will then fetch https://github.com/serverlessworkflow/catalog/myLogFn/1.0.0/function.yaml.

Load a function using an alias

I assume we could "abuse" the system with aliases, by redefining a local function that calls a remote one:

document:
  use:
    functions:
      myLogFn:
        call: https://private-catalog.com/myLogFn/1.0.0/function.yaml
        //call: myLogFn/1.0.0
do:
  - callCustomFn:
      call: myLogFn

Let's not forget the priority of constituencies, it "doesn't matter" if it's harder to implement, the usability comes first.

From a user point of view, I assume all the examples above are valid.

@cdavernas cdavernas self-assigned this Aug 8, 2024
@cdavernas cdavernas added the area: spec Changes in the Specification label Aug 8, 2024
@ricardozanini
Copy link
Member

The issue in the catalog related to the topic:

@fjtirado
Copy link
Collaborator

fjtirado commented Aug 8, 2024

I think I prefer Load a function using a named catalog one, with a note.
We should not override default global catalog for a a function that is not prefixed.
I mean, lets assume the default global catalog defines myLogFn
using
call: myPrivateCatalog/myLogFn:1.0.0 invokes the one in myprivatecatalog (probably myPrivateCatelog:myLogFn:1.0.0 syntax is more clear, but Im not sure)
call: myLogFn:1.0.0 invokes the one defined in global catalog (not function namespace prefix means global default namespace)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: spec Changes in the Specification change: documentation Improvements or additions to documentation. It won't impact a version change. change: fix Something isn't working. Impacts in a minor version change.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants