
How/where should bootstrap command get the list of auto-instrumentation packages from? #668

Closed
@owais

Description

PR #650 adds a new bootstrap command. Users can run it to automatically detect the libraries their project uses and install the instrumentation packages for those libraries.

To make this work, the bootstrap command needs to know which packages are supported and which instrumentation corresponds to each one. The current implementation hard-codes this information in the bootstrap command's source code, and some people have raised concerns about that.
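For illustration, a hard-coded index of this kind could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not the code in PR #650: the `INSTRUMENTATIONS` dict, the package names and version specifiers in it, and the `instrumentations_to_install` helper are all hypothetical. Detection here simply checks which distribution names are installed, using `importlib.metadata` from the standard library (Python 3.8+).

```python
from importlib.metadata import distributions
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical hard-coded index: instrumented library (distribution name)
# -> requirement string for the matching instrumentation package.
# Package names and version specifiers are placeholders, not real releases.
INSTRUMENTATIONS: Dict[str, str] = {
    "flask": "opentelemetry-ext-flask>=0.8b0",
    "requests": "opentelemetry-ext-requests>=0.8b0",
    "pymongo": "opentelemetry-ext-pymongo>=0.8b0",
}


def installed_distributions() -> Set[str]:
    """Return the lower-cased names of all distributions in this environment."""
    names: Set[str] = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:
            names.add(name.lower())
    return names


def instrumentations_to_install(installed: Optional[Iterable[str]] = None) -> List[str]:
    """Map detected libraries to the instrumentation requirements to install."""
    if installed is None:
        installed = installed_distributions()
    # Simplified name normalization: lower-case only.
    detected = {name.lower() for name in installed}
    return sorted(req for lib, req in INSTRUMENTATIONS.items() if lib in detected)


if __name__ == "__main__":
    # The resulting requirement strings could then be handed to pip.
    for requirement in instrumentations_to_install():
        print(requirement)
```

Because the index is a plain dict, pinning an older instrumentation version or adding a blessed community package is a one-line change.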

As an alternative, we could have a development script that iterates over the ext/ directory, finds all the instrumentations and generates the list. This would automate the step and ensure that the bootstrap command is always in sync with the published packages.

That said, I think doing this raises more problems than it solves. Here is what such a script would need to handle:

  1. Find all instrumentation packages in the ext/ directory.
  2. Figure out the libraries the packages are instrumenting.
  3. Figure out which version of each package to install. For example, the source code might contain an unpublished version, or we might discover an issue in the latest version of some instrumentation and want the bootstrap command to install an older version instead. It is easy to imagine that the bootstrap command might not always want to install the version specified in the repo.
  4. Detect and ignore any unpublished/in-development packages.
  5. In the future, if the project moves away from a mono-repo structure, automating this might not be worth the effort anymore.
  6. We might want the command to install some blessed community/contrib packages in addition to the ones shipped by the core project.

Points 1-4 can be solved by adding additional metadata to each package. A build script can then iterate over the packages, extract this metadata and generate a mapping ready to be used by the command. IMO this is not dramatically better: it just replaces a central "hard-coded" index with a distributed one. If we envision this information being useful elsewhere, maybe it could be justified, but if the only consumer is going to be the bootstrap command, it doesn't make much sense IMO. We'd just be hard-coding the same information but scattering it all over the repository.

For 5-6, this will not work, as we'd need some sort of discovery mechanism to make it all happen.

Downsides of hard-coding this information.

The only big downside I see to hard-coding this information is that some contributors might forget to update it when publishing instrumentation packages. People would be somewhat less likely to forget if this info were stored in setup.cfg, but that doesn't solve the problem completely. Also, what is the worst that could happen if someone forgot to update the bootstrap command "index"? We'd ship the opentelemetry-auto-instrumentation package, and bootstrap would either install a slightly older version of an instrumentation or not know about a new instrumentation. I think that is relatively harmless compared to a bootstrap command whose index is automatically updated to the latest version of each instrumentation package it finds in that package's setup.py.

On the other hand, hard-coding this information gives us a lot more flexibility than any automated solution we could come up with.

Generally speaking, from a software engineering perspective, I think some form of hard-coding should be the first step, and we should move towards automation only after living with the pain (if any) it causes. Otherwise, it might turn out to be a case of premature abstraction.

I might have missed other obvious reasons not to hard-code, though. Happy to hear other thoughts.
