
Rethink bundling of requirements #5074

Open
@OriolAbril

Description


I want to take this opportunity to rethink all these conda environments we have. I understand they are helpful and convenient for creating our dev environments, and they are also used for CI. However, at least in my case, the extra work of keeping 4 (8 now) nearly equivalent files up to date, with a pre-commit check that seems to only partially work (e.g. pandas has a lower bound in the 3.7 and 3.8 files but not in the 3.9 one nor in requirements-dev.txt), clearly outweighs the benefit I get. Moreover, issues with these files, with installation and with pre-commit blocked #5060, and #5062 was only needed because #5060 was blocked.
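To make "only partially works" concrete, here is a minimal Python sketch of the kind of cross-file consistency check that would catch both the pandas bound mismatch and a package missing from some files. It is not how the existing pre-commit script works. It assumes the dependency strings have already been extracted from each file (for conda env files, e.g. from the `dependencies:` section with PyYAML), and the file names, version bounds and the helpers `parse_requirement`/`find_mismatches` are all hypothetical.

```python
import re
from collections import defaultdict

# Package name = leading run of letters/digits/._- ; the rest is the version spec.
_REQUIREMENT = re.compile(r"^([A-Za-z0-9_.\-]+)(.*)$")


def parse_requirement(line):
    """Split 'pandas>=0.24.0' into ('pandas', '>=0.24.0'); spaces are dropped."""
    match = _REQUIREMENT.match(line.replace(" ", ""))
    if match is None:
        raise ValueError(f"cannot parse requirement: {line!r}")
    name, spec = match.groups()
    # Normalize names so 'Sphinx_Autobuild' and 'sphinx-autobuild' compare equal.
    return name.lower().replace("_", "-"), spec


def find_mismatches(files):
    """Report packages missing from some files or specified differently.

    `files` maps a file name to its list of requirement strings.
    Returns {package: {file name: spec, or None if absent from that file}}.
    """
    specs = defaultdict(dict)
    for fname, lines in files.items():
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            name, spec = parse_requirement(line)
            specs[name][fname] = spec
    report = {}
    for name, per_file in sorted(specs.items()):
        row = {fname: per_file.get(fname) for fname in files}
        if len(set(row.values())) > 1:  # absent somewhere, or differing specs
            report[name] = row
    return report


if __name__ == "__main__":
    # Illustrative contents only; names and bounds are made up for the example.
    files = {
        "environment-dev-py37.yml": ["pandas>=0.24.0", "nbsphinx", "numpy"],
        "environment-dev-py39.yml": ["pandas", "numpy"],
        "requirements-dev.txt": ["pandas", "nbsphinx", "numpy"],
    }
    # Reports nbsphinx (missing from the py39 file) and pandas (bound only in py37).
    for name, row in find_mismatches(files).items():
        print(name, row)
```

A real check would also need to normalize conda's `=` pins against pip's `==` and map packages whose conda and pip names differ; the sketch compares specs as plain strings.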

I think there are several issues at hand regarding all this.

  • No single source of truth. From what I understand, the pre-commit script somehow takes all the conda env files and generates the pip file from them. However, the conda files don't seem to need to be equivalent to each other: in "add 404 page to docs" #5057, for example, nbsphinx was in some conda files but not in all of them, and several versions didn't match between files.
  • Overly bloated and non-specific dependency files. We have two or three types of requirements files (I still don't completely understand the new test file, but it seems like a step in the right direction): library requirements (user facing), dev requirements and test requirements. But we use those files for much more than three things: we generate local envs for devs, we set up CI envs in GitHub Actions for testing, we set up the env on RTD to build the documentation... Here are some things we are currently doing that I think we should avoid, as they make our CI more brittle and more prone to conda finding version conflicts:
    • We are installing sphinx and sphinx extensions everywhere, even on GitHub Actions
    • We are installing pytest on RTD
    • We are also installing libraries like watermark or sphinx-autobuild everywhere, but these are only used locally! watermark is only needed when writing/running example notebooks, not to build the docs that contain those notebooks. sphinx-autobuild is only needed to get an auto-updating preview of the docs locally.
  • Most of us (maybe even all of us) have no idea what some of these dependencies are for. I, for example, have no idea what cachetools is for. I think this is both an issue and a consequence of the previous two, but it's important to put it on the table because, in my opinion, unless we make an effort to explain all this in the developer guide and keep it up to date, it will only get worse.
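One way to address both the missing single source of truth and the non-specific files is to record, for every dependency, which tasks need it, and generate each file from that. Below is a minimal Python sketch of the idea, assuming a hand-maintained dict. The task names, the `conda-forge` channel, the helper names and the version bounds are placeholders for illustration, not PyMC's actual setup or pins.

```python
# Hypothetical single source of truth: package -> (version spec, tasks needing it).
# Version bounds are placeholders, not PyMC's actual pins.
DEPENDENCIES = {
    "numpy": (">=1.15.0", {"runtime"}),
    "pandas": (">=0.24.0", {"runtime"}),
    "pytest": ("", {"test"}),
    "sphinx": ("", {"docs"}),
    "nbsphinx": ("", {"docs"}),
    "watermark": ("", {"write"}),
    "sphinx-autobuild": ("", {"local"}),
}


def select(deps, tasks):
    """Sorted requirement strings for every package used by any of `tasks`."""
    wanted = set(tasks)
    return [f"{name}{spec}" for name, (spec, used_by) in sorted(deps.items())
            if used_by & wanted]


def pip_requirements(deps, tasks):
    """Text of a requirements-*.txt file for the given tasks."""
    return "\n".join(select(deps, tasks)) + "\n"


def conda_environment(deps, tasks, env_name, python):
    """Text of a conda environment file for the given tasks."""
    lines = [f"name: {env_name}", "channels:", "  - conda-forge",
             "dependencies:", f"  - python={python}"]
    lines += [f"  - {req}" for req in select(deps, tasks)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # CI test job: runtime + test deps only, so no sphinx on GitHub Actions.
    print(conda_environment(DEPENDENCIES, ["runtime", "test"], "pymc-test", "3.9"))
    # Docs build: runtime + docs deps only, so no pytest on RTD.
    print(pip_requirements(DEPENDENCIES, ["runtime", "docs"]))
```

Tagging sphinx-autobuild as `local` keeps it out of every generated CI/RTD file while still documenting why it exists. Packages whose conda and pip names differ would need an extra name mapping.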

Ideas for possible solutions:

  • Being more strict about CI. I have the feeling that some PRs are blocked when only pre-commit fails, while others are green-lighted with failing tests and/or a failing RTD job. I haven't written many tests myself here and I imagine testing distributions is inherently hard, but I think we need some way of getting rid of flaky tests or grouping them in a "potentially flaky" job; any ideas here are welcome. Having flaky tests leaves us open to reasoning like "maybe this is a flaky test too, I don't recall changing anything that should affect it". On that end, we might also benefit from skipping jobs, either manually or automatically: e.g. if we only update the README we don't need to run any CI job, but if we modify requirements we should run all jobs, even if the modified library isn't directly used everywhere, because the change could produce a corrupted environment.
  • Being more strict about reviews. Similarly, I have the feeling that some PRs are merged without approving reviews, or with approving reviews even though there were still open questions in the PR comments.
  • Split requirements into task-related chunks. By design, this eliminates the one-line dev environment setup (a single file that serves everything), so we'll need to install 2-4 files. I'm also not sure how that would fit with all the pre-commit scripts, but I'm sure we'll figure it out; maybe we could even write a helper script so that creating a dev env stays a single-line operation. Here are some ideas:
    • requirements.txt -> requirements like we have now for pymc
    • requirements-optional.txt -> optional requirements for pymc; we could also update setup.py so that running pip install pymc[all] installs both requirements and requirements-optional
    • requirements-test.txt -> requirements for testing
    • requirements-docs.txt -> requirements for building the docs
    • requirements-write.txt -> requirements for writing example notebooks, see https://github.com/pymc-devs/pymc-examples/blob/main/requirements-write.txt
    • others?
  • Documenting everything. Explain in the developer/contributing guide when each file should be used, plus maybe some extra recommendations for tools that are only useful in local envs (doc-related examples would be the sphinx-autobuild or sphobjinv libraries).
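The helper script idea for keeping dev env creation a single-line operation could look roughly like the Python sketch below. It assumes the split files proposed above exist in the repo root; the task-to-files mapping (including which files the `write` task pulls in) and the helper names are hypothetical. It only builds a `python -m pip install -r ... -r ...` command, and also shows how setup.py could reuse the optional file for the `all` extra.

```python
import sys
from pathlib import Path

# Hypothetical mapping from task to the split requirements files proposed above.
TASK_FILES = {
    "runtime": ["requirements.txt"],
    "all": ["requirements.txt", "requirements-optional.txt"],
    "test": ["requirements.txt", "requirements-test.txt"],
    "docs": ["requirements.txt", "requirements-docs.txt"],
    "write": ["requirements.txt", "requirements-write.txt"],
}


def read_requirements(path):
    """Non-empty, non-comment lines of a requirements file (usable in setup.py)."""
    lines = (line.strip() for line in Path(path).read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def pip_install_command(tasks):
    """One pip command installing the union of the files for `tasks`, deduplicated."""
    files = []
    for task in tasks:
        for fname in TASK_FILES[task]:
            if fname not in files:
                files.append(fname)
    cmd = [sys.executable, "-m", "pip", "install"]
    for fname in files:
        cmd += ["-r", fname]
    return cmd


# In setup.py, `pip install pymc[all]` could then be wired up as (sketch):
#   extras_require={"all": read_requirements("requirements-optional.txt")}

if __name__ == "__main__":
    # Print instead of running; pass the list to subprocess.run(cmd, check=True)
    # to actually create the env.
    print(" ".join(pip_install_command(sys.argv[1:] or ["test", "docs"])))
```

Keeping the mapping in one script means the developer guide only has to document the task names, while CI jobs and RTD can each request just the tasks they need.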

cc @Sayam753
