Skip to content

Replace pkg_resources with importlib_metadata #7413

Closed
@chrahunt

Description

@chrahunt

What's the problem this feature will solve?

Currently, pip uses a vendored version of pkg_resources in several places.

pkg_resources has a few downsides for our use cases, including:

  1. unconditional scan of entire sys.path on import (Avoid full path enumeration on import of setuptools or pkg_resources? setuptools#510), when we may only need it for a subset of those directories
  2. various issues filed regarding corrupt METADATA files in site-packages preventing pip from starting
  3. complicated to vendor vs a standalone library, since we need to extract it from setuptools
  4. higher barrier of entry for other package management tools that want pip-compatible behavior

Describe the solution you'd like

Replace pkg_resources with importlib.metadata (via the importlib-metadata backport for Python < 3.8).

Our current usages for pkg_resources are:

  1. build_env.BuildEnvironment: identify the set of packages available/conflicting in the isolated environment directories
  2. req.req_install.InstallRequirement: locate an already-installed package
  3. commands.search, commands.show: iterate over the set of installed packages and access metadata
  4. wheel:
    1. parse entrypoints
    2. check wheel version of unpacked Wheel
  5. ad-hoc usage of parsing/canonicalization functions

A combination of importlib_metadata and packaging can satisfy these requirements:

  1. importlib.metadata.distributions(path=["dir1", "dir2"]) then comparison of the version using packaging.specifiers.SpecifierSet
  2. importlib.metadata.distribution("package_name") or importlib.metadata.distributions(path=["..."], name="package_name")
  3. [d.metadata["..."] for d in importlib.metadata.distributions()]
    1. importlib.metadata.Distribution.at(info_dir).entry_points
    2. next(importlib.metadata.distributions(path=["..."])).read_text("WHEEL")
  4. packaging.utils.canonicalize_name, etc

As a vendored library, importlib-metadata would have the following characteristics:

  1. Apache-2.0 license
  2. Several backports for Python 2:
    1. contextlib2, which we already vendor
    2. pathlib2 - we have looked at for tests. There is one issue we need to investigate the impact of: Unusable on Windows/Python2.7 with unicode paths jazzband/pathlib2#56 - MIT licensed, pure Python with no dependencies
    3. configparser - configparser backport - MIT licensed, pure Python with no dependencies
  3. One dependency otherwise:
    1. zipp for making the zipfile module Path-aware - MIT licensed, pure Python with no dependencies

Alternative Solutions

  • Continue using pkg_resources - this approach has the downsides mentioned above
  • Use another metadata-providing library - none researched, I don't imagine we'd choose something else over either of these two

See also

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions