Investigate and align Huggingface blob_url and manifest tag usage to prevent mismatches #1417

Open
@sourcery-ai


We construct blob_url and the manifest reference for Huggingface models from different tags: blob_url is hardcoded to the main branch, while the manifest may use the latest tag. If a file exists only in a specific revision or tag, the two can refer to different content, or the blob request can fail with a 404.

From the discussion, it appears that:

  • main in the blob URL is used as the actual model tag.
  • latest is used for the manifest.

We have been using main for the blob URL so far, but this approach may not be robust if the manifest and blob URL tags diverge.

Action Items:

  • Investigate how Huggingface handles tags for both manifests and blob URLs.
  • Determine if we should dynamically use the manifest's tag in the blob URL to ensure consistency and avoid potential 404 errors.
  • Update the code to use the correct tag if necessary, and add tests to cover this scenario.
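
As a starting point for the second and third items, here is a minimal sketch of deriving the file URL from the manifest's tag instead of hardcoding main. The names `resolve_revision`, `build_blob_url`, and `TAG_TO_REVISION` are hypothetical and not taken from the codebase. The `resolve/<revision>/<filename>` path is the Hub's file download pattern; whether our blob_url uses exactly this path, and whether latest should map to main, are assumptions the investigation needs to confirm.

```python
from typing import Optional
from urllib.parse import quote

HF_ENDPOINT = "https://huggingface.co"  # public Hugging Face Hub endpoint

# Assumption: the manifest's default tag "latest" has no matching git
# revision on the Hub, so it is mapped to the default branch "main".
TAG_TO_REVISION = {"latest": "main"}


def resolve_revision(manifest_tag: Optional[str]) -> str:
    """Map a manifest tag to the revision used in file URLs (hypothetical helper)."""
    if not manifest_tag:
        return "main"
    return TAG_TO_REVISION.get(manifest_tag, manifest_tag)


def build_blob_url(repo_id: str, filename: str, manifest_tag: Optional[str] = None) -> str:
    """Build the file URL from the same tag the manifest was fetched with."""
    # Encode "/" in the revision so refs such as "refs/pr/1" stay one path segment.
    revision = quote(resolve_revision(manifest_tag), safe="")
    return f"{HF_ENDPOINT}/{repo_id}/resolve/{revision}/{quote(filename)}"
```

With this shape, a test can assert that a manifest fetched with a specific tag and the resulting blob URL refer to the same revision, and that the default case still resolves to main.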

This investigation will help prevent future issues with mismatched tags and ensure reliable access to model files.


I created this issue for @engelmi from #1416 (comment).
