Open
Running `from promptsource.seqio_tasks import tasks` takes a huge amount of time. One of the main reasons is that the import queries all dataset infos:
- One has to load ALL dataset infos as soon as one uses one task.
- Even when cached, it still queries URLs to check that nothing has changed. One can bypass this by setting `HF_DATASETS_OFFLINE=1`, as described in "Transfer promptsource.seqio_tasks to https://github.com/bigscience-workshop/t-zero" #703 (comment).
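For reference, a minimal sketch of the offline workaround from the second point. It assumes the variable is set in the same Python process before `datasets` is first imported, since the library reads it at import time; the slow import itself is left commented out so the snippet runs without promptsource installed.

```python
import os

# Set this before `datasets` (or anything that imports it) is loaded;
# the library reads the variable once, at import time.
os.environ["HF_DATASETS_OFFLINE"] = "1"

# The import discussed in this issue would follow here:
# from promptsource.seqio_tasks import tasks
```

This only skips the remote freshness checks for cached data. It does not address the first point, since all dataset infos are still loaded.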
IMO both are unnecessary and should be fixed. Is there a reason why one cannot load seqio tasks dynamically, in the sense of fetching only what is necessary? Something along the lines of:
    def add_seqio_task(task_name):
        seqio.TaskRegistry.add(...)