Nightly build 2025-06-28

Pre-release
@github-actions github-actions released this 09 Dec 11:19

HyperQueue dev

Breaking changes

  • The value 0 is no longer allowed for --crash-limit; use --crash-limit=unlimited instead.
  • The --workers-per-alloc flag of the hq alloc add command has been replaced with --max-workers-per-alloc,
    which sets the maximum number of workers to spawn in each allocation. Previously, the flag caused the
    allocator to (almost) always spawn the specified number of workers per allocation, regardless of the actual
    computational load.
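
For scripts that still pass the old arguments, the renames above can be applied mechanically. The following is a minimal sketch, not part of HyperQueue: the helper name migrate_hq_args is hypothetical, and it assumes the arguments are available as a list of strings. It only rewrites the spelling; since --max-workers-per-alloc is now an upper bound rather than a fixed count, the resulting allocation behavior may still differ from before.

```python
from typing import List


def migrate_hq_args(args: List[str]) -> List[str]:
    """Rewrite CLI arguments affected by the breaking changes (hypothetical helper)."""
    migrated: List[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        # "--crash-limit 0" (two separate arguments) becomes "--crash-limit=unlimited".
        if arg == "--crash-limit" and i + 1 < len(args) and args[i + 1] == "0":
            migrated.append("--crash-limit=unlimited")
            i += 2
            continue
        if arg == "--crash-limit=0":
            arg = "--crash-limit=unlimited"
        elif arg == "--workers-per-alloc" or arg.startswith("--workers-per-alloc="):
            # Only the flag name changes; any attached value is kept as-is.
            arg = "--max-" + arg[len("--"):]
        migrated.append(arg)
        i += 1
    return migrated


if __name__ == "__main__":
    # The task script name is a placeholder chosen for illustration.
    print(migrate_hq_args(["hq", "submit", "--crash-limit", "0", "./task.sh"]))
```

Other --crash-limit values and all unrelated arguments pass through unchanged.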

Changes

The automatic allocator has finally been reimplemented, and is now much better:

  • It now uses information from the scheduler to determine how many allocations to spawn, and thus it can react to the
    current computational load much more accurately. It should also be less "eager".
  • It properly supports multi-node tasks.
  • It considers computational load across all allocation queues (before, each queue was treated separately, which led to
    creating too many submissions).
  • It now exposes a min-utilization parameter, which can be used to avoid spawning an allocation that could not be
    sufficiently utilized.

As this is a large behavioral change, we would be happy to hear your feedback!

New features

  • New command hq task explain <job_id> <task_id> explains why a task cannot be run on a given worker.
  • The server scheduler now slightly prioritizes tasks from older jobs and the completion of partially computed task graphs.
  • New values for --crash-limit:
    • never-restart - the task is never restarted, even if it "crashes" on a worker that was explicitly terminated.
    • unlimited - no limit on the number of crashes.
  • hq worker info contains more information
  • hq job forget tries to free more memory
  • You can now configure the job name in the Python API.
  • hq job progress now displays all jobs and tasks that you wait for, rather than only those that were unfinished at the
    time when the command was executed.
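
As an illustration of the new hq task explain command, the sketch below builds its argument list and runs it only if an hq binary is found on PATH. The helper names explain_command and explain_task are hypothetical, and the output format of the command is not assumed; the raw standard output is returned as-is.

```python
import shutil
import subprocess
from typing import List, Optional


def explain_command(job_id: int, task_id: int) -> List[str]:
    """Build the argument list for: hq task explain <job_id> <task_id>."""
    return ["hq", "task", "explain", str(job_id), str(task_id)]


def explain_task(job_id: int, task_id: int) -> Optional[str]:
    """Run the command and return its stdout, or None if hq is not installed."""
    if shutil.which("hq") is None:
        return None
    result = subprocess.run(
        explain_command(job_id, task_id),
        capture_output=True,
        text=True,
        check=False,  # a failing command is not raised; inspect the output instead
    )
    return result.stdout


if __name__ == "__main__":
    # Job and task ids here are placeholders.
    print(" ".join(explain_command(1, 0)))
```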

Fixes

  • Fixed a problem with journal loading when task dependencies are used
  • Fixed restoring crash counters and instance ids from journal
  • Fixed some corner cases of load balancing in server scheduler

Docs

  • CLI documentation (when --help is used) was cleaned up and improved
  • Our documentation now contains an automatically generated reference of all available HQ CLI commands and options.

Experimental

  • Added direct data transfers between tasks. The user API is not yet stabilized.

Artifact summary:

  • hq-vdev-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-dev-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.