I was testing out the library by training the model on a single GPU. I used the following command to run the training:

```
CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=1 run_train.py --config-file examples/config_tiny_llama.yaml
```
I made some changes in the `config_tiny_llama.yaml` file, which include:

```yaml
parallelism:
  dp: 1 # 2
  expert_parallel_size: 1
  pp: 1 # 2
  pp_engine: 1f1b
  tp: 1 # 2
  tp_linear_async_communication: true
  tp_mode: REDUCE_SCATTER
```
The training ran smoothly and the checkpoints were generated. However, when I try to run generation with

```
torchrun --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/10/ --tp 1 --pp 1
```

I get the following error:
```
[rank0]:   File "/mnt/d/nanotron-pretrain/nanotron/src/nanotron/models/llama.py", line 529, in forward
[rank0]:     (query_unpad, indices_q, cu_seqlens_q, max_seqlen_q) = bert_padding.unpad_input(
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 4)
```
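For what it's worth, this looks like it could be the flash-attn `unpad_input` return-arity change: newer flash-attn releases (2.6+, as far as I can tell) return a five-tuple (the extra item being the used sequence lengths per batch), while the code at `llama.py` line 529 unpacks four values. Below is a minimal sketch of a version-tolerant unpacking, assuming that is the cause; the tensor names and shapes are illustrative, not nanotron's actual ones:

```python
import torch
from flash_attn import bert_padding

# Dummy inputs just to exercise the call; shapes follow the usual
# (batch, seqlen, heads, head_dim) layout that flash-attn expects.
query_states = torch.randn(2, 8, 4, 16, dtype=torch.float16)
sequence_mask = torch.ones(2, 8, dtype=torch.bool)

# flash-attn >= 2.6 appears to return a five-tuple from unpad_input
# (the extra item is the used sequence lengths), while older releases
# return a four-tuple. Taking only the first four items tolerates both.
outputs = bert_padding.unpad_input(query_states, sequence_mask)
query_unpad, indices_q, cu_seqlens_q, max_seqlen_q = outputs[:4]
print(query_unpad.shape, cu_seqlens_q, max_seqlen_q)
```

Alternatively, pinning flash-attn below 2.6 (e.g. `pip install "flash-attn<2.6"`) might sidestep the mismatch, if the arity change is indeed the culprit.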
Any help to resolve this issue would be greatly appreciated. Thanks.