Although the KV cache implementation for inference looks good, `block_kv_cache` is also being prefilled during the training phase, which increases both compute and memory consumption.
nanoVLM/models/vision_language_model.py
Line 51 in 098db57
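One common way to avoid this (a sketch only; the class, attribute, and parameter names below are illustrative and not nanoVLM's actual API) is to gate cache allocation and updates on the module's mode, so the cache is never built while `self.training` is set or when caching is explicitly disabled:

```python
import torch
import torch.nn as nn


class AttentionWithOptionalCache(nn.Module):
    """Toy attention-cache wrapper illustrating the gating pattern.

    Hypothetical names: `block_kv_cache` mirrors the issue's attribute,
    but the shapes/API here are not taken from nanoVLM.
    """

    def __init__(self, n_heads: int, head_dim: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = head_dim
        self.block_kv_cache = None  # allocated lazily, and only for inference

    def forward(self, k: torch.Tensor, v: torch.Tensor, use_cache: bool = False):
        # During training (or when caching is disabled), skip allocation
        # entirely: no prefill, no extra memory.
        if self.training or not use_cache:
            self.block_kv_cache = None
            return k, v

        # Inference path: lazily create the cache, then append new K/V.
        if self.block_kv_cache is None:
            b = k.size(0)
            empty = torch.zeros(b, self.n_heads, 0, self.head_dim, device=k.device)
            self.block_kv_cache = {"k": empty, "v": empty.clone()}
        self.block_kv_cache["k"] = torch.cat([self.block_kv_cache["k"], k], dim=2)
        self.block_kv_cache["v"] = torch.cat([self.block_kv_cache["v"], v], dim=2)
        return self.block_kv_cache["k"], self.block_kv_cache["v"]
```

With this pattern the training forward pass touches no cache state at all, while generation in `eval()` mode accumulates keys and values across steps as before.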