Description
When running the following script: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/simple_pytorch_demo.py I am getting CUDA out of memory errors, regardless of max_batch_size or the number of GPUs used. I have access to 10 GPUs with around 11 GB of VRAM each, so memory should not be the problem.
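For reference, this is a quick way to confirm that from Python (just a sketch using torch.cuda.mem_get_info, which I believe is available in recent PyTorch versions):

import torch

# Report free / total memory on every GPU visible to this process.
for i in range(torch.cuda.device_count()):
    free_bytes, total_bytes = torch.cuda.mem_get_info(i)
    print(f"cuda:{i}: {free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB")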
I am running the code exactly as it is in the repo, so I won't paste it here, but here is the error:
Traceback (most recent call last):
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/lib/wsgi_app.py", line 191, in __call__
return self._ServeCustomHandler(request, clean_path, environ)(
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/lib/wsgi_app.py", line 176, in _ServeCustomHandler
return self._handlers[clean_path](self, request, environ)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/app.py", line 385, in _handler
outputs = fn(data, **kw)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/app.py", line 305, in _get_interpretations
model_outputs = self._predict(data['inputs'], model, dataset_name)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/app.py", line 146, in _predict
return list(self._models[model_name].predict_with_metadata(
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/lib/caching.py", line 182, in predict_with_metadata
results = self._predict_with_metadata(*args, **kw)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/lib/caching.py", line 211, in _predict_with_metadata
model_preds = list(self.wrapped.predict_with_metadata(model_inputs))
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/api/model.py", line 197, in <genexpr>
results = (scrub_numpy_refs(res) for res in results)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/api/model.py", line 209, in _batched_predict
yield from self.predict_minibatch(minibatch, **kw)
File "/home/niallt/lit_nlp/lit_nlp/examples/simple_pytorch_demo.py", line 118, in predict_minibatch
self.model.cuda()
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 688, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 578, in _apply
module._apply(fn)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 578, in _apply
module._apply(fn)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 578, in _apply
module._apply(fn)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 601, in _apply
param_applied = fn(param)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 688, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I have got this working fine with the standard lit-nlp demo, which I presume uses the TensorFlow backend by default, but my own models / codebases will require PyTorch.
Any thoughts on what may be causing this? I am not an expert on how lit-nlp processes the data behind the scenes, but the error occurs during predict_minibatch(), and I can confirm it does not get past moving the model, and then the batch, onto the GPU.
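To isolate that step, this is the kind of minimal check I'd run outside of LIT (just a sketch; the checkpoint name is an arbitrary example, not necessarily the one the demo loads):

import torch
import transformers

# Load an example sequence-classification checkpoint and attempt the same
# model.cuda() call that fails inside predict_minibatch().
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english")  # example checkpoint only
model.cuda()  # the call that raises "CUDA error: out of memory" under LIT
print(f"{torch.cuda.memory_allocated() / 1e6:.0f} MB allocated after moving the model")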
To see exactly where it stops inside LIT, I added some debugging prints:
def predict_minibatch(self, inputs):
    # Preprocess to ids and masks, and make the input batch.
    encoded_input = self.tokenizer.batch_encode_plus(
        [ex["sentence"] for ex in inputs],
        return_tensors="pt",
        add_special_tokens=True,
        max_length=128,
        padding="longest",
        truncation="longest_first")
    print(f"encoded input is: {encoded_input}")
    # Check and send to cuda (GPU) if available
    if torch.cuda.is_available():
        print(f"cuda avaialble!")
        self.model.cuda()
        for tensor in encoded_input:
            print(f"tensor is: {tensor}")
            encoded_input[tensor] = encoded_input[tensor].cuda()
        print(f"encoded input after passing to cuda is: {encoded_input}")
    # Run a forward pass.
    with torch.no_grad():  # remove this if you need gradients.
        out: transformers.modeling_outputs.SequenceClassifierOutput = \
            self.model(**encoded_input)
    # Post-process outputs.
    batched_outputs = {
        "probas": torch.nn.functional.softmax(out.logits, dim=-1),
        "input_ids": encoded_input["input_ids"],
        "ntok": torch.sum(encoded_input["attention_mask"], dim=1),
        "cls_emb": out.hidden_states[-1][:, 0],  # last layer, first token
    }
    # Return as NumPy for further processing.
    detached_outputs = {k: v.cpu().numpy() for k, v in batched_outputs.items()}
    # Unbatch outputs so we get one record per input example.
    for output in utils.unbatch_preds(detached_outputs):
        ntok = output.pop("ntok")
        output["tokens"] = self.tokenizer.convert_ids_to_tokens(
            output.pop("input_ids")[1:ntok - 1])
        yield output
I0812 14:55:26.451673 140234135095104 caching.py:210] Prepared 872 inputs for model
encoded input is: {'input_ids': tensor([[ 101, 2009, 1005, 1055, 1037, 11951, 1998, 2411, 12473, 4990,
1012, 102],
[ 101, 4895, 10258, 2378, 8450, 2135, 21657, 1998, 7143, 102,
0, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])}
cuda avaialble!
E0812 14:55:26.461915 140234135095104 wsgi_app.py:208] Uncaught error: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
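For what it's worth, one change I'm planning to try is moving the model onto a single explicit device once, up front, instead of calling self.model.cuda() on every minibatch. A rough standalone sketch of that pattern (the checkpoint name and cuda:0 are just example choices, not what the demo actually loads):

import torch
import transformers

# Sketch of the "move once" pattern: pick one explicit device up front,
# move the model there a single time, and only move the per-batch tensors
# inside the prediction call.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # example only
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME).to(device)  # moved to the GPU once, at load time
model.eval()

def predict_minibatch(sentences):
    # Tokenize the batch, mirroring the demo's settings.
    encoded = tokenizer(sentences, return_tensors="pt", padding="longest",
                        truncation=True, max_length=128)
    # Only the batch tensors move per call; the model already lives on `device`.
    encoded = {k: v.to(device) for k, v in encoded.items()}
    with torch.no_grad():
        out = model(**encoded)
    return torch.nn.functional.softmax(out.logits, dim=-1).cpu().numpy()

print(predict_minibatch(["it's a charming and often affecting journey."]))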
Any thoughts would be much appreciated. The GPU environment I have can ordinarily handle these models without any trouble.
Thanks in advance!