Description
When running the following script: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/simple_pytorch_demo.py I am getting CUDA out of memory errors, regardless of max_batch_size or the number of GPUs used. I have access to 10 GPUs with around 11 GB of VRAM each, so memory should not be the problem.
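For reference, this is a quick way to confirm that from Python (just a sketch using torch.cuda.mem_get_info, which I believe is available in recent PyTorch versions):

import torch

# Report free / total memory on every GPU visible to this process.
for i in range(torch.cuda.device_count()):
    free_bytes, total_bytes = torch.cuda.mem_get_info(i)
    print(f"cuda:{i}: {free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB")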
I am running the code exactly as it is in the repo, so I won't paste it here, but here is the error:
Traceback (most recent call last):
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/lib/wsgi_app.py", line 191, in __call__
return self._ServeCustomHandler(request, clean_path, environ)(
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/lib/wsgi_app.py", line 176, in _ServeCustomHandler
return self._handlers[clean_path](self, request, environ)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/app.py", line 385, in _handler
outputs = fn(data, **kw)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/app.py", line 305, in _get_interpretations
model_outputs = self._predict(data['inputs'], model, dataset_name)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/app.py", line 146, in _predict
return list(self._models[model_name].predict_with_metadata(
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/lib/caching.py", line 182, in predict_with_metadata
results = self._predict_with_metadata(*args, **kw)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/lib/caching.py", line 211, in _predict_with_metadata
model_preds = list(self.wrapped.predict_with_metadata(model_inputs))
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/api/model.py", line 197, in <genexpr>
results = (scrub_numpy_refs(res) for res in results)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/lit_nlp/api/model.py", line 209, in _batched_predict
yield from self.predict_minibatch(minibatch, **kw)
File "/home/niallt/lit_nlp/lit_nlp/examples/simple_pytorch_demo.py", line 118, in predict_minibatch
self.model.cuda()
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 688, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 578, in _apply
module._apply(fn)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 578, in _apply
module._apply(fn)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 578, in _apply
module._apply(fn)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 601, in _apply
param_applied = fn(param)
File "/home/niallt/venvs/39nlp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 688, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I have got this working fine with the standard lit-nlp demo, which I presume uses the TensorFlow backend by default, but my own models / codebases will require PyTorch.
Any thoughts on what may be causing this? I am not an expert on how lit-nlp processes the data behind the scenes, but the error occurs during predict_minibatch(), and I can confirm it does not get past moving the model, and then the batch, onto the GPU.
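To isolate that step, this is the kind of minimal check I'd run outside of LIT (just a sketch; the checkpoint name is an arbitrary example, not necessarily the one the demo loads):

import torch
import transformers

# Load an example sequence-classification checkpoint and attempt the same
# model.cuda() call that fails inside predict_minibatch().
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english")  # example checkpoint only
model.cuda()  # the call that raises "CUDA error: out of memory" under LIT
print(f"{torch.cuda.memory_allocated() / 1e6:.0f} MB allocated after moving the model")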
To see exactly where it stops inside LIT, I added some debugging prints:
def predict_minibatch(self, inputs):
    # Preprocess to ids and masks, and make the input batch.
    encoded_input = self.tokenizer.batch_encode_plus(
        [ex["sentence"] for ex in inputs],
        return_tensors="pt",
        add_special_tokens=True,
        max_length=128,
        padding="longest",
        truncation="longest_first")
    print(f"encoded input is: {encoded_input}")
    # Check and send to cuda (GPU) if available
    if torch.cuda.is_available():
        print(f"cuda avaialble!")
        self.model.cuda()
        for tensor in encoded_input:
            print(f"tensor is: {tensor}")
            encoded_input[tensor] = encoded_input[tensor].cuda()
        print(f"encoded input after passing to cuda is: {encoded_input}")
    # Run a forward pass.
    with torch.no_grad():  # remove this if you need gradients.
        out: transformers.modeling_outputs.SequenceClassifierOutput = \
            self.model(**encoded_input)
    # Post-process outputs.
    batched_outputs = {
        "probas": torch.nn.functional.softmax(out.logits, dim=-1),
        "input_ids": encoded_input["input_ids"],
        "ntok": torch.sum(encoded_input["attention_mask"], dim=1),
        "cls_emb": out.hidden_states[-1][:, 0],  # last layer, first token
    }
    # Return as NumPy for further processing.
    detached_outputs = {k: v.cpu().numpy() for k, v in batched_outputs.items()}
    # Unbatch outputs so we get one record per input example.
    for output in utils.unbatch_preds(detached_outputs):
        ntok = output.pop("ntok")
        output["tokens"] = self.tokenizer.convert_ids_to_tokens(
            output.pop("input_ids")[1:ntok - 1])
        yield output
I0812 14:55:26.451673 140234135095104 caching.py:210] Prepared 872 inputs for model
encoded input is: {'input_ids': tensor([[ 101, 2009, 1005, 1055, 1037, 11951, 1998, 2411, 12473, 4990,
1012, 102],
[ 101, 4895, 10258, 2378, 8450, 2135, 21657, 1998, 7143, 102,
0, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])}
cuda avaialble!
E0812 14:55:26.461915 140234135095104 wsgi_app.py:208] Uncaught error: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
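For what it's worth, one change I'm planning to try is moving the model onto a single explicit device once, up front, instead of calling self.model.cuda() on every minibatch. A rough standalone sketch of that pattern (the checkpoint name and cuda:0 are just example choices, not what the demo actually loads):

import torch
import transformers

# Sketch of the "move once" pattern: pick one explicit device up front,
# move the model there a single time, and only move the per-batch tensors
# inside the prediction call.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # example only
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME).to(device)  # moved to the GPU once, at load time
model.eval()

def predict_minibatch(sentences):
    # Tokenize the batch, mirroring the demo's settings.
    encoded = tokenizer(sentences, return_tensors="pt", padding="longest",
                        truncation=True, max_length=128)
    # Only the batch tensors move per call; the model already lives on `device`.
    encoded = {k: v.to(device) for k, v in encoded.items()}
    with torch.no_grad():
        out = model(**encoded)
    return torch.nn.functional.softmax(out.logits, dim=-1).cpu().numpy()

print(predict_minibatch(["it's a charming and often affecting journey."]))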
Any thoughts would be much appreciated. The GPU environment I have can ordinarily handle these models without any trouble.
Thanks in advance!