Skip to content

torch.cuda.OutOfMemoryError: CUDA out of memory.real_basicvsr/basicvsr++运行报错 #2117

Open
@txy00001

Description

@txy00001

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmagic

Environment

image
cuda 11.7 cudnn8.9 gpu:4090

Reproduces the problem - code sample

import os
import time
from mmagic.apis import MMagicInferencer
from mmengine import mkdir_or_exist

# Create a MMagicInferencer instance and infer

video = '/home/txy/code/blur/video/6.mp4'
result_out_dir = '/home/txy/code/blur/output/6.mp4'
mkdir_or_exist(os.path.dirname(result_out_dir))
beg=time.time()
editor = MMagicInferencer('real_basicvsr', device='cuda:1')
results = editor.infer(video=video, result_out_dir=result_out_dir)
time.time-beg
print(time.time-beg)

Reproduces the problem - command or script

import os
import time
from mmagic.apis import MMagicInferencer
from mmengine import mkdir_or_exist

# Create a MMagicInferencer instance and infer

video = '/home/txy/code/blur/video/6.mp4'
result_out_dir = '/home/txy/code/blur/output/6.mp4'
mkdir_or_exist(os.path.dirname(result_out_dir))
beg=time.time()
editor = MMagicInferencer('real_basicvsr', device='cuda:1')
results = editor.infer(video=video, result_out_dir=result_out_dir)
time.time-beg
print(time.time-beg)

Reproduces the problem - error message

/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1. You can also use weights=VGG19_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loads checkpoint by http backend from path: https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: step_counter

01/25 17:15:56 - mmengine - WARNING - Failed to search registry with scope "mmagic" in the "function" registry tree. As a workaround, the current "function" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmagic" is a correct scope, or whether the registry is initialized.
/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the save_dir argument.
warnings.warn(f'Failed to add {vis_backend.class}, '
Traceback (most recent call last):
File "/home/txy/code/blur/demo/bas_real/real_infer_video.py", line 12, in
results = editor.infer(video=video, result_out_dir=result_out_dir)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/mmagic_inferencer.py", line 231, in infer
return self.inferencer(
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/init.py", line 110, in call
return self.inferencer(**kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py", line 139, in call
results = self.base_call(**kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py", line 165, in base_call
preds = self.forward(data, **forward_kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/video_restoration_inferencer.py", line 127, in forward
result = self.model(
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/base_models/base_edit_model.py", line 109, in forward
return self.forward_tensor(inputs, data_samples, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/real_esrgan/real_esrgan.py", line 112, in forward_tensor
feats = self.generator_ema(inputs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/real_basicvsr/real_basicvsr_net.py", line 88, in forward
residues = self.image_cleaning(lqs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/basicvsr/basicvsr_net.py", line 214, in forward
return self.main(feat)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 59.33 GiB (GPU 1; 23.65 GiB total capacity; 2.92 GiB already allocated; 19.91 GiB free; 2.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Additional information

我想对视频进行超分,视频是1s,和5s的,分辨率是1280×720,3160×2160,尝试real_basicvsr和basicvsr++都是
image
这个错误,换了服务器和视频依旧有问题,我该如何解决?

Metadata

Metadata

Assignees

Labels

kind/bugsomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions