This repository was archived by the owner on Feb 7, 2025. It is now read-only.

AutoEncoderKL output tensor dimension mismatch with Input #498

Open
@shankartmv

Description

I am trying to train an AutoencoderKL model on RGB images of shape (3, 1225, 966). Here is the code I use (similar to tutorials/generative/2d_ldm/2d_ldm_tutorial.ipynb):
import torch
from generative.networks.nets import AutoencoderKL

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

autoencoderkl = AutoencoderKL(
    spatial_dims=2,
    in_channels=3,
    out_channels=3,
    num_channels=(128, 256, 384),
    latent_channels=8,
    num_res_blocks=1,
    attention_levels=(False, False, False),
    with_encoder_nonlocal_attn=False,
    with_decoder_nonlocal_attn=False,
)
autoencoderkl = autoencoderkl.to(device)
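
A minimal forward pass (a hedged sketch, using a random image of the reported size and the tuple return shown in the tutorial) already shows the reconstruction shape drifting from the input shape:

# Hypothetical repro with a single random image of the reported dimensions.
images = torch.rand(1, 3, 1225, 966, device=device)
reconstruction, z_mu, z_sigma = autoencoderkl(images)
print(images.shape)          # torch.Size([1, 3, 1225, 966])
print(reconstruction.shape)  # torch.Size([1, 3, 1224, 964])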

The error is raised at line 27 of the "Train Model" cell (as in the tutorial notebook):

recons_loss = F.l1_loss(reconstruction.float(), images.float())
RuntimeError: The size of tensor a (964) must match the size of tensor b (966) at non-singleton dimension 3
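
For reference, the mismatch is consistent with the floor division applied by each stride-2 Downsample layer and the exact doubling applied by each Upsample layer (two levels here, since num_channels has three entries); a minimal sketch of that arithmetic:

h, w = 1225, 966
for _ in range(2):           # encoder: two stride-2 Downsample layers
    h, w = h // 2, w // 2    # 1225 -> 612 -> 306, 966 -> 483 -> 241
for _ in range(2):           # decoder: two scale-factor-2 Upsample layers
    h, w = h * 2, w * 2      # 306 -> 612 -> 1224, 241 -> 482 -> 964
print(h, w)                  # 1224 964, versus the original 1225 966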

Using the torchinfo package, I was able to print the model summary and locate the discrepancy in the upsampling layers:

===================================================================================================================
Layer (type:depth-idx) Input Shape Output Shape Param #
===================================================================================================================
AutoencoderKL [1, 3, 1225, 966] [1, 3, 1224, 964] --
├─Encoder: 1-1 [1, 3, 1225, 966] [1, 8, 306, 241] --
│ └─ModuleList: 2-1 -- -- --
│ │ └─Convolution: 3-1 [1, 3, 1225, 966] [1, 128, 1225, 966] 3,584
│ │ └─ResBlock: 3-2 [1, 128, 1225, 966] [1, 128, 1225, 966] 295,680
│ │ └─Downsample: 3-3 [1, 128, 1225, 966] [1, 128, 612, 483] 147,584
│ │ └─ResBlock: 3-4 [1, 128, 612, 483] [1, 256, 612, 483] 919,040
│ │ └─Downsample: 3-5 [1, 256, 612, 483] [1, 256, 306, 241] 590,080
│ │ └─ResBlock: 3-6 [1, 256, 306, 241] [1, 384, 306, 241] 2,312,576
│ │ └─GroupNorm: 3-7 [1, 384, 306, 241] [1, 384, 306, 241] 768
│ │ └─Convolution: 3-8 [1, 384, 306, 241] [1, 8, 306, 241] 27,656
├─Convolution: 1-2 [1, 8, 306, 241] [1, 8, 306, 241] --
│ └─Conv2d: 2-2 [1, 8, 306, 241] [1, 8, 306, 241] 72
├─Convolution: 1-3 [1, 8, 306, 241] [1, 8, 306, 241] --
│ └─Conv2d: 2-3 [1, 8, 306, 241] [1, 8, 306, 241] 72
├─Convolution: 1-4 [1, 8, 306, 241] [1, 8, 306, 241] --
│ └─Conv2d: 2-4 [1, 8, 306, 241] [1, 8, 306, 241] 72
├─Decoder: 1-5 [1, 8, 306, 241] [1, 3, 1224, 964] --
│ └─ModuleList: 2-5 -- -- --
│ │ └─Convolution: 3-9 [1, 8, 306, 241] [1, 384, 306, 241] 28,032
│ │ └─ResBlock: 3-10 [1, 384, 306, 241] [1, 384, 306, 241] 2,656,512
│ │ └─Upsample: 3-11 [1, 384, 306, 241] [1, 384, 612, 482] 1,327,488
│ │ └─ResBlock: 3-12 [1, 384, 612, 482] [1, 256, 612, 482] 1,574,912
│ │ └─Upsample: 3-13 [1, 256, 612, 482] [1, 256, 1224, 964] 590,080
│ │ └─ResBlock: 3-14 [1, 256, 1224, 964] [1, 128, 1224, 964] 476,288
│ │ └─GroupNorm: 3-15 [1, 128, 1224, 964] [1, 128, 1224, 964] 256
│ │ └─Convolution: 3-16 [1, 128, 1224, 964] [1, 3, 1224, 964] 3,459
===================================================================================================================
Total params: 10,954,211
Trainable params: 10,954,211
Non-trainable params: 0
Total mult-adds (Units.TERABYTES): 3.20
===================================================================================================================
Input size (MB): 14.20
Forward/backward pass size (MB): 26803.57
Params size (MB): 43.82
Estimated Total Size (MB): 26861.59
===================================================================================================================
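
One possible workaround (not part of the original report, just a sketch assuming MONAI's DivisiblePad transform is available): pad the images so that height and width are multiples of 2**num_downsamplings (4 here), so the decoder's doublings land back on the padded size:

import torch
from monai.transforms import DivisiblePad

pad = DivisiblePad(k=4)          # pad spatial dims up to the next multiple of 4
img = torch.rand(3, 1225, 966)   # channel-first image, no batch dimension
padded = pad(img)
print(padded.shape)              # torch.Size([3, 1228, 968])

With a padded input of (1228, 968), two halvings followed by two doublings return exactly (1228, 968), so F.l1_loss sees matching shapes; alternatively, the target could be center-cropped to the reconstruction size.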
