Thanks for your great work!
I ran into a problem while training an unconditional transformer on ImageNet.
Here is my config file; I run it on a single 32GB V100 GPU:
model:
  base_learning_rate: 0.00625
  target: taming.models.cond_transformer.Net2NetTransformer
  params:
    transformer_config:
      target: taming.modules.transformer.mingpt.GPT
      params:
        vocab_size: 16384
        block_size: 256
        n_layer: 48
        n_head: 24
        n_embd: 1536
    first_stage_config:
      target: taming.models.vqgan.VQModel
      params:
        ckpt_path: /mnt/lustre/huangjunqin/taming_transformer/logs/imagenet_vqgan_f16_16384/checkpoints/last.ckpt
        embed_dim: 256
        n_embed: 16384
        ddconfig:
          double_z: false
          z_channels: 256
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 1
          - 2
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions:
          - 16
          dropout: 0.0
        lossconfig:
          target: taming.modules.losses.vqperceptual.DummyLoss
    cond_stage_config: __is_unconditional__
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 1
    wrap: false
    train:
      target: taming.data.imagenet.ImageNetTrain
      params:
        config:
          size: 256
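For context, this is how I read the learning-rate handling: as far as I can tell, main.py scales base_learning_rate by the number of GPUs, the batch size, and gradient accumulation, so with my settings the effective learning rate stays at 0.00625. A rough sketch (the config filename is just where I saved the file above):

from omegaconf import OmegaConf

# Load the config shown above (the filename is my own choice)
config = OmegaConf.load("configs/imagenet_transformer_uncond.yaml")

base_lr = config.model.base_learning_rate      # 0.00625
batch_size = config.data.params.batch_size     # 1
ngpu = 1                                       # single 32GB V100
accumulate_grad_batches = 1                    # default, not set in my config

# My understanding of the scaling done in main.py:
effective_lr = accumulate_grad_batches * ngpu * batch_size * base_lr
print(effective_lr)                            # 0.00625 with these settings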
I use one of your shared checkpoints trained on ImageNet (imagenet_vqgan_f16_16384) as the pretrained VQ model.
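To make sure the first stage itself is not the problem, I also did a quick reconstruction sanity check on the pretrained VQGAN, roughly like this (only a sketch; it builds the model from the same config file as above, and the random tensor just stands in for a preprocessed image in [-1, 1]):

import torch
from omegaconf import OmegaConf
from taming.models.vqgan import VQModel

# Build the first stage directly from the first_stage_config above;
# ckpt_path inside the params loads the pretrained weights.
config = OmegaConf.load("configs/imagenet_transformer_uncond.yaml")
fs_params = config.model.params.first_stage_config.params
vqgan = VQModel(**fs_params)
vqgan.eval()

# Stand-in for a normalized image; in practice I feed real ImageNet crops.
x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    xrec, _ = vqgan(x)   # forward returns (reconstruction, codebook loss term)
print(xrec.shape)        # (1, 3, 256, 256); reconstructions of real images look fine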
When training starts, the loss decreases, but after only a few hundred iterations it converges to around 6.8. When I check the image log, the reconstruction images look fine, but the quality of the samples_nopix images keeps getting worse, and the samples_det image is always a single solid color:
[image: samples_det at iteration 128]
Is there something wrong? Thanks for any help!