In this Interspeech 2024 paper, we proposed QGAN, a Quaternion GAN-based model capable of generating high-fidelity speech efficiently. This repository provides our open-source implementation and pretrained models.
Demo: Visit our demo website for audio samples.
- Python >= 3.8
- Clone this repository.
- Install Python requirements; refer to requirements.txt.
- Download and extract the LJ Speech dataset.
- Download and extract the Hindi dataset.
- Move all wav files to `LJSpeech-1.1/wavs`.
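The last preparation step can be sketched as follows. The source directory is an assumption (it depends on where you extracted the archives); the target default matches the `LJSpeech-1.1/wavs` path above.

```python
from pathlib import Path
import shutil

def collect_wavs(source_dir: str, target_dir: str = "LJSpeech-1.1/wavs") -> int:
    """Move every .wav file found under source_dir into target_dir.

    source_dir is hypothetical; point it at wherever you extracted the
    LJ Speech and Hindi archives. Returns the number of files moved.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = 0
    for wav in Path(source_dir).rglob("*.wav"):
        shutil.move(str(wav), str(target / wav.name))
        moved += 1
    return moved
```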
```
python train.py --config config_v1.json
```
To train the V2 or V3 generator, replace `config_v1.json` with `config_v2.json` or `config_v3.json`.
Checkpoints and a copy of the configuration file are saved in the `cp_hifigan` directory by default. You can change the path with the `--checkpoint_path` option.
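A minimal sketch of how these command-line options fit together; the argument names and defaults follow the README, but the config key shown in the usage test below is illustrative, not the actual contents of config_v1.json.

```python
import argparse
import json

def parse_args(argv):
    """Sketch of the training CLI: --config selects the generator variant
    and --checkpoint_path overrides the default cp_hifigan directory."""
    parser = argparse.ArgumentParser(description="QGAN training (sketch)")
    parser.add_argument("--config", default="config_v1.json")
    parser.add_argument("--checkpoint_path", default="cp_hifigan")
    return parser.parse_args(argv)

def load_config(path):
    """Load the JSON training configuration from disk."""
    with open(path) as f:
        return json.load(f)
```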
You can also use the pretrained models we provide.
Download pretrained models
- Make a `test_files` directory and copy wav files into it.
- Run the following command.
```
python inference.py --checkpoint_file [generator checkpoint file path]
```
Generated wav files are saved in the `generated_files` directory by default. You can change the path with the `--output_dir` option.
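The inference flow above amounts to mapping each wav in `test_files` to an output path under `generated_files`. A sketch of that mapping, assuming the generated file keeps the input filename (the helper is ours, not part of inference.py):

```python
from pathlib import Path

def plan_outputs(input_dir="test_files", output_dir="generated_files"):
    """Map each .wav in input_dir to the path where its generated
    counterpart would be written. Directory defaults follow the README."""
    out = Path(output_dir)
    return {str(wav): str(out / wav.name)
            for wav in sorted(Path(input_dir).glob("*.wav"))}
```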
- Set the checkpoint path in the losslands.py file and load the models and their weights accordingly.
- Running losslands.py dumps loss_list, a list of values used for generating the visualization.
```
python losslands.py
```
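Once losslands.py has dumped loss_list, the values can be arranged for plotting. The sketch below assumes the list is a flat, row-major scan of a square grid of weight perturbations, which is a common loss-landscape setup; the actual dump format of losslands.py may differ.

```python
import numpy as np

def to_landscape(loss_list, grid_size=None):
    """Reshape a flat, row-major list of loss values into a square 2-D
    grid suitable for a contour or surface plot of the loss landscape."""
    values = np.asarray(loss_list, dtype=float)
    if grid_size is None:
        grid_size = int(round(len(values) ** 0.5))
    if grid_size * grid_size != len(values):
        raise ValueError("loss_list length is not a perfect square")
    return values.reshape(grid_size, grid_size)
```

The resulting grid can be handed directly to `matplotlib.pyplot.contourf` or `plot_surface` to reproduce the visualization.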
Aryan Chaudhary: [email protected]
We referred to WaveGlow, MelGAN, and Tacotron2 in our implementation.