In this Interspeech 2024 paper, we proposed QGAN, a Quaternion GAN-based model capable of generating high-fidelity speech efficiently. This repository provides our open-source implementation and pretrained models.
Demo: Visit our demo website for audio samples.
- Python >= 3.8
- Clone this repository.
- Install Python requirements; refer to requirements.txt.
- Download and extract the LJ Speech dataset.
- Download and extract the Hindi dataset.
- Move all wav files to `LJSpeech-1.1/wavs`.
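The last preparation step can be sketched as follows. The source directory is an assumption (it depends on where you extracted the archives); the target default matches the `LJSpeech-1.1/wavs` path above.

```python
from pathlib import Path
import shutil

def collect_wavs(source_dir: str, target_dir: str = "LJSpeech-1.1/wavs") -> int:
    """Move every .wav file found under source_dir into target_dir.

    source_dir is hypothetical; point it at wherever you extracted the
    LJ Speech and Hindi archives. Returns the number of files moved.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = 0
    for wav in Path(source_dir).rglob("*.wav"):
        shutil.move(str(wav), str(target / wav.name))
        moved += 1
    return moved
```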
```
python train.py --config config_v1.json
```
To train the V2 or V3 generator, replace `config_v1.json` with `config_v2.json` or `config_v3.json`.
Checkpoints and a copy of the configuration file are saved in the `cp_hifigan` directory by default. You can change the path with the `--checkpoint_path` option.
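A minimal sketch of how these command-line options fit together; the argument names and defaults follow the README, but the config key shown in the usage test below is illustrative, not the actual contents of config_v1.json.

```python
import argparse
import json

def parse_args(argv):
    """Sketch of the training CLI: --config selects the generator variant
    and --checkpoint_path overrides the default cp_hifigan directory."""
    parser = argparse.ArgumentParser(description="QGAN training (sketch)")
    parser.add_argument("--config", default="config_v1.json")
    parser.add_argument("--checkpoint_path", default="cp_hifigan")
    return parser.parse_args(argv)

def load_config(path):
    """Load the JSON training configuration from disk."""
    with open(path) as f:
        return json.load(f)
```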
You can also use the pretrained models we provide.
Download pretrained models
- Make a `test_files` directory and copy wav files into it.
- Run the following command.
```
python inference.py --checkpoint_file [generator checkpoint file path]
```
Generated wav files are saved in the `generated_files` directory by default. You can change the path with the `--output_dir` option.
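The inference flow above amounts to mapping each wav in `test_files` to an output path under `generated_files`. A sketch of that mapping, assuming the generated file keeps the input filename (the helper is ours, not part of inference.py):

```python
from pathlib import Path

def plan_outputs(input_dir="test_files", output_dir="generated_files"):
    """Map each .wav in input_dir to the path where its generated
    counterpart would be written. Directory defaults follow the README."""
    out = Path(output_dir)
    return {str(wav): str(out / wav.name)
            for wav in sorted(Path(input_dir).glob("*.wav"))}
```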
- Set the checkpoint path in the losslands.py file and load the models and their weights accordingly.
- Running losslands.py dumps loss_list, a list of values used for generating the visualization.
```
python losslands.py
```
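Once losslands.py has dumped loss_list, the values can be arranged for plotting. The sketch below assumes the list is a flat, row-major scan of a square grid of weight perturbations, which is a common loss-landscape setup; the actual dump format of losslands.py may differ.

```python
import numpy as np

def to_landscape(loss_list, grid_size=None):
    """Reshape a flat, row-major list of loss values into a square 2-D
    grid suitable for a contour or surface plot of the loss landscape."""
    values = np.asarray(loss_list, dtype=float)
    if grid_size is None:
        grid_size = int(round(len(values) ** 0.5))
    if grid_size * grid_size != len(values):
        raise ValueError("loss_list length is not a perfect square")
    return values.reshape(grid_size, grid_size)
```

The resulting grid can be handed directly to `matplotlib.pyplot.contourf` or `plot_surface` to reproduce the visualization.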
Aryan Chaudhary: [email protected]
We referred to WaveGlow, MelGAN, and Tacotron2 in our implementation.