A fork of MD4 (Simplified and Generalized Masked Diffusion for Discrete Data),
extended to large-scale pre-training on SMILES strings and conditioned on Morgan
fingerprints that the MIST model predicts from MS/MS spectra.
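MD4 trains by corrupting a token sequence with a masking forward process and learning to reverse it. The sketch below shows that forward step on a SMILES string. It is not this repository's code. It assumes the following: a linear schedule alpha(t) = 1 - t (the probability a token is still unmasked at time t), character-level tokenization, and a hypothetical `[MASK]` token. The names `alpha_linear` and `forward_mask` are illustrative only.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; the repository's vocabulary may differ


def alpha_linear(t: float) -> float:
    """Assumed linear schedule: probability that a token is still unmasked at time t."""
    return 1.0 - t


def forward_mask(tokens, t, rng):
    """Sample z_t ~ q(z_t | x): keep each token independently with prob alpha(t), else mask it."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    keep_prob = alpha_linear(t)
    return [tok if rng.random() < keep_prob else MASK for tok in tokens]


# Character-level tokens for ethanol ("CCO"); a real SMILES tokenizer would also
# handle multi-character atoms such as "Cl" and "Br" (assumption for brevity).
tokens = list("CCO")
rng = random.Random(0)
print(forward_mask(tokens, 0.0, rng))  # t = 0: nothing is masked
print(forward_mask(tokens, 1.0, rng))  # t = 1: every token is masked
```

At intermediate t, roughly a fraction t of the tokens is masked. In the setup described above, the denoising network would additionally receive the MIST-predicted fingerprint as a conditioning input, which this sketch omits.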
Create a fresh environment with uv (GPU example shown):
uv sync
@inproceedings{shi2024simplified,
title={Simplified and Generalized Masked Diffusion for Discrete Data},
author={Shi, Jiaxin and Han, Kehang and Wang, Zhe and Doucet, Arnaud and Titsias, Michalis K.},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}
@article{goldman2023annotating,
title={Annotating metabolite mass spectra with domain-inspired chemical formula transformers},
author={Goldman, S. and Wohlwend, J. and Stražar, M. and others},
journal={Nature Machine Intelligence},
year={2023},
doi={10.1038/s42256-023-00708-3}
}