This repository contains Python scripts to construct the Open Macromolecular Genome (OMG) database from eMolecules and to train the generative models described in the paper. The OMG monomers and polymers are available on Zenodo.
Create the conda environment from the provided environment file:
conda env create -f environment.yml
To begin, you will need the version.smi file from eMolecules.
Download it here: https://marketing.emolecules.com/incremental-file-download
Place version.smi in the data directory.
|-data
  |-version.smi
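Before running the construction scripts, it can help to confirm that the file is where the scripts expect it. The following is a minimal sketch, not part of the repository: the helper name `preview_smi` is hypothetical, and it assumes that each non-empty line of version.smi is whitespace-separated with the SMILES string in the first column.

```python
from itertools import islice
from pathlib import Path


def preview_smi(path, n=5):
    """Return the first field of up to n non-empty lines of a .smi file.

    Assumption: fields are whitespace-separated and the first field is
    the SMILES string. A header line, if present, is returned as-is.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected the eMolecules file at {path}")
    with path.open() as handle:
        # Split each line into fields and skip lines that are blank.
        rows = (line.split() for line in handle)
        non_empty = (fields for fields in rows if fields)
        return [fields[0] for fields in islice(non_empty, n)]
```

Calling `preview_smi("data/version.smi")` from the repository root should return a short list of entries if the file is in place, and raise `FileNotFoundError` otherwise.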
Before running a script, update the file paths inside it so that they match your local directory layout.
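One way to keep path changes in a single place is to build every path from one base directory. This is only a sketch under stated assumptions: `resolve_data_path` and its `base_dir` parameter are hypothetical names that do not appear in the repository, and the only layout assumed is the data directory shown above.

```python
from pathlib import Path


def resolve_data_path(filename, base_dir=None):
    """Build an absolute path to a file inside the data directory.

    base_dir is the directory that contains data/ (hypothetical
    parameter); it defaults to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return (base / "data" / filename).resolve()
```

With a helper like this, a script would need only its base directory changed, for example `resolve_data_path("version.smi", base_dir="/path/to/repository")`, rather than each hard-coded path.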
This directory contains scripts to construct the OMG database from eMolecules.
This directory contains a script to calculate the SC score. The script was obtained from https://github.com/connorcoley/scscore.
This directory contains the OMG polymerization algorithms.
This directory contains modified SELFIES scripts that support the asterisk (*) symbol. The asterisk rules were added to the original work, https://github.com/aspuru-guzik-group/selfies.
This directory contains the Molecule Chef generative model. The scripts were written with reference to the original work (Bradshaw, J.; Paige, B.; Kusner, M. J.; Segler, M.; Hernández-Lobato, J. M. A Model to Search for Synthesizable Molecules. In Advances in Neural Information Processing Systems; Curran Associates, Inc., 2019; Vol. 32) and its accompanying scripts, https://github.com/john-bradshaw/molecule-chef.
This directory contains a variational autoencoder model. The scripts were written with reference to https://github.com/aspuru-guzik-group/selfies/blob/master/examples/vae_example/chemistry_vae.py.
This directory contains scripts to train Molecule Chef and SELFIES VAE. These scripts were written by referring to https://github.com/john-bradshaw/molecule-chef and https://github.com/aspuru-guzik-group/selfies/blob/master/examples/vae_example/chemistry_vae.py.
Seonghwan Kim, Charles M. Schroeder, and Nicholas E. Jackson
This work was supported by the IBM-Illinois Discovery Accelerator Institute. N.E.J. thanks the 3M Nontenured Faculty Award for support of this research. We thank Jed Pitera and Jeffrey Moore for critical readings of the manuscript and Prof. Tengfei Luo for assistance with the PI1M dataset.