Open Macromolecular Genome

This repository contains Python scripts to construct the Open Macromolecular Genome (OMG) database from eMolecules and to train the generative models described in the paper. The OMG monomers and polymers are available on Zenodo.

Setup

Set up Python environment with Anaconda

conda env create -f environment.yml

eMolecules database

To begin, you will need the version.smi file from eMolecules. Download it from https://marketing.emolecules.com/incremental-file-download

Place version.smi in the data directory.

|-data
    |-version.smi
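
Before running the scripts, it can help to confirm that the file is where they expect it. The following is a minimal sketch and not part of the repository: the helper name `preview_smi` is hypothetical, and it makes no assumption about the column layout of version.smi.

```python
from itertools import islice
from pathlib import Path


def preview_smi(path, n_lines=5):
    """Return the first n_lines of a .smi file as a list of strings.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected the eMolecules file at {path}")
    # Read lazily so that a large file is never loaded into memory at once.
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]
```

Calling `preview_smi("data/version.smi")` from the repository root (an assumed working directory) returns up to the first five lines, or raises `FileNotFoundError` if the file has not been placed in the data directory.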

Script components

Before running a script, edit the file paths written in it so that they match your own directory layout.
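
One way to make such paths less fragile is to build them relative to the script's own location instead of writing an absolute path. This is only a sketch under stated assumptions: the function name `resolve_data_file` is hypothetical, and it assumes the script sits one directory below the repository root, which may not hold for every script in the repository.

```python
from pathlib import Path


def resolve_data_file(script_path, filename="version.smi", data_dir_name="data"):
    """Build the path <repo_root>/<data_dir_name>/<filename>.

    Assumes, for illustration only, that the script lives one level below
    the repository root (for example <repo_root>/data/some_script.py).
    """
    # parent is the script's directory; parent.parent is the assumed repo root.
    repo_root = Path(script_path).resolve().parent.parent
    return repo_root / data_dir_name / filename
```

Inside a script, `resolve_data_file(__file__)` would then point at data/version.smi regardless of the directory the script is launched from.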

1. data

This directory contains scripts to construct the OMG database from eMolecules.

2. scscore

This directory contains a script to calculate the SCScore (synthetic complexity score). The script was obtained from https://github.com/connorcoley/scscore.

3. polymerization

This directory contains the OMG polymerization algorithms.

4. selfies

This directory contains SELFIES scripts modified to handle the asterisk (*). The asterisk rules were added to the original work, https://github.com/aspuru-guzik-group/selfies.

Authors

Seonghwan Kim, Charles M. Schroeder, and Nicholas E. Jackson

Funding Acknowledgements

This work was supported by the IBM-Illinois Discovery Accelerator Institute. N.E.J. thanks the 3M Nontenured Faculty Award for support of this research. We thank Jed Pitera and Jeffrey Moore for critical readings of the manuscript and Prof. Tengfei Luo for assistance with the PI1M dataset.
