SeemonJ/combinatorial-library-design-dpp

de novo generated combinatorial library design

This is an implementation of the framework described in this preprint.

The framework designs combinatorial libraries from de novo generated molecules, optimizing across the products formed by attaching building blocks to a scaffold.

Setup environment

You can use Conda to create an environment with all the necessary packages installed.

$ conda env create -f environment.yml
[...]
$ conda activate comb-lib-design

If you also want to create your own libraries from start to end, we recommend running

bash -i install.sh

Using the Framework

The framework currently works with the output formats of LibINVENT and AiZynthFinder. We provide scripts that read both outputs and process the results.

readLibInventOutput.py reads the logfile from LibINVENT and outputs:

1. Two dictionaries that translate between the synthons generated by LibINVENT and the building blocks for amide coupling (carboxylic acids) and the Buchwald-Hartwig reaction (aromatic halides). These are needed because LibINVENT generates only the decoration part.
2. Building block lists, filtered by reaction constraint and by a minimum QSAR score threshold that discards low-scoring building blocks. The lists are split into multiple .smi files according to `num_splits` for easier batch processing in AiZynthFinder.
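The filter-and-split step in item 2 can be illustrated with a minimal sketch. This is not the code in readLibInventOutput.py: the function names, the `(SMILES, score)` input layout, the round-robin split, and the output file prefix are all assumptions made for illustration. Only `num_splits` and the idea of a minimum score threshold come from the description above.

```python
import tempfile
from pathlib import Path


def filter_and_split(scored_blocks, min_score, num_splits):
    """Keep blocks scoring at least min_score and deal them into num_splits chunks.

    scored_blocks is a hypothetical list of (smiles, score) pairs.
    """
    kept = [smiles for smiles, score in scored_blocks if score >= min_score]
    # Round-robin split keeps the chunk sizes within one of each other.
    return [kept[i::num_splits] for i in range(num_splits)]


def write_smi_files(chunks, out_dir, prefix="building_blocks"):
    """Write one SMILES per line into <prefix>_<i>.smi and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        path = out_dir / f"{prefix}_{i}.smi"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


# Toy input: four carboxylic acids with made-up scores.
scored = [
    ("CC(=O)O", 0.9),
    ("OC(=O)c1ccccc1", 0.4),
    ("CCC(=O)O", 0.7),
    ("OC(=O)C1CC1", 0.8),
]
chunks = filter_and_split(scored, min_score=0.5, num_splits=2)

with tempfile.TemporaryDirectory() as tmp:
    paths = write_smi_files(chunks, tmp)
    names = [p.name for p in paths]
    written = [p.read_text().splitlines() for p in paths]
```

Each resulting .smi file could then be submitted to AiZynthFinder as a separate batch.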

queryBuildingBlocks.py is a verification script that uses SMARTS substructures to validate that the method used in readLibInventOutput.py indeed produces carboxylic acids and aromatic halides.

aggregateAiZynthFinderOutput.py reads the output file(s) generated by AiZynthFinder when analysing the building blocks. It requires the AiZynthFinder environment aizynth-env to be installed and activated instead of the comb-lib-design environment. An example output file is provided.

The script aggregates all input .hdf5 files and reads the leaf nodes to extract the possible products. For each building block it saves the route with the smallest depth x (0 < x < 10), or returns depth 11 if the building block is not solved. The output is saved in a .csv file of the format [SMILES, depth].
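The aggregation rule above (smallest depth per building block, 11 when unsolved) can be sketched as follows. The real script reads AiZynthFinder .hdf5 files; this sketch instead assumes the routes have already been flattened into hypothetical `(SMILES, depth, is_solved)` tuples, and the function names are illustrative only.

```python
import csv
import io

UNSOLVED_DEPTH = 11  # sentinel depth for building blocks without a solved route


def aggregate_depths(records):
    """Return {smiles: smallest solved depth}, or UNSOLVED_DEPTH if never solved.

    records is a hypothetical iterable of (smiles, depth, is_solved) tuples,
    one per route found for a building block.
    """
    best = {}
    for smiles, depth, is_solved in records:
        candidate = depth if is_solved else UNSOLVED_DEPTH
        best[smiles] = min(best.get(smiles, UNSOLVED_DEPTH), candidate)
    return best


def write_depth_csv(best, handle):
    """Write the aggregated depths as rows of [SMILES, depth]."""
    writer = csv.writer(handle)
    writer.writerow(["SMILES", "depth"])
    for smiles, depth in best.items():
        writer.writerow([smiles, depth])


# Toy input: two solved routes for one block, one unsolved route for another.
routes = [
    ("CC(=O)O", 3, True),
    ("CC(=O)O", 2, True),
    ("Brc1ccccc1", 5, False),
]
best = aggregate_depths(routes)

buffer = io.StringIO()
write_depth_csv(best, buffer)
csv_lines = buffer.getvalue().splitlines()
```

Using a sentinel larger than any valid depth means a later filter on maximum reaction depth drops unsolved building blocks without a separate flag.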

randomSampler.py reads input building block .csv files and generates random selections with a given number of samples, reaction depth and selection dimensions. TODO: Implement a parser rather than just a sample script for generating samples.
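A random baseline of this kind can be sketched in a few lines. This is an illustration under assumptions, not the contents of randomSampler.py: the function name, the `(SMILES, depth)` input layout, and the interpretation of reaction depth as an upper bound are all hypothetical.

```python
import random


def sample_selection(blocks_a, blocks_b, rows, columns, max_depth, rng):
    """Draw one random rows x columns selection of building blocks.

    blocks_a and blocks_b are hypothetical lists of (smiles, depth) pairs.
    Blocks deeper than max_depth are excluded before sampling.
    """
    eligible_a = [smiles for smiles, depth in blocks_a if depth <= max_depth]
    eligible_b = [smiles for smiles, depth in blocks_b if depth <= max_depth]
    # random.Random.sample draws without replacement and raises ValueError
    # if fewer eligible blocks exist than requested.
    return rng.sample(eligible_a, rows), rng.sample(eligible_b, columns)


# Toy input: depth 11 marks an unsolved building block.
acids = [("CC(=O)O", 1), ("CCC(=O)O", 2), ("OC(=O)C1CC1", 11), ("OC(=O)c1ccccc1", 1)]
halides = [("Brc1ccccc1", 1), ("Clc1ccncc1", 3), ("Ic1ccccc1", 11)]

selected_a, selected_b = sample_selection(
    acids, halides, rows=2, columns=2, max_depth=3, rng=random.Random(0)
)
repeat_a, repeat_b = sample_selection(
    acids, halides, rows=2, columns=2, max_depth=3, rng=random.Random(0)
)
```

Passing an explicit `random.Random` instance keeps repeated runs reproducible for a fixed seed.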

main.py is the runfile for the combinatorial library design.

To run the optimization on the provided data, `python main.py` is sufficient. To generate new data from scratch, follow the example in publication_example/README.md.

Arguments:
--block_file_a <Input building block file name for amide coupling>
--block_file_b <Input building block file name for the Buchwald-Hartwig reaction>
--scaffold <SMILES string of the scaffold to which building blocks are attached>
--rows <Selection size: number of building blocks A to select>
--columns <Selection size: number of building blocks B to select>
--steps <Tolerance: number of samples to draw without finding an improvement before termination>
--seed <RNG seed>
--index <Index of the run, intended for batch submissions. Acts as a coefficient to the RNG seed to create separate runs>
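For orientation, the argument list above maps onto an `argparse` parser roughly as sketched below. The flag names come from the list; the types, the default values, and the `effective_seed` helper are assumptions, and the actual defaults in main.py may differ. In particular, "coefficient to the RNG seed" is read here as plain multiplication.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags of main.py."""
    parser = argparse.ArgumentParser(description="Combinatorial library design")
    parser.add_argument("--block_file_a", type=str, default=None,
                        help="building block file for amide coupling")
    parser.add_argument("--block_file_b", type=str, default=None,
                        help="building block file for Buchwald-Hartwig")
    parser.add_argument("--scaffold", type=str, default=None,
                        help="SMILES string of the scaffold")
    # The numeric defaults below are placeholders, not the repository's values.
    parser.add_argument("--rows", type=int, default=5)
    parser.add_argument("--columns", type=int, default=5)
    parser.add_argument("--steps", type=int, default=20)
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--index", type=int, default=1)
    return parser


def effective_seed(seed, index):
    """Assumed reading of the --index description: index scales the seed."""
    return seed * index


args = build_parser().parse_args(
    ["--rows", "8", "--columns", "12", "--seed", "42", "--index", "3"]
)
run_seed = effective_seed(args.seed, args.index)
```

Under this reading, batch jobs that share `--seed` but differ in `--index` get distinct RNG streams, provided neither value is zero.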

Outputs:
Two output lists, `selected_rows.pkl` and `selected_columns.pkl`, which store the SMILES of the selected building blocks.

displayLibrary.py is an example of parsing the output from main.py. It takes the output .pkl files and displays the first 5x5 products in a .png figure.
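Reading the two output files back can be sketched as follows. The file names come from the Outputs section; the helper names are hypothetical, and the sketch assumes each pickle holds a plain Python list of SMILES strings. It only pairs the building blocks and does not enumerate products or draw a figure as displayLibrary.py does.

```python
import pickle
import tempfile
from pathlib import Path


def load_selection(directory):
    """Load selected_rows.pkl and selected_columns.pkl from a directory."""
    directory = Path(directory)
    with open(directory / "selected_rows.pkl", "rb") as handle:
        rows = pickle.load(handle)
    with open(directory / "selected_columns.pkl", "rb") as handle:
        columns = pickle.load(handle)
    return rows, columns


def first_grid(rows, columns, size=5):
    """Pair the first `size` row blocks with the first `size` column blocks."""
    return [[(a, b) for b in columns[:size]] for a in rows[:size]]


# Demo with dummy placeholder strings written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    dummy_rows = [f"row_block_{i}" for i in range(8)]
    dummy_columns = [f"column_block_{i}" for i in range(6)]
    with open(Path(tmp) / "selected_rows.pkl", "wb") as handle:
        pickle.dump(dummy_rows, handle)
    with open(Path(tmp) / "selected_columns.pkl", "wb") as handle:
        pickle.dump(dummy_columns, handle)

    rows, columns = load_selection(tmp)
    grid = first_grid(rows, columns)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.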

Workflow Example

A small example is provided in small_sample_run.sh. It runs LibINVENT for 10 epochs, extracts the results, puts the output through AiZynthFinder for 10 seconds per building block, and finally runs a short optimization with a stopping criterion of 20 steps without improvement.

It can be run using

bash -i small_sample_run.sh

Note: This assumes that you have set up Lib-INVENT and AiZynthFinder using install.sh, so that the file paths work.
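The stopping criterion used in this example (terminate after a fixed number of steps without improvement, cf. `--steps`) follows a generic patience pattern, sketched below on a toy problem. The search procedure in main.py is not described in this README, so the function name, the `propose` and `score` callables, and the toy objective are all assumptions for illustration.

```python
import random


def optimize_with_patience(propose, score, initial, steps, rng):
    """Keep the best candidate; stop after `steps` proposals in a row fail to improve."""
    best, best_score = initial, score(initial)
    since_improvement = 0
    while since_improvement < steps:
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
            since_improvement = 0  # reset the patience counter on improvement
        else:
            since_improvement += 1
    return best, best_score


# Toy problem: move an integer toward 3 by random unit steps.
def toy_score(x):
    return -(x - 3) ** 2


def toy_propose(x, rng):
    return x + rng.choice([-1, 1])


best, best_score = optimize_with_patience(
    toy_propose, toy_score, initial=0, steps=20, rng=random.Random(0)
)
```

A larger tolerance makes early termination on an unlucky streak less likely, at the cost of more evaluations after the last improvement.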

About

Code related to the publication de novo generated combinatorial library design
