Drug discovery relies on lead optimization. In this process, chemists select a candidate (“lead”) molecule with known potential to interact with a specific biological target, then tweak its chemical properties to improve potency and other factors. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Department of Electrical Engineering and Computer Science (EECS) have developed a model that better selects lead molecule candidates based on desired properties. The model also modifies a molecule’s structure as needed to achieve higher potency, while ensuring that the molecule remains chemically valid.

The model takes molecular structure data as input and directly creates molecular graphs: detailed representations of a molecule’s structure, with nodes representing atoms and edges representing bonds. It breaks those graphs down into smaller clusters of valid functional groups, which it uses as “building blocks” to reconstruct and modify molecules more accurately. “The motivation behind this was to replace the inefficient human modification process of designing molecules with automated iteration and assure the validity of the molecules we generate,” says Wengong Jin, a PhD student in CSAIL and lead author of a paper describing the model.
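To make the idea of graph “building blocks” concrete, here is a minimal sketch in Python; it is not the researchers’ code. It assumes a molecule is given as a number of atoms plus a list of bonds (pairs of atom indices), and it uses a simplified decomposition rule that is an assumption here: every bond outside a ring becomes its own two-atom cluster, and each connected ring system (fused rings grouped together) becomes one cluster. The function names `find_bridges` and `decompose` are hypothetical.

```python
from collections import defaultdict


def find_bridges(n_atoms, bonds):
    """Return bonds (as sorted atom-index pairs) that do not lie on any ring."""
    adj = defaultdict(list)
    for bond_id, (a, b) in enumerate(bonds):
        adj[a].append((b, bond_id))
        adj[b].append((a, bond_id))

    disc = [-1] * n_atoms  # DFS discovery time of each atom (-1 = unvisited)
    low = [0] * n_atoms    # earliest discovery time reachable from the subtree
    bridges = set()
    timer = 0

    def dfs(u, parent_bond):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, bond_id in adj[u]:
            if bond_id == parent_bond:
                continue
            if disc[v] == -1:
                dfs(v, bond_id)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # no cycle passes through this bond
                    bridges.add(tuple(sorted(bonds[bond_id])))
            else:
                low[u] = min(low[u], disc[v])

    for atom in range(n_atoms):
        if disc[atom] == -1:
            dfs(atom, -1)
    return bridges


def decompose(n_atoms, bonds):
    """Split a molecular graph into ring-system clusters and non-ring bond clusters."""
    bridges = find_bridges(n_atoms, bonds)
    clusters = [set(pair) for pair in bridges]  # one cluster per non-ring bond

    # Union-find over ring bonds, so fused rings end up in the same cluster.
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ring_atoms = set()
    for a, b in bonds:
        if tuple(sorted((a, b))) not in bridges:
            parent[find(a)] = find(b)
            ring_atoms.update((a, b))

    ring_systems = defaultdict(set)
    for atom in ring_atoms:
        ring_systems[find(atom)].add(atom)
    clusters.extend(ring_systems.values())

    # Atoms with no bonds at all become single-atom clusters.
    bonded = {atom for bond in bonds for atom in bond}
    clusters.extend({atom} for atom in range(n_atoms) if atom not in bonded)
    return clusters


if __name__ == "__main__":
    # Toluene heavy atoms: ring carbons 0-5, methyl carbon 6 attached to atom 0.
    toluene_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
    print(sorted(sorted(c) for c in decompose(7, toluene_bonds)))
```

For toluene’s seven carbon atoms, this yields two clusters: the ring {0, 1, 2, 3, 4, 5} and the ring-to-methyl bond {0, 6}. A real system would also track atom types and enforce valence rules when reassembling clusters, which this sketch omits.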

“Today, it’s really a craft, which requires a lot of skilled chemists to succeed, and that’s what we want to improve,” says Regina Barzilay, the Delta Electronics Professor at CSAIL and EECS. “The next step is to take this technology from academia to use on real pharmaceutical design cases, and demonstrate that it can assist human chemists in doing their work, which can be challenging.”

“Automating the process also presents new machine learning challenges,” says Tommi S. Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science in CSAIL and EECS. “Learning to relate, modify, and generate molecular graphs drives new technical ideas and methods.”

Valid and more potent

The researchers trained their model on 250,000 molecular graphs from the ZINC database, a collection of 3-D molecular structures available for public use. They tested the model on tasks to generate valid molecules, find the best lead molecules, and design novel molecules with increased potency.

In the first test, 100 percent of the molecules the researchers’ model generated from a sample distribution were chemically valid, compared with 43 percent for SMILES-based models, which represent molecules as text strings, sampling from the same distribution. The second test involved two tasks. First, the model searched the entire collection of molecules to find the best lead molecule for the desired properties: solubility and synthetic accessibility. In that task, the model found a lead molecule with 30 percent higher potency than the best one found by traditional systems. The second task involved modifying 800 molecules to increase their potency while keeping each one structurally similar to its starting lead molecule. The new molecules the model created closely resembled the leads’ structures and improved potency by more than 80 percent on average.
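The constrained-modification task can be illustrated with a small selection routine. This is a hedged sketch, not the paper’s procedure: it assumes each molecule is summarized as a set of fingerprint bit indices, measures structural similarity with the Tanimoto coefficient (shared bits divided by total distinct bits), and keeps only candidates whose similarity to the lead meets a threshold before picking the largest gain in a predicted property score. The function names, the 0.4 default threshold, and the example scores are illustrative assumptions.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / len(union)


def best_similar_candidate(lead_fp, lead_score, candidates, threshold=0.4):
    """Pick the candidate with the largest score gain among those similar enough to the lead.

    candidates: iterable of (name, fingerprint, predicted_score) tuples.
    Returns (name, gain), or None if no similar candidate improves on the lead.
    """
    best = None
    for name, fp, score in candidates:
        if tanimoto(lead_fp, fp) < threshold:
            continue  # too structurally different from the lead
        gain = score - lead_score
        if gain > 0 and (best is None or gain > best[1]):
            best = (name, gain)
    return best


if __name__ == "__main__":
    lead = {1, 2, 3, 4}
    candidates = [
        ("a", {1, 2, 3, 5}, 2.0),     # similarity 0.6, gain 1.0
        ("b", {7, 8, 9}, 5.0),        # similarity 0.0, filtered out
        ("c", {1, 2, 3, 4, 6}, 1.5),  # similarity 0.8, gain 0.5
    ]
    print(best_similar_candidate(lead, 1.0, candidates))
```

Candidate “b” has the highest score but is dropped because it shares no bits with the lead. An improvement only counts if the modified molecule still resembles the molecule it started from.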

The researchers next aim to test the model on properties beyond solubility that are more therapeutically relevant. That, however, requires more data. “Pharmaceutical companies are more interested in properties that fight against biological targets, but they have less data on those. A challenge is developing a model that can work with a limited amount of training data,” Jin says.

“The algorithms described in the paper are an important step toward the goal of beginning to mimic the lead optimization molecular design work a medicinal chemist currently performs,” says Angel Guzman-Perez, director of medicinal chemistry at the pharmaceutical company Amgen, who was not involved in the research. “Since this computational method optimizes the properties in vector space, it has the potential to generate completely different and novel chemical structures than a medicinal chemist would propose by thinking in chemical structure space. Hence, algorithms such as this one have the potential to complement and enhance the work done by medicinal chemists.”
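Optimizing properties “in vector space” generally means encoding molecules as points in a continuous latent space and moving those points toward higher predicted property values. The sketch below shows only that idea, using plain gradient ascent on a toy quadratic function that stands in for a learned property predictor. The predictor, its gradient, the step size, and the step count are all assumptions for illustration, and the decoder that would turn the optimized vector back into a molecular graph is not shown.

```python
def predicted_property(z, target):
    """Hypothetical stand-in for a learned property predictor; highest at `target`."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def property_gradient(z, target):
    """Analytic gradient of the toy predictor with respect to the latent vector z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]


def optimize_latent(z0, target, step_size=0.1, n_steps=50):
    """Gradient ascent in latent space: repeatedly step uphill on the predicted property."""
    z = list(z0)
    for _ in range(n_steps):
        grad = property_gradient(z, target)
        z = [zi + step_size * gi for zi, gi in zip(z, grad)]
    return z


if __name__ == "__main__":
    start = [0.0, 0.0]    # latent vector of a hypothetical encoded lead molecule
    optimum = [1.0, -1.0]  # where the toy predictor peaks
    end = optimize_latent(start, optimum)
    print(predicted_property(start, optimum), predicted_property(end, optimum))
```

Because the update operates on numbers rather than on atoms and bonds, a point reached this way can decode to a structure that a chemist reasoning in chemical structure space might not have proposed.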

Source: Massachusetts Institute of Technology