MOLLEO
Efficient Evolutionary Search Over Chemical Space with Large Language Models

Haorui Wang*,1, Marta Skreta*,2,3, Cher-Tian Ser2, Wenhao Gao4, Lingkai Kong1, Felix Streith-Kalthoff5, Chenru Duan6, Yuchen Zhuang1, Yue Yu1, Yanqiao Zhu7, Yuanqi Du†,8, Alán Aspuru-Guzik†,2,3, Kirill Neklyudov†,9,10, Chao Zhang†,1
1Georgia Institute of Technology, 2University of Toronto, 3Vector Institute, 4Massachusetts Institute of Technology, 5University of Wuppertal, 6Deep Principle Inc., 7University of California, Los Angeles, 8Cornell University, 9Université de Montréal, 10Mila - Quebec AI Institute
*Indicates Equal Contribution

†Indicates Equal Senior-Authorship

MOLLEO uses chemistry-aware LLMs inside mutation and crossover operations to propose new molecules during the evolutionary search process.

Abstract

Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations. In this work, we ameliorate this shortcoming by incorporating chemistry-aware Large Language Models (LLMs) into EAs. Namely, we redesign crossover and mutation operations in EAs using LLMs trained on large corpora of chemical information. We perform extensive empirical studies on both commercial and open-source models on multiple tasks involving property optimization, molecular rediscovery, and structure-based drug design, demonstrating that the joint usage of LLMs with EAs yields superior performance over all baseline models across single- and multi-objective settings. We demonstrate that our algorithm improves both the quality of the final solution and convergence speed, thereby reducing the number of required objective evaluations.

Introduction

Molecular discovery is a complex and iterative process involving the design, synthesis, evaluation, and refinement of molecule candidates. One significant challenge is that evaluating molecular properties often requires expensive evaluations (oracles).

Evolutionary Algorithms (EAs) are often used to generate molecular candidates since they do not require the evaluation of gradients and are thus well-suited for black-box objectives in molecular discovery. However, they randomly generate proposals and require many evaluations of the objective function. Hence, incorporating task-specific information into the proposal generation can reduce the number of evaluations needed, enhancing their practical application.
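To make the cost concrete, here is a minimal, illustrative sketch of a standard EA loop (not the paper's implementation); the toy bit-string genome and objective are stand-ins for molecules and an expensive oracle. Note that every offspring costs one oracle call, which is the bottleneck MOLLEO targets.

```python
import random

def evolutionary_search(population, fitness, mutate, crossover,
                        n_generations=10, offspring_per_gen=20):
    # Score the initial population; every call to `fitness` is one
    # (potentially expensive) oracle evaluation.
    scored = [(fitness(x), x) for x in population]
    for _ in range(n_generations):
        for _ in range(offspring_per_gen):
            p1, p2 = random.sample([x for _, x in scored], 2)
            child = mutate(crossover(p1, p2))       # random, task-agnostic operators
            scored.append((fitness(child), child))  # one oracle call per child
        # Truncation selection: keep only the fittest individuals.
        scored = sorted(scored, key=lambda t: t[0], reverse=True)[:len(population)]
    return max(scored, key=lambda t: t[0])

# Toy demo: bit-strings stand in for molecular genomes.
random.seed(0)
fit = lambda s: sum(c == "1" for c in s)  # objective: count of 1s
mut = lambda s: "".join(random.choice("01") if random.random() < 0.2 else c for c in s)
cx = lambda a, b: a[: len(a) // 2] + b[len(b) // 2 :]  # one-point crossover
pop = ["".join(random.choice("01") for _ in range(6)) for _ in range(8)]
best_fit, best = evolutionary_search(pop, fit, mut, cx)
```

Because `mutate` and `crossover` here are blind to the objective, many oracle calls are spent on uninformative proposals; this is the slot where task-aware operators can help.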

In this work, we propose MOLLEO, which incorporates LLMs into EAs to enhance the quality of generated proposals and accelerate the optimization process. MOLLEO leverages LLMs as genetic operators to produce new proposals through crossover or mutation.
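The idea of using an LLM as a genetic operator can be sketched as follows. This is a simplified illustration, not the paper's code: `query_llm` is a hypothetical callable standing in for any chat-completion API wrapper, and the prompt wording is ours.

```python
def llm_mutation(parent_smiles, objective, query_llm):
    """LLM-guided mutation: ask a chemistry-aware model for an improved
    variant of one parent. `query_llm` is any callable mapping a prompt
    string to the model's text reply (hypothetical stand-in here)."""
    prompt = (
        f"Here is a molecule: {parent_smiles}. Propose a modified molecule "
        f"that better satisfies this objective: {objective}. "
        "Reply with a single SMILES string."
    )
    return query_llm(prompt).strip()

def llm_crossover(parent_a, parent_b, objective, query_llm):
    """LLM-guided crossover: ask the model to combine structural
    features of two parents into a new candidate."""
    prompt = (
        f"Here are two molecules: {parent_a} and {parent_b}. Combine "
        f"their structural features into a new molecule that better "
        f"satisfies this objective: {objective}. "
        "Reply with a single SMILES string."
    )
    return query_llm(prompt).strip()

# With a stubbed model, the operators reduce to prompt-and-parse:
stub = lambda prompt: "CCN"  # pretend the LLM proposed ethylamine
child = llm_mutation("CCO", "maximize water solubility", stub)
```

Unlike the random operators in a standard EA, these prompts carry the objective itself, so each proposal is biased toward task-relevant regions of chemical space. In practice the reply would also be validated (e.g. parsed and sanitized as SMILES) before being scored by the oracle.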


Experiments


We evaluate MOLLEO on 18 tasks in total, including 12 single-objective optimization tasks from the practical molecular optimization (PMO) benchmark, as well as additional structure-based drug design and multi-objective optimization tasks.

BibTeX

@misc{wang2024efficientevolutionarysearchchemical,
      title={Efficient Evolutionary Search Over Chemical Space with Large Language Models}, 
      author={Haorui Wang and Marta Skreta and Cher-Tian Ser and Wenhao Gao and Lingkai Kong and Felix Streith-Kalthoff and Chenru Duan and Yuchen Zhuang and Yue Yu and Yanqiao Zhu and Yuanqi Du and Alán Aspuru-Guzik and Kirill Neklyudov and Chao Zhang},
      year={2024},
      eprint={2406.16976},
      archivePrefix={arXiv},
      primaryClass={cs.NE},
      url={https://arxiv.org/abs/2406.16976}, 
}