Hybridization of SMILES and chemical-environment-aware tokens to improve performance of molecular structure generation

Citations

Web of Science: 3
Scopus: 3

Abstract

The Simplified Molecular Input Line Entry System (SMILES) is one of the most widely adopted molecular representations. However, SMILES offers limited token diversity, and its individual tokens carry little chemical information. To address these limitations while preserving the simplicity of SMILES, we propose a molecular representation that hybridizes standard SMILES tokens with Atom-In-SMILES (AIS) tokens, each of which encodes local chemical environment information in a single token. This hybrid representation, termed SMI + AIS, lets AIS tokens differentiate chemical elements by their chemical context without introducing additional tokens for less frequent elements. We evaluated SMI + AIS on chemical structure generation based on latent space optimization, comparing the predefined metric of the generated structures. Compared to standard SMILES, SMI + AIS achieved a 7% improvement in binding affinity and a 6% increase in synthesizability, highlighting its utility for machine learning-based molecular design. Our results demonstrate that the SMI + AIS representation encapsulates chemical context more effectively and informatively, and that it has potential to improve performance in other machine learning tasks in chemistry.
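
The hybridization idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`tokenize_smiles`, `frequent_ais_vocab`, `hybridize`) are hypothetical, the AIS-style labels in the usage note are illustrative strings supplied by hand rather than computed from a molecule, and selecting AIS tokens by corpus frequency is an assumption made here to show how less frequent environments could fall back to plain SMILES tokens.

```python
import re
from collections import Counter

# Commonly used regular expression for splitting a SMILES string into tokens.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def frequent_ais_vocab(corpus_ais, top_k):
    """Return the top_k most frequent environment-aware tokens in a corpus.

    corpus_ais is a list of per-molecule label lists; None marks positions
    (bonds, ring closures, branches) that have no environment-aware label.
    Frequency-based selection is an assumption of this sketch.
    """
    counts = Counter(tok for mol in corpus_ais for tok in mol if tok is not None)
    return {tok for tok, _ in counts.most_common(top_k)}


def hybridize(smiles_tokens, ais_tokens, allowed_ais):
    """Use the environment-aware token where it is in the allowed vocabulary,
    otherwise keep the plain SMILES token at that position."""
    if len(smiles_tokens) != len(ais_tokens):
        raise ValueError("token sequences must be aligned position by position")
    return [
        ais if ais in allowed_ais else smi
        for smi, ais in zip(smiles_tokens, ais_tokens)
    ]
```

For example, with ethanol (`CCO`) and the hand-written illustrative labels `["[CH3;!R;C]", "[CH2;!R;CO]", "[OH;!R;C]"]`, allowing only the first and last label yields a mixed sequence in which the middle carbon stays a plain `C` token. A real pipeline would derive the labels from the molecular graph with a cheminformatics toolkit.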

Keywords

SMILES; Molecular representation; Generative modeling; Small-molecule drug; Latent space optimization; Drug discovery; PYRUVATE-DEHYDROGENASE KINASE; DATABASE; UNIVERSE; REPRESENTATION; DISCOVERY; INHIBITOR; DESIGN
Title
Hybridization of SMILES and chemical-environment-aware tokens to improve performance of molecular structure generation
Authors
Han, Herim; Yeom, Min Sun; Choi, Sunghwan
DOI
10.1038/s41598-025-01890-7
Publication date
2025-05-15
Type
Article
Journal
Scientific Reports, vol. 15, no. 1