Hybridization of SMILES and chemical-environment-aware tokens to improve performance of molecular structure generation
- Han, Herim;
- Yeom, Min Sun;
- Choi, Sunghwan
Abstract
The Simplified Molecular Input Line Entry System (SMILES) is one of the most widely adopted molecular representations. However, SMILES notation suffers from limited token diversity, and individual tokens carry little chemical information. To address these limitations while preserving its simplicity, we propose a molecular representation that hybridizes standard SMILES tokens with Atom-In-SMILES (AIS) tokens, which encode an atom's local chemical environment in a single token. In this hybrid representation, termed SMI + AIS, AIS tokens differentiate chemical elements according to their chemical context without introducing additional tokens for less frequent elements. We evaluated SMI + AIS on chemical structure generation based on latent space optimization, comparing predefined metrics of the generated structures. Compared with standard SMILES, SMI + AIS achieved a 7% improvement in binding affinity and a 6% increase in synthesizability, highlighting its utility for machine learning-based molecular design. Our results demonstrate that the SMI + AIS representation captures chemical context more effectively and informatively, and suggest that it could also improve performance in other machine learning tasks in chemistry.
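The abstract describes the hybridization only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Atom tokens of a chosen set of elements (the hypothetical `ais_elements` parameter) are replaced by a simplified AIS-style token of the form `[symbol;ring flag;neighbors]`, while every other token stays a standard SMILES token. The per-atom environment (the hypothetical `AtomEnv` class) is supplied by hand instead of being computed with a cheminformatics toolkit; hydrogen counts and the exact token syntax of the original Atom-In-SMILES work are omitted; and the rule for choosing which elements receive AIS tokens is an assumption.

```python
import re
from dataclasses import dataclass

# Common SMILES tokenization regex: bracket atoms, two-letter organic-subset
# halogens, ring closures (%nn and single digits), atoms, bonds and branches.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[()=#\-+\\/:.~$]|\d)"
)
ORGANIC_ATOMS = {"B", "C", "N", "O", "P", "S", "F", "I", "Cl", "Br",
                 "b", "c", "n", "o", "s", "p"}


@dataclass
class AtomEnv:
    """Hypothetical per-atom context, listed in the order atoms appear in the SMILES."""
    symbol: str            # element symbol, e.g. "C" or "Cl"
    in_ring: bool          # whether the atom is a ring member
    neighbors: list[str]   # element symbols of bonded heavy atoms


def tokenize(smiles: str) -> list[str]:
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize {smiles!r}")
    return tokens


def ais_like_token(env: AtomEnv) -> str:
    # Simplified AIS-style token (assumed format): [central atom;ring flag;sorted neighbors]
    ring = "R" if env.in_ring else "!R"
    return f"[{env.symbol};{ring};{''.join(sorted(env.neighbors))}]"


def hybridize(smiles: str, atoms: list[AtomEnv], ais_elements: set[str]) -> list[str]:
    """Replace atom tokens of selected elements with AIS-like tokens; keep all others."""
    out, atom_idx = [], 0
    for tok in tokenize(smiles):
        if tok.startswith("[") or tok in ORGANIC_ATOMS:
            if atom_idx >= len(atoms):
                raise ValueError("more atom tokens than atom environments")
            env = atoms[atom_idx]
            atom_idx += 1
            out.append(ais_like_token(env) if env.symbol in ais_elements else tok)
        else:
            out.append(tok)  # bonds, branches and ring closures stay SMILES tokens
    if atom_idx != len(atoms):
        raise ValueError("atom environments do not match atom tokens")
    return out


# Hand-written environments for acetyl chloride, CC(=O)Cl (illustrative only).
ACETYL_CHLORIDE = [
    AtomEnv("C", False, ["C"]),
    AtomEnv("C", False, ["C", "O", "Cl"]),
    AtomEnv("O", False, ["C"]),
    AtomEnv("Cl", False, ["C"]),
]

if __name__ == "__main__":
    print(hybridize("CC(=O)Cl", ACETYL_CHLORIDE, ais_elements={"C", "O"}))
    # ['[C;!R;C]', '[C;!R;CClO]', '(', '=', '[O;!R;C]', ')', 'Cl']
```

In this example the two carbons receive different tokens because their neighborhoods differ, whereas chlorine, treated here as a less frequent element, keeps its plain SMILES token and adds nothing new to the vocabulary.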
- Title: Hybridization of SMILES and chemical-environment-aware tokens to improve performance of molecular structure generation
- Authors: Han, Herim; Yeom, Min Sun; Choi, Sunghwan
- Publication date: 2025-05-15
- Type: Article
- Volume: 15
- Issue: 1