DFT-Machine Learning Approach for Accurate Prediction of pKa

  • Lawler, Robin
  • Liu, Yao-Hao
  • Majaya, Nessa
  • Allam, Omar
  • Ju, Hyunchul
  • and 2 others
Citations

Web of Science: 22
Scopus: 20
Abstract

In this study, we propose a novel method for predicting pKa across a diverse set of acids by combining density functional theory (DFT) with machine learning (ML). First, DFT at the B3LYP/6-31++G**/SM8 level is used to predict pKa, yielding a mean absolute error (MAE) of 1.85 pKa units. The DFT-predicted pKa values are then used as one of 10 molecular descriptors for ML models trained on experimental data. Kernel Ridge Regression (KRR), Gaussian Process Regression, and Artificial Neural Network models are optimized using three pipelines: Pipeline 1 involves only hyperparameter optimization (HPO); Pipeline 2 involves HPO followed by a relative contribution analysis (RCA) and recursive feature elimination (RFE); and Pipeline 3 involves HPO followed by RCA and RFE on an expanded set of composite features. KRR with Pipeline 3 yields the best pKa prediction, with an MAE of 0.60 pKa units. This model was then used to predict the pKa of 37 novel acids. The two most important features were found to be the number of hydrogen atoms in the molecule and the degree of oxidation of the acid. The predicted pKa values are documented for future reference.
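The first pipeline described in the abstract (KRR with hyperparameter optimization, using the DFT-predicted pKa as one of 10 descriptors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the RBF kernel, the grid values, and the helper names (`rbf_kernel`, `krr_fit`, `krr_predict`, `cv_mae`) are hypothetical, and the RCA and RFE steps of Pipelines 2 and 3 are not shown. It is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def krr_fit(X, y, alpha, gamma):
    # Closed-form kernel ridge solution: (K + alpha * I) coef = y.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + alpha * np.eye(len(X)), y)

def krr_predict(X_train, coef, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ coef

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def cv_mae(X, y, alpha, gamma, n_folds=5):
    # Plain k-fold cross-validation used to score one hyperparameter pair.
    folds = np.array_split(np.arange(len(X)), n_folds)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        y_mean = y[train].mean()  # KRR has no intercept, so center y
        coef = krr_fit(X[train], y[train] - y_mean, alpha, gamma)
        pred = krr_predict(X[train], coef, X[held_out], gamma) + y_mean
        scores.append(mae(y[held_out], pred))
    return float(np.mean(scores))

# Synthetic stand-in data (NOT from the paper): 10 descriptors per acid.
rng = np.random.default_rng(0)
n_samples, n_features = 120, 10
X = rng.normal(size=(n_samples, n_features))
true_pka = 4.0 + 2.5 * X[:, 1] - 1.5 * X[:, 2]  # hypothetical ground truth
# Column 0 plays the role of the DFT-predicted pKa: biased and noisy.
X[:, 0] = true_pka + 1.5 + rng.normal(scale=1.5, size=n_samples)
y = true_pka + rng.normal(scale=0.2, size=n_samples)  # "experimental" pKa

X_train, X_test = X[:90], X[90:]
y_train, y_test = y[:90], y[90:]
mu, sigma = X_train.mean(0), X_train.std(0)
Z_train, Z_test = (X_train - mu) / sigma, (X_test - mu) / sigma

# Hyperparameter optimization: exhaustive search over a small, assumed grid.
grid = [(a, g) for a in (0.01, 0.1, 1.0) for g in (0.001, 0.01, 0.1)]
best_alpha, best_gamma = min(grid, key=lambda p: cv_mae(Z_train, y_train, *p))

y_mean = y_train.mean()
coef = krr_fit(Z_train, y_train - y_mean, best_alpha, best_gamma)
krr_mae = mae(y_test, krr_predict(Z_train, coef, Z_test, best_gamma) + y_mean)
dft_mae = mae(y_test, X_test[:, 0])  # error of the raw DFT-like column
print(f"raw DFT-like MAE: {dft_mae:.2f}  KRR MAE: {krr_mae:.2f}")
```

The comparison printed at the end mirrors the abstract's framing: the error of the raw DFT-like estimate versus the error after an ML model has learned to correct it from the other descriptors. The numbers it prints describe only the synthetic data.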

Keywords

  • ACID DISSOCIATION-CONSTANTS
  • DENSITY-FUNCTIONAL METHODS
  • SOLVATION FREE-ENERGIES
  • COMPLETE BASIS-SET
  • PHOSPHONIC ACID
  • PROTON CONDUCTIVITY
  • PROTOGENIC GROUP
  • NEURAL-NETWORKS
  • SULFONIC-ACID
  • ABSOLUTE

Title
DFT-Machine Learning Approach for Accurate Prediction of pKa
Authors
  • Lawler, Robin
  • Liu, Yao-Hao
  • Majaya, Nessa
  • Allam, Omar
  • Ju, Hyunchul
  • Kim, Jin Young
  • Jang, Seung Soon
DOI
10.1021/acs.jpca.1c05031
Publication date
2021-10-07
Type
Article
Journal
Journal of Physical Chemistry A
Volume
125
Issue
39
Pages
8712-8722