Genome-wide enzyme function prediction system using K-NN classification on sequence similarity

Abstract

We developed a genome-wide enzyme function prediction system that uses k-nearest neighbors (k-NN) classification based on sequence similarity. For training, we extracted 477,596 enzymes annotated with EC numbers as positive data and selected 118,560 non-enzyme proteins as negative data from release 43 of the KEGG protein database. We use k-NN classification because lazy learning schemes are generally recognized as good classifiers even though their learning task is simple. In our experiments, the system achieved a maximum accuracy of 97.82%. The system can therefore help annotate novel enzymes, either individually or genome-wide, with various parameters.
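
The following is a minimal sketch of k-NN classification over sequence similarity, intended only to illustrate the idea described in the abstract. It is not the system's actual implementation: the similarity measure (Jaccard overlap of 3-mers), the function names, the toy sequences, and the labels are all assumptions made for illustration, and the abstract does not state which similarity score the system uses.

```python
from collections import Counter


def kmer_set(seq, size=3):
    """Return the set of overlapping substrings of length `size` in `seq`."""
    return {seq[i:i + size] for i in range(len(seq) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard overlap of k-mer sets: an assumed stand-in for a real
    sequence similarity score such as an alignment score."""
    ka, kb = kmer_set(a, size), kmer_set(b, size)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def knn_predict(query, labeled, k=3):
    """Label `query` by majority vote among its k most similar sequences.

    `labeled` is a list of (sequence, label) pairs. There is no training
    step: all work happens at query time, which is what makes k-NN a
    lazy learning scheme.
    """
    ranked = sorted(labeled,
                    key=lambda item: similarity(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): made-up sequences, illustrative labels only.
TRAINING = [
    ("MKTAYIAKQRQISFVKSHFSRQ", "EC 1.1.1.1"),
    ("MKTAYIAKQRQISFVKAHFSRQ", "EC 1.1.1.1"),
    ("MKTAYLAKQRQISFVKSHFSRQ", "EC 1.1.1.1"),
    ("GSSGSSGLLPPEEDDWWNNCCHH", "non-enzyme"),
    ("GSSGSSGLLPPEEDDWWNNCCHY", "non-enzyme"),
]

if __name__ == "__main__":
    print(knn_predict("MKTAYIAKQRQISFVKSHFSKQ", TRAINING, k=3))
```

In this sketch the negative class is simply another label in the vote, mirroring the abstract's use of non-enzyme proteins as negative data. The values of `k` and the k-mer size are the kind of parameters a user might vary.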

Title
Genome-wide enzyme function prediction system using K-NN classification on sequence similarity
Author
YOO SUNG KIM
Conference
1st International Conference on Emerging Databases
Venue
BUSAN BEXCO
Conference dates
2009-08-27 ~ 2009-08-28