Presentation Title

Protein Feature Selection using Hierarchal clustering and Support Vector Machine

Start Date

November 2016

End Date

November 2016

Location

HUB 302-95

Type of Presentation

Poster

Abstract

Proteins play critical roles in the human body. They fold into compact shapes called native structures, which determine how they function. Determining naturally occurring protein structures experimentally is very complex, so computational approaches have become necessary. These approaches generate a large set of candidate models from a sequence of amino acids, and selecting the best models is essential to the success of a method. Like all organisms, proteins can be described by a set of features. One important observation is that most machine learning methods discussed in the literature give similar results. This has led researchers to believe that the features, rather than the method itself, are what can produce better protein scoring and quality assessment functions. Including some features can add noise to a protein model. In this research, hierarchical clustering and the removal of one feature at a time were the two feature selection methods used to identify the best subset of features. New protein targets were then predicted using a machine learning algorithm, the Support Vector Machine. These predictions took the form of a quality measure that was calculated using two evaluation metrics. The metric values were compared with the most recent baseline results reported in the IEEE journal paper “Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning”. The results showed that removing some features from the models can significantly improve the quality of protein structure predictions.
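
As a rough illustration of the two feature selection steps named in the abstract, the following Python sketch first groups correlated features with average-linkage agglomerative clustering, then greedily drops one feature at a time while a score keeps improving. The distance measure (1 - |Pearson r|), the merge threshold, the function names, and the `score` callback are all assumptions made for illustration. In the study the score would presumably come from a Support Vector Machine judged by the two evaluation metrics, which the abstract does not specify.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_features(columns, threshold):
    """Average-linkage agglomerative clustering of feature columns.

    `columns` maps a feature name to its list of values (hypothetical layout).
    Distance between two features is assumed to be 1 - |Pearson r|.
    Merging stops once the closest pair of clusters exceeds `threshold`.
    """
    names = list(columns)
    dist = {
        frozenset((a, b)): 1 - abs(pearson(columns[a], columns[b]))
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def drop_one_at_a_time(features, score):
    """Greedy backward elimination (an assumed reading of the abstract).

    `score(subset)` is a hypothetical callback returning a number where
    higher is better, e.g. a validation metric of a model trained on `subset`.
    """
    current = list(features)
    best = score(current)
    while len(current) > 1:
        # Score every subset obtained by removing exactly one feature.
        trials = [(score([f for f in current if f != g]), g) for g in current]
        trial_best, removed = max(trials)
        if trial_best <= best:
            break  # no single removal improves the score
        best = trial_best
        current.remove(removed)
    return current, best
```

One plausible way to combine the two steps is to keep a single representative feature from each cluster and then run the elimination loop on those representatives; the abstract does not say whether the two methods were chained or compared separately.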

This document is currently not available here.

Nov 12th, 1:00 PM to Nov 12th, 2:00 PM
