We will see what scoring matrices are, how they are built, and how they are used in the alignment process.
Introduction to Scoring Matrices
One amino acid can be replaced by another depending on their chemical, physical, and other special properties. If two amino acids have similar chemical behaviour, there is a high chance that one is substituted for the other during evolution. If their properties are totally different, there is a very low chance of such a replacement.
Scoring matrices are therefore flexible in how they score such substitutions: each amino acid pair gets its own substitution score.
How to Build a Scoring Matrix?
Consider the protein sequences found in nature and collect those that are similar and homologous to each other. Once we have isolated such a set of similar protein sequences, we see which amino acid in one sequence is substituted by which amino acid in the other sequences. In this way we build a frequency list of how often each amino acid is substituted by each other amino acid.
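The counting step can be sketched in a few lines of Python. This is only a minimal illustration: the function name `count_substitutions` and the three-residue toy alignment are made up for the example, and it assumes the sequences are already aligned to the same length with `-` marking gaps.

```python
from collections import Counter
from itertools import combinations

def count_substitutions(aligned_seqs):
    """Count unordered amino acid pairs seen in the same alignment column."""
    pair_counts = Counter()
    for column in zip(*aligned_seqs):
        # Compare every pair of sequences at this column.
        for x, y in combinations(column, 2):
            if x == "-" or y == "-":
                continue  # skip gap positions
            # Sort so that (K, R) and (R, K) are counted as the same pair.
            pair_counts[tuple(sorted((x, y)))] += 1
    return pair_counts

# Hypothetical toy alignment, not real protein data.
toy_alignment = ["MKV", "MRV", "MKI"]
counts = count_substitutions(toy_alignment)
print(counts)
```

Identical pairs such as (M, M) are counted too, because the conserved positions matter just as much as the substituted ones when the frequencies are later turned into scores.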
Scores in the Scoring Matrix
Based on these frequencies, an entry in the scoring matrix may be a positive value, meaning an easy (frequently observed) substitution from one amino acid to another, a negative value, meaning a rare substitution, or zero.
Here is an example using a protein called ubiquitin.
Ubiquitin sequences from human, chimp, mouse, and other species are shown aligned with each other. As we can see, some amino acids are completely conserved, some are rarely substituted, and others change between species.
To turn such frequency counts into scores, we apply a formula.
Here S is the scoring matrix,
a is the first amino acid,
b is the second amino acid,
Pab is the probability that amino acid 'a' is substituted by amino acid 'b', where a and b can be any two amino acids,
fa and fb are the frequencies of amino acids a and b.
By computing S for a and b, and varying a and b over all 20 amino acids, we arrive at the scoring matrix.
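As a rough sketch of this computation, the Python below assumes the commonly used log-odds form S(a,b) = log(Pab / (fa × fb)), which matches the symbols defined above but is an assumption about the exact formula (the base of the logarithm and any scaling factor are also assumed). The function name `log_odds_scores` and the pair counts for L and W are hypothetical.

```python
import math
from collections import Counter

def log_odds_scores(pair_counts):
    """Turn unordered pair counts into log-odds scores (assumed form)."""
    total_pairs = sum(pair_counts.values())

    # fa: background frequency of each amino acid among all paired residues.
    residue_counts = Counter()
    for (a, b), n in pair_counts.items():
        residue_counts[a] += n
        residue_counts[b] += n
    total_residues = sum(residue_counts.values())
    freq = {r: c / total_residues for r, c in residue_counts.items()}

    scores = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total_pairs
        if a != b:
            # An unordered count covers both (a, b) and (b, a);
            # halve it to get the ordered-pair probability Pab.
            p_ab /= 2
        scores[(a, b)] = math.log2(p_ab / (freq[a] * freq[b]))
    return scores

# Hypothetical counts for illustration only.
toy_counts = {("L", "L"): 6, ("L", "W"): 2, ("W", "W"): 2}
scores = log_odds_scores(toy_counts)
for pair, s in scores.items():
    print(pair, round(s, 2))
```

With this form, a pair seen more often than chance would predict gets a positive score, a pair seen less often gets a negative score, and a pair seen exactly as often as expected gets zero.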
Let us consider a scoring matrix
Consider the negative score of -5 when D is substituted by W, or the zeros when S is substituted by D, Q, or G. The diagonal, indicated by the red line, shows very high scores. In particular, when C is aligned with another C the score is 13, which means C is mostly conserved.
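To show how such scores are used in the alignment process, here is a small Python sketch that sums the scores over the columns of an ungapped alignment. The dictionary holds only the five values quoted above (a real matrix has an entry for every pair of the 20 amino acids), the matrix is assumed to be symmetric, and the names `partial_matrix`, `pair_score`, and `alignment_score` as well as the two short sequences are made up for the example.

```python
# Only the scores quoted in the text; assumed symmetric, keys sorted alphabetically.
partial_matrix = {
    ("C", "C"): 13,
    ("D", "W"): -5,
    ("D", "S"): 0,
    ("Q", "S"): 0,
    ("G", "S"): 0,
}

def pair_score(a, b, matrix):
    """Look up a symmetric score; raises KeyError for pairs not in the matrix."""
    return matrix[tuple(sorted((a, b)))]

def alignment_score(seq1, seq2, matrix):
    """Sum the column scores of two equal-length, ungapped aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))

# Hypothetical three-residue sequences: columns are C/C, D/W, S/G.
total = alignment_score("CDS", "CWG", partial_matrix)
print(total)  # 13 + (-5) + 0 = 8
```

An alignment algorithm would compare such totals across candidate alignments and prefer the one with the highest score.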
In conclusion, where a simple scheme assigns a positive value such as +1 or +5 to a match and a negative value such as -10 or -2 to a mismatch for all amino acids, scoring matrices score each amino acid pair differently, depending on the chemical and physical properties of the amino acids, which are reflected in their frequency of occurrence.