Monday, August 24, 2020

Scoring Matrix in Bioinformatics

Pairwise sequence alignment, the comparison of two sequences as a pair, is essentially a comparison of their constitution. The sequences can be made up of amino acids or nucleotides. When you compare two sequences, you want to score the result. But in real life, due to evolutionary pressures, the rate at which one amino acid is replaced by another differs from pair to pair. It is therefore necessary to reflect this variable propensity for replacement, or substitution, with suitable scores. A scoring matrix helps us do this.

We will see what scoring matrices are, how they are built, and how we can use them in the alignment process.

Introduction of Scoring Matrix

 
An amino acid can be replaced by another amino acid depending on their chemical, physical and other special properties. If two amino acids have the same chemical behaviour, there is a high chance that one is substituted for the other during evolution. However, if their properties are totally different, there is a very low chance of one amino acid being replaced by the other.


So scoring matrices are flexible in how they score such substitutions: the substitution of each amino acid is scored differently, i.e. a scoring matrix holds a separate substitution value for each pair of amino acids.

How to build Scoring Matrix?


Consider the protein sequences that exist in nature, and find sequences that are similar and homologous to each other. Once we have isolated a set of similar protein sequences, we see which amino acid in one sequence is substituted by which amino acid in the other sequences. In this way we build a frequency list showing how often each amino acid is substituted by each other amino acid.
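The counting step can be sketched in a few lines of Python. This is only a minimal illustration, assuming the sequences are already aligned and of equal length; the function name count_substitutions and the toy sequences are made up for this example and are not real ubiquitin data.

```python
from collections import Counter
from itertools import combinations


def count_substitutions(aligned_seqs):
    """Count unordered residue pairs seen in each column of an alignment."""
    pair_counts = Counter()
    # zip(*aligned_seqs) walks through the alignment column by column.
    for column in zip(*aligned_seqs):
        # Compare every pair of sequences within the column.
        for a, b in combinations(column, 2):
            if a == "-" or b == "-":
                continue  # skip gap characters
            # Sort so that (K, Q) and (Q, K) are counted as the same pair.
            pair_counts[tuple(sorted((a, b)))] += 1
    return pair_counts


# Hypothetical toy alignment of three short sequences.
toy_alignment = ["MQIF", "MQIF", "MKIY"]
counts = count_substitutions(toy_alignment)
print(counts)
```

Pairs such as (M, M) count conserved positions, while pairs such as (K, Q) count substitutions.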

Scores in the Scoring Matrix


So, based on such frequencies, a scoring matrix may contain a positive value, meaning a very easy (frequent) substitution of one amino acid by another; a negative value, meaning a rare substitution; or a zero as well.

Here is an example of a protein called Ubiquitin


Ubiquitin sequences from humans, chimps, mice and other species are listed and aligned with each other. As we can see, some amino acids are completely conserved, some are rarely substituted, and others are changed.



So what we do with such a frequency count is apply a formula:

S(a, b) = λ × log( Pab / (fa × fb) )

Here S is the scoring matrix,
         a is the first amino acid,
         b is the second amino acid.
So the substitution of a by b is scored by multiplying a constant (written λ here) by the log of the ratio Pab / (fa × fb).

Pab is the probability of amino acid 'a' being substituted by amino acid 'b', where a and b can be any two amino acids.
fa and fb are the frequencies of amino acids a and b.

So by computing S for a and b, and varying a and b over all 20 amino acids, you arrive at the scoring matrix.
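As a rough sketch of this calculation, the Python below turns pair counts into log-odds scores. It rests on several assumptions that are not from the post: the function name log_odds_matrix, the toy counts, the natural logarithm, and a scaling constant of 1. Each unordered count for two different amino acids is split evenly between (a, b) and (b, a) so that Pab is symmetric.

```python
import math
from collections import defaultdict


def log_odds_matrix(pair_counts, scale=1.0):
    """Return {(a, b): scale * log(Pab / (fa * fb))} for observed pairs."""
    total = sum(pair_counts.values())

    # Pab: symmetric probability of seeing a aligned with b.
    p = defaultdict(float)
    for (a, b), n in pair_counts.items():
        if a == b:
            p[(a, a)] += n / total
        else:
            p[(a, b)] += n / (2 * total)
            p[(b, a)] += n / (2 * total)

    # fa: frequency of amino acid a, taken as the sum of Pab over all b.
    f = defaultdict(float)
    for (a, _b), prob in p.items():
        f[a] += prob

    # Pairs that were never observed are left out (log of zero is undefined).
    return {
        (a, b): scale * math.log(prob / (f[a] * f[b]))
        for (a, b), prob in p.items()
    }


# Hypothetical toy counts for two amino acids only.
toy_counts = {("A", "A"): 6, ("A", "S"): 2, ("S", "S"): 2}
scores = log_odds_matrix(toy_counts)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

With these toy counts the A-A and S-S scores come out positive (seen more often than chance would predict) and the A-S score comes out negative (seen less often than chance).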

Let us consider a scoring matrix



Consider the negative score of -5 when D is substituted by W, or the zeros when S is substituted by D, Q or G. The diagonal, indicated by the red line, shows very high scores. In particular, if C is substituted by another C the score is 13, which means C is mostly conserved.
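To show how such a matrix is used in the alignment process, here is a small Python sketch that scores two aligned sequences position by position. The dictionary holds only the few values quoted above, and the names TOY_SCORES, pair_score and alignment_score and the two short sequences are made up for this illustration; a real matrix has an entry for every pair of the 20 amino acids.

```python
# Only the scores mentioned in the text; a full matrix covers all pairs.
TOY_SCORES = {
    ("C", "C"): 13,
    ("D", "W"): -5,
    ("S", "D"): 0,
    ("S", "Q"): 0,
    ("S", "G"): 0,
}


def pair_score(a, b, scores):
    """Look up a score, treating the matrix as symmetric."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]  # raises KeyError if the pair is missing


def alignment_score(seq1, seq2, scores):
    """Sum the substitution scores of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq1, seq2))


# C/C scores 13, D/W scores -5, S/D scores 0, so the total is 8.
total = alignment_score("CDS", "CWD", TOY_SCORES)
print(total)
```

The sketch ignores gaps; a full alignment program would also add gap penalties.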

So, in conclusion, positive values such as +1 or +5 are assigned to matches, and negative values such as -10 or -2 to mismatches. Across all the amino acids, scoring matrices score each pair differently, depending on the chemical and physical properties of the amino acids, which are reflected in their frequency of occurrence.



