Ten years ago, it was demonstrated that co-evolving residues in a protein mediate allosteric communication involved in cellular signaling. More recently, it has been observed that co-evolving positions can also explain folding and key mutations implicated in genetic diseases. Although predictions and biological evidence show that some positions are correlated in DNA and protein sequences, the evolutionary models used in phylogenetics assume that positions evolve independently.
Here we propose a new model that considers co-evolving positions.
The model is based on a 16×16 instantaneous rate matrix and three parameters, s, d and w, where s is the rate of a transition from a co-evolving combination to a non-co-evolving one, d is the rate of a transition from a non-co-evolving combination to a co-evolving one, and w is the rate of a single mutation between two non-co-evolving combinations.
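As an illustration only, a rate matrix of this kind could be assembled as in the sketch below. Several choices here are assumptions, not details given above: the 16 states are taken to be the ordered nucleotide pairs at two positions, only single substitutions receive a non-zero rate, a single substitution between two co-evolving combinations (if the chosen set allows one) is given rate w, and the set of co-evolving combinations passed in is hypothetical. The names `STATES` and `build_rate_matrix` are also hypothetical.

```python
import itertools
import numpy as np

NUCLEOTIDES = "ACGT"
# Assumption: the 16 states are all ordered nucleotide pairs ("AA", "AC", ...).
STATES = ["".join(p) for p in itertools.product(NUCLEOTIDES, repeat=2)]


def build_rate_matrix(s, d, w, coevolving):
    """Return a 16x16 instantaneous rate matrix for a pair of positions.

    s: rate from a co-evolving combination to a non-co-evolving one
    d: rate from a non-co-evolving combination to a co-evolving one
    w: rate of a single mutation between two non-co-evolving combinations
    coevolving: set of two-letter states treated as co-evolving (hypothetical)
    """
    n = len(STATES)
    Q = np.zeros((n, n))
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            # Assumption: states differing at both positions (or identical)
            # get rate 0, so only single substitutions are allowed.
            if sum(x != y for x, y in zip(a, b)) != 1:
                continue
            if a in coevolving and b not in coevolving:
                Q[i, j] = s
            elif a not in coevolving and b in coevolving:
                Q[i, j] = d
            else:
                # Non-co-evolving to non-co-evolving; co-evolving to
                # co-evolving is also given w here (assumption).
                Q[i, j] = w
    # Diagonal entries make every row sum to zero.
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q


# Hypothetical co-evolving set and arbitrary parameter values.
Q = build_rate_matrix(s=0.5, d=2.0, w=1.0, coevolving={"AT", "TA", "GC", "CG"})
```

Setting s = d = w in this sketch gives every single substitution the same rate, which corresponds to the case where the two positions evolve independently.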
To evaluate the new model, we use a likelihood ratio test (LRT) between two models: the null model, in which each position is assumed to evolve independently (i.e. s = d = w), and the dependent model, in which co-evolution is assumed.
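The comparison can be sketched as follows, given the maximized log-likelihoods of the two models. The reference distribution is an assumption: because the null model constrains s = d = w, it has two fewer free parameters than the dependent model, and the sketch uses the usual chi-square approximation with 2 degrees of freedom, whose survival function has the closed form exp(-x/2). The function name `likelihood_ratio_test` is hypothetical.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_dependent):
    """Return the LRT statistic and an approximate p-value.

    lnl_null: maximized log-likelihood under the null model (s = d = w)
    lnl_dependent: maximized log-likelihood under the dependent model
    """
    stat = 2.0 * (lnl_dependent - lnl_null)
    # Guard against small negative values caused by optimizer noise.
    stat = max(stat, 0.0)
    # Assumption: chi-square with 2 degrees of freedom, for which the
    # survival function is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

A large statistic with a small p-value favors the dependent model, while a statistic near zero indicates that the two models fit about equally well.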
The results show that the null model has a lower likelihood than the dependent model when two positions are co-evolving, whereas for independent positions the two models have similar likelihoods.
In the past decade, several methods have been developed to identify co-evolving positions using probabilistic or combinatorial approaches. These methods give a correlation score, but they do not identify, within co-evolving positions, which combinations of nucleotides are actually co-evolving across the phylogeny.
This likelihood-based framework represents a step forward in reconstructing the evolution of co-evolving patterns on a phylogeny, with potential applications in evolutionary studies and mutagenesis experiments.