The large number of sequenced eukaryotic genomes and gene expression data provides unique opportunities to ask whether genes that co-evolve are also involved in similar biological processes. In fact, several methods have been successfully used to identify important functional relationships between a gene of interest and other eukaryotic genes[1-4]. Unfortunately, the expertise and computational resources required to compare tens of genomes and gene expression data sets make this type of analysis difficult for the average end-user[5,6]. Here, we describe and implement a web server that integrates profiles of sequence divergence derived by a Hidden Markov Model (HMM) and tissue-wide gene expression patterns to determine putative functional linkages between pairs of genes. We termed the server “EvoCor” to denote that it detects functional relationships through evolutionary analysis and gene expression correlation.
To represent the evolutionary history of each gene, we constructed a binary vector of length 182, the total number of eukaryotic species in the NCBI database as of the writing of this paper. Each position in the vector holds a 1 or a 0, which indicates whether a sequence homolog can be found in that species. In contrast to previous methods, we employ a profile HMM search using HMMER3[6] to determine sequence homologs. We hypothesize that genes that show correlated patterns of sequence divergence will be functionally related. We use these vectors to calculate the pairwise Hamming distance between the gene of interest and every other protein-coding gene in the human genome. We then calculate the Pearson correlation coefficient based on a tissue-wide atlas of gene expression data (NCBI GSE10246 and NCBI GSE1133) to identify genes whose expression pattern is similar to that of the gene of interest, and generate a list of genes predicted to be functionally related.
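The two measures described above can be illustrated with a minimal Python sketch. It is not the server's implementation: the function names and the toy data (6 species rather than 182, and 4 tissues) are assumptions made purely for illustration.

```python
from math import sqrt


def hamming_distance(profile_a, profile_b):
    """Count the positions at which two equal-length binary profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: 1 = homolog found in that species, 0 = not found.
query_profile = [1, 1, 0, 1, 0, 1]
other_profile = [1, 1, 0, 0, 0, 1]

# Hypothetical expression values across four tissues.
query_expression = [2.0, 4.0, 6.0, 8.0]
other_expression = [1.0, 2.0, 3.0, 4.5]

print(hamming_distance(query_profile, other_profile))
print(pearson(query_expression, other_expression))
```

A Hamming distance of 0 means the two genes have homologs in exactly the same set of species, while a Pearson coefficient close to 1 means their expression rises and falls together across tissues.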
Searching with EvoCor is easy! Simply type in your query gene and click Search.
After you click Search, EvoCor will determine the top 50 genes that have a Pearson correlation coefficient of >0.2 and will rank the results by descending Hamming distance (where 0 denotes an identical binary vector).
Search results are generally returned in about one minute.
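The filtering and ranking step can be sketched in Python as follows. This is one reading of the description rather than the server's actual code: the function name, the tuple layout, and the gene labels are hypothetical. The sketch also assumes that a distance of 0 ranks first and that ties are broken by the higher correlation.

```python
def rank_candidates(candidates, min_r=0.2, top_n=50):
    """Filter and rank candidate genes.

    candidates: list of (gene, hamming_distance, pearson_r) tuples.
    Keeps genes with pearson_r > min_r, then orders them so that the
    smallest Hamming distance comes first (assumption), breaking ties
    by the larger correlation coefficient (assumption).
    """
    kept = [c for c in candidates if c[2] > min_r]
    kept.sort(key=lambda c: (c[1], -c[2]))
    return kept[:top_n]


# Hypothetical candidates: (gene label, Hamming distance, Pearson r).
candidates = [
    ("GENE_A", 3, 0.90),
    ("GENE_B", 0, 0.10),  # dropped: correlation does not exceed 0.2
    ("GENE_C", 1, 0.50),
    ("GENE_D", 1, 0.80),
]

for gene, distance, r in rank_candidates(candidates):
    print(gene, distance, r)
```

If the opposite ordering is wanted, passing `reverse=True` to `sort` with a key of `c[1]` alone would place the largest distance first.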
- Baughman, J., Perocchi, F. & Girgis, H. Integrative genomics identifies MCU as an essential component of the mitochondrial calcium uniporter. Nature, doi:10.1038/nature10234 (2011).
- Pagliarini, D. J. et al. A Mitochondrial Protein Compendium Elucidates Complex I Disease Biology. Cell, 112-123, doi:10.1016/j.cell.2008.06.016 (2008).
- Mootha, V. K. et al. Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proceedings of the National Academy of Sciences 100, 605-610 (2003).
- Calvo, S. et al. Systematic identification of human mitochondrial disease genes through integrative genomics. Nature Genetics 38, 576-582 (2006).
- Pellegrini, M. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proceedings of the … 96, 4285-4288 (1999).
- Eddy, S. R. Accelerated Profile HMM Searches. PLoS computational biology 7, e1002195, doi:10.1371/journal.pcbi.1002195 (2011).
- Lattin, J. E. et al. Expression analysis of G Protein-Coupled Receptors in mouse macrophages. Immunome research 4, 5, doi:10.1186/1745-7580-4-5 (2008).
- Su, A. I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences 101, 6062-6067 (2004).