Identification of Conserved Regulatory Elements in Mammalian Promoter Regions: A Case Study Using the PCK1 Promoter
George E. Liu, Matthew T. Weirauch, Curtis P. Van Tassell, Robert W. Li, Tad S. Sonstegard, Lakshmi K. Matukumalli, Erin E. Connor, Richard W. Hanson, Jianqi Yang
We performed a systematic phylogenetic footprinting analysis to identify conserved transcription factor binding sites (TFBSs) in mammalian promoter regions using human, mouse and rat sequence alignments. We found that the score distributions of most binding site models did not follow the Gaussian distribution required by many statistical methods. Therefore, we performed an empirical test to establish the optimal threshold for each model. We gauged our computational predictions by comparing them with previously known TFBSs in the promoter of PCK1, the gene encoding the cytosolic isoform of phosphoenolpyruvate carboxykinase, and achieved a sensitivity of 75% and a specificity of approximately 32%. Almost all known sites overlapped with predicted sites, and several new putative TFBSs were also identified. We validated a predicted SP1 binding site in the control of PCK1 transcription using gel shift and reporter assays. Finally, we applied our computational approach to the prediction of putative TFBSs within the promoter regions of all available RefSeq genes. Our full set of TFBS predictions is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
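The abstract does not say how the empirical thresholds were derived. As an illustration only, one common distribution-free choice is to take a percentile of the scores a model produces on background sequence, so that no Gaussian assumption is needed. The function name, the background score list and the target false positive rate below are all hypothetical and are not taken from the paper.

```python
import math

def empirical_threshold(background_scores, false_positive_rate=0.01):
    """Return a score cutoff taken directly from the ranked background scores.

    Assumption (not from the paper): a hit is called when its score is
    strictly greater than the cutoff, and the cutoff is chosen so that at
    most `false_positive_rate` of the background scores exceed it.
    """
    if not background_scores:
        raise ValueError("need at least one background score")
    ranked = sorted(background_scores)
    # Position of the (1 - rate) empirical quantile, clamped to a valid index.
    k = math.ceil((1.0 - false_positive_rate) * len(ranked)) - 1
    k = min(max(k, 0), len(ranked) - 1)
    return ranked[k]

# Usage with made-up, deliberately skewed (non-Gaussian) background scores.
background = [x ** 3 for x in range(1, 101)]
cutoff = empirical_threshold(background, false_positive_rate=0.05)
fraction_above = sum(s > cutoff for s in background) / len(background)
```

Because the cutoff is read off the ranked scores, the same procedure applies to each binding site model regardless of the shape of its score distribution.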
Comparative Genomic Study Reveals a Transition from TA Richness in Invertebrates to GC Richness in Vertebrates at CpG Flanking Sites: An Indication for Context-Dependent Mutagenicity of Methylated CpG Sites
Yong Wang, Frederick C.C. Leung
Vertebrate genomes are characterized by CpG deficiency, particularly in GC-poor regions. This GC content-related CpG deficiency is probably caused by context-dependent deamination of methylated CpG sites. We examined this hypothesis by comparing nucleotide frequencies at CpG flanking positions among invertebrate and vertebrate genomes. We found a transition in nucleotide preference from 5′ T to 5′ A at the invertebrate-vertebrate boundary, indicating that a large number of CpG sites with 5′ Ts were depleted because of the global DNA methylation that developed in vertebrates. At the genome level, we investigated CpG observed/expected (obs/exp) values in 500 bp fragments, and found that higher CpG obs/exp values occur in GC-poor regions of invertebrate genomes (except sea urchin) but in GC-rich sequences of vertebrate genomes. We next compared GC content at CpG flanking positions with the genomic average, showing that the GC content is lower than the average in invertebrate genomes, but higher than the average in vertebrate genomes. These results indicate that although 5′ T and 5′ A differ in inducing deamination of methylated CpG sites, GC content is even more important in affecting the deamination rate. In all the tests, the results for sea urchin are similar to those for vertebrates, perhaps due to its fractional DNA methylation. CpG deficiency is therefore suggested to be mainly a result of high mutation rates of methylated CpG sites in GC-poor regions.
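For readers unfamiliar with the CpG obs/exp statistic, the sketch below computes it per fragment using the widely used definition (number of CpG dinucleotides times fragment length, divided by the product of the C and G counts). The abstract does not state which exact formula the authors used, so this definition, the non-overlapping 500 bp windowing and the function names are assumptions.

```python
def cpg_obs_exp(fragment):
    """CpG observed/expected ratio: (#CG * length) / (#C * #G).

    Returns None when the fragment has no C or no G, since the expected
    count is then zero and the ratio is undefined.
    """
    fragment = fragment.upper()
    c_count = fragment.count("C")
    g_count = fragment.count("G")
    if c_count == 0 or g_count == 0:
        return None
    cpg_count = fragment.count("CG")
    return cpg_count * len(fragment) / (c_count * g_count)

def fragment_stats(sequence, size=500):
    """Yield (start, GC content, CpG obs/exp) for non-overlapping fragments.

    A trailing fragment shorter than `size` is skipped.
    """
    sequence = sequence.upper()
    for start in range(0, len(sequence) - size + 1, size):
        fragment = sequence[start:start + size]
        gc_content = (fragment.count("C") + fragment.count("G")) / size
        yield start, gc_content, cpg_obs_exp(fragment)

# Usage on a toy sequence: one CpG-rich fragment followed by one AT-only fragment.
toy = "CG" * 250 + "AT" * 250
stats = list(fragment_stats(toy, size=500))
```

Plotting the obs/exp value against GC content for every fragment gives the kind of comparison between GC-poor and GC-rich regions that the abstract describes.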
Phylogenetic Analysis of Brine Shrimp (Artemia) in China Using DNA Barcoding
Weiwei Wang, Qibin Luo, Haiyan Guo, Peter Bossier, Gilbert Van Stappen, Patrick Sorgeloos, Naihong Xin, Qishi Sun, Songnian Hu, Jun Yu
DNA barcoding is a powerful approach for characterizing species of organisms, especially those with almost identical morphological features, thereby helping to establish phylogenetic relationships and reveal evolutionary histories. In this study, we chose a 648-bp segment of the mitochondrial gene, cytochrome c oxidase subunit 1 (COI), as a standard barcode region to establish phylogenetic relationships among brine shrimp (Artemia) species from major habitats around the world and further focused on the biodiversity of Artemia species in China, especially in the Tibetan Plateau. Samples from five major salt lakes of the Tibetan Plateau located at altitudes over 4,000 m showed clear differences from other Artemia populations in China. We also observed two consistent amino acid changes, 153A/V and 183L/F, in the COI gene between the high and low altitude species in China. Moreover, indels in the COI sequence were identified in cyst and adult samples unique to the Co Qen population from the Tibetan Plateau, demonstrating the need for additional investigations of the mitochondrial genome among Tibetan Artemia populations.
Identification of Semaphorin 5A Interacting Protein by Applying Apriori Knowledge and Peptide Complementarity Related to Protein Evolution and Structure
Anguraj Sadanandam, Michelle L. Varney, Rakesh K. Singh
In the post-genomic era, various computational methods that predict protein-protein interactions at the genome level are available; however, each method has its own advantages and disadvantages, resulting in false predictions. Here we developed a unique integrated approach to identify interacting partner(s) of Semaphorin 5A (SEMA5A), beginning with seven proteins sharing similar ligand interacting residues as putative binding partners. The methods include Dwyer and Root-Bernstein/Dillon theories of protein evolution, hydropathic complementarity of protein structure, pattern of protein functions among molecules, information on domain-domain interactions, co-expression of genes and protein evolution. Among the set of seven proteins selected as putative SEMA5A interacting partners, we found the functions of Plexin B3 and Neuropilin-2 to be associated with SEMA5A. We modeled the semaphorin domain structure of Plexin B3 and found that it shares similarity with SEMA5A. Moreover, a virtual expression database search and RT-PCR analysis showed co-expression of SEMA5A and Plexin B3 and these proteins were found to have co-evolved. In addition, we confirmed the interaction of SEMA5A with Plexin B3 in co-immunoprecipitation studies. Overall, these studies demonstrate that an integrated method of prediction can be used at the genome level for discovering many unknown protein binding partners with known ligand binding domains.
SCGPred: A Score-based Method for Gene Structure Prediction by Combining Multiple Sources of Evidence
Xiao Li, Qingan Ren, Yang Weng, Haoyang Cai, Yunmin Zhu, Yizheng Zhang
Predicting protein-coding genes remains a significant challenge. Although a variety of computational programs that commonly use machine learning methods have emerged, prediction accuracy remains low when they are applied to large genomic sequences. Moreover, computational gene finding in newly sequenced genomes is especially difficult due to the absence of a training set of abundant validated genes. Here we present a new gene-finding program, SCGPred, to improve the accuracy of prediction by combining multiple sources of evidence. SCGPred can run in a supervised mode on previously well-studied genomes and in an unsupervised mode on novel genomes. In tests with datasets composed of large DNA sequences from human and from the novel genome of Ustilago maydis, SCGPred shows a significant improvement over popular ab initio gene predictors. We also demonstrate that SCGPred can significantly improve prediction in novel genomes by combining several foreign gene finders with similarity alignments, which is superior to other unsupervised methods. Therefore, SCGPred can serve as an alternative gene-finding tool for newly sequenced eukaryotic genomes. The program is freely available at http://bio.scu.edu.cn/SCGPred/.
MOF: An R Function to Detect Outlier Microarray
Song Yang, Xiang Guo, Hai Hu
We developed an R function named “microarray outlier filter” (MOF) to assist in the identification of failed arrays. To sort a group of similar arrays by their likelihood of failure, two statistical indices were employed: the correlation coefficient and the percentage of outlier spots. MOF can be used to monitor the quality of microarray data, both for troubleshooting and for eliminating bad datasets from downstream analysis. The function is freely available at http://www.wriwindber.org/applications/mof/.
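The two indices can be illustrated with a short sketch. This is not the MOF implementation (which is an R function) and does not reproduce its definitions: here the correlation index is assumed to be the mean Pearson correlation of an array with every other array, and a spot is assumed to be an outlier when it lies more than k median absolute deviations from the per-spot median across arrays. The function names, the value of k and the toy data are hypothetical.

```python
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def rank_arrays(arrays, k=3.0):
    """Rank arrays from most to least suspicious.

    `arrays` maps an array name to its list of spot intensities (same spot
    order in every array, at least two arrays). Returns a list of
    (name, mean correlation, percent outlier spots), lowest correlation first.
    """
    names = list(arrays)
    n_spots = len(arrays[names[0]])
    spot_median, spot_mad = [], []
    for i in range(n_spots):
        values = [arrays[name][i] for name in names]
        m = median(values)
        spot_median.append(m)
        spot_mad.append(median([abs(v - m) for v in values]))
    report = []
    for name in names:
        others = [arrays[o] for o in names if o != name]
        mean_r = mean(pearson(arrays[name], other) for other in others)
        n_outliers = sum(
            abs(arrays[name][i] - spot_median[i]) > k * spot_mad[i]
            for i in range(n_spots)
        )
        report.append((name, mean_r, 100.0 * n_outliers / n_spots))
    return sorted(report, key=lambda row: (row[1], -row[2]))

# Usage with made-up intensities: three concordant arrays and one reversed array.
toy_arrays = {
    "good1": [1, 2, 3, 4, 5, 6],
    "good2": [1.1, 2.1, 2.9, 4.2, 5.1, 5.9],
    "good3": [0.9, 1.9, 3.1, 3.9, 4.9, 6.1],
    "bad": [6, 5, 4, 3, 2, 1],
}
ranking = rank_arrays(toy_arrays)
```

Arrays at the top of the ranking are the candidates to inspect or drop before downstream analysis.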
In Silico Analysis of Crop Science: Report on the First China-UK Workshop on Chips, Computers and Crops
Ming Chen, Andrew Harrison
A workshop on “Chips, Computers and Crops” was held in Hangzhou, China during September 26–27, 2008. The main objective of the workshop was to bring together Chinese and UK scientists from the mathematics, bioinformatics and plant molecular biology communities to exchange ideas, enhance awareness of each other’s fields, explore synergisms and make recommendations on fruitful future directions in crop science. Here we describe the contributions to the workshop, and examine some conceptual issues that lie at the foundations and future of crop systems biology.