Articles Online (Volume 8, Issue 2)


The Most Redundant Sequences in Human CpG Island Library Are Derived from Mitochondrial Genome

Ximiao He, Shu Tao, Jing Jin, Songnian Hu, Jun Yu

An altered pattern of epigenetic modifications, such as DNA methylation and histone modification, is critical to many common human diseases, including cancer. Recently, mitochondrial DNA (mtDNA) was reported to be associated with tumorigenesis through epigenetic regulation of methylation patterns. One of the promising approaches to study DNA methylation and CpG islands (CGIs) is sequencing and analysis of clones derived from the physical library generated by methyl-CpG-binding domain proteins and restriction enzyme MseI. In this study, we observed that the most redundant sequences of 349 clones in a human CGI library were all generated from the human mitochondrial genome. Further analysis indicated that there was a 5,845-bp DNA transfer from mtDNA to chromosome 1, and all the clones should be the products of a 510-bp MseI fragment, which contained a putative CGI of 270 bp. The 510-bp fragment was annotated as part of cytochrome c oxidase subunit II (COXII), and phylogenetic analysis of homologous sequences containing COXII showed three DNA transfer events from mtDNA to nuclear genome, one of which underwent secondary transfer events between different chromosomes. These results may further our understanding of how the mtDNA regulates DNA methylation in the nucleus.
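
The two sequence-level ideas in this abstract, MseI digestion and CGI detection, can be illustrated with a minimal sketch. MseI recognizes TTAA and cuts after the first T. The abstract does not state which CGI criteria were applied, so the thresholds below (length of at least 200 bp, GC content above 50%, observed/expected CpG above 0.6) are an assumption taken from the commonly used Gardiner-Garden and Frommer definition, and the function names are hypothetical.

```python
def msei_fragments(seq):
    """Cut seq at every TTAA site (MseI cleaves T^TAA) and return the fragments."""
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find("TTAA")
    while pos != -1:
        cut = pos + 1  # the cut falls after the first T of the site
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find("TTAA", pos + 1)
    fragments.append(seq[start:])
    return fragments


def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cgi(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Assumed CGI test: thresholds are illustrative, not taken from the article."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_oe
```

In practice one would apply `looks_like_cgi` to sliding windows inside each fragment returned by `msei_fragments`, since a CGI such as the 270-bp one described above can be shorter than the fragment that carries it.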

Page 81–91

Review Article

Sequence Signatures of Nucleosome Positioning in Caenorhabditis elegans

Kaifu Chen, Lei Wang, Meng Yang, Jiucheng Liu, Chengqi Xin, Songnian Hu, Jun Yu

Our recent investigation in the protist Trichomonas vaginalis suggested a DNA sequence periodicity with a unit length of 120.9 nt, which represents a sequence signature for nucleosome positioning. We have now extended our observation to higher eukaryotes and identified a similar periodicity of 175 nt in length in Caenorhabditis elegans. In the process of defining the sequence compositional characteristics, we found that the 10.5-nt periodicity, the sequence signature of the DNA double helix, may not be sufficient for cross-nucleosome positioning but provides essential guiding rails to facilitate positioning. We further dissected nucleosome-protected sequences and identified a strong positive purine (AG) gradient from the 5'-end to the 3'-end, and also learnt that the nucleosome-enriched regions are GC-rich compared with the nucleosome-free sequences, as purine content is positively correlated with GC content. Sequence characterization allowed us to develop a hidden Markov model (HMM) algorithm for decoding nucleosome positioning computationally. Based on a set of training data from the fifth chromosome of C. elegans, our algorithm predicted 60%-70% of the well-positioned nucleosomes, which is 15%-20% higher than random positioning. We concluded that nucleosomes are not randomly positioned on DNA sequences and yet bind to different genome regions with variable stability, that well-positioned nucleosomes leave sequence signatures on DNA, and that statistical positioning of nucleosomes across the genome can be decoded computationally based on these sequence signatures.
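
To make the idea of HMM decoding concrete, here is a minimal Viterbi sketch for a two-state model (N for nucleosome-bound, L for linker). It is not the authors' model: the abstract does not describe their state design or parameters, so the states, the GC-biased emission probabilities, and the transition probabilities below are all invented for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for obs, computed in log space."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for sym in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][sym])
            ptr[s] = best
        back.append(ptr)
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


# Toy parameters (assumptions): nucleosome state favors G/C, linker favors A/T.
STATES = "NL"
START = {"N": 0.5, "L": 0.5}
TRANS = {"N": {"N": 0.9, "L": 0.1}, "L": {"N": 0.1, "L": 0.9}}
EMIT = {
    "N": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "L": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}

path = viterbi("GCGCGCGCATATATAT", STATES, START, TRANS, EMIT)
```

With these toy parameters the GC-rich half of the example sequence is labeled N and the AT-rich half L. A realistic model would need states that capture the periodicity and purine gradient described above, with parameters estimated from training data.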

Page 92–102

Review Article

Genome-Wide Survey and Evolutionary Analysis of Trypsin Proteases in Apicomplexan Parasites

Aylan Farid Arenas, Juan Felipe Osorio-Méndez, Andres Julian Gutierrez, Jorge E. Gomez-Marin

Apicomplexa are an extremely diverse group of unicellular organisms that infect humans and other animals. Despite the great advances in combating infectious diseases over the past century, these parasites still impose a tremendous social and economic burden on human societies, particularly in tropical and subtropical regions of the world. Proteases from apicomplexa have been characterized at the molecular and cellular levels, and central roles have been proposed for proteases in diverse processes. In this work, 16 new genes encoding trypsin proteases are identified in 8 apicomplexan genomes by a genome-wide survey. Phylogenetic analysis suggests that these genes were gained through both intracellular gene transfer and vertical gene transfer. Identification, characterization and understanding of the evolutionary origin of protease-mediated processes are crucial to increase the knowledge and improve the strategies for the development of novel chemotherapeutic agents and vaccines.

Page 103–112

Review Article

Computational Identification of miRNAs and Their Target Genes from Expressed Sequence Tags of Tea (Camellia sinensis)

G.R. Prabu, A.K.A. Mandal

MicroRNAs (miRNAs) are a newly identified class of small non-protein-coding post-transcriptional regulatory RNAs in both plants and animals. A computational homology-based search of expressed sequence tags (ESTs), filtered with the Ambros empirical formula and other structural feature criteria, is a suitable combination for the discovery and isolation of conserved miRNAs from tea and other plant species whose genomes are not yet sequenced. In the present study, we blasted the database of tea (Camellia sinensis) ESTs to search for potential miRNAs, using previously known plant miRNAs. For the first time, four candidate miRNAs from four families were identified in tea. Using the newly identified miRNA sequences, a total of 30 potential target genes were identified for 11 miRNA families; 6 of these predicted target genes encode transcription factors (20%), 16 target genes appear to play roles in diverse physiological processes (53%) and 8 target genes have hypothetical or unknown functions (27%). These findings considerably broaden the scope of understanding the functions of miRNA in tea.
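
The homology step described above can be approximated, for illustration only, by scanning each EST for near-identical copies of a known mature miRNA. This sketch swaps BLAST for a plain mismatch count and omits the structural filters entirely; the function names, the mismatch threshold of 3, and the example sequences are assumptions rather than details from the article.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def near_matches(mirna, est, max_mismatches=3):
    """Return (strand, offset, mismatches) for every EST window close to the miRNA.

    The miRNA is converted to the DNA alphabet (U -> T) because ESTs are cDNA.
    Offsets on the '-' strand refer to the reverse-complemented EST.
    """
    query = mirna.upper().replace("U", "T")
    hits = []
    for strand, target in (("+", est.upper()), ("-", revcomp(est.upper()))):
        for i in range(len(target) - len(query) + 1):
            window = target[i:i + len(query)]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches))
    return hits


# Toy query and EST, made up for this example.
toy_mirna = "UGACAGAAGAGAGUGAGCAC"
toy_est = "AAAA" + "TGACAGAAGAGAGTGAGCAC" + "CCCC"
exact_hits = near_matches(toy_mirna, toy_est, max_mismatches=0)
```

Candidates that pass such a scan would still need their flanking sequence folded and checked against the structural criteria before being reported as miRNAs.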

Page 113–121

Application Note

ZiF-Predict: A Web Tool for Predicting DNA-Binding Specificity in C2H2 Zinc Finger Proteins

Bhuvan Molparia, Kanav Goyal, Anita Sarkar, Sonu Kumar, Durai Sundar

Engineering zinc finger protein motifs for specific DNA targets in genomes is critical in the field of genome engineering. We have developed a computational method for predicting recognition helices for C2H2 zinc fingers that bind to specific target DNA sites. This prediction is based on an artificial neural network trained on an exhaustive dataset of zinc finger proteins and their target DNA triplets. Users can select the option for two or three zinc fingers to be predicted either in a modular or synergistic fashion for the input DNA sequence. This method would be valuable for researchers interested in designing specific zinc finger transcription factors and zinc finger nucleases for several biological and biomedical applications. The web tool ZiF-Predict is available online at ∼sundar/zifpredict/.

Page 122–126

Application Note

PsRNA: A Computing Engine for the Comparative Identification of Putative Small RNA Locations within Intergenic Regions

Jayavel Sridhar, Govindaraj Sowmiya, Kanagaraj Sekar, Ziauddin Ahamed Rafi

Small RNAs (sRNAs) are non-coding transcripts that exert their functions in the cell directly. Identification of sRNAs is a difficult task due to the lack of clear sequence and structural biases. Most sRNAs are identified within genus-specific intergenic regions in related genomes. However, several of these regions remain unannotated due to a lack of sequence homology and/or of powerful statistical identification tools. A computational engine has been built to search within the intergenic regions to identify and roughly annotate new putative sRNA regions in Enterobacteriaceae genomes. It utilizes experimentally known sRNA data and their flanking genes/KEGG Orthology (KO) numbers as templates to identify similar sRNA regions in related query genomes. The search engine can not only locate putative intergenic regions for specific sRNAs, but also locate conserved, shuffled or deleted gene clusters in query genomes. Because it uses the KO terms for locating functionally important regions such as sRNAs, any further KO number assignment to additional genes will increase the sensitivity. The PsRNA server is used for the identification of putative sRNA regions through the information retrieved from the sRNA of interest. The computing engine is available online at and
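
The flanking-gene idea described above can be sketched as follows: given the KO numbers of the two genes that flank a known sRNA, look in a query genome for adjacent genes carrying the same pair of KO numbers and report the intergenic gap between them. This is a simplified illustration, not the PsRNA implementation; the function name, the tuple layout, and the KO identifiers in the usage example are placeholders.

```python
def candidate_regions(template_flanks, query_genes):
    """Find intergenic gaps flanked by the template's pair of KO numbers.

    template_flanks: (left_KO, right_KO) of the genes around a known sRNA.
    query_genes: list of (KO, start, end) tuples sorted by start coordinate.
    Returns a list of (gap_start, gap_end) coordinates, 1-based inclusive.
    """
    wanted = set(template_flanks)
    regions = []
    for (ko_a, _, end_a), (ko_b, start_b, _) in zip(query_genes, query_genes[1:]):
        # Compare as sets so that an inverted gene order still matches.
        if {ko_a, ko_b} == wanted and start_b > end_a + 1:
            regions.append((end_a + 1, start_b - 1))
    return regions


# Placeholder KO numbers and coordinates, for illustration only.
genes = [("K00001", 1, 100), ("K00002", 301, 500), ("K00003", 600, 700)]
gaps = candidate_regions(("K00002", "K00001"), genes)
```

Only strictly adjacent genes are compared here, so this sketch would miss the shuffled or deleted gene clusters that the abstract says the engine can also locate.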

Page 127–134

Application Note

BiDiBlast: Comparative Genomics Pipeline for the PC

João M.G.C.F. de Almeida

Bi-directional BLAST is a simple approach to detect, annotate, and analyze candidate orthologous or paralogous sequences in a single go. This procedure is usually confined to the realm of customized Perl scripts, usually tuned for UNIX-like environments. Porting those scripts to other operating systems involves refactoring them, and also the installation of the Perl programming environment with the required libraries. To overcome these limitations, a data pipeline was implemented in Java. This application submits two batches of sequences to local versions of the NCBI BLAST tool, manages result lists, and refines both bi-directional and simple hits. GO Slim terms are attached to hits, several statistics are derived, and molecular evolution rates are estimated through PAML. The results are written to a set of delimited text tables intended for further analysis. The provided graphic user interface allows a friendly interaction with this application, which is documented and available to download at or under the GNU GPL license.
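
The core of a bi-directional BLAST comparison is the reciprocal best hit test, sketched below in Python for brevity (the pipeline itself is written in Java). The input format, a list of (query, subject, bit score) tuples for each search direction, and the function names are assumptions made for this example and do not reflect BiDiBlast's internal data structures.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    Ties keep the first hit encountered.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy hit tables with made-up identifiers and scores.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b2", 120.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 245.0), ("b2", "a2", 118.0), ("b2", "a1", 95.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Pairs that pass this test are candidate orthologs, whereas one-directional best hits (such as a3 to b1 in the toy data) correspond to the simple hits mentioned above.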

Page 135–138