Articles Online (Volume 4, Issue 4)


Comparative Analysis of Eubacterial DNA Polymerase III Alpha Subunits

Xiao-Qian Zhao, Jian-Fei Hu, Jun Yu

DNA polymerase III is one of the five eubacterial DNA polymerases and is responsible for replicating duplex DNA. Among the ten subunits of the DNA polymerase III core enzyme, the alpha subunit catalyzes the polymerization of both DNA strands. In this study, we extracted genomic sequences of the alpha subunit from 159 sequenced eubacterial genomes and carried out sequence-based phylogenetic and structural analyses. We found that all eubacterial genomes have one or more alpha subunits, which form either homodimers or heterodimers. Phylogenetic and domain structural analyses, together with copy number variations of the alpha subunit in each bacterium, indicate that the alpha subunit can be classified into four basic groups: polC, dnaE1, dnaE2, and dnaE3. This classification is essential for genome composition analysis. We also consolidated the naming convention to avoid further confusion in gene annotations.

Page 203–211


High-Sensitivity Transcriptome Data Structure and Implications for Analysis and Biologic Interpretation

Sebastian Noth, Guillaume Brysbaert, François-Xavier Pellay, Arndt Benecke

Novel microarray technologies such as the AB1700 platform from Applied Biosystems promise significant increases in the signal dynamic range and a higher sensitivity for weakly expressed transcripts. We have compared a representative set of AB1700 data with a similarly representative Affymetrix HG-U133A dataset. The AB1700 design extends the signal dynamic detection range at the lower bound by one order of magnitude. The lognormal signal distribution profiles of these high-sensitivity data need to be represented by two independent distributions. The additional second distribution covers those transcripts that would have gone undetected using the Affymetrix technology. The signal-dependent variance distribution in the AB1700 data is a non-trivial function of signal intensity, describable using a composite function. The drastically different structure of these high-sensitivity transcriptome profiles requires adaptation or even redevelopment of the standard microarray analysis methods. Based on the statistical properties, we have derived a signal variance distribution model for AB1700 data that is necessary for such development. Interestingly, the dual lognormal distribution observed in the AB1700 data reflects two fundamentally different biologic mechanisms of transcription initiation.

Page 212–229


Comparative Analysis of Splice Site Regions by Information Content

T. Shashi Rekha, Chanchal K. Mitra

We have applied concepts from information theory to a comparative analysis of donor (gt) and acceptor (ag) splice site regions in the genes of five different organisms by calculating their mutual information content (relative entropy) over a selected block of nucleotides. In all the organisms studied, both regions showed the same pattern: the information content decreases as the block size increases. This result suggests that the information required for splicing might be contained in a consensus of ∼6–8 nt at both regions. Our study suggests that although the nucleotides flanking the splice sites show some degree of conservation, a certain level of variability is still tolerated, so that splicing can proceed normally even when base pairing is incomplete. We also suggest that this variability can be compensated for by the recognition of different splice sites by different spliceosomal factors.

Page 230–237
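
The per-position relative entropy calculation mentioned in the splice site abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the uniform background (0.25 per base), the pseudocount, and the function names column_information and block_information are assumptions made for this example, and averaging over the block is only one plausible reading of "information content over a selected block".

```python
import math
from collections import Counter

ALPHABET = "acgt"

def column_information(seqs, background=None, pseudocount=0.5):
    """Relative entropy (bits) of each aligned column versus a background.

    Assumption: the background defaults to a uniform 0.25 per base.
    """
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    info = []
    for i in range(length):
        counts = Counter(s[i].lower() for s in seqs)
        total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        bits = 0.0
        for b in ALPHABET:
            p = (counts[b] + pseudocount) / total if total else 0.0
            if p > 0:  # terms with p == 0 contribute nothing
                bits += p * math.log2(p / background[b])
        info.append(bits)
    return info

def block_information(seqs, start, end, **kwargs):
    """Mean relative entropy per nucleotide over columns start..end-1."""
    cols = column_information(seqs, **kwargs)[start:end]
    return sum(cols) / len(cols)
```

With a fully conserved dinucleotide followed by an unconserved position, the mean information per nucleotide drops as the block grows to include the flank, which is the qualitative pattern the abstract reports.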


Prediction of GPCR-G Protein Coupling Specificity Using Features of Sequences and Biological Functions

Toshihide Ono, Haretsugu Hishigaki

Understanding the coupling specificity between G protein-coupled receptors (GPCRs) and specific classes of G proteins is important for further elucidation of receptor functions within a cell. Increasing information on GPCR sequences and the G protein family would facilitate prediction of the coupling properties of GPCRs. In this study, we describe a novel approach for predicting the coupling specificity between GPCRs and G proteins. This method uses not only GPCR sequences but also the functional knowledge generated by natural language processing, and can achieve 92.2% prediction accuracy by using the C4.5 algorithm. Furthermore, rules related to GPCR-G protein coupling are generated. The combination of sequence analysis and text mining improves the prediction accuracy for GPCR-G protein coupling specificity, and also provides clues for understanding GPCR signaling.

Page 238–244


A Network Partition Algorithm for Mining Gene Functional Modules of Colon Cancer from DNA Microarray Data

Xiao-Gang Ruan, Jin-Lian Wang, Jian-Geng Li

Computational analysis is essential for transforming the masses of microarray data into a mechanistic understanding of cancer. Here we present a method for finding gene functional modules of cancer from microarray data and have applied it to colon cancer. First, a colon cancer gene network and a normal colon tissue gene network were constructed using correlations between the genes. Then the modules that tended to have a homogeneous functional composition were identified by splitting up the network. Analysis of both networks revealed that they are scale-free. Comparison of the gene functional modules for colon cancer and normal tissues showed that the modules' functions changed with their structures.

Page 245–252
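
The first step of the colon cancer abstract above, building a gene network from correlations, can be sketched as below. The abstract does not state the correlation measure, the cutoff, or the partition algorithm, so the use of Pearson correlation, the 0.8 threshold, and the function names are assumptions for illustration only. The degree distribution is included because it is the quantity one would inspect when asking whether a network looks scale-free.

```python
import math
from collections import Counter
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)

def correlation_network(expr, threshold=0.8):
    """Connect two genes when |r| >= threshold (hypothetical cutoff)."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges

def degree_distribution(genes, edges):
    """Map each degree k to the number of genes with exactly k neighbors."""
    degree = {g: 0 for g in genes}
    for g1, g2 in edges:
        degree[g1] += 1
        degree[g2] += 1
    return Counter(degree.values())
```

Here expr is a dictionary from gene name to a list of expression values across samples. Building one network from tumor samples and one from normal samples, then comparing the two, follows the outline given in the abstract.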


VGIchan: Prediction and Classification of Voltage-Gated Ion Channels

Sudipto Saha, Jyoti Zack, Balvinder Singh, G.P.S. Raghava

This study describes methods for predicting and classifying voltage-gated ion channels. Firstly, a standard support vector machine (SVM) method was developed for predicting ion channels by using amino acid composition and dipeptide composition, with an accuracy of 82.89% and 85.56%, respectively. The accuracy of this SVM method was improved from 85.56% to 89.11% when combined with PSI-BLAST similarity search. Then we developed an SVM method for classifying ion channels (potassium, sodium, calcium, and chloride) by using dipeptide composition and achieved an overall accuracy of 96.89%. We further achieved a classification accuracy of 97.78% by using a hybrid method that combines dipeptide-based SVM and hidden Markov model methods. A web server VGIchan has been developed for predicting and classifying voltage-gated ion channels using the above approaches. VGIchan is freely available at

Page 253–258
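
The two feature encodings named in the VGIchan abstract above, amino acid composition and dipeptide composition, can be sketched as fixed-length vectors. This is an illustrative sketch rather than the VGIchan code: expressing the values as percentages, skipping non-standard residues, and the function names are assumptions. The resulting 20- and 400-dimensional vectors are the kind of input an SVM classifier would take.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def amino_acid_composition(seq):
    """20-dimensional vector: percentage of each standard residue."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    n = len(residues)
    return [100.0 * residues.count(a) / n for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400-dimensional vector: percentage of each adjacent residue pair.

    Pairs are taken from the raw sequence, and a pair is dropped if either
    residue is non-standard, so no artificial neighbors are created.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    n = len(pairs)
    return [100.0 * pairs.count(d) / n for d in DIPEPTIDES]
```

Dipeptide composition keeps some local order information that plain amino acid composition discards, which is consistent with the higher accuracy the abstract reports for it.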


KaKs_Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging

Zhang Zhang, Jun Li, Xiao-Qian Zhao, Jun Wang, Gane Ka-Shu Wong, Jun Yu

KaKs_Calculator is a software package that calculates nonsynonymous (Ka) and synonymous (Ks) substitution rates through model selection and model averaging. Existing methods for this estimation each adopt a specific mutation (substitution) model that considers different evolutionary features, which leads to diverse estimates. KaKs_Calculator therefore implements a set of candidate models in a maximum likelihood framework and adopts the Akaike information criterion to measure the fit between models and data, aiming to include as many features as needed to accurately capture evolutionary information in protein-coding sequences. In addition, several existing methods for calculating Ka and Ks are incorporated into the software. KaKs_Calculator, including source code, compiled executables, and documentation, is freely available for academic use at

Page 259–263
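
The model selection and model averaging idea in the KaKs_Calculator abstract above can be illustrated with the standard Akaike formulas: AIC = 2k - 2 ln L for a model with k parameters and maximized likelihood L, and Akaike weights proportional to exp(-Δ/2), where Δ is the difference from the lowest AIC. The sketch below is not taken from the KaKs_Calculator source; the function names are hypothetical, and whether the software applies a small-sample correction is not stated in the abstract.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(aic_values):
    """Relative support for each candidate model; the weights sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-(a - best) / 2) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def model_average(estimates, weights):
    """Weighted mean of per-model estimates (for example, Ka or Ks)."""
    return sum(e * w for e, w in zip(estimates, weights))
```

Model selection picks the candidate with the lowest AIC, while model averaging combines the estimates from all candidates using their Akaike weights, so a parameter-rich model is favored only when its gain in likelihood outweighs the penalty for its extra parameters.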