Articles Online (Volume 5, Issue 3)


A Scenario on the Stepwise Evolution of the Genetic Code

Jing-Fa Xiao, Jun Yu

It is believed that in the RNA world the operational (ribozymes) and the informational (riboscripts) RNA molecules were created with only three (adenosine, uridine, and guanosine) and two (adenosine and uridine) nucleosides, respectively, so that the genetic code started out simple. Ribozymes subsequently evolved the ability to cut and paste themselves, and riboscripts became amenable to rigorous editing (adenosine to inosine); the intensive diversification of RNA molecules shaped novel cellular machineries capable of polymerizing amino acids, a new type of cellular building material for life. Initially, the genetic code, encoding seven amino acids, was created only to distinguish purine from pyrimidine; it was later expanded in a stepwise way to encode 12, 15, and 20 amino acids through the relief of guanine from its roles as operational signals and through the recruitment of cytosine. Therefore, the maturation of the genetic code also coincided with (1) the departure of aminoacyl-tRNA synthetases (AARSs) from the primordial translation machinery, (2) the replacement of informational RNA by DNA, and (3) the co-evolution of AARSs and their cognate tRNAs. This model predicts gradual replacements of RNA-made molecular mechanisms, cellular processes by proteins, and informational exploitation by DNA.

Page 143–151


Evaluating Peptide Mass Fingerprinting-based Protein Identification

Senthilkumar Damodaran, Troy D. Wood, Priyadharsini Nagarajan, Richard A. Rabin

Identification of proteins by mass spectrometry (MS) is an essential step in proteomic studies and is typically accomplished by either peptide mass fingerprinting (PMF) or amino acid sequencing of the peptide. Although sequence information from MS/MS analysis can be used to validate PMF-based protein identification, it may not be practical when analyzing a large number of proteins and when high-throughput MS/MS instrumentation is not readily available. At present, the vast majority of proteomic studies employ PMF. However, there are large disparities in the criteria used to identify proteins using PMF. Therefore, to reduce incorrect protein identification using PMF, and also to increase confidence in PMF-based protein identification without accompanying MS/MS analysis, definitive guiding principles are essential. To this end, we propose a value-based scoring system that provides guidance on evaluating when PMF-based protein identification can be deemed sufficient without accompanying amino acid sequence data from MS/MS analysis.

Page 152–157

Review Article

Model-based Comparative Prediction of Transcription-Factor Binding Motifs in Anabolic Responses in Bone

Andy B. Chen, Kazunori Hamamura, Guohua Wang, Weirong Xing, Subburaman Mohan, Hiroki Yokota, Yunlong Liu

Understanding the regulatory mechanism that controls the alteration of global gene expression patterns continues to be a challenging task in computational biology. We previously developed an ant algorithm, a biologically-inspired computational technique for microarray data, and predicted putative transcription-factor binding motifs (TFBMs) through mimicking interactive behaviors of natural ants. Here we extended the algorithm into a set of web-based software, Ant Modeler, and applied it to investigate the transcriptional mechanism underlying bone formation. Mechanical loading and administration of bone morphogenic proteins (BMPs) are two known treatments to strengthen bone. We addressed a question: Is there any TFBM that stimulates both “anabolic responses of mechanical loading” and “BMP-mediated osteogenic signaling”? Although there is no significant overlap among genes in the two responses, a comparative model-based analysis suggests that the two independent osteogenic processes employ common TFBMs, such as a stress responsive element and a motif for peroxisome proliferator-activated receptor (PPAR). The post-modeling in vitro analysis using mouse osteoblast cells supported involvements of the predicted TFBMs such as PPAR, Ikaros 3, and LMO2 in response to mechanical loading. Taken together, the results would be useful to derive a set of testable hypotheses and examine the role of specific regulators in complex transcriptional control of bone formation.

Page 158–165

Review Article

Reconstruction of Pathways Associated with Amino Acid Metabolism in Human Mitochondria

Purnima Guda, Chittibabu Guda, Shankar Subramaniam

We have used a bioinformatics approach for the identification and reconstruction of metabolic pathways associated with amino acid metabolism in human mitochondria. Human mitochondrial proteins determined by experimental and computational methods have been superposed on the reference pathways from the KEGG database to identify mitochondrial pathways. Enzymes at the entry and exit points for each reconstructed pathway were identified, and mitochondrial solute carrier proteins were determined where applicable. Intermediate enzymes in the mitochondrial pathways were identified based on the annotations available from public databases, evidence in current literature, or our MITOPRED program, which predicts the mitochondrial localization of proteins. Through integration of the data derived from experimental, bibliographical, and computational sources, we reconstructed the amino acid metabolic pathways in human mitochondria, which could help better understand the mitochondrial metabolism and its role in human health.

Page 166–176


Prediction of Protein-Protein Interactions Using Protein Signature Profiling

Mahmood A. Mahdavi, Yen-Han Lin

Protein domains are conserved and functionally independent structures that play an important role in interactions among related proteins. Domain-domain interactions have been recently used to predict protein-protein interactions (PPI). In general, the interaction probability of a pair of domains is scored using a trained scoring function. When the score satisfies a threshold, the protein pairs carrying those domains are regarded as “interacting”. In this study, the signature contents of proteins were utilized to predict PPI pairs in Saccharomyces cerevisiae, Caenorhabditis elegans, and Homo sapiens. Similarity between protein signature patterns was scored and PPI predictions were drawn based on the binary similarity scoring function. Results show that the true positive rate of prediction by the proposed approach is approximately 32% higher than that using the maximum likelihood estimation method when compared with a test set, resulting in a 22% increase in the area under the receiver operating characteristic (ROC) curve. When proteins containing one or two signatures were removed, the sensitivity of the predicted PPI pairs increased significantly. The predicted PPI pairs are on average 11 times more likely to interact than random selection at a confidence level of 0.95, and on average 4 times better than those predicted by either phylogenetic profiling or gene expression profiling.

Page 177–186
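
The abstract above does not define its binary similarity scoring function, so the following is only a rough sketch of the general idea: each protein is represented by the set of signatures it carries, pairs are scored by a set-overlap measure, and pairs above a threshold are called "interacting". The Jaccard index, the threshold value, and the protein and signature identifiers are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations

def signature_similarity(sig_a, sig_b):
    """Jaccard index between two collections of signature identifiers.

    Assumption: used here as a stand-in for the unspecified scoring function.
    """
    a, b = set(sig_a), set(sig_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_pairs(profiles, threshold):
    """Return protein pairs whose signature similarity meets the threshold."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if signature_similarity(profiles[p], profiles[q]) >= threshold
    ]

# Hypothetical proteins and signature identifiers.
profiles = {
    "P1": {"s1", "s2", "s3"},
    "P2": {"s2", "s3"},
    "P3": {"s4"},
}
predicted = predict_pairs(profiles, threshold=0.5)
print(predicted)
```

A set-based score of this kind needs no training data, which is the practical contrast with the trained domain-pair scoring functions mentioned in the abstract.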

Review Article

U7 snRNAs: A Computational Survey

Manja Marz, Axel Mosig, Bärbel M.R. Stadler, Peter F. Stadler

U7 small nuclear RNA (snRNA) sequences have been described only for a handful of animal species in the past. Here we describe a computational search for functional U7 snRNA genes throughout vertebrates including the upstream sequence elements characteristic of snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostomes and non-drosophilid insect genomes based on a combination of sequence, structure, and promoter features. Due to the extremely short sequence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database.

Page 187–195

Review Article

Computational Prediction of Rice (Oryza sativa) miRNA Targets

Sunil Archak, J. Nagaraju

Bioinformatic approaches have complemented experimental efforts to inventory plant miRNA targets. We carried out a global computational analysis of the rice (Oryza sativa) transcriptome to generate a comprehensive list of putative miRNA targets. Our predictions (684 unique transcripts) showed that rice miRNAs mediate regulation of diverse functions including transcription (41%), catalysis (28%), binding (18%), and transporter activity (11%). Among the predicted targets, 61.7% of hits were in coding regions and nearly 72% of targets had a solitary miRNA hit. The study predicted more than 70 novel targets of 34 miRNAs putatively regulating functions like stress-response, catalysis, and binding. It was observed that more than half (55%) of the targets were conserved between O. sativa indica and O. sativa japonica. Members of 31 miRNA families were found to possess conserved targets between rice and at least one other member of the grass family. About 44% of the unique targets were common between two dissimilar miRNA prediction algorithms. Such an extent of cross-species conservation and algorithmic consensus

Page 196–206

Review Article

Construction, Characterization, and Chromosomal Mapping of a Fosmid Library of the White-Cheeked Gibbon (Nomascus leucogenys)

Liping Chen, Jianping Ye, Yan Liu, Jinghuan Wang, Weiting Su, Fengtang Yang, Wenhui Nie

Gibbons have experienced extensive karyotype rearrangements during evolution and represent an ideal model for studying the underlying molecular mechanism of evolutionary chromosomal rearrangements. It is anticipated that the cloning and sequence characterization of evolutionary chromosomal breakpoints will provide vital insights into the molecular force that has driven such a radical karyotype reshuffle in gibbons. We constructed and characterized a high-quality fosmid library of the white-cheeked gibbon (Nomascus leucogenys) containing 192,000 non-redundant clones with an average insert size of 38 kb and 2.5-fold genome coverage. By end sequencing of 100 randomly selected fosmid clones, we generated 196 sequence tags for the library. These end-sequenced fosmid clones were then mapped onto the chromosomes of the white-cheeked gibbon by fluorescence in situ hybridization, and no spurious chimeric clone was detected. BLAST search against the human genome showed a good correlation between the number of hit clones and the number of chromosomes, an indication of unbiased chromosomal distribution of the fosmid library. The chromosomal distribution of the mapped clones is also consistent with the BLAST search result against human and white-cheeked gibbon genomes. The fosmid library and the mapped clones will serve as a valuable resource for further studying gibbons' chromosomal rearrangements and the underlying molecular mechanism as well as for comparative genomic study in the lesser apes.

Page 207–215
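
The library figures quoted in the abstract above (192,000 clones, 38 kb average insert, 2.5-fold coverage) are linked by the usual relation coverage = clones × insert size / genome size. As a quick consistency check, the sketch below rearranges that relation to show the genome size the three numbers imply; the function name is introduced here for illustration only.

```python
def implied_genome_size_kb(num_clones, mean_insert_kb, fold_coverage):
    """Genome size (kb) implied by coverage = clones * insert / genome size."""
    return num_clones * mean_insert_kb / fold_coverage

# Figures taken from the abstract.
total_insert_kb = 192_000 * 38
genome_kb = implied_genome_size_kb(192_000, 38, 2.5)

print(total_insert_kb)   # 7296000 kb of cloned DNA in total
print(genome_kb / 1e6)   # implied genome size in Gb
```

The result, a little over 2.9 Gb, is what the stated coverage presupposes rather than a figure given in the abstract.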

Review Article

Transcriptome and Proteome Expressions Involved in Insulin Resistance in Muscle and Activated T-Lymphocytes of Patients with Type 2 Diabetes

Frankie B. Stentz, Abbas E. Kitabchi

We analyzed the genes expressed (transcriptomes) and the proteins translated (proteomes) in muscle tissues and activated CD4+ and CD8+ T-lymphocytes (T-cells) of five Type 2 diabetes (T2DM) subjects using Affymetrix microarrays and mass spectrometry, and compared them with matched non-diabetic controls. Gene expression of insulin receptor (INSR), vitamin D receptor, insulin degrading enzyme, Akt, insulin receptor substrate-1 (IRS-1), IRS-2, glucose transporter 4 (GLUT4), and enzymes of the glycolytic pathway was at least 50% lower in T2DM than in controls. However, there was greater than two-fold upregulation of the genes for plasma cell glycoprotein-1, tumor necrosis factor α (TNFα), and gluconeogenic enzymes in T2DM relative to controls. Gene silencing of INSR or TNFα resulted in the inhibition or stimulation of GLUT4, respectively. Proteome profiles corresponding to molecular weights of the above translated transcriptomes showed different patterns of changes between T2DM and controls. Meanwhile, changes in transcriptomes and proteomes between muscle and activated T-cells of T2DM were comparable. Activated T-cells, analogous to muscle cells, expressed insulin signaling and glucose metabolism genes and gene products. In conclusion, T-cells and muscle in T2DM exhibited differences in expression of certain genes and gene products relative to non-diabetic controls. These alterations in transcriptomes and proteomes in T2DM may be involved in insulin resistance.

Page 216–235

Brief Report

Mycobacterial PE_PGRS Proteins Contain Calcium-Binding Motifs with Parallel β-roll Folds

Nandita Bachhawat, Balvinder Singh

The PE_PGRS family of proteins unique to mycobacteria is demonstrated to contain multiple calcium-binding and glycine-rich sequence motifs GGXGXD/NXUX. This sequence repeat constitutes a calcium-binding parallel β-roll or parallel β-helix structure and is found in RTX toxins secreted by many Gram-negative bacteria. It is predicted that the highly homologous PE_PGRS proteins containing multiple copies of the nona-peptide motif could fold into similar calcium-binding structures. The implication of the predicted calcium-binding property of PE_PGRS proteins in the light of macrophage-pathogen interaction and pathogenesis is presented.

Page 236–241


A Modified T-test Feature Selection Method and Its Application on the HapMap Genotype Data

Nina Zhou, Lipo Wang

Single nucleotide polymorphisms (SNPs) are genetic variations that determine the differences between any two unrelated individuals. Various population groups can be distinguished from each other using SNPs. For instance, the HapMap dataset has four population groups with about ten million SNPs. For more insights on human evolution, ethnic variation, and population assignment, we propose to find out which SNPs are significant in determining the population groups and then to classify different populations using these relevant SNPs as input features. In this study, we developed a modified t-test ranking measure and applied it to the HapMap genotype data. Firstly, we rank all SNPs in comparison with other feature importance measures including F-statistics and the informativeness for assignment. Secondly, we select different numbers of the most highly ranked SNPs as the input to a classifier, such as the support vector machine, so as to find the best feature subset corresponding to the best classification accuracy. Experimental results showed that the proposed method is very effective in finding SNPs that are significant in determining the population groups, with reduced computational burden and better classification accuracy.

Page 242–249
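
The abstract above does not give the formula for its modified t-test measure, so the sketch below shows only the general ranking idea with the standard Welch two-sample t-statistic, restricted to two groups. The 0/1/2 genotype coding, the SNP names, and the toy data are assumptions for illustration; the authors' modification and their four-population setting are not reproduced.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Standard Welch two-sample t-statistic for two lists of genotype codes."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group variation: perfectly separated or identical groups.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se

def rank_snps(group_a, group_b):
    """Rank SNP names by |t|, most discriminating first.

    group_a / group_b map each SNP name to per-individual genotype codes.
    """
    scores = {snp: abs(welch_t(group_a[snp], group_b[snp])) for snp in group_a}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical genotypes coded as minor-allele counts (0, 1, or 2).
group_a = {"snp1": [0, 0, 1, 0], "snp2": [1, 2, 1, 0]}
group_b = {"snp1": [2, 2, 1, 2], "snp2": [1, 1, 2, 0]}
ranking = rank_snps(group_a, group_b)
print(ranking)
```

In a workflow like the one the abstract describes, the top-ranked SNPs from such a list would then be passed to a classifier, with the subset size chosen by classification accuracy.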

Application Note

Oxypred: Prediction and Classification of Oxygen-Binding Proteins

S. Muthukrishnan, Aarti Garg, G.P.S. Raghava

This study describes a method for predicting and classifying oxygen-binding proteins. Firstly, support vector machine (SVM) modules were developed using amino acid composition and dipeptide composition for predicting oxygen-binding proteins, and achieved maximum accuracy of 85.5% and 87.8%, respectively. Secondly, an SVM module was developed based on amino acid composition, classifying the predicted oxygen-binding proteins into six classes with accuracy of 95.8%, 97.5%, 97.5%, 96.9%, 99.4%, and 96.0% for erythrocruorin, hemerythrin, hemocyanin, hemoglobin, leghemoglobin, and myoglobin proteins, respectively. Finally, an SVM module was developed using dipeptide composition for classifying the oxygen-binding proteins, and achieved maximum accuracy of 96.1%, 98.7%, 98.7%, 85.6%, 99.6%, and 93.3% for the above six classes, respectively. All modules were trained and tested by five-fold cross validation. Based on the above approach, a web server Oxypred was developed for predicting and classifying oxygen-binding proteins (available from

Page 250–252
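
The two feature types named in the Oxypred abstract above have standard definitions: amino acid composition is the fraction of each of the 20 standard residues in a sequence, and dipeptide composition is the fraction of each of the 400 ordered residue pairs among adjacent positions. The sketch below computes both; the function names and example sequence are illustrative, and it is an assumption that fractions (rather than percentages) are used and that sequences have at least two residues. The SVM training itself is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """20-element vector: fraction of each standard residue in the sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400-element vector: fraction of each ordered pair of adjacent residues."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {}
    for pair in pairs:
        counts[pair] = counts.get(pair, 0) + 1
    return [counts.get(a + b, 0) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]

# Hypothetical short sequence, for illustration only.
aac = amino_acid_composition("ACAC")
dpc = dipeptide_composition("ACAC")
print(len(aac), len(dpc))  # 20 400
```

Dipeptide composition captures some local order information that plain composition discards, at the cost of a 20-fold larger feature vector.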

Application Note

FASMA: A Service to Format and Analyze Sequences in Multiple Alignments

Susan Costantini, Giovanni Colonna, Angelo M. Facchiano

Multiple sequence alignments are successfully applied in many studies for understanding the structural and functional relations among single nucleic acids and protein sequences as well as whole families. Because of the rapid growth of sequence databases, multiple sequence alignments can often be very large and difficult to visualize and analyze. We offer a new service aimed to visualize and analyze the multiple alignments obtained with different external algorithms, with new features useful for the comparison of the aligned sequences as well as for the creation of a final image of the alignment. The service is named FASMA and is available at

Page 253–255

Application Note

Automated SNP Genotype Clustering Algorithm to Improve Data Completeness in High-Throughput SNP Genotyping Datasets from Custom Arrays

Edward M. Smith, Jack Littrell, Michael Olivier

High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.

Page 256–259
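
To put the gain reported in the abstract above in proportion, the sketch below relates the added genotype calls to the total number of possible calls. It assumes that "3K" means exactly 3,000 SNPs and treats 45,000 as a lower bound ("over 45,000"), so the result is only a rough estimate.

```python
snps = 3_000      # assumption: "3K array" read as exactly 3,000 SNPs
samples = 1_560   # from the abstract
added = 45_000    # "over 45,000" additional genotypes, used as a lower bound

possible = snps * samples
gain_pct = 100 * added / possible

print(possible)            # 4680000
print(round(gain_pct, 2))  # 0.96
```

Under these assumptions the second algorithm recovers roughly one additional percentage point of all possible genotype calls.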