Articles Online (Volume 8, Issue 1)


Host Proteome Research in HIV Infection

Lijun Zhang, Xiaojun Zhang, Qing Ma, Honghao Zhou

Proteomics has been widely used in recent years to search for new biomarkers and to decipher the mechanisms of HIV–host interaction. Herein, we review recent developments in HIV/AIDS proteomic research, including the samples used in HIV/AIDS-related research, the technologies used for proteomic study, diagnostic biomarkers of HIV-associated diseases (especially HIV-associated neurocognitive impairment), and the mechanisms of HIV–host interaction, HIV-associated dementia, substance abuse, and so on. At the end of this review, we also discuss the limitations of HIV/AIDS proteomic research and prospects for its improvement.

Page 1–9


Characterization of Evolutionarily Conserved MicroRNAs in Amphioxus

Lei Wang, Lan Jiang, Songnian Hu, Yejun Wang

Amphioxus is an extant species closest to the ancestry of vertebrates. Observing the distribution of microRNAs (miRNAs) in amphioxus can offer hints for evolutionary research on vertebrates. In this study, using the publicly available scaffold data of the Florida amphioxus (Branchiostoma floridae) genome, we screened and characterized homologs of miRNAs that had been identified in other species. In total, 68 such homologs were obtained and classified into 33 families. Most of these miRNAs were distributed as clusters in the genome. Inter-species comparison showed that many miRNAs previously thought to be vertebrate- or mammal-specific were also present in amphioxus, as were some miRNAs previously considered protostome-specific. Compared with ciona, amphioxus had an apparent miRNA gene expansion, but phylogenetic analysis showed that the duplicated miRNAs or clusters of amphioxus had a higher homology level than the duplicated ones in vertebrates.

Page 10–21


Study of Completed Archaeal Genomes and Proteomes: Hypothesis of Strong Mutational AT Pressure Existed in Their Common Predecessor

Vladislav V. Khrustalev, Eugene V. Barkovsky

The number of completely sequenced archaeal genomes is now sufficient for a large-scale bioinformatic study. We have conducted analyses for each coding region from 36 archaeal genomes using the original CGS algorithm by calculating the total GC content (G+C), GC content in first, second and third codon positions as well as in fourfold and twofold degenerate sites from third codon positions, levels of arginine codon usage (Arg2: AGA/G; Arg4: CGX), levels of amino acid usage and the entropy of amino acid content distribution. In archaeal genomes with strong GC pressure, arginine is coded preferably by GC-rich Arg4 codons, whereas in most archaeal genomes with G+C<0.6, arginine is coded preferably by AT-rich Arg2 codons. In the genome of Haloquadratum walsbyi, which is closely related to GC-rich archaea, GC content has decreased mostly in third codon positions, while the Arg4>>Arg2 bias still persists. Proteomes of archaeal species carry characteristic amino acid biases: levels of isoleucine and lysine are elevated, while levels of alanine, histidine, glutamine and cysteine are relatively decreased. Numerous observed genomic and proteomic biases can be explained by the hypothesis that strong mutational AT pressure previously existed in the common predecessor of all archaea.

Page 22–32
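The per-gene quantities named in the abstract above (total G+C, GC content at each codon position, and Arg2 versus Arg4 codon counts) can be illustrated with a minimal Python sketch. It is not the CGS algorithm itself: the function name `codon_stats` and its return format are assumptions made for illustration, and degenerate-site GC content and amino acid entropy are left out.

```python
from collections import Counter

ARG2 = {"AGA", "AGG"}                # AT-rich arginine codons (Arg2: AGA/G)
ARG4 = {"CGA", "CGC", "CGG", "CGT"}  # GC-rich arginine codons (Arg4: CGX)


def codon_stats(cds):
    """Return GC content (total and per codon position) and Arg2/Arg4 counts.

    `cds` is assumed to be an in-frame coding sequence; trailing bases that
    do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")

    # Fraction of G or C at codon positions 1, 2 and 3.
    gc_by_pos = []
    for pos in range(3):
        bases = [codon[pos] for codon in codons]
        gc_by_pos.append(sum(base in "GC" for base in bases) / len(bases))

    counts = Counter(codons)
    return {
        # Each position contributes the same number of bases, so the mean of
        # the three positional values equals the overall GC fraction.
        "gc_total": sum(gc_by_pos) / 3,
        "gc1": gc_by_pos[0],
        "gc2": gc_by_pos[1],
        "gc3": gc_by_pos[2],
        "arg2": sum(counts[codon] for codon in ARG2),
        "arg4": sum(counts[codon] for codon in ARG4),
    }
```

Calling `codon_stats` on every coding region of a genome and comparing the `arg2` and `arg4` totals would give a rough view of the arginine codon preference described in the abstract.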


Whole-Cell Protein Identification Using the Concept of Unique Peptides

Yupeng Zhao, Yen-Han Lin

A concept of unique peptides (CUP) was proposed and implemented to identify whole-cell proteins from tandem mass spectrometry (MS/MS) ion spectra. A unique peptide is defined as a peptide, irrespective of its length, that exists in only one protein of a proteome of interest, even though it may appear more than once in that protein. Integrating CUP, a two-step whole-cell protein identification strategy was developed to further increase the confidence of identified proteins. A dataset containing 40,243 MS/MS ion spectra of Saccharomyces cerevisiae and protein identification tools including Mascot and SEQUEST were used to illustrate the proposed concept and strategy. Without CUP, the number of proteins identified by SEQUEST was 2.26-fold that identified by Mascot. When CUP was applied, the number of proteins bearing unique peptides identified by SEQUEST was 3.89-fold that identified by Mascot. Cross-comparison of the two sets of identified proteins found only 89 common proteins derived from CUP. The key discrepancy between the identified proteins resulted from the filtering criteria employed by each protein identification tool. According to the origin of peptides classified by CUP and the commonality of proteins recognized by the protein identification tools, all identified proteins were cross-compared, resulting in four groups of proteins with different levels of assigned confidence.

Page 33–41
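The definition of a unique peptide given in the abstract above can be sketched in a few lines of Python. This is only an illustration of the definition, not the authors' implementation: the function name `unique_peptides`, the dictionary layout of the proteome, and the plain substring search are all assumptions.

```python
def unique_peptides(peptides, proteome):
    """Map each peptide found in exactly one protein to that protein's ID.

    `proteome` is assumed to be a dict of protein ID -> amino acid sequence.
    A peptide that occurs several times within the same protein still counts
    as unique, matching the definition quoted in the abstract.
    """
    result = {}
    for peptide in peptides:
        # Substring search: collect every protein that contains the peptide.
        hits = [pid for pid, seq in proteome.items() if peptide in seq]
        if len(hits) == 1:
            result[peptide] = hits[0]
    return result


# Toy proteome with made-up sequences, for illustration only.
toy_proteome = {
    "P1": "MKTAYIAKQRMKTAY",
    "P2": "GGSAKQRLL",
    "P3": "MSTNPKPQR",
}
toy_result = unique_peptides(["MKTAY", "AKQR", "NPKPQ", "WWW"], toy_proteome)
```

Here `MKTAY` is unique to `P1` although it occurs twice there, `AKQR` is shared by `P1` and `P2` and is therefore dropped, and `WWW` matches nothing.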

Review Article

In silico Analysis of Sequential, Structural and Functional Diversity of Wheat Cystatins and Its Implication in Plant Defense

Shriparna Dutt, V.K. Singh, Soma S. Marla, Anil Kumar

Phytocystatins constitute a multigene family that regulates the activity of endogenous and/or exogenous cysteine proteinases. Cereal crops like wheat are continuously threatened by a multitude of pathogens; cystatins may therefore play a pivotal role in determining the plant response. To study why various cystatins need diverse specificities and activities, we conducted a comparative analysis of six wheat cystatins (WCs) with twelve rice, seven barley, one sorghum and ten corn cystatin sequences employing different bioinformatics tools. The results identified highly conserved signature sequences in all the cystatins considered. Several other motifs were also identified, based on which the sequences could be categorized into groups in congruence with the phylogenetic clustering. Homology modeling of WCs revealed a 3D structural topology that is well shared with other cystatins. Protein–protein interaction of WCs with papain supported the notion that functional diversity is a consequence of existing differences in amino acid residues in highly conserved as well as relatively less conserved motifs. Thus there is significant conservation at the sequential and structural levels; however, concomitant variations maintain the functional diversity in this protein family, which constantly modulates itself to reciprocate the diversity while counteracting the cysteine proteinases.

Page 42–56


Quality Assessment of Transcriptome Data Using Intrinsic Statistical Properties

Guillaume Brysbaert, François-Xavier Pellay, Sebastian Noth, Arndt Benecke

In view of potential applications to biomedical diagnosis, tight quality control of transcriptome data is compulsory. Usually, quality control is achieved using labeling and hybridization controls added at different stages throughout the processing of the biological RNA samples. These control measures, however, only reflect the performance of the individual technical manipulations during the entire process and have no bearing on the continued integrity of the RNA sample itself. Here we demonstrate that intrinsic statistical properties of the resulting transcriptome data signal and signal-variance distributions and their invariance can be identified independently of the animal species studied and the labeling protocol used. From these invariant properties we have developed a data model, the parameters of which can be estimated from individual experiments and used to compute relative quality measures based on similarity with large reference datasets. These quality measures add supplementary, non-redundant information to standard quality control estimates based on spike-in and hybridization controls, and are exploitable in data analysis. A software application for analyzing datasets as well as a reference dataset for AB1700 arrays are provided. They should allow AB1700 users to easily integrate this method into their analysis pipeline, and might instigate similar developments for other transcriptome platforms.

Page 57–71


A Modified Enrichment Method to Construct Microsatellite Library from Plateau Pika Genome (Ochotona curzoniae)

Jianing Geng, Kexin Li, Yanming Zhang, Songnian Hu

A microsatellite-enriched library of plateau pika (Ochotona curzoniae) was constructed based on the strong affinity between biotin and streptavidin. First, genomic DNA was fragmented by ultrasonication, which is a major improvement over traditional methods. Linker-ligated DNA fragments were hybridized with biotinylated microsatellite probes and then captured with streptavidin-coated magnetic beads. PCR amplification was performed to obtain double-stranded DNA fragments containing microsatellites. Ligation and transformation were carried out using the pGEM-T Vector System I and Escherichia coli DH10B competent cells. Sequencing results showed that 80.2% of clones contained a microsatellite repeat motif. Several modifications make this protocol more time-efficient and technically easier than the traditional ones; in particular, the composition and relative abundance of microsatellite repeats in the plateau pika genome were truly represented through the optimized PCR conditions. This method has also been successfully applied to construct microsatellite-enriched genomic libraries of Chinese hamster (Cricetulus griseus) and small abalone [Haliotis diversicolor (Reeve)] with high rates of positive clones, demonstrating its feasibility and stability.

Page 72–76

Application Note

KaKs_Calculator 2.0: A Toolkit Incorporating Gamma-Series Methods and Sliding Window Strategies

Dapeng Wang, Yubin Zhang, Zhang Zhang, Jiang Zhu, Jun Yu

We present an integrated stand-alone software package named KaKs_Calculator 2.0 as an updated version. It incorporates 17 methods for the calculation of nonsynonymous and synonymous substitution rates; among them, we added our modified versions of several widely used methods as the gamma series, including γ-NG, γ-LWL, γ-MLWL, γ-LPB, γ-MLPB, γ-YN and γ-MYN, which have been demonstrated to perform better under certain conditions than their original forms and were not implemented in the previous version. The package can readily be used to identify positively selected sites based on a sliding window across the sequences of interest in the 5' to 3' direction of protein-coding sequences, and has improved the overall performance of sequence analysis for evolution studies. A toolbox, including C++ and Java source code and executable files for both Windows and Linux platforms together with user instructions, is downloadable from the website for academic purposes at

Page 77–80
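The sliding-window strategy mentioned in the KaKs_Calculator 2.0 abstract can be sketched in Python as follows. This is not the package's code or interface: the names `sliding_windows` and `codon_differences`, the window and step parameters, and the simple codon-difference count (a stand-in for a real Ka/Ks estimate) are all assumptions made for illustration.

```python
def sliding_windows(seq_a, seq_b, window=2, step=1):
    """Yield (start_codon, sub_a, sub_b) for windows measured in codons.

    The two sequences are assumed to be codon-aligned, gap-free, of equal
    length, and read in the 5' to 3' direction.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("expected aligned sequences whose length is a multiple of 3")
    n_codons = len(seq_a) // 3
    for start in range(0, n_codons - window + 1, step):
        lo, hi = 3 * start, 3 * (start + window)
        yield start, seq_a[lo:hi], seq_b[lo:hi]


def codon_differences(sub_a, sub_b):
    """Count codons that differ between two aligned windows.

    Placeholder statistic only; a real analysis would estimate nonsynonymous
    and synonymous substitution rates for each window instead.
    """
    return sum(sub_a[i:i + 3] != sub_b[i:i + 3] for i in range(0, len(sub_a), 3))


# Toy sequences, for illustration only.
toy_profile = [
    (start, codon_differences(a, b))
    for start, a, b in sliding_windows("ATGAAACCCGGG", "ATGAAGCCCGGG")
]
```

Replacing `codon_differences` with a per-window rate estimate would give a profile along the gene in which windows with elevated nonsynonymous change stand out.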