Articles Online (Volume 10, Issue 3)

Original Research

CDS: A Fold-change Based Statistical Test for Concomitant Identification of Distinctness and Similarity in Gene Expression Analysis

Nicolas Tchitchek, José Felipe Golib Dzib, Brice Targat, Sebastian Noth, Arndt Benecke, Annick Lesne

The problem of identifying differential activity, such as in gene expression, is a major challenge in biostatistics and bioinformatics. Equally important, though much less frequently studied, is the question of similar activity from one biological condition to another. The fold-change, or ratio, is usually considered a relevant criterion for stating difference and similarity between measurements. Importantly, no statistical method for the concomitant evaluation of similarity and distinctness currently exists for biological applications. Modern microarray, digital PCR (dPCR), and Next-Generation Sequencing (NGS) technologies frequently provide a means of estimating the coefficient of variation of individual measurements. Using the fold-change, and assuming that measurements are normally distributed with known variances, we designed a novel statistical test that detects differentially and similarly expressed genes concomitantly, that is, within the same formalism. Given two sets of gene measurements in different biological conditions, the probabilities of making type I and type II errors in stating that a gene is differentially or similarly expressed from one condition to the other can be calculated. Furthermore, a confidence interval for the fold-change can be delineated. Finally, we demonstrate that the assumption of normality can be relaxed, with arbitrary distributions handled numerically. The Concomitant evaluation of Distinctness and Similarity (CDS) statistical test correctly estimates similarities and differences between measurements of gene expression. The implementation is time and memory efficient, which allows the CDS test to be used in high-throughput data analysis such as microarray, dPCR, and NGS experiments.
Importantly, the CDS test can be applied to the comparison of single measurements (N = 1) provided the variance (or coefficient of variation) of the signals is known, which makes CDS valuable also in biomedical analysis, where typically only a single measurement per subject is available.

Page 127–135
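The idea of testing a fold-change threshold on normally distributed signals with known variances can be illustrated with a minimal sketch. It is not the published CDS implementation: it assumes two independent normal signals centered on the observed values, with standard deviations derived from known coefficients of variation, a positive reference signal, and a hypothetical threshold f = 2. It relies on the fact that Y − fX is itself normal, so P(Y/X > f) = P(Y − fX > 0). The function names are illustrative only, and, as in the single-measurement case described above, each condition is represented by one value.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fold_change_probabilities(x: float, y: float, cv_x: float, cv_y: float,
                              f: float = 2.0) -> tuple[float, float, float]:
    """Hypothetical sketch, not the CDS implementation.

    Assumes X ~ N(x, (cv_x*x)^2) and Y ~ N(y, (cv_y*y)^2) are independent,
    with X > 0, so that Y/X > f is equivalent to Y - f*X > 0 for f > 1.
    Returns (P(up), P(down), P(similar)), where "similar" means 1/f <= Y/X <= f.
    """
    if f <= 1.0:
        raise ValueError("fold-change threshold f must be greater than 1")
    if x <= 0 or y <= 0 or cv_x <= 0 or cv_y <= 0:
        raise ValueError("signals and coefficients of variation must be positive")
    sd_x, sd_y = cv_x * x, cv_y * y
    # Y - f*X is normal with mean y - f*x and variance sd_y^2 + f^2 * sd_x^2.
    p_up = normal_cdf((y - f * x) / math.hypot(sd_y, f * sd_x))
    # Y - X/f is normal; "down" (Y/X < 1/f) means it is negative.
    p_down = normal_cdf(-(y - x / f) / math.hypot(sd_y, sd_x / f))
    # The two tails are disjoint under the X > 0 assumption; clamp rounding.
    p_similar = max(0.0, 1.0 - p_up - p_down)
    return p_up, p_down, p_similar


# Usage: a fourfold increase versus a 10% change, both with 10% CV and N = 1.
print(fold_change_probabilities(100.0, 400.0, 0.1, 0.1))
print(fold_change_probabilities(100.0, 110.0, 0.1, 0.1))
```

The linear-combination trick avoids the awkward distribution of a ratio of normals; it is only a good approximation when the reference signal has negligible probability of being at or below zero, that is, when its coefficient of variation is small.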

Original Research

Microarray Analysis of Ageing-related Signatures and Their Expression in Tumors Based on a Computational Biology Approach

Xiaosheng Wang

Ageing and cancer have been associated with genetic and genomic changes. The identification of common signatures between ageing and cancer can reveal the shared molecular mechanisms underlying them. In this study, we collected ageing-related gene signatures from ten published studies involving six different human tissues and from an online resource. We found that most of these gene signatures were tissue-specific, whereas a few were related to multiple tissues. We performed a genome-wide examination of the expression of these signatures in various human tumor types and found that a large proportion of them were universally differentially expressed between normal and tumor phenotypes. Functional analyses of the highly overlapping genes between ageing and cancer, using the DAVID tools, identified important functional categories and pathways linking ageing with cancer. The convergent and divergent mechanisms of ageing and cancer are discussed. This study provides insights into the biology of ageing and cancer and suggests possible interventions aimed at postponing ageing and preventing cancer.

Page 136–141
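Distinguishing tissue-specific from multi-tissue signature genes can be framed as counting, for each gene, how many tissue signatures contain it. The sketch below is a minimal illustration under that assumption, with signatures given as sets of gene identifiers per tissue; the function names and the toy gene and tissue labels are hypothetical placeholders, not data or code from the study.

```python
from collections import Counter


def tissue_breadth(signatures: dict[str, set[str]]) -> Counter:
    """Count, for each gene, how many tissue signatures contain it."""
    breadth = Counter()
    for genes in signatures.values():
        breadth.update(set(genes))  # a gene counts at most once per tissue
    return breadth


def split_specific_shared(signatures: dict[str, set[str]]):
    """Split genes into tissue-specific (one tissue) and shared (several)."""
    breadth = tissue_breadth(signatures)
    specific = sorted(g for g, n in breadth.items() if n == 1)
    shared = sorted(g for g, n in breadth.items() if n > 1)
    return specific, shared


# Hypothetical toy signatures (placeholder labels, not study data).
toy = {
    "tissue_1": {"GENE_A", "GENE_B"},
    "tissue_2": {"GENE_B", "GENE_C"},
    "tissue_3": {"GENE_D"},
}
print(split_specific_shared(toy))
```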

Original Research

B-cell Ligand Processing Pathways Detected by Large-scale Comparative Analysis

Fadi Towfic, Shakti Gupta, Vasant Honavar, Shankar Subramaniam

The initiation of B-cell ligand recognition is a critical step in the generation of an immune response against foreign bodies. We sought to identify the biochemical pathways involved in the B-cell ligand recognition cascade and the sets of ligands that trigger similar immunological responses. We used several comparative approaches to analyze the gene coexpression networks generated from a set of microarray experiments spanning 33 different ligands. First, we compared the degree distributions of the generated networks. Second, we used a pairwise network alignment algorithm, BiNA, to align the networks based on their hubs. Third, we aligned the networks based on a set of KEGG pathways. We summarized our results by constructing a consensus hierarchy of pathways involved in B-cell ligand recognition. The resulting pathways were further validated against the literature for their common physiological responses. Collectively, the results of our comparative analyses of degree distributions, alignment of hubs, and alignment based on KEGG pathways provide a basis for the molecular characterization of the immune response states of B-cells and demonstrate the power of comparative approaches (e.g., gene coexpression network alignment algorithms) in elucidating the biochemical pathways involved in complex signaling events in cells.

Page 142–152
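Comparing degree distributions presupposes a coexpression network. One common construction, assumed here rather than taken from the article, links two genes when the absolute Pearson correlation of their expression profiles reaches a threshold. The threshold of 0.9, the total-variation distance used to compare two distributions, and all function names are hypothetical choices; the BiNA hub alignment and the KEGG-based alignment are not reproduced.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def coexpression_degrees(profiles, threshold=0.9):
    """Degree of each gene in a network linking genes with |r| >= threshold."""
    degree = {gene: 0 for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


def degree_distribution(degree):
    """Fraction of genes having each degree k."""
    counts = Counter(degree.values())
    return {k: counts[k] / len(degree) for k in sorted(counts)}


def total_variation(p, q):
    """Total-variation distance between two degree distributions (0 = identical)."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Usage with four hypothetical genes measured in four samples.
toy = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],  # perfectly correlated with g1
    "g3": [4, 3, 2, 1],  # perfectly anti-correlated with g1
    "g4": [1, 3, 2, 4],  # r = 0.8 with g1, below the threshold
}
dist = degree_distribution(coexpression_degrees(toy))
print(dist, total_variation(dist, dist))
```

With one network per ligand, pairwise distances of this kind could then be used to group ligands whose networks have similar topology.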

Original Research

Overexpression of Annexin A2 Is Associated with Abnormal Ubiquitination in Breast Cancer

Shishan Deng, Baoqian Jing, Tianyong Xing, Lingmi Hou, Zhengwei Yang

Abnormal expression of annexin A2 contributes to the metastasis and infiltration of cancer cells. To elucidate the cause of this abnormal expression, Western blotting, immunoproteomics, and immunohistochemical staining were performed to analyze differentially ubiquitinated proteins between fresh breast cancer tissue and adjacent normal breast tissue from five female volunteers. We detected a ubiquitinated protein that was up-regulated in the cancer tissue, which was further identified as annexin A2 by mass spectrometry. These results suggest that abnormal ubiquitination and/or degradation of annexin A2 may lead to its presence at high levels, which may further promote the metastasis and infiltration of breast cancer cells.

Page 153–157

Original Research

Computational Analysis of Position-dependent Disorder Content in DisProt Database

Jovana J. Kovačević

A bioinformatics analysis of the disorder content of proteins from the DisProt database has been performed with respect to the position of disordered residues. Each protein chain was divided into three parts: N- and C-terminal parts, each containing 30 amino acid (AA) residues, and a middle region containing the remaining AA residues. The results show that the terminal parts hold a higher percentage of disordered AA residues than of all AA residues (17% of disordered AA residues versus 11% of all AA residues). We analyzed the percentage of disorder for each of the 20 AA residues in the three parts of proteins with respect to their hydropathy and molecular weight. For each AA, the percentage of disorder is lower in the middle part than in the terminal parts, where it is comparable at the two termini. A new scale of AAs has been introduced according to their disorder content in the middle part of proteins: CIFWMLYHRNVTAGQDSKEP. All large hydrophobic AAs are less frequently disordered, whereas almost all small hydrophilic AAs are more frequently disordered. The results may be useful for constructing and improving predictors of protein disorder.

Page 158–165
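The positional partition lends itself to a short sketch. It assumes each DisProt chain is available as a sequence string plus a per-residue disorder mask, and it skips chains of 60 residues or fewer, which have no middle region; how the article handled such chains is not stated, so that rule, like the function names and the toy chain, is hypothetical. Sorting amino acids by their middle-region disorder fraction yields a scale of the same kind as the one reported.

```python
from collections import defaultdict

REGIONS = ("N", "middle", "C")


def region_of(i, length, terminal_len=30):
    """Assign residue index i to the N-terminal, middle, or C-terminal part."""
    if i < terminal_len:
        return "N"
    if i >= length - terminal_len:
        return "C"
    return "middle"


def positional_disorder(chains, terminal_len=30):
    """Tally [all, disordered] residue counts per region and per (AA, region).

    chains: iterable of (sequence, mask) pairs, mask[i] True if residue i is
    disordered.
    """
    totals = {r: [0, 0] for r in REGIONS}
    per_aa = defaultdict(lambda: [0, 0])
    for seq, mask in chains:
        if len(seq) != len(mask):
            raise ValueError("sequence and mask lengths differ")
        if len(seq) <= 2 * terminal_len:
            continue  # hypothetical rule: no middle region, so skip the chain
        for i, (aa, disordered) in enumerate(zip(seq, mask)):
            region = region_of(i, len(seq), terminal_len)
            for counts in (totals[region], per_aa[(aa, region)]):
                counts[0] += 1
                counts[1] += int(disordered)
    return totals, per_aa


def terminal_shares(totals):
    """Share of all residues and of disordered residues lying in the termini."""
    all_res = sum(n for n, _ in totals.values())
    dis_res = sum(d for _, d in totals.values())
    if all_res == 0:
        raise ValueError("no chains long enough to analyse")
    term_all = totals["N"][0] + totals["C"][0]
    term_dis = totals["N"][1] + totals["C"][1]
    return term_all / all_res, (term_dis / dis_res if dis_res else 0.0)


def disorder_scale(per_aa, region="middle"):
    """Amino acids ordered from least to most disordered within one region."""
    pct = {aa: d / n for (aa, r), (n, d) in per_aa.items() if r == region and n}
    return "".join(sorted(pct, key=pct.get))


# Usage with one hypothetical 70-residue chain and one chain too short to use.
seq = "P" * 30 + "CCCCCPPPPP" + "P" * 30
mask = [i < 10 for i in range(30)] + [False] * 5 + [True] * 5 + [False] * 30
totals, per_aa = positional_disorder([(seq, mask), ("ACDE", [True] * 4)])
print(totals, terminal_shares(totals), disorder_scale(per_aa))
```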

Original Research

A Modified Statistically Optimal Null Filter Method for Recognizing Protein-coding Regions

Lei Zhang, Fengchun Tian, Shiyuan Wang

Computer-aided prediction of protein-coding genes in uncharacterized genomic DNA sequences is one of the most important issues in biological signal processing. A modified filter method based on statistically optimal null filter (SONF) theory is proposed for recognizing protein-coding regions. The square deviation gain (SDG) between the input and output of the model is used to identify coding regions. An effective SDG amplification model with Class I and Class II enhancement is designed to suppress non-coding regions. An evaluation algorithm was also used to compare the modified model with most currently available gene prediction methods in terms of sensitivity, specificity, and precision. Performance in identifying protein-coding regions was evaluated at the nucleotide level using benchmark datasets, yielding a sensitivity of 91.4%, a specificity of 96%, and a precision of 93.7%. These results suggest that the proposed model is potentially useful in gene finding and can help recognize protein-coding regions with higher precision and speed than present algorithms.

Page 166–173
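Nucleotide-level evaluation compares a predicted coding or non-coding label for every base against the annotation. The sketch below assumes coding regions are given as 0-based half-open intervals and uses the textbook definitions Sn = TP/(TP+FN), Sp = TN/(TN+FP), and precision = TP/(TP+FP). Gene-finding benchmarks sometimes use "specificity" for TP/(TP+FP) instead, so the article's exact definitions may differ. The function names are hypothetical, and the SONF filter and SDG model themselves are not reproduced.

```python
def coding_mask(length, intervals):
    """Per-nucleotide booleans: True inside any [start, end) coding interval."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(max(0, start), min(length, end)):
            mask[i] = True
    return mask


def nucleotide_metrics(true_mask, pred_mask):
    """Sensitivity, specificity, and precision computed over individual bases."""
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must cover the same sequence")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(true_mask, pred_mask):
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Usage: a 10-bp toy sequence, annotated coding region [2, 6), predicted [3, 8).
truth = coding_mask(10, [(2, 6)])
pred = coding_mask(10, [(3, 8)])
print(nucleotide_metrics(truth, pred))
```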