Articles Online (Volume 12, Issue 1)


Relative Specificity: All Substrates Are Not Created Equal

Yan Zeng

A biological molecule, e.g., an enzyme, tends to interact differentially with its many cognate substrates, targets, or partners. Such a property is termed relative specificity and has been proposed to regulate important physiological functions, even though it has not been examined explicitly in most complex biochemical systems. This essay reviews several recent large-scale studies that investigate protein folding, signal transduction, RNA binding, translation and transcription in the context of relative specificity. These results and others support a pervasive role of relative specificity in diverse biological processes. It is becoming clear that relative specificity contributes fundamentally to the diversity and complexity of biological systems, which has significant implications for disease processes as well.

Page 1–7

Original Research

A TDG/CBP/RARα Ternary Complex Mediates the Retinoic Acid-dependent Expression of DNA Methylation-sensitive Genes

Hélène Léger, Caroline Smet-Nocca, Amel Attmane-Elakeb, Sara Morley-Fletcher, Arndt G. Benecke, Sebastian Eilebrecht

The thymine DNA glycosylase (TDG) is a multifunctional enzyme that is essential for embryonic development. It mediates the base excision repair (BER) of G:T and G:U DNA mismatches arising from the deamination of 5-methyl cytosine (5-MeC) and cytosine, respectively. Recent studies have pointed to a role of TDG during the active demethylation of 5-MeC within CpG islands. TDG interacts with the histone acetylase CREB-binding protein (CBP) to activate CBP-dependent transcription. In addition, TDG interacts with the retinoic acid receptor α (RARα), resulting in the activation of RARα target genes. Here we provide evidence for the existence of a functional ternary complex containing TDG, CBP and activated RARα. Using global transcriptome profiling, we uncover a coupling of de novo methylation-sensitive and RA-dependent transcription, which coincides with a significant subset of CBP target genes. The introduction of a point mutation in TDG, which affects neither overall protein structure nor BER activity, leads to a significant loss in ternary complex stability, resulting in the deregulation of RA targets involved in cellular networks associated with DNA replication, recombination and repair. We thus demonstrate for the first time a direct coupling of TDG’s epigenomic and transcription regulatory functions through ternary complexes with CBP and RARα.

Page 8–18

Original Research

Expression of miR-15/107 Family MicroRNAs in Human Tissues and Cultured Rat Brain Cells

Wang-Xia Wang, Robert J. Danaher, Craig S. Miller, Joseph R. Berger, Vega G. Nubia, Bernard S. Wilfred, Janna H. Neltner, Christopher M. Norris, Peter T. Nelson

The miR-15/107 family comprises a group of 10 paralogous microRNAs (miRNAs) sharing a 5′ AGCAGC sequence. These miRNAs have overlapping targets. To characterize the expression of miR-15/107 family members and other selected miRNAs, we employed a customized TaqMan Low-Density micro-fluid PCR array in 11 human tissues obtained at autopsy: the cerebral cortex, frontal cortex, primary visual cortex, thalamus, heart, lung, liver, kidney, spleen, stomach and skeletal muscle. miR-103, miR-195 and miR-497 were expressed at similar levels across various tissues, whereas miR-107 was enriched in brain samples. We also examined the expression patterns of evolutionarily conserved miR-15/107 miRNAs in three distinct primary rat brain cell preparations (enriched for cortical neurons, astrocytes and microglia, respectively). In primary cultures of rat brain cells, several members of the miR-15/107 family were enriched in neurons compared to other cell types in the central nervous system (CNS). In addition to mature miRNAs, we examined the expression of precursors (pri-miRNAs). Our data suggested a generally poor correlation between the expression of mature miRNAs and their precursors. In summary, we provide a detailed study of the tissue and cell type-specific expression profile of this highly expressed and phylogenetically conserved family of miRNA genes.

Page 19–30

Original Research

Pathway-based Analysis of the Hidden Genetic Heterogeneities in Cancers

Xiaolei Zhao, Shouqiang Zhong, Xiaoyu Zuo, Meihua Lin, Jiheng Qin, Yizhao Luan, Naizun Zhang, Yan Liang, Shaoqi Rao

Many cancers that appear to show similar phenotypes are actually distinct at the molecular level, leading to very different responses to the same treatment. It has been recently demonstrated that pathway-based approaches are robust and reliable for genetic analysis of cancers. Nevertheless, it remains unclear whether such function-based approaches are useful in deciphering molecular heterogeneities in cancers. Therefore, we aimed to test this possibility in the present study. First, we used an NCI60 dataset to validate the ability of pathways to correctly partition samples. Next, we applied the proposed method to identify the hidden subtypes in diffuse large B-cell lymphoma (DLBCL). Finally, the clinical significance of the identified subtypes was verified using survival analysis. For the NCI60 dataset, we achieved highly accurate partitions that best fit the clinical cancer phenotypes. Subsequently, for a DLBCL dataset, we identified three hidden subtypes that showed very different 10-year overall survival rates (90%, 46% and 20%) and were highly significantly (P = 0.008) correlated with the clinical survival rate. This study demonstrated that the pathway-based approach is promising for unveiling genetic heterogeneities in complex human diseases.

Page 31–38

Original Research

Bayesian Peak Picking for NMR Spectra

Yichen Cheng, Xin Gao, Faming Liang

Protein structure determination is a very important topic in structural genomics, which helps people to understand a variety of biological functions such as protein–protein interactions, protein–DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) is often used to determine the three-dimensional structures of proteins in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using a Bayesian method.

Page 39–47
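
As a rough illustration of the modeling idea in the abstract above, the following Python sketch evaluates a spectrum as a sum of amplitude-weighted bivariate Gaussian densities, with a per-peak inclusion flag standing in for the variable-selection view of peak picking. The function names, the dictionary keys (`amp`, `mx`, `my`, `sx`, `sy`, `rho`, `included`) and the example values are assumptions made for illustration only. The sketch does not implement the stochastic approximation Monte Carlo sampler and is not the authors' code.

```python
import math


def bivariate_gaussian(x, y, mx, my, sx, sy, rho=0.0):
    """Density of a bivariate normal distribution evaluated at (x, y)."""
    zx = (x - mx) / sx
    zy = (y - my) / sy
    one_minus_r2 = 1.0 - rho * rho
    norm = 2.0 * math.pi * sx * sy * math.sqrt(one_minus_r2)
    expo = -(zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_r2)
    return math.exp(expo) / norm


def model_intensity(x, y, candidates):
    """Modeled intensity at (x, y): sum over candidate peaks whose
    inclusion flag is True (hypothetical encoding of variable selection)."""
    total = 0.0
    for peak in candidates:
        if peak["included"]:
            total += peak["amp"] * bivariate_gaussian(
                x, y, peak["mx"], peak["my"], peak["sx"], peak["sy"],
                peak.get("rho", 0.0),
            )
    return total


# Hypothetical candidates: one accepted peak and one rejected as false.
candidates = [
    {"amp": 2.0, "mx": 0.0, "my": 0.0, "sx": 1.0, "sy": 1.0, "included": True},
    {"amp": 5.0, "mx": 3.0, "my": 3.0, "sx": 0.5, "sy": 0.5, "included": False},
]
print(model_intensity(0.0, 0.0, candidates))
```

In a Bayesian treatment, the inclusion flags would be sampled together with the peak parameters rather than fixed by hand as they are here.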

Application Note

CloudNMF: A MapReduce Implementation of Nonnegative Matrix Factorization for Large-scale Biological Datasets

Ruiqi Liao, Yifan Zhang, Jihong Guan, Shuigeng Zhou

In the past decades, advances in high-throughput technologies have led to the generation of huge amounts of biological data that require analysis and interpretation. Recently, nonnegative matrix factorization (NMF) has been introduced as an efficient way to reduce the complexity of data as well as to interpret them, and has been applied to various fields of biological research. In this paper, we present CloudNMF, a distributed open-source implementation of NMF on a MapReduce framework. Experimental evaluation demonstrated that CloudNMF is scalable and can be used to deal with huge amounts of data, which may enable various kinds of high-throughput biological data analysis in the cloud. CloudNMF is freely accessible at

Page 48–51
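
For readers unfamiliar with NMF, the sketch below factorizes a small nonnegative matrix V into W and H on a single machine using the widely known multiplicative update rules for the Frobenius-norm objective. The abstract above does not state which update rules CloudNMF uses or how they are split into map and reduce steps, so this is only an assumed baseline in plain Python, not the CloudNMF implementation. All function names and parameters are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate V (n x m) by W (n x rank) times H (rank x m), all nonnegative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy nonnegative matrix of rank 2 (values are illustrative only).
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))
```

Each update consists of matrix products whose entries are sums over rows or columns, which is the kind of computation that lends itself to distribution across workers.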

Application Note

Meta-QC-Chain: Comprehensive and Fast Quality Control Method for Metagenomic Data

Qian Zhou, Xiaoquan Su, Gongchao Jing, Kang Ning

Next-generation sequencing (NGS) technology has revolutionized and significantly impacted metagenomic research. However, NGS data usually contain sequencing artifacts such as low-quality reads and contaminating reads, which significantly compromise downstream analysis. Many quality control (QC) tools have been proposed; however, few of them have been verified to be suitable or efficient for metagenomic data, which are composed of multiple genomes and are more complex than other kinds of NGS data. Here we present a metagenomic data QC method named Meta-QC-Chain. Meta-QC-Chain combines multiple QC functions: technical tests describe input data status and identify potential errors, quality trimming filters poor sequencing-quality bases and reads, and contamination screening identifies higher eukaryotic species, which are considered contamination for metagenomic data. Most computing processes are optimized based on parallel programming. Testing on an 8-GB real dataset showed that Meta-QC-Chain trimmed low sequencing-quality reads and contaminating reads, and the whole quality control procedure was completed within 20 min. Therefore, Meta-QC-Chain provides a comprehensive, useful and high-performance QC tool for metagenomic data. Meta-QC-Chain is publicly available for free at:

Page 52–56
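
To make the quality trimming step mentioned above more concrete, here is a minimal Python sketch that trims low-quality bases from the 3′ end of a read and discards reads that become too short. It assumes Phred+33 quality encoding, and the thresholds `min_q` and `min_len` are hypothetical defaults chosen for illustration. The abstract does not describe Meta-QC-Chain's actual trimming algorithm or parameters, so this should be read as a generic example rather than the tool's method.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - offset for c in quality]


def trim_read(seq, quality, min_q=20, min_len=5):
    """Trim trailing bases with Phred score below min_q.

    Returns (sequence, quality) after trimming, or None when the
    remaining read is shorter than min_len and should be discarded.
    """
    scores = phred_scores(quality)
    end = len(seq)
    # Walk back from the 3' end while base quality is below the threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quality[:end]


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(trim_read("ACGTACGT", "IIIIII##"))  # ('ACGTAC', 'IIIIII')
print(trim_read("ACGTACGT", "II######"))  # None
```

Because each read is processed independently, a step like this can be parallelized across reads.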