Articles Online (Volume 11, Issue 1)


GPB at 10

Jun Yu

Page 1

Life-time Achievement Review

My Journey to DNA Repair

Tomas Lindahl

I completed my medical studies at the Karolinska Institute in Stockholm but have always been devoted to basic research. My longstanding interest is to understand fundamental DNA repair mechanisms in the fields of cancer therapy, inherited human genetic disorders and ancient DNA. I initially measured DNA decay, including rates of base loss and cytosine deamination. I have discovered several important DNA repair proteins and determined their mechanisms of action. The discovery of uracil-DNA glycosylase defined a new category of repair enzymes, each specialized for a different type of DNA damage. The base excision repair pathway was first reconstituted with human proteins in my group. Cell-free analysis of mammalian nucleotide excision repair of DNA was also developed in my laboratory. I found multiple distinct DNA ligases in mammalian cells, and led the first genetic and biochemical work on DNA ligases I, III and IV. I discovered the mammalian exonucleases DNase III (TREX1) and IV (FEN1). Interestingly, expression of TREX1 was altered in some human autoimmune diseases. I also showed that the mutagenic DNA adduct O6-methylguanine (O6mG) is repaired without removing the guanine from DNA, identifying a surprising mechanism by which the methyl group is transferred to a residue in the repair protein itself. A further novel process of DNA repair discovered by my research group is the action of AlkB as an iron-dependent enzyme carrying out oxidative demethylation.

Page 2–7


N6-methyl-adenosine (m6A) in RNA: An Old Modification with A Novel Epigenetic Function

Yamei Niu, Xu Zhao, Yong-Sheng Wu, Ming-Ming Li, Xiu-Jie Wang, Yun-Gui Yang

N6-methyl-adenosine (m6A) is one of the most common and abundant modifications on RNA molecules present in eukaryotes. However, the biological significance of m6A methylation remains largely unknown. Several independent lines of evidence suggest that the dynamic regulation of m6A may have a profound impact on gene expression regulation. The m6A modification is catalyzed by an unidentified methyltransferase complex containing at least one subunit, methyltransferase like 3 (METTL3). m6A modification on messenger RNAs (mRNAs) mainly occurs in the exonic regions and 3′-untranslated region (3′-UTR), as revealed by high-throughput m6A-seq. One significant advance in m6A research is the recent discovery of the first two m6A RNA demethylases, fat mass and obesity-associated protein (FTO) and ALKBH5, which catalyze m6A demethylation in an α-ketoglutarate (α-KG)- and Fe2+-dependent manner. Recent studies in model organisms demonstrate that METTL3, FTO and ALKBH5 play important roles in many biological processes, ranging from development and metabolism to fertility. Moreover, perturbation of the activities of these enzymes leads to the disturbed expression of thousands of genes at the cellular level, implicating a regulatory role of m6A in RNA metabolism. Given the vital roles of DNA and histone methylations in epigenetic regulation of basic life processes in mammals, the dynamic and reversible chemical m6A modification on RNA may also serve as a novel epigenetic marker of profound biological significance.

Page 8–17


Interactome Mapping: Using Protein Microarray Technology to Reconstruct Diverse Protein Networks

Ijeoma Uzoma, Heng Zhu

A major focus of systems biology is to characterize interactions between cellular components, in order to develop an accurate picture of the intricate networks within biological systems. Over the past decade, protein microarrays have greatly contributed to advances in proteomics and are becoming an important platform for systems biology. Protein microarrays are highly flexible, ranging from large-scale proteome microarrays to smaller customizable microarrays, making the technology amenable for detection of a broad spectrum of biochemical properties of proteins. In this article, we will focus on the numerous studies that have utilized protein microarrays to reconstruct biological networks including protein–DNA interactions, posttranslational protein modifications (PTMs), lectin–glycan recognition, pathogen–host interactions and hierarchical signaling cascades. The diversity in applications allows for integration of interaction data from numerous molecular classes and cellular states, providing insight into the structure of complex biological systems. We will also discuss emerging applications and future directions of protein microarray technology in the global frontier.

Page 18–28


Recent Advances in Computational Methods for Nuclear Magnetic Resonance Data Processing

Xin Gao

Although three-dimensional protein structure determination using nuclear magnetic resonance (NMR) spectroscopy is a computationally costly and tedious process that would benefit from advanced computational techniques, it has not garnered much research attention from specialists in bioinformatics and computational biology. In this paper, we review recent advances in computational methods for NMR protein structure determination. We summarize the advantages of and bottlenecks in the existing methods and outline some open problems in the field. We also discuss current trends in NMR technology development and suggest directions for research on future computational methods for NMR.

Page 29–33


The History and Advances of Reversible Terminators Used in New Generations of Sequencing Technology

Fei Chen, Mengxing Dong, Meng Ge, Lingxiang Zhu, Lufeng Ren, Guocheng Liu, Rong Mu

DNA sequencing using reversible terminators, as one sequencing-by-synthesis strategy, has garnered a great deal of interest due to its popular application in second-generation high-throughput DNA sequencing technology. In this review, we provided the history of development, classification, and working mechanism of this technology. We also outlined the screening strategies for DNA polymerases to accommodate the reversible terminators as substrates during polymerization; in particular, we introduced the “REAP” method developed by us. At the end of this review, we discussed current limitations of this approach and provided potential solutions to extend its application.

Page 34–40

Original Research

Does the Genetic Code Have A Eukaryotic Origin?

Zhang Zhang, Jun Yu

In the RNA world, RNA is assumed to be the dominant macromolecule performing most, if not all, core “house-keeping” functions. The ribo-cell hypothesis suggests that the genetic code and the translation machinery may both be born of the RNA world, and that DNA, once introduced to ribo-cells, may have gradually taken over the informational role of RNA, including a mature genetic code and a mechanism enabling stable inheritance of sequence and its variation. In this context, we modeled the genetic code in two content variables, GC and purine contents, of protein-coding sequences and measured the purine content sensitivity of each codon when the sensitivity (% usage) is plotted as a function of GC content variation. The analysis leads to a new pattern, the symmetric pattern, in which the sensitivity of purine content variation shows diagonal symmetry in the codon table, more significantly in the two GC content-invariable quarters, in addition to the two existing patterns in which the table is divided into either four GC content sensitivity quarters or two amino acid diversity halves. The most insensitive codon sets are GUN (valine) and GAN (GAR for glutamic acid and GAY for aspartic acid), and the most biased amino acid is valine (always over-estimated) followed by alanine (always under-estimated). The unique position of valine and its codons suggests its key roles in the final recruitment of the complete codon set of the canonical table. The distinct choice may only be attributable to sequence signatures or signals of splice sites for spliceosomal introns shared by all extant eukaryotes.
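
The two content variables and the per-codon usage named in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the function names are hypothetical, and it only assumes that GC content, purine content (A + G) and codon usage are computed as simple fractions over an in-frame coding sequence.

```python
from collections import Counter


def _fraction(seq: str, bases: str) -> float:
    """Fraction of positions in seq whose base is in `bases`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(b in bases for b in seq) / len(seq)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    return _fraction(seq, "GC")


def purine_content(seq: str) -> float:
    """Fraction of purine bases (A and G)."""
    return _fraction(seq, "AG")


def codon_usage(seq: str) -> dict:
    """Relative usage of each codon, reading in frame from position 0.

    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence must contain at least one codon")
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}
```

Applied to many coding sequences binned by GC content, the usage of a given codon could then be plotted against GC content, which is the kind of relationship the abstract describes.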

Page 41–55

Original Research

Comparative Analysis of MicroRNA Promoters in Arabidopsis and Rice

Xin Zhao, Lei Li

Endogenously-encoded microRNAs (miRNAs) are a class of small regulatory RNAs that modulate gene expression at the post-transcriptional level. In plants, miRNAs have increasingly been identified by experiments based on next-generation sequencing (NGS). However, promoter organization is currently unknown for most plant miRNAs, which are transcribed by RNA polymerase II. This deficiency prevents a comprehensive understanding of miRNA-mediated gene networks. In this study, by analyzing full-length cDNA sequences related to miRNAs, we mapped transcription start sites (TSSs) for 62 and 55 miRNAs in Arabidopsis and rice, respectively. The average free energy (AFE) profiles in the vicinity of TSSs were studied for both species. By employing position weight matrices (PWMs) for 99 plant cis-elements, we discovered that three cis-elements were over-represented in the miRNA promoters of both species, while four and ten cis-elements were over-represented only in Arabidopsis and only in rice, respectively. Thus, comparison of miRNA promoters between Arabidopsis and rice provides a new perspective for studying miRNA regulation in plants.
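
For readers unfamiliar with position weight matrices, the following is a generic sketch of how a PWM can be scored along a promoter sequence. It does not reproduce the authors' pipeline: the log-odds formulation, the pseudocount, the uniform background, the threshold and all names are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds matrix.

    counts: one dict per motif position, mapping base -> observed count.
    A uniform background frequency is assumed.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical four-base motif (TATA) built from made-up counts.
example_pwm = log_odds_pwm([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
example_hits = scan("GGTATAGG", example_pwm, threshold=5.0)
```

Over-representation would then be judged by comparing hit counts in promoters against a suitable background set, a step this sketch leaves out.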

Page 56–60

Original Research

Shigella Strains Are Not Clones of Escherichia coli but Sister Species in the Genus Escherichia

Guanghong Zuo, Zhao Xu, Bailin Hao

Shigella species and Escherichia coli are closely related organisms. Early phenotyping experiments and several recent molecular studies put Shigella within the species E. coli. However, the whole-genome-based, alignment-free and parameter-free CVTree approach shows convincingly that four established Shigella species, Shigella boydii, Shigella sonnei, Shigella flexneri and Shigella dysenteriae, are distinct from E. coli strains, and form sister species to E. coli within the genus Escherichia. In view of the overall success and high resolution power of the CVTree approach, this result should be taken seriously. We hope that the present report may promote further in-depth study of the Shigella–E. coli relationship.

Page 61–65


Global Genomic Arrangement of Bacterial Genes Is Closely Tied with the Total Transcriptional Efficiency

Qin Ma, Ying Xu

The availability of a large number of sequenced bacterial genomes allows researchers not only to derive functional and regulatory information about specific organisms but also to study the fundamental properties of the organization of a genome. Here we address an important and challenging question regarding the global arrangement of operons in a bacterial genome: why operons in a bacterial genome are arranged in the way they are. We have previously studied this question and found that operons of more frequently activated pathways tend to be more clustered together in a genome. Specifically, we have developed a simple sequential distance-based pseudo energy function and found that the arrangement of operons in a bacterial genome tends to minimize the clusteredness function (C value) in comparison with artificially-generated alternatives, for a variety of bacterial genomes. Here we extend our previous work, and report a number of new observations: (a) operons of the same pathways tend to group into a few clusters rather than one; and (b) the global arrangement of these operon clusters tends to minimize a new “energy” function (C+ value) that reflects the efficiency of the transcriptional activation of the encoded pathways. These observations provide insights for further study of the genomic organization of genes in bacteria.

Page 66–71

Application Note

MeRIP-PF: An Easy-to-use Pipeline for High-resolution Peak-finding in MeRIP-Seq Data

Yuli Li, Shuhui Song, Cuiping Li, Jun Yu

RNA modifications, especially methylation of the N6 position of adenosine (m6A), represent an emerging research frontier in RNA biology. With the rapid development of high-throughput sequencing technology, in-depth study of m6A distribution and functional relevance becomes feasible. However, a robust method to effectively identify m6A-modified regions has not been available yet. Here, we present a novel high-efficiency and user-friendly analysis pipeline called MeRIP-PF for the signal identification of MeRIP-Seq data in reference to controls. MeRIP-PF provides a statistical P-value for each identified m6A region based on the difference of read distribution when compared to the controls and also calculates the false discovery rate (FDR) as a cutoff to differentiate reliable m6A regions from the background. Furthermore, MeRIP-PF also achieves gene annotation of m6A signals or peaks and produces outputs in both XLS and graphical format, which are useful for further study. MeRIP-PF is implemented in Perl and is freely available at

Page 72–75