Articles Online (Volume 10, Issue 4)


The Pendulum Model for Genome Compositional Dynamics: from the Four Nucleotides to the Twenty Amino Acids

Zhang Zhang, Jun Yu

The genetic code serves as one of the natural links between life’s two conceptual frameworks (the informational and operational tracks), bridging the nucleotide sequence of DNA and RNA to the amino acid sequence of protein and thus its structure and function. On the informational track, DNA and its four building blocks have four basic variables: order, length, GC content and purine content; the latter two exhibit unique characteristics in prokaryotic genomes, where protein-coding sequences dominate. Bridging the two tracks, tRNAs and their aminoacyl-tRNA synthetases, which interpret each codon (a nucleotide triplet), together with ribosomes form a complex machinery that translates genetic information encoded on messenger RNAs into proteins. On the operational track, proteins are constantly selected in the context of cellular and organismal functions. The principle of such functional selection is to minimize the damage caused by seemingly random sequence alterations at the nucleotide level and their function-altering consequences at the protein level; the principle also suggests that, in addition to immediate or short-term elimination and long-term selection, there must be sophisticated mechanisms that protect the molecular interactions and cellular processes of cells and organisms from such damage. The two-century study of selection at the species and population levels has been leading the way to understanding the rules of inheritance and evolution at the molecular level along the informational track, while ribogenomics, epigenomics and other operationally defined omics (such as the metabolite-centric metabolomics) have been ushering biologists into the new millennium along the operational track.

Page 175–180


Systems Approaches to Biology and Disease Enable Translational Systems Medicine

Leroy Hood, Qiang Tian

The development and application of systems strategies to biology and disease are transforming medical research and clinical practice at an unprecedented rate. In the foreseeable future, clinicians, medical researchers, and ultimately consumers and patients will be increasingly equipped with a deluge of personal health information, e.g., whole genome sequences, molecular profiling of diseased tissues, and periodic multi-analyte blood testing of biomarker panels for disease and wellness. The convergence of these practices will enable accurate prediction of disease susceptibility and early diagnosis, supporting actionable preventive schemes and personalized treatment regimes tailored to each individual. It will also entail proactive participation from all major stakeholders in the health care system. We are at the dawn of predictive, preventive, personalized, and participatory (P4) medicine, the full implementation of which requires marrying basic and clinical research through advanced systems thinking and the employment of high-throughput technologies in genomics, proteomics, nanofluidics, single-cell analysis, and computational strategies in a highly orchestrated discipline we term translational systems medicine.

Page 181–185

Original Research

Strand-biased Gene Distribution in Bacteria Is Related to both Horizontal Gene Transfer and Strand-biased Nucleotide Composition

Hao Wu, Hongzhu Qu, Ning Wan, Zhang Zhang, Songnian Hu, Jun Yu

Although strand-biased gene distribution (SGD) was described some two decades ago, the underlying molecular mechanisms and their relationships remain elusive. Its facets include, but are not limited to, the degree of bias, the strand preference of genes, and the influence of background nucleotide composition variations. Using a dataset composed of 364 non-redundant bacterial genomes, we sought to illustrate our current understanding of SGD. First, when we divided the collection of bacterial genomes into non-polC and polC groups according to their possession of DnaE isoforms that correlate closely with taxonomy, the SGD of the polC group stood out more significantly than that of the non-polC group. Second, when examining horizontal gene transfer, coupled with gene functional conservation (essentiality) and expressivity (level of expression), we found that they all contributed to SGD. Third, we further demonstrated a weaker G-dominance on the leading strand of the non-polC group but strong purine dominance (both G and A) on the leading strand of the polC group. We propose that strand-biased nucleotide composition plays a decisive role in SGD, since the polC-bearing genomes are not only AT-rich but also have pronounced purine-rich leading strands, and we believe that two factors are both at work: a special mutation spectrum that leads to a strong purine asymmetry and a strong strand-biased nucleotide composition, and functional selection for genes and their functions.
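The strand composition measures mentioned in this abstract (G-dominance and purine dominance on a strand) can be illustrated with a minimal sketch. The function name `strand_skews` and the specific skew formulas, (G − C)/(G + C) and (A − T)/(A + T), are common conventions assumed here for illustration; the abstract does not state which measures the authors used.

```python
from collections import Counter


def strand_skews(seq: str) -> dict:
    """Return simple composition measures for one DNA strand.

    Hypothetical helper: the skew definitions below are common
    conventions, not necessarily those used in the article.
    """
    counts = Counter(seq.upper())
    g, c, a, t = (counts[base] for base in "GCAT")
    total = g + c + a + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        # Positive values mean G outnumbers C on this strand.
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
        # Positive values mean A outnumbers T on this strand.
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        # Fraction of purines (A and G) among all counted bases.
        "purine_fraction": (a + g) / total,
    }
```

Applied separately to leading-strand and lagging-strand sequence, a purine fraction above 0.5 on the leading strand would correspond to the purine dominance described above.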

Page 186–196

Original Research

Putative Chitin Synthases from Branchiostoma floridae Show Extracellular Matrix-related Domains and Mosaic Structures

Gea Guerriero

The transition from unicellular to multicellular life forms requires the development of a specialized structural component, the extracellular matrix (ECM). In Metazoans, there are two main supportive systems, which are based on chitin and collagen/hyaluronan, respectively. Chitin is the major constituent of fungal cell walls and arthropod exoskeletons. However, the presence of chitin/chitooligosaccharides has also been reported in lower chordates and during specific stages of vertebrate development. In this study, the occurrence of chitin synthases (CHSs) was investigated with a bioinformatics approach in the cephalochordate Branchiostoma floridae, in which the presence of chitin was initially reported in the skeletal rods of the pharyngeal gill basket. Twelve genes coding for proteins containing conserved amino acid residues of processive glycosyltransferases from the GT2 family were found, and 10 of them display mosaic structures with novel domains never previously reported in a chitin synthase. In particular, the presence of a discoidin (DS) and a sterile alpha motif (SAM) domain was found in nine identified proteins. Sequence analyses and homology modelling suggest that these domains might interact with the extracellular matrix and mediate protein–protein interaction. The multi-domain putative chitin synthases from B. floridae constitute an emblematic example of the explosion of domain innovation and shuffling that predates Metazoans.

Page 197–207

Original Research

Novel Adaptors of Amyloid Precursor Protein Intracellular Domain and Their Functional Implications

Arunabha Chakrabarti, Debashis Mukhopadhyay

Amyloid precursor protein intracellular domain (AICD) is one of the potential candidates in deciphering the complexity of Alzheimer’s disease (AD). It plays important roles in determining cell fate and neurodegeneration through its interactions with several adaptors. The presence or absence of phosphorylation at specific sites determines the choice of partners. In this study, we identified 20 novel AICD-interacting proteins by in vitro pull-down experiments followed by 2D gel electrophoresis and MALDI-MS analysis. The identified proteins can be grouped into different functional classes including molecular chaperones, structural proteins, signaling and transport molecules, adaptors, motor proteins and apoptosis determinants. Interactions of nine proteins were further validated either by colocalization using confocal imaging or by co-immunoprecipitation followed by immunoblotting. The cellular functions of most of the proteins can be correlated with AD. Hence, illustration of their interactions with AICD may shed some light on the disease pathophysiology.

Page 208–216

Original Research

Homopeptide Repeats: Implications for Protein Structure, Function and Evolution

Muthukumarasamy Uthayakumar, Bowdadu Benazir, Sanjeev Patra, Marthandan Kirti Vaishnavi, Manickam Gurusaran, Kanagarajan Sureka, Jeyaraman Jeyakanthan, Kanagaraj Sekar

Analysis of protein sequences from Mycobacterium tuberculosis H37Rv (Mtb H37Rv) was performed to identify homopeptide repeat-containing proteins (HRCPs). Functional annotation of the HRCPs showed that they are preferentially involved in cellular metabolism. Furthermore, these homopeptide repeats might play some specific roles in protein–protein interaction. Repeat length differences among Bacteria, Archaea and Eukaryotes were calculated in order to identify the conservation of the repeats in these divergent kingdoms. From the results, it was evident that these repeats have a higher degree of conservation in Bacteria and Archaea than in Eukaryotes. In addition, there seems to be a direct correlation between the repeat length difference and the degree of divergence between the species. Our study supports the hypothesis that the presence of homopeptide repeats influences the rate of evolution of the protein sequences in which they are embedded. Thus, homopeptide repeats may have structural, functional and evolutionary implications for proteins.
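As a rough illustration of what identifying homopeptide repeats involves, the sketch below scans a protein sequence for runs of a single repeated residue. The function name `find_homopeptide_repeats` and the default minimum run length of 5 are assumptions made for this example; the abstract does not state the threshold or the method the authors used.

```python
import re


def find_homopeptide_repeats(protein: str, min_len: int = 5):
    """List runs of one repeated residue as (residue, start, length).

    Hypothetical helper: min_len is an illustrative threshold,
    not a value taken from the article. start is 0-based.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One letter followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in pattern.finditer(protein.upper())
    ]
```

Comparing the lengths reported for the same repeat in orthologous proteins from different species would give a repeat length difference of the kind discussed above.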

Page 217–225

Application Note

NSort/DB: An Intranuclear Compartment Protein Database

Kai Willadsen, Nurul Mohamad, Mikael Bodén

Distinct substructures within the nucleus are associated with a wide variety of important nuclear processes. Structures such as chromatin and nuclear pores have specific roles, while others such as Cajal bodies are more functionally varied. Understanding the roles of these membraneless intra-nuclear compartments requires extensive data sets covering nuclear and compartment-associated proteins. NSort/DB is a database providing access to intra- or sub-nuclear compartment associations for the mouse nuclear proteome. Based on resources ranging from large-scale curated data sets to detailed experiments, this data set provides a high-quality set of annotations of non-exclusive association of nuclear proteins with structures such as promyelocytic leukaemia bodies and chromatin. The database is searchable by protein identifier or compartment, and has a documented web service API. The search interface, web service and data download are all freely available online. Availability of this data set will enable systematic analyses of the protein complements of nuclear compartments, improving our understanding of the diverse functional repertoire of these structures.

Page 226–229

Application Note

Wavelet Analysis of DNA Walks on the Human and Chimpanzee MAGE/CSAG-palindromes

Yanjiao Qi, Nengzhi Jin, Duiyuan Ai

The palindrome is one class of symmetrical duplications with reverse complementary characters, which is widely distributed in many organisms. Graphical representation of DNA sequence provides a simple way of viewing and comparing various genomic structures. Through 3-D DNA walk analysis, the similarities and differences in nucleotide composition, as well as the evolutionary relationship between human and chimpanzee MAGE/CSAG-palindromes, can be clearly revealed. Further wavelet analysis indicated that duplicated segments have irregular patterns compared to their surrounding sequences. However, sequence similarity analysis suggests that there is a possible common ancestor of the human and chimpanzee MAGE/CSAG-palindromes. Based on the specific distribution and orientation of the repeated sequences, a simple possible evolutionary model of the palindromes is suggested, which may help us to better understand the evolutionary course of the genes and the symmetrical sequences.
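For readers unfamiliar with DNA walks, the sketch below shows one way a 3-D walk can be built from a sequence. The axis assignment is an assumption chosen for illustration (x: purine vs. pyrimidine, y: amino vs. keto, z: weak vs. strong hydrogen bonding); the abstract does not specify the mapping the authors used, and the function name `dna_walk_3d` is hypothetical.

```python
def dna_walk_3d(seq: str):
    """Return the list of (x, y, z) points visited by a 3-D DNA walk.

    Assumed step rules (illustrative, not taken from the article):
      x: +1 for purines (A, G),      -1 for pyrimidines (C, T)
      y: +1 for amino bases (A, C),  -1 for keto bases (G, T)
      z: +1 for weak pairs (A, T),   -1 for strong pairs (G, C)
    """
    steps = {
        "A": (1, 1, 1),
        "G": (1, -1, -1),
        "C": (-1, 1, -1),
        "T": (-1, -1, 1),
    }
    x = y = z = 0
    path = [(0, 0, 0)]  # the walk starts at the origin
    for base in seq.upper():
        if base not in steps:
            continue  # skip ambiguous symbols such as N
        dx, dy, dz = steps[base]
        x, y, z = x + dx, y + dy, z + dz
        path.append((x, y, z))
    return path
```

Each coordinate of the resulting path is a one-dimensional signal along the sequence, which is the kind of input a wavelet transform could then be applied to.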

Page 230–236