Articles Online (Volume 14, Issue 1)


Precision Medicine: What Do We Expect in the Scope of Basic Biomedical Sciences?

Jun Yu

Page 1-3

Research Highlight

A Key to Genome Maze in 3D

Zhihua Zhang

Page 4-6

Review Article

Understanding Spatial Genome Organization: Methods and Insights

Vijay Ramani, Jay Shendure, Zhijun Duan

The manner by which eukaryotic genomes are packaged into nuclei while maintaining crucial nuclear functions remains one of the fundamental mysteries in biology. Over the last ten years, we have witnessed rapid advances in both microscopic and nucleic acid-based approaches to map genome architecture, and the application of these approaches to the dissection of higher-order chromosomal structures has yielded much new information. It is becoming increasingly clear, for example, that interphase chromosomes form stable, multilevel hierarchical structures. Among them, self-associating domains like so-called topologically associating domains (TADs) appear to be building blocks for large-scale genomic organization. This review describes features of these broadly-defined hierarchical structures, insights into the mechanisms underlying their formation, our current understanding of how interactions in the nuclear space are linked to gene regulation, and important future directions for the field.
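A normal human body consists of hundreds of billions of cells. These cells may differ in function or morphology, but they all carry the same genetic information (the same genome) and all descend from a single common cell, the fertilized egg. Biologists now widely believe that epigenetic mechanisms play a decisive regulatory role in the growth and development of multicellular eukaryotes such as humans. Recent studies show that the genome, as the carrier of genetic information, is not placed randomly in the nucleus but has a highly ordered three-dimensional structure. This three-dimensional structure is necessary for packing a huge eukaryotic genome into a tiny nucleus, and it also provides the physical platform on which epigenetic mechanisms regulate the reading, interpretation, and inheritance of genetic information. For example, the total length of human genomic DNA can reach 2 meters, whereas the nucleus that houses it is only 10 micrometers in diameter. It is this highly ordered three-dimensional structure that allows nuclear functions such as gene transcription and DNA replication and repair to proceed in an orderly fashion even though the human genome is highly compacted. This review first looks back at the rapid progress made over the last ten years in technologies for studying three-dimensional genome structure, and then discusses the current understanding of some structural features of three-dimensional eukaryotic genomes. Finally, we propose some important directions for further research on three-dimensional genome structure and function.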

Page 7-20

Review Article

Single-cell Transcriptome Study as Big Data

Pingjian Yu, Wei Lin

The rapid growth of single-cell RNA-seq (scRNA-seq) studies demands efficient data storage, processing, and analysis. Big-data technology provides a framework that facilitates the comprehensive discovery of biological signals from inter-institutional scRNA-seq datasets. This article discusses strategies for resolving the stochastic and heterogeneous single-cell transcriptome signal. After extensively reviewing the available big-data applications of next-generation sequencing (NGS)-based studies, we propose a workflow that accounts for the unique characteristics of scRNA-seq data and the primary objectives of single-cell studies.

Page 21-30

Review Article

Translational Bioinformatics: Past, Present, and Future

Jessica D. Tenenbaum

Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline’s brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field.

Page 31-41

Review Article

Roles, Functions, and Mechanisms of Long Non-coding RNAs in Cancer

Yiwen Fang, Melissa J. Fullwood

Long non-coding RNAs (lncRNAs) play important roles in cancer. They are involved in chromatin remodeling, as well as transcriptional and post-transcriptional regulation, through a variety of chromatin-based mechanisms and via cross-talk with other RNA species. lncRNAs can function as decoys, scaffolds, and enhancer RNAs. This review summarizes the characteristics of lncRNAs, including their roles, functions, and working mechanisms, describes methods for identifying and annotating lncRNAs, and discusses future opportunities for lncRNA-based therapies using antisense oligonucleotides.

Page 42-54

Original Research

EDISON-WMW: Exact Dynamic Programing Solution of the Wilcoxon–Mann–Whitney Test

Alexander Marx, Christina Backes, Eckart Meese, Hans-Peter Lenhof, Andreas Keller

In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon–Mann–Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science for assessing whether two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Solving the WMW test exactly generally requires a high combinatorial effort for large sample cohorts containing a substantial number of ties. Hence, the P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation P value of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programing to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we present different optimization strategies and a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly generated data, benchmark it against 13 other commonly applied approaches, and evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at
In many research fields, hypothesis tests are used to determine whether a result is statistically significant or could have arisen by chance. The Wilcoxon-Mann-Whitney (WMW) test is one of the best-known hypothesis tests used in medicine and the life sciences to evaluate whether two test groups are equally distributed. This nonparametric statistical homogeneity test is often used in molecular diagnostics. Solving the WMW test entails a high combinatorial effort. For this reason, the p-value is often estimated in such cases with the help of a normal distribution. We therefore developed EDISON-WMW, a new approach that computes the exact permutation of the unpaired WMW test without applying corrections, even for groups with many duplicates. The approach uses dynamic programming to solve the combinatorial problem efficiently. In addition to the naive implementation of the algorithm, we present various optimization strategies and a parallelized approach. With EDISON-WMW, the exact p-value for test groups with more than 1000 elements that contain duplicates can be computed within minutes. The performance of the test was evaluated both on randomly generated data and against 13 established methods. We furthermore used EDISON-WMW to evaluate molecular biomarkers for lung cancer and chronic obstructive pulmonary diseases. As a result of these tests, we found that the approximated p-values of other approaches are generally higher than the exactly computed p-values of EDISON-WMW. The approach can also be used to analyze high-throughput omics data with hundreds or thousands of features. To make the parallelized version of EDISON-WMW publicly accessible, we have provided EDISON-WMW as a free web application at
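
To illustrate why dynamic programming suits this problem, the following minimal Python sketch computes an exact two-sided P value with the classic textbook recurrence for the null distribution of the U statistic. It is not the EDISON-WMW algorithm: it assumes tie-free samples, whereas the method described above is designed to handle ties, and the function names `u_null_counts` and `exact_two_sided_p` are hypothetical.

```python
from math import comb


def u_null_counts(m, n):
    """Return counts where counts[u] is the number of orderings of m + n
    distinct values that give U = u (U = number of pairs with x > y)."""
    # table[i][j] holds the count list for sample sizes i and j.
    table = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                table[i][j] = [1]  # an empty group allows only U = 0
                continue
            counts = [0] * (i * j + 1)
            # Case 1: the largest value belongs to x, so it beats all j y-values.
            for u, c in enumerate(table[i - 1][j]):
                counts[u + j] += c
            # Case 2: the largest value belongs to y and adds nothing to U.
            for u, c in enumerate(table[i][j - 1]):
                counts[u] += c
            table[i][j] = counts
    return table[m][n]


def exact_two_sided_p(x, y):
    """Exact two-sided P value for two tie-free samples (sketch only)."""
    values = list(x) + list(y)
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    m, n = len(x), len(y)
    u_obs = sum(1 for xi in x for yj in y if xi > yj)
    counts = u_null_counts(m, n)
    # Compare distances from the null mean m*n/2, doubled to stay in integers.
    extreme = sum(c for u, c in enumerate(counts)
                  if abs(2 * u - m * n) >= abs(2 * u_obs - m * n))
    return extreme / comb(m + n, m)


# Fully separated groups: only 2 of the 20 possible orderings are this extreme.
print(exact_two_sided_p([1, 2, 3], [4, 5, 6]))
```

The table stores one count list per pair of sample sizes, which keeps the sketch short but uses far more memory than a tuned implementation would for large cohorts.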

Page 55-61

Original Research

A Bipartite Network-based Method for Prediction of Long Non-coding RNA–protein Interactions

Mengqu Ge, Ao Li, Minghui Wang

As one large class of non-coding RNAs (ncRNAs), long ncRNAs (lncRNAs) have gained considerable attention in recent years. Mutations and dysfunction of lncRNAs have been implicated in human disorders. Many lncRNAs exert their effects through interactions with the corresponding RNA-binding proteins. Several computational approaches have been developed, but only a few are able to predict these interactions from a network-based point of view. Here, we introduce a computational method named lncRNA–protein bipartite network inference (LPBNI). LPBNI aims to identify potential lncRNA–interacting proteins by making full use of the known lncRNA–protein interactions. A leave-one-out cross-validation (LOOCV) test shows that LPBNI significantly outperforms other network-based methods, including random walk (RWR) and protein-based collaborative filtering (ProCF). Furthermore, a case study was performed to demonstrate the performance of LPBNI using real data in predicting potential lncRNA–interacting proteins.
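In recent years, long non-coding RNAs (lncRNAs) have attracted considerable attention as an important class of non-coding RNAs, and existing studies show that lncRNA dysfunction is closely associated with human disease. Because many lncRNAs must bind specific proteins to carry out their functions, predicting lncRNA–protein interactions helps reveal the functions and mechanisms of action of lncRNAs. Given that existing prediction methods do not take into account the information in the interaction network formed by lncRNAs and proteins, this paper proposes a computational method named LPBNI (LncRNA–Protein Bipartite Network Inference), which predicts potential lncRNA–protein interactions from known interaction network information. Leave-one-out cross-validation shows that LPBNI significantly outperforms network-based prediction methods including random walk and protein-based collaborative filtering, and subsequent analysis of the prediction results further confirms the strong performance of the method.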
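
The abstract does not spell out how LPBNI scores candidate proteins. As a rough illustration of inference on a bipartite network, the Python sketch below assumes a two-step resource-allocation scheme, a common choice for this kind of problem that may differ from the authors' exact procedure. The toy adjacency matrix and the function names `resource_allocation_scores` and `rank_candidates` are hypothetical.

```python
def resource_allocation_scores(adjacency, target):
    """Score every protein for lncRNA `target` by two-step resource spreading.

    adjacency[l][p] is 1 if lncRNA l is known to interact with protein p.
    """
    n_lnc = len(adjacency)
    n_prot = len(adjacency[0])
    lnc_degree = [sum(row) for row in adjacency]
    prot_degree = [sum(adjacency[l][p] for l in range(n_lnc))
                   for p in range(n_prot)]

    # Step 1: each protein linked to the target holds one unit of resource
    # and splits it equally among all lncRNAs it interacts with.
    lnc_resource = [0.0] * n_lnc
    for p in range(n_prot):
        if adjacency[target][p]:
            for l in range(n_lnc):
                if adjacency[l][p]:
                    lnc_resource[l] += 1.0 / prot_degree[p]

    # Step 2: each lncRNA splits what it received equally among its proteins.
    scores = [0.0] * n_prot
    for l in range(n_lnc):
        if lnc_degree[l] == 0:
            continue  # isolated lncRNA: nothing to pass on
        for p in range(n_prot):
            if adjacency[l][p]:
                scores[p] += lnc_resource[l] / lnc_degree[l]
    return scores


def rank_candidates(adjacency, target):
    """Rank proteins not yet linked to `target`, highest score first."""
    scores = resource_allocation_scores(adjacency, target)
    candidates = [(p, s) for p, s in enumerate(scores)
                  if not adjacency[target][p]]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Toy network: rows are lncRNAs L1..L3, columns are proteins P1..P3.
known = [
    [1, 1, 0],  # L1 binds P1 and P2
    [0, 1, 1],  # L2 binds P2 and P3
    [0, 0, 1],  # L3 binds P3
]
print(rank_candidates(known, target=0))
```

In this toy network, P3 is the only protein not already linked to L1, and it receives a nonzero score because L2 shares protein P2 with L1.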

Page 62-71