Article Online - Genomics, Proteomics & Bioinformatics

Volume: 14, Issue: 1

Perspective

Precision Medicine: What Do We Expect in the Scope of Basic Biomedical Sciences?

Jun Yu

Page 1-3

Download 1399

Research Highlight

A Key to Genome Maze in 3D

Zhihua Zhang

Page 4-6

Download 1476

Review Article

Understanding Spatial Genome Organization: Methods and Insights

Vijay Ramani, Jay Shendure, Zhijun Duan

The manner by which eukaryotic genomes are packaged into nuclei while maintaining crucial nuclear functions remains one of the fundamental mysteries in biology. Over the last ten years, we have witnessed rapid advances in both microscopic and nucleic acid-based approaches to map genome architecture, and the application of these approaches to the dissection of higher-order chromosomal structures has yielded much new information. It is becoming increasingly clear, for example, that interphase chromosomes form stable, multilevel hierarchical structures. Among them, self-associating domains like so-called topologically associating domains (TADs) appear to be building blocks for large-scale genomic organization. This review describes features of these broadly-defined hierarchical structures, insights into the mechanisms underlying their formation, our current understanding of how interactions in the nuclear space are linked to gene regulation, and important future directions for the field.

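A normal human body consists of hundreds of billions of cells. These cells may differ in function or morphology, but they all carry the same genetic information (the same genome) and all derive from one common cell, the fertilized egg. Biologists now widely believe that epigenetic mechanisms play a decisive regulatory role in the growth and development of multicellular eukaryotes such as humans. Recent studies show that the genome, as the carrier of genetic information, is not placed randomly in the nucleus but has a highly ordered three-dimensional (3D) structure. The 3D structure of a eukaryotic genome is necessary for storing a huge genome in a tiny nucleus, and it also provides the physical platform on which epigenetic mechanisms regulate the reading, interpretation, and inheritance of genetic information. For example, the total length of human genomic DNA can reach 2 meters, whereas the nucleus that holds it is only 10 micrometers in diameter. It is this highly ordered 3D structure that allows nuclear functions such as gene transcription, DNA replication, and DNA repair to proceed in an orderly manner while the human genome is highly compacted. This review first looks back at the rapid progress made over the past decade in techniques for studying 3D genome structure, then discusses the current understanding of some structural features of the 3D organization of eukaryotic genomes. Finally, we propose some important directions for further research on the 3D structure and function of the genome.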

Page 7-20

Download 1403

Review Article

Single-cell Transcriptome Study as Big Data

Pingjian Yu, Wei Lin

The rapid growth of single-cell RNA-seq (scRNA-seq) studies demands efficient data storage, processing, and analysis. Big-data technology provides a framework that facilitates the comprehensive discovery of biological signals from inter-institutional scRNA-seq datasets. This article discusses strategies for resolving the stochastic and heterogeneous single-cell transcriptome signal. After extensively reviewing the available big-data applications of next-generation sequencing (NGS)-based studies, we propose a workflow that accounts for the unique characteristics of scRNA-seq data and the primary objectives of single-cell studies.

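With the rapid development of single-cell transcriptome sequencing in recent years, these large-scale, information-rich data place ever higher demands on the computational resources needed for storage, processing, and analysis. Big data is undoubtedly the best tool for deeply mining complex biological information from data combined across research projects. In this article we discuss the complexity and the signal characteristics of single-cell transcriptome data. In addition, by reviewing and drawing on earlier examples of big-data solutions for next-generation deep sequencing, we propose a multi-layer technical roadmap for a big-data solution for single-cell transcriptome sequencing data.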

Page 21-30

Download 1748

Review Article

Translational Bioinformatics: Past, Present, and Future

Jessica D. Tenenbaum

Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline’s brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field.

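As a relatively young discipline, translational bioinformatics (TBI) has become an important component of biomedical research in the era of precision medicine. The development of high-throughput technologies and electronic health records has brought about a paradigm shift in healthcare and biomedical research. New techniques and methods have become indispensable for converting ever larger datasets into useful knowledge. This review provides a precise definition of the term TBI, describes the discipline's brief history, past accomplishments, and current foci, and closes with predictions of future directions in the field.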

Page 31-41

Download 1570

Review Article

Roles, Functions, and Mechanisms of Long Non-coding RNAs in Cancer

Yiwen Fang, Melissa J. Fullwood

Long non-coding RNAs (lncRNAs) play important roles in cancer. They are involved in chromatin remodeling, as well as transcriptional and post-transcriptional regulation, through a variety of chromatin-based mechanisms and via cross-talk with other RNA species. lncRNAs can function as decoys, scaffolds, and enhancer RNAs. This review summarizes the characteristics of lncRNAs, including their roles, functions, and working mechanisms, describes methods for identifying and annotating lncRNAs, and discusses future opportunities for lncRNA-based therapies using antisense oligonucleotides.

Page 42-54

Download 2012

Original Research

EDISON-WMW: Exact Dynamic Programing Solution of the Wilcoxon–Mann–Whitney Test

Alexander Marx, Christina Backes, Eckart Meese, Hans-Peter Lenhof, Andreas Keller

In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon–Mann–Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science for analyzing whether two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Generally, solving the WMW test exactly requires considerable combinatorial effort for large sample cohorts containing a significant number of ties. Hence, the P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programming to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we present different optimization strategies and a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly generated data, benchmark it against 13 other commonly applied approaches, and evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at http://www.ccb.uni-saarland.de/software/wtest/.

In many fields of research, hypothesis tests are used to determine whether a result is statistically significant or could have arisen by chance. The Wilcoxon–Mann–Whitney (WMW) test is one of the best-known hypothesis tests used in medicine and the life sciences to evaluate whether two test groups follow the same distribution. This nonparametric statistical homogeneity test is often used in molecular diagnostics. Solving the WMW test involves a high combinatorial effort. For this reason, the P value is often estimated in such cases with the help of a normal distribution. We therefore developed EDISON-WMW, a new approach that calculates the exact permutation of the unpaired WMW test without applying corrections, even for groups with many ties. The approach uses dynamic programming to solve the combinatorial problem efficiently. Besides the naive implementation of the algorithm, we present several optimization strategies and a parallelized approach. With EDISON-WMW, the exact P value for test groups with more than 1000 elements that contain ties can be calculated within minutes. The performance of the test was assessed on randomly generated data and against 13 existing methods. We also used EDISON-WMW to evaluate molecular biomarkers for lung cancer and chronic obstructive pulmonary disease. These tests showed that the approximated P values of other approaches are generally higher than the exactly calculated P values of EDISON-WMW. The approach can also be used to examine high-throughput omics data with hundreds or thousands of features. To make the parallelized version of EDISON-WMW publicly accessible, we provide EDISON-WMW as a free web application at http://www.ccb.uni-saarland.de/software/wtest/.
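
The idea of computing an exact WMW P value with dynamic programming can be illustrated with a small sketch. This is not the EDISON-WMW implementation, whose details the abstract does not give; it is a generic textbook-style counting scheme, and the function names `doubled_midranks` and `exact_wmw_two_sided` are hypothetical. It assumes midranks for ties (doubled so they stay integers) and defines "at least as extreme" as being at least as far from the null mean of the rank sum as the observed value.

```python
from fractions import Fraction
from math import comb


def doubled_midranks(values):
    """Return twice the midrank of each value so tied ranks stay integral."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # 1-based positions start+1 .. end+1 share one midrank; store it doubled.
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + 1) + (end + 1)
        start = end + 1
    return ranks


def exact_wmw_two_sided(x, y):
    """Exact two-sided permutation P value of the rank-sum statistic (sketch)."""
    n1, n = len(x), len(x) + len(y)
    ranks = doubled_midranks(list(x) + list(y))
    observed = sum(ranks[:n1])
    # counts[k][s]: number of ways to pick k pooled ranks with doubled sum s.
    counts = [{} for _ in range(n1 + 1)]
    counts[0][0] = 1
    for r in ranks:
        for k in range(n1, 0, -1):  # descending k so each rank is used at most once
            for s, ways in counts[k - 1].items():
                counts[k][s + r] = counts[k].get(s + r, 0) + ways
    mean = n1 * (n + 1)  # expected doubled rank sum of group x under the null
    extreme = sum(ways for s, ways in counts[n1].items()
                  if abs(s - mean) >= abs(observed - mean))
    return Fraction(extreme, comb(n, n1))


print(exact_wmw_two_sided([1, 2, 3], [4, 5, 6]))  # 1/10
print(exact_wmw_two_sided([1, 2], [2, 3]))        # 2/3 (one tie between groups)
```

The table of counts grows with the number of distinct rank sums rather than with the number of group assignments, which is why a dynamic programming formulation avoids enumerating every permutation.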

Page 55-61

Download 1442

Original Research

A Bipartite Network-based Method for Prediction of Long Non-coding RNA–protein Interactions

Mengqu Ge, Ao Li, Minghui Wang

As one large class of non-coding RNAs (ncRNAs), long ncRNAs (lncRNAs) have gained considerable attention in recent years. Mutations and dysfunction of lncRNAs have been implicated in human disorders. Many lncRNAs exert their effects through interactions with the corresponding RNA-binding proteins. Several computational approaches have been developed, but only a few are able to predict these interactions from a network-based point of view. Here, we introduce a computational method named lncRNA–protein bipartite network inference (LPBNI). LPBNI aims to identify potential lncRNA-interacting proteins by making full use of the known lncRNA–protein interactions. A leave-one-out cross validation (LOOCV) test shows that LPBNI significantly outperforms other network-based methods, including random walk with restart (RWR) and protein-based collaborative filtering (ProCF). Furthermore, a case study was performed to demonstrate the performance of LPBNI using real data in predicting potential lncRNA-interacting proteins.

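In recent years, long non-coding RNAs (lncRNAs) have attracted much attention as an important class of non-coding RNA, and existing studies show that lncRNA dysfunction is closely related to human disease. Because many lncRNAs must bind specific proteins in order to function, predicting lncRNA–protein interactions helps to reveal the functions and mechanisms of lncRNAs. Given that existing prediction methods do not consider the information in the interaction network formed by lncRNAs and proteins, this paper proposes a computational method named LPBNI (LncRNA–Protein Bipartite Network Inference), which predicts potential lncRNA–protein interactions from known interaction network information. Leave-one-out cross validation shows that LPBNI performs significantly better than network prediction methods including random walk and protein-based collaborative filtering, and follow-up analysis of the predictions further confirms the strong performance of the method.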
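
The abstract does not spell out how LPBNI scores candidate proteins, so the following is only an illustration of one common way to infer links on a bipartite network: a generic two-step resource allocation. Whether LPBNI uses exactly this scheme is an assumption, and the function name `allocate_resources` and the toy identifiers `L1`, `P1`, and so on are hypothetical.

```python
from collections import defaultdict


def allocate_resources(interactions, query_lncrna):
    """Score proteins for one lncRNA by two-step resource spreading (sketch)."""
    lnc_to_prot = defaultdict(set)
    prot_to_lnc = defaultdict(set)
    for lnc, prot in interactions:
        lnc_to_prot[lnc].add(prot)
        prot_to_lnc[prot].add(lnc)

    # Step 1: each protein known to bind the query holds one unit of resource
    # and splits it equally among all lncRNAs it is known to bind.
    lnc_resource = defaultdict(float)
    for prot in lnc_to_prot[query_lncrna]:
        share = 1.0 / len(prot_to_lnc[prot])
        for lnc in prot_to_lnc[prot]:
            lnc_resource[lnc] += share

    # Step 2: each lncRNA splits what it received equally among its proteins.
    scores = defaultdict(float)
    for lnc, amount in lnc_resource.items():
        share = amount / len(lnc_to_prot[lnc])
        for prot in lnc_to_prot[lnc]:
            scores[prot] += share
    return dict(scores)


# Toy network: L1 binds P1 and P2; L2 binds P2 and P3.
known = {("L1", "P1"), ("L1", "P2"), ("L2", "P2"), ("L2", "P3")}
print(allocate_resources(known, "L1"))
```

In this toy network, P3 receives a nonzero score for L1 only because L2 shares the partner P2 with L1, which is the kind of network-level evidence a bipartite inference method exploits. Proteins not yet linked to the query lncRNA can then be ranked by score as candidate interactors.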

Page 62-71

Download 1966