Articles Online (Volume 20, Issue 3)

Database

CircR2Disease v2.0: An Updated Web Server for Experimentally Validated circRNA–disease Associations and Its Application

Chunyan Fan, Xiujuan Lei, Jiaojiao Tie, Yuchen Zhang, Fang-Xiang Wu, Yi Pan

As reports of dysregulated circular RNAs (circRNAs) in pathological processes accumulate, the regulatory functions of circRNAs, especially as microRNA (miRNA) sponges and through their interactions with RNA-binding proteins (RBPs), have been widely validated. However, the collected information on experimentally validated circRNA–disease associations is only preliminary. Therefore, an updated CircR2Disease database providing a comprehensive resource and web tool to clarify the relationships between circRNAs and diseases in diverse species is necessary. Here, we present CircR2Disease v2.0, an update with an increased number of circRNA–disease associations and novel features. First, CircR2Disease v2.0 provides more than five times as many experimentally validated circRNA–disease associations as its previous version. This version includes 4201 entries between 3077 circRNAs and 312 disease subtypes. Second, information on circRNA–miRNA, circRNA–miRNA–target, and circRNA–RBP interactions has been manually collected for various diseases. Third, the gene symbols of circRNAs and disease name IDs can be linked with various nomenclature databases. Detailed descriptions such as samples and journals have also been integrated into the updated version. Thus, CircR2Disease v2.0 can serve as a platform for users to systematically investigate the roles of dysregulated circRNAs in various diseases and further explore the posttranscriptional regulatory function in diseases. Finally, we propose a computational method named circDis based on the graph convolutional network (GCN) and gradient boosting decision tree (GBDT) to illustrate the applications of the CircR2Disease v2.0 database. CircR2Disease v2.0 is available at http://bioinfo.snnu.edu.cn/CircR2Disease_v2.0 and https://github.com/bioinforlab/CircR2Disease-v2.0.

Page 435-445


Database

dbDEMC 3.0: Functional Exploration of Differentially Expressed miRNAs in Cancers of Human and Model Organisms

Feng Xu, Yifan Wang, Yunchao Ling, Chenfen Zhou, Haizhou Wang, Andrew E. Teschendorff, Yi Zhao, Haitao Zhao, Yungang He, Guoqing Zhang, Zhen Yang

MicroRNAs (miRNAs) are important regulators in gene expression. The dysregulation of miRNA expression is widely reported in the transformation from physiological to pathological states of cells. A large number of differentially expressed miRNAs (DEMs) have been identified in various human cancers by using high-throughput technologies, such as microarray and miRNA-seq. Through mining of published studies with high-throughput experiment information, the database of DEMs in human cancers (dbDEMC) was constructed with the aim of providing a systematic resource for the storage and query of the DEMs. Here we report an update of the dbDEMC to version 3.0, which contains two-fold more data entries than the second version and now includes also data from mice and rats. The dbDEMC 3.0 contains 3268 unique DEMs in 40 different cancer types. The current datasets for differential expression analysis have expanded to 9 generalized categories. Moreover, the current release integrates functional annotations of DEMs obtained by using experimentally validated targets. The annotations can be of great benefit to the intensive analysis of the roles of DEMs in cancer. In summary, dbDEMC 3.0 provides a valuable resource for characterizing molecular functions and regulatory mechanisms of DEMs in human cancers. The dbDEMC 3.0 is freely accessible at https://www.biosino.org/dbDEMC.
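Research question: MicroRNAs (miRNAs) play an important role in regulating gene expression. The role of miRNA dysregulation in the transition of cells from physiological to pathological states has been widely reported. Using high-throughput technologies such as microarray and miRNA-seq, researchers have identified large numbers of aberrantly expressed miRNAs in various human cancers. However, these valuable data are scattered across a large body of literature, so they need to be collected and compiled accurately and comprehensively to provide an effective tool for using and systematically querying them.
Methods: Building on extensive collection, strict quality control, and reanalysis of public miRNA data, mining of published literature and databases, cross-integration of information from multiple sources, merging and de-duplication of results, and restructuring of the data framework, we release the third version of the database of differentially expressed miRNAs in human cancers (dbDEMC 3.0), which aims to provide a systematic resource for storing and querying aberrantly expressed miRNAs in cancer.
Main result 1: dbDEMC 3.0 integrates 403 high-throughput miRNA expression datasets from GEO, ArrayExpress, SRA, and TCGA, comprising 46,388 samples from 40 different cancer types and covering the results of 807 differential expression analyses. The differentially expressed miRNAs cover more than 90% of the miRNA genes in the human genome.
Main result 2: In addition to human data, dbDEMC 3.0 integrates tumor miRNA expression profiles from mice and rats for the first time, allowing a more systematic description of the functions of aberrantly expressed miRNAs in tumorigenesis models of other model organisms.
Main result 3: Beyond the large increase in data volume, dbDEMC 3.0 also curates target gene data for differentially expressed miRNAs, builds regulatory networks of these miRNAs, and performs GO and KEGG enrichment analyses on the target genes, thereby providing functional annotation of differentially expressed miRNAs. In addition, we optimized the web interface of the database to better visualize these data.
Database link: https://www.biosino.org/dbDEMC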

Page 446-454


Database

FertilityOnline: A Straightforward Pipeline for Functional Gene Annotation and Disease Mutation Discovery

Jianing Gao, Huan Zhang, Xiaohua Jiang, Asim Ali, Daren Zhao, Jianqiang Bao, Long Jiang, Furhan Iqbal, Qinghua Shi, Yuanwei Zhang

The genetic basis of human infertility is currently under intensive investigation. However, only a handful of genes have been validated in animal models as disease-causing genes in infertile men. Thus, to better understand the genetic basis of human spermatogenesis and bridge the knowledge gap between humans and other animal species, we constructed FertilityOnline, a database that integrates literature-curated functional genes involved in spermatogenesis into an existing spermatogenic database, SpermatogenesisOnline 1.0. Additional features, including the functional annotation and genetic variants of human genes, are also incorporated into FertilityOnline. By searching this database, users can browse the functional genes involved in spermatogenesis and instantly narrow down the number of candidate genetic mutations underlying male infertility in a user-friendly web interface. The clinical application of this database is exemplified by the identification of novel causative mutations in synaptonemal complex central element protein 1 (SYCE1) and stromal antigen 3 (STAG3) in azoospermic men. In conclusion, FertilityOnline is not only an integrated resource for spermatogenic genes but also a useful tool facilitating the exploration of the genetic basis of male infertility. FertilityOnline can be freely accessed at http://mcg.ustc.edu.cn/bsc/spermgenes2.0/index.html.

Page 455-465


Method

ASpediaFI: Functional Interaction Analysis of Alternative Splicing Events

Kyubin Lee, Doyeong Yu, Daejin Hyung, Soo Young Cho, Charny Park

Alternative splicing (AS) regulates biological processes governing phenotypes and diseases. Differential AS (DAS) gene test methods have been developed to investigate important exonic expression from high-throughput datasets. However, the DAS events extracted using statistical tests are insufficient to delineate relevant biological processes. In this study, we developed a novel application, Alternative Splicing Encyclopedia: Functional Interaction (ASpediaFI), to systemically identify DAS events and co-regulated genes and pathways. ASpediaFI establishes a heterogeneous interaction network of genes and their feature nodes (i.e., AS events and pathways) connected by coexpression or pathway gene set knowledge. Next, ASpediaFI explores the interaction network using the random walk with restart algorithm and interrogates the proximity from a query gene set. Finally, ASpediaFI extracts significant AS events, genes, and pathways. To evaluate the performance of our method, we simulated RNA sequencing (RNA-seq) datasets to consider various conditions of sequencing depth and sample size. The performance was compared with that of other methods. Additionally, we analyzed three public datasets of cancer patients or cell lines to evaluate how well ASpediaFI detects biologically relevant candidates. ASpediaFI exhibits strong performance in both simulated and public datasets. Our integrative approach reveals that DAS events that recognize a global co-expression network and relevant pathways determine the functional importance of spliced genes in the subnetwork. ASpediaFI is publicly available at https://bioconductor.org/packages/ASpediaFI.
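
The random walk with restart step named in the ASpediaFI abstract can be illustrated on a toy network. The following is a minimal sketch of the generic algorithm, not the ASpediaFI implementation: the function name, the restart probability of 0.7, the convergence tolerance, and the four-node example graph are all assumptions chosen for illustration.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that
    jumps back to the seed (query) nodes with probability `restart`.

    All parameter defaults are illustrative assumptions.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # guard against isolated nodes
    transition = adjacency / col_sums      # column-normalized transition matrix

    p0 = np.zeros(adjacency.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # uniform restart distribution over seeds

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance
            return p_next
        p = p_next
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the query set.
toy_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
proximity = random_walk_with_restart(toy_adjacency, seeds=[0])
print(proximity)  # nodes closer to the query node receive higher scores
```

In a heterogeneous network like the one the abstract describes, the nodes would be genes, AS events, and pathways, and the resulting scores would rank feature nodes by proximity to the query gene set.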

Page 466-482


Method

DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning

Zhongxiao Li, Yisheng Li, Bin Zhang, Yu Li, Yongkang Long, Juexiao Zhou, Xudong Zou, Min Zhang, Yuhui Hu, Wei Chen, Xin Gao

Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatic studies have mainly focused on the recognition of polyadenylation sites (PASs) in a given genomic sequence, which is a binary classification problem. Recently, computational methods for predicting the usage level of alternative PASs in the same gene have been proposed. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not take the competition among multiple PASs into account. To address this, here we propose a deep learning architecture, Deep Regulatory Code and Tools for Alternative Polyadenylation (DeeReCT-APA), to quantitatively predict the usage of all alternative PASs of a given gene. To accommodate different genes with potentially different numbers of PASs, DeeReCT-APA treats the problem as a regression task with a variable-length target. Based on a convolutional neural network-long short-term memory (CNN-LSTM) architecture, DeeReCT-APA extracts sequence features with CNN layers, uses bidirectional LSTM to explicitly model the interactions among competing PASs, and outputs percentage scores representing the usage levels of all PASs of a gene. In addition to the fact that only our method can quantitatively predict the usage of all the PASs within a gene, we show that our method consistently outperforms other existing methods on three different tasks for which they are trained: pairwise comparison task, highest usage prediction task, and ranking task. Finally, we demonstrate that our method can be used to predict the effect of genetic variations on APA patterns and sheds light on future mechanistic understanding in APA regulation. Our code and data are available at https://github.com/lzx325/DeeReCT-APA-repo.
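
The idea of emitting usage percentages for a variable number of competing PASs can be sketched with a softmax over per-site scores. This is an assumption made for illustration only: the abstract does not state how the final percentages are normalized, and the CNN and bidirectional LSTM layers that would produce the scores are omitted. The function name and the example scores are hypothetical.

```python
import numpy as np


def usage_from_scores(scores):
    """Convert one raw score per PAS into usage fractions that sum to 1.

    Works for any number of sites, which is what a variable-length
    target requires. Softmax is an illustrative choice, not a confirmed
    detail of DeeReCT-APA.
    """
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()   # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()


# Hypothetical genes with two and four competing PASs.
print(usage_from_scores([0.2, 1.5]))
print(usage_from_scores([0.2, 1.5, -0.7, 0.9]))
```

Because the fractions are normalized jointly, raising the score of one site lowers the predicted usage of every other site in the same gene, which is one simple way to express competition among PASs.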

Page 483-495


Method

DeepCAGE: Incorporating Transcription Factors in Genome-wide Prediction of Chromatin Accessibility

Qiao Liu, Kui Hua, Xuegong Zhang, Wing Hung Wong, Rui Jiang

Although computational approaches have been complementing high-throughput biological experiments for the identification of functional regions in the human genome, it remains a great challenge to systematically decipher interactions between transcription factors (TFs) and regulatory elements to achieve interpretable annotations of chromatin accessibility across diverse cellular contexts. To solve this problem, we propose DeepCAGE, a deep learning framework that integrates sequence information and binding statuses of TFs, for the accurate prediction of chromatin accessible regions at a genome-wide scale in a variety of cell types. DeepCAGE takes advantage of a densely connected deep convolutional neural network architecture to automatically learn sequence signatures of known chromatin accessible regions and then incorporates such features with expression levels and binding activities of human core TFs to predict novel chromatin accessible regions. In a series of systematic comparisons with existing methods, DeepCAGE exhibits superior performance in not only the classification but also the regression of chromatin accessibility signals. In a detailed analysis of TF activities, DeepCAGE successfully extracts novel binding motifs and measures the contribution of a TF to the regulation with respect to a specific locus in a certain cell type. When applied to whole-genome sequencing data analysis, our method successfully prioritizes putative deleterious variants underlying a human complex trait and thus provides insights into the understanding of disease-associated genetic variants. DeepCAGE can be downloaded from https://github.com/kimmo1019/DeepCAGE.

Page 496-507


Method

PHISDetector: A Tool to Detect Diverse In Silico Phage–host Interaction Signals for Virome Studies

Fengxia Zhou, Rui Gan, Fan Zhang, Chunyan Ren, Ling Yu, Yu Si, Zhiwei Huang

Phage–microbe interactions are appealing systems to study coevolution, and have also been increasingly emphasized due to their roles in human health, disease, and the development of novel therapeutics. Phage–microbe interactions leave diverse signals in bacterial and phage genomic sequences, defined as phage–host interaction signals (PHISs), which include clustered regularly interspaced short palindromic repeats (CRISPR) targeting, prophage, and protein–protein interaction signals. In the present study, we developed a novel tool, the phage–host interaction signal detector (PHISDetector), to predict phage–host interactions by detecting and integrating diverse in silico PHISs, and scoring the probability of phage–host interactions using machine learning models based on PHIS features. We evaluated the performance of PHISDetector on multiple benchmark datasets and application cases. When tested on a dataset of 758 annotated phage–host pairs, PHISDetector yields prediction accuracies of 0.51 and 0.73 at the species and genus levels, respectively, outperforming other phage–host prediction tools. When applied to 125,842 metagenomic viral contigs (mVCs) derived from 3042 geographically diverse samples, a detection rate of 54.54% could be achieved. Furthermore, PHISDetector could predict infecting phages for 85.6% of 368 multidrug-resistant (MDR) bacteria and 30% of 454 human gut bacteria obtained from the National Institutes of Health (NIH) Human Microbiome Project (HMP). PHISDetector can be run either as a web server (http://www.microbiome-bigdata.com/PHISDetector/) for general users to study individual inputs or as a stand-alone version (https://github.com/HIT-ImmunologyLab/PHISDetector) to process massive phage contigs from virome studies.

Page 508-523


Method

inGAP-family: Accurate Detection of Meiotic Recombination Loci and Causal Mutations by Filtering Out Artificial Variants due to Genome Complexities

Qichao Lian, Yamao Chen, Fang Chang, Ying Fu, Ji Qi

Accurately identifying DNA polymorphisms can bridge the gap between phenotypes and genotypes and is essential for molecular marker assisted genetic studies. Genome complexities, including large-scale structural variations, bring great challenges to bioinformatic analysis for obtaining high-confidence genomic variants, as sequence differences between non-allelic loci of two or more genomes can be misinterpreted as polymorphisms. It is important to correctly filter out artificial variants to avoid false genotyping or estimation of allele frequencies. Here, we present an efficient and effective framework, inGAP-family, to discover, filter, and visualize DNA polymorphisms and structural variants (SVs) from alignment of short reads. Applying this method to polymorphism detection on real datasets shows that elimination of artificial variants greatly facilitates the precise identification of meiotic recombination points as well as causal mutations in mutant genomes or quantitative trait loci. In addition, inGAP-family provides a user-friendly graphical interface for detecting polymorphisms and SVs, further evaluating predicted variants and identifying mutations related to genotypes. It is accessible at https://sourceforge.net/projects/ingap-family/.

Page 524-535


Application Note

KaKs_Calculator 3.0: Calculating Selective Pressure on Coding and Non-coding Sequences

Zhang Zhang

KaKs_Calculator 3.0 is an updated toolkit that is capable of calculating selective pressure on both coding and non-coding sequences. Similar to the nonsynonymous/synonymous substitution rate ratio for coding sequences, selection on non-coding sequences can be quantified as the ratio of the non-coding nucleotide substitution rate to the synonymous substitution rate of adjacent coding sequences. As tested on empirical data, KaKs_Calculator 3.0 is effective in detecting the strength and mode of selection operating on molecular sequences, demonstrating its potential for genome-wide scans of natural selection on diverse sequences and for identifying potentially functional elements at a whole-genome scale. The KaKs_Calculator 3.0 package is freely available for academic use only at https://ngdc.cncb.ac.cn/biocode/tools/BT000001.
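KaKs_Calculator is a toolkit for calculating selective pressure on molecular sequences; the latest version, 3.0, supports both coding and non-coding sequences. Selective pressure on coding sequences is characterized by the ratio of the nonsynonymous substitution rate to the synonymous substitution rate. Similarly, selective pressure on non-coding sequences can be calculated as the ratio of the substitution rate of the non-coding sequence to the synonymous substitution rate of its adjacent coding sequences. Comparative analysis of real data shows that KaKs_Calculator 3.0 can accurately calculate selective pressure on non-coding sequences, providing an important method for analyzing natural selection at the whole-genome level and for identifying potentially functional elements. The KaKs_Calculator 3.0 toolkit is freely available for academic research.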
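
The non-coding ratio described above can be sketched in a few lines. This is a hedged illustration, not KaKs_Calculator code: the Jukes-Cantor (JC69) correction is an assumed substitution model, the synonymous rate of the adjacent coding sequence is taken as a given input rather than estimated, and the function names, sequences, and the Ks value of 0.5 are hypothetical.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor corrected substitutions per site for two aligned sequences.

    JC69 is an illustrative model choice; sites with gaps or ambiguous
    bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed proportion of differences
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def noncoding_selection_ratio(kn, ks):
    """Ratio of the non-coding substitution rate (kn) to the synonymous
    substitution rate of adjacent coding sequence (ks)."""
    if ks <= 0:
        raise ValueError("ks must be positive")
    return kn / ks


# Hypothetical aligned non-coding sequences differing at one of ten sites.
kn = jc69_distance("ACGTACGTAC", "ACGTACGTAA")
ratio = noncoding_selection_ratio(kn, ks=0.5)  # ks = 0.5 is an assumed value
print(round(kn, 4), round(ratio, 4))
```

By analogy with Ka/Ks, a ratio well below 1 would suggest that the non-coding region evolves more slowly than nearby synonymous sites, consistent with purifying selection.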

Page 536-540


Application Note

ezQTL: A Web Platform for Interactive Visualization and Colocalization of QTLs and GWAS Loci

Tongwu Zhang, Alyssa Klein, Jian Sang, Jiyeon Choi, Kevin M. Brown

Genome-wide association studies (GWAS) have identified thousands of genomic loci associated with complex diseases and traits, including cancer. The vast majority of common trait-associated variants identified via GWAS fall in non-coding regions of the genome, posing a challenge in elucidating the causal variants, genes, and mechanisms involved. Expression quantitative trait locus (eQTL) and other molecular QTL studies have been valuable resources in identifying candidate causal genes from GWAS loci through statistical colocalization methods. While QTL colocalization is becoming a standard analysis in post-GWAS investigation, an easy web tool for users to perform formal colocalization analyses with either user-provided or public GWAS and eQTL datasets has been lacking. Here, we present ezQTL, a web-based bioinformatic application to interactively visualize and analyze genetic association data such as GWAS loci and molecular QTLs under different linkage disequilibrium (LD) patterns (1000 Genomes Project, UK Biobank, or user-provided data). This application allows users to perform data quality control for variants matched between different datasets, LD visualization, and two-trait colocalization analyses using two state-of-the-art methodologies (eCAVIAR and HyPrColoc), including batch processing. ezQTL is a free and publicly available cross-platform web tool, which can be accessed online at https://analysistools.cancer.gov/ezqtl.

Page 541-548


Application Note

DEBKS: A Tool to Detect Differentially Expressed Circular RNAs

Zelin Liu, Huiru Ding, Jianqi She, Chunhua Chen, Weiguang Zhang, Ence Yang

Circular RNAs (circRNAs) are involved in various biological processes and disease pathogenesis. However, only a small number of functional circRNAs have been identified among hundreds of thousands of circRNA species, partly because most current methods are based on circular junction counts and overlook the fact that a circRNA is formed from the host gene by back-splicing (BS). To distinguish the expression difference originating from BS or the host gene, we present differentially expressed back-splicing (DEBKS), a software program to streamline the discovery of differential BS events between two rRNA-depleted RNA sequencing (RNA-seq) sample groups. By applying it to real and simulated data and employing RT-qPCR for validation, we demonstrate that DEBKS is efficient and accurate in detecting circRNAs with differential BS events between paired and unpaired sample groups. DEBKS is available at https://github.com/yangence/DEBKS as open-source software.

Page 549-556


Application Note

Interactive Web-based Annotation of Plant MicroRNAs with iwa-miRNA

Ting Zhang, Jingjing Zhai, Xiaorong Zhang, Lei Ling, Menghan Li, Shang Xie, Minggui Song, Chuang Ma

MicroRNAs (miRNAs) are important regulators of gene expression. The large-scale detection and profiling of miRNAs have been accelerated with the development of high-throughput small RNA sequencing (sRNA-Seq) techniques and bioinformatics tools. However, generating high-quality comprehensive miRNA annotations remains challenging due to the intrinsic complexity of sRNA-Seq data and inherent limitations of existing miRNA prediction tools. Here, we present iwa-miRNA, a Galaxy-based framework that can facilitate miRNA annotation in plant species by combining computational analysis and manual curation. iwa-miRNA is specifically designed to generate a comprehensive list of miRNA candidates, bridging the gap between already annotated miRNAs provided by public miRNA databases and new predictions from sRNA-Seq datasets. It can also assist users in selecting promising miRNA candidates in an interactive mode, contributing to the accessibility and reproducibility of genome-wide miRNA annotation. iwa-miRNA is user-friendly and can be easily deployed as a web application for researchers without programming experience. With flexible, interactive, and easy-to-use features, iwa-miRNA is a valuable tool for the annotation of miRNAs in plant species with reference genomes. We also illustrate the application of iwa-miRNA for miRNA annotation using data from plant species with varying genomic complexity. The source codes and web server of iwa-miRNA are freely accessible at http://iwa-miRNA.omicstudio.cloud/.
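Research question: How can high-quality plant miRNA annotations be obtained?
Solution: We developed iwa-miRNA, an interactive bioinformatics analysis platform that combines computational analysis with manual review to obtain high-quality plant miRNA annotation data.
Implementation: iwa-miRNA has a series of built-in bioinformatics pipelines that automatically aggregate annotation data from representative miRNA databases such as miRBase, PmiREN, and sRNAanno; mine candidate miRNAs from high-throughput small RNA sequencing data; characterize miRNAs at multiple levels, including sequence, structure, and expression; provide a dynamic interactive web interface for manually assisted selection of high-quality miRNAs; and build machine learning-based computational models for accurate genome-wide miRNA prediction.
iwa-miRNA links: http://iwa-miRNA.omicstudio.cloud or https://github.com/cma2015/iwa-miRNA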

Page 557-567


Application Note

i2dash: Creation of Flexible, Interactive, and Web-based Dashboards for Visualization of Omics Data

Arsenij Ustjanzew, Jens Preussner, Mette Bentsen, Carsten Kuenne, Mario Looso

Data visualization and interactive data exploration are important aspects of illustrating complex concepts and results from analyses of omics data. A suitable visualization has to be intuitive and accessible. Web-based dashboards have become popular tools for the arrangement, consolidation, and display of such visualizations. However, the combination of automated data processing pipelines handling omics data and dynamically generated, interactive dashboards is poorly solved. Here, we present i2dash, an R package intended to encapsulate functionality for the programmatic creation of customized dashboards. It supports interactive and responsive (linked) visualizations across a set of predefined graphical layouts. i2dash addresses the needs of data analysts/software developers for a tool that is compatible and attachable to any R-based analysis pipeline, thereby fostering the separation of data visualization on one hand and data analysis tasks on the other hand. In addition, the generic design of i2dash enables the development of modular extensions for specific needs. As a proof of principle, we provide an extension of i2dash optimized for single-cell RNA sequencing analysis, supporting the creation of dashboards for the visualization needs of such experiments. Equipped with these features, i2dash is suitable for extensive use in large-scale sequencing/bioinformatics facilities. Along this line, we provide i2dash as a containerized solution, enabling a straightforward large-scale deployment and sharing of dashboards using cloud services. i2dash is freely available via the R package archive CRAN (https://CRAN.R-project.org/package=i2dash).

Page 568-577


Application Note

NORMA: The Network Makeup Artist — A Web Tool for Network Annotation Visualization

Mikaela Koutrouli, Evangelos Karatzas, Katerina Papanikolopoulou, Georgios A. Pavlopoulos

The Network Makeup Artist (NORMA) is a web tool for interactive network annotation visualization and topological analysis, able to handle multiple networks and annotations simultaneously. Precalculated annotations (e.g., Gene Ontology, Pathway enrichment, community detection, or clustering results) can be uploaded and visualized in a network, either as colored pie-chart nodes or as color-filled areas in a 2D/3D Venn-diagram-like style. In the case where no annotation exists, algorithms for automated community detection are offered. Users can adjust the network views using standard layout algorithms or allow NORMA to slightly modify them for visually better group separation. Once a network view is set, users can interactively select and highlight any group of interest in order to generate publication-ready figures. Briefly, with NORMA, users can encode three types of information simultaneously. These are 1) the network, 2) the communities or annotations of interest, and 3) node categories or expression values. Finally, NORMA offers basic topological analysis and direct topological comparison across any of the selected networks. NORMA service is available at http://norma.pavlopouloslab.info, whereas the code is available at https://github.com/PavlopoulosLab/NORMA.

Page 578-586


Application Note

SynergyFinder Plus: Toward Better Interpretation and Annotation of Drug Combination Screening Datasets

Shuyu Zheng, Wenyu Wang, Jehad Aldahdooh, Alina Malyutina, Tolou Shadbahr, Ziaurrehman Tanoli, Alberto Pessia, Jing Tang

Combinatorial therapies have been recently proposed to improve the efficacy of anticancer treatment. The SynergyFinder R package is a software tool used to analyze pre-clinical drug combination datasets. Here, we report the major updates to the SynergyFinder R package for improved interpretation and annotation of drug combination screening results. Unlike the existing implementations, the updated SynergyFinder R package includes five main innovations. 1) We extend the mathematical models to higher-order drug combination data analysis and implement dimension reduction techniques for visualizing the synergy landscape. 2) We provide a statistical analysis of drug combination synergy and sensitivity with confidence intervals and P values. 3) We incorporate a synergy barometer to harmonize multiple synergy scoring methods to provide a consensus metric for synergy. 4) We evaluate drug combination synergy and sensitivity to provide an unbiased interpretation of the clinical potential. 5) We enable fast annotation of drugs and cell lines, including their chemical and target information. These annotations will improve the interpretation of the mechanisms of action of drug combinations. To facilitate the use of the R package within the drug discovery community, we also provide a web server at www.synergyfinderplus.org as a user-friendly interface to enable a more flexible and versatile analysis of drug combination data.

Page 587-596