Articles Online (Volume 16, Issue 4)


Bioinformatics Commons: The Cornerstone of Life and Health Sciences

Zhang Zhang, Yu Xue, Fangqing Zhao

Page 223-225


CIRCpedia v2: An Updated Database for Comprehensive Circular RNA Annotation and Expression Comparison

Rui Dong, Xu-Kai Ma, Guo-Wei Li, Li Yang

Circular RNAs (circRNAs) from back-splicing of exon(s) have recently been found to be broadly expressed in eukaryotes, in tissue- and species-specific manners. Although the functions of most circRNAs remain elusive, some circRNAs have been shown to function in gene expression regulation and may be related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. Profiling circRNAs by integrating their expression among different samples thus provides a molecular basis for further functional study of circRNAs and their potential application in the clinic. Here, we report CIRCpedia v2, an updated database for comprehensive circRNA annotation from over 180 RNA-seq datasets across six different species. This atlas allows users to search, browse, and download circRNAs with expression features in various cell types/tissues, including disease samples. In addition, the updated database incorporates conservation analysis of circRNAs between humans and mice. Finally, the web interface also contains computational tools to compare circRNA expression among samples. CIRCpedia v2 is accessible at
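Circular RNAs generated by back-splicing of exons are a new class of RNA molecules that lack a 5′ cap and a 3′ poly(A) tail and instead form a covalently closed loop. They are broadly expressed in eukaryotes, with pronounced tissue- and species-specific expression patterns. Profiling circular RNA expression across tissues and species therefore lays the groundwork for in-depth study of their biogenesis and processing mechanisms and of the principles behind their potential functions. The Li Yang group at the CAS-MPG Partner Institute for Computational Biology recently released CIRCpedia v2, an upgraded circular RNA database website that contains circular RNA analysis data from more than 180 samples across six species. Users can obtain diverse information, including genomic coordinates, expression levels, alternative back-splicing, and human-mouse conservation, through the search, browse, and download modules, and can compare circular RNAs between samples with new online analysis tools. The upgraded database provides a comprehensive, integrated platform for circular RNA research and supplies data support and a theoretical basis for further functional studies of circular RNAs.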

Page 226-233


HeteroMeth: A Database of Cell-to-cell Heterogeneity in DNA Methylation

Qing Huan, Yuliang Zhang, Shaohuan Wu, Wenfeng Qian

DNA methylation is an important epigenetic mark that plays a vital role in gene expression and cell differentiation. The average DNA methylation level among a group of cells has been extensively documented. However, the cell-to-cell heterogeneity in DNA methylation, which reflects the differentiation of epigenetic status among cells, remains less investigated. Here we established a gold standard of the cell-to-cell heterogeneity in DNA methylation based on single-cell bisulfite sequencing (BS-seq) data. With that, we optimized a computational pipeline for estimating the heterogeneity in DNA methylation from bulk BS-seq data. We further built HeteroMeth, a database for searching, browsing, visualizing, and downloading the data for heterogeneity in DNA methylation for a total of 141 samples in humans, mice, Arabidopsis, and rice. Three genes are used as examples to illustrate the power of HeteroMeth in the identification of unique features in DNA methylation. The optimization of the computational strategy and the construction of the database in this study complement the recent experimental attempts on single-cell DNA methylomes and will facilitate the understanding of epigenetic mechanisms underlying cell differentiation and embryonic development. HeteroMeth is publicly available at

Page 234-243


PTMD: A Database of Human Disease-associated Post-translational Modifications

Haodong Xu, Yongbo Wang, Shaofeng Lin, Wankun Deng, Di Peng, Qinghua Cui, Yu Xue

Various posttranslational modifications (PTMs) participate in nearly all aspects of biological processes by regulating protein functions, and aberrant states of PTMs are frequently implicated in human diseases. Therefore, an integral resource of PTM–disease associations (PDAs) would be a great help for both academic research and clinical use. In this work, we reported PTMD, a well-curated database containing PTMs that are associated with human diseases. We manually collected 1950 known PDAs in 749 proteins for 23 types of PTMs and 275 types of diseases from the literature. Database analyses show that phosphorylation has the largest number of disease associations, whereas neurologic diseases have the largest number of PTM associations. We classified all known PDAs into six classes according to the PTM status in diseases and demonstrated that the upregulation and presence of PTM events account for a predominant proportion of disease-associated PTM events. By reconstructing a disease–gene network, we observed that breast cancers have the largest number of associated PTMs and AKT1 has the largest number of PTMs connected to diseases. Finally, the PTMD database was developed with detailed annotations and can be a useful resource for further analyzing the relations between PTMs and human diseases. PTMD is freely accessible at

Page 244-251


GAAD: A Gene and Autoimmiune Disease Association Database

Guanting Lu, Xiaowen Hao, Wei-Hua Chen, Shijie Mu

Autoimmune diseases (ADs) arise from an abnormal immune response of the body against substances and tissues normally present in the body. More than a hundred ADs have been described in the literature so far. Although their etiology remains largely unclear, various types of ADs tend to share more associated genes with other types of ADs than with non-AD types. Here we present GAAD, a gene and AD association database. In GAAD, we collected 44,762 associations between 49 ADs and 4249 genes from public databases and MEDLINE documents. We manually verified the associations to ensure their quality and credibility. We reconstructed and recapitulated the relationships among ADs using their shared genes, which further validated the quality of our data. We also provided a list of significantly co-occurring gene pairs among ADs; with embedded tools, users can query gene co-occurrences and construct customized co-occurrence networks with genes of interest. To make GAAD more straightforward for experimental biologists and medical scientists, we extracted additional information describing the associations through text mining, including the putative diagnostic value of the associations, the type and position of gene polymorphisms, expression changes of implicated genes, and the phenotypic consequences, and grouped the associations accordingly. GAAD is freely available at
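Autoimmune diseases are diseases in which an immune response against the body's own antigens damages its own tissues. More than one hundred autoimmune diseases have been described in the literature to date. Although their causes remain unclear, we found that autoimmune diseases share more associated genes with one another than with non-autoimmune diseases. On this basis, we developed the GAAD (A Gene and Autoimmiune Disease Association Database) database. In GAAD, we collected 44,762 associations between 49 autoimmune diseases and 4249 genes from public databases and MEDLINE documents. We verified these associations manually to ensure their quality and accuracy. We also used the genes shared among these diseases to reconstruct and recapitulate the relationships among autoimmune diseases, which further supports the reliability of our data. The database also provides a list of significantly co-occurring gene pairs among autoimmune diseases; using embedded tools, users can query gene co-occurrences and build customized co-occurrence networks from genes of interest. To make the database easier for experimental biologists and medical researchers to use, we used text mining to extract additional information describing each association, including its putative diagnostic value, the type and position of gene polymorphisms, and the expression changes and phenotypic changes of the associated genes, and grouped the associations accordingly. GAAD is free to use, and its data are free to download.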
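
The idea of "significantly co-occurring gene pairs" can be illustrated with a small sketch. The abstract does not state which statistical test GAAD uses, so the hypergeometric test below is an assumption, and the function names (`hypergeom_sf`, `co_occurring_pairs`) and the toy disease labels are hypothetical. The sketch asks, for each gene pair, whether the two genes are associated with the same diseases more often than chance would predict.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (assumed test, not GAAD's documented one)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def co_occurring_pairs(disease_genes, alpha=0.05):
    """Return (gene1, gene2, shared_diseases, p_value) for pairs with p <= alpha.

    disease_genes maps a disease name to the set of genes associated with it.
    """
    n_diseases = len(disease_genes)
    # Invert the mapping: gene -> set of diseases it is associated with.
    gene_to_diseases = {}
    for disease, genes in disease_genes.items():
        for gene in genes:
            gene_to_diseases.setdefault(gene, set()).add(disease)

    results = []
    for g1, g2 in combinations(sorted(gene_to_diseases), 2):
        d1, d2 = gene_to_diseases[g1], gene_to_diseases[g2]
        shared = len(d1 & d2)
        if shared == 0:
            continue
        p = hypergeom_sf(shared, n_diseases, len(d1), len(d2))
        if p <= alpha:
            results.append((g1, g2, shared, p))
    # Most significant pairs first.
    return sorted(results, key=lambda r: r[3])


if __name__ == "__main__":
    # Toy data: gene C is linked to every disease, genes A and B only to D1-D3.
    toy = {f"D{i}": {"C"} for i in range(1, 11)}
    for d in ("D1", "D2", "D3"):
        toy[d] |= {"A", "B"}
    for row in co_occurring_pairs(toy):
        print(row)
```

In the toy data, A and B always appear together and are reported, while pairs involving the ubiquitous gene C are not, because sharing diseases with C carries no information. A real analysis over thousands of genes would also need a multiple-testing correction, which this sketch omits.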

Page 252-261


CCGD-ESCC: A Comprehensive Database for Genetic Variants Associated with Esophageal Squamous Cell Carcinoma in Chinese Population

Linna Peng, Sijin Cheng, Yuan Lin, Qionghua Cui, Yingying Luo, Jiahui Chu, Mingming Shao, Wenyi Fan, Yamei Chen, Ai Lin, Yiyi Xi, Yanxia Sun, Lei Zhang, Chao Zhang, Wen Tan, Ge Gao, Chen Wu, Dongxin Lin

Esophageal squamous-cell carcinoma (ESCC) is one of the most lethal malignancies in the world and occurs at a particularly high frequency in China. While several genome-wide association studies (GWAS) of germline variants and whole-genome or whole-exome sequencing studies of somatic mutations in ESCC have been published, there is no comprehensive database publicly available for this cancer. Here, we developed the Chinese Cancer Genomic Database-Esophageal Squamous Cell Carcinoma (CCGD-ESCC) database, which contains the associations of 69,593 single nucleotide polymorphisms (SNPs) with ESCC risk in 2022 cases and 2039 controls, survival time of 1006 ESCC patients (survival GWAS) and gene expression (expression quantitative trait loci, eQTL) in 94 ESCC patients. Moreover, this database also provides the associations between 8833 somatic mutations and survival time in 675 ESCC patients. Our user-friendly database is a resource useful for biologists and oncologists not only in identifying the associations of genetic variants or somatic mutations with the development and progression of ESCC but also in studying the underlying mechanisms for tumorigenesis of the cancer. CCGD-ESCC is freely accessible at

Page 262-268


HCCDB: A Database of Hepatocellular Carcinoma Expression Atlas

Qiuyu Lian, Shicheng Wang, Guchao Zhang, Dongfang Wang, Guijuan Luo, Jing Tang, Lei Chen, Jin Gu

Hepatocellular carcinoma (HCC) is highly heterogeneous in nature and has been one of the most common cancer types worldwide. To ensure repeatability of identified gene expression patterns and comprehensively annotate the transcriptomes of HCC, we carefully curated 15 public HCC expression datasets that cover around 4000 clinical samples and developed the database HCCDB to serve as a one-stop online resource for exploring HCC gene expression with user-friendly interfaces. The global differential gene expression landscape of HCC was established by analyzing the consistently differentially expressed genes across multiple datasets. Moreover, a 4D metric was proposed to fully characterize the expression pattern of each gene by integrating data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). To facilitate a comprehensive understanding of gene expression patterns in HCC, HCCDB also provides links to third-party databases on drugs, proteomics, and literature, and graphically displays the results from computational analyses, including differential expression analysis, tissue-specific and tumor-specific expression analysis, survival analysis, and co-expression analysis. HCCDB is freely accessible at

Page 269-275


TSNAdb: A Database for Tumor-specific Neoantigens from Immunogenomics Data Analysis

Jingcheng Wu, Wenyi Zhao, Binbin Zhou, Zhixi Su, Xun Gu, Zhan Zhou, Shuqing Chen

Tumor-specific neoantigens have attracted much attention since they can be used as biomarkers to predict therapeutic effects of immune checkpoint blockade therapy and as potential targets for cancer immunotherapy. In this study, we developed a comprehensive tumor-specific neoantigen database (TSNAdb v1.0), based on pan-cancer immunogenomic analyses of somatic mutation data and human leukocyte antigen (HLA) allele information for 16 tumor types with 7748 tumor samples from The Cancer Genome Atlas (TCGA) and The Cancer Immunome Atlas (TCIA). We predicted binding affinities between mutant/wild-type peptides and HLA class I molecules by NetMHCpan v2.8/v4.0, and presented detailed information of 3,707,562/1,146,961 potential neoantigens generated by somatic mutations of all tumor samples. Moreover, we employed recurrent mutations in combination with highly frequent HLA alleles to predict potential shared neoantigens across tumor patients, which would facilitate the discovery of putative targets for neoantigen-based cancer immunotherapy. TSNAdb is freely available at
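With the rapid development of cancer genomics and immunotherapy, tumor-specific neoantigens have become increasingly important. They can serve both as indicators for predicting the efficacy of checkpoint blockade therapy and as potential targets for tumor immune cell therapy. In this study, we developed TSNAdb v1.0, a database for the systematic analysis of tumor-specific neoantigens, based on tumor immunogenomic analyses. We collected somatic mutation data for 7748 tumor samples across 16 tumor types from TCGA, obtained the HLA typing of the corresponding samples from TCIA, and predicted neoantigens with NetMHCpan, a tool that predicts binding affinity between HLA types and peptides. Using two versions of NetMHCpan (v2.8 and v4.0), we predicted the tumor-specific neoantigens generated by somatic mutations and obtained 3,707,562 and 1,146,961 potential neoantigens, respectively. In addition, we extracted recurrent somatic mutations and highly frequent HLA types from the tumor samples and used them to predict potential neoantigens shared widely across tumor patients, providing candidate targets for neoantigen-based immunotherapy. We believe that continued progress in tumor immunogenomics will keep advancing the discovery and identification of tumor-specific neoantigens and the development of neoantigen-targeted cancer immunotherapies. TSNAdb is freely and openly accessible.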
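
Comparing mutant and wild-type peptides starts with enumerating the peptide windows that span a mutated residue. The sketch below shows that preparatory step only; it is not taken from the TSNAdb pipeline. The function name `peptide_pairs`, the default peptide length of 9, and the toy sequence are all assumptions, and the NetMHCpan binding prediction itself is not reproduced here.

```python
def peptide_pairs(protein, position, mutant_aa, length=9):
    """Enumerate (wild_type, mutant) peptide pairs that span a point mutation.

    protein   -- wild-type amino acid sequence
    position  -- 1-based position of the substituted residue
    mutant_aa -- single-letter code of the substituting residue
    length    -- peptide length (9 is an assumed default, not from the source)
    """
    idx = position - 1
    if not 0 <= idx < len(protein):
        raise ValueError("position lies outside the protein sequence")

    mutated = protein[:idx] + mutant_aa + protein[idx + 1:]

    # Every window of the given length that contains the mutated residue
    # and fits inside the protein.
    first = max(0, idx - length + 1)
    last = min(idx, len(protein) - length)
    return [
        (protein[start:start + length], mutated[start:start + length])
        for start in range(first, last + 1)
    ]


if __name__ == "__main__":
    toy_protein = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical 20-residue sequence
    for wild_type, mutant in peptide_pairs(toy_protein, position=10, mutant_aa="W"):
        print(wild_type, mutant)
```

Each pair could then be scored against a patient's HLA alleles by a binding predictor, so that a mutant peptide is compared with its wild-type counterpart.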

Page 276-282


PlaD: A Transcriptomics Database for Plant Defense Responses to Pathogens, Providing New Insights into Plant Immune System

Huan Qi, Zhenhong Jiang, Kang Zhang, Shiping Yang, Fei He, Ziding Zhang

High-throughput transcriptomics technologies have been widely used to study plant transcriptional reprogramming during the process of plant defense responses, and a large quantity of gene expression data has accumulated in public repositories. However, utilization of these data is often hampered by the lack of standard metadata annotation. In this study, we curated 2444 public pathogenesis-related gene expression samples from the model plant Arabidopsis and three major crops (maize, rice, and wheat). We organized the data into a user-friendly database termed PlaD. Currently, PlaD contains three key features. First, it provides large-scale curated data related to plant defense responses, including gene expression and gene functional annotation data. Second, it provides the visualization of condition-specific expression profiles. Third, it allows users to search co-regulated genes under the infections of various pathogens. Using PlaD, we conducted a large-scale transcriptome analysis to explore the global landscape of gene expression in the curated data. We found that only a small fraction of genes were differentially expressed under multiple conditions, which might be explained by their tendency to have more network connections and shorter network distances in gene networks. Collectively, we hope that PlaD can serve as an important and comprehensive knowledgebase to the community of plant sciences, providing insightful clues to better understand the molecular mechanisms underlying plant immune responses. PlaD is freely available at or
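High-throughput transcriptomics technologies have been widely used to study transcriptional reprogramming in plant immunity, and large amounts of transcriptome data have accumulated in public databases. However, because these data lack standardized annotation, much of their value has yet to be used effectively. In this study, we curated 2444 pathogen-related gene expression samples from the model plant Arabidopsis and three major crops (maize, rice, and wheat). By organizing and analyzing these data, we built a user-friendly transcriptome database, PlaD. PlaD currently has three key features. First, it provides large-scale data related to plant defense responses, mainly gene expression data and functional annotation. Second, it visualizes condition-specific expression profiles. Third, it allows users to search for genes co-regulated under multiple conditions. Using the data stored in PlaD, we also carried out a large-scale transcriptome analysis to explore the global features of changes in plant gene expression. We found that only a small fraction of genes were differentially expressed under multiple conditions, and that these genes tend to have more network connections and shorter network distances in gene functional networks. In summary, we hope that PlaD can serve as a comprehensive knowledgebase that offers useful clues for plant scientists studying the mechanisms of plant immune responses. PlaD is currently available online.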

Page 283-293

Web Server

DeepNitro: Prediction of Protein Nitration and Nitrosylation Sites by Deep Learning

Yubin Xie, Xiaotong Luo, Yupeng Li, Li Chen, Wenbin Ma, Junjiu Huang, Jun Cui, Yong Zhao, Yu Xue, Zhixiang Zuo, Jian Ren

Protein nitration and nitrosylation are essential post-translational modifications (PTMs) involved in many fundamental cellular processes. Recent studies have revealed that excessive levels of nitration and nitrosylation in some critical proteins are linked to numerous chronic diseases. Therefore, the identification of substrates that undergo such modifications in a site-specific manner is an important research topic in the community and will provide candidates for targeted therapy. In this study, we aimed to develop a computational tool for predicting nitration and nitrosylation sites in proteins. We first constructed four types of encoding features, including positional amino acid distributions, sequence contextual dependencies, physicochemical properties, and position-specific scoring features, to represent the modified residues. Based on these encoding features, we established a predictor called DeepNitro using deep learning methods for predicting protein nitration and nitrosylation. In n-fold cross-validation, DeepNitro achieved AUC values of 0.65 for tyrosine nitration, 0.80 for tryptophan nitration, and 0.70 for cysteine nitrosylation, demonstrating the robustness and reliability of our tool. Also, when tested on the independent dataset, DeepNitro is substantially superior to other similar tools, with a 7%−42% improvement in prediction performance. Taken together, the application of deep learning and novel encoding schemes, especially the position-specific scoring feature, greatly improves the accuracy of nitration and nitrosylation site prediction and may facilitate the prediction of other PTM sites. DeepNitro is implemented in JAVA and PHP and is freely available for academic research at
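Protein nitration and nitrosylation are key types of post-translational modification that play important roles in many common cellular regulatory processes. Recent studies have shown that abnormal levels of nitration and nitrosylation on certain critical proteins are associated with a variety of chronic diseases. Identifying the precise modification sites on substrates is therefore an important research focus and can provide potential targets for the targeted therapy of chronic diseases. In this study, we developed DeepNitro, an accurate site prediction tool for protein nitration and nitrosylation. We first introduced four encoding schemes into the computational model, namely amino acid distributions, upstream and downstream sequence context, physicochemical properties, and position-specific scoring, to extract training features for the modification sites. Based on these feature encodings, we used deep learning to build a site prediction model dedicated to protein nitration and nitrosylation. N-fold cross-validation showed that the model gives stable and reliable predictions, with AUC values of 0.65, 0.80, and 0.70 for tyrosine nitration, tryptophan nitration, and cysteine nitrosylation, respectively. On an independent test set, DeepNitro also predicted significantly more accurately than existing tools, with a 7%-42% improvement in prediction performance. Overall, by applying deep learning and new feature encoding methods, we improved the prediction accuracy for protein nitration and nitrosylation and further support the high-throughput identification of these modification sites. DeepNitro is developed in JAVA and PHP and is freely available at http://deepnitro.renlab.org.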
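
For readers unfamiliar with how a cross-validated AUC is obtained, the following is a minimal sketch under stated assumptions. It is not DeepNitro's code (which the abstract says is written in JAVA and PHP), and the names `auc`, `k_fold_indices`, and `cross_validated_auc`, as well as the `fit` callback, are hypothetical. The sketch pools the held-out scores from all folds and computes a single AUC; averaging per-fold AUCs is a common alternative.

```python
import random


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Labels are 1 (modified site) or 0 (unmodified).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_auc(features, labels, fit, k=5, seed=0):
    """Pooled out-of-fold AUC.

    fit(train_features, train_labels) must return a scoring function that
    maps one feature vector to a number (higher means more likely positive).
    """
    out_of_fold = [None] * len(labels)
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        score = fit([features[i] for i in train], [labels[i] for i in train])
        # Each sample is scored only by a model that never saw it in training.
        for i in held_out:
            out_of_fold[i] = score(features[i])
    return auc(labels, out_of_fold)


if __name__ == "__main__":
    # Toy example: one numeric feature and a trivial "model" that returns it.
    xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    print(cross_validated_auc(xs, ys, fit=lambda X, y: (lambda x: x), k=4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of modified from unmodified sites, which gives a scale for reading the reported values of 0.65, 0.80, and 0.70.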

Page 294-306