Volume: 16, Issue: 4


Bioinformatics Commons: The Cornerstone of Life and Health Sciences

Zhang Zhang, Yu Xue, Fangqing Zhao

Page 223-225


CIRCpedia v2: An Updated Database for Comprehensive Circular RNA Annotation and Expression Comparison

Rui Dong, Xu-Kai Ma, Guo-Wei Li, Li Yang

Circular RNAs (circRNAs) from back-splicing of exon(s) have recently been found to be broadly expressed in eukaryotes, in tissue- and species-specific manners. Although the functions of most circRNAs remain elusive, some circRNAs have been shown to regulate gene expression and may be related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. Profiling circRNAs by integrating their expression among different samples thus provides a molecular basis for further functional study of circRNAs and their potential clinical application. Here, we report CIRCpedia v2, an updated database for comprehensive circRNA annotation from over 180 RNA-seq datasets across six different species. This atlas allows users to search, browse, and download circRNAs with expression features in various cell types/tissues, including disease samples. In addition, the updated database incorporates conservation analysis of circRNAs between humans and mice. Finally, the web interface also contains computational tools to compare circRNA expression among samples. CIRCpedia v2 is accessible at http://www.picb.ac.cn/rnomics/circpedia.
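Circular RNAs produced by back-splicing of exons are a new class of RNA molecules that lack a 5' cap and a 3' poly(A) tail and instead form a covalently closed loop. They are broadly expressed in eukaryotes and show marked tissue- and species-specific expression patterns. Profiling circular RNA expression across tissues and species therefore lays a foundation for in-depth study of their biogenesis and processing mechanisms and of the principles behind their potential functions. The laboratory of Li Yang at the CAS-MPG Partner Institute for Computational Biology recently released CIRCpedia v2 (http://www.picb.ac.cn/rnomics/circpedia), an upgraded circular RNA database website containing circular RNA analysis data from more than 180 samples in six species. Through the search, browse, and download modules, users can obtain diverse information including genomic coordinates, expression levels, alternative back-splicing, and human-mouse conservation of circular RNAs, and can use new online analysis tools to compare circular RNAs across samples. This upgraded database website offers a comprehensive, integrated platform for circular RNA research and provides data support and a theoretical basis for in-depth functional studies of circular RNAs.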

Page 226-233


HeteroMeth: A Database of Cell-to-cell Heterogeneity in DNA Methylation

Qing Huan, Yuliang Zhang, Shaohuan Wu, Wenfeng Qian

DNA methylation is an important epigenetic mark that plays a vital role in gene expression and cell differentiation. The average DNA methylation level among a group of cells has been extensively documented. However, the cell-to-cell heterogeneity in DNA methylation, which reflects the differentiation of epigenetic status among cells, remains less investigated. Here we established a gold standard of the cell-to-cell heterogeneity in DNA methylation based on single-cell bisulfite sequencing (BS-seq) data. With that, we optimized a computational pipeline for estimating the heterogeneity in DNA methylation from bulk BS-seq data. We further built HeteroMeth, a database for searching, browsing, visualizing, and downloading the data for heterogeneity in DNA methylation for a total of 141 samples in humans, mice, Arabidopsis, and rice. Three genes are used as examples to illustrate the power of HeteroMeth in the identification of unique features in DNA methylation. The optimization of the computational strategy and the construction of the database in this study complement the recent experimental attempts on single-cell DNA methylomes and will facilitate the understanding of epigenetic mechanisms underlying cell differentiation and embryonic development. HeteroMeth is publicly available at http://qianlab.genetics.ac.cn/HeteroMeth.

Page 234-243


PTMD: A Database of Human Disease-associated Post-translational Modifications

Haodong Xu, Yongbo Wang, Shaofeng Lin, Wankun Deng, Di Peng, Qinghua Cui, Yu Xue

Various post-translational modifications (PTMs) participate in nearly all aspects of biological processes by regulating protein functions, and aberrant states of PTMs are frequently implicated in human diseases. Therefore, an integral resource of PTM–disease associations (PDAs) would be a great help for both academic research and clinical use. In this work, we report PTMD, a well-curated database containing PTMs that are associated with human diseases. We manually collected 1950 known PDAs in 749 proteins for 23 types of PTMs and 275 types of diseases from the literature. Database analyses show that phosphorylation has the largest number of disease associations, whereas neurologic diseases have the largest number of PTM associations. We classified all known PDAs into six classes according to the PTM status in diseases and demonstrated that the upregulation and presence of PTM events account for a predominant proportion of disease-associated PTM events. By reconstructing a disease–gene network, we observed that breast cancers have the largest number of associated PTMs and AKT1 has the largest number of PTMs connected to diseases. Finally, the PTMD database was developed with detailed annotations and can be a useful resource for further analyzing the relations between PTMs and human diseases. PTMD is freely accessible at http://ptmd.biocuckoo.org.

Page 244-251


GAAD: A Gene and Autoimmune Disease Association Database

Guanting Lu, Xiaowen Hao, Wei-Hua Chen, Shijie Mu

Autoimmune diseases (ADs) arise from an abnormal immune response of the body against substances and tissues normally present in the body. More than a hundred ADs have been described in the literature so far. Although their etiology remains largely unclear, various types of ADs tend to share more associated genes with other types of ADs than with non-AD types. Here we present GAAD, a gene and AD association database. In GAAD, we collected 44,762 associations between 49 ADs and 4249 genes from public databases and MEDLINE documents. We manually verified the associations to ensure their quality and credibility. We reconstructed and recapitulated the relationships among ADs using their shared genes, which further validated the quality of our data. We also provide a list of significantly co-occurring gene pairs among ADs; with embedded tools, users can query gene co-occurrences and construct customized co-occurrence networks with genes of interest. To make GAAD more straightforward for experimental biologists and medical scientists, we extracted additional information describing the associations through text mining, including the putative diagnostic value of the associations, the type and position of gene polymorphisms, expression changes of implicated genes, and the phenotypic consequences, and grouped the associations accordingly. GAAD is freely available at http://gaad.medgenius.info.
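Autoimmune diseases are diseases in which the body mounts an immune response against its own antigens, damaging its own tissues. To date, more than one hundred autoimmune diseases have been described in the literature. Although the causes of autoimmune diseases remain unclear, we found that autoimmune diseases share more associated genes with one another than with non-autoimmune diseases. On this basis, we developed GAAD (A Gene and Autoimmune Disease Association Database). In GAAD, we collected 44,762 associations between 49 autoimmune diseases and 4249 genes from public databases and MEDLINE documents. We verified these associations manually to ensure their quality and accuracy. In addition, we used the genes shared among these autoimmune diseases to reconstruct and recapitulate the relationships among them, which further confirmed the reliability of our data. The database also provides a list of significantly co-occurring gene pairs among autoimmune diseases; based on these data, users can query gene co-occurrences with the embedded tools and build customized co-occurrence networks from genes of interest. To make the database more convenient for experimental biologists and medical researchers, we used text mining to extract additional information describing each association, including the putative diagnostic value of the association, the type and position of gene polymorphisms, and the expression and phenotypic changes of the associated genes, and grouped the associations accordingly. GAAD (http://gaad.medgenius.info) is free to use, and its data are free to download.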

Page 252-261


CCGD-ESCC: A Comprehensive Database for Genetic Variants Associated with Esophageal Squamous Cell Carcinoma in Chinese Population

Linna Peng, Sijin Cheng, Yuan Lin, Qionghua Cui, Yingying Luo, Jiahui Chu, Mingming Shao, Wenyi Fan, Yamei Chen, Ai Lin, Yiyi Xi, Yanxia Sun, Lei Zhang, Chao Zhang, Wen Tan, Ge Gao, Chen Wu, Dongxin Lin

Esophageal squamous-cell carcinoma (ESCC) is one of the most lethal malignancies in the world and occurs at a particularly high frequency in China. While several genome-wide association studies (GWAS) of germline variants and whole-genome or whole-exome sequencing studies of somatic mutations in ESCC have been published, there is no comprehensive database publicly available for this cancer. Here, we developed the Chinese Cancer Genomic Database-Esophageal Squamous Cell Carcinoma (CCGD-ESCC) database, which contains the associations of 69,593 single nucleotide polymorphisms (SNPs) with ESCC risk in 2022 cases and 2039 controls, with survival time in 1006 ESCC patients (survival GWAS), and with gene expression (expression quantitative trait loci, eQTL) in 94 ESCC patients. Moreover, this database also provides the associations between 8833 somatic mutations and survival time in 675 ESCC patients. Our user-friendly database is a useful resource for biologists and oncologists, not only in identifying the associations of genetic variants or somatic mutations with the development and progression of ESCC but also in studying the mechanisms underlying tumorigenesis of this cancer. CCGD-ESCC is freely accessible at http://db.cbi.pku.edu.cn/ccgd/ESCCdb.

Page 262-268