Volume: 16, Issue: 4

Preface

Bioinformatics Commons: The Cornerstone of Life and Health Sciences

Zhang Zhang, Yu Xue, Fangqing Zhao

Page 223-225


Database

CIRCpedia v2: An Updated Database for Comprehensive Circular RNA Annotation and Expression Comparison

Rui Dong, Xu-Kai Ma, Guo-Wei Li, Li Yang

Circular RNAs (circRNAs) from back-splicing of exon(s) have been recently identified to be broadly expressed in eukaryotes, in tissue- and species-specific manners. Although functions of most circRNAs remain elusive, some circRNAs are shown to be functional in gene expression regulation and potentially relate to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. Profiling circRNAs by integrating their expression among different samples thus provides a molecular basis for further functional study of circRNAs and their potential application in the clinic. Here, we report CIRCpedia v2, an updated database for comprehensive circRNA annotation from over 180 RNA-seq datasets across six different species. This atlas allows users to search, browse, and download circRNAs with expression features in various cell types/tissues, including disease samples. In addition, the updated database incorporates conservation analysis of circRNAs between humans and mice. Finally, the web interface also contains computational tools to compare circRNA expression among samples. CIRCpedia v2 is accessible at http://www.picb.ac.cn/rnomics/circpedia.
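Circular RNAs produced by back-splicing of exons are a novel class of RNA molecules that lack a 5′ cap and a 3′ poly(A) tail and instead form a covalently closed loop. They are broadly expressed in eukaryotes and show pronounced tissue- and species-specific expression patterns. Profiling circular RNA expression across tissues and species therefore lays the groundwork for in-depth study of their biogenesis and processing mechanisms and of their potential functions. Li Yang's group at the CAS-MPG Partner Institute for Computational Biology recently released CIRCpedia v2 (http://www.picb.ac.cn/rnomics/circpedia), an upgraded circular RNA database website that contains circular RNA analysis data from more than 180 samples across six species. Through the search, browse, and download modules, users can obtain diverse information on circular RNAs, including genomic coordinates, expression levels, alternative back-splicing, and human-mouse conservation, and can compare circular RNAs across samples with the new online analysis tools. This upgraded database website offers a comprehensive, integrated platform for circular RNA research and provides data support and a theoretical basis for further functional studies of circular RNAs.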

Page 226-233


Database

HeteroMeth: A Database of Cell-to-cell Heterogeneity in DNA Methylation

Qing Huan, Yuliang Zhang, Shaohuan Wu, Wenfeng Qian

DNA methylation is an important epigenetic mark that plays a vital role in gene expression and cell differentiation. The average DNA methylation level among a group of cells has been extensively documented. However, the cell-to-cell heterogeneity in DNA methylation, which reflects the differentiation of epigenetic status among cells, remains less investigated. Here we established a gold standard of the cell-to-cell heterogeneity in DNA methylation based on single-cell bisulfite sequencing (BS-seq) data. With that, we optimized a computational pipeline for estimating the heterogeneity in DNA methylation from bulk BS-seq data. We further built HeteroMeth, a database for searching, browsing, visualizing, and downloading the data for heterogeneity in DNA methylation for a total of 141 samples in humans, mice, Arabidopsis, and rice. Three genes are used as examples to illustrate the power of HeteroMeth in the identification of unique features in DNA methylation. The optimization of the computational strategy and the construction of the database in this study complement the recent experimental attempts on single-cell DNA methylomes and will facilitate the understanding of epigenetic mechanisms underlying cell differentiation and embryonic development. HeteroMeth is publicly available at http://qianlab.genetics.ac.cn/HeteroMeth.
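As an important epigenetic regulator, DNA methylation plays a vital role in the regulation of gene expression and in cell differentiation. Researchers usually treat a cell population as a whole and base their analyses on the average DNA methylation level of that population. Notably, DNA methylation is not uniform across cells, and this heterogeneity may reflect the divergence of epigenetic states among cells. However, studies of heterogeneity in DNA methylation at the single-cell level are still rarely reported. Based on single-cell bisulfite sequencing data, we established a gold standard for calculating cell-to-cell heterogeneity in DNA methylation, then optimized a strategy for calculating this heterogeneity from bisulfite sequencing data of bulk cell-population samples, and built the HeteroMeth database. The database provides DNA methylation heterogeneity data for a total of 141 samples from humans, mice, Arabidopsis, and rice, which can be conveniently searched, browsed, visualized, and downloaded. The optimization of the computational strategy and the construction of the database in this study will promote the systematic identification of features of heterogeneity in DNA methylation, and thereby provide key support for exploring the epigenetic regulatory mechanisms underlying cell differentiation and embryonic development. HeteroMeth is publicly available at http://qianlab.genetics.ac.cn/HeteroMeth.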

Page 234-243


Database

PTMD: A Database of Human Disease-associated Post-translational Modifications

Haodong Xu, Yongbo Wang, Shaofeng Lin, Wankun Deng, Di Peng, Qinghua Cui, Yu Xue

Various posttranslational modifications (PTMs) participate in nearly all aspects of biological processes by regulating protein functions, and aberrant states of PTMs are frequently implicated in human diseases. Therefore, an integral resource of PTM–disease associations (PDAs) would be a great help for both academic research and clinical use. In this work, we reported PTMD, a well-curated database containing PTMs that are associated with human diseases. We manually collected 1950 known PDAs in 749 proteins for 23 types of PTMs and 275 types of diseases from the literature. Database analyses show that phosphorylation has the largest number of disease associations, whereas neurologic diseases have the largest number of PTM associations. We classified all known PDAs into six classes according to the PTM status in diseases and demonstrated that the upregulation and presence of PTM events account for a predominant proportion of disease-associated PTM events. By reconstructing a disease–gene network, we observed that breast cancers have the largest number of associated PTMs and AKT1 has the largest number of PTMs connected to diseases. Finally, the PTMD database was developed with detailed annotations and can be a useful resource for further analyzing the relations between PTMs and human diseases. PTMD is freely accessible at http://ptmd.biocuckoo.org.
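By regulating protein functions, post-translational modifications (PTMs) participate in nearly all biological processes, and aberrant PTM states are often closely linked to human diseases. Integrating the existing information on disease-associated PTMs would therefore be of great help to both academic research and clinical application. In this work, we released PTMD, a carefully annotated database of PTMs associated with human diseases. We manually collected 1950 disease-associated PTM records from the literature. These disease-associated PTMs are located on 749 proteins and cover 23 types of PTMs and 275 types of diseases. Among them, phosphorylation has the largest number of disease associations, whereas neurologic diseases cover the largest number of PTM types. We divided all known disease-associated PTMs into six classes according to the effect of the PTM on the disease, and the results show that upregulation of PTM levels and the presence of PTMs are more closely associated with diseases. By constructing a disease–gene network, we found that breast cancer has the largest number of PTM associations, while the AKT1 gene has the largest number of disease-associated PTM records. Finally, the PTMD database carries very detailed annotations and can serve as a useful resource for further analysis of the relations between PTMs and human diseases. Users can access the PTMD database at http://ptmd.biocuckoo.org.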

Page 244-251


Database

GAAD: A Gene and Autoimmiune Disease Association Database

Guanting Lu, Xiaowen Hao, Wei-Hua Chen, Shijie Mu

Autoimmune diseases (ADs) arise from an abnormal immune response of the body against substances and tissues normally present in the body. More than a hundred ADs have been described in the literature so far. Although their etiology remains largely unclear, various types of ADs tend to share more associated genes with other types of ADs than with non-AD types. Here we present GAAD, a gene and AD association database. In GAAD, we collected 44,762 associations between 49 ADs and 4249 genes from public databases and MEDLINE documents. We manually verified the associations to ensure the quality and credibility. We reconstructed and recapitulated the relationships among ADs using their shared genes, which further validated the quality of our data. We also provided a list of significantly co-occurring gene pairs among ADs; with embedded tools, users can query gene co-occurrences and construct customized co-occurrence networks with genes of interest. To make GAAD more straightforward to experimental biologists and medical scientists, we extracted additional information describing the associations through text mining, including the putative diagnostic value of the associations, type and position of gene polymorphisms, expression changes of implicated genes, as well as the phenotypical consequences, and grouped the associations accordingly. GAAD is freely available at http://gaad.medgenius.info.
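Autoimmune diseases are diseases in which the body mounts an immune response against its own antigens, resulting in damage to its own tissues. To date, more than a hundred autoimmune diseases have been described in the literature. Although the etiology of autoimmune diseases remains unclear, we found that autoimmune diseases share more associated genes with one another than with non-autoimmune diseases. On this basis, we developed the GAAD (A Gene and Autoimmiune Disease Association Database) database. In GAAD, we collected 44,762 associations between 49 autoimmune diseases and 4249 genes from public databases and MEDLINE documents. We verified these associations manually to ensure their quality and accuracy. In addition, we used the genes shared among these autoimmune diseases to reconstruct and recapitulate the relationships among them, which further supports the reliability of our data. The database also provides a list of significantly co-occurring gene pairs among autoimmune diseases; based on these data, users can query gene co-occurrences with the embedded tools and construct customized co-occurrence networks with genes of interest. To make the database more convenient for experimental biologists and medical scientists, we used text mining to extract additional information describing each association, including the putative diagnostic value of the association, the type and position of gene polymorphisms, and the expression changes and phenotypic changes of the associated genes, and grouped the associations accordingly. The GAAD database (http://gaad.medgenius.info) is free to use, and its data are free to download.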

Page 252-261


Database

CCGD-ESCC: A Comprehensive Database for Genetic Variants Associated with Esophageal Squamous Cell Carcinoma in Chinese Population

Linna Peng, Sijin Cheng, Yuan Lin, Qionghua Cui, Yingying Luo, Jiahui Chu, Mingming Shao, Wenyi Fan, Yamei Chen, Ai Lin, Yiyi Xi, Yanxia Sun, Lei Zhang, Chao Zhang, Wen Tan, Ge Gao, Chen Wu, Dongxin Lin

Esophageal squamous-cell carcinoma (ESCC) is one of the most lethal malignancies in the world and occurs at a particularly high frequency in China. While several genome-wide association studies (GWAS) of germline variants and whole-genome or whole-exome sequencing studies of somatic mutations in ESCC have been published, there is no comprehensive database publicly available for this cancer. Here, we developed the Chinese Cancer Genomic Database-Esophageal Squamous Cell Carcinoma (CCGD-ESCC) database, which contains the associations of 69,593 single nucleotide polymorphisms (SNPs) with ESCC risk in 2022 cases and 2039 controls, survival time of 1006 ESCC patients (survival GWAS) and gene expression (expression quantitative trait loci, eQTL) in 94 ESCC patients. Moreover, this database also provides the associations between 8833 somatic mutations and survival time in 675 ESCC patients. Our user-friendly database is a resource useful for biologists and oncologists not only in identifying the associations of genetic variants or somatic mutations with the development and progression of ESCC but also in studying the underlying mechanisms for tumorigenesis of the cancer. CCGD-ESCC is freely accessible at http://db.cbi.pku.edu.cn/ccgd/ESCCdb.
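Esophageal cancer is a cancer characteristic of the Chinese population, yet its genomic data remain scarce compared with those of other cancers. At present, there is still no database worldwide for comprehensively and systematically presenting and querying esophageal cancer association studies. We therefore integrated and analyzed several types of esophageal cancer association data, including (1) a genome-wide association study of esophageal cancer susceptibility in 2022 cases and 2039 normal controls; (2) a genome-wide association study of survival in 1006 esophageal cancer patients; (3) an association study between genetic variants and gene expression in tumor tissues and paired adjacent normal tissues from 94 esophageal cancer patients; and (4) an association study between somatic variants and survival in 675 esophageal cancer patients. On this basis, we built CCGD-ESCC, the first association gene database for esophageal cancer, which shares data resources freely to the greatest extent possible and supports genetic and genomic research on esophageal cancer.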

Page 262-268