Volume: 19, Issue: 1

Database

Chinese Glioma Genome Atlas (CGGA): A Comprehensive Resource with Functional Genomic Data from Chinese Glioma Patients

Zheng Zhao, Ke-Nan Zhang, Qiangwei Wang, Guanzhang Li, Fan Zeng, Ying Zhang, Fan Wu, Ruichao Chai, Zheng Wang, Chuanbao Zhang, Wei Zhang, Zhaoshi Bao, Tao Jiang

Gliomas are the most common and malignant intracranial tumors in adults. Recent studies have revealed the significance of functional genomics for glioma pathophysiological studies and treatments. However, access to comprehensive genomic data and analytical platforms is often limited. Here, we developed the Chinese Glioma Genome Atlas (CGGA), a user-friendly data portal for the storage and interactive exploration of cross-omics data, covering nearly 2000 primary and recurrent glioma samples from Chinese cohorts. Currently, open access is provided to whole-exome sequencing data (286 samples), mRNA sequencing data (1018 samples), mRNA microarray data (301 samples), DNA methylation microarray data (159 samples), and microRNA microarray data (198 samples), as well as to detailed clinical information (age, gender, chemoradiotherapy status, WHO grade, histological type, critical molecular pathological information, and survival data). In addition, we have developed several tools for users to analyze mutation profiles, mRNA/microRNA expression, and DNA methylation profiles, and to perform survival and gene correlation analyses of specific glioma subtypes. This database removes barriers for researchers, providing rapid and convenient access to high-quality functional genomic data resources for biological studies and clinical applications. CGGA is available at http://www.cgga.org.cn.
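Glioma is the most common malignant intracranial tumor in adults. Recent studies have revealed the importance of functional genomics for research on glioma pathophysiology and treatment. However, access to comprehensive genomic data and analytical platforms is currently often limited. Here, we developed the Chinese Glioma Genome Atlas (CGGA), a user-friendly data portal for storing and interactively exploring functional genomic resources for glioma. The database contains omics and clinical information for more than 2000 primary and recurrent gliomas from the Chinese population. At present, open access is provided to whole-exome sequencing data (286 cases), whole-transcriptome sequencing data (1018 cases), whole-transcriptome microarray data (301 cases), DNA methylation microarray data (159 cases), and microRNA microarray data (198 cases), together with detailed patient clinical information (for example age, gender, chemoradiotherapy information, WHO grade, histopathological classification, molecular pathological information, and survival). In addition, we developed several analysis tools for examining genomic mutations, transcriptome and microRNA expression, and DNA methylation profiles in glioma, and for performing survival and gene correlation analyses in specific glioma subtypes. This database provides rapid and convenient access to high-quality functional genomic data for basic and clinical glioma research. The CGGA website is available at http://www.cgga.org.cn.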

Page 1-12


Review

Computational Screening of Phase-separating Proteins

Boyan Shen, Zhaoming Chen, Chunyu Yu, Taoyu Chen, Minglei Shi, Tingting Li

Phase separation is an important mechanism that mediates the compartmentalization of proteins in cells. Proteins that can undergo phase separation in cells share certain typical sequence features, such as intrinsically disordered regions (IDRs) and multiple modular domains. Sequence-based analysis tools are commonly used to screen for these proteins. However, current phase separation predictors are mostly designed for IDR-containing proteins and thus inevitably overlook phase-separating proteins with relatively low IDR content. Features other than amino acid sequence could provide crucial information for identifying possible phase-separating proteins: protein–protein interaction (PPI) networks show the multivalent interactions that underlie the phase separation process; post-translational modifications (PTMs) are crucial in the regulation of phase separation behavior; and spherical structures revealed in immunofluorescence (IF) images indicate condensed droplets formed by phase-separating proteins, distinguishing these proteins from non-phase-separating proteins. Here, we summarize the sequence-based tools for predicting phase-separating proteins and highlight the importance of incorporating PPIs, PTMs, and IF images into phase separation prediction in future studies.
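Biological phase separation is an important molecular mechanism that regulates the compartmentalized distribution of proteins in cells. Proteins that can undergo phase separation in cells usually have specific sequence features, such as intrinsically disordered regions (IDRs) or multiple modular domains, so sequence analysis tools can be used to screen for them. However, existing phase separation predictors are generally designed around the sequence features of IDR proteins and are therefore inevitably biased, overlooking phase-separating proteins with relatively low IDR content. Beyond the amino acid sequence itself, other protein features can also provide key information for identifying potential phase-separating proteins: protein–protein interaction (PPI) networks provide information on multivalent interactions between proteins, which are closely related to phase separation, and post-translational modifications (PTMs) are crucial for regulating the phase separation process. In addition, droplets formed by phase-separating proteins appear as spherical fluorescent structures in immunofluorescence (IF) images, a feature that can be used to distinguish these proteins from non-phase-separating proteins. In this review, we summarize the sequence analysis tools used to predict phase-separating proteins and discuss the value of incorporating PPI, PTM, and IF image features into phase separation prediction in future studies.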

Page 13-24


Review

Glycoproteogenomics: Setting the Course for Next-generation Cancer Neoantigen Discovery for Cancer Vaccines

José Alexandre Ferreira, Marta Relvas-Santos, Andreia Peixoto, André M.N. Silva, Lúcio Lara Santos

Molecular-assisted precision oncology has gained tremendous ground with high-throughput next-generation sequencing (NGS), supported by robust bioinformatics. The quest for genomics-based cancer medicine set the foundations for improved patient stratification, while unveiling a wide array of neoantigens for immunotherapy. Early pre-clinical and clinical studies have successfully used tumor-specific peptides in vaccines with minimal off-target effects. However, the low mutational burden presented by many lesions challenges the generalization of these solutions, requiring the diversification of neoantigen sources. Oncoproteogenomics utilizing customized databases for protein annotation by mass spectrometry (MS) is a powerful tool toward this end. Expanding the concept toward exploring proteoforms originating from post-translational modifications (PTMs) will be decisive to improve molecular subtyping and provide potentially targetable functional nodes with increased cancer specificity. Walking through the path of systems biology, we highlight that alterations in protein glycosylation at the cell surface not only have a functional impact on cancer progression and dissemination but also give rise to unique molecular fingerprints for targeted therapeutics. Moreover, we discuss the outstanding challenges that must be addressed to accommodate glycoproteomics in oncoproteogenomics platforms. We envisage that such a rationale may flag a rather neglected research field, generating novel paradigms for precision oncology and immunotherapy.
Precision oncology guided by molecular information has been gaining relevance, supported mainly by next-generation sequencing (NGS) technologies and robust bioinformatics methods. Access to genetic information has improved patient stratification while unveiling new neoantigens with immunotherapeutic potential. In addition, pre-clinical and clinical trials have already led to the development of anti-tumor vaccines based on cancer-specific peptides (neoantigens) identified by genetic sequencing. However, the low mutational burden of some tumors has hindered the generalization of these therapeutic approaches, emphasizing the need to diversify neoantigen sources. Oncoproteogenomics using customized databases for protein identification by mass spectrometry is a valuable tool for achieving this goal. Expanding this concept to the identification of proteoforms derived from post-translational modifications will be decisive for improving the molecular subtyping of tumors, as well as for identifying new targets with increased tumor specificity. Following the path of systems biology, we highlight that alterations in the glycosylation of cell-surface proteins not only have a functional impact on tumor progression and dissemination but also give rise to unique molecular signatures with potential for targeted therapy. We also discuss the challenges inherent in implementing glycoproteomics and oncoproteogenomics platforms. Finally, we anticipate that the rationale presented may draw attention to a little-explored research area while creating a new paradigm for precision oncology and immunotherapy.

Page 25-43


Preview

DNA Methylation Reshapes Sex Development in Zebrafish

Yan Li, Feng Liu

Page 44-47


Research Article

The Role of DNA Methylation Reprogramming During Sex Determination and Transition in Zebrafish

Xinxin Wang, Xin Ma, Gaobo Wei, Weirui Ma, Zhen Zhang, Xuepeng Chen, Lei Gao, Zhenbo Liu, Yue Yuan, Lizhi Yi, Jun Wang, Toshinobu Tokumoto, Junjiu Huang, Dahua Chen, Jian Zhang, Jiang Liu

DNA methylation is a prevalent epigenetic modification in vertebrates, and it has been shown to be involved in the regulation of gene expression and embryo development. However, it remains unclear how DNA methylation regulates sexual development, especially in species without sex chromosomes. To address this, we used zebrafish to investigate DNA methylation reprogramming during juvenile germ cell development and adult female-to-male sex transition. We reveal that primordial germ cells (PGCs) undergo significant DNA methylation reprogramming during germ cell development, and the methylome of PGCs is reset to an oocyte/ovary-like pattern at 9 days post fertilization (9 dpf). When DNA methyltransferase (DNMT) activity in juveniles was blocked after 9 dpf, the zebrafish developed into females. We also show that Tet3 is involved in PGC development. Notably, we find that DNA methylome reprogramming during adult zebrafish sex transition is similar to the reprogramming during sex differentiation from 9 dpf PGCs to sperm. Furthermore, inhibiting DNMT activity can prevent the female-to-male sex transition, suggesting that methylation reprogramming is required for zebrafish sex transition. In summary, DNA methylation plays important roles in zebrafish germ cell development and sexual plasticity.
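Research questions: How is DNA methylation reprogrammed in zebrafish germ cells? Is DNA methylation reprogramming in germ cells related to sex determination in zebrafish? Adult female zebrafish can change sex to become male; what is the underlying molecular mechanism? Methods: In this study, zebrafish germ cells were isolated by flow cytometric sorting and micromanipulation, and low-input transcriptome and DNA methylation sequencing were used to chart comprehensive DNA methylation maps of juvenile germ cell development and sex reversal in zebrafish. Gene editing was used to show that the Tet3 protein participates in zebrafish germ cell development. Inhibiting the activity of DNA methyltransferases (DNMTs) revealed the role of DNA methylation in zebrafish sex determination and sex reversal. Main result 1: Zebrafish germ cells undergo significant reprogramming during development, but in a manner completely different from that in mammalian germ cells. Main result 2: DNA methylation reprogramming is related to zebrafish germline development and sex determination. Main result 3: The Tet3 protein participates in zebrafish germ cell development. Main result 4: During sex transition in adult zebrafish, DNA methylation can be reset from a female pattern to a male pattern. Inhibiting DNMT activity shows that DNA methylation is essential for sexual plasticity in adult zebrafish.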

Page 48-63


Research Article

Genome-wide 5-Hydroxymethylcytosine Profiling Analysis Identifies MAP7D1 as A Novel Regulator of Lymph Node Metastasis in Breast Cancer

Shuang-Ling Wu, Xiaoyi Zhang, Mengqi Chang, Changcai Huang, Jun Qian, Qing Li, Fang Yuan, Lihong Sun, Xinmiao Yu, Xinmiao Cui, Jiayi Jiang, Mengyao Cui, Ye Liu, Huan-Wen Wu, Zhi-Yong Liang, Xiaoyue Wang, Yamei Niu, Wei-Min Tong, Feng Jin

Although DNA 5-hydroxymethylcytosine (5hmC) is recognized as an important epigenetic mark in cancer, its precise role in lymph node metastasis remains elusive. In this study, we investigated how 5hmC associates with lymph node metastasis in breast cancer. Accompanied by high expression of TET1 and TET2 proteins, large numbers of genes in the metastasis-positive primary tumors exhibit higher 5hmC levels than those in the metastasis-negative primary tumors. In contrast, TET protein expression and DNA 5hmC decrease significantly within the metastatic lesions in the lymph nodes compared to those in their matched primary tumors. Through genome-wide analysis of 8 sets of primary tumors, we identified 100 high-confidence metastasis-associated 5hmC signatures and found that increased levels of DNA 5hmC and gene expression of MAP7D1 associate with a high risk of lymph node metastasis. Furthermore, we demonstrate that MAP7D1, regulated by TET1, promotes tumor growth and metastasis. In conclusion, the dynamic 5hmC profiles during lymph node metastasis suggest a link between DNA 5hmC and lymph node metastasis. Meanwhile, the role of MAP7D1 in breast cancer progression suggests that the metastasis-associated 5hmC signatures are potential biomarkers for predicting the risk of lymph node metastasis, and may serve as diagnostic and therapeutic targets for metastatic breast cancer.
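Lymph node metastasis is an important factor affecting the survival and prognosis of breast cancer patients. DNA 5-hydroxymethylcytosine (5hmC) is an important epigenetic modification in tumors, but how it changes and is regulated during lymph node metastasis in breast cancer remains unknown. This study therefore explored the association between 5hmC and lymph node metastasis at both the global and the gene level. The authors found that, compared with breast cancers without lymph node metastasis (PT), the DNA hydroxymethylases TET1 and TET2 are significantly more highly expressed in breast cancers with lymph node metastasis (MT), and the 5hmC levels of a large number of genes are significantly elevated. On the other hand, compared with the primary breast cancer lesions (MT), DNA 5hmC and TET protein expression in tumor cells of the matched lymph node metastases (MLN) are both significantly reduced. These results indicate that dynamic changes in DNA 5hmC are closely related to lymph node metastasis. On this basis, the authors selected 8 pairs of PT and MT tumor samples, removed stromal tissue by manual microdissection, and performed genome-wide DNA 5hmC sequencing on the tumor tissue. Through data analysis, the study identified 100 differentially hydroxymethylated genes (DhMGs) highly associated with lymph node metastasis in breast cancer; these genes may be applicable to the clinical assessment of lymph node metastatic potential in early breast cancer. The authors then collected additional tumor samples to measure the 5hmC modification and transcript levels of these genes, and identified several genes whose 5hmC modification and RNA expression levels are closely associated with lymph node metastasis. Among them, both 5hmC modification in the exonic region of MAP7D1 and its RNA expression are significantly higher in MT samples than in PT samples, suggesting that MAP7D1 may be involved in the biological process of lymph node metastasis in breast cancer. To investigate further, the authors carried out in vivo and in vitro experiments and found that MAP7D1 expression is regulated by TET1 and that MAP7D1 promotes the proliferation, invasion, and metastasis of breast cancer cells. In summary, using genome-wide 5hmC sequencing of clinical specimens, this study reveals the dynamic pattern of 5hmC changes during lymph node metastasis in breast cancer, identifies a set of 5hmC-modified genes that may be used to assess lymph node metastatic potential in early breast cancer, and shows that MAP7D1 promotes lymph node metastasis in breast cancer and has potential as a therapeutic target for breast cancer patients.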

Page 64-79


Research Article

Global Profiling of the Lysine Crotonylome in Different Pluripotent States

Yuan Lv, Chen Bu, Jin Meng, Carl Ward, Giacomo Volpe, Jieyi Hu, Mengling Jiang, Lin Guo, Jiekai Chen, Miguel A. Esteban, Xichen Bao, Zhongyi Cheng

Pluripotent stem cells (PSCs) can be expanded in vitro in different culture conditions, resulting in a spectrum of cell states with distinct properties. Understanding how PSCs transition from one state to another, ultimately leading to lineage-specific differentiation, is important for developmental biology and regenerative medicine. Although there is significant information regarding gene expression changes controlling these transitions, less is known about post-translational modifications of proteins. Protein crotonylation is a newly discovered post-translational modification where lysine residues are modified with a crotonyl group. Here, we employed affinity purification of crotonylated peptides and liquid chromatography–tandem mass spectrometry (LC–MS/MS) to systematically profile protein crotonylation in mouse PSCs in different states including ground, metastable, and primed states, as well as metastable PSCs undergoing early pluripotency exit. We successfully identified 3628 high-confidence crotonylated sites in 1426 proteins. These crotonylated proteins are enriched for factors involved in functions/processes related to pluripotency such as RNA biogenesis, central carbon metabolism, and proteasome function. Moreover, we found that increasing the cellular levels of crotonyl-coenzyme A (crotonyl-CoA) through crotonic acid treatment promotes proteasome activity in metastable PSCs and delays their differentiation, consistent with previous observations showing that enhanced proteasome activity helps to sustain pluripotency. Our atlas of protein crotonylation will be valuable for further studies of pluripotency regulation and may also provide insights into the role of metabolism in other cell fate transitions.
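Pluripotent stem cells (PSCs) can be maintained in vitro in different pluripotent states under different culture conditions. Understanding how PSCs transition between pluripotent states and ultimately differentiate toward specific lineages is an important question in developmental biology and regenerative medicine. Although much is known about the changes in and regulation of gene expression during pluripotency transitions, knowledge at the level of protein post-translational modifications is still lacking. Lysine crotonylation is a newly discovered protein modification. Using an antibody that specifically recognizes crotonylated peptides, we performed affinity purification and quantitative liquid chromatography–mass spectrometry proteomic analysis of crotonylated proteins in ground-state, metastable, and primed PSCs, as well as in early differentiating cells. We identified 3628 high-confidence crotonylated sites on 1426 proteins. These crotonylated proteins are enriched for pluripotency-related functions such as RNA biogenesis, central carbon metabolism, and the proteasome. In addition, consistent with previous reports that enhanced proteasome activity promotes PSC pluripotency, we found that raising intracellular crotonyl-coenzyme A levels by adding crotonic acid promotes proteasome activity in metastable PSCs and delays their differentiation. Our atlas of protein crotonylation in PSCs will support future in-depth studies of how crotonylation regulates pluripotency and offers insights into the role of metabolism in other cell fate transitions.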

Page 80-93


Research Article

Quantitative Secretome Analysis Reveals Clinical Values of Carbonic Anhydrase II in Hepatocellular Carcinoma

Xiaohua Xing, Hui Yuan, Hongzhi Liu, Xionghong Tan, Bixing Zhao, Yingchao Wang, Jiahe Ouyang, Minjie Lin, Xiaolong Liu, Aimin Huang

Early detection and intervention are key strategies to reduce mortality, increase long-term survival, and improve therapeutic outcomes for hepatocellular carcinoma (HCC) patients. Herein, an isobaric tag for relative and absolute quantitation (iTRAQ)-based quantitative proteomic strategy was used to study the secretomes in conditioned media from HCC cancerous tissues, surrounding noncancerous tissues, and distal noncancerous tissues to identify diagnostic and prognostic biomarkers for HCC. In total, 22 and 49 dysregulated secretory proteins were identified in the cancerous and surrounding noncancerous tissues, respectively, compared with the distal noncancerous tissues. Among these proteins, carbonic anhydrase II (CA2) was found to be significantly upregulated in the secretome of cancerous tissues; correspondingly, the serum concentrations of CA2 were markedly increased in HCC patients compared with those in normal populations. Interestingly, a significant increase of serum CA2 in recurrent HCC patients after radical resection was also confirmed compared with HCC patients without recurrence, and the serum level of CA2 could act as an independent prognostic factor for time to recurrence and overall survival. Regarding the mechanism, the secreted CA2 enhances the migration and invasion of HCC cells by activating the epithelial mesenchymal transition pathway. Taken together, this study identified a novel biomarker for HCC diagnosis and prognosis, and provided a valuable resource of HCC secretome data for investigating serological biomarkers.
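Early diagnosis and intervention are key to reducing mortality from hepatocellular carcinoma (HCC), increasing long-term survival, and improving treatment outcomes. We therefore cultured in vitro the cancerous, adjacent noncancerous, and distal noncancerous tissues surgically resected from the same HCC patients, and applied iTRAQ quantitative proteomics to analyze their secretomes systematically and screen for diagnostic and prognostic biomarkers of HCC. Compared with distal noncancerous tissues, 22 and 49 dysregulated secretory proteins were identified in cancerous and adjacent noncancerous tissues, respectively. Among them, secretion of carbonic anhydrase II (CA2) was significantly higher in cancerous tissues than in adjacent and distal noncancerous tissues, and serum CA2 concentrations in HCC patients were also markedly higher than in the normal population. Notably, serum CA2 concentrations after radical resection were significantly higher in HCC patients with recurrence than in those without recurrence, and the serum CA2 level could serve as an independent prognostic risk factor significantly affecting time to recurrence (TTR) and overall survival (OS). Mechanistically, secreted CA2 promotes the invasion and metastasis of liver cancer cells and activates the epithelial mesenchymal transition (EMT) signaling pathway. In summary, this study identified a novel biomarker for HCC diagnosis and prognosis and provides a valuable data resource for studying the liver cancer secretome and serological biomarkers.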

Page 94-107


Research Article

An Integrated Systems Biology Approach Identifies the Proteasome as A Critical Host Machinery for ZIKV and DENV Replication

Guang Song, Emily M. Lee, Jianbo Pan, Miao Xu, Hee-Sool Rho, Yichen Cheng, Nadia Whitt, Shu Yang, Jennifer Kouznetsova, Carleen Klumpp-Thomas, Samuel G. Michael, Cedric Moore, Ki-Jun Yoon, Kimberly M. Christian, Anton Simeonov, Wenwei Huang, Menghang Xia, Ruili Huang, Madhu Lal-Nag, Hengli Tang, Wei Zheng, Jiang Qian, Hongjun Song, Guo-li Ming, Heng Zhu

The Zika virus (ZIKV) and dengue virus (DENV) flaviviruses exhibit similar replicative processes but have distinct clinical outcomes. A systematic understanding of virus–host protein–protein interaction networks can reveal cellular pathways critical to viral replication and disease pathogenesis. Here we employed three independent systems biology approaches toward this goal. First, protein array analysis of direct interactions between individual ZIKV/DENV viral proteins and 20,240 human proteins revealed multiple conserved cellular pathways and protein complexes, including proteasome complexes. Second, an RNAi screen of 10,415 druggable genes identified the host proteins required for ZIKV infection and uncovered that proteasome proteins were crucial in this process. Third, high-throughput screening of 6016 bioactive compounds for ZIKV inhibition yielded 134 effective compounds, including six proteasome inhibitors that suppress both ZIKV and DENV replication. Integrative analyses of these orthogonal datasets pinpoint proteasomes as critical host machinery for ZIKV/DENV replication. Our study provides multi-omics datasets for further studies of flavivirus–host interactions, disease pathogenesis, and new drug targets.
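Zika virus (ZIKV) and dengue virus (DENV) both belong to the flaviviruses and share similar molecular structures, replication processes, and epidemiological features, yet infected patients present with completely different clinical symptoms. Systematically constructing and analyzing the interaction network between the viral proteome and the host proteome is important for revealing the key cellular signaling pathways involved in viral replication and disease pathogenesis, and for screening and identifying drugs that effectively inhibit viral propagation. In this study, we used three independent systems biology approaches toward this goal. First, each recombinantly expressed ZIKV/DENV viral protein was fluorescently labeled and hybridized to the HuProt™ human proteome array containing 20,240 proteins, identifying multiple conserved cellular signaling pathways and key protein complexes related to ZIKV/DENV propagation, including proteasome complexes. Second, a high-throughput RNAi screen of 10,415 druggable genes identified the key host proteins required for ZIKV infection and propagation and showed that proteasome proteins are crucial in this process. Third, high-throughput drug screening of 6,016 bioactive compounds for inhibition of ZIKV activity identified 134 compounds that effectively inhibit ZIKV and DENV replication, including six proteasome inhibitors. Integrative analysis of the orthogonal datasets produced by these multi-omics technologies showed that the proteasome is a key host machinery for ZIKV/DENV replication. This study provides rich multi-omics datasets for further exploration of flavivirus–host interactions, disease pathogenesis, and new drug targets.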

Page 108-122


Research Article

Gigantic Genomes Provide Empirical Tests of Transposable Element Dynamics Models

Jie Wang, Michael W. Itgen, Huiju Wang, Yuzhou Gong, Jianping Jiang, Jiatang Li, Cheng Sun, Stanley K. Sessions, Rachel Lockridge Mueller

Transposable elements (TEs) are a major determinant of eukaryotic genome size. The collective properties of a genomic TE community reveal the history of TE/host evolutionary dynamics and impact present-day host structure and function, from genome to organism levels. In rare cases, TE community/genome size has greatly expanded in animals, associated with increased cell size and changes to anatomy and physiology. Here, we characterize the TE landscape of the genome and transcriptome in an amphibian with a giant genome — the caecilian Ichthyophis bannanicus, which we show has a genome size of 12.2 Gb. Amphibians are an important model system because the clade includes independent cases of genomic gigantism. The I. bannanicus genome differs compositionally from other giant amphibian genomes, but shares a low rate of ectopic recombination-mediated deletion. We examine TE activity using expression and divergence plots; TEs account for 15% of somatic transcription, and most superfamilies appear active. We quantify TE diversity in the caecilian, as well as other vertebrates with a range of genome sizes, using diversity indices commonly applied in community ecology. We synthesize previous models that integrate TE abundance, diversity, and activity, and test whether the caecilian meets model predictions for genomes with high TE abundance. We propose thorough, consistent characterization of TEs to strengthen future comparative analyses. Such analyses will ultimately be required to reveal whether the divergent TE assemblages found across convergent gigantic genomes reflect fundamental shared features of TE/host genome evolutionary dynamics.
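Transposable elements are a major determinant of eukaryotic genome size. Studying the characteristics of the transposable elements in a genome not only reflects the dynamic evolutionary history of transposable elements and their hosts but also reveals, from the genome to the whole-organism level, how transposable elements affect host structure and function. In only a few animals have transposable element content and genome size expanded markedly, and these expansions are closely associated with increased cell size and with changes in morphology and physiology. The amphibian Ichthyophis bannanicus, a caecilian, has a giant genome (approximately 12.2 Gb). Our study found that the transposable element composition of its genome differs considerably from that of other amphibians, but that its rate of ectopic recombination-mediated deletion of transposable elements is similar to that of other amphibians, being very low. Most of its transposable element superfamilies remain active, with expression amounting to 15% of total somatic expression. We further integrated models from previous studies of transposable element abundance, diversity, and activity, and calculated transposable element diversity indices in a consistent way for I. bannanicus and other model animals to test whether they meet the predictions of these models. We propose that the characteristics of transposable elements should be analyzed consistently in future work, which is necessary for comparative studies of the highly divergent transposable elements found across genomes and will ultimately reveal shared features of the interplay between transposable elements and their hosts.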
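The abstract above mentions quantifying TE diversity with indices from community ecology but does not name them. As a hedged illustration only, the sketch below computes two widely used indices, Shannon and Gini-Simpson, from per-superfamily TE counts. The choice of indices, the function names, and the example counts are assumptions for illustration and are not taken from the article.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-empty categories."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def gini_simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i^2): chance two random draws differ."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical copy counts for four TE superfamilies (illustrative values only).
even_community = [250, 250, 250, 250]
skewed_community = [970, 10, 10, 10]

print(shannon_index(even_community), gini_simpson_index(even_community))
print(shannon_index(skewed_community), gini_simpson_index(skewed_community))
```

Both indices are highest when superfamilies are equally abundant and fall as one superfamily dominates, which is the property that makes them useful for comparing TE communities across genomes of different sizes.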

Page 123-139


Method

MACMIC Reveals A Dual Role of CTCF in Epigenetic Regulation of Cell Identity Genes

Guangyu Wang, Bo Xia, Man Zhou, Jie Lv, Dongyu Zhao, Yanqiang Li, Yiwen Bu, Xin Wang, John P. Cooke, Qi Cao, Min Gyu Lee, Lili Zhang, Kaifu Chen

Numerous studies of the relationships between epigenomic features have focused on their strong correlation across the genome, likely because such relationships can be easily identified by many established methods for correlation analysis. However, two features with little correlation may still colocalize at many genomic sites to implement important functions. There is no bioinformatic tool for researchers to specifically identify such feature pairs. Here, we develop a method to identify feature pairs in which two features have maximal colocalization and minimal correlation (MACMIC) across the genome. By MACMIC analysis of 3306 feature pairs in 16 human cell types, we reveal a dual role of CCCTC-binding factor (CTCF) in epigenetic regulation of cell identity genes. Although super-enhancers are associated with activation of target genes, only a subset of super-enhancers colocalized with CTCF regulate cell identity genes. At super-enhancers colocalized with CTCF, CTCF is required for the active marker H3K27ac in cell types requiring activation, and is also required for the repressive marker H3K27me3 in other cell types requiring repression. Our work demonstrates the biological utility of the MACMIC analysis and reveals a key role for CTCF in epigenetic regulation of cell identity. The code for MACMIC is available at https://github.com/bxia888/MACMIC.
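Because correlation analysis readily detects correlated feature combinations, studies of the relationships between epigenomic features have mainly focused on strongly correlated feature pairs. However, beyond strongly correlated pairs, two epigenomic features with low correlation may still coexist at many positions in the genome, and such pairs can also have important biological functions. There is currently no bioinformatic method that can effectively identify them. We developed the MACMIC algorithm, which identifies pairs of epigenomic features with low correlation and high colocalization. We applied MACMIC to 3385 feature pairs in 15 cell types and discovered a dual regulatory function of CCCTC-binding factor (CTCF) in epigenomics. Previous studies found that super-enhancers mainly regulate cell identity genes; this study found that only super-enhancers colocalized with CTCF mainly regulate cell identity genes. In further analysis of regions where super-enhancers colocalize with CTCF, we found that in cells requiring activation of the region, CTCF binding to DNA is necessary for the corresponding chromosomal position to acquire the H3K27ac modification, whereas in cells requiring repression of the region, CTCF binding is necessary for acquiring the H3K27me3 modification. Our work shows the broad potential of MACMIC in biology and shows that CTCF can regulate cell identity genes at the epigenomic level. The MACMIC scripts can be downloaded from GitHub at https://github.com/bxia888/MACMIC.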
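The distinction the abstract draws between colocalization and correlation can be made concrete with a toy calculation. The sketch below is not the MACMIC algorithm, whose scoring is defined in the article and its repository; it assumes, purely for illustration, that colocalization is measured as the Jaccard overlap of peak-containing bins and correlation as the Pearson coefficient of signal intensities at shared bins. All names and values are hypothetical.

```python
import math

def jaccard(bins_a, bins_b):
    """Colocalization proxy: shared peak bins divided by all peak bins."""
    a, b = set(bins_a), set(bins_b)
    return len(a & b) / len(a | b)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length signal vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical genomic bins where each feature has a peak: fully overlapping.
peaks_a = [0, 1, 2, 3, 4]
peaks_b = [0, 1, 2, 3, 4]

# Hypothetical signal intensities at those bins: chosen to be uncorrelated.
signal_a = [1.0, 2.0, 3.0, 4.0, 5.0]
signal_b = [2.0, 4.0, 1.0, 4.0, 2.0]

colocalization = jaccard(peaks_a, peaks_b)
correlation = pearson(signal_a, signal_b)
print(colocalization, round(correlation, 6))
```

In this toy case the two features occupy exactly the same bins while their intensities carry no linear relationship, which is the kind of pair a correlation-only screen would miss.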

Page 140-153


Method

PM2RA: A Framework for Detecting and Quantifying Relationship Alterations in Microbial Community

Zhi Liu, Kai Mi, Zhenjiang Zech Xu, Qiankun Zhang, Xingyin Liu

The dysbiosis of gut microbiota is associated with the pathogenesis of human diseases. However, observing shifts in the microbe abundance cannot fully reveal underlying perturbations. Examining the relationship alterations (RAs) in the microbiome between health and disease statuses provides additional hints about the pathogenesis of human diseases, but no methods were designed to detect and quantify the RAs between different conditions directly. Here, we present profile monitoring for microbial relationship alteration (PM2RA), an analysis framework to identify and quantify the microbial RAs. The performance of PM2RA was evaluated with synthetic data, and it showed higher specificity and sensitivity than the co-occurrence-based methods. Analyses of real microbial datasets showed that PM2RA was robust for quantifying microbial RAs across different datasets in several diseases. By applying PM2RA, we identified several novel or previously reported microbes implicated in multiple diseases. PM2RA is now implemented as a web-based application available at http://www.pm2ra-xingyinliulab.cn/.

Page 154-167