Article Online - Genomics, Proteomics & Bioinformatics

Volume: 17, Issue: 3

Special Report

That’s A Scientist Should Do—A Dialog with Tomas Lindahl

Shanshan Xie, Yuxia Jiao

Dr Tomas Lindahl is a world-renowned scientist specializing in cancer research, in particular DNA repair [1]. In 2015, he was awarded the Nobel Prize in Chemistry jointly with Dr Paul L. Modrich and Dr Aziz Sancar “for mechanistic studies of DNA repair” (https://www.nobelprize.org/prizes/chemistry/2015/press-release/) [2]. During his recent visits to China, besides delivering lectures, Tomas also attended several group meetings with students and PIs. He interacted actively with the audience, not only on topics relating to DNA repair, but also on how to do science and beyond. We present this special report based on recordings of those exchanges.

Page 223-225

Mapping Genome Variants Sheds Light on Genetic and Phenotypic Differentiation in Chinese

Li Guo, Kai Ye

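Genetic variation is closely tied to human health and precision medicine, so mapping genetic variation across the human genome has become a shared goal of scientists worldwide. In recent years, several research groups, including the international 1000 Genomes Project, have worked to discover genomic variants in populations of different ancestries around the world. China is a multi-ethnic country with about 20% of the world's population and rich genetic diversity. However, for lack of reference genomes specific to northern and southern Chinese populations and of deep sequencing data, a comprehensive and accurate map of genetic variation in the Chinese population has not yet been achieved. The Chinese Academy of Sciences launched its Precision Medicine Initiative (CASPMI) in 2016. Recently, the teams of Changqing Zeng and Jingfa Xiao at the Beijing Institute of Genomics jointly published the latest results of the CASPMI project. Using second- and third-generation sequencing technologies, the authors sequenced and assembled, for the first time, a high-quality genome of a northern Han individual (NH1.0). They found that sequencing data from Chinese individuals aligned to NH1.0 with a lower mismatch rate than to the widely used human reference genome (GRCh38), indicating that NH1.0 better represents the genetic background of the Chinese population. Building on NH1.0 and two existing southern Han genomes, the authors went on to identify 28.8 million genetic variants in 597 Chinese individuals, of which 11.75 million were newly discovered. They also found about 106,000 structural variants (indels and CNVs), most of them low-frequency, and these were enriched in metabolic gene pathways related to body mass index and obesity. Compared with other world populations, the Chinese population carried more than 60,000 specific variants. These variants were highly correlated with phenotypes such as waist circumference, body mass index, and fat metabolism, revealing potential genetic factors behind metabolic characteristics particular to the Chinese population. The authors further found significant differences between northern and southern populations in genetic diversity on multiple chromosomes and in the features of cancer-related somatic mutations. In short, this work is a major advance in the study of genetic diversity in the Chinese population, a key addition to research on genetic variation across human populations, and an important step for China toward precision medicine.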

Page 226-228

Original Research

Whole Genome Analyses of Chinese Population and De Novo Assembly of A Northern Han Genome

Zhenglin Du, Liang Ma, Hongzhu Qu, Wei Chen, Bing Zhang, Xi Lu, Weibo Zhai, Xin Sheng, Yongqiao Sun, Wenjie Li, Meng Lei, Qiuhui Qi, Na Yuan, Shuo Shi, Jingyao Zeng, Jinyue Wang, Yadong Yang, Qi Liu, Yaqiang Hong, Lili Dong, Zhewen Zhang, Dong Zou, Yanqing Wang, Shuhui Song, Fan Liu, Xiangdong Fang, Hua Chen, Xin Liu, Jingfa Xiao, Changqing Zeng

Unraveling the genetic mechanisms of disease and physiological traits requires comprehensive sequencing analysis of large samples from Chinese populations. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy people from most areas of China. Given that the two existing reference genomes for Han Chinese (YH and HX1) were both from the south, we constructed NH1.0, a new reference genome from a northern individual, by combining PacBio sequencing, 10× Genomics sequencing, and Bionano mapping. Using this integrated approach, we obtained an N50 scaffold size of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 with YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of 597 participants and identified 24.85 million (M) single nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In the association analysis with collected phenotypes, we found that the T allele of rs1549293 in KAT8 correlated significantly with waist circumference in northern Han males. Moreover, significant genetic diversity between northerners and southerners was observed in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate, vitamin B12, or lipid metabolism. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677T), we hypothesize that there exists a “comfort” zone for a high frequency of 677T between latitudes of 35 and 45 degrees north. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific data sets of genetic variants for use in personalized and precision medicine.

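The Human Genome Project, formally launched in 1990 and declared complete in 2003, is ranked alongside the Manhattan Project and the Apollo program as one of the three great science programs, and it was an epoch-making undertaking in the history of science. Its aim was to determine the DNA sequence of the 3 billion base pairs of the human chromosomes, map the human genome, and decode human genetic information. The project produced the first chromosome-level reference genome, which plays a pivotal role in genomics, bioinformatics, and related fields; after years of patching and refinement, it has now been updated to the GRCh38 release. However, because this reference genome is based on a Caucasian genetic background, it may introduce some bias when used to analyze genetic data from other populations. To better understand the genetic basis of disease in populations of different ancestries and to advance individualized precision medicine, countries around the world have begun building human reference genomes for their own populations, for example the Korean reference genome AK1 released in 2016. Two reference genomes for the Chinese population, YH2.0 and HX1, have also been released in recent years, but both are based on southern Han Chinese individuals, and both still fall well short of the international human reference genome in sequence completeness. Studies based on DNA markers and single nucleotide polymorphism (SNP) array analysis indicate that northern and southern Chinese populations began to undergo marked genetic differentiation as early as the prehistoric agricultural period. Given this genetic diversity and the need to further improve the completeness of Chinese reference genomes, a de novo reference genome for the northern Chinese population would greatly benefit future large-scale population cohort studies. At the same time, revealing the genetic mechanisms of disease and physiological phenotypes requires building large population cohorts and carrying out whole-genome variant analysis based on high-throughput sequencing. The rapid development of second- and third-generation sequencing has sharply increased throughput and greatly reduced cost, creating an unprecedented opportunity for large-scale cohort studies. Starting with the International HapMap Project and the 1000 Genomes Project (1KGP), countries around the world have launched genomic studies of large population cohorts, such as the UK10K and 100,000 Genomes projects in the United Kingdom, a 10,000-genome project in the United States, and the Japanese 1000-genome project (1KJPN). Although China is home to one fifth of the world's population, it still lacks cohort studies based on large populations and deep whole-genome sequencing. To meet the country's pressing need for precision medicine research, the Beijing Institute of Genomics, Chinese Academy of Sciences, led the launch in 2016 of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI). Its goals are to establish a high-quality, large-scale prospective cohort of the Chinese working population; to carry out whole-genome variant analysis and association analysis of diseases and important phenotypes; to build a genetic variation map of the Chinese population; and to form a system of Chinese genome variation databases and a precision medicine knowledge base. On this foundation, it aims to build standardized electronic health records and reporting systems, to enable risk prediction and early warning for major chronic diseases in the Chinese population, and ultimately to establish a paradigm for precision medicine research. This article reports the main results of the first phase of CASPMI, namely the assembly of a northern Han Chinese reference genome and the mapping of genetic variation in the Chinese population, and analyzes variant sites specific to the Chinese population, genetic differences between northern and southern populations, and large genome-wide structural variations.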
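The N50 scaffold size quoted in the abstract is a standard contiguity statistic: sort the scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the assembly. The following is a minimal sketch of that definition in Python. The function name `n50` and the toy lengths are illustrative assumptions and are not NH1.0 data or the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not taken from NH1.0.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70: 80 + 70 = 150 reaches half of the 300 total
```

A larger N50 means that fewer, longer scaffolds cover half of the assembly, which is why it is commonly reported when comparing assemblies.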

Page 229-247

Original Research

Novel Autoantibodies Related to Cell Death and DNA Repair Pathways in Systemic Lupus Erythematosus

Hui Luo, Ling Wang, Ding Bao, Li Wang, Hongjun Zhao, Yun Lian, Mei Yan, Chandra Mohan, Quan-Zhen Li

Systemic lupus erythematosus (SLE) is a complex autoimmune syndrome characterized by various co-existing autoantibodies (autoAbs) in patients’ blood. However, the full spectrum of autoAbs in SLE has not been comprehensively elucidated. In this study, a commercial platform bearing 9400 antigens (ProtoArray) was used to identify autoAbs that were significantly elevated in the sera of SLE patients. By comparing the autoAb profiles of SLE patients with those of healthy controls, we identified 437 IgG and 1213 IgM autoAbs whose levels were significantly increased in SLE (P < 0.05). Use of the ProtoArray platform uncovered over 300 novel autoAbs targeting a broad range of nuclear, cytoplasmic, and membrane antigens. Molecular interaction network analysis revealed that the antigens targeted by the autoAbs were most significantly enriched in cell death, cell cycle, and DNA repair pathways. A group of autoAbs associated with cell apoptosis and DNA repair function, including those targeting APEX1, AURKA, POLB, AGO1, HMGB1, IFIT5, MAPKAPK3, PADI4, RGS3, SRP19, UBE2S, and VRK1, were further validated by ELISA and Western blot in a larger cohort. In addition, the levels of autoAbs against APEX1, HMGB1, VRK1, AURKA, PADI4, and SRP19 were positively correlated with the level of anti-dsDNA in SLE patients. Comprehensive autoAb screening has identified novel autoAbs, which may shed light on potential pathogenic pathways leading to lupus.

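Systemic lupus erythematosus (SLE) is a complex autoimmune disease, and the production of many serum autoantibodies is one of its main features. However, the autoantibody spectrum of SLE has not yet been fully deciphered. This study used an array carrying 9400 autoantigens (ProtoArray) to compare autoantibody profiles between SLE patients and healthy individuals. We found that 437 IgG and 1213 IgM antibodies were present at significantly higher levels in the sera of SLE patients than in healthy individuals (P < 0.05), and we identified 300 new antibodies against nuclear, cytoplasmic, and membrane antigens. Molecular interaction network analysis showed that the targets of these 300 new antibodies were concentrated in cell death, cell cycle, and DNA repair. In a larger cohort, ELISA and Western blot then validated a group of autoantibodies related to apoptosis and DNA repair (APEX1, AURKA, POLB, AGO1, HMGB1, IFIT5, MAPKAPK3, PADI4, RGS3, SRP19, UBE2S, and VRK1), and the levels of antibodies against APEX1, HMGB1, VRK1, AURKA, PADI4, and SRP19 were positively correlated with anti-dsDNA antibody levels in SLE patients. High-throughput array methods can help discover new autoantibodies and provide a basis for uncovering the pathogenesis of SLE.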

Page 248-259

Original Research

Proteomics Analysis of Lipid Droplets from the Oleaginous Alga Chromochloris zofingiensis Reveals Novel Proteins for Lipid Metabolism

Xiaofei Wang, Hehong Wei, Xuemei Mao, Jin Liu

Chromochloris zofingiensis represents an industrially relevant and unique green alga, given its capability of synthesizing triacylglycerol (TAG) and astaxanthin simultaneously for storage in lipid droplets (LDs). To further decipher lipid metabolism, the nitrogen deprivation (ND)-induced LDs from C. zofingiensis were isolated, purified, and subjected to proteomic analysis. Intriguingly, many C. zofingiensis LD proteins had no orthologs present in LD proteome of the model alga Chlamydomonas reinhardtii. Seven novel LD proteins (i.e., two functionally unknown proteins, two caleosins, two lipases, and one l-gulonolactone oxidase) and the major LD protein (MLDP), which were all transcriptionally up-regulated by ND, were selected for further investigation. Heterologous expression in yeast demonstrated that all tested LD proteins were localized to LDs and all except the two functionally unknown proteins enabled yeast to produce more TAG. MLDP could restore the phenotype of mldp mutant strain and enhance TAG synthesis in wild-type strain of C. reinhardtii. Although MLDP and caleosins had a comparable abundance in LDs, they responded distinctly to ND at the transcriptional level. The two lipases, instead of functioning as TAG lipases, likely recycled polar lipids to support TAG synthesis. For the first time, we reported that l-gulonolactone oxidase was abundant in LDs and facilitated TAG accumulation. Moreover, we also proposed a novel working model for C. zofingiensis LDs. Taken together, our work unravels the unique characteristics of C. zofingiensis LDs and provides insights into algal LD biogenesis and TAG synthesis, which would facilitate genetic engineering of this alga for TAG improvement.

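Chromochloris zofingiensis grows rapidly under a range of nutritional conditions and can synthesize triacylglycerol (TAG) and astaxanthin simultaneously and store them in lipid droplets (LDs), making it a green alga with promise for industrial application. To explore lipid metabolism further, we isolated and purified LDs from C. zofingiensis cells after 2 days of nitrogen deprivation and carried out proteomic analysis, identifying 295 LD proteins. Comparison with the LD proteome of the related alga Chlamydomonas reinhardtii showed that many C. zofingiensis LD proteins are not present in C. reinhardtii LDs. We selected seven novel LD proteins (two proteins of unknown function, two caleosins, two lipases, and one gulonolactone oxidase) and the major lipid droplet protein (MLDP) for further study. The genes encoding these LD proteins were all up-regulated by nitrogen deprivation. When expressed in yeast, all of them localized to LDs, and most of them promoted TAG synthesis and accumulation. Heterologous expression of the MLDP gene restored the phenotype of the C. reinhardtii mldp mutant and also raised the TAG content of the C. reinhardtii wild-type strain. MLDP and caleosins had comparable abundance in LDs, which clearly differs from other algae. In addition, we propose a new working model for LD function in C. zofingiensis. On one hand, LD structural proteins, including MLDP, caleosins, and some proteins of unknown function, are expressed at high abundance to keep the lipid droplets stable. On the other hand, enzymes involved in lipid metabolism, including lipases, long-chain acyl-CoA synthetase (LACS), and the endoplasmic reticulum-located diacylglycerol acyltransferase (DGAT), act together to synthesize TAG. These results reveal the distinctive features of C. zofingiensis LDs; they help in understanding algal LD biogenesis and TAG synthesis and also offer guidance for raising lipid content through genetic engineering.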

Page 260-272

Original Research

Warburg Effects in Cancer and Normal Proliferating Cells: Two Tales of the Same Name

Huiyan Sun, Liang Chen, Sha Cao, Yanchun Liang, Ying Xu

It has been observed that both cancer tissue cells and normal proliferating cells (NPCs) have the Warburg effect. Our goal here is to demonstrate that they do this for different reasons. To accomplish this, we have analyzed the transcriptomic data of over 7000 cancer and control tissues of 14 cancer types in TCGA and data of five NPC types in GEO. Our analyses reveal that NPCs accumulate large quantities of ATP produced by respiration before starting the Warburg effect, to raise the intracellular pH from ∼6.8 to ∼7.2 and to prepare energetically for cell division. Once the cell cycle starts, the cells start to rely on glycolysis for ATP generation followed by ATP hydrolysis and lactic acid release, to maintain the elevated intracellular pH needed for cell division, since together the three processes are pH neutral. The cells go back to normal respiration-based ATP production once the cell division phase ends. In comparison, cancer cells have reached their intracellular pH of ∼7.4 from the top down, as multiple acid-loading transporters are up-regulated and most acid-extruding ones, except for lactic acid exporters, are repressed. Cancer cells use continuous glycolysis for ATP production as a way to acidify the intracellular space, since lactic acid secretion is decoupled from glycolysis-based ATP generation and is pH balanced by increased expression of acid-loading transporters. Co-expression analyses suggest that lactic acid secretion is regulated by external, non-pH related signals. Overall, our data strongly suggest that the two cell types have the Warburg effect for very different reasons.

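Human cells produce energy (ATP) in two stages: glycolysis and aerobic respiration. The first converts one molecule of glucose into 2 ATP and 2 pyruvate; in the second, pyruvate enters the tricarboxylic acid cycle and is oxidized, ultimately yielding 30 ATP. Under ordinary conditions our cells use both routes, converting one glucose into 32 ATP. Under certain special conditions, such as hypoxia, pyruvate does not enter the tricarboxylic acid cycle but is converted to lactate and exported from the cell.

In 1927, the German biochemist Otto Warburg found that tumor cells, whether or not they are hypoxic, produce ATP mainly by glycolysis and then export lactate. This phenomenon is called the Warburg effect. Later studies found that all tumors show the Warburg effect to varying degrees. In 1967, Warburg went further and stated that, on the surface, tumors may have countless causes, but there is only one root cause: glycolysis has replaced aerobic respiration for producing ATP. Because this inefficient route is used to produce energy, tumors need tens of times more sugar than normal cells, and the sharply elevated glucose metabolism is also the basis for using PET/CT in tumor diagnosis. Yet why rapidly proliferating tumor cells choose an inefficient way to produce energy has puzzled the oncology community for 90 years. Over the past two decades, many researchers have tried to explain the cause of this phenomenon, but none of the explanations has won general acceptance. In 2009, it was proposed that the Warburg effect is common to all proliferating cells, including normal proliferating cells [1], a view accepted by many researchers.

In this work, statistical analysis of transcriptomic data from a large number of tumor tissues of many types shows that, although tumor cells and normal proliferating cells both produce ATP by glycolysis and export lactate, the root cause of the phenomenon is entirely different in the two cell types. To understand the similarities and differences, we take cytosolic pH as the point of entry, because the cytosolic pH of normal cells is 6.8, but when cells proliferate the pH needs to rise to 7.2-7.4 and stay at that level. Earlier literature held that cells raise their internal pH by exporting lactate, and that cells switch from aerobic respiration to glycolysis for ATP production to meet other metabolic needs [1]. The reaction by which glycolysis produces ATP can be written as $\text{glucose} + 2\,\mathrm{ADP^{3-}} + 2\,\mathrm{HPO_4^{2-}} \rightarrow 2\,\text{lactate}^{-} + 2\,\mathrm{ATP^{4-}}$, so this process is pH neutral, whereas producing ATP by aerobic respiration consumes one hydrogen ion: $\mathrm{ADP^{3-}} + \mathrm{HPO_4^{2-}} \rightarrow \mathrm{ATP^{4-}} + \mathrm{OH^-}$. In addition, ATP hydrolysis releases one hydrogen ion: $\mathrm{ATP^{4-}} + \mathrm{H_2O} \rightarrow \mathrm{ADP^{3-}} + \mathrm{HPO_4^{2-}} + \mathrm{H^+}$. Overall, therefore, an ATP made by glycolysis releases one hydrogen ion once it has been used, while an ATP made by aerobic respiration is pH neutral once it has been used.

Our model is as follows. When normal proliferating cells prepare to proliferate, they begin to store ATP produced by aerobic respiration, which causes the intracellular pH to rise. When the ATP store reaches a certain level and the pH has also risen to a certain level (for example, 7.2-7.4), the cell begins to proliferate. At this point the cell switches its ATP production from aerobic respiration to glycolysis and begins to use the stored ATP. Each ATP from aerobic respiration that is consumed yields one hydrogen ion, and each ATP produced by glycolysis is accompanied by one lactate. The cell exports the two together, lactate plus a hydrogen ion, that is, as lactic acid, and the cytosolic pH therefore stays unchanged. This continues until the stored respiration-derived ATP is used up and cell proliferation is complete, whereupon energy metabolism returns to aerobic respiration. We therefore conclude that normal cells switch from aerobic respiration to glycolysis during proliferation in order to raise and maintain the cytosolic pH. It is also worth noting that normal proliferating cells help raise their cytosolic pH by increasing the uptake of alkaline substances and suppressing the uptake of acidic ones.

In tumor cells, the behavior of cytosolic pH differs from that in normal proliferating cells in the following respects: (1) the cytosolic pH of tumor cells stays at about 7.4 throughout, rather than varying between 6.8 and 7.4 as in normal proliferating cells; (2) the rate at which tumor cells export lactate is largely independent of the rate of lactate synthesis, and is instead strongly correlated with the immune response; (3) apart from lactate export, the way tumor cells take up and export acidic and alkaline substances is the opposite of that in normal proliferating cells, that is, they increase the uptake of acidic substances and suppress the uptake of alkaline ones; (4) tumors use extensive metabolic reprogramming to acidify their cytosol [3]. In addition, our earlier work found that the Fenton reaction occurs in the cytosol of all tumor tissues: $\cdot\mathrm{O_2^-} + \mathrm{H_2O_2} \rightarrow \cdot\mathrm{OH} + \mathrm{OH^-} + \mathrm{O_2}$ (with $\mathrm{Fe^{2+}}$ as catalyst). It continually generates alkaline products and drives the pH up.

Taken together, the overall model of this article is that a sustained Fenton reaction takes place in the tumor cell cytosol, and the rate at which it produces OH- can quickly saturate the cell's pH buffer; this triggers a series of metabolic reprogramming events that acidify the cytosol and keep its pH within the normal range. The Warburg effect is one of these reprogramming events, and its purpose is likewise to acidify the cytosol. The main purpose of lactate export by tumor cells is to protect them from attack by immune cells [5].

References

1. Vander Heiden MG, Cantley LC, Thompson CB. Understanding the Warburg effect: the metabolic requirements of cell proliferation. Science 2009;324:1029-33.
2. Damaghi M, Wojtkowiak JW, Gillies RJ. pH sensing and regulation in cancer. Front Physiol 2013;4:370.
3. Sun H, Zhou Y, Skaro M, Wu Y, Qu Z, Mao F, Zhao S, Xu Y. Stress-induced metabolic reprogramming in cancer. Submitted.
4. Sun H, Zhang C, Cao S, Sheng T, Dong N, Xu Y. Fenton reactions drive nucleotide and ATP syntheses in cancer. J Mol Cell Biol 2018;10:448-59.
5. Hirschhaeuser F, Sattler UGA, Mueller-Klieser W. Lactate: a metabolic key player in cancer. Cancer Res 2011;71:6921-5.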
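The proton bookkeeping in the summary above can be checked with a small ledger. The sketch below is only an illustration in Python: the per-ATP values are read directly off the reactions quoted in the summary, while the dictionary keys, the function name `net_protons`, and the choice of 10 stockpiled ATP are assumptions introduced here, not part of the authors' analysis.

```python
# Net H+ released per ATP (or per lactate) for each step.
# Positive values acidify the cytosol, negative values alkalinize it.
PROTONS_PER_UNIT = {
    # glucose + 2 ADP + 2 Pi -> 2 lactate- + 2 ATP: no H+ made or used
    "glycolysis_synthesis": 0,
    # ADP + Pi -> ATP + OH-: one OH- is equivalent to removing one H+
    "respiration_synthesis": -1,
    # ATP + H2O -> ADP + Pi + H+
    "hydrolysis": +1,
    # lactate- leaves together with one H+ (as lactic acid)
    "lactic_acid_export": -1,
}


def net_protons(steps):
    """Sum the H+ balance over (step_name, count) pairs."""
    return sum(PROTONS_PER_UNIT[name] * count for name, count in steps)


# Make one ATP and then spend it, by each route.
glycolytic = net_protons([("glycolysis_synthesis", 1), ("hydrolysis", 1)])
respiratory = net_protons([("respiration_synthesis", 1), ("hydrolysis", 1)])

# Glycolysis, hydrolysis, and lactic acid export taken together.
with_export = net_protons(
    [("glycolysis_synthesis", 1), ("hydrolysis", 1), ("lactic_acid_export", 1)]
)

# Stockpiling respiratory ATP without spending it (illustrative count of 10).
stockpiling = net_protons([("respiration_synthesis", 10)])

print(glycolytic, respiratory, with_export, stockpiling)  # 1 0 0 -10
```

The ledger reproduces the three statements in the text: glycolytic ATP releases one hydrogen ion once used, respiratory ATP is neutral once used, and storing respiratory ATP removes hydrogen ions, which is the proposed route to a higher pH before division.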

Page 273-286

Original Research

Characterization of Distinct T Cell Receptor Repertoires in Tumor and Distant Non-tumor Tissues from Lung Cancer Patients

Xiang Wang, Botao Zhang, Yikun Yang, Jiawei Zhu, Shujun Cheng, Yousheng Mao, Lin Feng, Ting Xiao

T cells and T cell receptors (TCRs) play pivotal roles in adaptive immune responses against tumors. The development of next-generation sequencing technologies has enabled the analysis of the TCRβ repertoire usage. Given the scarce investigations on the TCR repertoire in lung cancer tissues, in this study, we analyzed TCRβ repertoires in lung cancer tissues and the matched distant non-tumor lung tissues (normal lung tissues) from 15 lung cancer patients. Based on our results, the general distribution of T cell clones was similar between cancer tissues and normal lung tissues; however, the proportion of highly expanded clones was significantly higher in normal lung tissues than in cancer tissues (0.021% ± 0.002% vs. 0.016% ± 0.001%, P = 0.0054, Wilcoxon signed rank test). In addition, a significantly higher TCR diversity was observed in cancer tissues than in normal lung tissues (431.37 ± 305.96 vs. 166.20 ± 101.58, P = 0.0075, Mann-Whitney U test). Moreover, younger patients had a significantly higher TCR diversity than older patients (640.7 ± 295.3 vs. 291.8 ± 233.6, P = 0.036, Mann-Whitney U test), and the higher TCR diversity in tumors was significantly associated with worse cancer outcomes. Thus, we provided a comprehensive comparison of the TCR repertoires between cancer tissues and matched normal lung tissues and demonstrated the presence of distinct T cell immune microenvironments in lung cancer patients.

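T cells and T cell receptors (TCRs) play important roles in the adaptive immune response against tumors. The TCR is responsible for specific antigen recognition, and the highly variable complementarity determining region 3 (CDR3) sequence generated during its rearrangement is the main marker of TCR specificity. Assessing the CDR3 region of the TCRβ chain by next-generation sequencing can largely reflect the clonal composition and diversity of T cells and thereby gauge the body's anti-tumor immune response; this is what is usually called the T cell immune repertoire. There are currently few studies of the T cell immune repertoire in lung cancer tissue. In this study, we compared the T cell immune repertoires of tumor tissue and adjacent normal lung tissue from 15 lung cancer patients. According to our results, the overall distribution of T cell clones was similar between lung cancer tissue and adjacent normal lung tissue, but normal lung tissue had a higher proportion of highly expanded clones than lung cancer tissue (0.021% ± 0.002% vs. 0.016% ± 0.001%, P = 0.0054, Wilcoxon signed rank test). In addition, TCR diversity was significantly higher in tumor tissue than in normal lung tissue (431.37 ± 305.96 vs. 166.20 ± 101.58, P = 0.0075, Mann-Whitney U test). Younger lung cancer patients had significantly higher TCR diversity than older patients (640.7 ± 295.3 vs. 291.8 ± 233.6, P = 0.036, Mann-Whitney U test), and higher TCR diversity in tumor tissue was significantly associated with poor prognosis. Through this study, we compared the T cell immune repertoire features of tumor tissue and matched normal lung tissue from lung cancer patients, showed that distinct T cell immune microenvironments exist among lung cancer patients, and found an association between TCR diversity in lung cancer tissue and clinical prognosis. This suggests that a patient's prognosis might be evaluated by profiling the T cell immune repertoire of their tissue, which has important clinical value.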
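The abstract reports TCR diversity values without stating which index was used. As a hedged illustration of how repertoire diversity is commonly summarized from clone counts, the sketch below computes two widely used measures, Shannon entropy and the inverse Simpson index. Both function names and the toy clone counts are assumptions made for this example; they should not be read as the measure or data behind the reported numbers.

```python
import math


def clone_frequencies(counts):
    """Convert per-clone read counts to frequencies, ignoring empty clones."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def shannon_entropy(counts):
    """Shannon entropy (natural log); higher means a more even repertoire."""
    return -sum(p * math.log(p) for p in clone_frequencies(counts))


def inverse_simpson(counts):
    """Inverse Simpson index; roughly the effective number of clones."""
    return 1.0 / sum(p * p for p in clone_frequencies(counts))


# Hypothetical clone counts: one even repertoire, one dominated by a single clone.
even_repertoire = [25, 25, 25, 25]
skewed_repertoire = [97, 1, 1, 1]

print(inverse_simpson(even_repertoire))    # 4.0: four equally sized clones
print(inverse_simpson(skewed_repertoire))  # close to 1: one clone dominates
```

Both indices fall as a few clones expand, which is why a high proportion of highly expanded clones and a low diversity score tend to appear together.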

Page 287-296

Letter

H3K27me3 Signal in the Cis Regulatory Elements Reveals the Differentiation Potential of Progenitors During Drosophila Neuroglial Development

Xiaolong Chen, Youqiong Ye, Liang Gu, Jin Sun, Yanhua Du, Wen-Ju Liu, Wei Li, Xiaobai Zhang, Cizhong Jiang

Drosophila neural development undergoes extensive chromatin remodeling and precise epigenetic regulation. However, the roles of chromatin remodeling in establishment and maintenance of cell identity during cell fate transition remain enigmatic. Here, we compared the changes in gene expression, as well as the dynamics of nucleosome positioning and key histone modifications between the four major neural cell types during Drosophila neural development. We find that the neural progenitors can be separated from the terminally differentiated cells based on their gene expression profiles, whereas nucleosome distribution in the flanking regions of transcription start sites fails to identify the relationships between the progenitors and the differentiated cells. H3K27me3 signal in promoters and enhancers can not only distinguish the progenitors from the differentiated cells but also identify the differentiation path of the neural stem cells (NSCs) to the intermediate progenitor cells to the glial cells. In contrast, H3K9ac signal fails to identify the differentiation path, although it activates distinct sets of genes with neuron-specific and glia-related functions during the differentiation of the NSCs into neurons and glia, respectively. Together, our study provides novel insights into the crucial roles of chromatin remodeling in determining cell type during Drosophila neural development.

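Drosophila neural development involves extensive chromatin remodeling and precise epigenetic regulation. However, the role of chromatin remodeling in establishing and maintaining cell identity during cell fate transitions remains unclear. Here, we compared changes in gene expression and the dynamics of nucleosome positioning and key histone modifications among the four major neural cell types during Drosophila neural development. We found that gene expression profiles can separate neural progenitors from terminally differentiated neural cells, whereas nucleosome occupancy near transcription start sites cannot distinguish the two. The H3K27me3 signal at promoters and enhancers not only distinguishes neural progenitors from differentiated neural cells but also identifies the differentiation path from neural progenitors to intermediate neural progenitors to glial cells. In contrast, the H3K9ac signal cannot identify this differentiation path, although it activates distinct gene sets with neuron-specific and glia-specific functions as neural progenitors differentiate into neurons and glia, respectively. In sum, our study offers new insights into the function of chromatin remodeling during Drosophila neural development.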

Page 297-304

Application Note

gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks

Madison Caballero, Jill Wegrzyn

Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designed to describe long read alignments and predicted gene structures. In addition, the majority of gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration for functional attributes, such as the presence or absence of protein domains that can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available and implemented in Perl with support from BioPerl libraries at https://gitlab.com/PlantGenomicsLab/gFACs.

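In recent years, as high-throughput sequencing has developed and spread, genome sizes and the complexity of assembly and annotation have grown steadily. Even so, of the nearly 7800 eukaryotic genomes in GenBank, only a few have been assembled and annotated to the chromosome level, and more than 85% of these eukaryotic genomes contain erroneous gene annotations. Such errors often arise from integrating text files that hold gene annotation information in different formats. In addition, most gene prediction pipelines lack a step for filtering redundant genes and do not provide annotation output in mainstream standard formats. These pipelines also rarely cover functional attributes, such as protein domain information that can be used to verify the accuracy of gene models. To provide effective oversight of the growing body of genome annotations, we developed gFACs, a software package for gene annotation files and alignment results that combines filtering, analysis, and conversion. By drawing on reference genome information, the software can filter out erroneous gene models, generate statistics, and provide output files suitable for downstream analysis and visualization. Notably, the software is not a replacement for gene prediction models based on de novo or similarity-based annotation; rather, it is a tool for comparing disputed annotations and thereby improving the accuracy of gene annotation. gFACs also supports common downstream uses, such as genome browsers, and generates detailed information about the filtering process. gFACs is an open-source package implemented in Perl with support from BioPerl libraries, and it is freely available to researchers at https://gitlab.com/PlantGenomicsLab/gFACs.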
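To make the idea of filtering problematic gene models concrete, here is a minimal sketch of structural checks on a coding sequence (start codon, stop codon, reading frame, premature stops). It is written in Python for illustration and is not gFACs code: gFACs itself is implemented in Perl, and the function name `cds_problems`, the `min_length` default, and the toy sequences are assumptions introduced here rather than gFACs options or defaults.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_problems(cds, min_length=150):
    """Return the reasons a coding sequence looks incomplete (empty list if none)."""
    cds = cds.upper()
    problems = []
    if len(cds) < min_length:
        problems.append("too short")
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not cds.startswith(START_CODON):
        problems.append("missing start codon")
    if cds[-3:] not in STOP_CODONS:
        problems.append("missing stop codon")
    # Every in-frame codon except the last one should be a sense codon.
    internal = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal):
        problems.append("in-frame stop")
    return problems


# Hypothetical gene models, far shorter than real genes, to keep the example readable.
models = {
    "complete": "ATG" + "GCT" * 4 + "TAA",
    "no_stop": "ATGGCTGCTGCT",
    "premature_stop": "ATGTAAGCTTGA",
}
kept = [name for name, seq in models.items() if not cds_problems(seq, min_length=9)]
print(kept)  # ['complete']
```

Returning the list of reasons rather than a bare pass/fail mirrors the point made in the abstract that details of the filtering process are themselves useful for assessing an annotation.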

Page 305-310

Application Note

C3: Consensus Cancer Driver Gene Caller

Chen-Yu Zhu, Chi Zhou, Yun-Qin Chen, Ai-Zong Shen, Zong-Ming Guo, Zhao-Yi Yang, Xiang-Yun Ye, Shen Qu, Jia Wei, Qi Liu

Next-generation sequencing has allowed identification of millions of somatic mutations in human cancer cells. A key challenge in interpreting cancer genomes is to distinguish drivers of cancer development among available genetic mutations. To address this issue, we present the first web-based application, consensus cancer driver gene caller (C3), to identify the consensus driver genes using six different complementary strategies, i.e., frequency-based, machine learning-based, functional bias-based, clustering-based, statistics model-based, and network-based strategies. This application allows users to specify customized operations when calling driver genes, and provides solid statistical evaluations and interpretable visualizations on the integration results. C3 is implemented in Python and is freely available for public use at http://drivergene.rwebox.com/c3.

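With the rapid development of next-generation sequencing (NGS), millions of somatic mutations have been identified in the genomes of human cancer cells. However, distinguishing driver mutations from passenger mutations remains a challenge for the field. To address this challenge, we developed the first user-friendly, web-based platform for integrated analysis of cancer driver genes, the consensus cancer driver gene caller (C3). The platform integrates six tools based on different strategies, each with its own strengths and each complementing the others, to obtain consistent and reliable driver gene predictions. We also allow users to adjust parameters within a certain range, and the platform integrates and visualizes the analysis results. C3 can be accessed and used at http://drivergene.rwebox.com/c3.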
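The abstract does not describe how C3 combines the output of its six strategies, so the following is only a sketch of one simple way to form a consensus: count how many strategies call each gene and keep those above a vote threshold. The function name `consensus_genes`, the `min_votes` parameter, and the example calls are hypothetical and are not C3's actual integration method or results.

```python
from collections import Counter


def consensus_genes(calls_by_strategy, min_votes):
    """Rank genes by how many strategies called them and keep those with enough votes."""
    votes = Counter()
    for genes in calls_by_strategy.values():
        votes.update(set(genes))  # each strategy votes at most once per gene
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, n) for gene, n in ranked if n >= min_votes]


# Hypothetical calls, one list per strategy type named in the abstract.
example_calls = {
    "frequency": ["TP53", "KRAS", "TTN"],
    "machine_learning": ["TP53", "KRAS", "PIK3CA"],
    "functional_bias": ["TP53", "PIK3CA"],
    "clustering": ["KRAS", "TP53"],
    "statistical_model": ["TP53", "TTN"],
    "network": ["TP53", "KRAS", "PIK3CA"],
}
print(consensus_genes(example_calls, min_votes=3))
# [('TP53', 6), ('KRAS', 4), ('PIK3CA', 3)]
```

Requiring agreement across complementary strategies trades some sensitivity for fewer calls that are supported by a single method only; the threshold is the kind of parameter a user might want to adjust.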

Page 311-318

Application Note

Diversified Application of Barcoded PLATO (PLATO-BC) Platform for Identification of Protein Interactions

Weili Kong, Tsuyoshi Hayashi, Guillaume Fiches, Qikai Xu, Mamie Z. Li, Jianwen Que, Shuai Liu, Wei Zhang, Jun Qi, Netty Santoso, Stephen J. Elledge, Jian Zhu

Proteins usually associate physically with other molecules to execute their functions. Identifying these interactions is important for the functional analysis of proteins. Previously, we reported the parallel analysis of translated ORFs (PLATO), which couples ribosome display of full-length ORFs with affinity enrichment of mRNA/protein/ribosome complexes for the “bait” molecules, followed by deep sequencing analysis of the mRNA. However, the sample processing, from extraction of precipitated mRNA to generation of DNA libraries, includes numerous steps, which is tedious and may cause the loss of materials. We therefore developed an improved platform, barcoded PLATO (PLATO-BC), and tested its application to protein interaction discovery. In this report, we tested the antisera-antigen interaction using serum samples from patients with inclusion body myositis (IBM). Tripartite motif containing 21 (TRIM21) was identified as a potentially new IBM autoantigen. We also expanded the application of PLATO-BC to identify protein interactions for JQ1, single ubiquitin peptide, and NS5 protein of Zika virus. From PLATO-BC analyses, we identified new protein interactions for these “bait” molecules. We demonstrate that Ewing sarcoma breakpoint region 1 (EWSR1) binds to JQ1 and that their interaction may interrupt the EWSR1 association with acetylated histone H4. RIO kinase 3 (RIOK3), a newly identified ubiquitin-binding protein, is preferentially associated with the K63-ubiquitin chain. We also find that Zika NS5 protein interacts with two previously unreported host proteins, par-3 family cell polarity regulator (PARD3) and chromosome 19 open reading frame 53 (C19orf53), whose attenuated expression benefits the replication of Zika virus. These results further demonstrate that PLATO-BC is capable of identifying novel protein interactions for various types of “bait” molecules.

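Proteins usually need to interact with other molecules (antibodies, small-molecule drugs, peptides, or proteins) to carry out their functions, and identifying these interactions helps us understand what a protein does. We previously established the PLATO method for identifying such interactions. It uses ribosome display of full-length open reading frames (ORFs) to form mRNA/protein/ribosome complexes, which are incubated with a bait molecule so that interacting partners can be pulled down; next-generation sequencing of the mRNA then identifies the proteins that interact specifically with a given bait. However, the method has drawbacks in sample preparation: the workflow from mRNA extraction to DNA library preparation involves many tedious, time-consuming steps and can lead to the loss of some genes. By adding a unique barcode to each gene, we established an optimized barcoded PLATO. In this study we evaluated the method's usefulness with four different bait molecules: serum antibodies from patients with inclusion body myositis, the BRD4 inhibitor JQ1, a ubiquitin peptide, and the nonstructural protein NS5 of Zika virus (ZIKV). The results show that the new barcoded PLATO not only identifies previously reported autoantigens but also identified a new, previously unreported autoantigen, TRIM21. JQ1 can bind the cancer-associated protein EWSR1, and JQ1 can also block the interaction between EWSR1 and acetylated histone H4. A new ubiquitin-binding protein, RIOK3, was discovered in this study, and further work showed that RIOK3 preferentially binds K63 ubiquitin chains. PARD3 and C19orf53 not only bind ZIKV NS5 but also inhibit ZIKV replication. In sum, these results confirm that barcoded PLATO is a method that can be used to identify interactions for many kinds of bait molecules.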

Page 319-331
