Article Online - Genomics, Proteomics & Bioinformatics

Volume: 19, Issue: 5

Database

COVID-ONE-hi: The One-stop Database for COVID-19-specific Humoral Immunity and Clinical Parameters

Zhaowei Xu, Yang Li, Qing Lei, Likun Huang, Dan-yun Lai, Shu-juan Guo, He-wei Jiang, Hongyan Hou, Yun-xiao Zheng, Xue-ning Wang, Jiaoxiang Wu, Ming-liang Ma, Bo Zhang, Hong Chen, Caizheng Yu, Jun-biao Xue, Hai-nan Zhang, Huan Qi, Siqi Yu, Mingxi Lin, Yandi Zhang, Xiaosong Lin, Zongjie Yao, Huiming Sheng, Ziyong Sun, Feng Wang, Xionglin Fan, Sheng-ce Tao

Coronavirus disease 2019 (COVID-19), which is caused by SARS-CoV-2, varies in symptoms and mortality rates among populations. Humoral immunity plays critical roles in SARS-CoV-2 infection and recovery from COVID-19. However, differences in immune responses and clinical features among COVID-19 patients remain largely unknown. Here, we report a database for COVID-19-specific IgG/IgM immune responses and clinical parameters (named COVID-ONE-hi). COVID-ONE-hi is based on data comprising the IgG/IgM responses of 2360 serum samples, collected from 783 COVID-19 patients, to 24 full-length or truncated proteins (corresponding to 20 of the 28 known SARS-CoV-2 proteins) and 199 spike protein peptides. In addition, 96 clinical parameters for the 2360 serum samples and basic information for the 783 patients are integrated into the database. Furthermore, COVID-ONE-hi provides a dashboard for defining samples and a one-click analysis pipeline for a single group or paired groups. A set of samples of interest is easily defined by adjusting the scale bars of a variety of parameters. After clicking the "START" button, users can readily obtain a comprehensive analysis report for further interpretation. COVID-ONE-hi is freely available at www.COVID-ONE.cn.

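This work builds on the previously developed SARS-CoV-2 proteome microarray (https://mp.weixin.qq.com/s/hAe4g9t5U8DG9Q_2SDm_JQ, https://mp.weixin.qq.com/s/AJ3r3Pf5ExC-ajmQ0iw15Q). It profiled 2360 serum samples from 783 COVID-19 patients, yielding antibody response data covering 21 SARS-CoV-2 proteins and 197 S protein peptides. To strengthen clinical relevance, the database also includes 96 clinical parameters for these 783 patients. Based on the antibody data and clinical parameters, and by integrating a variety of bioinformatics analysis tools, the authors built a one-stop database of COVID-19-specific humoral immunity (COVID-ONE-hi, www.covid-one.cn). Users can select sample groups of interest in the database and, after clicking the analysis button, obtain a comprehensive analysis report.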
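
The dashboard workflow described above (define a sample set by adjusting scale bars, then compare a single group or a pair of groups) can be sketched as range filtering followed by a group summary. This is a minimal sketch, not the COVID-ONE-hi implementation: the `Sample` fields (`age`, `days_after_onset`, `igg_s1`) and the functions `select_samples` and `compare_groups` are hypothetical, and the toy values are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    # Hypothetical fields; the real COVID-ONE-hi schema is not described here.
    sample_id: str
    age: float
    days_after_onset: float
    igg_s1: float  # hypothetical IgG signal against the S1 protein


def select_samples(samples, ranges):
    """Keep samples whose fields all fall inside inclusive [low, high] ranges,
    mimicking a set of dashboard scale bars."""
    return [
        s for s in samples
        if all(low <= getattr(s, field) <= high for field, (low, high) in ranges.items())
    ]


def compare_groups(group_a, group_b, field):
    """Mean of one field in each group and the difference (b minus a)."""
    mean_a = mean(getattr(s, field) for s in group_a)
    mean_b = mean(getattr(s, field) for s in group_b)
    return mean_a, mean_b, mean_b - mean_a


# Toy data for illustration only.
samples = [
    Sample("s1", 45, 10, 1.2),
    Sample("s2", 70, 12, 2.8),
    Sample("s3", 62, 30, 3.1),
    Sample("s4", 38, 28, 1.9),
]
early = select_samples(samples, {"days_after_onset": (0, 14)})
late = select_samples(samples, {"days_after_onset": (15, 60)})
summary = compare_groups(early, late, "igg_s1")
```

Each key in `ranges` plays the role of one scale bar; adding more keys narrows the selection, because a sample must satisfy every range.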

Page 669-678

Review Article

Gut Microbiome Alterations in COVID-19

Tao Zuo, Xiaojian Wu, Weiping Wen, Ping Lan

Since the outset of the coronavirus disease 2019 (COVID-19) pandemic, the gut microbiome in COVID-19 has garnered substantial interest, given its significant roles in human health and pathophysiology. Accumulating evidence reveals that the gut microbiome, including the bacterial microbiome, mycobiome, and virome, is broadly altered in COVID-19. Overall, the gut microbial ecological network is significantly weakened and becomes sparse in patients with COVID-19, together with a decrease in gut microbiome diversity. Beyond the presence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the gut microbiome of patients with COVID-19 is also characterized by enrichment of opportunistic bacteria, fungi, and eukaryotic viruses, which are also associated with disease severity and presentation. Meanwhile, many symbiotic bacteria and bacteriophages decrease in abundance in patients with COVID-19. These gut microbiome features persist in a significant subset of patients even after disease resolution, coinciding with ‘long COVID’ (also known as post-acute sequelae of COVID-19). The broadly altered gut microbiome is largely a consequence of SARS-CoV-2 infection and its downstream detrimental effects on systemic host immunity and the gut milieu. The impaired host immunity and distorted gut microbial ecology, particularly the loss of low-abundance beneficial bacteria and blooms of opportunistic fungi including Candida, may hinder the reassembly of the gut microbiome after COVID-19. Future investigation is needed to fully understand the role of the gut microbiome in host immunity against SARS-CoV-2 infection, as well as the long-term effect of COVID-19 on the gut microbiome in relation to host health after the pandemic.

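The gut microbiota plays an extremely important role in human health and pathophysiology. Since the outbreak of the COVID-19 pandemic, changes in the gut microecology of COVID-19 patients and its role in resisting infection have attracted increasing attention. Growing evidence shows that the gut microbiota of COVID-19 patients is markedly altered, including the bacterial microbiota, the fungal microbiota (mycobiome), and the viral microbiota (virome). Overall, gut microbial diversity is reduced in COVID-19 patients, and the ecological network becomes structurally sparse. Besides SARS-CoV-2, the gut of COVID-19 patients harbors large numbers of opportunistic pathogenic bacteria, fungi, and eukaryotic viruses, all of which are associated with the severity and clinical presentation of COVID-19. In addition, commensal bacteria, probiotics, and bacteriophages are reduced in the guts of the vast majority of COVID-19 patients. In most COVID-19 patients, gut dysbiosis persists after recovery, consistent with "Long COVID" (COVID-19 symptoms that persist for weeks or months after the infection has cleared), which points to long-term health risks of persistent gut dysbiosis in the "post-COVID era". The adverse effects of SARS-CoV-2 infection on host immunity and the gut environment largely drive the changes in the gut microbiota. Impaired host immunity and gut microbial dysbiosis, especially the loss of low-abundance beneficial bacteria and the overgrowth of opportunistic fungi (including Candida), may hinder the recovery of the gut microbiota after COVID-19. It is therefore necessary to study the role of the gut microbiota in host immunity against SARS-CoV-2 infection, as well as the long-term effects of COVID-19 on gut microecology after the pandemic and their relationship to host health.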
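
The reduced gut microbiome diversity mentioned above is usually quantified with an alpha-diversity index. As a minimal sketch, assuming the Shannon index and Pielou's evenness (the review does not say which metric each cited study used), per-sample diversity can be computed from taxon read counts; the counts below are toy values.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Evenness J = H / ln(S), where S is the number of observed taxa."""
    observed = sum(1 for c in counts if c > 0)
    return shannon_diversity(counts) / math.log(observed) if observed > 1 else 0.0


# Toy samples: an even community versus one dominated by a single taxon.
even_sample = [25, 25, 25, 25]
dominated_sample = [97, 1, 1, 1]
```

Both samples contain four taxa, but the dominated one scores a much lower Shannon index, which is the kind of shift a "decrease in diversity" describes.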

Page 679-688

Review Article

Quantitative Proteomics Using Isobaric Labeling: A Practical Guide

Xiulan Chen, Yaping Sun, Tingting Zhang, Lian Shu, Peter Roepstorff, Fuquan Yang

In the past decade, relative proteomic quantification using isobaric labeling technology has developed into a key tool for comparing the expression of proteins in biological samples. Although its multiplexing capacity and flexibility make it a valuable technology for addressing various biological questions, its quantitative accuracy and precision still pose significant challenges to the reliability of its quantification results. Here, we give a detailed overview of the different kinds of isobaric mass tags and the advantages and disadvantages of the isobaric labeling method. We also discuss which precautions should be taken at each step of the isobaric labeling workflow to obtain reliable quantification results in large-scale quantitative proteomics experiments. In the last section, we discuss the broad applications of isobaric labeling technology in biological and clinical studies, with an emphasis on thermal proteome profiling and proteogenomics.
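
To make the quantification step concrete, here is a minimal sketch, assuming total-intensity normalization across channels and summation of reporter-ion intensities per protein; this is one common strategy, not a workflow prescribed by the review. The function name `protein_log2_ratios` and the input layout are hypothetical. The sketch does not correct ratio compression caused by co-isolated precursors, a well-known accuracy issue for isobaric labeling.

```python
import math


def protein_log2_ratios(psms_by_protein, reference=0):
    """Relative protein abundance from isobaric reporter ions.

    psms_by_protein maps a protein ID to a list of PSMs; each PSM is a list of
    reporter-ion intensities, one per channel (one per isobaric label).
    Returns, per protein, log2(channel / reference channel).
    """
    all_psms = [psm for psms in psms_by_protein.values() for psm in psms]
    n_channels = len(all_psms[0])
    # Total-intensity normalization: rescale every channel to the same summed
    # intensity across the whole run, correcting unequal sample loading.
    totals = [sum(psm[c] for psm in all_psms) for c in range(n_channels)]
    target = sum(totals) / n_channels
    factors = [target / t for t in totals]
    ratios = {}
    for protein, psms in psms_by_protein.items():
        # Sum normalized reporter intensities over the protein's PSMs, per channel.
        summed = [sum(psm[c] * factors[c] for psm in psms) for c in range(n_channels)]
        ratios[protein] = [math.log2(s / summed[reference]) for s in summed]
    return ratios
```

If channel 2 was simply loaded twice as heavily for every protein, normalization removes that offset and all ratios come out near zero; only protein-specific changes survive.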

Page 689-706

Research Article

Integrative Multi-omics Landscape of Non-structural Protein 3 of Severe Acute Respiratory Syndrome Coronaviruses

Ruona Shi, Zhenhuan Feng, Xiaofei Zhang

The coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is currently a global pandemic. Extensive investigations have been performed to study the clinical and cellular effects of SARS-CoV-2 infection. Mass spectrometry-based proteomics studies have revealed the cellular changes caused by the infection and identified a plethora of interactors for all SARS-CoV-2 components except the longest one, non-structural protein 3 (NSP3). Here, we expressed the full-length NSP3 proteins of SARS-CoV and SARS-CoV-2 to investigate their unique and shared functions using multi-omics methods. We conducted interactome, phosphoproteome, ubiquitylome, transcriptome, and proteome analyses of NSP3-expressing cells. We found that NSP3 plays essential roles in cellular functions such as RNA metabolism and immune response (e.g., NF-κB signal transduction). Interestingly, we showed that SARS-CoV-2 NSP3 localizes to both the endoplasmic reticulum and mitochondria. In addition, SARS-CoV-2 NSP3 is more closely related to mitochondrial ribosomal proteins, whereas SARS-CoV NSP3 is related to cytosolic ribosomal proteins. In summary, our integrative multi-omics study of NSP3 improves the understanding of the functions of NSP3 and offers potential targets for the development of anti-SARS strategies.

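Research question: NSP3 is one of the two SARS-CoV proteins with protease activity and participates in cleaving the two long polypeptides pp1a and pp1ab. Previous studies showed that NSP3, together with other NSPs, regulates the synthesis of viral genomic RNA. Does NSP3 have other functions? How does NSP3 affect host cells? Do the NSP3 proteins of the two coronaviruses SARS-CoV and SARS-CoV-2 function in the same way? Methods: In this study, we used immunofluorescence to reveal differences in the localization of the two NSP3 proteins in host cells. Through integrative multi-omics analyses of the interactome, phosphoproteome, ubiquitylome, transcriptome, and proteome, we identified similarities and differences in how the NSP3 proteins of the two SARS-CoVs affect host cells. Main result 1: The NSP3 proteins of SARS-CoV and SARS-CoV-2 show distinct preferences in subcellular localization. Main result 2: The NSP3 interactome revealed that SARS-CoV-2 NSP3 is closely associated with the endoplasmic reticulum and mitochondrial ribosomes, whereas SARS-CoV NSP3 is associated with cytoplasmic ribosomes. Main result 3: Multi-omics analyses revealed cellular processes regulated by NSP3. Data links: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD023927 https://ngdc.cncb.ac.cn/gsa-human/browse/HRA000634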

Page 707-726

Research Article

Genomic Epidemiology of SARS-CoV-2 in Pakistan

Shuhui Song, Cuiping Li, Lu Kang, Dongmei Tian, Nazish Badar, Wentai Ma, Shilei Zhao, Xuan Jiang, Chun Wang, Yongqiao Sun, Wenjie Li, Meng Lei, Shuangli Li, Qiuhui Qi, Aamer Ikram, Muhammad Salman, Massab Umair, Huma Shireen, Fatima Batool, Bing Zhang, Hua Chen, Yun-Gui Yang, Amir Ali Abbasi, Mingkun Li, Yongbiao Xue, Yiming Bao

COVID-19 has swept the globe, and Pakistan is no exception. To investigate the initial introductions and transmission of SARS-CoV-2 in Pakistan, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected from March 16 to June 1, 2020. We identified a total of 347 mutated positions, 31 of which were over-represented in Pakistan. Meanwhile, we found over 1000 intra-host single-nucleotide variants (iSNVs). Several of them occurred concurrently, indicating possible interactions or coevolution among them. Some of the high-frequency iSNVs in Pakistan were not observed in the global population, suggesting strong purifying selection. The genomic epidemiology revealed five distinctive spreading clusters. The largest cluster consisted of 74 viruses derived from different geographic locations in Pakistan and formed a deep hierarchical structure, indicating extensive and persistent nationwide transmission of the virus that is probably attributable to a signature mutation of this cluster (G8371T in ORF1ab). Furthermore, 28 putative international introductions were identified, several of which are consistent with the epidemiological investigations. In all, this study has inferred the possible pathways of introduction and transmission of SARS-CoV-2 in Pakistan, which could aid ongoing and future viral surveillance and COVID-19 control.

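Background: On March 12, 2020, the World Health Organization declared COVID-19 a global pandemic. The COVID-19 pandemic has swept the world, seriously threatening the health and lives of people in every country and dealing a heavy blow to global economic and social development, and Pakistan is no exception. To investigate the initial introductions and transmission of SARS-CoV-2 in Pakistan, the Beijing Institute of Genomics, Chinese Academy of Sciences (China National Center for Bioinformation), together with Pakistani partners, conducted the largest genomic epidemiology study of COVID-19 in Pakistan and analyzed in depth the patterns of SARS-CoV-2 transmission and evolution within the country. Methods: Researchers at the National Institute of Health (NIH) of Pakistan and Quaid-i-Azam University systematically collected and provided samples from several hundred early confirmed COVID-19 patients in Pakistan. The SARS-CoV-2 research team at the Beijing Institute of Genomics (China National Center for Bioinformation) performed whole-genome sequencing of SARS-CoV-2 from these samples using two library preparation methods, whole-genome multiplex PCR amplification and probe hybridization capture, obtained high-quality SARS-CoV-2 genome sequences through bioinformatics analysis, and analyzed their mutation features and transmission dynamics in depth. Based on haplotype network and source-tracing analyses, the researchers further inferred the likely importation events into Pakistan. Main result 1: Samples from 150 early (before June 2, 2020) confirmed COVID-19 patients in Pakistan were systematically collected, yielding 150 high-quality SARS-CoV-2 genome sequences. A total of 347 mutated positions were identified, 31 of which occurred among the Pakistani cases at a significantly higher frequency than the contemporaneous global average. Main result 2: A total of 1057 intra-host single-nucleotide variants (iSNVs) were detected. Some iSNVs were linked, suggesting possible interactions or coevolution among them. The absence of high-frequency iSNVs indicates strong purifying selection in the study population. Main result 3: Haplotype network analysis showed that these clinical samples fell into five independent transmission clusters, the largest of which comprised 74 genome sequences forming a large transmission chain spanning at least 2 to 5 generations. Network analysis based on public global data suggested about 28 importation events linked to locations outside Pakistan, including countries such as the United States, France, and Portugal. Data links: https://ngdc.cncb.ac.cn/gsa/browse/CRA003122 https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=PRJCA003179&page=1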
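
Intra-host single-nucleotide variants are positions at which a single sample carries a mixture of alleles. A minimal sketch of calling such positions from per-position base counts follows; the depth and minor-allele-frequency thresholds (`min_depth=100`, `min_minor_freq=0.05`) and the function `call_isnvs` are illustrative assumptions, not the criteria used in the study.

```python
def call_isnvs(pileup, min_depth=100, min_minor_freq=0.05):
    """Call intra-host single-nucleotide variants (iSNVs) for one sample.

    pileup maps a genome position to a dict of base -> read count.
    A position is reported when depth >= min_depth and the second most common
    base reaches min_minor_freq. Both thresholds are illustrative assumptions.
    Returns (position, major_base, minor_base, minor_frequency) tuples.
    """
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth or len(counts) < 2:
            continue  # too shallow, or only one base observed
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        (major, _), (minor, minor_count) = ranked[:2]
        freq = minor_count / depth
        if freq >= min_minor_freq:
            calls.append((pos, major, minor, freq))
    return calls


# Toy pileup: one mixed site, one low-frequency site, one shallow site, one clonal site.
pileup = {
    1234: {"G": 700, "T": 300},
    100: {"A": 995, "G": 5},
    200: {"C": 30, "T": 20},
    300: {"A": 500},
}
isnvs = call_isnvs(pileup)
```

Real pipelines usually add strand-bias, base-quality, and replicate-concordance filters, because sequencing errors can mimic low-frequency variants.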

Page 727-740

Research Article

Integrative Analysis of Genome, 3D Genome, and Transcriptome Alterations of Clinical Lung Cancer Samples

Tingting Li, Ruifeng Li, Xuan Dong, Lin Shi, Miao Lin, Ting Peng, Pengze Wu, Yuting Liu, Xiaoting Li, Xuheng He, Xu Han, Bin Kang, Yinan Wang, Zhiheng Liu, Qing Chen, Yue Shen, Mingxiang Feng, Xiangdong Wang, Duojiao Wu, Jian Wang, Cheng Li

Genomic studies of cancer cell alterations, such as mutations, copy number variations (CNVs), and translocations, greatly promote our understanding of the genesis and development of cancers. However, the 3D genome architecture of cancers remains less studied due to the complexity of cancer genomes and technical difficulties. To explore the 3D genome structure in clinical lung cancer, we performed Hi-C experiments on paired normal and tumor cells harvested from patients with lung cancer, combined with RNA sequencing analysis. We demonstrated the feasibility of studying the 3D genome of clinical lung cancer samples with a small number of cells (1 × 10⁴), compared the genome architecture between clinical samples and lung cancer cell lines, and identified conserved and changed spatial chromatin structures between normal and cancer samples. We also showed that Hi-C data can be used to infer CNVs and point mutations in cancer. By integrating these different types of cancer alterations, we showed significant associations between CNVs, 3D genome, and gene expression. We propose that the 3D genome mediates the effects of cancer genomic alterations on gene expression by altering regulatory chromatin structures. Our study highlights the importance of analyzing 3D genomes of clinical cancer samples in addition to cancer cell lines and provides an integrative genomic analysis pipeline for future larger-scale studies in lung cancer and other cancers.
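
The abstract notes that Hi-C data can be used to infer CNVs. A common underlying idea is that Hi-C read coverage per genomic bin scales with copy number, so tumor-versus-normal coverage ratios flag gains and losses. The sketch below assumes that idea, uses hypothetical function names and thresholds (`gain=0.4`, `loss=-0.4` in log2 units), and omits the GC-content, mappability, and restriction-site biases a real pipeline would correct; it is not the authors' pipeline.

```python
import math
from statistics import median


def bin_coverage(contact_matrix):
    """Per-bin Hi-C coverage: the row sums of a symmetric contact matrix."""
    return [sum(row) for row in contact_matrix]


def cnv_calls(tumor_cov, normal_cov, gain=0.4, loss=-0.4, pseudocount=1.0):
    """Label each bin 'gain', 'loss', or 'neutral' from log2(tumor/normal) coverage.

    Ratios are median-centred, which assumes most bins are copy-neutral and
    also absorbs differences in sequencing depth between the two libraries.
    """
    raw = [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor_cov, normal_cov)
    ]
    centre = median(raw)
    labels = []
    for r in raw:
        r -= centre
        if r >= gain:
            labels.append("gain")
        elif r <= loss:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Toy coverage for five bins: bin 3 is amplified, bin 5 is partly deleted.
labels = cnv_calls([100, 100, 200, 100, 40], [100, 100, 100, 100, 100])
```

Segmenting adjacent bins with similar ratios (rather than labeling bins independently) is the usual next step, but it is omitted here for brevity.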

Page 741-753

Research Article

Oleic Acid and Eicosapentaenoic Acid Reverse Palmitic Acid-induced Insulin Resistance in Human HepG2 Cells via the Reactive Oxygen Species/JUN Pathway

Yaping Sun, Jifeng Wang, Xiaojing Guo, Nali Zhu, Lili Niu, Xiang Ding, Zhensheng Xie, Xiulan Chen, Fuquan Yang

Oleic acid (OA), a monounsaturated fatty acid (MUFA), has previously been shown to reverse hepatic insulin resistance (IR) induced by the saturated fatty acid palmitic acid (PA). However, the underlying molecular mechanism is unclear. In addition, previous studies have shown that eicosapentaenoic acid (EPA), a ω-3 polyunsaturated fatty acid (PUFA), reverses PA-induced muscle IR, but whether EPA plays the same role in hepatic IR, and through what mechanism, remains to be clarified. Here, we confirmed that EPA reversed PA-induced IR in HepG2 cells and compared the proteomic changes in HepG2 cells after treatment with different free fatty acids (FFAs). A total of 234 proteins were differentially expressed after PA+OA treatment; their functions were mainly related to responses to stress and endogenous stimuli, lipid metabolic processes, and protein binding. For PA+EPA treatment, the PA-induced expression changes of 1326 proteins could be reversed by EPA; 415 of these were mitochondrial proteins, most of which are involved in oxidative phosphorylation (OXPHOS) and the tricarboxylic acid (TCA) cycle. Mechanistic studies revealed that the protein encoded by JUN and reactive oxygen species (ROS) play a role in the OA- and EPA-mediated reversal of PA-induced IR, respectively. EPA and OA alleviated PA-induced abnormalities in adenosine triphosphate (ATP) production, ROS generation, and calcium (Ca²⁺) content. Importantly, H₂O₂-activated ROS production increased JUN protein expression, further resulting in IR in HepG2 cells. Taken together, we demonstrate that ROS/JUN is a common response pathway employed by HepG2 cells in FFA-regulated IR.

Page 754-771

Research Article

Mining Unknown Porcine Protein Isoforms by Tissue-based Map of Proteome Enhances Pig Genome Annotation

Pengju Zhao, Xianrui Zheng, Ying Yu, Zhuocheng Hou, Chenguang Diao, Haifei Wang, Huimin Kang, Chao Ning, Junhui Li, Wen Feng, Wen Wang, George E. Liu, Bugao Li, Jacqueline Smith, Yangzom Chamba, Jian-Feng Liu

The lack of a complete pig proteome has left a gap in our knowledge of the pig genome and has restricted the feasibility of using pigs as a biomedical model. In this study, we developed a tissue-based proteome map using 34 major normal pig tissues. A total of 5841 unknown protein isoforms were identified and systematically characterized, including 2225 novel protein isoforms, 669 protein isoforms from 460 genes with symbols beginning with LOC, and 2947 protein isoforms without clear NCBI annotation in the current pig reference genome. These newly identified protein isoforms were functionally annotated by profiling the pig transcriptome with high-throughput RNA sequencing of the same pig tissues, further improving the genome annotation of the corresponding protein-coding genes. Using well-annotated genes with parallel expression patterns and subcellular evidence, we predicted the tissue-related subcellular locations and potential functions of these unknown proteins. Finally, we mined 3081 orthologous genes for 52.7% of the unknown protein isoforms across multiple species, involving 68 KEGG pathways as well as 23 disease signaling pathways. These findings provide valuable insights and a rich resource for enhancing studies of pig genomics and biology, as well as the application of pigs as biomedical models for human medicine.

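The pig, one of China's most important livestock species, not only provides abundant animal protein for humans but is also an important scientific tool for medical models of major diseases and for human organ transplantation. A pig protein atlas, containing protein subcellular localization and expression data for the major tissues and organs, is both a necessary foundation for pig breeding research and an important reference for building biomedical models. In this study, we combined RNA-seq and label-free proteomics to deeply mine the transcriptome data and mass spectrometry data, respectively, of 34 normal pig tissues. The 2225 novel proteins, 2947 proteins without genome annotation, and 669 LOC-symbolized proteins we detected were integrated into the pig unknown proteome, and all were mapped to the latest pig genome. By further analyzing the gene expression atlas and subcellular features of these unknown proteins, we inferred their links to tissue-specific functions. Finally, systematic comparison revealed that 52.75% of the unknown pig proteins have homologs in the proteomes of multiple other species, and these homologous proteins are significantly enriched in 68 functional pathways, including 23 disease-related pathways. These findings will benefit research on the pig genome, further improve understanding of the genes and functional networks of interest in the pig genome, and support the use of the pig as a biomedical model in human medicine.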

Page 772-786

Research Article

Integrating Genomic and Transcriptomic Data to Reveal Genetic Mechanisms Underlying Piao Chicken Rumpless Trait

Yun-Mei Wang, Saber Khederzadeh, Shi-Rong Li, Newton Otieno Otecko, David M. Irwin, Mukesh Thakur, Xiao-Die Ren, Ming-Shan Wang, Dong-Dong Wu, Ya-Ping Zhang

Piao chicken, a rare Chinese native poultry breed, lacks primary tail structures such as the pygostyle, caudal vertebrae, uropygial gland, and tail feathers. So far, the molecular mechanisms underlying tail absence in this breed remain unclear. In this study, we combined comparative transcriptomic and genomic analyses to unravel the potential genetic underpinnings of rumplessness in Piao chicken. Our results reveal many biological factors involved in tail development and several genomic regions under strong positive selection in this breed. These regions contain candidate genes associated with rumplessness, including Irx4, Il18, Hspb2, and Cryab. Retrieval of quantitative trait loci (QTL) and gene functions implies that rumplessness might have been consciously or unconsciously selected along with high-yield traits in Piao chicken. We hypothesize that strong selection pressure on regulatory elements might lead to changes in gene activity in mesenchymal stem cells of the tail bud. This ectopic activity could eventually result in tail truncation by impeding the differentiation and proliferation of the stem cells. Our study provides fundamental insights into the early initiation and genetic basis of the rumpless phenotype in Piao chicken.

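The Piao chicken is a typical rumpless chicken breed unique to China, found mainly in Zhenyuan County, Pu'er City, Yunnan Province. It has good egg-laying and meat production performance and is a very important livestock and poultry genetic resource [1-3]. Piao chickens lack caudal vertebrae, a pygostyle, a uropygial gland, and main tail feathers; the rear of the body is smooth and rounded, resembling a gourd ladle (piao), hence the name [3]. Studies have shown that rumplessness in Piao chickens is an autosomal dominant trait that is already established during the embryonic period [1,4], but the specific molecular mechanism and developmental stage of its onset remain unclear. To explore the molecular genetic mechanisms underlying the rumpless trait, we integrated comparative genomics, transcriptomics, and microscopic observation of embryogenesis, and found many genes under strong positive selection in Piao chickens. These genes are closely associated with production traits such as body weight, fat deposition, egg fertilization, broodiness rate, and brooding duration, and among them Irx4, Il18, Hspb2, and Cryab may be associated with tail loss. Drawing on previous studies, we propose the hypothesis that tail loss exposes the hindquarters of Piao chickens, making mating easier within and even between flocks and markedly increasing egg production; together with its flavorful meat, this led humans to retain the rumpless trait, consciously or unconsciously, along with the high-yield traits. We also found that the rumpless trait is essentially established by day 4 of embryonic development, implying that the rumpless-associated genes act early in embryogenesis, disrupting mesenchymal maintenance in the tail bud and blocking the development of subsequent distal structures, ultimately causing tail loss. Our work reveals a possible genetic mechanism for rumplessness in the Piao chicken and suggests that autosomal dominant rumplessness in different chicken breeds may share a common mechanism, providing an important reference for studies of tail regression and vertebral defects in other vertebrates, such as humans.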

Page 787-799

Research Article

Elucidation of the MicroRNA Transcriptome in Western Corn Rootworm Reveals Its Dynamic and Evolutionary Complexity

Xiaozeng Yang, Elane Fishilevich, Marcelo A. German, Premchand Gandra, Robert E. McEwan, André Billion, Eileen Knorr, Andreas Vilcinskas, Kenneth E. Narva

Diabrotica virgifera virgifera (western corn rootworm, WCR) is one of the most destructive agricultural insect pests in North America. It is highly adaptive to environmental stimuli and crop protection technologies. However, little is known about the genetic basis of WCR behavior and adaptation. More specifically, the involvement of small RNAs (sRNAs), especially microRNAs (miRNAs), a class of endogenous small non-coding RNAs that regulate various biological processes, has not been examined, and no datasets of putative sRNA sequences had previously been generated for WCR. To achieve a comprehensive collection of sRNA transcriptomes in WCR, we constructed, sequenced, and analyzed sRNA libraries from different life stages of WCR and northern corn rootworm (NCR), and identified 101 conserved precursor miRNAs (pre-miRNAs) in WCR and other Arthropoda. We also identified 277 corn rootworm-specific pre-miRNAs. Systematic analyses of sRNA populations in WCR revealed that its sRNA transcriptome, which includes PIWI-interacting RNAs (piRNAs) and miRNAs, undergoes dynamic changes throughout insect development. Phylogenetic analysis of miRNA datasets from model species reveals that a large pool of species-specific miRNAs exists in corn rootworm; these are potentially evolutionarily transient. Comparisons of WCR miRNA clusters with those of other insect species highlight conserved miRNA-regulated processes that are common to insects. Parallel Analysis of RNA Ends (PARE) also uncovered potential miRNA-guided cleavage sites in WCR. Overall, this study provides a new resource for studying the sRNA transcriptome and miRNA-mediated gene regulation in WCR and other Coleopteran insects.

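Research question: The corn rootworm is one of the most destructive agricultural pests in North America and is highly adaptable to its environment. Genetic studies of the corn rootworm are scarce, and the small RNA field in particular remains unexplored. Methods: Small RNAs from different developmental stages of the western and northern corn rootworms, as well as their degradomes, were deeply sequenced. miRNAs in the two species were comprehensively identified by bioinformatics, and the evolution of miRNA clusters across species was systematically analyzed by comparative genomics. Main result 1: A total of 101 conserved miRNA genes and 277 corn rootworm-specific miRNA genes were identified in corn rootworm. Main result 2: The abundance of the overall small RNA population in corn rootworm [including PIWI-interacting RNAs (piRNAs) and miRNAs] changes dynamically during insect development. Main result 3: Comparative analysis revealed the dynamic evolution and expression distribution of miRNA clusters across arthropods, and showed that miRNA-mediated cleavage of target genes, which is uncommon in animals, is widespread in the western corn rootworm.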

Page 800-814

Method

Characterizing RNA Pseudouridylation by Convolutional Neural Networks

Xuan He, Sai Zhang, Yanqing Zhang, Zhixin Lei, Tao Jiang, Jianyang Zeng

Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) remain largely unclear, and the transcriptome-wide landscape of Ψs has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PseudoUridyLation Site Estimator (PULSE), to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, consisting of two alternately stacked convolution and pooling layers followed by a fully connected neural network, can automatically learn the hidden patterns of pseudouridylation from local sequence information. Extensive validation tests demonstrated that PULSE can outperform other state-of-the-art prediction methods and achieve high prediction accuracy, enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the prediction results derived from PULSE can provide novel insights into the functional roles of pseudouridylation, such as the regulation of RNA secondary structure, codon usage, translation, and RNA stability, and the connection to single nucleotide variants. The source code and final model for PULSE are available at https://github.com/mlcb-thu/PULSE.
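
The architecture described for PULSE (two alternately stacked convolution and pooling layers followed by a fully connected network, applied to local sequence) can be illustrated with a dependency-free forward pass. This is a sketch under stated assumptions, not the PULSE code from the repository: the 21-nt window centred on a candidate U, the filter counts, kernel sizes, hidden width, and all function names are hypothetical, and the weights are random rather than trained, so the score carries no biological meaning until the model is fitted to Ψ profiling data.

```python
import math
import random

BASES = "ACGU"


def one_hot(seq):
    """Encode an RNA (or DNA) sequence as an L x 4 one-hot matrix over A, C, G, U."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv1d_relu(x, kernels, biases):
    """Valid 1D convolution + ReLU. x is L x C_in; each kernel is K x C_in."""
    k, c_in = len(kernels[0]), len(x[0])
    out = []
    for i in range(len(x) - k + 1):
        row = []
        for w, b in zip(kernels, biases):
            s = b + sum(w[j][c] * x[i + j][c] for j in range(k) for c in range(c_in))
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool1d(x, size):
    """Non-overlapping max pooling along the sequence axis."""
    return [[max(x[i + j][c] for j in range(size)) for c in range(len(x[0]))]
            for i in range(0, len(x) - size + 1, size)]


def init_params(seq_len, rng, n1=8, k1=5, n2=8, k2=3, pool=2, hidden=16):
    """Random, untrained weights for two conv+pool blocks and a dense head."""
    def kernels(n, k, c):
        return [[[rng.gauss(0.0, 0.1) for _ in range(c)] for _ in range(k)] for _ in range(n)]
    l1 = (seq_len - k1 + 1) // pool  # length after conv1 + pool
    l2 = (l1 - k2 + 1) // pool       # length after conv2 + pool
    flat = l2 * n2
    return {
        "conv1": (kernels(n1, k1, 4), [0.0] * n1),
        "conv2": (kernels(n2, k2, n1), [0.0] * n2),
        "fc": [([rng.gauss(0.0, 0.1) for _ in range(flat)], 0.0) for _ in range(hidden)],
        "out": ([rng.gauss(0.0, 0.1) for _ in range(hidden)], 0.0),
        "pool": pool,
    }


def predict_site(seq, params):
    """Sigmoid score for the central U of seq (meaningless until trained)."""
    h = one_hot(seq)
    h = max_pool1d(conv1d_relu(h, *params["conv1"]), params["pool"])
    h = max_pool1d(conv1d_relu(h, *params["conv2"]), params["pool"])
    flat = [v for row in h for v in row]
    hidden = [max(0.0, b + sum(wi * v for wi, v in zip(w, flat))) for w, b in params["fc"]]
    w_out, b_out = params["out"]
    logit = b_out + sum(wi * v for wi, v in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


params = init_params(21, random.Random(0))
score = predict_site("ACGUACGUACUGACGUACGUA", params)  # central base (index 10) is U
```

Pooling after each convolution shrinks the sequence axis, so the dense layer sees a compact summary of motifs found anywhere in the window rather than exact positions.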

Page 815-833

Method

kLDM: Inferring Multiple Metagenomic Association Networks Based on the Variation of Environmental Factors

Yuqing Yang, Xin Wang, Kaikun Xie, Congmin Zhu, Ning Chen, Ting Chen

Identification of significant biological relationships or patterns is central to many metagenomic studies. Methods that estimate association networks have been proposed for this purpose; however, they assume that associations are static, neglecting the fact that relationships in a microbial ecosystem may vary with changes in environmental factors (EFs), which can result in inaccurate estimations. Therefore, in this study, we propose a computational model, called the k-Lognormal-Dirichlet-Multinomial (kLDM) model, which estimates multiple association networks that correspond to specific environmental conditions, and simultaneously infers microbe–microbe and EF–microbe associations for each network. The effectiveness of the kLDM model was demonstrated on synthetic data, a colorectal cancer (CRC) dataset, the Tara Oceans dataset, and the American Gut Project dataset. The results revealed that the widely used Spearman's rank correlation coefficient method performed much worse than the other methods, indicating the importance of separating samples by environmental conditions. Cancer fecal samples were then compared with cancer-free samples, and the estimation achieved by kLDM exhibited fewer associations among microbes but stronger associations between specific bacteria, especially five CRC-associated operational taxonomic units, indicating gut microbe translocation in cancer patients. Some EF-dependent associations were then found within a marine eukaryotic community. Finally, the gut microbial heterogeneity of inflammatory bowel disease patients was detected. These results demonstrate that kLDM can elucidate the complex associations within microbial ecosystems. The kLDM program and the R and Python scripts, together with all experimental datasets, are accessible at https://github.com/tinglab/kLDM.git.
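
kLDM itself is a Lognormal-Dirichlet-Multinomial model, but the motivation for separating samples by environmental condition can be shown with plain Spearman correlation. In the hypothetical example below, two taxa are perfectly positively correlated within each of two conditions, yet pooling all samples yields a negative correlation. The functions and toy abundances are illustrative only and do not implement kLDM.

```python
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of ranks."""
    return pearson(ranks(a), ranks(b))


def stratified_spearman(x, y, condition):
    """Spearman's rho computed separately within each environmental condition."""
    groups = {}
    for xi, yi, c in zip(x, y, condition):
        gx, gy = groups.setdefault(c, ([], []))
        gx.append(xi)
        gy.append(yi)
    return {c: spearman(gx, gy) for c, (gx, gy) in groups.items()}


# Hypothetical abundances of two taxa under a "low" and a "high" EF condition.
taxon_x = [1, 2, 3, 4, 11, 12, 13, 14]
taxon_y = [11, 12, 13, 14, 1, 2, 3, 4]
condition = ["low"] * 4 + ["high"] * 4
pooled = spearman(taxon_x, taxon_y)  # -11/21, about -0.52
per_condition = stratified_spearman(taxon_x, taxon_y, condition)  # 1.0 in each
```

A model that ignores the condition would report a moderate negative association, the opposite sign of what holds inside each environment.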

Page 834-847

Method

FunHoP: Enhanced Visualization and Analysis of Functionally Homologous Proteins in Complex Metabolic Networks

Kjersti Rise, May-Britt Tessem, Finn Drabløs, Morten B. Rye

Cytoscape is often used for visualization and analysis of metabolic pathways. For example, for KEGG data, a reader for KEGG Markup Language (KGML) is used to load files into Cytoscape. However, although multiple genes can be responsible for the same reaction, the KGML reader KEGGScape presents only the first listed gene in a network node for a given reaction. This can lead to incorrect interpretations of the pathways. Our new method, FunHoP, shows all possible genes in each node, making the pathways more complete. FunHoP collapses all genes in a node into one measurement using read counts from RNA-seq. Assuming that the activity of an enzymatic reaction mainly depends on the gene with the highest number of reads, and weighting the reads by gene length and ratio, a new expression value is calculated for the node as a whole. Differential expression at the node level is then applied to the networks. Using prostate cancer as a model, we integrate RNA-seq data from two patient cohorts with metabolism data from the literature. We show that FunHoP gives more consistent pathways that are easier to interpret biologically. Code and documentation for running FunHoP can be found at https://github.com/kjerstirise/FunHoP.
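
The node-collapsing step can be sketched as follows. The abstract does not give the exact formula, so the weighting here is a hypothetical reading of "weighting the reads by gene length and ratio": reads per kilobase for each gene, multiplied by that gene's read count relative to the node's most highly expressed gene, so the top gene dominates as the stated assumption requires. `node_expression`, `node_log2_fold_change`, and the toy counts are illustrative, not FunHoP's implementation.

```python
import math


def node_expression(genes):
    """Collapse the genes in one pathway node into a single expression value.

    genes maps gene ID -> (read_count, gene_length_bp). Hypothetical weighting:
    reads per kb for each gene, scaled by its read ratio to the node's top gene.
    """
    max_reads = max(reads for reads, _ in genes.values())
    if max_reads == 0:
        return 0.0
    total = 0.0
    for reads, length_bp in genes.values():
        reads_per_kb = reads / (length_bp / 1000.0)
        total += reads_per_kb * (reads / max_reads)
    return total


def node_log2_fold_change(genes_case, genes_control, pseudocount=1.0):
    """Node-level log2 fold change between two conditions (illustrative)."""
    case = node_expression(genes_case) + pseudocount
    control = node_expression(genes_control) + pseudocount
    return math.log2(case / control)


# Toy node with two functionally homologous genes.
node = {"A": (1000, 2000), "B": (100, 1000)}
value = node_expression(node)  # 500 (gene A) + 100 * 0.1 (gene B) = 510
```

With these toy counts, if `B` happened to be the first-listed gene, a first-gene-only view would show about 100 reads per kb for this node and hide the more highly expressed `A`.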

Page 848-859

Application Note

CharPlant: A De Novo Open Chromatin Region Prediction Tool for Plant Genomes

Yin Shen, Ling-Ling Chen, Junxiang Gao

Chromatin accessibility is a highly informative structural feature for understanding gene transcription regulation, because it indicates the degree to which nuclear macromolecules such as proteins and RNAs can access chromosomal DNA. Studies have shown that chromatin accessibility is highly dynamic during stress response, stimulus response, and developmental transition. Moreover, physical access to chromosomal DNA in eukaryotes is highly cell-specific. Therefore, current technologies such as DNase-seq, ATAC-seq, and FAIRE-seq reveal only a portion of the open chromatin regions (OCRs) present in a given species, and the genome-wide distribution of OCRs remains unknown. In this study, we developed a bioinformatics tool called CharPlant for the de novo prediction of OCRs in plant genomes. To develop this tool, we constructed a three-layer convolutional neural network (CNN) and trained it using DNase-seq and ATAC-seq datasets from four plant species. The model simultaneously learns the sequence motifs and regulatory logic, which are jointly used to determine DNA accessibility. All of these steps are integrated into CharPlant, which can be run from a simple command line. The results of data analysis using CharPlant in this study demonstrate its predictive power and computational efficiency. To our knowledge, CharPlant is the first de novo prediction tool that can identify potential OCRs across the whole genome. The source code of CharPlant and supporting files are freely available from https://github.com/Yin-Shen/CharPlant.

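Chromatin accessibility characterizes the degree to which nuclear macromolecules such as proteins and RNA can access chromosomal DNA, and it is an information-rich structural feature for understanding gene transcription regulation. Studies have shown that chromatin accessibility is highly dynamic during stress responses, stimulus responses, and development, and that chromatin accessibility in eukaryotes is also highly cell-specific. Current assays such as DNase-seq, ATAC-seq, and FAIRE-seq reveal only a portion of the open chromatin regions (OCRs) present in a given species, so the genome-wide distribution of OCRs remains unknown. In this study, we developed CharPlant, a bioinformatics tool for the de novo prediction of open chromatin regions in plant genomes. In CharPlant, we built a three-layer convolutional neural network and trained the model using ATAC-seq and DNase-seq datasets from four plant species (rice, Arabidopsis, Medicago, and tomato). The model simultaneously learns sequence motifs and regulatory logic to determine DNA accessibility, and all steps are integrated into CharPlant, which can be run with a simple command line. Data analysis with CharPlant demonstrated the method's predictive power and computational efficiency. CharPlant is the first de novo prediction tool that can identify potential OCRs across the whole genome; its source code and supporting files are freely available at https://github.com/Yin-Shen/CharPlant.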

Page 860-871
