Articles Online (Volume 18, Issue 6)


From Mutation Signature to Molecular Mechanism in the RNA World: A Case of SARS-CoV-2

Jun Yu

Page 627-639

Original Research

Population Genetics of SARS-CoV-2: Disentangling Effects of Sampling Bias and Infection Clusters

Qi Liu, Shilei Zhao, Cheng-Min Shi, Shuhui Song, Sihui Zhu, Yankai Su, Wenming Zhao, Mingkun Li, Yiming Bao, Yongbiao Xue, Hua Chen

A novel RNA virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is responsible for the ongoing outbreak of coronavirus disease 2019 (COVID-19). Population genetic analysis could be useful for investigating the origin and evolutionary dynamics of COVID-19. However, owing to extensive sampling bias and the existence of infection clusters during the epidemic spread, direct application of existing approaches can lead to biased parameter estimates and misinterpretation of the data. In this study, we first present robust estimators for the time to the most recent common ancestor (TMRCA) and the mutation rate, and then apply the approach to analyze 12,909 genomic sequences of SARS-CoV-2. The mutation rate is inferred to be 8.69 × 10⁻⁴ per site per year, with a 95% confidence interval (CI) of [8.61 × 10⁻⁴, 8.77 × 10⁻⁴], and the TMRCA of the samples is inferred to be Nov 28, 2019, with a 95% CI of [Oct 20, 2019, Dec 9, 2019]. The results indicate that COVID-19 might have originated earlier than, and outside of, the Wuhan Seafood Market. We further demonstrate that genetic polymorphism patterns generated by infection clusters, including the enrichment of specific haplotypes and temporal allele frequency trajectories, are similar to those caused by evolutionary forces such as natural selection. Our results show that population genetic methods need to be developed to efficiently disentangle the effects of sampling bias and infection clusters in order to gain insights into the evolutionary mechanism of SARS-CoV-2. Software for implementing VirusMuT can be downloaded at

Page 640-647
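The abstract does not spell out the estimator itself. As a point of reference only, the simplest way to turn dated genomes into a mutation rate and a TMRCA is an ordinary least-squares regression of substitution counts against sampling dates: the slope gives the rate and the x-intercept gives the TMRCA. The sketch below assumes each genome has already been compared with an ancestral sequence; it is not the VirusMuT estimator and deliberately makes none of the corrections for sampling bias or infection clusters that the study argues are needed. The function name and the input data are hypothetical and synthetic.

```python
from datetime import date, timedelta

# Length of the SARS-CoV-2 reference genome (Wuhan-Hu-1, NC_045512.2), in nucleotides.
GENOME_LENGTH = 29903


def clock_regression(samples, genome_length=GENOME_LENGTH):
    """Ordinary least-squares regression of substitution counts on sampling dates.

    samples: list of (sampling_date, n_differences) pairs, where n_differences is the
    number of substitutions separating a genome from an assumed ancestral sequence.
    Returns (substitutions per site per year, estimated TMRCA date).
    No correction for sampling bias or infection clusters is attempted here.
    """
    origin = min(d for d, _ in samples)
    xs = [(d - origin).days for d, _ in samples]
    ys = [float(m) for _, m in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # substitutions per genome per day
    intercept = mean_y - slope * mean_x  # fitted substitutions on the origin date
    rate = slope * 365.25 / genome_length
    # The TMRCA is where the fitted line reaches zero substitutions, rounded to whole days.
    tmrca = origin - timedelta(days=round(intercept / slope))
    return rate, tmrca


# Synthetic, noise-free data: 0.1 substitutions per genome per day since 2019-11-28.
samples = [(date(2019, 12, 28), 3), (date(2020, 1, 27), 6), (date(2020, 2, 26), 9)]
rate, tmrca = clock_regression(samples)
print(f"{rate:.3e} substitutions/site/year, TMRCA {tmrca}")
```

With real data the residuals are large and strongly correlated among genomes from the same cluster, which is exactly why a naive fit like this can be biased.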

Original Research

Compositional Variability and Mutation Spectra of Monophyletic SARS-CoV-2 Clades

Xufei Teng, Qianpeng Li, Zhao Li, Yuansheng Zhang, Guangyi Niu, Jingfa Xiao, Jun Yu, Zhang Zhang, Shuhui Song

COVID-19 and its causative pathogen SARS-CoV-2 have plunged the world into a staggering pandemic within a few months, and the global fight against both has been intensifying. Here, we describe an analysis procedure that relates genome composition and its variables, through the genetic code, to molecular mechanisms, based on an understanding of RNA replication and its feedback loop from mutation to viral proteome sequence fraternity, including effective sites on the replicase-transcriptase complex. Our analysis starts with primary sequence information, an identity-based phylogeny of 22,051 SARS-CoV-2 sequences, and an evaluation of sequence variation patterns as mutation spectra and their 12 permutations among organized clades. All are tailored to two key mechanisms: strand-biased and function-associated mutations. Our findings are as follows: 1) The most dominant mutation is the C-to-U permutation, whose abundant counts at the second codon position shift amino acid composition toward higher molecular weight and lower hydrophobicity, although these changes are assumed to be mostly slightly deleterious. 2) The second abundance group includes three negative-strand mutations (U-to-C, A-to-G, and G-to-A) and one positive-strand mutation (G-to-U) attributed to DNA repair mechanisms after cellular abasic events. 3) A clade-associated biased mutation trend is attributable to an elevated level of negative-sense strand synthesis. 4) Within-clade permutation variation is highly informative for associating non-synonymous mutations with viral proteome changes. These findings call for a platform where emerging mutations are mapped onto mostly subtle but fast-adjusting viral proteomes and transcriptomes, to provide biological and clinical information after logical convergence for effective pharmaceutical and diagnostic applications. Such actions are urgently needed, especially in the midst of the war against COVID-19.

Page 648-663
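The 12 permutations mentioned above are the ordered pairs of distinct nucleotides (4 × 3 = 12, so C-to-U and U-to-C are counted separately). A minimal Python sketch of tallying such a spectrum from a pairwise alignment follows. It assumes the query is already aligned to the reference with equal lengths, reads DNA-style `T` as `U`, and skips gaps and ambiguous bases; the function name is hypothetical, and it does not attempt the positive/negative-strand assignment the authors perform.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"


def mutation_spectrum(reference: str, query: str) -> Counter:
    """Count the 12 directional substitution types (e.g. 'C>U') between two aligned sequences."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    # Start every one of the 12 ordered base pairs at zero so absent types are still reported.
    spectrum = Counter({f"{a}>{b}": 0 for a, b in permutations(BASES, 2)})
    ref = reference.upper().replace("T", "U")
    qry = query.upper().replace("T", "U")
    for ref_base, alt_base in zip(ref, qry):
        # Skip gaps ('-'), ambiguous calls ('N', other IUPAC codes) and identical positions.
        if ref_base in BASES and alt_base in BASES and ref_base != alt_base:
            spectrum[f"{ref_base}>{alt_base}"] += 1
    return spectrum


spectrum = mutation_spectrum("ACGTACGTAC", "ATGTGCGANC")
print({k: v for k, v in spectrum.items() if v})
```

Summing such spectra over all genomes in a clade, and comparing clades, gives the kind of clade-level permutation counts the abstract refers to.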

Original Research

Long Non-coding RNA Derived from lncRNA–mRNA Co-expression Networks Modulates the Locust Phase Change

Ting Li, Bing Chen, Pengcheng Yang, Depin Wang, Baozhen Du, Le Kang

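Long non-coding RNAs (lncRNAs) regulate various biological processes ranging from gene expression to animal behavior. Although protein-coding genes, microRNAs, and neuropeptides play important roles in the regulation of phenotypic plasticity in the migratory locust, empirical studies on the function of lncRNAs in this process remain limited. Here, we applied high-throughput RNA-seq to compare the expression patterns of lncRNAs and mRNAs over the time course of locust phase change. We found that lncRNAs responded more rapidly at the early stages of phase transition. Functional annotations demonstrated that early-changed lncRNAs employed different pathways in the isolation and crowding phases to cope with changes in population density. Two overlapping hub lncRNA loci in the crowding and isolation networks were selected for functional verification. One of them, LNC1010057, was validated as a potential regulator of locust phase change. This work offers insights into the molecular mechanism underlying locust phase change and expands the scope of lncRNA functions in animal behavior.
Long non-coding RNAs (lncRNAs) regulate many biological processes, from gene expression to animal behavior, but little is known about the expression and function of lncRNAs in insects. Locusts are agricultural pests worldwide. Although many studies have shown that protein-coding genes, small RNAs, and neuropeptides play important roles in the transition between the gregarious and solitarious phases of the migratory locust, experimental studies of lncRNA function during phase change remain scarce. In this study, we used high-throughput RNA sequencing to compare the expression patterns of lncRNAs and mRNAs during phase change in the migratory locust. We found that lncRNAs responded more rapidly at the early stages of phase change and were involved in regulating different pathways during gregarization and solitarization. Using lncRNA–mRNA co-expression networks, we identified multiple hub lncRNAs and selected two of them for functional validation. In vivo interference and behavioral experiments showed that one of them, LNC1010057, may regulate the phase transition of the migratory locust, suggesting that lncRNAs may play a very important role in locust phase change. This work provides a new perspective for exploring the molecular mechanisms of locust phase change and broadens our understanding of the roles of lncRNAs in regulating animal behavior.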

Page 664-678
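The hub lncRNAs above come from lncRNA–mRNA co-expression networks. The abstract does not give the network construction details, so the sketch below assumes the simplest version: an edge links an lncRNA and an mRNA whose expression profiles across the phase-change time course have an absolute Pearson correlation at or above a cutoff, and lncRNAs are ranked by how many mRNAs they connect to. The 0.9 cutoff, the function names, the gene IDs, and the profiles are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_hub_lncrnas(lncrnas, mrnas, threshold=0.9):
    """Build a bipartite lncRNA-mRNA co-expression network and rank lncRNAs by degree.

    lncrnas, mrnas: dicts mapping gene ID -> expression values over the same time points.
    An edge is drawn when |r| >= threshold (a hypothetical cutoff)."""
    edges = {
        lnc: [m for m, m_expr in mrnas.items() if abs(pearson(l_expr, m_expr)) >= threshold]
        for lnc, l_expr in lncrnas.items()
    }
    return sorted(((lnc, len(partners)) for lnc, partners in edges.items()),
                  key=lambda item: item[1], reverse=True)


# Toy profiles over five hypothetical time points of phase change.
lncrnas = {"lnc_A": [1, 2, 3, 4, 5], "lnc_B": [5, 1, 4, 2, 3]}
mrnas = {"m1": [2, 4, 6, 8, 10], "m2": [10, 8, 6, 4, 2], "m3": [1, 3, 2, 5, 4]}
hubs = rank_hub_lncrnas(lncrnas, mrnas)
print(hubs)
```

Real networks are usually built with a significance-based or soft threshold rather than a fixed r cutoff; the degree ranking step is the same idea either way.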


MicroPhenoDB Associates Metagenomic Data with Pathogenic Microbes, Microbial Core Genes, and Human Disease Phenotypes

Guocai Yao, Wenliang Zhang, Minglei Yang, Huan Yang, Jianbo Wang, Haiyue Zhang, Lai Wei, Zhi Xie, Weizhong Li

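Microbes play important roles in human health and disease. The interaction between microbes and hosts is a reciprocal relationship that remains largely under-explored. Current computational resources lack manually and consistently curated data to connect metagenomic data to pathogenic microbes, microbial core genes, and disease phenotypes. We developed the MicroPhenoDB database by manually curating and consistently integrating microbe–disease association data. MicroPhenoDB provides 5677 non-redundant associations between 1781 microbes and 542 human disease phenotypes across more than 22 human body sites. MicroPhenoDB also provides 696,934 relationships between 27,277 unique clade-specific core genes and 685 microbes. Disease phenotypes are classified and described using the Experimental Factor Ontology (EFO). A refined score model was developed to prioritize the associations based on evidential metrics. The sequence search option in MicroPhenoDB enables rapid identification of pathogenic microbes present in samples without running the usual metagenomic data processing and assembly. MicroPhenoDB offers data browsing, searching, and visualization through user-friendly web interfaces and web service application programming interfaces. MicroPhenoDB is the first database platform to detail the relationships between pathogenic microbes, core genes, and disease phenotypes. It will accelerate metagenomic data analysis and assist studies in decoding microbes related to human diseases. MicroPhenoDB is available through and
•Research question: Associating and quantifying the relationships of metagenomic data with pathogenic microbes, microbial core genes, and disease phenotypes, and building a database platform for these associations.
•Methods: Through manual curation and computational methods, the researchers systematically collected and integrated association data linking pathogenic microbes, microbial core genes, and human disease phenotypes, together with information on virulence factor genes and antibiotic resistance genes. A scoring model was refined by assigning different weights to different types of research evidence, in order to quantify the association between microbes and human diseases.
•Main result 1: Microbe–disease phenotype association datasets were integrated and given standardized annotations, yielding a high-quality dataset of associations between human disease phenotypes and microbes.
•Main result 2: The refined scoring model describes microbe–disease associations quantitatively, using a numerical score to assess the strength of association between a microbe and a disease phenotype.
•Main result 3: The database platform provides a user-friendly interface and web API for online data browsing, searching, and visual analysis; its sequence search rapidly identifies pathogenic microbes present in metagenomic samples, avoiding the cumbersome steps of conventional metagenomic data processing.
•Main result 4: Using the data and analysis tools provided by MicroPhenoDB, the authors completed several case studies analyzing microbe–disease phenotype relationships.
•Database links: and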

Page 760-772
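The abstract states that the sequence search identifies pathogenic microbes without assembly, and MicroPhenoDB links microbes to clade-specific core genes. How the search is implemented is not described here; one alignment-free way to get a similar effect is to index the core genes by k-mers and assign each read to the microbe whose core genes share the most k-mers with it. The sketch below assumes that approach. `build_index`, `assign_reads`, and the toy sequences are hypothetical and are not MicroPhenoDB's interface or API.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(core_genes, k):
    """Map each k-mer to the set of microbes whose core genes contain it.

    core_genes: dict microbe name -> list of core-gene nucleotide sequences."""
    index = {}
    for microbe, genes in core_genes.items():
        for gene in genes:
            for km in kmers(gene, k):
                index.setdefault(km, set()).add(microbe)
    return index


def assign_reads(reads, index, k, min_hits=1):
    """Assign each read to the microbe(s) sharing the most k-mers with it.

    Reads with fewer than min_hits shared k-mers stay unassigned; ties count for every tied microbe."""
    counts = Counter()
    for read in reads:
        hits = Counter()
        for km in kmers(read, k):
            for microbe in index.get(km, ()):
                hits[microbe] += 1
        if not hits or max(hits.values()) < min_hits:
            continue
        best = max(hits.values())
        for microbe, n in hits.items():
            if n == best:
                counts[microbe] += 1
    return counts


# Toy sequences; real core genes and reads are far longer and would use a larger k.
core_genes = {"microbe_A": ["ACGTACGTTGCA"], "microbe_B": ["TTTTGGGGCCCC"]}
index = build_index(core_genes, k=5)
counts = assign_reads(["ACGTACGT", "TTGGGGCC", "AAAAAAAA"], index, k=5)
print(counts)
```

Because clade-specific core genes are chosen to be unique to a clade, a shared k-mer is strong evidence for that clade, which is what makes skipping assembly reasonable.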

Original Research

Screening of Potential Biomarkers for Gastric Cancer with Diagnostic Value Using Label-free Global Proteome Analysis

Yongxi Song, Jun Wang, Jingxu Sun, Xiaowan Chen, Jinxin Shi, Zhonghua Wu, Dehao Yu, Fei Zhang, Zhenning Wang

Gastric cancer (GC) is one of the leading malignant tumors worldwide. Despite the recent decrease in mortality rates, its prognosis remains poor. Therefore, it is necessary to find novel biomarkers with early diagnostic value for GC. In this study, we present a large-scale proteomic analysis of 30 GC tissues and 30 matched healthy tissues using label-free global proteome profiling. We identified 537 differentially expressed proteins, including 280 upregulated and 257 downregulated proteins. Ingenuity pathway analysis (IPA) indicated that the sirtuin signaling pathway was the most activated pathway in GC tissues, whereas oxidative phosphorylation was the most inhibited. Moreover, the most activated molecular function was cellular movement, including tissue invasion by tumor cell lines. Based on the IPA results, 15 hub proteins were screened. In receiver operating characteristic (ROC) curve analysis, most of the hub proteins showed high diagnostic power in distinguishing tumors from healthy controls. A four-protein (ATP5B-ATP5O-NDUFB4-NDUFB8) diagnostic signature was built using a random forest model. The area under the curve (AUC) values of this model were 0.996 and 0.886 for the training and testing sets, respectively, suggesting that the four-protein signature has high diagnostic power. The signature was further tested in independent datasets: plasma enzyme-linked immunosorbent assays (ELISAs) yielded an AUC of 0.778 for distinguishing GC from healthy controls, and immunohistochemical tissue microarray analysis yielded an AUC of 0.805. In conclusion, this study identifies potential biomarkers, improves our understanding of GC pathogenesis, and provides novel therapeutic targets for GC.

Page 679-695
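The AUC values quoted above summarize how well a score separates tumors from controls. The AUC equals the probability that a randomly chosen tumor sample scores higher than a randomly chosen control, so it can be computed directly from pairwise comparisons (the Mann-Whitney formulation). The sketch below assumes the four-protein signature has already been reduced to one score per sample, for example a random forest class probability; the scores are invented, and the O(n·m) loop is chosen for clarity rather than speed.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen case scores higher than a randomly
    chosen control; tied pairs contribute one half."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical signature scores (e.g. random forest probabilities) for tumors and controls.
tumor = [0.9, 0.8, 0.4]
healthy = [0.3, 0.5, 0.1]
print(roc_auc(tumor, healthy))
```

Here 8 of the 9 tumor/control pairs are ordered correctly, so the AUC is 8/9. The gap between the training (0.996) and testing (0.886) AUCs in the abstract is the usual reason to report the held-out value.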

Original Research

Classification of the Gut Microbiota of Patients in Intensive Care Units During Development of Sepsis and Septic Shock

Wanglin Liu, Mingyue Cheng, Jinman Li, Peng Zhang, Hang Fan, Qinghe Hu, Maozhen Han, Longxiang Su, Huaiwu He, Yigang Tong, Kang Ning, Yun Long

The gut microbiota of intensive care unit (ICU) patients displays extreme dysbiosis associated with increased susceptibility to organ failure, sepsis, and septic shock. However, such dysbiosis is difficult to characterize owing to the high-dimensional complexity of the gut microbiota. We tested whether the concept of enterotype can be applied to the gut microbiota of ICU patients to describe this dysbiosis. We collected 131 fecal samples from 64 ICU patients diagnosed with sepsis or septic shock and performed 16S rRNA gene sequencing to dissect their gut microbiota compositions. Throughout the development of sepsis or septic shock and under various medical treatments, the ICU patients consistently exhibited two dysbiotic microbiota patterns, or ICU-enterotypes, which could not be explained by host properties such as age, sex, and body mass index, or by external stressors such as infection site and antibiotic use. ICU-enterotype I (ICU E1) comprised predominantly Bacteroides and an unclassified genus of Enterobacteriaceae, whereas ICU-enterotype II (ICU E2) comprised predominantly Enterococcus. Among more critically ill patients with Acute Physiology and Chronic Health Evaluation II (APACHE II) scores > 18, septic shock was more likely to occur with ICU E1 (P = 0.041). Additionally, ICU E1 was correlated with high serum lactate levels (P = 0.007). Therefore, different patterns of dysbiosis were correlated with different clinical outcomes, suggesting that ICU-enterotypes could serve as independent clinical indices. The microbial-based human index (MHI) classifier we propose is precise and effective for timely monitoring of the ICU-enterotypes of individual patients. This work is a first step toward precision medicine for septic patients based on their gut microbiota profiles.
Gut microbiota dysbiosis in intensive care unit (ICU) patients increases their risk of organ failure, sepsis, and septic shock. However, because the gut microbiota is high-dimensional and complex, this dysbiosis is difficult to define simply. To address this problem, the authors applied the concept of enterotypes to classify ICU microbiota dysbiosis, defining ICU-enterotypes. The authors collected 131 fecal samples from 64 ICU patients diagnosed with sepsis or septic shock and performed 16S rRNA gene sequencing to analyze their gut microbiota composition. During the development of sepsis or septic shock, the gut microbiota of ICU patients always clustered into two ICU-enterotypes, and the existence of these ICU-enterotypes could not be explained by host properties such as age, sex, and body mass index, or by external stressors such as infection site and antibiotic use. ICU-enterotype I (ICU E1) was driven mainly by Bacteroides and an unclassified genus of Enterobacteriaceae, whereas ICU-enterotype II (ICU E2) was driven mainly by Enterococcus. Among critically ill patients with APACHE II scores above 18, patients with ICU E1 were more likely to develop septic shock (P = 0.041). In addition, patients with ICU E1 had higher serum lactate levels (P = 0.007). The authors found that patients with different ICU-enterotypes corresponded to different pathological states, so ICU-enterotypes can be used as independent clinical indices. The MHI classifier proposed on this basis is precise and effective for timely monitoring of the ICU-enterotype of individual patients. This study is a first step toward precision medicine interventions for septic patients based on their gut microbiota.

Page 696-707
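Enterotypes are conventionally assigned by clustering genus-level relative abundances around medoids using the Jensen-Shannon distance. The abstract does not say whether the ICU-enterotypes were derived exactly this way, so the following is only a sketch of that conventional approach, with k fixed at 2, a simple alternating k-medoids update in place of full partitioning around medoids (PAM), and made-up abundances for three taxa (Bacteroides, an unclassified Enterobacteriaceae genus, Enterococcus). It is not the authors' MHI classifier, and the function names are hypothetical.

```python
import math


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2) between two abundance vectors."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 because b is the mixture.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def k_medoids(samples, k=2, n_iter=50):
    """Cluster samples into k groups; initial medoids are the first k samples (assumed distinct)."""
    medoids = list(range(k))
    labels = []
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest medoid.
        labels = [min(range(k), key=lambda c: js_distance(s, samples[medoids[c]]))
                  for s in samples]
        # Update step: the new medoid minimizes total distance to its cluster members.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c] or [medoids[c]]
            new_medoids.append(min(members, key=lambda i: sum(
                js_distance(samples[i], samples[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical genus abundances: [Bacteroides, unclassified Enterobacteriaceae, Enterococcus].
samples = [
    [60, 30, 10],
    [5, 5, 90],
    [55, 35, 10],
    [10, 5, 85],
    [50, 40, 10],
]
labels, medoids = k_medoids(samples, k=2)
print(labels)
```

In practice the number of clusters is chosen with an index such as Calinski-Harabasz rather than fixed, and medoid initialization is randomized or exhaustive.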

Original Research

Increased Expression of Colonic Mucosal Melatonin in Patients with Irritable Bowel Syndrome Correlated with Gut Dysbiosis

Ben Wang, Shiwei Zhu, Zuojing Liu, Hui Wei, Lu Zhang, Meibo He, Fei Pei, Jindong Zhang, Qinghua Sun, Liping Duan

Dysregulation of the gut microbiota/gut hormone axis contributes to the pathogenesis of irritable bowel syndrome (IBS). Melatonin plays a beneficial role in gut motility and immunity. However, altered expression of local mucosal melatonin in IBS and its relationship with the gut microbiota remain unclear. We therefore measured colonic melatonin levels and microbiota profiles in patients with diarrhea-predominant IBS (IBS-D) and explored their relationship in germ-free (GF) rats and BON-1 cells. Thirty-two IBS-D patients and twenty-eight healthy controls (HCs) were recruited. Fecal specimens from IBS-D patients and HCs were separately transplanted into GF rats by gavage. Colonic mucosal melatonin levels were assessed by immunohistochemistry, and fecal microbiota communities were analyzed using 16S rDNA sequencing. The effect of butyrate on melatonin synthesis in BON-1 cells was evaluated by ELISA. Melatonin levels were significantly increased and negatively correlated with visceral hypersensitivity in IBS-D patients. GF rats inoculated with fecal microbiota from IBS-D patients had high colonic melatonin levels. Butyrate-producing Clostridium cluster XIVa species, such as Roseburia and Lachnospira species, were positively correlated with colonic mucosal melatonin expression. Butyrate significantly increased melatonin secretion in BON-1 cells. Increased melatonin expression may be an adaptive protective mechanism in the development of IBS-D. Moreover, some Clostridium cluster XIVa species could increase melatonin expression via butyrate production. Modulation of the gut hormone/gut microbiota axis offers a promising future target for IBS.
•Research question: The relationship between intestinal mucosal melatonin levels and the gut microbiota in IBS patients remains unclear.
•Methods: Thirty-two IBS-D patients meeting the Rome III diagnostic criteria and 28 healthy volunteers were enrolled, and germ-free male Sprague-Dawley (SD) rats received fecal microbiota transplants from IBS-D patients or from controls, respectively. Colonic mucosal melatonin expression was measured by immunohistochemistry, and the gut microbiota was analyzed by 16S rDNA sequencing. In addition, BON-1 cells were treated with sodium butyrate, and melatonin in the cell supernatant was measured by ELISA.
•Main result 1: Colonic mucosal melatonin levels were significantly higher in IBS-D patients than in healthy controls and were significantly negatively correlated with visceral sensitivity.
•Main result 2: Rats transplanted with microbiota from IBS-D patients had significantly higher colonic mucosal melatonin levels than rats transplanted with microbiota from healthy volunteers.
•Main result 3: IBS-D patients showed significant gut dysbiosis.
•Main result 4: The butyrate-producing Clostridium cluster XIVa taxa Roseburia and Lachnospira were significantly positively correlated with colonic mucosal melatonin levels.
•Main result 5: Butyrate significantly promoted melatonin release from BON-1 cells.
•Data link: (GSA: CRA001604)

Page 708-720
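Associations such as "Roseburia abundance is positively related to mucosal melatonin" are typically quantified with a rank correlation, because relative abundances are skewed and rarely normally distributed. The abstract does not name the statistic used, so the following assumes Spearman's rho, computed as the Pearson correlation of average ranks (tied values share the mean of their positions). The function names and values are invented for illustration, and no P value is computed.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-patient values: relative abundance of a butyrate producer and
# a mucosal melatonin staining score.
abundance = [0.010, 0.042, 0.025, 0.080, 0.031]
melatonin = [1.2, 3.1, 2.0, 4.5, 2.0]
print(round(spearman_rho(abundance, melatonin), 3))
```

With many taxa tested against melatonin, the resulting P values would also need multiple-testing correction.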

Original Research

Antidiabetic Effects of Gegen Qinlian Decoction via the Gut Microbiota Are Attributable to Its Key Ingredient Berberine

Xizhan Xu, Zezheng Gao, Fuquan Yang, Yingying Yang, Liang Chen, Lin Han, Na Zhao, Jiayue Xu, Xinmiao Wang, Yue Ma, Lian Shu, Xiaoxi Hu, Na Lyu, Yuanlong Pan, Baoli Zhu, Linhua Zhao, Xiaolin Tong, Jun Wang

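Gegen Qinlian Decoction (GQD), a traditional Chinese medicine (TCM) formula, has long been used to treat common metabolic diseases, including type 2 diabetes mellitus. However, the complexity of its ingredients is the main obstacle to its wider application. Thus, it is critically important to identify the major active ingredients of GQD and to elucidate the mechanisms underlying its action. Here, we compared the effects of GQD and berberine, a hypothesized key active pharmaceutical ingredient of GQD, in a diabetic rat model through comprehensive analyses of gut microbiota, short-chain fatty acids, proinflammatory cytokines, and ileal transcriptomics. Our results show that berberine and GQD had similar effects in lowering blood glucose levels, modulating gut microbiota, inducing ileal gene expression, and relieving systemic and local inflammation. As expected, both berberine and GQD treatment significantly altered the overall gut microbiota structure and enriched many butyrate-producing bacteria, including Faecalibacterium and Roseburia, thereby attenuating intestinal inflammation and lowering glucose. Levels of short-chain fatty acids in rat feces were also significantly elevated after treatment with berberine or GQD. Moreover, the concentrations of serum proinflammatory cytokines and the expression of immune-related genes, including Nfkb1, Stat1, and Ifngr1, in pancreatic islets were significantly reduced after treatment. Our study demonstrates that the main effects of GQD can be attributed to berberine via modulation of the gut microbiota. This strategy could facilitate further standardization and widespread application of TCM in many diseases.
•Research question: To identify the key active ingredient through which the TCM formula Gegen Qinlian Decoction treats type 2 diabetes by modulating the gut microbiota, and to elucidate its possible mechanism of action.
•Methods: Using Goto-Kakizaki (GK) rats, a model of spontaneous type 2 diabetes, we set up five groups: normal Wistar rats (N group), diabetic model (D group), metformin treatment (M group), GQD treatment (GQD group), and berberine treatment (BBR group). The intervention lasted 12 weeks, during which body weight and random blood glucose were monitored continuously. After the intervention, improvement in glucose tolerance was assessed in each group by an oral glucose tolerance test (OGTT); changes in gut microbiota diversity and structure after treatment were assessed by 16S rRNA gene amplicon sequencing, and fecal short-chain fatty acids were measured and compared between groups; ileal transcriptome sequencing was used to compare intestinal function before and after treatment, with bioinformatic analysis used to explore the potential mechanisms by which GQD and berberine treat type 2 diabetes; serum cytokine levels were measured with a Luminex liquid chip to evaluate improvement in systemic inflammation; and relief of local inflammation in the pancreatic islets was assessed by real-time quantitative PCR.
•Main result 1: Metformin, GQD, and berberine treatment all significantly reduced blood glucose and insulin resistance in GK rats.
•Main result 2: Both GQD and berberine significantly altered the gut microbiota structure of GK rats, shifting it back toward a normal structure, significantly enriching multiple butyrate-producing bacteria including the genera Faecalibacterium, Roseburia, and Gemmiger, and increasing levels of short-chain fatty acids such as acetate, propionate, and butyrate.
•Main result 3: GQD and berberine significantly downregulated immune-related genes and upregulated genes related to glucose and lipid metabolism in the ileum, thereby modulating intestinal mucosal immunity and improving lipid and carbohydrate metabolism in GK rats.
•Main result 4: Systemic inflammation and local inflammation in the pancreatic islets of diabetic GK rats were significantly reduced after drug treatment.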

Page 721-736
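The study reports that GQD and berberine altered overall gut microbiota structure, and the structured summary notes that 16S rRNA amplicon data were used to assess microbiota diversity. The specific diversity metrics are not given; as a common example, the Shannon index and Pielou's evenness can be computed from per-taxon read counts as sketched below. The function names and counts are hypothetical, and real analyses usually rarefy or otherwise normalize sequencing depth first.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon diversity divided by its maximum, ln(S), where S is the number of observed taxa."""
    observed = [c for c in counts if c > 0]
    if len(observed) < 2:
        return 0.0
    return shannon_index(observed) / math.log(len(observed))


# Hypothetical genus-level read counts for one fecal sample.
sample = [120, 80, 40, 0, 10]
print(round(shannon_index(sample), 3), round(pielou_evenness(sample), 3))
```

Alpha diversity like this describes a single sample; the between-group shifts in "overall structure" are usually shown with beta diversity (for example Bray-Curtis or UniFrac distances) instead.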


Different Gene Networks Are Disturbed by Zika Virus Infection in A Mouse Microcephaly Model

Yafei Chang, Yisheng Jiang, Cui Li, Qin Wang, Feng Zhang, Cheng-Feng Qin, Qing-Feng Wu, Jing Li, Zhiheng Xu

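The association of Zika virus (ZIKV) infection with microcephaly has raised alarm worldwide. Their causal link has been confirmed in different ZIKV-infected animal models. However, the molecular mechanisms underlying ZIKV pathogenesis are far from clear. Hence, we performed a global gene expression analysis of ZIKV-infected mouse brains to unveil the biological and molecular networks underpinning microcephaly. We found significant dysregulation of sub-networks associated with brain development, immune response, cell death, microglial cell activation, and autophagy, among others. We provide a detailed analysis of the related complex gene networks and the links between them. Additionally, we analyzed the signaling pathways likely to be involved. This report provides systematic insights into the pathogenesis and points toward prophylactic and therapeutic strategies against ZIKV infection.
The association between Zika virus (ZIKV) infection and microcephaly has attracted worldwide attention. Their causal relationship has been confirmed in ZIKV-infected animal models and in clinical studies. However, the molecular mechanisms by which ZIKV causes microcephaly remain unclear. We therefore performed a detailed genome-wide expression analysis of ZIKV-infected mouse brains to reveal the biological processes and molecular networks involved in ZIKV-induced microcephaly. We found severe dysregulation of molecular networks related to brain development, immune response, cell death, microglial activation, and autophagy, among others. We dissected the related complex gene networks and the links between them in depth. In addition, we analyzed multiple signaling pathways that are likely to be involved. This study not only provides systematic insights into the pathogenesis but will also inform strategies for the prevention and treatment of ZIKV infection.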

Page 737-748
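The abstract does not name the enrichment method used to identify dysregulated sub-networks. A common way to ask whether differentially expressed genes are over-represented in a pathway or gene set is a one-sided hypergeometric test, sketched below with Python's `math.comb`. The gene universe, gene sets, and function names are hypothetical, and multiple-testing correction across many pathways is omitted.

```python
from math import comb


def hypergeom_sf(total, in_set, drawn, observed):
    """P(X >= observed) for X ~ Hypergeometric(total genes, genes in set, genes drawn)."""
    upper = min(in_set, drawn)
    hits = sum(comb(in_set, x) * comb(total - in_set, drawn - x)
               for x in range(observed, upper + 1))
    return hits / comb(total, drawn)


def pathway_enrichment(universe, de_genes, pathway_genes):
    """One-sided over-representation P value of differentially expressed genes in one pathway."""
    de = de_genes & universe
    pathway = pathway_genes & universe
    return hypergeom_sf(len(universe), len(pathway), len(de), len(de & pathway))


universe = {f"gene{i}" for i in range(10)}
de_genes = {"gene0", "gene1", "gene2"}
pathway = {"gene0", "gene1", "gene2", "gene3"}
print(pathway_enrichment(universe, de_genes, pathway))
```

In this toy case all 3 differentially expressed genes fall in a 4-gene pathway out of 10 genes, giving P = 4/120 = 1/30. Restricting both sets to the same universe matters: genes not measured in the experiment should not count toward either.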


The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR

Shuhui Song, Lina Ma, Dong Zou, Dongmei Tian, Cuiping Li, Junwei Zhu, Meili Chen, Anke Wang, Yingke Ma, Mengwei Li, Xufei Teng, Ying Cui, Guangya Duan, Mochen Zhang, Tong Jin, Chengmin Shi, Zhenglin Du, Yadong Zhang, Chuandong Liu, Rujiao Li, Jingyao Zeng, Lili Hao, Shuai Jiang, Hua Chen, Dali Han, Jingfa Xiao, Zhang Zhang, Wenming Zhao, Yongbiao Xue, Yiming Bao

On January 22, 2020, China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality-evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses that generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and aggregates the quality score, functional annotation, and population frequency for each variant. The spatiotemporal change of each variant can be visualized, and historical viral haplotype network maps for the course of the outbreak are generated from all available complete and high-quality genomes. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2-relevant literature on coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, as well as data sharing with NCBI. Collectively, 2019nCoVR is updated daily with the latest information on SARS-CoV-2 genome sequences, variants, haplotypes, and literature, making it a timely and valuable resource for the global research community. 2019nCoVR is accessible at

Page 749-759
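The variant statistics described above, per-variant population frequency and haplotypes built from complete genomes, can be illustrated on toy data. The sketch below assumes genomes are already aligned to the reference without indels, treats each genome's full set of single-nucleotide variants (SNVs) as its haplotype, and ignores quality scores and functional annotation; ambiguous bases are simply skipped, which treats them as matching the reference. It is not the 2019nCoVR pipeline, and the function names are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def call_snvs(reference, genome):
    """SNVs of a reference-aligned genome as (1-based position, ref base, alt base) tuples.

    Positions where either sequence has a gap or an ambiguous base (e.g. 'N') are ignored."""
    return tuple(
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), genome.upper()), start=1)
        if ref != alt and ref in NUCLEOTIDES and alt in NUCLEOTIDES
    )


def variant_landscape(reference, genomes):
    """Return (population frequency of each SNV, count of genomes carrying each haplotype)."""
    snv_counts = Counter()
    haplotypes = Counter()
    for genome in genomes:
        snvs = call_snvs(reference, genome)
        haplotypes[snvs] += 1    # the full SNV set defines the haplotype
        snv_counts.update(snvs)  # each genome contributes once per SNV it carries
    n = len(genomes)
    frequencies = {snv: count / n for snv, count in snv_counts.items()}
    return frequencies, haplotypes


reference = "ACGTACGT"
genomes = ["ACGTACGT", "ATGTACGT", "ATGTACGA", "ATGTACGN"]
frequencies, haplotypes = variant_landscape(reference, genomes)
print(frequencies)
print(haplotypes.most_common())
```

A haplotype network would then connect haplotypes that differ by few SNVs; the ambiguous base in the last toy genome shows why low-quality genomes are usually excluded before building one.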