Articles Online (Volume 20, Issue 6)

Editorial

A Beloved Bioinformatician Buddy—In Memory of Professor Weimin Zhu

Yixue Li

Page 1037-1039


Original Research

Revisiting the Evolutionary History of Pigs via De Novo Mutation Rate Estimation in A Three-generation Pedigree

Mingpeng Zhang, Qiang Yang, Huashui Ai, Lusheng Huang

The mutation rate used in the previous analyses of pig evolution and demographics was cursory and hence invited potential bias in inferring evolutionary history. Herein, we estimated the de novo mutation rate of pigs as 3.6 × 10⁻⁹ per base per generation using high-quality whole-genome sequencing data from nine individuals in a three-generation pedigree through stringent filtering and validation. Using this mutation rate, we re-investigated the evolutionary history of pigs. The estimated divergence time of ~10 kiloyears ago (KYA) between European wild and domesticated pigs was consistent with the domestication time of European pigs based on archaeological evidence. However, other divergence events inferred here were not as ancient as previously described. Our estimates suggest that Sus speciation occurred ~1.36 million years ago (MYA); European wild pigs split from Asian wild pigs only ~219 KYA; and south and north Chinese wild pigs split ~25 KYA. Meanwhile, our results showed that the most recent divergence event between Chinese wild and domesticated pigs occurred in the Hetao Plain, northern China, approximately 20 KYA, supporting the possibly independent domestication in northern China along the middle Yellow River. We also found that the maximum effective population size of pigs was ~6 times larger than estimated before. An archaic migration from other Sus species originating ~2 MYA to European pigs was detected during western colonization of pigs, which may affect the accuracy of previous demographic inference. Our de novo mutation rate estimation and its consequences for demographic history inference reasonably provide a new vision regarding the evolutionary history of pigs.
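Research question: Previous population genetic studies of pigs borrowed the mutation rate parameter from humans, which biased inferences of pig evolutionary history. Calculating an accurate mutation rate for pigs is therefore a fundamental and important problem for pig population genetics. Methods: This study used high-depth resequencing data from nine individuals in a specifically designed three-generation pedigree to estimate the de novo mutation rate of pigs. BWA, GATK, and bcftools were used for alignment and SNP calling, and de novo mutations (DNMs) were detected through a series of stringent filtering steps. All DNMs were validated by Sanger sequencing to assess their accuracy. Estimating the mutation rate from pedigree resequencing data is currently the most accurate approach. Using the newly obtained mutation rate, the authors downloaded high-quality resequencing data for 26 geographically representative individuals from the NCBI SRA database, including the Visayan warty pig (Sus cebifrons), Sumatran wild boar, south Chinese wild boar, north Chinese wild boar, European wild boar, Large White, Mangalica, Hetao, Bamei, Min, Jinhua, Bama Xiang, and Wuzhishan pigs, and re-examined the evolutionary and domestication history of pigs through genomic analyses. Main result 1: A total of 44 DNMs were identified, and the pig mutation rate was calculated as 3.6 × 10⁻⁹ per site per generation. Main result 2: Wild boar split from other Sus species ~1.36 MYA; pigs then reached the Eurasian mainland from the Indonesian islands and spread into Southeast Asia ~275 KYA; wild boar spread to Europe ~219 KYA; and south Chinese wild boar did not migrate to northern China until 25 KYA. In addition, northern China along the middle Yellow River may be an independent domestication site in Asia, where wild and domesticated pigs diverged ~20 KYA. Using genomic data, the divergence time between European wild and European domesticated pigs was estimated for the first time, at about 10 KYA. Main result 3: The maximum effective population size of pigs was 2.7 × 10⁵, six times larger than previous estimates, and a more severe population bottleneck was found during the migration of wild boar to Europe. Main result 4: Introgression from an archaic Sus lineage during the migration of wild boar to Europe biased earlier inferences of pig evolutionary history. Data link: http://bigd.big.ac.cn/gsa/CRA005031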
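As a rough illustration of the pedigree-based estimate summarized above, the sketch below applies the textbook estimator: the DNM count divided by twice the number of callable haploid sites times the number of offspring. The 44 DNMs and the rate of 3.6 × 10⁻⁹ come from the abstract. The function names and the callable-site and offspring inputs are illustrative assumptions, not values reported by the authors, and the study's filtering and validation corrections are not modeled.

```python
def mutation_rate(n_dnm, callable_sites, n_offspring):
    """Per-site, per-generation rate for a diploid pedigree.

    n_dnm          -- validated de novo mutations summed over offspring
    callable_sites -- haploid sites passing filters (hypothetical input)
    n_offspring    -- offspring in which DNMs were called (hypothetical input)
    """
    return n_dnm / (2 * callable_sites * n_offspring)


def implied_site_transmissions(n_dnm, mu):
    """Diploid site-transmissions implied by a DNM count and a rate."""
    return n_dnm / mu


# Values taken from the abstract: 44 DNMs, 3.6e-9 per site per generation.
implied = implied_site_transmissions(44, 3.6e-9)
print(f"{implied:.3e}")
```

Read this way, 44 DNMs at the reported rate correspond to roughly 1.2 × 10¹⁰ surveyed diploid site-transmissions, which gives a sense of how much callable sequence a pedigree study needs.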

Page 1040-1052


Original Research

Adaptive Bird-like Genome Miniaturization During the Evolution of Scallop Swimming Lifestyle

Yuli Li, Yaran Liu, Hongwei Yu, Fuyun Liu, ... Shi Wang

Genome miniaturization drives key evolutionary innovations of adaptive traits in vertebrates, such as the flight evolution of birds. However, whether similar evolutionary processes exist in invertebrates remains poorly understood. Derived from the second-largest animal phylum, scallops are a special group of bivalve molluscs and acquire the evolutionary novelty of the swimming lifestyle, providing excellent models for investigating the coordinated genome and lifestyle evolution. Here, we show for the first time that genome sizes of scallops exhibit a generally negative correlation with locomotion activity. To elucidate the co-evolution of genome size and swimming lifestyle, we focus on the Asian moon scallop (Amusium pleuronectes) that possesses the smallest known scallop genome while being among scallops with the highest swimming activity. Whole-genome sequencing of A. pleuronectes reveals highly conserved chromosomal macrosynteny and microsynteny, suggestive of a highly contracted but not degenerated genome. Genome reduction of A. pleuronectes is facilitated by significant inactivation of transposable elements, leading to reduced gene length, elevated expression of genes involved in energy-producing pathways, and decreased copy numbers and expression levels of biomineralization-related genes. Similar evolutionary changes of relevant pathways are also observed for bird genome reduction with flight evolution. The striking mimicry of genome miniaturization underlying the evolution of bird flight and scallop swimming unveils the potentially common, pivotal role of genome size fluctuation in the evolution of novel lifestyles in the animal kingdom.

Page 1066-1077


Original Research

Comparative Genomics Reveals Evolutionary Drivers of Sessile Life and Left-right Shell Asymmetry in Bivalves

Yang Zhang, Fan Mao, Shu Xiao, Haiyan Yu, ... Ziniu Yu

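Research question: Bivalves usually attach by a byssus and have symmetrical shells, whereas oysters have evolved a unique "cemented" sessile mode that leads to left-right shell asymmetry. The molecular mechanisms driving this remain unresolved. Methods: After generating 23.25 Gb of PacBio (31.9×) and 147.25 Gb of Illumina (201.8×) sequencing data, the Illumina data were assembled with ALLPATH-LG and the PacBio data with WTDBG 1.1.006. The contigs produced by ALLPATH-LG were then refined with quickmerge using the long-read contigs. LACHESIS was used to anchor the assembly to chromosomes with Hi-C data, yielding a final 729.6 Mb high-quality genome assembly of the Hong Kong oyster. Comparative genomics, synteny, gene family, and transcriptome analyses were combined to dissect the evolutionary mechanisms of the sessile mode and shell asymmetry in oysters. Main result 1: Using PacBio, Hi-C, and Illumina sequencing, a high-quality chromosome-level genome of the Hong Kong oyster was constructed, providing an important framework for subsequent evolutionary studies. Main result 2: Comparison with other known bivalve genomes showed that loss of the homeobox gene Antp and expansion of extracellular matrix families were key to the loss of the byssal gland and the transition from attachment to cementation in oysters. Main result 3: Initial cementation in oysters can be divided into sensing of the attachment surface, secretion of extracellular matrix and ions, and modification and processing of the extracellular matrix. In addition, a regulatory network associated with zinc-binding genes plays an important role in coordinating extracellular matrix modification and initiating adhesion. Main result 4: Comparison of the Hong Kong oyster (asymmetric shell) with the pearl oyster (symmetric shell) revealed asymmetric expression of homologous transcription factors (Pitx2 and Rfx6) and of the lineage-specifically expanded tyrosinase gene family, which may be the molecular basis of shell asymmetry. Data links: The raw sequence data reported in this paper have been deposited in the National Genomics Data Center (GSA accession: CRA004099) (https://ngdc.cncb.ac.cn/search/?dbId=gsa&q=CRA004099). The whole-genome sequence data reported in this paper have been deposited in the National Genomics Data Center (GWH accession: GWHBAZL00000000) (https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=GWHBAZL00000000). The Hong Kong oyster genome is deposited in the NCBI BioProject database (BioProject Accession: PRJNA592306) (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA592306). Hi-C data (Biosample Accession: SRR10583824) (https://www.ncbi.nlm.nih.gov/sra/SRR10583824). The transcriptome RNA sequence data have been deposited in the NCBI BioProject database (BioProject Accession: PRJNA588628) (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA588628).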

Page 1078-1091


Research Article

Genomes of Two Flying Squid Species Provide Novel Insights into Adaptations of Cephalopods to Pelagic Life

Min Li, Baosheng Wu, Peng Zhang, Ye Li, Wenjie Xu, Kun Wang, Qiang Qiu, Jun Zhang, Jie Li, Chi Zhang, Jiangtao Fan, Chenguang Feng, Zuozhi Chen


Page 1092-1105


Original Research

Genomic Variations in the Tea Leafhopper Reveal the Basis of Its Adaptive Evolution

Qian Zhao, Longqing Shi, Weiyi He, Jinyu Li, ... Minsheng You

Tea green leafhopper (TGL), Empoasca onukii, is of biological and economic interest. Despite numerous studies, the mechanisms underlying its adaptation and evolution remain enigmatic. Here, we use previously untapped genome and population genetics approaches to examine how the pest adapted to different environmental variables and thus has expanded geographically. We complete a chromosome-level assembly and annotation of the E. onukii genome, showing notable expansions of gene families associated with adaptation to chemoreception and detoxification. Genomic signals indicating balancing selection highlight metabolic pathways involved in adaptation to a wide range of tea varieties grown across ecologically diverse regions. Patterns of genetic variations among 54 E. onukii samples unveil the population structure and evolutionary history across different tea-growing regions in China. Our results demonstrate that the genomic changes in key pathways, including those linked to metabolism, circadian rhythms, and immune system functions, may underlie the successful spread and adaptation of E. onukii. This work highlights the genetic and molecular basis underlying the evolutionary success of a species with broad economic impacts, and provides insights into insect adaptation to host plants, which will ultimately facilitate more sustainable pest management.

Page 1092-1105


Original Research

Genome Assembly and Population Resequencing Reveal the Geographical Divergence of Shanmei (Rubus corchorifolius)

Yinqing Yang, Kang Zhang, Ya Xiao, Lingkui Zhang, ... Feng Cheng

Rubus corchorifolius (Shanmei or mountain berry, 2n = 14) is widely distributed in China, and its fruits possess high nutritional and medicinal values. Here, we report a high-quality chromosome-scale genome assembly of Shanmei, with contig size of 215.69 Mb and 26,696 genes. Genome comparison among Rosaceae species showed that Shanmei and Fupenzi (Rubus chingii Hu) were most closely related, followed by black raspberry (Rubus occidentalis), and that environmental adaptation-related genes were expanded in the Shanmei genome. Further resequencing of 101 samples of Shanmei collected from four regions in the provinces of Yunnan, Hunan, Jiangxi, and Sichuan in China revealed that among these samples, the Hunan population of Shanmei possessed the highest diversity and represented the more ancestral population. Moreover, the Yunnan population underwent strong selection based on the nucleotide diversity, linkage disequilibrium, and historical effective population size analyses. Furthermore, genes from candidate genomic regions that showed strong divergence were significantly enriched in the flavonoid biosynthesis and plant hormone signal transduction pathways, indicating the genetic basis of adaptation of Shanmei to the local environment. The high-quality assembled genome and the variome dataset of Shanmei provide valuable resources for breeding applications and for elucidating the genome evolution and ecological adaptation of Rubus species.

Page 1106-1118


Original Research

Chromosome-level Genomes Reveal the Genetic Basis of Descending Dysploidy and Sex Determination in Morus Plants

Zhongqiang Xia, Xuelei Dai, Wei Fan, Changying Liu, ... Aichun Zhao

Multiple plant lineages have independently evolved sex chromosomes and variable karyotypes to maintain their sessile lifestyles through constant biological innovation. Morus notabilis, a dioecious mulberry species, has the fewest chromosomes among Morus spp., but the genetic basis of sex determination and karyotype evolution in this species has not been identified. In this study, three high-quality genome assemblies were generated for Morus spp. [including dioecious M. notabilis (male and female) and Morus yunnanensis (female)] with genome sizes of 301–329 Mb and were grouped into six pseudochromosomes. Using a combination of genomic approaches, we found that the putative ancestral karyotype of Morus species was close to 14 protochromosomes, and that several chromosome fusion events resulted in descending dysploidy (2n = 2x = 12). We also characterized a ∼ 6.2-Mb sex-determining region on chromosome 3. Four potential male-specific genes, a partially duplicated DNA helicase gene (named MSDH) and three Ty3_Gypsy long terminal repeat retrotransposons (named MSTG1/2/3), were identified in the Y-linked area and considered to be strong candidate genes for sex determination or differentiation. Population genomic analysis showed that Guangdong accessions in China were genetically similar to Japanese accessions of mulberry. In addition, genomic areas containing selective sweeps that distinguish domesticated mulberry from wild populations in terms of flowering and disease resistance were identified. Our study provides an important genetic resource for sex identification research and molecular breeding in mulberry.

Page 1119-1137


Original Research

Multi-omics Analyses Provide Insight into the Biosynthesis Pathways of Fucoxanthin in Isochrysis galbana

Duo Chen, Xue Yuan, Xuehai Zheng, Jingping Fang, Gang Lin, Rongmao Li, Jiannan Chen, Wenjin He, Zhen Huang, Wenfang Fan, Limin Liang, Chentao Lin, Jinmao Zhu, Youqiang Chen, Ting Xue

Isochrysis galbana is considered an ideal bait for functional foods and nutraceuticals of humans because of its high fucoxanthin (Fx) content. However, multi-omics analysis of the regulatory networks for Fx biosynthesis in I. galbana has not been reported. In this study, we report a high-quality genome assembly of I. galbana LG007, which has a genome size of 92.73 Mb, with a contig N50 of 6.99 Mb and 14,900 protein-coding genes. Phylogenetic analysis confirmed the monophyly of Haptophyta, with I. galbana sister to Emiliania huxleyi and Chrysochromulina tobinii. Evolutionary analysis revealed an estimated divergence time between I. galbana and E. huxleyi of ∼ 133 million years ago. Gene family analysis indicated that lipid metabolism-related genes exhibited significant expansion, including IgPLMT, IgOAR1, and IgDEGS1. Metabolome analysis showed that the content of carotenoids in I. galbana cultured under green light for 7 days was higher than that under white light, and β-carotene was the main carotenoid, accounting for 79.09% of the total carotenoids. Comprehensive multi-omics analysis revealed that the content of β-carotene, antheraxanthin, zeaxanthin, and Fx was increased by green light induction, which was significantly correlated with the expression of IgMYB98, IgZDS, IgPDS, IgLHCX2, IgZEP, IgLCYb, and IgNSY. These findings contribute to the understanding of Fx biosynthesis and its regulation, providing a valuable reference for food and pharmaceutical applications.

Page 1138-1153


Original Research

Genomic Epidemiology of Carbapenemase-producing Klebsiella pneumoniae in China

Cuidan Li, Xiaoyuan Jiang, Tingting Yang, Yingjiao Ju, ... Dongsheng Zhou

Page 1154-1167


Original Research

Genomic Shift in Population Dynamics of mcr-1-positive Escherichia coli in Human Carriage

Yingbo Shen, Rong Zhang, Dongyan Shao, Lu Yang, ... Zhangqi Shen

Emergence of the colistin resistance gene, mcr-1, has attracted worldwide attention. Despite the prevalence of mcr-1-positive Escherichia coli (MCRPEC) strains in human carriage showing a significant decrease between 2016 and 2019, genetic differences in MCRPEC strains remain largely unknown. We therefore conducted a comparative genomic study on MCRPEC strains from fecal samples of healthy human subjects in 2016 and 2019. We identified three major differences in MCRPEC strains between these two time points. First, the insertion sequence ISApl1 was often deleted and the percentage of mcr-1-carrying IncI2 plasmids was increased in MCRPEC strains in 2019. Second, the antibiotic resistance genes (ARGs), aac(3)-IVa and blaCTX-M-1, emerged and coexisted with mcr-1 in 2019. Third, MCRPEC strains in 2019 contained more virulence genes, resulting in an increased proportion of extraintestinal pathogenic E. coli (ExPEC) strains (36.1%) in MCRPEC strains in 2019 compared to that in 2016 (10.5%), implying that these strains could occupy intestinal ecological niches by competing with other commensal bacteria. Our results suggest that despite the significant reduction in the prevalence of MCRPEC strains in humans from 2016 to 2019, MCRPEC exhibits increased resistance to other clinically important ARGs and contains more virulence genes, which may pose a potential public health threat.

Page 1168-1179


Original Research

Reprogramming Mycobacterium tuberculosis CRISPR System for Gene Editing and Genomewide RNA Interference Screening

Khaista Rahman, Muhammad Jamal, Xi Chen, Wei Zhou, ... Gang Cao

Mycobacterium tuberculosis is the causative agent of tuberculosis (TB), which is still the leading cause of mortality from a single infectious disease worldwide. The development of novel anti-TB drugs and vaccines is severely hampered by the complicated and time-consuming genetic manipulation techniques for M. tuberculosis. Here, we harnessed an endogenous type III-A CRISPR/Cas10 system of M. tuberculosis for efficient gene editing and RNA interference (RNAi). This simple and easy method only needs to transform a single mini-CRISPR array plasmid, thus avoiding the introduction of exogenous protein and minimizing proteotoxicity. We demonstrated that M. tuberculosis genes can be efficiently and specifically knocked in/out by this system as confirmed by DNA high-throughput sequencing. This system was further applied to single- and multiple-gene RNAi. Moreover, we successfully performed genome-wide RNAi screening to identify M. tuberculosis genes regulating in vitro and intracellular growth. This system can be extensively used for exploring the functional genomics of M. tuberculosis and facilitate the development of novel anti-TB drugs and vaccines.
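About one-third of the world's population is infected with Mycobacterium tuberculosis. Anti-TB vaccines that remain only candidates, antibiotic resistance, and the persistence of M. tuberculosis within the host make tuberculosis prevention and control very difficult. To identify drug targets and develop effective vaccines, functional genetic approaches such as powerful genome editing tools and high-throughput screening methods are essential. We used the endogenous type III-A CRISPR-Cas10 system of M. tuberculosis, electroporating a plasmid carrying a mini-CRISPR array, to perform gene editing, RNA interference, and genome-scale RNA interference screening. In addition, we applied this tool to identify M. tuberculosis genes essential for survival within macrophages. Our results identified several M. tuberculosis-specific drug targets and vaccine candidate genes. This system can be widely used to explore the functional genomics of M. tuberculosis.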

Page 1180-1196


Method

JAX-CNV: A Whole-genome Sequencing-based Algorithm for Copy Number Detection at Clinical Grade Level

Wan-Ping Lee, Qihui Zhu, Xiaofei Yang, Silvia Liu, ... Chengsheng Zhang

We aimed to develop a whole-genome sequencing (WGS)-based copy number variant (CNV) calling algorithm with the potential of replacing chromosomal microarray assay (CMA) for clinical diagnosis. JAX-CNV was thus developed for CNV detection from WGS data. The performance of this CNV calling algorithm was evaluated in a blinded manner on 31 samples and compared to the 112 CNVs reported by clinically validated CMAs for these 31 samples. The result showed that JAX-CNV recalled 100% of these CNVs. In addition, JAX-CNV identified an average of 30 CNVs per individual, representing an approximately seven-fold increase compared to calls of clinically validated CMAs. Experimental validation of 24 randomly selected CNVs showed one false positive, i.e., a false discovery rate (FDR) of 4.17%. A robustness test on lower-coverage data revealed a 100% sensitivity for CNVs larger than 300 kb (the current threshold of the College of American Pathologists) down to 10× coverage. For CNVs larger than 50 kb, sensitivities were 100% for coverages deeper than 20×, 97% for 15×, and 95% for 10×. We developed a WGS-based CNV pipeline, including this newly developed CNV caller JAX-CNV, and found it capable of detecting CMA-reported CNVs at a sensitivity of 100% with an FDR of about 4%. We propose that JAX-CNV could be further examined in a multi-institutional study to justify the transition of first-tier genetic testing from CMAs to WGS. JAX-CNV is available at https://github.com/TheJacksonLaboratory/JAX-CNV.
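The headline figures in this abstract follow from simple ratios, which the short sketch below reproduces. The counts (1 false positive among 24 validated CNVs; 112 of 112 CMA-reported CNVs recalled) are taken from the abstract. The function names are illustrative assumptions and are not part of JAX-CNV.

```python
def false_discovery_rate(false_positives, validated_calls):
    """Fraction of experimentally tested calls that were false positives."""
    return false_positives / validated_calls


def sensitivity(recalled, reported):
    """Fraction of reference (CMA-reported) CNVs recovered by the caller."""
    return recalled / reported


fdr = false_discovery_rate(1, 24)   # one false positive in 24 validated CNVs
sens = sensitivity(112, 112)        # all 112 CMA-reported CNVs recalled
print(f"FDR = {fdr:.2%}, sensitivity = {sens:.0%}")
```

Note that the FDR here is estimated from a validation sample of only 24 calls, so it carries considerable sampling uncertainty.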

Page 1197-1206


Web Server

Ori-Finder 2022: A Comprehensive Web Server for Prediction and Analysis of Bacterial Replication Origins

Mei-Jing Dong, Hao Luo, Feng Gao

The replication of DNA is a complex biological process that is essential for life. Bacterial DNA replication is initiated at genomic loci referred to as replication origins (oriCs). Integrating the Z-curve method, DnaA box distribution, and comparative genomic analysis, in 2008 we developed Ori-Finder, a web server for predicting bacterial oriCs, which helps clarify the characteristics of bacterial oriCs. The oriCs of hundreds of sequenced bacterial genomes have been annotated in the genome reports using Ori-Finder, and the predicted results have been deposited in DoriC, a manually curated database of oriCs. This has facilitated large-scale data mining of functional elements in oriCs and strand-biased analysis. Here, we describe Ori-Finder 2022, with an updated prediction framework, an interactive visualization module, a new analysis module, and a user-friendly interface. More species-specific indicator genes and functional elements of oriCs are integrated into the updated framework, which has also been redesigned to predict oriCs in draft genomes. The interactive visualization module displays more genomic information related to oriCs and their functional elements. The analysis module includes regulatory protein annotation, repeat sequence discovery, homologous oriC search, and strand-biased analyses. The redesigned interface provides additional customization options for oriC prediction. Ori-Finder 2022 is freely available at http://tubic.tju.edu.cn/Ori-Finder/ and https://tubic.org/Ori-Finder/.
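Research question: The specific genomic locations where DNA replication begins are called replication origins. Identifying and characterizing replication origins can provide new insights into the mechanisms of DNA replication and cell-cycle regulation, and can advance research in drug development, genome design, plasmid construction, and related areas. However, identifying replication origins by experiment alone is time-consuming and laborious, so bioinformatic algorithms that can predict replication origins at scale are urgently needed. Methods: Ori-Finder is a web service that integrates the Z-curve method, DnaA box distribution, and comparative genomic analysis to predict bacterial replication origins. Since its release in 2008, Ori-Finder has been used to predict replication origins in hundreds of newly sequenced genomes published in journals such as Nature and Nature Microbiology (more than 200 SCI citations; once selected as an ESI highly cited paper). Rapidly developing high-throughput sequencing has produced a large number of draft genomes, and predicting replication origins in them has become an increasingly prominent and pressing problem. To further integrate the latest research findings and improve the user experience, we comprehensively upgraded Ori-Finder and released the latest version, Ori-Finder 2022. Main results: Ori-Finder 2022 has a redesigned prediction framework, a user-friendly interface, an interactive visualization module, and a new analysis module. The redesigned prediction framework integrates more species-specific indicator genes and functional elements, uses scoring rules to quantify the features of each intergenic sequence, and can predict replication origins in draft genomes. Based on the new framework, the user interface was redesigned to offer more customization options. The interactive visualization module displays more information related to replication origins and their functional elements. The analysis module includes replication regulatory protein annotation, repeat sequence discovery, strand-bias analysis, and homologous replication origin search. Web server: http://tubic.tju.edu.cn/Ori-Finder/ and https://tubic.org/Ori-Finder/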
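The Z-curve method named above maps a DNA sequence onto three cumulative curves that track purine/pyrimidine, amino/keto, and weak/strong hydrogen-bond base composition. The sketch below is a generic Z-curve transform written for illustration and is not Ori-Finder's code. The function name `z_curve` and the choice to leave the curves flat at ambiguous bases such as N are assumptions.

```python
# Step added to (x, y, z) for each base:
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak H-bond (A, T) vs strong H-bond (G, C)
STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the cumulative x, y, z components along the sequence."""
    x = y = z = 0
    xs, ys, zs = [], [], []
    for base in seq.upper():
        # Ambiguous bases contribute no step (an assumption of this sketch).
        dx, dy, dz = STEPS.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


xs, ys, zs = z_curve("ACGT")
```

In origin prediction, extrema of disparity curves derived from these components are commonly examined as candidate indicators of replication origins and termini. Any such use here would need the additional evidence (DnaA boxes, indicator genes) that the abstract describes.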

Page 1207-1213


Letter

Ongoing Positive Selection Drives the Evolution of SARS-CoV-2 Genomes

Yali Hou, Shilei Zhao, Qi Liu, Xiaolong Zhang, ... Hua Chen

SARS-CoV-2 is a new RNA virus that infects humans and has spread extensively throughout the world since its first outbreak in December 2019. Whether the transmissibility and pathogenicity of SARS-CoV-2 in humans after zoonotic transfer are actively evolving, driven by adaptation to the new host and environments, is still under debate. Understanding the evolutionary mechanism underlying the epidemiological and pathological characteristics of COVID-19 is essential for predicting the epidemic trend and for guiding disease control and treatment. Applying novel strategies for identifying natural selection from within-species polymorphisms to 3,674,076 SARS-CoV-2 genome sequences from 169 countries as of December 30, 2021, we demonstrate with population genetic evidence that during the course of the SARS-CoV-2 pandemic in humans, 1) SARS-CoV-2 genomes are overall conserved under purifying selection, especially the 14 genes related to viral RNA replication, transcription, and assembly; and 2) ongoing positive selection is actively driving the evolution of 6 genes (e.g., S, ORF3a, and N) that play critical roles in molecular processes involving pathogen–host interactions, including viral invasion into and egress from host cells, and viral inhibition and evasion of the host immune response, possibly leading to high transmissibility and mild symptoms in SARS-CoV-2 evolution. Based on an established haplotype phylogenetic relationship of 138 viral clusters, a spatial and temporal landscape of 556 critical mutations is constructed from their divergence among viral haplotype clusters or their repeated increase in frequency within at least 2 clusters. Among these, multiple mutations potentially conferring alterations in the transmissibility, pathogenicity, and virulence of SARS-CoV-2 are highlighted, warranting attention.

Page 1214-1223


Letter

Single-cell Sequencing Reveals Clearance of Blastula Chromosomal Mosaicism in In Vitro Fertilization Babies

Yuan Gao, Jinning Zhang, Zhenyu Liu, Shuyue Qi, ... Yuanqing Yao

Although chromosomal mosaic embryos detected by trophectoderm (TE) biopsy offer healthy embryos available for transfer, high-resolution postnatal karyotyping and chromosome testing of the transferred embryos are insufficient. Here, we applied single-cell multi-omics sequencing for seven infants with blastula chromosomal mosaicism detected by TE biopsy. The chromosome ploidy was examined by single-cell genome analysis, with the cellular identity being identified by single-cell transcriptome analysis. A total of 1616 peripheral leukocytes from seven infants with embryonic chromosomal mosaicism and three control ones with euploid TE biopsy were analyzed. A small number of blood cells showed copy number alterations (CNAs) on seemingly random locations at a frequency of 0%−2.5% per infant. However, none of the cells showed CNAs that were the same as those of the corresponding TE biopsies. The blastula chromosomal mosaicism may be fully self-corrected, probably through the selective loss of the aneuploid cells during development, and the transferred embryos can be born as euploid infants without mosaic CNAs corresponding to the TE biopsies. The results provide a new reference for the evaluations of transferring chromosomal mosaic embryos in certain situations.

Page 1224-1231