Volume: 21, Issue: 6

Editorial

T2T-YAO, T2T-SHUN, and more

Jingfa Xiao, Jun Yu

No abstract

Page 1081-1082


News and Views

T2T-YAO Reference Genome of Han Chinese — New Step in Advancing Precision Medicine in China

Xue Zhang

No abstract

Page 1083-1084


Original Research

T2T-YAO: A Telomere-to-telomere Assembled Diploid Reference Genome for Han Chinese

Yukun He, Yanan Chu, Shuming Guo, Jiang Hu, Ran Li, Yali Zheng, Xinqian Ma, Zhenglin Du, Lili Zhao, Wenyi Yu, Jianbo Xue, Wenjie Bian, Feifei Yang, Xi Chen, Pingan Zhang, Rihan Wu, Yifan Ma, Changjun Shao, Jing Chen, Jian Wang, Jiwei Li, Jing Wu, Xiaoyi Hu, Qiuyue Long, Mingzheng Jiang, Hongli Ye, Shixu Song, Guangyao Li, Yue Wei, Yu Xu, Yanliang Ma, Yanwen Chen, Keqiang Wang, Jing Bao, Wen Xi, Fang Wang, Wentao Ni, Moqin Zhang, Yan Yu, Shengnan Li, Yu Kang, Zhancheng Gao

Since its initial release in 2001, the human reference genome has undergone continuous improvement in quality, and the recently released telomere-to-telomere (T2T) version, T2T-CHM13, reaches its highest level of continuity and accuracy after 20 years of effort by working on a simplified, nearly homozygous genome of a hydatidiform mole cell line. Here, to provide an authentic complete diploid human genome reference for the Han Chinese, the largest population in the world, we assembled the genome of a male Han Chinese individual, T2T-YAO, which includes T2T assemblies of all the 22 + X + M and 22 + Y chromosomes in both haploids. The quality of T2T-YAO is much better than that of all currently available diploid assemblies, and its haploid version, T2T-YAO-hp, generated by selecting the better assembly for each autosome, reaches the top quality of fewer than one error per 29.5 Mb, even higher than that of T2T-CHM13. Derived from an individual living in the aboriginal region of the Han population, T2T-YAO shows clear ancestry and potential genetic continuity from the ancient ancestors. Each haplotype of T2T-YAO possesses ∼ 330-Mb exclusive sequences, ∼ 3100 unique genes, and tens of thousands of nucleotide and structural variations as compared with CHM13, highlighting the necessity of a population-stratified reference genome. The construction of T2T-YAO, an accurate and authentic representative of the Chinese population, would enable precise delineation of genomic variations and advance our understanding of the heritability of diseases and phenotypes, especially within the context of the unique variations of the Chinese population.
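Research question: How can a high-accuracy, telomere-to-telomere, complete diploid reference genome be constructed for Han Chinese individuals, and how does it differ from Caucasian genome sequences?
Methods: The authors used ONT ultra-long and PacBio HiFi data from the "Tang Yao" (YAO) sample, together with next-generation sequencing data from the parents, to perform haplotype-phased assembly and gap filling of the YAO diploid genome. They polished the assembly with DeepVariant, Merfin, and manual curation, assessed genome quality with Merqury, detected differences between genomes with tools such as Mummer and Syri, and validated the results with ddPCR and Bionano.
Main results: 1. A phased T2T assembly of the YAO genome was achieved, yielding a complete diploid reference genome for the Chinese population. The haploid reference T2T-YAO-hp, composed of the better assembly of each chromosome, reaches an assembly quality value of Q74.69, equivalent to no more than one error per 29.5 Mb of sequence on average, making it the human genome with the highest assembly quality and completeness to date. 2. Compared with T2T-CHM13, the most complete haploid genome published previously, each of the two parental haplotypes of YAO carries ~330 Mb of differing sequence and ~3,100 differing genes, as well as hundreds of thousands of SNPs and tens of thousands of structural variations; the difference between the two parental haplotypes is markedly smaller than the difference between either haplotype and CHM13.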
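The quality value quoted for T2T-YAO-hp and the "one error per 29.5 Mb" figure are two views of the same number. A minimal sketch of the conversion, assuming the quality value is Phred-scaled (error probability = 10^(-QV/10)), which is the convention Merqury reports; the function names are illustrative only:

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def bases_per_error(qv: float) -> float:
    """Average number of bases between errors implied by a quality value."""
    return 1 / qv_to_error_rate(qv)


# Q74.69 is the value reported for T2T-YAO-hp in the summary above.
mb_per_error = bases_per_error(74.69) / 1e6
print(f"Q74.69 corresponds to roughly one error per {mb_per_error:.1f} Mb")
```

Under this assumption the result lands between 29 and 30 Mb, in line with the figure stated in the abstract.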

Page 1085-1100


Original Research

Novel Time-dependent Multi-omics Integration in Sepsis-associated Liver Dysfunction

Ann-Yae Na, Hyojin Lee, Eun Ki Min, Sanjita Paudel, So Young Choi, HyunChae Sim, Kwang-Hyeon Liu, Ki-Tae Kim, Jong-Sup Bae, Sangkyu Lee

The recently developed technologies that allow the analysis of each single omics have provided an unbiased insight into ongoing disease processes. However, it remains challenging to specify the study design for the subsequent integration strategies that can associate sepsis pathophysiology and clinical outcomes. Here, we conducted a time-dependent multi-omics integration (TDMI) in a sepsis-associated liver dysfunction (SALD) model. We successfully deduced the relation of the Toll-like receptor 4 (TLR4) pathway with SALD. Although TLR4 is a critical factor in sepsis progression, it was not identified in the single-omics analyses but only in the TDMI analysis. This finding indicates that the TDMI-based approach is more advantageous than single-omics analyses in terms of exploring the underlying pathophysiological mechanism of SALD. Furthermore, the TDMI-based approach can be an ideal paradigm for insightful biological interpretations of multi-omics datasets that will potentially reveal novel insights into basic biology, health, and diseases, thus allowing the identification of promising candidates for therapeutic strategies.

Page 1101-1116


Original Research

Comprehensive Characterization and Global Transcriptome Analysis of Human Fetal Liver Terminal Erythropoiesis

Yongshuai Han, Shihui Wang, Yaomei Wang, Yumin Huang, Chengjie Gao, Xinhua Guo, Lixiang Chen, Huizhi Zhao, Xiuli An

The fetal liver (FL) is the key erythropoietic organ during fetal development, but knowledge on human FL erythropoiesis is very limited. In this study, we sorted primary erythroblasts from FL cells and performed RNA sequencing (RNA-seq) analyses. We found that temporal gene expression patterns reflected changes in function during primary human FL terminal erythropoiesis. Notably, the expression of genes enriched in proteolysis and autophagy was up-regulated in orthochromatic erythroblasts (OrthoEs), suggesting the involvement of these pathways in enucleation. We also performed RNA-seq of in vitro cultured erythroblasts derived from FL CD34+ cells. Comparison of transcriptomes between the primary and cultured erythroblasts revealed significant differences, indicating impacts of the culture system on gene expression. Notably, the expression of lipid metabolism-related genes was increased in cultured erythroblasts. We further immortalized erythroid cell lines from FL and cord blood (CB) CD34+ cells (FL-iEry and CB-iEry, respectively). FL-iEry and CB-iEry were immortalized at the proerythroblast stage and can be induced to differentiate into OrthoEs, but their enucleation ability was very low. Comparison of the transcriptomes between OrthoEs with and without enucleation capability revealed the down-regulation of pathways involved in chromatin organization and mitophagy in OrthoEs without enucleation capacity, indicating that defects in chromatin organization and mitophagy contribute to the inability of OrthoEs to enucleate. Additionally, the expression of HBE1, HBZ, and HBG2 was up-regulated in FL-iEry compared with CB-iEry, and such up-regulation was accompanied by down-regulated expression of BCL11A and up-regulated expression of LIN28B and IGF2BP1. Our study provides new insights into human FL erythropoiesis and rich resources for future studies.

Page 1117-1132


Original Research

Acid–base Homeostasis and Implications to the Phenotypic Behaviors of Cancer

Yi Zhou, Wennan Chang, Xiaoyu Lu, Jin Wang, Chi Zhang, Ying Xu

Acid–base homeostasis is a fundamental property of living cells, and its persistent disruption in human cells can lead to a wide range of diseases. In this study, we conducted a computational modeling analysis of transcriptomic data of 4750 human tissue samples of 9 cancer types in The Cancer Genome Atlas (TCGA) database. Building on our previous study, we quantitatively estimated the average production rate of OH− by cytosolic Fenton reactions, which continuously disrupt the intracellular pH (pHi) homeostasis. Our predictions indicate that all or at least a subset of 43 reprogrammed metabolisms (RMs) are induced to produce net protons (H+) at rates comparable to those of Fenton reactions to keep the pHi stable. We then discovered that a number of well-known phenotypes of cancers, including increased growth rate, metastasis rate, and local immune cell composition, can be naturally explained in terms of the Fenton reaction level and the induced RMs. This study strongly suggests the possibility of a unified framework for studies of cancer-inducing stressors, adaptive metabolic reprogramming, and cancerous behaviors. In addition, strong evidence is provided to demonstrate that a popular view that Na+/H+ exchangers along with lactic acid exporters and carbonic anhydrases are responsible for the intracellular alkalization and extracellular acidification in cancer may not be justified.

Page 1133-1148


Original Research

Multi-omics Data Reveal the Effect of Sodium Butyrate on Gene Expression and Protein Modification in Streptomyces

Jiazhen Zheng, Yue Li, Ning Liu, Jihui Zhang, Shuangjiang Liu, Huarong Tan

Streptomycetes possess numerous gene clusters and the potential to produce a large amount of natural products. Histone deacetylase (HDAC) inhibitors play an important role in the regulation of histone modifications in fungi, but their roles in prokaryotes remain poorly understood. Here, we investigated the global effects of the HDAC inhibitor, sodium butyrate (SB), on marine-derived Streptomyces olivaceus FXJ 8.021, particularly focusing on the activation of secondary metabolite biosynthesis. The antiSMASH analysis revealed 33 secondary metabolite biosynthetic gene clusters (BGCs) in strain FXJ 8.021, among which the silent lobophorin BGC was activated by SB. Transcriptomic data showed that the expression of genes involved in lobophorin biosynthesis (ge00097–ge00139) and CoA-ester formation (e.g., ge02824), as well as the glycolysis/gluconeogenesis pathway (e.g., ge01661), was significantly up-regulated in the presence of SB. Intracellular CoA-ester analysis confirmed that SB triggered the biosynthesis of CoA-ester, thereby increasing the precursor supply for lobophorin biosynthesis. Further acetylomic analysis revealed that the acetylation levels on 218 sites of 190 proteins were up-regulated and those on 411 sites of 310 proteins were down-regulated. These acetylated proteins were particularly enriched in transcriptional and translational machinery components (e.g., elongation factor GE04399), and their correlations with the proteins involved in lobophorin biosynthesis were established by protein–protein interaction network analysis, suggesting that SB might function via a complex hierarchical regulation to activate the expression of lobophorin BGC. These findings provide solid evidence that acetylated proteins triggered by SB could affect the expression of genes involved in the biosynthesis of primary and secondary metabolites in prokaryotes.

Page 1149-1162


Original Research

Protein Lactylation and Metabolic Regulation of the Zoonotic Parasite Toxoplasma gondii

Deqi Yin, Ning Jiang, Chang Cheng, Xiaoyu Sang, Ying Feng, Ran Chen, Qijun Chen

The biology of Toxoplasma gondii, the causative pathogen of one of the most widespread parasitic diseases (toxoplasmosis), remains poorly understood. Lactate, which is derived from glucose metabolism, is not only an energy source in a variety of organisms, including T. gondii, but also a regulatory molecule that participates in gene activation and protein function. Lysine lactylation (Kla) is a type of post-translational modification (PTM) that has been recently associated with chromatin remodeling; however, Kla of histone and non-histone proteins has not yet been studied in T. gondii. To examine the prevalence and function of lactylation in T. gondii parasites, we mapped the lactylome of proliferating tachyzoite cells and identified 1964 Kla sites on 955 proteins in the T. gondii RH strain. Lactylated proteins were distributed in multiple subcellular compartments and were closely related to a wide variety of biological processes, including mRNA splicing, glycolysis, aminoacyl-tRNA biosynthesis, RNA transport, and many signaling pathways. We also performed a chromatin immunoprecipitation sequencing (ChIP-seq) analysis using a lactylation-specific antibody and found that the histones H4K12la and H3K14la were enriched in the promoter and exon regions of T. gondii associated with microtubule-based movement and cell invasion. We further confirmed the delactylase activity of histone deacetylases TgHDAC2–4, and found that treatment with anti-histone acetyltransferase (TgMYST-A) antibodies profoundly reduced protein lactylation in T. gondii. This study offers the first dataset of the global lactylation proteome and provides a basis for further dissecting the functional biology of T. gondii.

Page 1163-1181


Original Research

Sequence-based Functional Metagenomics Reveals Novel Natural Diversity of Functional CopA in Environmental Microbiomes

Wenjun Li, Likun Wang, Xiaofang Li, Xin Zheng, Michael F. Cohen, Yong-Xin Liu

Exploring the natural diversity of functional genes/proteins from environmental DNA in high throughput remains challenging. In this study, we developed a sequence-based functional metagenomics procedure for mining the diversity of copper (Cu) resistance gene copA in global microbiomes, by combining the metagenomic assembly technology, local BLAST, evolutionary trace analysis (ETA), chemical synthesis, and conventional functional genomics. In total, 87 metagenomes were collected from a public database and subjected to copA detection, resulting in 93,899 hits. Manual curation of 1214 hits of high confidence led to the retrieval of 517 unique CopA candidates, which were further subjected to ETA. Eventually, 175 novel copA sequences of high quality were discovered. Phylogenetic analysis showed that almost all these putative CopA proteins were distantly related to known CopA proteins, with 55 sequences from totally unknown species. Ten novel and three known copA genes were chemically synthesized for further functional genomic tests using the Cu-sensitive Escherichia coli (ΔcopA). The growth test and Cu uptake determination showed that five novel clones had positive effects on host Cu resistance and uptake. One recombinant harboring copA-like 15 (copAL15) successfully restored Cu resistance of the host with a substantially enhanced Cu uptake. Two novel copA genes were fused with the gfp gene and expressed in E. coli for microscopic observation. Imaging results showed that they were successfully expressed and their proteins were localized to the membrane. The results here greatly expand the diversity of known CopA proteins, and the sequence-based procedure developed overcomes biases in length, screening methods, and abundance of conventional functional metagenomics.
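The types, numbers, and order of the amino acids that make up a protein molecule determine protein diversity, and natural selection over millions to billions of years has shaped the natural diversity of proteins. The functional proteins known today represent only a small fraction of the proteins found in nature. Fully recovering the natural diversity of functional proteins would not only lay a foundation for exploring the evolution of natural proteins but also provide large-scale databases for protein engineering. However, conventional environmental metagenomic approaches, such as screening positive clones under selective stress or cloning target genes from metagenomes with designed primers, cannot retrieve large numbers of unknown proteins of a specific family from nature. Recently, the team of Xiaofang Li at the Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, used innovative sequence-based metagenomics and functional genomics techniques to mine a large number of novel functional copper resistance copA genes from diverse habitats worldwide. In this study, the team combined metagenomic assembly, local BLAST, evolutionary trace analysis, chemical synthesis, and conventional functional genomics to mine the diversity of the copper resistance gene copA in global microbial metagenomic data. In total, 87 metagenomes from around the world were collected. Nearly 100,000 candidate sequences were screened out from hundreds of millions of protein sequences, manual curation yielded 517 non-redundant CopA candidates, and further filtering produced 175 high-confidence novel copA gene sequences. Phylogenetic analysis showed that the predicted CopA proteins were distantly related to known CopA proteins, with 55 sequences coming from completely unknown species. After 10 predicted copA genes and 3 known copA genes were chemically synthesized and transformed into Cu-sensitive Escherichia coli, five of the predicted copA genes were found to improve host copper resistance and to markedly affect host copper uptake. GFP (green fluorescent protein) imaging showed that two candidate CopA proteins (CopAL12 and CopAL16) were successfully localized to the cell membrane of the E. coli host. This study not only expands our understanding of the functional diversity of CopA in environmental microbes but also provides valuable genetic resources for developing genetically engineered biomaterials for remediation.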

Page 1182-1194


Original Research

High Sensitivity of Shotgun Metagenomic Sequencing in Colon Tissue Biopsy by Host DNA Depletion

Wing Yin Cheng, Wei-Xin Liu, Yanqiang Ding, Guoping Wang, Yu Shi, Eagle S.H. Chu, Sunny Wong, Joseph J.Y. Sung, Jun Yu

The high host genetic background of tissue biopsies hinders the application of shotgun metagenomic sequencing in characterizing the tissue microbiota. We proposed an optimized method that removed host DNA from colon biopsies and examined the effect on metagenomic analysis. Human or mouse colon biopsies were divided into two groups, with one group undergoing host DNA depletion and the other serving as the control. Host DNA was removed through differential lysis of mammalian and bacterial cells before sequencing. The impact of host DNA depletion on microbiota was compared based on phylogenetic diversity analyses and regression analyses. Removing host DNA enhanced bacterial sequencing depth and improved species discovery, increasing bacterial reads by 2.46 ± 0.20 fold while reducing host reads by 6.80% ± 1.06%. Moreover, 2.40 times more bacterial species were detected after host DNA depletion. This was confirmed in mouse colon tissues, where depletion increased bacterial reads by 5.46 ± 0.42 fold while decreasing host reads by 10.2% ± 0.83%. Similarly, significantly more bacterial species were detected in the mouse colon tissue upon host DNA depletion (P < 0.001). Furthermore, an increased microbial richness was evident in the host DNA-depleted samples compared with non-depleted controls in human colon biopsies and mouse colon tissues (P < 0.001). Our optimized method of host DNA depletion improves the sensitivity of shotgun metagenomic sequencing in bacteria detection in the biopsy, which may yield a more accurate taxonomic profile of the tissue microbiota and identify bacteria that are important for disease initiation or progression.

Page 1195-1205


Original Research

Superior Fidelity and Distinct Editing Outcomes of SaCas9 Compared with SpCas9 in Genome Editing

Zhi-Xue Yang, Ya-Wen Fu, Juan-Juan Zhao, Feng Zhang, Si-Ang Li, Mei Zhao, Wei Wen, Lei Zhang, Tao Cheng, Jian-Ping Zhang, Xiao-Bing Zhang

A series of clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR associated protein 9 (Cas9) systems have been engineered for genome editing. The most widely used Cas9 proteins are SpCas9 from Streptococcus pyogenes and SaCas9 from Staphylococcus aureus. However, a comparison of their detailed gene editing outcomes is still lacking. By characterizing the editing outcomes of 11 sites in human induced pluripotent stem cells (iPSCs) and K562 cells, we found that SaCas9 could edit the genome with greater efficiencies than SpCas9. We also compared the effects of spacer lengths of single-guide RNAs (sgRNAs; 18–21 nt for SpCas9 and 19–23 nt for SaCas9) and found that the optimal spacer lengths were 20 nt and 21 nt for SpCas9 and SaCas9, respectively. However, the optimal spacer length for a particular sgRNA was 18–21 nt for SpCas9 and 21–22 nt for SaCas9. Furthermore, SpCas9 exhibited a more substantial bias than SaCas9 for nonhomologous end-joining (NHEJ) +1 insertion at the fourth nucleotide upstream of the protospacer adjacent motif (PAM), indicating a characteristic of a staggered cut. Accordingly, editing with SaCas9 led to higher efficiencies of NHEJ-mediated double-stranded oligodeoxynucleotide (dsODN) insertion or homology-directed repair (HDR)-mediated adeno-associated virus serotype 6 (AAV6) donor knock-in. Finally, GUIDE-seq analysis revealed that SaCas9 exhibited significantly reduced off-target effects compared with SpCas9. Our work indicates the superior performance of SaCas9 to SpCas9 in transgene integration-based therapeutic gene editing and the necessity to identify the optimal spacer length to achieve desired editing results.
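Research question: CRISPR-Cas9 has become a popular tool in gene editing. Cas9 proteins come in different types depending on their source. SpCas9 from Streptococcus pyogenes and SaCas9 from Staphylococcus aureus are the two most widely used in gene editing, but they differ in performance and properties, and a detailed comparison of their editing outcomes has been lacking. A systematic comparison of SpCas9 and SaCas9 can reveal their strengths and limitations in gene editing and offer further clues on how to optimize editing strategies, which is important for developing more efficient gene editing tools with different properties and adaptability.
Methods: This study rigorously compared the editing performance of SaCas9 and SpCas9 at 11 target sites with potential for clinical application in human induced pluripotent stem cells (iPSCs) and K562 cells. A detailed comparative analysis of editing efficiency, optimal sgRNA length, DNA double-strand break (DSB) repair patterns, and off-target activity indicates that SaCas9 has distinct advantages in gene editing and clinical gene therapy.
Main result 1: Detailed analysis of the editing outcomes at 11 specific sites in human iPSCs and K562 cells showed that SaCas9 edits with higher efficiency than SpCas9.
Main result 2: The optimal sgRNA lengths for SpCas9 and SaCas9 are 20 nt and 21 nt, respectively. However, the optimum can vary with the particular sgRNA used, ranging over 18–21 nt for SpCas9 and 21–22 nt for SaCas9.
Main result 3: The DSB repair patterns and outcomes differ significantly after editing with SpCas9 and SaCas9. Compared with SaCas9, SpCas9 shows a stronger bias toward nonhomologous end-joining (NHEJ) +1 insertion. Accordingly, editing with SaCas9 gives higher efficiencies of NHEJ-mediated double-stranded oligodeoxynucleotide (dsODN) insertion and of homology-directed repair (HDR)-mediated knock-in with an adeno-associated virus serotype 6 (AAV6) donor.
Main result 4: Analysis with GUIDE-seq, a genome-wide off-target detection method, showed that SaCas9 has higher specificity than SpCas9.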
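The spacer lengths compared above are measured immediately upstream of the PAM. As a rough illustration of how candidate target sites could be enumerated for the two nucleases, here is a minimal sketch. It assumes the canonical PAMs (NGG for SpCas9, NNGRRT for SaCas9), which are not stated in the abstract, scans the forward strand only, and uses the reported optimal spacer lengths (20 nt and 21 nt) as defaults; `find_sites` is a hypothetical helper, not part of the authors' workflow.

```python
import re

# Assumed canonical PAMs written as regular expressions (N = any base, R = A or G).
PAM_PATTERNS = {"SpCas9": "[ACGT]GG", "SaCas9": "[ACGT]{2}G[AG][AG]T"}
# Default spacer lengths taken from the optimal values reported in the abstract.
DEFAULT_SPACER_LEN = {"SpCas9": 20, "SaCas9": 21}


def find_sites(seq, nuclease, spacer_len=None):
    """Return (spacer_start, spacer, pam) tuples for forward-strand sites."""
    seq = seq.upper()
    n = spacer_len or DEFAULT_SPACER_LEN[nuclease]
    # A lookahead lets overlapping PAM occurrences all be reported.
    pam_re = re.compile(f"(?=({PAM_PATTERNS[nuclease]}))")
    sites = []
    for match in pam_re.finditer(seq):
        pam_start = match.start()
        if pam_start >= n:  # need a full-length spacer upstream of the PAM
            sites.append((pam_start - n, seq[pam_start - n:pam_start], match.group(1)))
    return sites


for site in find_sites("A" * 20 + "TGG", "SpCas9"):
    print(site)
```

A fuller tool would also scan the reverse complement and score guides; this sketch only shows the spacer-plus-PAM geometry.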

Page 1206-1220


Original Research

GREPore-seq: A Robust Workflow to Detect Changes After Gene Editing Through Long-range PCR and Nanopore Sequencing

Zi-Jun Quan, Si-Ang Li, Zhi-Xue Yang, Juan-Juan Zhao, Guo-Hua Li, Feng Zhang, Wei Wen, Tao Cheng, Xiao-Bing Zhang

To achieve the enormous potential of gene-editing technology in clinical therapies, one needs to evaluate both the on-target efficiency and unintended editing consequences comprehensively. However, there is a lack of a pipelined, large-scale, and economical workflow for detecting genome editing outcomes, in particular insertion or deletion of a large fragment. Here, we describe an approach for efficient and accurate detection of multiple genetic changes after CRISPR/Cas9 editing by pooled nanopore sequencing of barcoded long-range PCR products. Recognizing the high error rates of Oxford nanopore sequencing, we developed a novel pipeline to capture the barcoded sequences by grepping reads of nanopore amplicon sequencing (GREPore-seq). GREPore-seq can assess nonhomologous end-joining (NHEJ)-mediated double-stranded oligodeoxynucleotide (dsODN) insertions with comparable accuracy to Illumina next-generation sequencing (NGS). GREPore-seq also reveals a full spectrum of homology-directed repair (HDR)-mediated large gene knock-in, correlating well with the fluorescence-activated cell sorting (FACS) analysis results. Of note, we discovered low-level fragmented and full-length plasmid backbone insertion at the CRISPR cutting site. Therefore, we have established a practical workflow to evaluate various genetic changes, including quantifying insertions of short dsODNs, knock-ins of long pieces, plasmid insertions, and large fragment deletions after CRISPR/Cas9-mediated editing. GREPore-seq is freely available at GitHub (https://github.com/lisiang/GREPore-seq) and the National Genomics Data Center (NGDC) BioCode (https://ngdc.cncb.ac.cn/biocode/tools/BT007293).
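The idea of capturing barcoded reads by "grepping" for short sequences can be sketched as follows. This is an illustrative simplification under stated assumptions, not the GREPore-seq implementation: the k-mer length, step, reference sequences, and the rule of assigning a read only when exactly one sample matches are all hypothetical choices.

```python
def kmers(seq, k, step):
    """Overlapping k-mers of seq, taken every `step` bases."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def assign_read(read, barcode_refs, k=8, step=2):
    """Assign a read to the single sample whose barcode k-mers it contains.

    barcode_refs maps a sample name to its barcode-bearing primer sequence.
    Returns None when no sample, or more than one sample, matches.
    """
    hits = []
    for sample, ref in barcode_refs.items():
        probes = kmers(ref, k, step)
        probes += [revcomp(p) for p in probes]  # reads may come from either strand
        if any(p in read for p in probes):
            hits.append(sample)
    return hits[0] if len(hits) == 1 else None


refs = {"sample1": "ACGTACGTTGCA", "sample2": "TTTTGGGGCCCCAA"}
print(assign_read("GGGACGTACGTTGCAAAAAAC", refs))
```

Matching on several short k-mers rather than the full barcode is what gives tolerance to the scattered errors typical of nanopore reads: a read still matches if any one probe survives intact.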
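Research questions: How can long-amplicon sequencing data carrying barcode tags be demultiplexed efficiently? After various kinds of gene editing, how can large-fragment mutations at the target site (including HDR- and NHEJ-mediated large insertions and large deletions) be quantified accurately?
Approach: Tailored to the characteristics of nanopore sequencing, a new GREPseq demultiplexing algorithm was developed that sorts the sequencing data in two steps using a series of short k-mers of specified length and step. We also propose analysis methods that quantify the efficiency of large insertions or deletions either by counting the proportion of reads that contain the insertion or deletion among all reads, or after long-read alignment with Minimap2.
Main result 1: Grepseq and BCprimer-seq correctly extract and demultiplex the sequencing data of pooled samples.
Main result 2: A series of mutually overlapping short sequences generated from the expected insert is used to quantify NHEJ-mediated insertions at the target site (NHEJ-mediated ssODN insertion and NHEJ-mediated plasmid backbone insertion).
Main result 3: Large insertions and deletions at the target site are quantified from the CIGAR strings in the alignment files produced by aligning the samples to the reference sequence.
Data links: https://bigd.big.ac.cn/gsa-human/browse/HRA001801 https://bigd.big.ac.cn/gsa-human/browse/HRA001802
Code link: https://github.com/lisiang/GREPore-seq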
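The CIGAR-based quantification mentioned above can be illustrated with a short sketch. It assumes the CIGAR strings have already been extracted from an alignment file (for example, column 6 of SAM output from Minimap2); the 50-bp threshold for calling an event "large" and the function names are hypothetical, not values from the paper.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def largest_indels(cigar):
    """Return (largest insertion, largest deletion) lengths in one CIGAR string."""
    max_ins = max_del = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            max_ins = max(max_ins, n)
        elif op == "D":
            max_del = max(max_del, n)
    return max_ins, max_del


def large_indel_fractions(cigars, min_len=50):
    """Fractions of reads carrying an insertion or a deletion of at least min_len bases."""
    total = with_ins = with_del = 0
    for cigar in cigars:
        total += 1
        ins, dele = largest_indels(cigar)
        with_ins += ins >= min_len
        with_del += dele >= min_len
    if total == 0:
        return 0.0, 0.0
    return with_ins / total, with_del / total


example = ["1500M", "700M820I800M", "400M1200D300M", "10S690M2I798M"]
print(large_indel_fractions(example))
```

Small indels such as the 2-bp insertion in the last read fall below the threshold, which keeps ordinary nanopore sequencing errors from being counted as editing events.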

Page 1221-1236


Application Note

Systematic Exploration of Optimized Base Editing gRNA Design and Pleiotropic Effects with BExplorer

Gongchen Zhang, Chenyu Zhu, Xiaohan Chen, Jifang Yan, Dongyu Xue, Zixuan Wei, Guohui Chuai, Qi Liu

Base editing technology is being increasingly applied in genome engineering, but the current strategy for designing guide RNAs (gRNAs) relies substantially on empirical experience rather than a dependable and efficient in silico design. Furthermore, the pleiotropic effect of base editing on disease treatment remains unexplored, which prevents its further clinical usage. Here, we presented BExplorer, an integrated and comprehensive computational pipeline to optimize the design of gRNAs for 26 existing types of base editors in silico. Using BExplorer, we described its results for two types of mainstream base editors, BE3 and ABE7.10, and evaluated the pleiotropic effects of the corresponding base editing loci. BExplorer revealed 524 and 900 editable pathogenic single nucleotide polymorphism (SNP) loci in the human genome together with the selected optimized gRNAs for BE3 and ABE7.10, respectively. In addition, the impact of 707 edited pathogenic SNP loci following base editing on 131 diseases was systematically explored by revealing their pleiotropic effects, indicating that base editing should be carefully utilized given the potential pleiotropic effects. Collectively, the systematic exploration of optimized base editing gRNA design and the corresponding pleiotropic effects with BExplorer provides a computational basis for applying base editing in disease treatment.

Page 1237-1245


Letter

The Proteome Landscape of Human Placentas for Monochorionic Twins with Selective Intrauterine Growth Restriction

Xin-Lu Meng, Peng-Bo Yuan, Xue-Ju Wang, Jing Hang, Xiao-Ming Shi, Yang-Yu Zhao, Yuan Wei

In perinatal medicine, intrauterine growth restriction (IUGR) is one of the greatest challenges. The etiology of IUGR is multifactorial, but most cases are thought to arise from placental insufficiency. However, identifying the placental cause of IUGR can be difficult due to numerous confounding factors. Selective IUGR (sIUGR) would be a good model to investigate how impaired placentation affects fetal development, as the growth discordance between monochorionic twins cannot be explained by confounding genetic or maternal factors. Herein, we constructed and analyzed the placental proteomic profiles of IUGR twins and normal cotwins. Specifically, we identified a total of 5481 proteins, of which 233 were differentially expressed (57 up-regulated and 176 down-regulated) in IUGR twins. Bioinformatics analysis indicates that these differentially expressed proteins (DEPs) are mainly associated with cardiovascular system development and function, organismal survival, and organismal development. Notably, 34 DEPs are significantly enriched in angiogenesis, and diminished placental angiogenesis in IUGR twins has been further elaborately confirmed. Moreover, we found decreased expression of metadherin (MTDH) in the placentas of IUGR twins and demonstrated that MTDH contributes to placental angiogenesis and fetal growth in vitro. Collectively, our findings reveal the comprehensive proteomic signatures of placentas for sIUGR twins, and the DEPs identified may provide in-depth insights into the pathogenesis of placental dysfunction and subsequent impaired fetal growth.
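Intrauterine growth restriction (IUGR), in which a fetus fails to reach its full growth potential in the uterus because of various factors, is one of the most common complications of pregnancy, with a global incidence of about 7%, and is closely associated with a range of adverse perinatal outcomes. Multiple reports have shown that placental insufficiency, with inadequate delivery of oxygen and nutrients to the growing fetus, is the main cause of the disease. However, because of many confounding genetic and maternal factors, the molecular mechanisms of placental insufficiency are difficult to determine. Selective intrauterine growth restriction (sIUGR) refers to a monochorionic (MC) twin pregnancy in which one fetus is growth restricted while the other develops normally. It is a complication unique to MC twin pregnancies, with an incidence of about 10%–15%. sIUGR twins have the same genetic background and share the maternal environment, which helps exclude genetic and maternal confounders. Studying the placentas of sIUGR twins can therefore provide insight into the potential molecular mechanisms of placental insufficiency and fetal growth restriction. In this study, we collected placental tissue from MC twin pregnancies complicated by sIUGR, mapped the placental proteome of sIUGR twins using quantitative proteomics, and used bioinformatics analysis to reveal the biological processes and functions significantly enriched among the differentially expressed proteins. In vitro cell experiments and transcriptome sequencing were used to explore the possible mechanisms by which candidate molecules act in the development of the disease.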

Page 1246-1259


Perspective

Cancer Is A Survival Process under Persistent Microenvironmental and Cellular Stresses

Renbo Tan, Yi Zhou, Zheng An, Ying Xu

No abstract

Page 1260-1265


Review Article

A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction

Farzaneh Esmaili, Mahdi Pourmirzaei, Shahin Ramazi, Seyedehsamaneh Shojaeilangari, Elham Yavari

Post-translational modifications (PTMs) have key roles in extending the functional diversity of proteins and, as a result, regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation modification is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize this body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. First, we comprehensively review all related databases and introduce all steps regarding dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which are divided into two computational groups: algorithmic and machine learning (ML). Additionally, we show that there are two main approaches for p-site prediction by ML, conventional and end-to-end deep learning methods, and give an overview of both. Moreover, this review introduces the most important feature extraction techniques, which have mostly been used in p-site prediction. Finally, we create three test sets from new proteins related to the released version of the database of protein post-translational modifications (dbPTM) in 2022 based on general and human species. Evaluating online p-site prediction tools on newly added proteins introduced in the dbPTM 2022 release, distinct from those in the dbPTM 2019 release, reveals their limitations. In other words, the actual performance of these online p-site prediction tools on unseen proteins is notably lower than the results reported in their respective research papers.

Page 1266-1285