Articles Online (Volume 23, Issue 3)

Research Highlight

Foundation Model: A New Era for Plant Single-cell Genomics

Yuansong Zeng (曾远松), Yuedong Yang (杨跃东)

no abstract

Page qzaf059


Research Highlight

Haplotype-based Pangenomics: A Blueprint for Climate Adaptation in Plants

Wanfei Liu (刘万飞), Peng Cui (崔鹏)

no abstract

Page qzaf023


Review Article

CRISPR Technology and Its Emerging Applications

Xuejing Zhang, Dongyuan Ma, Feng Liu

The discovery and iteration of clustered regularly interspaced short palindromic repeats (CRISPR) systems have revolutionized genome editing due to their remarkable efficiency and easy programmability, enabling precise manipulation of genomic elements. Owing to these unique advantages, CRISPR technology has the transformative potential to elucidate biological mechanisms and develop clinical treatments. This review provides a comprehensive overview of the development and applications of CRISPR technology. After describing the three primary CRISPR-Cas systems — CRISPR-associated protein 9 (Cas9) and Cas12a targeting DNA, and Cas13 targeting RNA — which serve as the cornerstone for technological advancements, we describe a series of novel CRISPR-Cas systems that offer new avenues for research, and then explore the applications of CRISPR technology in large-scale genetic screening, lineage tracing, genetic diagnosis, and gene therapy. As this technology evolves, it holds significant promise for studying gene functions and treating human diseases in the near future.
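As a revolutionary gene-editing tool, CRISPR technology enables precise manipulation of genomic sequences owing to its high efficiency and versatility. This review outlines CRISPR technology and its applications, highlighting improvements to the technology and its prospects in emerging areas such as large-scale genetic screening, lineage tracing, genetic diagnosis, and gene therapy. As the technology continues to advance, CRISPR is expected to show even greater potential in dissecting biological mechanisms and in clinical applications.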

Page qzaf034


Review Article

Characterization of Tumor Antigens from Multi-omics Data: Computational Approaches and Resources

Yunzhe Wang (王韫哲), James Wengler, Yuzhu Fang (房钰竹), Joseph Zhou, Hang Ruan, Zhao Zhang, Leng Han

Tumor-specific antigens, also known as neoantigens, have potential utility in anti-cancer immunotherapy, including immune checkpoint blockade (ICB), neoantigen-specific T cell receptor-engineered T (TCR-T) cell therapy, chimeric antigen receptor T (CAR-T) cell therapy, and therapeutic cancer vaccines (TCVs). After recognizing presented neoantigens, the immune system becomes activated and triggers the death of tumor cells. Neoantigens may be derived from multiple origins, including somatic mutations (single nucleotide variants, insertions/deletions, and gene fusions), circular RNAs, alternative splicing, RNA editing, and polymorphic microbiomes. An increasing number of bioinformatics tools and algorithms are being developed to predict tumor neoantigens derived from different sources, which may require different multi-omics data as input. In addition, calculating the peptide–major histocompatibility complex (MHC) affinity can aid in selecting putative neoantigens, as high binding affinities facilitate antigen presentation. Based on these approaches and previous experiments, many resources have been developed to reveal the landscape of tumor neoantigens across multiple cancer types. Herein, we summarize these tools, algorithms, and resources to provide an overview of computational analysis for neoantigen discovery and prioritization, as well as the future development of potential clinical utilities in this field.
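Tumor-specific antigens (also known as neoantigens) hold important potential in anti-cancer immunotherapy, including immune checkpoint blockade (ICB), neoantigen-specific T cell receptor-engineered T cells (TCR-T), chimeric antigen receptor T cells (CAR-T), and therapeutic cancer vaccines (TCVs). Once the immune system recognizes presented neoantigens, it is activated and triggers tumor cell death. Neoantigens may arise from multiple sources, including somatic mutations (single nucleotide variants, insertions/deletions, and gene fusions), circular RNAs, alternative splicing, RNA editing, and polymorphic microbiomes. A growing number of bioinformatics tools and algorithms are used to predict tumor neoantigens from different sources, and these tools may require different multi-omics data as input. In addition, calculating the affinity of peptides for the major histocompatibility complex (MHC), or for human leukocyte antigen (HLA) in humans, can help screen candidate neoantigens, since high binding affinity facilitates antigen presentation. This review summarizes these tools, algorithms, and resources, providing an analytical overview of neoantigen discovery and prioritization as well as future directions for potential clinical applications in this field.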
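
The affinity-based selection step described above can be illustrated with a short sketch. It is not the method of any tool covered by the review: the peptide sequences, allele labels, and IC50 values are hypothetical placeholders, and the 500 nM cutoff is a commonly used convention for calling a peptide a binder, assumed here rather than taken from the abstract. A lower predicted IC50 means a higher binding affinity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    peptide: str      # hypothetical mutant peptide sequence
    allele: str       # MHC/HLA allele the prediction refers to
    ic50_nm: float    # predicted IC50 in nM; lower means stronger binding


def prioritize(candidates, max_ic50_nm=500.0):
    """Keep predicted binders and rank them from strongest to weakest."""
    binders = [c for c in candidates if c.ic50_nm <= max_ic50_nm]
    return sorted(binders, key=lambda c: c.ic50_nm)


# Hypothetical predictions, for illustration only.
candidates = [
    Candidate("KLDETGNSV", "HLA-A*02:01", 120.0),
    Candidate("QYNPIRTTF", "HLA-A*24:02", 35.0),
    Candidate("GADGVGKSA", "HLA-A*02:01", 8200.0),
]

ranked = prioritize(candidates)
for c in ranked:
    print(f"{c.peptide}\t{c.allele}\t{c.ic50_nm:.0f} nM")
```

In practice the affinity values would come from a dedicated predictor, and affinity is only one of several criteria used to prioritize candidates.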

Page qzaf001


Original Research

LCORL and STC2 Variants Increase Body Size and Growth Rate in Cattle and Other Animals

Fengting Bai (白凤庭), Yudong Cai (蔡钰东), Min Qiu (邱敏), Chen Liang (梁晨), Linqian Pan (潘麟茜), Yayi Liu (刘雅怡), Yanshuai Feng (冯衍帅), Xuesha Cao (曹雪莎), Qimeng Yang (杨启蒙), Gang Ren (任刚), Shaohua Jiao (焦少华), Siqi Gao (高思祺), Meixuan Lu (卢美轩), Xihong Wang (王喜宏), Rasmus Heller, Johannes A Lenstra, Yu Jiang (姜雨)

Natural variants can significantly improve growth traits in livestock and serve as safe targets for gene editing, thus being applied in animal molecular design breeding. However, such safe and large-effect mutations are severely lacking. Using ancestral recombination graphs, we investigated recent selection signatures in beef cattle breeds, pinpointing sweep-driving variants in the LCORL and STC2 loci with notable effects on body size and growth rate. The ACT-to-A frameshift mutation in LCORL occurs mainly in central-European cattle, and stimulates growth. Remarkably, convergent truncating mutations were also found in commercial breeds of sheep, goats, pigs, horses, dogs, rabbits, and chickens. In the STC2 gene, we identified a missense mutation (A60P) located within the conserved region across vertebrates. We validated the two natural mutations in gene-edited mouse models, where both variants in homozygous carriers significantly increase the average weight by 11%. Our findings provide insights into a seemingly recurring gene target of body size enhancing truncating mutations across domesticated species, and offer valuable targets for gene editing-based breeding in animals.

Page qzaf025


Method

clusIBD: Robust Detection of Identity-by-descent Segments Using Unphased Genetic Data from Poor-quality Samples

Ran Li (李燃), Yu Zang (臧钰), Zhentang Liu (刘震棠), Jingyi Yang (杨静怡), Nana Wang (汪娜娜), Jiajun Liu (刘佳俊), Enlin Wu (吴恩霖), Riga Wu (乌日嘎), Hongyu Sun (孙宏钰)

The detection of identity-by-descent (IBD) segments is widely used to infer relatedness in many fields, including forensics and ancient DNA analysis. However, existing methods are often ineffective for poor-quality DNA samples. Here, we propose a method, clusIBD, which can robustly detect IBD segments using unphased genetic data with a high rate of genotyping error. We evaluated and compared the performance of clusIBD with that of IBIS, TRUFFLE, and IBDseq using simulated data, artificial poor-quality materials, and ancient DNA samples. The results show that clusIBD outperforms these existing tools and could be used for kinship inference in fields such as ancient DNA analysis and criminal investigation. clusIBD is publicly available at GitHub (https://github.com/Ryan620/clusIBD/) and BioCode (https://ngdc.cncb.ac.cn/biocode/tool/BT007882).
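Research question: Inferring relatedness from identity-by-descent (IBD) segments is widely used in studies of biological classification, species conservation, disease mechanisms, and human evolution. However, existing methods struggle to identify IBD segments accurately in the poor-quality DNA samples encountered in forensic and paleontological applications (e.g., crime-scene samples and ancient human remains). There is therefore an urgent need for IBD detection algorithms suited to genotyping data from trace, degraded, or contaminated DNA, so that the genetic information in such samples can be fully exploited.
Methods: This study proposes a new IBD segment detection algorithm, clusIBD, which detects IBD segments by identifying clusters of DNA regions between samples that are characterized by a low frequency of opposite homozygous genotypes (i.e., discordant homozygous genotype calls). Using simulated data, artificially prepared poor-quality samples, and ancient DNA samples, the study systematically evaluated the performance of clusIBD and compared it with existing mainstream tools (e.g., IBIS, TRUFFLE, and IBDseq).
Main results: Compared with existing algorithms and software, clusIBD identifies IBD segments more accurately in trace, degraded, and other poor-quality DNA samples, suggesting broad application prospects for kinship identification and pedigree analysis in forensic science, paleontology, and related fields.
Software code: GitHub repository (https://github.com/Ryan620/clusIBD/) and BioCode platform (https://ngdc.cncb.ac.cn/biocode/tool/BT007882)
Analysis code: https://github.com/Ryan620/clusIBD/tree/main/source_code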
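
The idea of using a low frequency of opposite homozygous genotypes as a signal of shared ancestry can be illustrated with a simplified sketch. This is not the clusIBD implementation: the genotype coding (0, 1, 2 alternate-allele counts), the fixed marker windows, the threshold, and the function names are all assumptions made for illustration, and the actual clustering procedure is described only in the paper and repository.

```python
def opposite_homozygous(g1, g2):
    """Flag markers where one sample is 0 (hom-ref) and the other is 2 (hom-alt)."""
    return [(a, b) in ((0, 2), (2, 0)) for a, b in zip(g1, g2)]


def candidate_segments(g1, g2, window=5, max_rate=0.0):
    """Return half-open marker-index intervals whose windows have a low
    opposite-homozygous rate. Trailing markers that do not fill a whole
    window are ignored in this simplified version."""
    flags = opposite_homozygous(g1, g2)
    segments = []
    start = None
    end = None
    for w_start in range(0, len(flags) - window + 1, window):
        rate = sum(flags[w_start:w_start + window]) / window
        if rate <= max_rate:
            if start is None:
                start = w_start          # open a new candidate segment
            end = w_start + window       # extend the current segment
        elif start is not None:
            segments.append((start, end))  # close the segment at a noisy window
            start = None
    if start is not None:
        segments.append((start, end))
    return segments


# Hypothetical unphased genotypes for two samples at 15 markers.
g1 = [0, 1, 2, 2, 0, 0, 0, 1, 2, 2, 0, 2, 0, 2, 1]
g2 = [0, 1, 2, 1, 0, 1, 2, 1, 2, 2, 2, 0, 2, 2, 1]

# Allowing one discordant call per five markers tolerates genotyping error.
print(candidate_segments(g1, g2, window=5, max_rate=0.2))
```

A nonzero `max_rate` is what lets a window survive an isolated genotyping error, which is the situation the abstract describes for poor-quality samples.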

Page qzaf055


Database

TRAIT: A Comprehensive Database for T-cell Receptor–antigen Interactions

Mengmeng Wei (魏蒙蒙), Jingcheng Wu (吴静成), Shengzuo Bai (白晟佐), Yuxuan Zhou (周宇轩), Yichang Chen (陈弈菖), Xue Zhang (张雪), Wenyi Zhao (赵文艺), Ying Chi (迟颖), Gang Pan (潘纲), Feng Zhu (朱峰), Shuqing Chen (陈枢青), Zhan Zhou (周展)

Comprehensive and integrated resources on interactions between T-cell receptors (TCRs) and antigens are still lacking for adoptive T-cell-based immunotherapies, highlighting a significant gap that must be addressed to fully understand the mechanisms of antigen recognition by T cells. In this study, we present the T-cell receptor–antigen interaction database (TRAIT), a comprehensive database that profiles the interactions between TCRs and antigens. TRAIT stands out due to its comprehensive description of TCR–antigen interactions by integrating sequences, structures, and affinities. It provides millions of experimentally validated TCR–antigen pairs, resulting in an exhaustive landscape of antigen-specific TCRs. Notably, TRAIT emphasizes single-cell omics as a major reliable data source for TCR–antigen interactions and includes millions of reliable non-interactive TCRs. Additionally, it thoroughly documents the interactions between mutants of TCRs and antigens, thereby benefiting affinity optimization of engineered TCRs as well as vaccine design. TCRs in clinical trials are also provided. With the significant efforts made toward elucidating the complex interactions between TCRs and antigens, TRAIT is expected to ultimately contribute to superior algorithms and substantial advancements in the field of T-cell-based immunotherapies. TRAIT is freely accessible at https://pgx.zju.edu.cn/traitdb.
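Research question: T-cell receptors (TCRs) mediate the specific recognition of pathogen-derived epitopes through their interaction with peptide–major histocompatibility complex (MHC) and provide the initial signal for T-cell activation in the adaptive immune system. Decoding the antigen specificity of TCRs is therefore crucial for understanding adaptive immunity and is expected to bring major advances in translational medicine. Research on the molecular mechanisms of TCR–antigen recognition and the development of related algorithms has become a hot topic in T-cell immunology, intensifying the need for data that characterize specific TCR–antigen recognition. This study presents a new T-cell receptor–antigen interaction database, TRAIT (https://pgx.zju.edu.cn/traitdb), which aims to provide information on antigen-specific TCRs that integrates sequences, structures, affinities, mutants, and clinical evaluation, comprehensively characterizing the interactions between TCRs and antigens.
Methods: Experimentally validated antigen-specific TCRs were collected from published literature, public databases, omics platforms, and other sources, covering comprehensive information on TCR–antigen interactions including sequences, structures, binding affinities, mutants, and clinical trials. The collected data were cleaned and curated, then classified and integrated along three stages of TCR development: discovery (native TCR sequences), optimization (TCR and antigen mutants), and clinical evaluation (clinical trials). Following this classification, the database interface provides four main functional pages: "Search", "Omics", "Mutations", and "Therapeutics".
Main results: This study obtained 3,393,826 TCR sequences with validated antigen specificity, targeting 1184 unique epitopes and 112 MHC alleles. In particular, it collected and provided, for the first time, 3,342,225 TCR–antigen pairs experimentally confirmed as non-binding. Second, structural information for 223 TCR–antigen complexes was systematically collected from the Protein Data Bank (PDB). Next, through manual retrieval and curation, 945 entries of 3D affinity information for TCR–antigen pairs (including mutants) were obtained. Finally, an exhaustive search of TCR-T cells currently in clinical trials identified 34 clinical trials of 65 antigen-specific TCRs. TRAIT covers the entire discovery, optimization, and clinical evaluation process for TCRs and, for the first time, integrates multidimensional information including sequences, structures, affinities, and mutants, with the aim of comprehensively depicting the landscape of TCR–antigen interactions.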

Page qzaf033


Database

CircAge: A Comprehensive Resource for Aging-associated Circular RNAs Across Species and Tissues

Xin Dong, Zhen Zhou, Yanan Wang, Ayesha Nisar, Shaoyan Pu, Longbao Lv, Yijiang Li, Xuemei Lu, Yonghan He

Circular RNAs (circRNAs) represent a novel class of RNA molecules characterized by a circular structure and enhanced stability. Emerging evidence indicates that circRNAs play pivotal regulatory roles in the aging process. However, a systematic resource that integrates aging-associated circRNA data remains lacking. Therefore, we developed a comprehensive database, CircAge, which encompasses 756 aging-related samples from 7 species and 24 tissue types. Through high-throughput sequencing, we also generated 47 new tissue samples from mice and rhesus monkeys. By integrating predictions from multiple bioinformatics tools, we identified over 529,856 unique circRNAs. Our data analysis revealed a general increase in circRNA expression levels with age, with approximately 23% of circRNAs demonstrating sequence conservation across species. The CircAge database systematically predicts potential interactions between circRNAs, microRNAs (miRNAs), and RNA-binding proteins (RBPs), and assesses the coding potential of circRNAs. This resource lays a foundation for elucidating the regulatory mechanisms of circRNAs in aging. As a comprehensive repository of aging-associated circRNAs, CircAge will significantly accelerate research in this field, facilitating the discovery of novel biomarkers and therapeutic targets for aging biology and supporting the development of diagnostic and therapeutic strategies for aging and age-related diseases. CircAge is publicly available at https://circage.kiz.ac.cn.
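Background and research question: Circular RNAs (circRNAs), a new class of non-coding RNAs with a closed-loop structure and high stability, have attracted considerable attention in the life sciences in recent years. As aging research deepens, growing evidence indicates that circRNAs play key regulatory roles in the aging process. However, understanding of the specific mechanisms by which circRNAs act during aging remains limited. Although databases such as circBase, CSCD2, TSCD, and MiOncoCirc contain rich circRNA sequence and functional information, they have not systematically integrated aging-related samples with circRNA information. This greatly limits in-depth research on, and application of, circRNAs in aging biology, so a dedicated database focused on aging-associated circRNAs is urgently needed.
Methods and database construction: With the goal of building a dedicated database of aging-associated circRNAs, this study systematically collected and analyzed aging-related circRNA sequencing data from public databases. Public sequencing data were included from 7 species, namely human (H. sapiens), rhesus macaque (M. mulatta), mouse (M. musculus), rat (R. norvegicus), fruit fly (D. melanogaster), nematode (C. elegans), and zebrafish (D. rerio), covering 24 tissue types such as liver, heart, and brain. To further enrich the resource, the team additionally sequenced 47 tissue samples from mice and rhesus monkeys, ultimately obtaining a total of 803 high-quality samples. The data were then rigorously cleaned and aligned, and classic algorithms were used for accurate circRNA identification and standardized counting, ensuring accuracy and reliability. To further reveal the functional potential of circRNAs, the team combined multiple circRNA identification tools, functional prediction algorithms, and cross-species alignment to build the CircAge database. The database integrates multidimensional annotations including circRNA expression profiles, sequence conservation, interactions with miRNAs and RNA-binding proteins (RBPs), and coding potential, providing comprehensive and in-depth data support for circRNA research.
Main results:
Result 1. Comprehensive integration of sample resources, filling a gap in the field. CircAge systematically integrates 803 sequencing samples from 7 species and 24 tissue types and contains more than 520,000 non-redundant circRNAs. It fills the current lack of a systematic database of aging-associated circRNAs and provides a rich data resource for aging biology research.
Result 2. In-depth analysis of circRNA characteristics and construction of regulatory networks. CircAge annotates in detail the expression trends of circRNAs across species and age stages, compares the sequence conservation of circRNAs across species, and systematically predicts circRNA interactions with miRNAs and RBPs as well as circRNA coding potential, yielding circRNA regulatory networks. This gives researchers a powerful tool for exploring the potential mechanisms of circRNAs in the regulation of aging.
Result 3. A convenient interactive interface that promotes academic exchange. CircAge provides a visual interface and efficient search functions, allowing users to retrieve circRNA information by species, tissue, gene, and other criteria. It also offers regulatory network diagrams, functional prediction results, and expression distribution plots, and all data are freely available to the public. This makes it much easier for researchers to obtain and analyze results and helps promote the application of circRNAs in research on aging biology and related diseases.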

Page qzaf044


Database

EPSD 2.0: An Updated Database of Protein Phosphorylation Sites Across Eukaryotic Species

Miaomiao Chen, Yujie Gou, Ming Lei, Leming Xiao, Miaoying Zhao, Xinhe Huang, Dan Liu, Zihao Feng, Di Peng, Yu Xue

As one of the most crucial post-translational modifications, protein phosphorylation regulates a broad range of biological processes in eukaryotes. Biocuration, integration, and annotation of reported phosphorylation events will deliver a valuable resource for the community. Here, we present an updated database, the eukaryotic phosphorylation site database 2.0 (EPSD 2.0), which includes 2,769,163 experimentally identified phosphorylation sites (p-sites) in 362,707 phosphoproteins from 223 eukaryotes. From the literature, 873,718 new p-sites identified through high-throughput phosphoproteomic research were first collected, and 1,078,888 original phosphopeptides together with primary references were retained. Then, this dataset was merged with EPSD 1.0, which comprised 1,616,804 p-sites within 209,326 proteins across 68 eukaryotic organisms. We also integrated 362,190 additional known p-sites from 10 public databases. After redundancy clearance, we manually re-checked each p-site and annotated 88,074 functional events for 32,762 p-sites, covering 58 types of downstream effects on phosphoproteins and regulatory impacts on 107 biological processes. In addition, phosphoproteins and p-sites in 8 model organisms were meticulously annotated using information supplied by 100 external platforms encompassing 15 areas. These areas included kinases/phosphatases, transcription regulators, three-dimensional structures, physicochemical characteristics, genomic variations, functional descriptions, protein domains, molecular interactions, drug–target associations, disease-related data, orthologs, transcript expression levels, proteomics, subcellular localization, and regulatory pathways. We expect that EPSD 2.0 will become a useful database supporting comprehensive studies on phosphorylation in eukaryotes. The EPSD 2.0 database is freely accessible online at https://epsd.biocuckoo.cn/.

Page qzaf057