Method
Deconer: An Evaluation Toolkit for Reference-based Deconvolution Methods Using Gene Expression Data
Wei Zhang, Xianglin Zhang, Qiao Liu, Lei Wei, Xu Qiao, Rui Gao, Zhiping Liu, Xiaowo Wang
In recent years, computational methods for quantifying cell-type proportions from transcription data have gained significant attention, particularly reference-based methods, which have demonstrated high accuracy. However, comprehensive evaluation of, and guidance on, the available reference-based deconvolution methods for cell-type deconvolution analysis is currently lacking. In this study, we introduce Deconvolution Evaluator (Deconer), a comprehensive toolkit for the evaluation of reference-based deconvolution methods. Deconer provides various simulated and real gene expression datasets, including both bulk and single-cell sequencing data, and offers multiple visualization interfaces. By utilizing Deconer, we conducted systematic comparisons of 16 reference-based deconvolution methods from different perspectives, including method robustness, accuracy in deconvolving rare components, signature gene selection performance, and external reference construction capability. We also performed an in-depth analysis of the application scenarios and challenges in cell-type deconvolution methods. Finally, we provided constructive suggestions for users to select and develop cell-type deconvolution algorithms. This study provides novel insights for researchers, assisting them in choosing appropriate toolkits, applying solutions in clinical contexts, and advancing the development of deconvolution tools tailored to gene expression data. The tutorials, manual, source code, and demo data of Deconer are publicly available at https://honchkrow.github.io/Deconer/ and https://ngdc.cncb.ac.cn/biocode/tool/7577.
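Research question:
Quantifying cell-type proportions from bulk transcriptome data is a key problem in bioinformatics. With the rapid development of single-cell sequencing technologies, a growing number of methods can deconvolve the proportions of different cell types in bulk data using reference expression profiles. However, comprehensive evaluation of these deconvolution methods and guidance on their application remain insufficient. In this study, we developed Deconer (Deconvolution Evaluator), a deconvolution evaluation tool for the systematic assessment of reference-based deconvolution methods. Using Deconer, we comprehensively compared 16 state-of-the-art (SOTA) deconvolution methods and provide evidence-based suggestions for researchers selecting and developing deconvolution algorithms. The tutorials, manual, source code, and demo data of Deconer are publicly available at https://honchkrow.github.io/Deconer.
Methods:
This study systematically collected existing bulk RNA-seq datasets with known cell-type proportions for the comprehensive evaluation of deconvolution methods. To provide high-quality reference data, we also integrated scRNA-seq datasets and generated simulated data to enrich the diversity of the test datasets. On this basis, we systematically tested and compared 14 deconvolution methods based on probabilistic models or traditional machine learning, together with 2 deep learning-based deconvolution methods that have attracted much attention in recent years. By analyzing the key factors that affect deconvolution, including the data noise level, the number of cell types, and the proportion of rare cell types, this study reveals the performance characteristics of different deconvolution methods under a variety of conditions, providing an important reference for related research.
Main results:
1. Developed the R-based Deconer package, which integrates multiple automated evaluation and visualization modules and enables efficient, comprehensive evaluation of cell-type deconvolution methods and presentation of the results.
2. Provided high-quality bulk RNA-seq and scRNA-seq datasets with standardized preprocessing, giving developers of deconvolution algorithms a convenient test platform for systematic evaluation and method optimization.
3. Compared the overall performance of 16 current SOTA deconvolution algorithms and, based on the characteristics of different deconvolution scenarios, made targeted algorithm recommendations to guide related research.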
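As a rough illustration of what evaluating a deconvolution method against known cell-type proportions can look like, the sketch below scores estimated proportions per cell type with root-mean-square error and Pearson correlation. These two metrics are common choices and are an assumption here; the function names and the toy data are hypothetical and are not taken from the Deconer package, which is written in R.

```python
import math

def rmse(truth, estimate):
    """Root-mean-square error between two equal-length proportion vectors."""
    if len(truth) != len(estimate):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))

def pearson(truth, estimate):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_e = sum(estimate) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(truth, estimate))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    if var_t == 0 or var_e == 0:
        return float("nan")
    return cov / math.sqrt(var_t * var_e)

def evaluate(truth_by_type, estimate_by_type):
    """Score each cell type across samples; keys are cell-type names."""
    return {
        cell_type: {
            "rmse": rmse(truth_by_type[cell_type], estimate_by_type[cell_type]),
            "pearson": pearson(truth_by_type[cell_type], estimate_by_type[cell_type]),
        }
        for cell_type in truth_by_type
    }

# Hypothetical toy data: proportions of two cell types in four samples.
truth = {"T cell": [0.10, 0.20, 0.30, 0.40], "B cell": [0.90, 0.80, 0.70, 0.60]}
estimate = {"T cell": [0.12, 0.18, 0.33, 0.37], "B cell": [0.88, 0.82, 0.67, 0.63]}
scores = evaluate(truth, estimate)
```

Scoring per cell type rather than pooling all values makes it easier to see when a method does well on abundant components but poorly on rare ones.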
Page qzaf009
Correction
Correction to: ProtPipe: A Multifunctional Data Analysis Pipeline for Proteomics and Peptidomics
Ziyi Li, Cory A. Weller, Syed Shah, Nicholas L. Johnson, Ying Hao, Paige B. Jarreau, Jessica Roberts, Deyaan Guha, Colleen Bereda, Sydney Klaisner, Pedro Machado, Matteo Zanovello, Mercedes Prudencio, Björn Oskarsson, Nathan P. Staff, Dennis W. Dickson, Pietro Fratta, Leonard Petrucelli, Priyanka Narayan, Mark R. Cookson, Michael E. Ward, Andrew B. Singleton, Mike A. Nalls, Yue A. Qi
Page qzaf051
Database
The Updated Genome Warehouse: Enhancing Data Value, Security, and Usability to Address Data Expansion
Yingke Ma (马英克), Xuetong Zhao (赵学彤), Yaokai Jia (贾曜恺), Zhenxian Han (韩镇先), Caixia Yu (俞彩霞), Zhuojing Fan (范卓静), Zhang Zhang (章张), Jingfa Xiao (肖景发), Wenming Zhao (赵文明), Yiming Bao (鲍一明), Meili Chen (陈梅丽)
The Genome Warehouse (GWH), accessible at https://ngdc.cncb.ac.cn/gwh, is an extensively utilized public repository dedicated to the deposition, management, and sharing of genome assembly sequences, annotations, and metadata. This paper highlights noteworthy enhancements to the GWH since the 2021 version, emphasizing substantial advancements in web interfaces for data submission, database functionality updates, and resource integration. Key updates include the reannotation of released prokaryotic genomes, mirroring of genome resources from National Center for Biotechnology Information (NCBI) GenBank and Reference Sequence Database (RefSeq), integration of Poxviridae sequences, implementation of an online batch submission system, enhancements to the quality control system, advanced search capabilities, and the introduction of a controlled-access mechanism for human genome data. These improvements collectively enhance the ease and security of data submission and access as well as genome data value, thereby improving convenience and utility for researchers in the genomics field.
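Research question:
With advances in sequencing technology and rising user expectations for data services, the Genome Warehouse (GWH, https://ngdc.cncb.ac.cn/gwh) of the China National Center for Bioinformation faces many challenges. To cope with exponential data growth, ensure data quality, and improve both the value of the data and the convenience of data access, GWH has focused on upgrading data submission, the user-friendliness of its functions, and resource integration.
Main results:
GWH has undergone iterative functional upgrades, mainly including: establishing a reannotation resource for prokaryotic genomes, mirroring NCBI genome data resources, developing an online batch submission system, upgrading the data quality control system, providing a controlled-access system for human genome data, and launching advanced search. These improvements markedly enhance the user experience and make GWH a more effective complement to the International Nucleotide Sequence Database Collaboration (INSDC).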
Page qzaf010
Database
PhaSeDis: A Manually Curated Database of Phase Separation–disease Associations and Corresponding Small Molecules
Taoyu Chen (陈韬宇), Guoguo Tang (唐果菓), Tianhao Li (李天昊), Zhining Yanghong (杨宏芷宁), Chao Hou (侯超), Zezhou Du (杜泽州), Kaiqiang You (游铠强), Liwei Ma (马利伟), Tingting Li (李婷婷)
Biomacromolecules form membraneless organelles through liquid–liquid phase separation in order to regulate the efficiency of particular biochemical reactions. Dysregulation of phase separation might result in pathological condensation or sequestration of biomolecules, leading to diseases. Thus, phase separation and phase separating factors may serve as drug targets for disease treatment. Nevertheless, such associations have not yet been integrated into phase separation-related databases. Therefore, based on MloDisDB, a database for membraneless organelle factor–disease associations previously developed by our lab, we constructed PhaSeDis, the phase separation–disease association database. We increased the number of phase separation entries from 52 to 185, and supplemented the evidence provided by the original articles verifying the phase separation nature of the factors. Moreover, we included the information of interacting small molecules with low-throughput or high-throughput evidence that might serve as potential drugs for phase separation entries. PhaSeDis strives to offer comprehensive descriptions of each entry, elucidating how phase separating factors induce pathological conditions via phase separation and the mechanisms by which small molecules intervene. We believe that PhaSeDis would be very important in the application of phase separation regulation in treating related diseases. PhaSeDis is available at http://mlodis.phasep.pro.
Page qzaf014
Database
HumanTestisDB: A Comprehensive Atlas of Testicular Transcriptomes and Cellular Interactions
Mengjie Wang (汪梦杰), Laihua Li (李来花), Qing Cheng (程青), Hao Zhang (张浩), Zhaode Liu (刘昭德), Yiqiang Cui (崔益强), Jiahao Sha (沙家豪), Yan Yuan (袁艳)
Advances in single-cell technology have enabled the detailed mapping of testicular cell transcriptomes, which is essential for understanding spermatogenesis. However, the fragmented nature of age-specific data from various literature sources has hindered comprehensive analysis. To overcome this, the Human Testis Database (HumanTestisDB) was developed, consolidating multiple human testicular sequencing datasets to address this limitation. Through extensive investigation, 38 unique cell types were identified, providing a detailed perspective on cellular variety. Furthermore, the database systematically categorizes samples into eight developmental stages, offering a structured framework to comprehend the temporal dynamics of testicular development. Each stage features comprehensive maps of cell–cell interactions, elucidating the complex communication network inside the testicular microenvironment at particular developmental stages. Moreover, by facilitating comparisons of interactions among various cell types at different stages, the database enables examining alterations that occur during critical transitions in spermatogenesis. HumanTestisDB, available at https://shalab.njmu.edu.cn/humantestisdb, offers vital insights into testicular transcriptomes and cellular interactions, serving as an essential resource for advancing research in reproductive biology.
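Advances in single-cell technology have made it possible to map testicular cell transcriptomes in detail, which is essential for understanding spermatogenesis. However, age-specific single-cell transcriptome sequencing data from the testis are scattered across different publications, and this fragmentation hinders comprehensive analysis. To address this problem, we integrated multiple human testicular single-cell transcriptome sequencing datasets and developed the HumanTestisDB database. The database contains 38 cell types, providing a detailed view of testicular cell diversity. In addition, it systematically divides samples ranging from embryonic week 6 to 49 years of age into eight developmental stages, offering a structured framework for understanding the temporal dynamics of testicular development. Each developmental stage comes with a comprehensive landscape of cell–cell interactions, elucidating the complex cellular communication within the testicular microenvironment at that stage. By facilitating the comparison of interactions among cell types across developmental stages, the database allows exploration of changes in cellular communication during critical transitions in spermatogenesis. HumanTestisDB provides important insights into testicular transcriptomics and cell–cell interactions and is an important resource for research in reproductive biology. It is available at https://shalab.njmu.edu.cn/humantestisdb.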
Page qzaf015
Database
PIGOME: An Integrated and Comprehensive Multi-omics Database for Pig Functional Genomics Studies
Guohao Han (韩郭皓), Peng Yang (杨朋), Yongjin Zhang (张永进), Qiaowei Li (李巧伟), Xinhao Fan (范新浩), Ruipu Chen (陈锐朴), Chao Yan (闫超), Mu Zeng (曾木), Yalan Yang (杨亚岚), Zhonglin Tang (唐中林)
In addition to being a major source of animal protein, pigs are an important model for studying development and diseases in humans. Over the past two decades, thousands of high-throughput sequencing studies in pigs have been performed using a variety of tissues from different breeds and developmental stages. However, multi-omics databases specifically designed for pig functional genomics research are still limited. Here, we present PIGOME, a user-friendly database of pig multi-omes. PIGOME currently contains seven types of pig omics datasets, including whole-genome sequencing (WGS), RNA sequencing (RNA-seq), microRNA sequencing (miRNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), assay for transposase-accessible chromatin sequencing (ATAC-seq), bisulfite sequencing (BS-seq), and methylated RNA immunoprecipitation sequencing (MeRIP-seq), from 6901 samples and 392 projects with manually curated metadata, integrated gene annotation, and quantitative trait locus information. Furthermore, various “Explore” and “Browse” functions have been established to provide user-friendly access to omics information. PIGOME implements several tools to visualize genomic variants, gene expression, and epigenetic signals of a given gene in the pig genome, enabling efficient exploration of spatiotemporal gene expression/epigenetic patterns, functions, regulatory mechanisms, and associated economic traits. Collectively, PIGOME provides valuable resources for pig breeding and is helpful for human biomedical research. PIGOME is available at https://pigome.com.
Page qzaf016
Web Server
FarmGTEx TWAS-server: An Interactive Web Server for Customized TWAS Analysis
Zhenyang Zhang, Zitao Chen, Jinyan Teng, Shuli Liu, Qing Lin, Jun Wu, Yahui Gao, Zhonghao Bai, FarmGTEx Consortium, Bingjie Li, George Liu, Zhe Zhang, Yuchun Pan, Zhe Zhang, Lingzhao Fang, Qishan Wang
Transcriptome-wide association study (TWAS) is a powerful approach for investigating the molecular mechanisms linking genetic loci to complex phenotypes. However, the complexity of the TWAS analytical pipeline, including the construction of gene expression reference panels, gene expression prediction, and association analysis using data from genome-wide association studies (GWASs), poses challenges for genetic studies in many species. In this study, we provide the Farm Animal Genotype-Tissue Expression (FarmGTEx) TWAS-server, an interactive and user-friendly multispecies platform designed to streamline the translation of genetic findings across tissues and species. The server incorporates gene expression data from 49 human tissues (838 individuals), 34 pig tissues (5457 individuals), and 23 cattle tissues (4889 individuals), providing prediction models for 38,180 human genes, 21,037 pig genes, and 17,942 cattle genes. It supports genotype-based gene expression prediction, GWAS summary statistics imputation, customizable TWAS analysis, functional annotation, and result visualization. Additionally, we provide 479,203, 1208, and 657 tissue–gene–trait associations for 1129 human traits, 41 cattle traits, and 11 pig traits, respectively. Utilizing the TWAS-server, we validated the association of the ABCD4 gene with pig teat number. Furthermore, we identified that pig backfat thickness may share genetic similarities with human diastolic blood pressure, sarcoidosis (Löfgren syndrome), and body mass index. The FarmGTEx TWAS-server offers a comprehensive and accessible platform for researchers to perform TWAS analyses across tissues and species. It is freely available at https://twas.farmgtex.org, with regular updates planned as the FarmGTEx project expands to include more species.
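Research question:
Transcriptome-wide association study (TWAS) is a powerful approach for exploring how genetic loci influence complex traits through molecular mechanisms. However, the TWAS analytical pipeline is complex, involving several key steps such as the construction of gene expression reference panels, gene expression prediction, and association analysis based on genome-wide association study (GWAS) data, which poses challenges for genetic studies in many species. An efficient, user-friendly TWAS analysis platform is therefore urgently needed to simplify cross-tissue and cross-species genetic research.
Methods:
This study built the Farm Animal Genotype-Tissue Expression (FarmGTEx) TWAS-server, an interactive, multispecies TWAS analysis platform that integrates cross-species genomic data and streamlines the TWAS workflow. The platform:
• Integrates expression quantitative trait loci (eQTL) data: gene expression data from 49 human tissues (838 individuals), 34 pig tissues (5457 individuals), and 23 cattle tissues (4889 individuals).
• Builds gene prediction models: prediction models from three TWAS software tools for 38,180 human genes, 21,037 pig genes, and 17,942 cattle genes.
• Supports multiple functions: gene expression prediction, GWAS summary statistics imputation, customizable TWAS analysis, functional annotation, and result visualization.
• Provides tissue–gene–trait association data: 479,203 human, 1208 cattle, and 657 pig tissue–gene–trait associations, covering 1129 human traits, 41 cattle traits, and 11 pig traits.
Main results:
1. The FarmGTEx TWAS-server provides a comprehensive, easy-to-use platform for cross-species and cross-tissue TWAS analysis. It is accessible online (https://twas.farmgtex.org), and continued updates are planned as the FarmGTEx project expands to cover more species and more models. It currently supports GWAS summary statistics imputation, expression prediction, online TWAS analysis, candidate gene queries, cross-species trait correlation calculation (based on TWAS summary statistics), and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation analysis.
2. Validated the genetic association between the ABCD4 gene and pig teat number using public data.
3. Cross-species comparative analysis based on TWAS summary statistics revealed that pig backfat thickness may share genetic similarities with human diastolic blood pressure, sarcoidosis (Löfgren syndrome), and body mass index (BMI).
Link: https://twas.farmgtex.org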
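The two core TWAS steps named above, genotype-based expression prediction and association with a trait, can be sketched in a simplified individual-level form. This is only an illustration under stated assumptions: expression is predicted as a weighted sum of SNP dosages, and association is summarized with a Pearson correlation. The SNP names, weights, genotypes, and trait values are hypothetical, and the sketch does not reproduce the server's prediction models or its summary-statistics workflow.

```python
import math

def predict_expression(dosages, weights):
    """Predicted expression for one individual: weighted sum of SNP dosages.

    SNPs missing from `dosages` are treated as dosage 0 (a simplifying assumption).
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())

def pearson(x, y):
    """Pearson correlation; assumes both vectors are non-constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-gene prediction weights (effect of each SNP on expression).
weights = {"snp_a": 0.5, "snp_b": -0.2}

# Hypothetical genotype dosages (0, 1, or 2 copies of the effect allele).
genotypes = [
    {"snp_a": 0, "snp_b": 2},
    {"snp_a": 1, "snp_b": 1},
    {"snp_a": 2, "snp_b": 0},
    {"snp_a": 1, "snp_b": 0},
]
trait = [1.0, 2.1, 3.2, 2.4]  # hypothetical phenotype values

predicted = [predict_expression(g, weights) for g in genotypes]
r = pearson(predicted, trait)  # gene-level association between predicted expression and trait
```

In practice, the weights would come from models trained on a reference panel with both genotypes and expression, and the association would be tested gene by gene across tissues.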
Page qzaf006
Web Server
CIEC: Cross-tissue Immune Cell Type Enrichment and Expression Map Visualization for Cancer
Jinhua He, Haitao Luo (罗海涛), Wei Wang (王伟), Dechao Bu (卜德超), Zhengkai Zou (邹正楷), Haolin Wang (王浩霖), Hongzhen Tang, Zeping Han, Wenfeng Luo, Jian Shen, Fangmei Xie, Yi Zhao (赵屹), Zhiming Xiang
Single-cell transcriptome sequencing technology has been applied to decode the cell types and functional states of immune cells, revealing their tissue-specific gene expression patterns and functions in cancer immunity. Comprehensive assessments of immune cells within and across tissues will provide us with a deeper understanding of the tumor immune system in general. Here, we present Cross-tissue Immune cell type or state Enrichment analysis of gene lists for Cancer (CIEC), the first web-based application that integrates database and enrichment analysis to estimate the cross-tissue immune cell types or states. CIEC version 1.0 consists of 480 samples covering primary tumor, adjacent normal tissue, lymph node, metastasis tissue, and peripheral blood from 323 cancer patients. By applying integrative analysis, we constructed an immune cell type/state map for each context, and adopted our previously developed Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology Based Annotation System (KOBAS) algorithm to estimate the enrichment for context-specific immune cell types/states. In addition, CIEC also provides an easy-to-use online interface for users to comprehensively analyze the immune cell characteristics mapped across multiple tissues, including expression map, correlation, similar gene detection, signature score, and expression comparison. We believe that CIEC will be a valuable resource for exploring the intrinsic characteristics of immune cells in cancer patients and for potentially guiding novel cancer–immune biomarker development and immunotherapy strategies. CIEC is freely accessible at http://ciec.gene.ac/.
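Research question:
In cancer patients, immune cells show tissue-specific enrichment or changes in state across different tissues. For example, exhausted T cells, a class of dysfunctional T cells, tend to be enriched in tumor tissue, whereas effector T cells are prevalent in patients' peripheral blood. Several single-cell cancer databases have been developed, but large-scale cross-tissue studies of the tissue-specific characteristics of immune cells in cancer patients are still lacking, and cross-tissue analysis faces difficulties in integrating and annotating immune cell types. In this study, we performed an integrative analysis of a large amount of single-cell data from multiple tissue types in cancer and developed an online analysis and visualization platform, aiming to help researchers understand how antitumor immunity works and to promote the development of immune targets.
Methods:
We downloaded cancer single-cell RNA sequencing data covering millions of cells from the GEO database, collecting a total of 1,730,501 cells from 480 samples. Tissue types include primary tumor, lymph node, metastasis tissue, adjacent normal tissue (comprising primary normal tissue, i.e., normal tissue at the site of the primary tumor, and metastatic normal tissue, i.e., normal tissue at the site of the metastasis), and peripheral blood. Samples associated with clinical treatment were divided into treated and untreated groups. Through integrative analysis, we merged single-cell datasets from different cancer types.
Main results:
1. Developed CIEC, an interactive and comprehensive online analysis platform that integrates two modules: "enrichment analysis" and "expression analysis".
2. With the enrichment analysis module, users only need to provide a gene list or a gene list file; a one-click operation then performs enrichment analysis of the target gene set for cross-tissue immune cell types/states in cancer.
3. With the expression analysis module, users can examine the expression profile of target genes in tumor-tissue immune cells, analyze expression correlations, calculate signature scores for a target gene set, and compare the expression of target genes in specific immune cells across tissues in cancer.
CIEC homepage:
http://ciec.gene.ac/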
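To make the idea of gene-list enrichment concrete, the sketch below computes an over-representation p-value with the hypergeometric distribution, one common choice for this kind of analysis. Whether CIEC's KOBAS-based estimate uses exactly this test is not stated here, so treat it as an assumption; the function name and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    universe_size: number of background genes
    set_size:      number of background genes in the cell-type marker set
    list_size:     number of genes in the user's list
    overlap:       number of list genes that fall in the marker set
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, x) * comb(universe_size - set_size, list_size - x)
        for x in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical example: 20,000 background genes, a 200-gene marker set,
# a 50-gene user list, and 8 genes shared between the two.
p_value = hypergeom_enrichment_p(20000, 200, 50, 8)
```

A small p-value suggests that the user's list overlaps the marker set more than expected by chance; in a real analysis the p-values for many cell types or states would also need multiple-testing correction.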
Page qzae067
Method
VISTA: A Tool for Fast Taxonomic Assignment of Viral Genome Sequences
Tao Zhang (张韬), Yiyun Liu (刘依云), Xutong Guo (郭栩彤), Xinran Zhang (张欣然), Xinchang Zheng (郑欣畅), Mochen Zhang (张陌尘), Yiming Bao (鲍一明)
The rapid expansion of the number of viral genome sequences in public databases necessitates a scalable, universal, and automated preliminary taxonomic framework for comprehensive virus studies. Here, we introduce Virus Sequence-based Taxonomy Assignment (VISTA), a computational tool that employs a novel pairwise sequence comparison system and an automatic demarcation threshold identification framework for virus taxonomy. Leveraging physio-chemical property sequences, k-mer profiles, and machine learning techniques, VISTA constructs a robust distance-based framework for taxonomic assignment. Functionally similar to Pairwise Sequence Comparison (PASC), a widely used virus assignment tool based on pairwise sequence comparison, VISTA demonstrates superior performance by providing significantly improved separation for taxonomic groups, more objective taxonomic demarcation thresholds, greatly enhanced speed, and a wider application scope. We successfully applied VISTA to 38 virus families, as well as to the class Caudoviricetes. This demonstrates VISTA’s scalability, robustness, and ability to automatically and accurately assign taxonomy to both prokaryotic and eukaryotic viruses. Furthermore, the application of VISTA to 679 unclassified prokaryotic virus genomes recovered from metagenomic data identified 46 novel virus families. VISTA is available as both a command line tool and a user-friendly web portal at https://ngdc.cncb.ac.cn/vista.
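As a rough picture of how k-mer profiles can feed a distance-based comparison of genome sequences, the sketch below builds normalized k-mer frequency vectors and compares them with cosine distance. This is a generic illustration, not VISTA's pairwise comparison system: the choice of k, the distance measure, and the toy sequences are all assumptions, and the physio-chemical property sequences and machine learning steps mentioned in the abstract are not modeled.

```python
from collections import Counter
from itertools import product
import math

def kmer_profile(sequence, k=3):
    """Relative frequencies of all 4**k DNA k-mers, in a fixed lexicographic order."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)  # k-mers with N or other symbols are ignored
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]

def cosine_distance(p, q):
    """1 - cosine similarity; returns 1.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm

# Hypothetical toy sequences; real genomes are thousands of bases or longer.
seq_a = "ACGTACGTACGTTTGACA"
seq_b = "ACGTACGTACGATTGACA"
seq_c = "GGGGCCCCGGGGCCCCGG"

d_ab = cosine_distance(kmer_profile(seq_a), kmer_profile(seq_b))
d_ac = cosine_distance(kmer_profile(seq_a), kmer_profile(seq_c))
```

Because k-mer profiles are alignment-free, such distances are cheap to compute for many genome pairs; a demarcation threshold on the distance could then separate taxonomic groups.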
Page qzae082
Review Article
Challenges in AI-driven Biomedical Multimodal Data Fusion and Analysis
Junwei Liu (刘俊伟), Xiaoping Cen (岑萧萍), Chenxin Yi (伊晨昕), Feng-ao Wang (王烽傲), Junxiang Ding (丁俊翔), Jinyu Cheng (程瑾瑜), Qinhua Wu (吴沁桦), Baowen Gai (盖宝文), Yiwen Zhou (周奕雯), Ruikun He (贺瑞坤), Feng Gao (高峰), Yixue Li (李亦学)
The rapid development of biological and medical examination methods has vastly expanded personal biomedical information, including molecular, cellular, image, and electronic health record datasets. Integrating this wealth of information enables precise disease diagnosis, biomarker identification, and treatment design in clinical settings. Artificial intelligence (AI) techniques, particularly deep learning models, have been extensively employed in biomedical applications, demonstrating increased precision, efficiency, and generalization. The success of the large language and vision models further significantly extends their biomedical applications. However, challenges remain in learning these multimodal biomedical datasets, such as data privacy, fusion, and model interpretation. In this review, we provide a comprehensive overview of various biomedical data modalities, multimodal representation learning methods, and the applications of AI in biomedical data integrative analysis. Additionally, we discuss the challenges in applying these deep learning methods and how to better integrate them into biomedical scenarios. We then propose future directions for adapting deep learning methods with model pretraining and knowledge integration to advance biomedical research and benefit their clinical applications.
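With the rapid development of biomedical examination methods, the amount and variety of personal biomedical information have expanded greatly, including genomic, transcriptomic, proteomic, and metabolomic data as well as medical imaging and electronic health records (EHRs). These multimodal datasets have great potential in clinical settings, where they can be used for precise disease diagnosis, biomarker identification, and the development of personalized therapies. Artificial intelligence (AI) techniques, especially the success of large language models (LLMs) and vision models, have further extended the application of AI in biomedicine. However, many challenges remain in effectively integrating these cross-scale multimodal biomedical datasets and in dealing with data privacy, missing modalities, and model interpretability. This review provides a comprehensive overview of various biomedical data modalities, multimodal representation learning methods, and the applications of AI in biomedical data integrative analysis, discusses in depth the challenges in real clinical applications, and looks ahead to key directions for further advancing AI techniques in biomedical research and clinical applications.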
Page qzaf011
Original Research
NSUN2-mediated HCV RNA m5C Methylation Facilitates Viral RNA Stability and Replication
Zhu-Li Li (李珠丽), Yan Xie (谢焱), Yafen Wang (王雅芬), Jing Wang (王晶), Xiang Zhou (周翔), Xiao-Lian Zhang (章晓联)
RNA modifications have emerged as new efficient targets against viruses. However, little is known about 5-methylcytosine (m5C) modification in the genomes of flaviviruses. Herein, we demonstrate that hepatitis C virus (HCV), dengue virus, and Zika virus exhibit high levels of viral RNA m5C modification. We identified an m5C site at C7525 in the NS5A gene of the HCV RNA genome. HCV infection upregulates the expression of the host m5C methyltransferase NSUN2 via the transcription factor E2F1. NSUN2 deficiency decreases HCV RNA m5C methylation levels, which further reduces viral RNA stability, replication, and viral assembly and budding. A C7525-specific m5C-abrogating mutation in the HCV RNA genome similarly reduces viral replication, assembly, and budding by decreasing viral RNA stability. Notably, NSUN2 deficiency also reduces host global messenger RNA (mRNA) m5C levels during HCV infection, which upregulates the expression of antiviral innate immune response genes and further suppresses HCV RNA replication. Supported by both cellular and mouse infection models, our findings reveal that NSUN2-mediated m5C methylation of HCV RNA and host mRNAs facilitates viral RNA replication. HCV infection promotes host NSUN2 expression to facilitate HCV replication, suggesting a positive feedback loop. NSUN2 could be a potential therapeutic target for flavivirus therapeutics.
Page qzaf008
Original Research
A Developmental Gene Expression Atlas Reveals Novel Biological Basis of Complex Phenotypes in Sheep
Bingru Zhao, Hanpeng Luo, Xuefeng Fu, Guoming Zhang, Emily L Clark, Feng Wang, Brian Paul Dalrymple, V Hutton Oddy, Philip E Vercoe, Cuiling Wu, George E Liu, Cong-jun Li, Ruidong Xiang, Kechuan Tian, Yanli Zhang, Lingzhao Fang
Sheep (Ovis aries) represent one of the most important livestock species for global animal protein and wool production. However, little is known about the genetic and biological basis of ovine phenotypes, particularly those with high economic value and environmental impact. Here, by integrating 1413 RNA sequencing (RNA-seq) samples from 51 distinct tissues across 14 developmental time points, representing early-prenatal, late-prenatal, neonatal, lamb, juvenile, adult, and elderly stages, we constructed a high-resolution Developmental Gene Expression Atlas (dGEA) in sheep. We observed dynamic patterns of gene expression and regulatory networks across tissues and developmental stages. Leveraging this resource to interpret genetic associations for 48 monogenic and 12 complex traits in sheep, we found that genes upregulated at prenatal developmental stages played more important roles in shaping these phenotypes than those upregulated at postnatal stages. For instance, genetic associations of crimp number, mean staple length (MSL), and individual birthweight were significantly enriched in the prenatal rather than postnatal skin and immune tissues. By comprehensively integrating genome-wide association study (GWAS) fine-mapping results with the sheep dGEA, we identified several candidate genes for complex traits in sheep, such as SOX9 for MSL, GNRHR for litter size at birth, and PRKDC for live weight. These results provide novel insights into the developmental and molecular architecture of ovine phenotypes. The dGEA (https://sheepdgea.njau.edu.cn/) will serve as an invaluable resource for sheep developmental biology, genetics, genomics, and selective breeding.
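Research question
Sheep are an important economic livestock species, yet the genetic mechanisms underlying their complex economic traits (such as wool quality, growth rate, and reproductive performance) have not been fully resolved, which to some extent limits the efficiency of genetic improvement and precision breeding. Traditional genome-wide association studies (GWAS) struggle to reveal causal genes and molecular mechanisms because of linkage disequilibrium and insufficient functional annotation. Current studies in sheep are mostly limited to a single tissue or a specific developmental stage and lack systematic multi-tissue developmental transcriptome analysis, which restricts a deeper understanding of the genetic regulation of complex traits.
Methods
By integrating 1413 RNA-seq samples from 51 distinct tissues across 14 developmental time points, we constructed a high-resolution developmental Gene Expression Atlas (dGEA) for sheep. Using several co-expression network analyses and time-series clustering analysis, we characterized the dynamic changes in gene expression and the associated regulatory networks across tissues and developmental stages. Combining GWAS data for 12 complex traits from 1639 sheep with fine-mapping analysis, we provide new insights into the developmental and molecular mechanisms of ovine phenotypes.
Main results
1. The sheep dGEA provides a platform for querying gene expression profiles across multiple tissues and developmental stages. It is accessible online (https://sheepdgea.njau.edu.cn/), and there are plans to cover more tissues, developmental time points, and breeds as sequencing data are updated.
2. GWAS signals for multiple complex traits were significantly enriched in genes specifically expressed at prenatal stages in tissues such as immune tissues, the gastrointestinal tract, and skin, suggesting that early developmental regulation plays a key role in shaping phenotypes.
3. Integrating GWAS fine-mapping results with the sheep dGEA revealed candidate genes for multiple complex traits, such as SOX9 for mean staple length, GNRHR for litter size at birth, and PRKDC for live weight, along with their tissue- and developmental stage-specific expression patterns.
dGEA access
https://sheepdgea.njau.edu.cn/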
Page qzaf020
Original Research
Living on the Rocks: Genomic Analysis of Limestone Langurs Provides Novel Insights into the Adaptive Evolution in Extreme Karst Environments
Zhijin Liu, Xiongfei Zhang, Peipei Wang, Minheng Hong, Xiaochan Yan, Xiaoqiu Qi, Qian Zhao, Zhenghao Chen, Huajian Nie, Hui Li, Ziwen Li, Liye Zhang, Jiwei Qi, Chaolei He, Nguyen Van Truong, Minh D Le, Tilo Nadler, Hiroo Imai, Christian Roos, Ming Li
Understanding how organisms adapt to their environments is a central question in evolutionary biology. Limestone langurs are unique among primates, as they are exclusively found in karst limestone habitats and have evolved mechanisms to tolerate high levels of mineral ions, which are typically associated with metal toxicity affecting organs, cells, and genetic material. We generated a high-quality reference genome (Tfra_5.0) for the limestone langur (Trachypithecus francoisi), along with genome resequencing data for 48 langurs representing 15 Trachypithecus species. Genes encoding ion channels (e.g., Na+, K+, and Ca2+) exhibited significantly accelerated evolution in limestone langurs. Limestone langur-specific mutations in Na+ and Ca2+ channels were experimentally confirmed to modify inward ion currents in vitro. Unexpectedly, scans for positive selection also identified genes involved in DNA damage response/repair pathways, a previously unknown adaptation. This finding highlights an evolutionary adaptation in limestone langurs that mitigates the increased risk of DNA damage posed by elevated metal ion concentrations. Notably, a limestone langur-specific mutation (E94D) of the melanocortin 1 receptor (MC1R) was associated with increased basal cyclic adenosine monophosphate (cAMP) production, contributing to the species’ darker coat color, which likely serves as camouflage on limestone rocks. Our findings reveal novel adaptive evolutionary mechanisms of limestone langurs and offer broader insights into organismal adaptation to extreme environments, with potential implications for understanding human health, biological evolution, and biodiversity conservation.
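Understanding how organisms adapt to their habitats is a central question in evolutionary biology, ecology, and conservation biology. Extant primates comprise more than 500 species and are one of the most diverse groups of mammals. Studying the molecular mechanisms of adaptive evolution in primates can provide genetic evidence for theories of biological evolution. Studying and protecting primates is also essential for conserving biodiversity and maintaining ecosystem function. More importantly, because primates are the closest relatives of humans, analysis of their genomes helps in understanding human evolution and offers insights for human health and disease treatment. Limestone langurs are non-human primates that live in the extreme karst environment. How do they adapt to the stress of high metal ion concentrations in karst environments? Their evolutionary and adaptive mechanisms have not been fully revealed. This study established a high-quality reference genome (Tfra_5.0) for the limestone langur and revealed its mechanisms of adaptive evolution in terms of DNA damage repair, ion channels, and morphology, offering broader insights into organismal adaptation to extreme environments and providing valuable information, from a new perspective, for understanding human health, biological evolution, and biodiversity conservation.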
Page qzaf007
Original Research
Identification of Small Open Reading Frame-encoded Proteins in the Human Genome
Hitesh Kore, Satomi Okano, Keshava K Datta, Jackson Thorp, Parthiban Periasamy, Mayur Divate, Upekha Liyanage, Gunter Hartel, Shivashankar H Nagaraj, Harsha Gowda
One of the main goals of the Human Genome Project was to identify all protein-coding genes. There are approximately 20,500 protein-coding genes annotated in the human reference databases. However, in the last few years, proteogenomics studies have predicted thousands of novel protein-coding regions, including low-molecular-weight proteins encoded by small open reading frames (sORFs) in untranslated regions of messenger RNAs and non-coding RNAs. Most of these predictions are based on bioinformatics analyses and ribosome footprint data. The validity of some of these sORF-encoded proteins (SEPs) has been established through functional characterization. With the growing number of predicted novel proteins, a strategy to identify reliable candidates that warrant further studies is needed. In this study, we developed an integrated proteogenomics workflow to identify a reliable set of novel protein-coding regions in the human genome based on their recurrent observations across multiple samples. Publicly available ribosome profiling and global proteomic datasets were used to establish protein-coding evidence. We predicted protein translation from 4008 sORFs based on recurrent ribosome occupancy signals across samples. In addition, we identified 825 SEPs based on proteomic data. Some of the novel protein-coding regions identified were located in genome-wide association study (GWAS) loci associated with various traits and disease phenotypes. Peptides from SEPs are also presented by major histocompatibility complex class I (MHC-I), similar to canonical proteins. Novel protein-coding regions reported in this study expand the current catalog of protein-coding genes and warrant experimental studies to elucidate their cellular functions and potential roles in human diseases.
Page qzaf004
Original Research
A Single-cell Atlas of Developing Mouse Palates Reveals Cellular and Molecular Transitions in Periderm Cell Fate
Wenbin Huang (黄文斌), Zhenwei Qian (钱振伟), Jieni Zhang (张杰铌), Yi Ding (丁毅), Bin Wang (王斌), Jiuxiang Lin (林久祥), Xiannian Zhang (张先念), Huaxiang Zhao (赵华翔), Feng Chen (陈峰)
Cleft palate is one of the most common congenital craniofacial disorders that affects children’s appearance and oral functions. Investigating the transcriptomes during palatogenesis is crucial for understanding the etiology of this disorder and facilitating prenatal molecular diagnosis. However, there is limited knowledge about the single-cell differentiation dynamics during mid-palatogenesis and late-palatogenesis, specifically regarding the subpopulations and developmental trajectories of periderm, a rare but critical cell population. Here, we explored the single-cell landscape of mouse developing palates from embryonic day (E) 10.5 to E16.5. We systematically depicted the single-cell transcriptomes of mesenchymal and epithelial cells during palatogenesis, including subpopulations and differentiation dynamics. Additionally, we identified four subclusters of palatal periderm and constructed two distinct trajectories of cell fates for periderm cells. Our findings reveal that claudin-family coding genes and Arhgap29 play a role in the non-stick function of the periderm before the palatal shelves contact, and Pitx2 mediates the adhesion of periderm during the contact of opposing palatal shelves. Furthermore, we demonstrate that epithelial–mesenchymal transition (EMT), apoptosis, and migration collectively contribute to the degeneration of periderm cells in the medial epithelial seam. Taken together, our study suggests a novel model of periderm development during palatogenesis and delineates the cellular and molecular transitions in periderm cell determination.
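Research question:
Cleft lip and palate is the most common congenital birth defect of the oral and craniomaxillofacial region. It not only affects patients' appearance but also severely impairs the function of the stomatognathic system and patients' mental health. The molecular etiology of cleft palate, a subtype of cleft lip and palate, has long been a focus of research and is an important theoretical basis for prenatal molecular diagnosis. Periderm cells form a single cell layer covering the epithelial surface; although rare, they perform diverse and critical physiological functions at different stages of palate development. However, the subpopulations of periderm cells and the mechanisms that determine their fate have not been fully elucidated.
Methods:
Using single-cell transcriptome sequencing combined with bioinformatics analysis, followed by immunofluorescence, RNAscope in situ hybridization (ISH), and in vitro palatal shelf culture experiments, this study systematically mapped the single-cell transcriptome atlas of the developing mouse palate. In particular, we identified four new subpopulations of periderm cells and constructed their developmental trajectories, namely a fusion trajectory and a keratinization trajectory, and then characterized the developmental dynamics and key drivers of periderm cells.
Main results:
1. Comprehensively mapped the single-cell transcriptome atlas of mouse palate development.
2. Identified four new subpopulations of periderm cells and constructed the developmental trajectories of periderm cells, namely the fusion and keratinization trajectories.
3. Claudin-family coding genes and Arhgap29 participate in the non-stick function of periderm cells before the palatal shelves contact and fuse, whereas Pitx2 mediates the adhesion of periderm cells when the medial edges of the palatal shelves come into contact.
4. Epithelial–mesenchymal transition, apoptosis, and migration act together in the degeneration of periderm cells in the medial epithelial seam, and epithelial–mesenchymal transition and apoptosis are mutually exclusive within the same cell.
5. The mode of contact of periderm cells is key to determining their fate.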
Page qzaf013
Original Research
Evaluative Methodology for HRD Testing: Development of Standard Tools for Consistency Assessment
Zheng Jia, Yaqing Liu, Shoufang Qu, Wenbin Li, Lin Gao, Lin Dong, Yun Xing, Yadi Cheng, Huan Fang, Yuting Yi, Yuxing Chu, Chao Zhang, Yanming Xie, Chunli Wang, Zhe Li, Zhihong Zhang, Zhipeng Xu, Yang Wang, Wenxin Zhang, Xiaoping Gu, Shuang Yang, Jinghua Li, Liangshen Wei, Yuanting Zheng, Guohui Ding, Leming Shi, Xin Yi, Jianming Ying, Jie Huang
Homologous recombination deficiency (HRD) has emerged as a critical prognostic and predictive biomarker in oncology. However, current testing methods, especially those reliant on targeted panels, are plagued by inconsistent results from the same samples. This highlights the urgent need for standardized benchmarks to evaluate HRD assay performance. In phases IIa and IIb of the Chinese HRD Harmonization Project, we developed ten pairs of well-characterized DNA reference materials derived from lung, breast, and melanoma cancer cell lines and their matched normal cell lines, each pair combined at seven cancer-to-normal mass ratios. Reference datasets for allele-specific copy number variations (ASCNVs) and HRD scores were established and validated using three sequencing methods and nine analytical pipelines. The genomic instability scores (GISs) of the reference materials ranged from 11 to 96, enabling validation across various thresholds. The ASCNV reference datasets covered a genomic span of 2340 to 2749 Mb, equivalent to 81.2% to 95.4% of the autosomes in the 37d5 reference genome. These benchmarks were subsequently utilized to assess the accuracy and reproducibility of four HRD panel assays, revealing significant variability in both ASCNV detection and HRD scores. The concordance between panel-detected GISs and reference GISs ranged from 0.81 to 0.94, with only two assays exhibiting high overall agreement with Myriad MyChoice CDx for HRD classification. This study also identified specific challenges in ASCNV detection in HRD-related regions and the profound impact of high ploidy on consistency. The established HRD reference materials and datasets provide a robust toolkit for objective evaluation of HRD testing.
Page qzaf017
Review Article
Mass Spectrometry-based Solutions for Single-cell Proteomics
Siqi Li, Shuwei Li, Siqi Liu, Yan Ren
Mass spectrometry-based single-cell proteomics (MS-SCP) is attracting tremendous attention because it is now technically feasible to quantify thousands of proteins in minute samples. Since protein amplification is still not possible, technological improvements in MS-SCP focus on minimizing sample loss while increasing throughput, resolution, and sensitivity, as well as achieving measurement depth, accuracy, and stability comparable to bulk samples. Major advances in MS-SCP have facilitated its application in biological and even medical research. Here, we review the key advancements in MS-SCP technology and discuss the strategies of the typical proteomics workflow to improve MS-SCP analysis from single-cell isolation, sample preparation, and liquid chromatography separation to MS data acquisition and analysis. The review will provide an overall understanding of the development and applications of MS-SCP and inspire more novel ideas regarding the innovation of MS-SCP technology.
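In recent years, mass spectrometry-based single-cell proteomics (MS-SCP) has developed rapidly and become a research hotspot in the life sciences. Thanks to continued advances in analytical technology, researchers can now accurately quantify thousands of proteins in extremely small samples. Because proteins cannot be amplified as nucleic acids can, current MS-SCP research focuses on the following key issues: (1) minimizing sample loss; (2) increasing throughput; (3) enhancing sensitivity and resolution; and (4) ensuring that data depth, accuracy, and stability are comparable to those of conventional proteomics.
This review systematically surveys research progress in single-cell proteomics, covering optimization strategies across the entire workflow, from single-cell isolation, sample preparation, and liquid chromatography separation to MS data acquisition and analysis. It aims to provide a technical reference for researchers in the field and to promote the broad application of MS-SCP in basic research and clinical translation.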
Page qzaf012
Review Article
Long and Accurate: How HiFi Sequencing is Transforming Genomics
Bo Wang (王博) , Peng Jia (贾鹏) , Shenghan Gao (高胜寒) , Huanhuan Zhao (赵焕焕) , Gaoyang Zheng (郑高洋) , Linfeng Xu (许林峰) , Kai Ye (叶凯)
Recent developments in PacBio high-fidelity (HiFi) sequencing technologies have transformed genomic research, with circular consensus sequencing now achieving 99.9% accuracy for long (up to 25 kb) single-molecule reads. This method circumvents biases intrinsic to amplification-based approaches, enabling thorough analysis of complex genomic regions [including tandem repeats, segmental duplications, ribosomal DNA (rDNA) arrays, and centromeres] as well as direct detection of base modifications, furnishing both sequence and epigenetic data concurrently. This has streamlined a number of tasks including genome assembly, variant detection, and full-length transcript analysis. This review provides a comprehensive overview of the applications and challenges of HiFi sequencing across various fields, including genomics, transcriptomics, and epigenetics. By delineating the evolving landscape of HiFi sequencing in multi-omics research, we highlight its potential to deepen our understanding of genetic mechanisms and to advance precision medicine.
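In recent years, PacBio high-fidelity (HiFi) sequencing, built on circular consensus sequencing (CCS), has achieved single-molecule reads of up to 25 kb with accuracy as high as 99.9%. This breakthrough not only overcomes the limitations of traditional short-read sequencing but also opens new possibilities for in-depth study of complex genomic regions such as variable number tandem repeats (VNTRs), centromeric sequences, and rDNA arrays. In this article, the authors draw on representative cutting-edge studies from China and abroad to comprehensively review the broad applications and notable contributions of HiFi technology in genome assembly, variant detection, epigenetics, full-length transcript analysis, and single-cell sequencing. Finally, the article discusses in depth the challenges HiFi sequencing currently faces and looks ahead to its future directions. As these challenges are gradually resolved, HiFi technology is bound to provide a strong impetus for precision medicine and multi-omics research.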
Page qzaf003