Original Research
Pathogen Adaptation of HLA Alleles and Its Correlation with Autoimmune Diseases in the Han Chinese
Shuai Liu (刘帅), Yanyan Li (李燕燕), Tingrui Song (宋廷瑞), Jingjing Zhang (张晶晶), Peng Zhang (张鹏), Huaxia Luo (罗华夏), Sijia Zhang (张斯佳), Yiwei Niu (牛仪伟), Tao Xu (徐涛), Shunmin He (何顺民)
Human leukocyte antigen (HLA) genes play a crucial role in the adaptation of human populations to the dynamic pathogenic environment. Despite their significance, investigating the pathogen-driven evolution of HLAs and its implications for autoimmune diseases presents considerable challenges. Here, we genotyped over 20 HLA genes at 3-field resolution in 8278 individuals from diverse ethnic backgrounds, including 4013 unrelated Han Chinese individuals. We focused on the adaptation of HLAs in the Han Chinese population by analyzing their binding affinity for various pathogens, and explored the potential correlations between pathogen adaptation and autoimmune diseases. Our findings reveal that specific HLA alleles like HLA-DRB1*07:01 and HLA-DQB1*06:01 confer strong pathogen adaptability at the sequence level, notably for Corynebacterium diphtheriae and Bordetella pertussis. Additionally, alleles like HLA-C*03:02 demonstrate adaptive selection against pathogens like Mycobacterium tuberculosis and coronavirus at the gene expression level. Simultaneously, the aforementioned HLA alleles are closely related to some autoimmune diseases such as multiple sclerosis. These exploratory discoveries shed light on the intricate coevolutionary relationships between pathogen adaptation and autoimmune diseases in the human population. These efforts led to an HLA database at http://bigdata.ibp.ac.cn/HLAtyping, aiding searches for HLA allele frequencies across populations.
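Research question:
The human leukocyte antigen (HLA) gene system is a major line of defense against pathogens such as viruses and bacteria. Over thousands of years it has played a key role in the co-adaptation of humans to their microbial environment, and it also strongly influences the onset and progression of human autoimmune diseases. However, the capacity of HLA genes in human populations to adapt to different pathogens, and the link between this adaptation and autoimmune diseases, have not yet been clearly characterized.
Methods:
Centered on the "NyuWa" genome resource, this study used state-of-the-art HLA typing tools to perform high-resolution HLA genotyping of 31 HLA genes in 8278 individuals worldwide. On this basis, the best-performing HLA-peptide docking tool currently available was used to compute binding scores between the HLA alleles in the sampled populations and antigens of common pathogens, as a measure of the ability of HLA genes to adapt to pathogens. Linkage analysis was then used to characterize the genetic correlation between pathogen-adapted HLA genes and autoimmune disease-associated HLA genes.
Main results:
1. Based on genome sequencing data from 8278 individuals worldwide, this study built an HLA data resource covering 31 HLA genes at up to 6-digit resolution (http://bigdata.ibp.ac.cn/HLAtyping).
2. Using this resource, the study systematically explored the capacity of HLA genes in the population to adapt to 31 common pathogens, found that HLA-DRB1 generally binds antigens from different pathogens more strongly, and listed examples of HLA genes with strong pathogen adaptability.
3. Combining pleiotropy and genetic linkage, the study analyzed the genetic association between pathogen-adapted HLA genes and susceptibility to autoimmune diseases, offering an evolutionary-medicine perspective on the origin and development of disease.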
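As a rough illustration of what an allele-frequency lookup involves, the sketch below counts alleles in diploid genotype calls after truncating each allele name to a chosen number of colon-separated fields. The function names and the toy genotypes are assumptions made for this example; they are not taken from the study or its database.

```python
from collections import Counter


def truncate_allele(allele, fields):
    """Keep only the first `fields` colon-separated fields of an HLA allele name."""
    gene, star, rest = allele.partition("*")
    if not star:
        raise ValueError(f"not an HLA allele name: {allele!r}")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def allele_frequencies(genotypes, fields=2):
    """Frequency of each allele among all chromosomes, at the given field resolution."""
    counts = Counter(
        truncate_allele(allele, fields)
        for pair in genotypes
        for allele in pair
    )
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Toy diploid genotypes for one gene (illustrative input, not study data).
genotypes = [
    ("HLA-DRB1*07:01:01", "HLA-DRB1*15:01:01"),
    ("HLA-DRB1*07:01:01", "HLA-DRB1*07:01:02"),
    ("HLA-DRB1*09:01:02", "HLA-DRB1*15:01:01"),
]

freqs_2field = allele_frequencies(genotypes, fields=2)
freqs_3field = allele_frequencies(genotypes, fields=3)
```

Truncating before counting means that alleles differing only in a later field are pooled at the coarser resolution, so the same genotype calls can give different frequency tables at 2-field and 3-field resolution.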
Page qzaf038
Database
NeoTCR: An Immunoinformatic Database of Experimentally-supported Functional Neoantigen-specific TCR Sequences
Weijun Zhou (周炜均), Wenting Xiang (向文婷), Jinyi Yu (喻瑾怡), Zhihan Ruan (阮志涵), Yichen Pan (潘逸辰), Kankan Wang (王侃侃), Jian Liu (刘健)
Neoantigen-based immunotherapy has demonstrated long-lasting antitumor activity. The recognition of neoantigens by T cell receptors (TCRs) is considered a trigger for antitumor responses. Due to the overwhelming number of TCR repertoires in the human genome, pinpointing neoantigen-specific TCRs is a formidable challenge. Recent studies have identified a number of functional neoantigen-specific TCRs, but the corresponding information is scattered across published literature and is difficult to retrieve. To improve access to these data, we developed an immunoinformatic database (NeoTCR) containing a unified description of publicly available neoantigen-specific TCR sequences, as well as relevant information on targeted neoantigens, from experimentally-supported studies across 17 cancer subtypes. A user-friendly web interface allows interactive browsing and running of complex database queries. To facilitate rapid identification of neoantigen-specific TCRs from raw sequencing data, NeoTCR offers a one-stop analysis for annotation and visualization of TCR clonotypes, discovery of existing neoantigen-specific TCRs, and exclusion of bystander virus-associated TCRs. NeoTCR represents a unique tool to expedite future studies of neoantigen-specific TCRs and the development of neoantigen-based immunotherapy. NeoTCR is available at http://neotcrdb.bioxai.cn/ and https://github.com/lyotvincent/NeoTCR.
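Highlights
Research question:
Neoantigen-based immunotherapy has been shown to produce durable antitumor activity. Specific recognition of neoantigens by T cell receptors (TCRs) triggers the antitumor immune response. Because the number of TCRs in the human genome is enormous, pinpointing neoantigen-specific TCRs among them is extremely challenging. Recent studies have identified some functional neoantigen-specific TCRs, but the information is scattered across a vast literature and is hard to retrieve. It needs to be systematically collected and mined to deepen understanding of neoantigen-specific TCRs and to inform the clinical application of neoantigen-targeted immunotherapy.
Methods:
By broadly collecting TCR sequence information and neoantigen data from public databases, mining published literature, and cross-integrating information from multiple sources, we released NeoTCR, a database of biologically validated functional neoantigen-specific TCRs. In addition, by integrating TCR analysis tools and extracting information from public TCR databases, we deployed a neoantigen-specific TCR analysis tool in NeoTCR that provides one-stop analysis: annotation and visualization of TCR clonotypes, flagging of neoantigen-specific TCRs, and identification of bystander virus-associated TCRs.
Main result 1:
NeoTCR contains 988 biologically validated functional neoantigen-specific TCR sequences from 18 tumor types. Of these, 99% come from human data, including peripheral blood lymphocytes from cancer patients, peripheral blood lymphocytes from healthy volunteers, and tumor-infiltrating lymphocytes; the remainder come from humanized mouse models. NeoTCR also provides the neoantigen information for each sequence, including the targeted gene mutation site and HLA restriction, as a reference for rapid screening of neoantigen targets and their specific TCR sequences.
Main result 2:
NeoTCR provides an open-source TCR visualization and analysis tool that, for bulk TCR sequencing data, annotates TCR sequence information and visualizes clonotypes, including CDR3 length distribution and sequence frequency, V(D)J gene segment distribution, and V(D)J gene segment usage frequency, as a reference for in-depth mining of TCR sequence information.
Main result 3:
NeoTCR compares TCR sequences against NeoTCR and public databases to rapidly identify neoantigen-specific TCRs and exclude bystander virus-associated TCRs, as a reference for the development and clinical application of neoantigen-based immunotherapy.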
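The comparison step can be pictured with a minimal sketch, assuming exact matching on the CDR3 amino acid sequence. NeoTCR's actual matching rules are not described here, so the function names, the label strings, and the placeholder sequences below are all hypothetical.

```python
from collections import Counter


def annotate_clonotypes(cdr3_sequences, neoantigen_db, viral_db):
    """Label each CDR3 by exact lookup; viral matches are checked first so they are excluded."""
    labels = {}
    for cdr3 in cdr3_sequences:
        if cdr3 in viral_db:
            labels[cdr3] = "bystander_viral"
        elif cdr3 in neoantigen_db:
            labels[cdr3] = "neoantigen_specific"
        else:
            labels[cdr3] = "unannotated"
    return labels


def cdr3_length_distribution(cdr3_sequences):
    """Number of clonotypes observed at each CDR3 length."""
    return dict(Counter(len(seq) for seq in cdr3_sequences))


# Placeholder sequences invented for this example, not real database entries.
neoantigen_db = {"CASSLAAAEQYF"}
viral_db = {"CASSVGGGNTIYF"}
sample = ["CASSLAAAEQYF", "CASSVGGGNTIYF", "CASSPDDDTQYF"]

labels = annotate_clonotypes(sample, neoantigen_db, viral_db)
lengths = cdr3_length_distribution(sample)
```

Checking the viral set before the neoantigen set is a design choice in this sketch: a sequence present in both would be treated as a bystander and excluded, which is the conservative reading of "exclusion of bystander virus-associated TCRs".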
Database links:
http://neotcrdb.bioaimed.com
https://github.com/lyotvincent/NeoTCR
Page qzae010
Database
PlateletBase: A Comprehensive Knowledgebase for Platelet Research and Disease Insights
Huaichao Luo, Changchun Wu, Sisi Yu, Hanxiao Ren, Xing Yin, Ruiling Zu, Lubei Rao, Peiying Zhang, Xingmei Zhang, Ruohao Wu, Ping Leng, Kaijiong Zhang, Qi Peng, Bangrong Cao, Rui Qin, Hulin Wei, Jianlin Qiao, Shanling Xu, Qun Yi, Yang Zhang, Jian Huang, Dongsheng Wang
Platelets are vital in many pathophysiological processes, yet there is a lack of a comprehensive resource dedicated specifically to platelet research. To fill this gap, we have developed PlateletBase, a knowledge base aimed at enhancing the understanding and study of platelets and related diseases. Our team retrieved information from various public databases, specifically extracting and analyzing RNA sequencing (RNA-seq) data from 3711 samples across 41 different conditions available on the National Center for Biotechnology Information (NCBI). PlateletBase offers six analytical and visualization tools, enabling users to perform gene similarity analysis, pair correlation, multi-correlation, expression ranking, clinical information association, and gene annotation for platelets. The current version of PlateletBase includes 10,278 genomic entries, 31,758 transcriptomic entries, 4869 proteomic entries, 2614 omics knowledge entries, 1833 drugs, 97 platelet resources, 438 diseases/traits, and six analysis modules. Each entry has been carefully curated and supported by experimental evidence. Additionally, PlateletBase features a user-friendly interface designed for efficient querying, manipulation, browsing, visualization, and analysis of detailed platelet protein and gene information. The case studies on gray platelet syndrome and angina pectoris demonstrate that PlateletBase is a suitable tool for identifying diagnostic biomarkers and exploring disease mechanisms, thereby advancing research in platelet functionality. PlateletBase is accessible at http://plateletbase.clinlabomics.org.cn/.
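Research question: Platelets play key roles in many pathophysiological processes, but a comprehensive resource dedicated to platelet research is lacking. Existing databases do not systematically integrate platelet-related multi-omics data, which limits the analysis of platelet function and the study of disease mechanisms.
Methods:
This study built a platelet research knowledgebase by systematically integrating data from multiple sources. It incorporates 10,278 genetic variants from authoritative databases such as the GWAS Catalog, 4869 protein expression features identified in 25 proteomic studies covering 21 diseases, RNA sequencing data from 3711 samples across 41 pathological conditions, and 1833 standardized drug-gene interactions. For data processing, gene names were unified according to HGNC standards, disease classification was mapped to Disease Ontology terms, RNA-seq data underwent FastQC quality control, STAR alignment, and ComBat batch correction, and proteomic data were annotated with experimental evidence levels. The technical architecture uses Spring Boot and MySQL for efficient data management and a Vue-based front end for interactive visualization modules, integrates analysis tools such as heatmaps and Sankey diagrams, and builds a multi-dimensional RNA-DNA-protein-disease regulatory network by incorporating data such as GTEx eQTLs.
Results:
The platelet research knowledgebase integrates 10,278 genomic entries, 31,758 transcriptomic entries, and 4869 proteomic entries, and covers 2614 omics knowledge entries, 1833 drug targets, and 438 diseases/traits, forming a multi-dimensional data system. The platform's utility for disease mechanism research was validated by the accurate identification of gray platelet syndrome markers (GP1BA and NBEAL2) and the systematic analysis of angina pectoris signaling pathways. All data on the platform are supported by experimental evidence. The platform provides six core functional modules, including gene similarity analysis and clinical association mining, and supports the full research workflow from data querying and visual analysis to mechanism exploration. Its interactive interface integrates tools such as heatmaps and Sankey diagrams for dynamic analysis of complex omics data and construction of cross-omics networks.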
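Pair correlation, one of the six analysis tools listed in the abstract, can be illustrated with a small sketch. It assumes the Pearson coefficient as the statistic, which the source does not specify, and the expression vectors are invented toy values rather than PlateletBase data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy expression values for three hypothetical genes across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.1, 9.8]  # rises with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]  # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
```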
Database link:
http://plateletbase.clinlabomics.org.cn/
Page qzaf031
Database
EryDB: A Transcriptomic Profile Database for Erythropoiesis and Erythroid-related Diseases
Guangmin Zheng (郑光敏), Song Wu (吴松), Zhaojun Zhang, Zijuan Xin (辛子娟), Lijuan Zhang (张立娟), Siqi Zhao, Jing Wu (吴静), Yanxia Liu (刘彦霞), Meng Li (李蒙), Xiuyan Ruan, Nan Qiao, Yiming Bao (鲍一明), Hongzhu Qu (渠鸿竹), Xiangdong Fang (方向东)
Erythropoiesis is a finely regulated and complex process that involves multiple transformations from hematopoietic stem cells to mature red blood cells at hematopoietic sites from the embryonic to adult stages. Investigations into its molecular mechanisms have generated a wealth of expression data, including bulk and single-cell RNA sequencing data. A comprehensively integrated and well-curated erythropoiesis-specific database will greatly facilitate the mining of gene expression data and enable large-scale research of erythropoiesis and erythroid-related diseases. Here, we present EryDB, an open-access and comprehensive database dedicated to the collection, integration, analysis, and visualization of transcriptomic data for erythropoiesis and erythroid-related diseases. Currently, the database includes expertly curated quality-assured data of 3803 samples and 1,187,119 single cells derived from 107 public studies of three species (Homo sapiens, Mus musculus, and Danio rerio), nine tissue types, and five diseases. EryDB provides users with the ability to not only browse the molecular features of erythropoiesis between tissues and species, but also perform computational analyses of single-cell and bulk RNA sequencing data, thus serving as a convenient platform for customized queries and analyses. EryDB v1.0 is freely accessible at https://ngdc.cncb.ac.cn/EryDB/home.
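Research question:
Studies of the molecular mechanisms of erythropoiesis have generated abundant gene expression data, but no erythropoiesis-specific database has been available. EryDB integrates sample data from many studies and provides functions for querying and analyzing erythropoiesis features and related genes and diseases.
Methods:
We collected and integrated 3803 samples and 1.18 million single-cell transcriptomes from 107 studies of erythropoiesis and related diseases, covering multiple species and tissue types. EryDB applies a unified data preprocessing pipeline to ensure data consistency and comparability. It also offers a rich set of analysis functions, including single-cell visualization and cell-cell communication analysis for single-cell data, giving users a convenient data mining platform.
Main result 1:
We developed EryDB, a database specific to erythropoiesis, integrating 3803 samples and 1.18 million single cells from 107 studies across multiple species and tissue types.
Main result 2:
EryDB provides rich query and analysis functions, including gene expression profiles, differential analysis, and functional enrichment, making it easy for users to mine molecular features related to erythropoiesis.
Main result 3:
EryDB integrates single-cell transcriptome analysis tools, such as cell visualization, marker gene analysis, and cell-cell communication analysis, supporting in-depth study of erythroid differentiation.
Database link: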
https://ngdc.cncb.ac.cn/EryDB/home
Page qzae029
Database
HemAtlas: A Multi-omics Hematopoiesis Database
Zhixin Kang, Tongtong Zhu, Dong Zou, Mengyao Liu, Yifan Zhang, Lu Wang, Zhang Zhang, Feng Liu
Advancements in high-throughput omics technologies have facilitated a systematic exploration of crucial hematopoietic organs across diverse species. A thorough understanding of hematopoiesis in vivo and facilitation of generating functional hematopoietic stem and progenitor cells (HSPCs) in vitro necessitate a comprehensive hematopoietic cross-stage developmental landscape across species. To address this need, we developed HemAtlas, a platform designed for the systematic mapping of hematopoiesis both in vivo and in vitro. HemAtlas features detailed analyses of multi-omics datasets from humans, mice, zebrafish, and HSPC in vitro culture systems. Utilizing literature curation and data normalization, HemAtlas integrates various functional modules, allowing interactive exploration and visualization of any collected omics data based on user-specific interests. Moreover, by applying a systematic and uniform integration method, we constructed organ-wide hematopoietic references for each species with manually curated cell annotations, enabling a comprehensive decoding of cross-stage developmental hematopoiesis at the organ level. Of particular significance are three distinctive functions — single-cell cross-stage, cross-species, and cross-model analyses — that HemAtlas employs to elucidate the hematopoietic development in zebrafish, mice, and humans, and to offer guidance on the generation of HSPCs in vitro. Simultaneously, HemAtlas incorporates a comprehensive map of HSPC cross-stage development to reveal HSPC stage-specific properties. Taken together, HemAtlas serves as a crucial resource to advance our understanding of hematopoiesis and is available at https://ngdc.cncb.ac.cn/hematlas/.
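Research question:
The development of hematopoietic stem and progenitor cells (HSPCs) spans multiple hematopoietic organs and is finely tuned by regulatory mechanisms conserved across species. In addition, inducing functional hematopoietic stem cells in vitro has important applications in treating blood diseases. However, existing databases lack systematic integration of the full course of hematopoietic development (from embryo to adult), multiple species (such as zebrafish, mice, and humans), and in vitro models. Building a multi-dimensional, interactive data resource platform focused on hematopoietic development is therefore an urgent need in the field.
Methods:
This study systematically integrated 94 multi-omics datasets (single-cell transcriptomic, epigenomic, and spatial transcriptomic) from 12 key hematopoietic organs, covering 374 major cell types. Through standardized processing and integrative analysis, we built HemAtlas (Hematopoiesis Atlas) version 1.0, a hematopoiesis database spanning multiple omics, species, developmental stages, and models (in vivo and in vitro).
Main results:
1. HemAtlas first provides an efficient multi-omics visualization platform that lets users explore the collected multi-omics data in depth online.
2. On this basis, the team built single-cell transcriptomic reference atlases of each hematopoietic organ in the different species, with online visualization and data download.
3. In addition, the researchers developed three core online analysis modules for HemAtlas (cross-stage, cross-species, and cross-model analyses) with multiple built-in analysis tools, helping users systematically dissect the dynamics of hematopoiesis in vivo and providing guidance for in vitro hematopoiesis research.
4. Finally, focusing on HSPCs, the team systematically revealed changes in HSPC cell states during cross-stage development in the different species and the underlying regulatory mechanisms.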
Page qzaf026
Database
HemaCisDB: An Interactive Database for Analyzing Cis-regulatory Elements Across Hematopoietic Malignancies
Xinping Cai (蔡信平), Qianru Zhang (张倩茹), Bolin Liu (刘博琳), Lu Sun (孙露), Yuxuan Liu (刘宇璇)
Non-coding cis-regulatory elements (CREs), such as transcriptional enhancers, are key regulators of gene expression programs. Accessible chromatin and H3K27ac are well-recognized markers for CREs associated with their biological function. Deregulation of CREs is commonly found in hematopoietic malignancies, yet the extent to which CRE dysfunction contributes to pathophysiology remains incompletely understood. Here, we developed HemaCisDB, an interactive, comprehensive, and centralized online resource for CRE characterization across hematopoietic malignancies, serving as a useful resource for investigating the pathological roles of CREs in blood disorders. To date, we have collected 922 assay for transposase-accessible chromatin with sequencing (ATAC-seq), 190 DNase I hypersensitive site sequencing (DNase-seq), and 531 H3K27ac chromatin immunoprecipitation followed by sequencing (ChIP-seq) datasets from patient samples and cell lines across different myeloid and lymphoid neoplasms. HemaCisDB provides comprehensive quality control metrics to assess ATAC-seq, DNase-seq, and H3K27ac ChIP-seq data quality. The analytic modules in HemaCisDB include transcription factor (TF) footprinting inference, super-enhancer identification, and core transcriptional regulatory circuitry analysis. Moreover, HemaCisDB also enables the study of TF binding dynamics by comparing TF footprints across different disease types or conditions via web-based interactive analysis. Together, HemaCisDB provides an interactive platform for CRE characterization to facilitate mechanistic studies of transcriptional regulation in hematopoietic malignancies. HemaCisDB is available at https://hemacisdb.chinablood.com.cn/.
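Research question
Cis-regulatory elements (CREs), such as enhancers, are non-coding DNA sequences that regulate gene transcription by binding specific transcription factors. Aberrant CRE regulation is widespread in hematopoietic malignancies such as leukemia and lymphoma. For example, mutations in CRE regions can disrupt transcription factor binding sites and cause abnormal target gene expression, thereby promoting tumor initiation and progression. In addition, multiple studies have reported that abnormal CRE activity has a potential pathogenic role in various hematologic tumors. However, existing databases lack systematic integration and functional annotation of CREs related to hematologic tumors. Building a comprehensive, interactive CRE database for hematologic tumors will therefore help to systematically dissect the regulatory role of CRE abnormalities in their initiation and progression.
Methods
This study built HemaCisDB, an interactive online resource for the comprehensive and systematic analysis of cis-regulatory elements (CREs) in hematopoietic malignancies. CRE activity is closely associated with epigenetic marks such as chromatin accessibility and H3K27ac signal. HemaCisDB currently integrates 1634 epigenomic datasets (ATAC-seq, DNase-seq, and H3K27ac ChIP-seq) from patient samples and cell lines of various myeloid and lymphoid neoplasms, and provides systematic quality control metrics for the collected data. The database includes several functional modules, including transcription factor (TF) footprint inference and comparison, super-enhancer (SE) identification, and core regulatory circuitry (CRC) analysis, offering effective tools for studying the regulatory mechanisms of CREs in the initiation and progression of hematologic tumors.
Main results
1. HemaCisDB covers epigenomic data from a range of hematopoietic malignancies and from healthy donors. The disease types include Hodgkin lymphoma (HL), non-Hodgkin lymphoma (NHL), acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), blastic plasmacytoid dendritic cell neoplasm (BPDCN), and multiple myeloma (MM). The integrated epigenomic data types are ATAC-seq, DNase-seq, and H3K27ac ChIP-seq. Each sample is accompanied by detailed information on genetic or drug perturbations. HemaCisDB offers a convenient, user-friendly visual interface that lets users query, browse, and download data by disease type and sample source.
2. For the ATAC-seq, DNase-seq, and H3K27ac ChIP-seq data of every sample, HemaCisDB provides systematic quality control metrics and visualizes the data through links to the UCSC Genome Browser. CRE regions identified from these three data types are comprehensively annotated for genomic location and function. HemaCisDB further performs an integrative analysis of all identified CRE regions to find regulatory regions shared between or specific to different diseases, and annotates these regions with associated SNPs, eQTLs, and other information to support in-depth study of their potential biological functions.
3. HemaCisDB provides three core analysis modules for the different epigenomic data types: transcription factor (TF) footprint inference and comparison, super-enhancer (SE) identification, and core regulatory circuitry (CRC) analysis. The TF footprint comparison module supports interactive analysis in the web browser, allowing users to compare TF binding patterns across disease types or treatment conditions and thereby reveal dynamic changes in TF binding under pathological states or interventions.
Database link
HemaCisDB is available at: https://hemacisdb.chinablood.com.cn/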
Page qzae088
Application Note
HemaScope: A Tool for Analyzing Single-cell and Spatial Transcriptomics Data of Hematopoietic Cells
Zhenyi Wang, Yuxin Miao, Hongjun Li, Wenyan Cheng, Minglei Shi, Gang Lv, Yating Zhu, Junyi Zhang, Tingting Tan, Jin Gu, Michael Q Zhang, Jianfeng Li, Hai Fang, Zhu Chen, Saijuan Chen
Single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) techniques hold great value in evaluating the heterogeneity and spatial characteristics of hematopoietic cells within tissues. These two techniques are highly complementary, with scRNA-seq offering single-cell resolution and ST retaining spatial information. However, there is an urgent demand for well-organized and user-friendly toolkits capable of handling single-cell and spatial information. Here, we present HemaScope, a specialized bioinformatics toolkit featuring modular designs to analyze scRNA-seq and ST data generated from hematopoietic cells. It enables users to perform quality control, basic analysis, cell atlas construction, cellular heterogeneity exploration, and dynamical examination on scRNA-seq data. Also, it can perform spatial analysis and microenvironment analysis on ST data. Meanwhile, HemaScope takes into consideration hematopoietic cell-specific features, including lineage affiliation evaluation, cell cycle prediction, and marker gene collection. To enhance the user experience, we have deployed the toolkit in user-friendly forms: HemaScopeR (an R package), HemaScopeCloud (a web server), HemaScopeDocker (a Docker image), and HemaScopeShiny (a graphical interface). In case studies, we employed it to construct a cell atlas of human bone marrow, analyze age-related changes, and identify acute myeloid leukemia cells in mice. Moreover, we characterized the microenvironments in angioimmunoblastic T cell lymphoma and primary central nervous system lymphoma, elucidating tumor boundaries. HemaScope is freely available at https://zhenyiwangthu.github.io/HemaScope_Tutorial/.
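Research question:
Single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) are valuable for assessing the heterogeneity and spatial characteristics of hematopoietic cells within tissues. The two techniques are highly complementary: scRNA-seq provides single-cell resolution, whereas ST retains the spatial information of transcript expression. There is an urgent need for user-friendly toolkits that can handle both data types.
Methods:
HemaScope is a modular bioinformatics toolkit designed for analyzing scRNA-seq and ST data from hematopoietic cells. For scRNA-seq data, HemaScope first performs quality control on genes and cells, removes doublets with DoubletFinder, removes batch effects between samples using Seurat's FindIntegrationAnchors strategy, and runs basic analyses such as normalization and scaling. Second, HemaScope builds a single-cell atlas: it visualizes the data with several dimensionality reduction methods, identifies cell types with several approaches including HematoMap and marker genes, and performs downstream analyses such as pathway enrichment, CNV prediction, and gene network analysis to give a full picture of the data. Third, to quantify the heterogeneity of hematopoietic cells, HemaScope takes their specific features into account: it computes a cell cycle score for every cell, uses a purpose-designed lineage score to quantify how strongly each cell is affiliated with each lineage of the hematopoietic system, and uses GSVA to show the characteristics of different datasets. Finally, HemaScope integrates Monocle2, Slingshot, and scVelo for trajectory analysis, uses SCENIC to analyze transcription factor regulatory networks, and uses CellChat to assess interactions between cell populations.
For ST data, HemaScope first filters out low-quality spots and genes, then uses Seurat for basic analyses such as normalization, scaling, dimensionality reduction, clustering, and visualization. Second, HemaScope performs basic spatial analysis of ST samples: it identifies spatially variable features with Seurat's FindSpatiallyVariableFeatures function, integrates Commot for spatial cell-cell interaction analysis, and uses cell2location to deconvolve ST data that lack single-cell resolution. To analyze microenvironment features of the hematopoietic system, particularly in disease, HemaScope uses copyKAT to analyze CNVs in different spatial regions, partitions spots into different microenvironment types with K-means based on the spatial distribution of cell types obtained from deconvolution, and analyzes the composition and CNV features of each microenvironment as well as the spatial interactions within and between microenvironments.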
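The K-means step on deconvolved cell-type proportions can be sketched as follows. This is a plain-Python toy, not HemaScope's R implementation: the farthest-point initialisation, the three cell-type columns, and the spot values are all assumptions chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    # Start from the first point, then add the point farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy spots: proportions of (tumor, T cell, stromal) per spot, invented for illustration.
spots = [
    (0.80, 0.10, 0.10),
    (0.75, 0.15, 0.10),
    (0.10, 0.80, 0.10),
    (0.15, 0.75, 0.10),
    (0.10, 0.10, 0.80),
    (0.05, 0.15, 0.80),
]

labels, centroids = kmeans(spots, k=3)
```

Each resulting cluster groups spots with a similar cell-type mix, which is the sense in which a cluster can be read as one kind of microenvironment; the centroid gives the average composition of that microenvironment.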
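To improve the user experience, we developed several user-friendly forms: HemaScopeR (an R package), HemaScopeCloud (a web server), HemaScopeDocker (a Docker image), and HemaScopeShiny (a graphical interface).
Main results:
1. HemaScope integrates workflows of up-to-date scRNA-seq analysis algorithms, allowing users to perform quality control, basic analysis, cell atlas construction, cellular heterogeneity quantification, and analysis of cell dynamics.
2. HemaScope integrates workflows of up-to-date ST analysis algorithms, allowing users to perform quality control, basic analysis, spatial analysis, and microenvironment characterization.
3. User-friendly design: HemaScope can be used in several forms, including HemaScopeR (an R package), HemaScopeCloud (a web server), HemaScopeDocker (a Docker image), and HemaScopeShiny (a graphical interface).
Tutorial, GitHub repository for the R package and graphical interface, web server, Docker image, and BioCode link: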
https://zhenyiwangthu.github.io/HemaScope_Tutorial/
https://github.com/ZhenyiWangTHU/HemaScopeR/
https://hemascope.hiplot.cn/?home=hemascope
https://hub.docker.com/r/zhenyiwang123/hemascopedocker
https://ngdc.cncb.ac.cn/biocode/tool/7725
Page qzaf002
Method
DyNDG: Identifying Leukemia-related Genes Based on Time-series Dynamic Network by Integrating Differential Genes
Jin A (阿瑾), Ju Xiang (项炬), Xiangmao Meng (孟祥茂), Yue Sheng (盛岳), Hongling Peng (彭宏凌), Min Li (李敏)
Leukemia is a malignant disease characterized by progressive accumulation with high morbidity and mortality rates, and investigating its disease genes is crucial for understanding its etiology and pathogenesis. Network propagation methods have emerged and been widely employed in disease gene prediction, but most of them focus on static biological networks, which hinders their applicability and effectiveness in the study of progressive diseases. Moreover, there is currently a lack of special algorithms for the identification of leukemia disease genes. Here, we proposed a novel Dynamic Network-based model integrating Differentially expressed Genes (DyNDG) to identify leukemia-related genes. Initially, we constructed a time-series dynamic network to model the development trajectory of leukemia. Then, we built a background–temporal multilayer network by integrating both the dynamic network and the static background network, which was initialized with differentially expressed genes at each stage. To quantify the associations between genes and leukemia, we extended a random walk process to the background–temporal multilayer network. The results demonstrate that DyNDG achieves superior accuracy compared to several state-of-the-art methods. Moreover, after excluding housekeeping genes, DyNDG yields a set of promising candidate genes associated with leukemia progression or potential biomarkers, indicating the value of dynamic network information in identifying leukemia-related genes. The implementation of DyNDG is available at both https://ngdc.cncb.ac.cn/biocode/tool/BT7617 and https://github.com/CSUBioGroup/DyNDG.
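For readers unfamiliar with network propagation, the sketch below shows a random walk with restart on a single static network. It is a deliberately simplified stand-in, not DyNDG itself: the multilayer background-temporal structure is omitted, and the gene names, the path-shaped toy network, and the restart probability of 0.5 are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-12, max_iter=10000):
    """Stationary visiting probabilities of a walker that restarts at the seed nodes."""
    nodes = sorted(adjacency)
    if any(not adjacency[n] for n in nodes):
        raise ValueError("every node needs at least one neighbour")
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # With probability `restart` jump back to a seed, otherwise step to a random neighbour.
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * prob[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical undirected path network G1 - G2 - G3 - G4, with G1 as the known disease gene.
adjacency = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3"],
}
seeds = {"G1"}

scores = random_walk_with_restart(adjacency, seeds)
candidates = sorted((g for g in adjacency if g not in seeds), key=scores.get, reverse=True)
```

Genes closer to the seed in the network receive higher scores, so ranking the non-seed genes by score gives a candidate list. In the method described above, seeding comes from differentially expressed genes at each stage rather than from a single known gene.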
Page qzaf037
Method
UNISOM: Unified Somatic Calling and Machine Learning-based Classification Enhance the Discovery of CHIP
Shulan Tian, Garrett Jenkinson, Alejandro Ferrer, Huihuang Yan, Joel A Morales-Rosado, Kevin L Wang, Terra L Lasho, Benjamin B Yan, Saurabh Baheti, Janet E Olson, Linda B Baughn, Wei Ding, Susan L Slager, Mrinal S Patnaik, Konstantinos N Lazaridis, Eric W Klee
Clonal hematopoiesis (CH) of indeterminate potential (CHIP), driven by somatic mutations in leukemia-associated genes, confers increased risk of hematologic malignancies, cardiovascular disease, and all-cause mortality. In blood of healthy individuals, small CH clones can expand over time to reach 2% variant allele frequency (VAF), the current threshold for CHIP. Nevertheless, reliable detection of low-VAF CHIP mutations is challenging, often relying on deep targeted sequencing. Here, we present UNISOM, a streamlined workflow for enhancing CHIP detection from whole-genome and whole-exome sequencing data that are underpowered, especially for low VAFs. UNISOM utilizes a meta-caller for variant detection, coupled with machine learning models that classify variants as CHIP, germline, or artifact. In whole-exome sequencing data, UNISOM recovered nearly 80% of the CHIP mutations identified via deep targeted sequencing in the same cohort. Applied to whole-genome sequencing data from the Mayo Clinic Biobank, it recapitulated the patterns previously established in much larger cohorts, including the most frequently mutated CHIP genes and predominant mutation types and signatures, as well as strong associations of CHIP with age and smoking status. Notably, 30% of the identified CHIP mutations had < 5% VAFs, demonstrating its high sensitivity toward small mutant clones. This workflow is applicable to CHIP screening in population genomic studies. The UNISOM pipeline is freely available at https://github.com/shulanmayo/UNISOM and https://ngdc.cncb.ac.cn/biocode/tool/7816.
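The arithmetic behind the VAF threshold is simple enough to show directly. The sketch below only computes VAF from read counts and compares it with the 2% and 5% values quoted in the abstract; it does not reproduce UNISOM's meta-caller or its machine learning classifier, and the variant names, read counts, and the depth of 30 are invented for illustration.

```python
CHIP_VAF_THRESHOLD = 0.02  # 2% VAF, the CHIP threshold stated in the abstract
LOW_VAF_CUTOFF = 0.05      # the "< 5% VAF" band mentioned in the abstract


def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("need total_reads > 0 and 0 <= alt_reads <= total_reads")
    return alt_reads / total_reads


def expected_alt_reads(vaf, depth):
    """Expected number of variant-supporting reads at a given sequencing depth."""
    return vaf * depth


# Hypothetical calls: name -> (variant-supporting reads, total reads).
calls = {
    "variant_1": (3, 400),
    "variant_2": (12, 300),
    "variant_3": (30, 200),
}

vafs = {name: variant_allele_frequency(alt, total) for name, (alt, total) in calls.items()}
meets_threshold = sorted(n for n, v in vafs.items() if v >= CHIP_VAF_THRESHOLD)
low_vaf = sorted(n for n in meets_threshold if vafs[n] < LOW_VAF_CUTOFF)

# At an assumed depth of 30 reads, a clone at exactly 2% VAF yields under one supporting read on average.
reads_at_threshold = expected_alt_reads(CHIP_VAF_THRESHOLD, 30)
```

The last line shows why low-VAF detection is underpowered at modest depth: with well under one expected supporting read, a single-sample caller has little evidence to work with.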
Page qzaf040
Original Research
Plasma Proteomic Profiling Reveals ITGA2B as A Key Regulator of Heart Health in High-altitude Settlers
Yihao Wang (王一豪), Pan Shen (沈磐), Zhenhui Wu (伍振辉), Bodan Tu (涂博丹), Cheng Zhang (张程), Yongqiang Zhou (周永强), Yisi Liu (刘溢思), Guibin Wang (王贵宾), Zhijie Bai (柏志杰), Xianglin Tang (汤响林), Chengcai Lai (赖成材), Haitao Lu (吕海涛), Wei Zhou (周维), Yue Gao (高月)
Myocardial injury is common at high altitude, especially among lowlanders who have migrated to the plateau, but its pathogenesis is not well understood. Here, we established a cohort of lowlanders comprising individuals from both low-altitude and high-altitude areas and conducted plasma proteomic profiling. Proteomic data showed a significant shift in energy metabolism and inflammatory response in individuals with myocardial abnormalities at high altitude. Notably, integrin alpha-Ⅱb (ITGA2B) emerged as a potential key player in this context. Functional studies demonstrated that ITGA2B upregulated the transcription and secretion of interleukin-6 (IL-6) through the integrin-linked kinase (ILK)/nuclear factor-κB (NF-κB) signaling axis under hypoxic conditions. Moreover, ITGA2B disrupted mitochondrial structure and function, increased glycolytic capacity, and aggravated energy reprogramming from oxidative phosphorylation to glycolysis. Leveraging the therapeutic potential of traditional Chinese medicine in cardiac diseases, we discovered that tanshinone ⅡA (TanⅡA) effectively alleviated the myocardial injury caused by the abnormally elevated expression of ITGA2B and hypobaric hypoxia exposure in mice, thus providing a novel candidate therapeutic strategy for the prevention and treatment of high-altitude myocardial injury.
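Research question:
High-altitude myocardial injury is a common disorder of failed acclimatization to hypoxia among people who migrate to high altitude, and it seriously threatens their health. However, its molecular mechanisms have not been systematically characterized, and few drugs are available clinically to protect against myocardial injury specific to the high-altitude hypoxic environment.
Methods:
This study established a cross-altitude cohort and used data-independent acquisition (DIA) plasma proteomics to systematically characterize plasma proteome changes associated with myocardial injury in high-altitude settlers. Molecular biology approaches were then used in in vitro and in vivo experiments to elucidate the functions of ITGA2B and ILK, key differentially expressed proteins in high-altitude myocardial injury. Finally, drawing on traditional Chinese medicine resources, the study focused on tanshinone ⅡA, an active component of Salvia miltiorrhiza (Danshen), and used target screening and validation techniques to examine its binding to ITGA2B, ILK, and related molecules, as well as its pharmacological activity.
Main results:
1. Built the first plasma proteomic atlas of myocardial injury in high-altitude settlers.
2. Found that the integrin ITGA2B and integrin-linked kinase (ILK) are significantly elevated in the plasma of settlers with high-altitude myocardial injury.
3. Functional and mechanistic studies showed that overactivation of the ITGA2B-ILK signaling axis worsens structural and functional damage to the hypoxic myocardium and aggravates the inflammatory response and energy reprogramming under hypoxia.
4. Target validation showed that tanshinone ⅡA can target ITGA2B, and activity assays showed that it effectively alleviates high-altitude myocardial injury caused by abnormally high ITGA2B expression.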
Page qzaf030
Preface
Biomedical Big Data and Artificial Intelligence in Blood
Fuhong He (和夫红), Zhaojun Zhang (张昭军), Xiangdong Fang (方向东), Qian-Fei Wang (王前飞)
Page qzaf043
Original Research
Clonal Hematopoietic Mutations in Plasma Cell Disorders: Clinical Subgroups and Shared Pathogenesis
Xuezhu Wang, Liping Zuo, Yanying Yu, Xinyi Xiong, Jian Xu, Bing Qiao, Jia Chen, Hao Cai, Qi Yan, Hongxiao Han, Xin-xin Cao, Jun Deng, Chunyan Sun, Jian Li
Plasma cell disorders (PCDs) are marked by the clonal proliferation of abnormal plasma cells and bone marrow plasma cells (BMPCs), causing various clinical complications. These PCDs include subtypes with distinct clinical features. Multiple myeloma (MM) and monoclonal gammopathy of undetermined significance (MGUS) are more common and relatively well-studied. In contrast, primary light-chain amyloidosis (AL) and POEMS syndrome (POEMS) are rare and remain less understood. To investigate the role of clonal hematopoietic (CH) mutations and potential interconnections in these diseases, we sequenced CH mutations in lymphoid and myeloid lineages, as well as myeloma driver gene mutations, in BMPCs from affected patients. Recurrent lymphoid CH mutations (in FAT1, KMT2D, MGA, and SYNE1) and myeloma driver gene mutations (in ZFHX3 and DIS3) were found in the dominant clonal and subclonal plasma cell populations. These moderately aging-associated lymphoid CH mutations had a higher burden in MM than in AL or POEMS. Binary matrix factorization of these mutations revealed the subgroups associated with progression-free survival (PFS) (observed in MM, AL, and POEMS), age at diagnosis (in AL and POEMS), serum differential free light chain (dFLC) levels, plasma cell burden (in AL), and serum vascular endothelial growth factor (VEGF) levels (in POEMS). Moreover, the poor PFS associated with MGA or SYNE1 mutations was confirmed across MM, AL, and POEMS. CH mutations partially explained the shared pathogenesis of MM, AL, POEMS, and MGUS, and helped identify patient subgroups with specific clinical features.
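Research question:
Plasma cell disorders (PCDs) are malignant hematologic diseases caused by clonal proliferation of plasma cells, the terminal stage of B-lymphocyte lineage differentiation. They include multiple myeloma (MM), primary light-chain amyloidosis (AL), POEMS syndrome (POEMS), and monoclonal gammopathy of undetermined significance (MGUS). Clonal hematopoiesis (CH)-associated mutations help drive the clonal expansion of blood cells. Recent studies have observed CH-associated mutation spectra in peripheral blood and bone marrow plasma cell samples from small cohorts of MM, AL, and MGUS patients, suggesting that these mutations may help drive the abnormal expansion of plasma cell clones in PCDs. This study hypothesizes that CH-associated mutations arising in the B-lymphocyte lineage ultimately accumulate in bone marrow plasma cells, may contribute to the clonal proliferation of plasma cells in PCDs, and are associated with shared pathogenesis and clinical manifestations.
Methods:
Bone marrow plasma cells from 364 patients with newly diagnosed PCDs underwent targeted sequencing covering 103 lymphoid- and myeloid-lineage CH-associated mutations and myeloma driver variants. The cohort comprised 163 patients with MM, 121 with AL, 67 with POEMS syndrome, and 13 with MGUS.
Data availability:
The targeted sequencing data are deposited in GSA (https://bigd.big.ac.cn/gsa-human/browse/HRA009164).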
Page qzaf027
Original Research
Cell-free DNA Fragmentomics Assay to Discriminate the Malignancy of Breast Nodules and Evaluate Treatment Response
Jiaqi Liu (刘嘉琦), Yalun Li (李亚伦), Wanxiangfu Tang (唐皖湘夫), Tianyi Qian (钱天一), Lijun Dai (代丽君), Ziqi Jia (贾梓淇), Heng Cao (曹恒), Chenghao Li (李成浩), Yuchen Liu (刘煜琛), Yansong Huang (黄岩松), Jiang Wu (吴疆), Dongxu Ma (马东旭), Guangdong Qiao (乔广东), Hua Bao (包华), Shuang Chang (常双), Dongqin Zhu (朱冬琴), Shanshan Yang (杨珊珊), Xuxiaochen Wu (吴徐晓辰), Xue Wu (吴雪), Hengyi Xu (徐恒毅), Hongyan Chen (陈洪岩), Yang Shao (邵阳), Xiang Wang (王翔), Zhihua Liu (刘芝华), Jianzhong Su (苏建忠)
The fragmentomics-based cell-free DNA (cfDNA) assays have recently illustrated prominent abilities to identify various cancers from non-conditional healthy controls, while their accuracy for identifying early-stage cancers from benign lesions with inconclusive imaging results remains uncertain. Especially for breast cancer, current imaging-based screening methods suffer from high false positive rates for women with breast nodules, leading to unnecessary biopsies, which add to discomfort and healthcare burden. Here, we enrolled 613 female participants in this multi-center study and demonstrated that cfDNA fragmentomics (cfFrag) is a robust non-invasive biomarker for breast cancer using whole-genome sequencing. Among the multimodal cfFrag profiles, the fragment size ratio (FSR), fragment size distribution (FSD), and copy number variation (CNV) show more distinguishing ability than Griffin, motif breakpoint (MBP), and neomer. The cfFrag model using the optimal three fragmentomics features discriminated early-stage breast cancer from benign nodules, even at a low sequencing depth (3×). Notably, it demonstrated a specificity of 94.1% in asymptomatic healthy women at a 90% sensitivity for breast cancer. Moreover, we comprehensively showcased the clinical utility of the cfFrag model in predicting patient responses to neoadjuvant chemotherapy (NAC) and its enhanced performance when combined with multimodal features, including radiological results [area under the curve (AUC) = 0.93–0.94] and cfDNA methylation features (AUC = 0.96).
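Research question:
In China, early diagnosis of breast cancer and assessment of treatment response currently rely mainly on imaging such as breast ultrasound and mammography. These methods depend heavily on the experience of medical staff, and their accuracy in distinguishing benign from malignant nodules and in assessing treatment response still needs improvement. Our team previously showed that cfDNA methylation has some accuracy in determining whether breast nodules are benign or malignant, but bisulfite treatment can damage cfDNA and increases assay cost. A more comprehensive, accurate, convenient, and inexpensive non-invasive diagnostic method is therefore urgently needed to improve the accuracy of early breast cancer diagnosis and to predict response to neoadjuvant chemotherapy.
Methods:
This prospective multi-center study enrolled 613 female participants. The Yantai cohort served as the discovery cohort for model training and included 91 breast cancer patients and 102 patients with benign breast nodules. The multi-center validation set comprised five cohorts: the Beijing cohort (209 patients with breast lesions suspicious for malignancy), the Hangzhou cohort (40 breast cancer patients and 13 patients with benign breast nodules), an external screening cohort (119 asymptomatic healthy women), a neoadjuvant chemotherapy cohort (33 breast cancer patients after neoadjuvant chemotherapy), and a reproducibility cohort (3 breast cancer patients and 3 patients with benign breast nodules). Blood samples were collected, and cfDNA was extracted for low-depth whole-genome sequencing. Six cfDNA fragmentomics features, including copy number variation (CNV), fragment size distribution (FSD), and fragment size ratio (FSR), were analyzed, and machine learning algorithms were used to build the cfFrag model, which reflects the malignancy of breast nodules and was evaluated across different clinical scenarios.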
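A performance figure of the form "specificity at 90% sensitivity", as quoted in the abstract, is obtained by fixing the score cutoff so that the target fraction of cancers is called positive and then measuring how many controls fall below that cutoff. The sketch below shows the calculation on invented model scores; it is not the study's evaluation code, and the function name and data are assumptions.

```python
import math


def specificity_at_sensitivity(cancer_scores, control_scores, target_sensitivity=0.90):
    """Pick the highest cutoff that reaches the target sensitivity, then report specificity."""
    ranked = sorted(cancer_scores, reverse=True)
    # Number of cancers that must score at or above the cutoff (epsilon guards float round-off).
    n_needed = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    threshold = ranked[n_needed - 1]
    sensitivity = sum(s >= threshold for s in cancer_scores) / len(cancer_scores)
    specificity = sum(s < threshold for s in control_scores) / len(control_scores)
    return threshold, sensitivity, specificity


# Invented model scores (higher means more likely malignant), not study data.
cancer_scores = [0.95, 0.90, 0.88, 0.85, 0.80, 0.76, 0.70, 0.65, 0.60, 0.30]
control_scores = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.70]

threshold, sensitivity, specificity = specificity_at_sensitivity(
    cancer_scores, control_scores, target_sensitivity=0.90
)
```

Fixing sensitivity first and reading off specificity makes results comparable between cohorts, because the same operating point is being reported each time.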
Page qzaf028
Original Research
Whole-genome Sequencing Association Analysis of Quantitative Platelet Traits in A Large Cohort of β-thalassemia
Xingmin Wang, Qianqian Zhang, Xianming Chen, Yushan Huang, Wei Zhang, Liuhua Liao, Xinhua Zhang, Binbin Huang, Yueyan Huang, Yuhua Ye, Mengyang Song, Jinquan Lao, Juanjuan Chen, Xiaoqin Feng, Xingjiang Long, Zhixiang Liu, Weijian Zhu, Lian Yu, Chengwu Fan, Deguo Tang, Tianyu Zhong, Mingyan Fang, Caiyun Li, Chao Niu, Li Huang, Bin Lin, Xiaoyun Hua, Xin Jin, Zilin Li, Xiangmin Xu
Platelets act as a crucial indicator for monitoring hypercoagulability and thrombosis and a key target for pharmacological intervention. Genotype–phenotype association studies have confirmed that platelet traits are quantitatively regulated by multiple genes. However, there is currently a lack of genetic studies on the heterogeneity of platelet traits in β-thalassemia under a hypercoagulable state. Here, we studied the phenotypic heterogeneity of platelet count (PLT) and mean platelet volume (MPV) in a cohort of 1020 β-thalassemia patients. We further performed a functionally informed whole-genome sequencing (WGS) association analysis of common variants and rare variants for PLT and MPV in 916 patients through integrative analysis of WGS data and functional annotation data. Extreme phenotypic heterogeneity of platelet traits was observed in β-thalassemia patients. Additionally, the common variant-based gene-level analysis identified RNF144B as a novel gene associated with MPV. The rare variant analysis identified several novel associations in both coding and noncoding regions, including missense rare variants of PPP2R5C associated with PLT and missense rare variants of TSSK1B associated with MPV. In conclusion, this comprehensive and systematic whole-genome scan of platelet traits in the β-thalassemia cohort reveals the specific genetic regulation of platelet traits in the context of β-thalassemia, providing potential targets for intervention.
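Research question
β-thalassemia is one of the most common inherited hemoglobinopathies, characterized by reduced or absent β-globin synthesis. Platelets are an important indicator for monitoring the hypercoagulable state and thrombosis in β-thalassemia patients and a key target of clinical pharmacological intervention. Genotype-phenotype association studies have confirmed that platelet traits are quantitative traits regulated by multiple genes. However, genetic studies of platelet trait heterogeneity in β-thalassemia, a population in a hypercoagulable state, are currently lacking. This study comprehensively analyzes platelet trait heterogeneity in β-thalassemia patients and performs association analyses using whole-genome sequencing (WGS) data, aiming to explain from a genetic perspective the factors that influence this heterogeneity and to uncover additional genes or genomic regions associated with platelet traits.
Methods
This study enrolled 1020 β-thalassemia patients, including 104 splenectomized and 916 non-splenectomized patients. Peripheral venous blood was collected for hematological testing, and DNA was extracted from peripheral blood for WGS at a mean sequencing depth of 40×. To investigate the genetic factors influencing platelet traits, namely platelet count (PLT) and mean platelet volume (MPV), we performed association analyses and functional enrichment analyses for PLT and MPV on 33,430,783 common, low-frequency, and rare variants in the 916 non-splenectomized patients.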
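As a rough illustration of what a single-variant association test for a quantitative trait involves, the sketch below regresses a trait on genotype dosage by ordinary least squares. It is a simplified stand-in, not the functionally informed pipeline used in the study: covariates, relatedness, and functional annotations are all omitted, and the function name and toy data are hypothetical.

```python
import math


def single_variant_association(dosages, trait):
    """Regress a quantitative trait on genotype dosage (0, 1, 2) by OLS.

    Returns (beta, se, t) for the genotype effect.
    Assumption: no covariates and unrelated samples, for brevity.
    """
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    # Residual sum of squares around the fitted line
    rss = sum((y - (intercept + beta * g)) ** 2 for g, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se if se > 0 else math.inf
    return beta, se, t


# Hypothetical toy data: alternate-allele dosage and a platelet trait (arbitrary units)
dosages = [0, 0, 1, 1, 2, 2]
trait = [10.0, 12.0, 14.0, 16.0, 19.0, 21.0]
beta, se, t = single_variant_association(dosages, trait)
print(f"beta={beta:.3f}, se={se:.3f}, t={t:.2f}")
```

In practice such a test would be repeated per variant with multiple-testing correction, and rare variants would typically be aggregated into gene- or region-level sets rather than tested one at a time.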
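Main results
Splenectomy significantly affected PLT and MPV levels in β-thalassemia patients; PLT and MPV showed marked heterogeneity in both splenectomized and non-splenectomized patients; and PLT and MPV were negatively correlated at both the clinical and genetic levels. Single-variant association analyses of common and low-frequency variants identified multiple loci associated with PLT and MPV. To assess the reliability of the single-variant results and the genetic specificity of the cohort, we looked up previously reported association loci; the results supported both the reliability of the association analysis and the genetic specificity of the cohort in the context of β-thalassemia. Using the single-variant results for gene-based association analysis and gene-set enrichment analysis, we identified multiple associated genes, among which RNF144B is a newly identified gene significantly associated with MPV; the enrichment results indicated a close link between platelets and the inflammatory response. Integrative association analysis of 27,164,888 low-frequency and rare variants identified multiple novel genes associated with PLT and MPV, including missense variants of PPP2R5C significantly associated with PLT and missense variants of TSSK1B significantly associated with MPV.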
Page qzae065
Original Research
Resolving Leukemia Heterogeneity and Lineage Aberrations with HematoMap
Yuting Dai (代雨婷), Wen Ouyang (欧阳文), Wen Jin (金雯), Fan Zhang (张凡), Wenyan Cheng (程雯艳), Jianfeng Li (李剑峰), Shuo He (何硕), Junqi Zong (宗俊圻), Shijia Cao (曹诗佳), Chenxin Zhou (周晨馨), Junchen Luo (骆俊辰), Gang Lv (吕纲), Jinyan Huang (黄金艳), Hai Fang (方海), Xiaojian Sun (孙晓建), Kankan Wang (王侃侃), Saijuan Chen (陈赛娟)
Abstract
Precise mapping of leukemic cells onto the known hematopoietic hierarchy is important for understanding the cell-of-origin and mechanisms underlying disease initiation and development. However, this task remains challenging because of the high interpatient and intrapatient heterogeneity of leukemia cell clones as well as the differences that exist between leukemic and normal hematopoietic cells. Using single-cell RNA sequencing (scRNA-seq) data with a curated clustering approach, we constructed a comprehensive reference hierarchy of normal hematopoiesis. This reference hierarchy was built through multistep clustering and annotation of over 100,000 bone marrow mononuclear cells derived from 25 healthy donors. We further employed the cosine distance algorithm to develop a likelihood score to determine the similarities of leukemic cells to their putative normal counterparts. Using our scoring strategies, we mapped the cells of acute myeloid leukemia (AML) and B cell precursor acute lymphoblastic leukemia (BCP-ALL) samples to their corresponding counterparts. The reference hierarchy also facilitated bulk RNA sequencing (RNA-seq) analysis, enabling the development of a least absolute shrinkage and selection operator (LASSO) score model to reveal subtle differences in lineage aberrancy within AML or BCP-ALL patients. To facilitate interpretation and application, we established an R-based package (HematoMap) that offers a fast, convenient, and user-friendly tool for identifying and visualizing lineage aberrations in leukemia from scRNA-seq and bulk RNA-seq data. Our tool provides curated resources and data analytics for understanding leukemogenesis, with the potential to enhance leukemia risk stratification and personalized treatments. HematoMap is available at https://github.com/NRCTM-bioinfo/HematoMap.
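Research question:
Resolving the cellular origin of leukemia against the hierarchy of normal hematopoietic differentiation is essential for understanding the key steps in leukemia initiation and development. Leukemic cell populations are markedly heterogeneous both between patients and within a single patient, which makes systematic comparison of leukemic cells with normal hematopoietic cells highly challenging.
Methods:
Based on single-cell RNA sequencing (scRNA-seq) and a refined hierarchical clustering strategy, this study integrated scRNA-seq data from more than 100,000 bone marrow mononuclear cells from 25 healthy donors. Through stepwise hierarchical clustering and cell annotation, it built a comprehensive reference of normal human hematopoietic differentiation, providing a high-resolution baseline framework for resolving the origin of leukemic cells. A similarity scoring (likelihood score) algorithm based on cosine distance was built to quantify the similarity between leukemic cell populations and the different cell populations in the normal hematopoietic reference. In addition, using a least absolute shrinkage and selection operator (LASSO) machine learning model, the project developed a scoring model for bulk RNA sequencing (bulk RNA-seq) that resolves the lineage aberration features of acute myeloid leukemia (AML) and B cell precursor acute lymphoblastic leukemia (BCP-ALL) patients at the transcriptome level.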
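The cosine-based scoring idea described above can be sketched as follows. This is only an illustration under stated assumptions: it uses cosine similarity (one minus cosine distance) directly as the score, the reference populations are reduced to hypothetical three-gene centroid vectors, and none of the names correspond to the HematoMap package's actual interface.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two expression vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_to_reference(cell, reference):
    """Score one cell against every reference population and return the best match."""
    scores = {name: cosine_similarity(cell, centroid)
              for name, centroid in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical centroids: mean expression of three marker genes per normal population
reference = {
    "HSC": [5.0, 1.0, 0.0],
    "Monocyte": [0.0, 4.0, 3.0],
    "B_cell": [1.0, 0.0, 6.0],
}
leukemic_cell = [4.0, 2.0, 0.5]  # hypothetical leukemic cell profile

best, scores = map_to_reference(leukemic_cell, reference)
print(best, {name: round(score, 3) for name, score in scores.items()})
```

Cosine-based measures compare the direction of expression profiles rather than their magnitude, which makes them relatively insensitive to differences in sequencing depth between cells.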
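Main results:
Applying this algorithm to scRNA-seq data from bone marrow mononuclear cells of AML and BCP-ALL patients enabled precise resolution of the origin of leukemic cells and their lineage aberrations. To ease adoption, the study packaged the algorithm and data into an R package, HematoMap. Using scRNA-seq and bulk RNA-seq data, the tool resolves and visualizes lineage aberrations in leukemia, supporting deeper investigation of leukemogenic mechanisms and advancing precise subtyping and personalized treatment.
HematoMap is available at https://github.com/NRCTM-bioinfo/HematoMap.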
Page qzaf005
Original Research
Reprogramming of RNA m6A Modification Is Required for Acute Myeloid Leukemia Development
Weidong Liu, Yuhua Wang, Shuxin Yao, Guoqiang Han, Jin Hu, Rong Yin, Fuling Zhou, Ying Cheng, Haojian Zhang
Abstract
Hematopoietic homeostasis is maintained by hematopoietic stem cells (HSCs), and it is tightly controlled at multiple levels to sustain the self-renewal capacity and differentiation potential of HSCs. Dysregulation of self-renewal and differentiation of HSCs leads to the development of hematologic diseases, including acute myeloid leukemia (AML). Thus, understanding the underlying mechanisms of HSC maintenance and the development of hematologic malignancies is one of the fundamental scientific endeavors in stem cell biology. N6-methyladenosine (m6A) is a common modification in mammalian messenger RNAs (mRNAs) and plays important roles in various biological processes. In this study, we performed a comparative analysis of the dynamics of the RNA m6A methylome of hematopoietic stem and progenitor cells (HSPCs) and leukemia-initiating cells (LICs) in AML. We found that RNA m6A modification regulates the transition of long-term HSCs into short-term HSCs and determines the lineage commitment of HSCs. Interestingly, m6A modification leads to reprogramming that promotes cellular transformation during AML development, and LIC-specific m6A targets are recognized by different m6A readers. Moreover, the very long chain fatty acid transporter ATP-binding cassette subfamily D member 2 (ABCD2) is a key factor that promotes AML development, and deletion of ABCD2 damages clonogenic ability, inhibits proliferation, and promotes apoptosis of human leukemia cells. This study provides a comprehensive understanding of the role of m6A in regulating cell state transition in normal hematopoiesis and leukemogenesis, and identifies ABCD2 as a key factor in AML development.
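Research question:
Throughout life, hematopoietic stem cells maintain homeostasis of the entire blood system through self-renewal and multilineage differentiation. This process is tightly regulated, and dysregulated hematopoiesis leads to the onset and progression of hematologic diseases. Understanding the regulation of hematopoietic homeostasis and the pathogenesis of hematologic diseases has long been a key scientific question in the field. RNA m6A modification participates in diverse physiological and pathological processes by regulating RNA fate. Recent studies are gradually revealing its important role in blood physiology and pathology, yet the dynamics of RNA m6A modification still need to be explored systematically.
Methods:
This study integrated m6A methylation sequencing data from 15 distinct populations of normal hematopoietic cells and leukemia stem cells with RNA immunoprecipitation sequencing (RIP-seq) results for m6A readers, to analyze in depth the role of RNA m6A modification in normal hematopoiesis and in leukemia initiation and development.
Main results:
1. RNA m6A modification plays a key role in regulating the proliferation and differentiation of hematopoietic stem cells.
2. RNA m6A reprogramming plays a key role in the malignant transformation of hematopoietic stem and progenitor cells (HSPCs) into leukemia-initiating cells (LICs).
3. RNA m6A modification regulates ABCD2 expression to promote the development of acute myeloid leukemia.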
Page qzae049
Original Research
Glycolysis Induces Abnormal Transcription Through Histone Lactylation in T-cell Acute Lymphoblastic Leukemia
Wenyan Wu, Jingyi Zhang, Huiying Sun, Xiaoyu Wu, Han Wang, Bowen Cui, Shuang Zhao, Kefei Wu, Yanjun Pan, Rongrong Fan, Ying Zhong, Xiang Wang, Ying Wang, Xiaoxiao Chen, Jianan Rao, Ronghua Wang, Kai Luo, Xinrong Liu, Liang Zheng, Shuhong Shen, Meng Yin, Yangyang Xie, Yu Liu
Abstract
The Warburg effect, characterized by excessive lactate production, and transcriptional dysregulation are two hallmarks of tumors. However, the precise influence of lactate on epigenetic modifications at a genome-wide level and its impact on gene transcription in tumor cells remain unclear. In this study, we conducted genome-wide profiling of histone H3 lysine 18 lactylation (H3K18la) in T-cell acute lymphoblastic leukemia (T-ALL). We observed elevated lactate and H3K18la levels in T-ALL cells compared to normal T cells, with H3K18la levels positively associated with cell proliferation. Accordingly, we observed a significant shift in genome-wide H3K18la modifications from T cell immunity in normal T cells to leukemogenesis in T-ALL, correlated with altered gene transcription profiles. We showed that H3K18la primarily functions in active transcriptional regulation and observed clusters of H3K18la modifications resembling super-enhancers. Disrupting H3K18la modification revealed both synergistic and divergent changes between H3K18la and histone H3 lysine 27 acetylation (H3K27ac) modifications. Finally, we found that the high transcription of H3K18la target genes, IGFBP2 and IARS, is associated with inferior prognosis of T-ALL. These findings enhance our understanding of how metabolic disruptions contribute to transcription dysregulation through epigenetic changes in T-ALL, underscoring the interplay of histone modifications in maintaining oncogenic epigenetic stability.
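Research question:
Lactate accumulation caused by the Warburg effect is an important and widespread metabolic abnormality in tumors. How does this process affect epigenetic encoding and transcriptional regulation in pediatric T-cell acute lymphoblastic leukemia (T-ALL)? Do the epigenetic changes it induces act in concert with other epigenetic regulatory mechanisms? And how do its target genes relate to the prognosis of T-ALL patients?
Methods:
This study systematically compared the genome-wide distribution of H3K18la with that of several classical histone modifications and examined its role in transcriptional regulation in T-ALL tumor cells. ChIP-seq was used to compare genome-wide H3K18la modification between T-ALL patient samples and normal thymic T cells, combined with RNA-seq to assess its effect on gene transcription. In vitro, inhibitors of lactate production (sodium oxamate and 2-DG) were used to modulate lactate levels and validate the effect of the "lactate-H3K18la" axis on tumor cell proliferation and transcriptional activity. An independent patient cohort was analyzed to examine the association between H3K18la target genes and clinical prognosis in T-ALL.
Main results:
Main result 1:
Lactate and H3K18la levels are markedly elevated in T-ALL cells. H3K18la is reprogrammed from regulating immune-related genes in normal T cells to regulating oncogenic transcription factors such as TAL1 and LMO1 in T-ALL, promoting their transcriptional activation. This reveals a mechanism by which the Warburg effect drives epigenetic remodeling through "lactate-H3K18la" and thereby promotes transcriptional dysregulation in tumors.
Main result 2:
H3K18la is markedly enriched at active promoters and enhancers and forms super-enhancer-like "super-lactylation regions (SLRs)" that promote transcriptional activation of highly expressed, tissue-specific genes.
Main result 3:
Metabolic intervention that inhibits lactate production markedly reduces H3K18la levels, suppresses T-ALL cell proliferation, and induces cell cycle arrest. The decrease in H3K18la is accompanied by coordinated changes in H3K27ac, suggesting that the two modifications jointly maintain transcriptional homeostasis.
Main result 4:
High expression of the H3K18la target genes IGFBP2 and IARS is significantly associated with poor prognosis in T-ALL patients, indicating their potential as prognostic biomarkers and therapeutic targets.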
Page qzaf029
Original Research
Characterization of Chronic Lymphocytic Leukemia Immunoglobulin Rearrangements from Partial Read Sequencing
Azahara Fuentes-Trillo, Alicia Serrano-Alcalá, Blanca Ferrer-Lores, Laura Ventura-López, Enrique Seda, Ana-Bárbara García-García, Blanca Navarro, María José Terol, Felipe Javier Chaves
Abstract
The mutational status of the immunoglobulin variable region is an established prognostic biomarker for chronic lymphocytic leukemia (CLL). The length and inner variability of the variable, diversity, and joining (VDJ) rearranged sequences compromise B-cell clone characterization using next-generation sequencing (NGS), and standardization is needed to adapt the procedure to the current clinical guidelines. Here, we develop a complete strategy for sequencing the variable domain of the immunoglobulin heavy chain (IGH) locus with a simple, low-cost, and efficient method that enables sequencing using shorter reads (MiSeq 150 × 2), allowing for faster results. Clonality and mutational status determination are performed within the same analysis pipeline. We tested and validated the method using 319 previously diagnosed CLL patients whose IGH locus had been characterized using Sanger sequencing, along with 47 healthy donor samples. The analysis method follows a clone-centered consensus sequence strategy to identify B-cell clones and establish a clonal threshold specific for each patient’s clonality profile, thereby overcoming the limitations of Sanger sequencing, which is the gold standard used for determining immunoglobulin heavy variable (IGHV) mutational status.
Page qzaf041
Original Research
Integrated Computational and Functional Screening Identifies G9a Inhibitors for SETD2-mutant Leukemia
Ya Zhang (张亚), Mengfang Xia (夏梦芳), Zhenyi Yi (易真伊), Pinpin Sui (岁品品), Xudong He (何旭东), Liping Wang (王丽萍), Qiyi Chen (陈祺仪), Hong-Hu Zhu (主鸿鹄), Gang Huang (黄刚), Qian-Fei Wang (王前飞)
Abstract
SETD2, a frequently mutated epigenetic tumor suppressor gene in acute leukemia, is associated with chemotherapy resistance and poor patient outcomes. To explore potential therapeutics for SETD2-mutant leukemia, we employed an integrated approach combining computational prediction with epigenetic compound library screening. This approach identified G9a inhibitors as promising candidates, capable of reversing gene expression signatures associated with Setd2 deficiency and selectively inhibiting SETD2-deficient cells. RNA sequencing analysis revealed that the G9a inhibitor significantly downregulated Myc and Myc-regulated genes involved in translation, DNA replication, and G1/S transition in Setd2-mutant cells. Further chromatin immunoprecipitation sequencing analysis showed that G9a inhibition reduced H3K9me2 levels at the long non-coding RNA Mir100hg locus, coinciding with specific upregulation of the embedded microRNA let-7a-2 in Setd2-mutant cells. Given the established role of let-7a in MYC suppression, these findings suggest a potential mechanism by which G9a inhibitors induce MYC downregulation in SETD2-mutant leukemia. Additionally, correlation analysis between computational predictions and phenotypic outcomes highlighted the MYC signature as a key predictor of drug efficacy. Collectively, our study identifies G9a inhibitors as a promising therapeutic avenue for SETD2-mutant leukemia and provides novel insights into refining drug prediction strategies.
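Research question:
SETD2 is a key epigenetic tumor suppressor that is frequently inactivated by mutation in acute leukemia, leading to poor prognosis and chemotherapy resistance. Treatment strategies for SETD2-mutant leukemia remain very limited, and new therapeutic approaches are urgently needed.
Methods:
By integrating computational prediction with epigenetic compound library screening, this study systematically identified potential therapeutic drugs for SETD2-mutant leukemia. Candidate drugs were functionally validated in mouse and human leukemia cell models carrying SETD2 mutations, and their mechanisms of action were dissected using RNA sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq). In addition, a correlation analysis between computational predictions and cellular phenotypes was established to evaluate how drug-induced transcriptomic signatures relate to actual drug efficacy.
Main results:
1. Under the dual strategy of computational drug prediction and epigenetic drug library screening, G9a inhibitors were identified as potential candidate drugs for treating SETD2-mutant leukemia.
2. Functional experiments confirmed that G9a inhibitors effectively suppress the growth of SETD2-mutant leukemia cells by inducing cell cycle arrest and apoptosis and by inhibiting self-renewal.
3. Mechanistic studies showed that G9a inhibitors reduce H3K9me2 levels and specifically upregulate the microRNA (miRNA) let-7a, which in turn suppresses MYC and its downstream target genes (the MYC signature).
4. Correlation analysis indicated that suppression of the MYC signature can serve as a key predictor of drug efficacy.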
Page qzaf035
Review
Biological Data Resources and Machine Learning Frameworks for Hematology Research
Ying Yi (易莹), Yongfei Hu (胡永飞), Juanjuan Kang (康娟娟), Qifa Liu (刘启发), Yan Huang (黄燕), Dong Wang (王栋)
Abstract
Hematology research has greatly benefited from the integration of diverse biological data resources and advanced machine learning (ML) frameworks. This integration has not only deepened our understanding of blood diseases such as leukemia and lymphoma, but also enhanced diagnostic accuracy and personalized treatment strategies. By applying ML algorithms to analyze large-scale biological data, researchers can more effectively identify disease patterns, predict treatment responses, and provide new perspectives for the diagnosis and treatment of hematologic disorders. Here, we provide an overview of the current landscape of biological data resources and the application of ML frameworks pertinent to hematology research.
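Key points
Hematology research focuses on the blood system and its related diseases, covering physiological function, pathological mechanisms, drug development, and diagnostic markers. With the development of high-throughput technologies, massive biomedical datasets continue to emerge. Machine learning plays an important role in interpreting these complex data, improving diagnostic accuracy, and advancing personalized medicine. Centered on the hematopoietic system, this article introduces several important data resources and machine learning frameworks in hematology.
Main content
The article introduces data resources relevant to hematology, focusing on hematopoiesis, hematologic malignancies, leukemia-specific data resources, comprehensive clinical resources, and blood cell image datasets.
Hematopoiesis databases include the Atlas of Human Hematopoietic Stem Cell Development, Haemosphere, Blood Proteoform Atlas, and StemDriver; databases for hematologic malignancies include Hemap, ABC portal, and REDH Database; leukemia-related databases include BloodSpot 3.0, BloodChIP Xtra, and LeukemiaDB. These resources contain large amounts of multi-omics high-throughput sequencing data covering the major cell types in hematopoiesis. In addition, each resource's website provides functional analysis modules, such as viewing or comparing gene expression and screening for cell-specific differentially expressed genes, giving clinical researchers and bioinformatics beginners quick-to-use tools. The article also introduces three comprehensive clinical resources, HemOnc.org, European LeukemiaNet, and My Cancer Genome, which cover basic research, clinical research, diagnostic and treatment information, and some disease marker information.
In research on hematology and related diseases, image datasets are becoming a key enabling technology, particularly for automated classification, disease diagnosis, and cell type identification. In recent years, a variety of publicly available blood image datasets have provided a solid foundation for research in computer vision and deep learning. Here, we review the characteristics of existing public blood cell datasets that can be used for cell detection, segmentation, and classification.
We also introduce several machine learning algorithms for blood cell image classification, such as ALNett, CMLcGAN, LeuFeatx, RedTell, and VHM, and describe research directions and progress in combining machine learning with omics and clinical data.
Summary and outlook
Although the integration of biological data resources and machine learning has brought revolutionary changes to hematology research, challenges remain, including data heterogeneity, multimodal data integration, and model interpretability. Future research needs breakthroughs in data standardization, model transparency, and clinical application to further advance precision medicine in hematologic diseases.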
Page qzaf021