Original Research
Duckweed Evolution: from Land back to Water
Yang Fang (方扬) , Xueping Tian (田雪平) , Yanling Jin (靳艳玲) , Anping Du (杜安平) , Yanqiang Ding (丁彦强) , Zhihua Liao (廖志华) , Kaize He (何开泽) , Yonggui Zhao (赵永贵) , Ling Guo (郭铃) , Yao Xiao (肖瑶) , Yaliang Xu (许亚良) , Shuang Chen (陈爽) , Yuqing Che (车育青) , Li Tan (谭力) , Songhu Wang (汪松虎) , Jiatang Li (李家堂) , Zhuolin Yi (易卓林) , Lanchai Chen (陈兰钗) , Leyi Zhao (赵乐伊) , Fangyuan Zhang (张芳源) , Guoyou Li (李国友) , Jinmeng Li (李瑾萌) , Qinli Xiong (熊勤犁) , Yongmei Zhang (张咏梅) , Qing Zhang (张庆) , Xuan Hieu Cao, Hai Zhao (赵海)
Abstract
Terrestrialization is an important evolutionary process that plants experienced. However, little is known about how land plants acquired aquatic growth behaviors. Here, we integrate multiproxy evidence to elucidate the evolution of the aquatic plant duckweed. Three genera of duckweeds show chronologically gradual degeneration in root structure and stomatal function and a decrease in lignocellulose content, accompanied by the contraction of relevant gene families and/or a decline in their transcription levels. The number of genes in the main phytohormone pathways also gradually decreases. The coordinated action of genes involved in auxin signaling and rhizoid development causes a gradual decrease in adventitious roots. Additionally, the significant expansion of the flavonoid pathway is related to the adaptation of duckweeds to floating growth. This study reconstructs the evolutionary history of duckweeds, tracing their journey from land back to water, a reverse of the trajectory of early land plants.
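Research question:
The evolutionary process by which plants re-adapted from land to aquatic environments (the “return to water”) is poorly understood at the molecular level. Duckweeds, the smallest flowering plants in the world, may be an ideal model for studying how terrestrial plants evolved back to an aquatic lifestyle.
Methods:
The team selected three representative extant genera of the duckweed family (Spirodela, Landoltia, and Lemna), sequenced and compared their genomes and transcriptomes, and combined these data with morphological and physiological evidence and relevant fossil records to analyze the evolutionary mechanisms by which duckweeds re-adapted from land to water.
Main results:
During the return to water, the typical terrestrial adaptive traits of duckweeds (roots, stomata, phytohormone regulatory mechanisms, lignocellulose content, and others) gradually degenerated or disappeared, reflected at the genomic and transcriptional levels as contraction or downregulation of the relevant genes. Correspondingly, metabolic pathways used for aquatic adaptation underwent compensatory changes, most notably a significant expansion of genes in the flavonoid biosynthesis pathway. The study reconstructs the history of the duckweed family's return from land to water [4], replaying the evolution of early land plants “in reverse” and offering a new perspective on adaptive evolution in plants.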
Page qzaf074
Database
PreDigs: A Database of Context-specific Cell Type Markers and Precise Cell Subtypes for Digestive Cell Annotation
Jiayue Meng (孟佳玥) , Mengyao Han (韩梦瑶) , Yuwei Huang (黄俞玮) , Liang Li (李梁) , Yuanhu Ju (鞠元虎) , Daqing Lv (吕大庆) , Xiaoyi Chen (陈晓熠) , Liyun Yuan (袁力赟) , Guoqing Zhang (张国庆)
Abstract
Research on cell type markers helps investigators explore the diverse cellular composition of gastrointestinal tumors, thereby enhancing our understanding of tumor heterogeneity and its impact on disease progression and treatment response. However, the integration of large-scale datasets and the standardization of cell type identification remain challenging. Here, we developed PreDigs, a user-friendly database of predicted signatures for the digestive system, which offers 124 curated single-cell RNA sequencing datasets, covering over 3.4 million cells, all available for download. After unsupervised clustering, we unified the identification and nomenclature of cell subtype labels, constructing a cell ontology tree with 142 cell types across 8 hierarchical levels. Meanwhile, we calculated three different context-specific cell type markers, including “Cell Markers”, “Subtype Markers”, and “TPN Markers”, based on various application requirements within or across tissues. Through the integrated analysis of PreDigs data, we identified distinct cell subpopulations exclusive to tumors, one of which corresponds to tumor-specific endothelial cells. Additionally, PreDigs offers online cell annotation tools, allowing users to classify single cells with greater flexibility. PreDigs is accessible at https://www.biosino.org/predigs/.
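Research question
Research on cell type markers helps investigators explore the diverse cellular composition of gastrointestinal tumors, deepening our understanding of tumor heterogeneity and its impact on disease progression and treatment response. However, integrating large-scale datasets and the lack of standardized cell type identification remain challenging. To this end, we developed PreDigs, a user-friendly database of predicted signatures for the digestive system. It provides a one-stop platform for cell type search, data browsing and download, and online analysis, aiming to offer comprehensive and accurate support for single-cell data analysis of gastrointestinal tumors.
Methods
Single-cell RNA sequencing data from the adult digestive system were systematically collected from seven public data platforms (NCBI GEO, HCL, HCA, ArrayExpress, GSA-Human, Gut Cell Survey, and Blueprint). Multidimensional information covering sample information, sequencing platform, cell type annotation, and gene expression matrices was integrated to build a single-cell resource with unified quality control and standardized preprocessing. On this basis, a strategy combining automated annotation with manual verification was used to classify cells consistently across tissues and to establish a hierarchical tree of 142 cell types. Integrated clustering and differential analysis were then used to mine cell type-specific markers across organs and tissue states (normal, paracancerous, and tumor).
Main results
1. Provides 124 curated, downloadable single-cell RNA sequencing datasets covering 3 tissue types from 5 digestive organs in human and mouse, with over 3.4 million cells.
2. Constructs a cell ontology tree of 142 cell types spanning eight hierarchical levels.
3. Calculates three kinds of context-specific cell type markers, “Cell Markers”, “Subtype Markers”, and “TPN Markers”, according to different application requirements within or across tissues.
4. Through integrated analysis of PreDigs data, we identified distinct cell subpopulations exclusive to tumors, one of which corresponds to tumor-specific endothelial cells.
5. PreDigs offers online cell annotation tools, allowing users to classify single cells with greater flexibility.
Database link:
https://www.biosino.org/predigs/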
Page qzaf066
Database
LncCE: Landscape of Cellularly-elevated lncRNAs in Single Cells Across Normal and Cancer Tissues
Kang Xu (徐康) , Yujie Liu (刘玉洁) , Chongwen Lv (吕崇文) , Ya Luo (罗芽) , Jingyi Shi (石靖怡) , Haozhe Zou (邹昊哲) , Weiwei Zhou (周伟伟) , Dezhong Lv (吕德重) , Changbo Yang (杨长波) , Yongsheng Li (李永生) , Juan Xu (徐娟)
Abstract
Long non-coding RNAs (lncRNAs) have emerged as significant players in maintaining the morphology and function of tissues and cells. The precise regulatory effectiveness of lncRNAs is closely associated with their spatial expression patterns across tissues and cells. Here, we propose the Cellularly-Elevated LncRNA (LncCE) resource to systematically explore cellularly-elevated (CE) lncRNAs across normal and cancer tissues at single-cell resolution. LncCE encompasses 87,946 entries of CE lncRNAs of 149 cell types by analyzing 181 single-cell RNA sequencing datasets, involving 20 fetal normal tissues, 59 adult normal tissues, 32 adult cancer types, and 5 pediatric cancer types. Two main search options are provided via a given lncRNA name or cell type. The results emphasize both qualitative and quantitative expression features of lncRNAs across different cell types, their co-expression with protein-coding genes, and their involvement in biological functions. In particular, LncCE provides quantitative visualizations of lncRNA expression changes in cancers compared to control samples, as well as clinical associations with patients’ overall survival. Together, LncCE offers an extensive, quantitative, and user-friendly interface to create a CE expression atlas for lncRNAs across normal and cancer tissues at the single-cell level. The LncCE database is available at http://bio-bigdata.hrbmu.edu.cn/LncCE.
Page qzaf069
Database
MedImg: An Integrated Database for Public Medical Images
Bitao Zhong (钟碧涛) , Rui Fan (樊锐) , Yue Ma (马越) , Xiangwen Ji (纪翔文) , Qinghua Cui (崔庆华) , Chunmei Cui (崔春梅)
Abstract
The advancements in deep learning algorithms for medical image analysis have garnered significant attention in recent years. While several studies have shown promising results, with models achieving or even surpassing human performance, translating these advancements into clinical practice is still accompanied by various challenges. A primary obstacle lies in the availability of large-scale, well-characterized datasets for validating the generalization of approaches. To address this challenge, we curated a diverse collection of medical image datasets from multiple public sources, containing 105 datasets and a total of 1,995,671 images. These images span 14 modalities, including X-ray, computed tomography, magnetic resonance imaging, optical coherence tomography, ultrasound, and endoscopy, and originate from 13 organs, such as the lung, brain, eye, and heart. Subsequently, we constructed an online database, MedImg, which incorporates and systematically organizes these medical images to facilitate data accessibility. MedImg serves as an intuitive and open-access platform for facilitating research in deep learning-based medical image analysis, accessible at https://www.cuilab.cn/medimg/.
Page qzaf068
Database
The GSA Family in 2025: A Broadened Sharing Platform for Multi-omics and Multimodal Data
Sisi Zhang (张思思) , Xu Chen (陈旭) , Enhui Jin (金恩慧) , Anke Wang (王安可) , Tingting Chen (陈婷婷) , Xiaolong Zhang (张小龙) , Junwei Zhu (朱军伟) , Lili Dong (董丽莉) , Yanling Sun (孙艳玲) , Caixia Yu (俞彩霞) , Yubo Zhou (周榆博) , Zhuojing Fan (范卓静) , Huanxin Chen (陈焕新) , Shuang Zhai (翟爽) , Yubin Sun (孙玉彬) , Qiancheng Chen (陈乾成) , Jingfa Xiao (肖景发) , Shuhui Song (宋述慧) , Zhang Zhang (章张) , Yiming Bao (鲍一明) , Yanqing Wang (王彦青) , Wenming Zhao (赵文明)
Abstract
The Genome Sequence Archive family (GSA family) provides a comprehensive suite of database resources for archiving, retrieving, and sharing multi-omics data for the global academic and industrial communities. It currently comprises four distinct database members: the Genome Sequence Archive (GSA, https://ngdc.cncb.ac.cn/gsa), the Genome Sequence Archive for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human), the Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix), and the Open Biomedical Imaging Archive (OBIA, https://ngdc.cncb.ac.cn/obia). Compared to its 2021 version, the GSA family has expanded significantly by introducing a new repository, the OBIA, and by comprehensively upgrading the existing databases. Notable enhancements to the existing members include broadening the range of accepted data types, strengthening quality control systems, improving the data retrieval system, and refining data-sharing management mechanisms.
Page qzaf072
Method
LigExtract: Large-scale Automated Identification of Ligands from Protein Structures in the Protein Data Bank
Natália Aniceto, Nuno Martinho, Ismael Rufino, Rita C Guedes
Abstract
The Protein Data Bank (PDB) is an ever-growing database of three-dimensional macromolecular structures that has become a crucial resource for the drug discovery process. Exploring complexed proteins and accessing their associated ligands are essential for researchers to understand biological processes and design new compounds of pharmaceutical interest. However, currently available tools for large-scale ligand identification fail to address many of the more complex ways in which ligands are stored and represented in PDB structures. Therefore, a new tool called LigExtract was specifically developed for the large-scale processing of PDB structures and the identification of their ligands. This is a fully open-source tool available to the scientific community, designed to provide end-to-end processing. Users simply provide a list of UniProt IDs, and LigExtract returns a list of ligands, their individual PDB files, a PDB file of the protein chains interacting with the ligand, and a series of log files. These logs record the decisions made during the ligand extraction process and flag additional scenarios that might have to be considered during any follow-up use of the processed files (e.g., ligands covalently bound to the protein). LigExtract is freely available on GitHub (https://github.com/comp-medchem/LigExtract).
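To make the starting point of ligand identification concrete, the sketch below lists non-water hetero residues from the fixed-column HETATM records of a PDB file. It is not LigExtract's procedure, only the naive baseline: as the abstract notes, ligands are often stored in more complex ways that a HETATM scan would miss. The function name `hetero_residues` and the sample records are hypothetical.

```python
def hetero_residues(pdb_lines, skip=frozenset({"HOH"})):
    """List unique (residue name, chain, residue number) from HETATM records.

    Uses the fixed-column PDB layout: resName in columns 18-20,
    chainID in column 22, resSeq in columns 23-26 (1-based).
    """
    found, seen = [], set()
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue  # ATOM records describe the polymer chains
        key = (line[17:20].strip(), line[21], int(line[22:26]))
        if key[0] not in skip and key not in seen:
            seen.add(key)
            found.append(key)
    return found


# Hypothetical, truncated records (coordinates omitted for brevity).
sample = [
    "ATOM      1  N   MET A   1",
    "HETATM 1001  C1  NAG A 501",
    "HETATM 1002  O   HOH A 601",
    "HETATM 1003  C2  NAG A 501",
    "HETATM 1004 ZN    ZN B 701",
]
ligands = hetero_residues(sample)
```

Here `ligands` holds one entry per hetero residue, with water skipped and the two NAG atoms collapsed into a single residue.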
Page qzaf018
Method
ScReNI: Single-cell Regulatory Network Inference Through Integrating scRNA-seq and scATAC-seq Data
Xueli Xu (徐雪丽) , Yanran Liang (梁嫣然) , Miaoxiu Tang (汤杪庥) , Jiongliang Wang (王炯亮) , Xi Wang (王茜) , Yixue Li (李亦学) , Jie Wang (王杰)
Abstract
Each cell possesses a unique gene regulatory network. However, limited methods exist for inferring cell-specific regulatory networks, particularly through the integration of single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data. Herein, we develop a novel algorithm, named single-cell regulatory network inference (ScReNI), for inferring gene regulatory networks at the single-cell level. In ScReNI, the nearest neighbors algorithm is utilized to establish the neighboring cells for each cell, where nonlinear regulatory relationships between gene expression and chromatin accessibility are inferred through a modified random forest. ScReNI is designed to analyze both paired and unpaired datasets for scRNA-seq and scATAC-seq. ScReNI demonstrates more accurate regulatory relationships and outperforms existing cell-specific network inference methods in network-based cell clustering. ScReNI also shows superior performance in inferring cell type-specific regulatory networks through integrating gene expression and chromatin accessibility. Importantly, ScReNI offers the unique function of identifying cell-enriched regulators based on each cell-specific network. Overall, ScReNI facilitates the inference of cell-specific regulatory networks and cell-enriched regulators, providing insights into single-cell regulatory mechanisms of diverse biological processes. ScReNI is available at https://github.com/Xuxl2020/ScReNI.
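The neighborhood step described above can be sketched as a plain k-nearest-neighbors search over cell embeddings. This is a minimal illustration, assuming cells are already embedded as numeric vectors and using Euclidean distance; the function name, the toy coordinates, and the distance choice are assumptions rather than ScReNI's implementation. The per-neighborhood regression (a modified random forest, according to the abstract) is omitted.

```python
import math


def nearest_neighbors(embeddings, k):
    """For each cell, return the indices of its k closest other cells."""
    neighborhoods = []
    for i, cell in enumerate(embeddings):
        distances = sorted(
            (math.dist(cell, other), j)
            for j, other in enumerate(embeddings)
            if j != i
        )
        neighborhoods.append([j for _, j in distances[:k]])
    return neighborhoods


# Toy 2-D embedding: three cells near the origin, two near (5, 5).
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
neighborhoods = nearest_neighbors(cells, k=2)
```

Each entry of `neighborhoods` would then define the set of cells on which one cell-specific model is fit.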
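Research question
Although every cell has a unique gene regulatory network, effective methods for inferring cell-specific regulatory networks are still lacking, particularly methods that integrate single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data. Resolving the complex relationship between gene expression and chromatin accessibility at the single-cell level, and thereby accurately inferring cell-specific regulatory networks, remains a core challenge.
Methods
This study developed a new algorithm, ScReNI (Single-cell Regulatory Network Inference), which integrates scRNA-seq and scATAC-seq data to infer gene regulatory networks at single-cell resolution. The method first clusters cells based on single-cell transcriptomic and chromatin accessibility features, and then uses a nearest neighbors algorithm to build a neighborhood of cells for each cell. For each cell's neighborhood, a modified random forest model resolves the nonlinear regulatory relationships between gene expression and chromatin accessibility, yielding a regulatory network at single-cell resolution. ScReNI can analyze both paired and unpaired scRNA-seq and scATAC-seq data. Systematic evaluation shows that ScReNI has clear advantages in regulatory network-based cell clustering and in inferring cell type-specific regulatory networks. The algorithm can also identify key regulators enriched in each cell-specific network. ScReNI provides a new tool and perspective for revealing single-cell regulatory mechanisms in diverse biological processes.
Main results
1. Highly accurate prediction of single-cell regulatory relationships: the cell-specific regulatory relationships inferred by ScReNI agree closely with chromatin immunoprecipitation sequencing (ChIP-seq) data and outperform existing methods on multiple evaluation metrics.
2. Superior cell clustering performance: compared with existing methods, the single-cell regulatory networks built by ScReNI not only achieve a higher adjusted Rand index (ARI) for cell type clustering but also show clear advantages in resolving finer cell subtypes.
3. Identification of cell-enriched regulators: applied to single-cell multi-omics data from retinal development, ScReNI built cell-specific regulatory networks and identified regulators specifically enriched in Müller glia (MG) or retinal progenitor cells (RPCs), such as Nr2e3, Nrl, and Yap1.
4. GWAS analysis to reveal disease-associated networks: combined with genome-wide association study (GWAS) data, ScReNI revealed regulatory networks associated with age-related macular degeneration (AMD) and identified the key regulators Otx2, Nrl, Neurod1, and Nr2e3.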
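For readers unfamiliar with the ARI mentioned in result 2, the following sketch computes the adjusted Rand index between a reference labeling and a predicted clustering from the standard pair-counting formula. It is a generic illustration with hypothetical toy labels, not code from ScReNI.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from pair counts in the contingency table."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # fewer than two cells: nothing to disagree on
    cells = Counter(zip(labels_true, labels_pred))
    same_both = sum(comb(c, 2) for c in cells.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / comb(n, 2)
    maximum = 0.5 * (same_true + same_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (same_both - expected) / (maximum - expected)


cell_types = ["MG", "MG", "RPC", "RPC"]
relabelled = adjusted_rand_index(cell_types, [1, 1, 0, 0])  # same partition
crossed = adjusted_rand_index(cell_types, [0, 1, 0, 1])     # every pair split
```

An ARI of 1 means the two partitions agree exactly regardless of label names, values near 0 correspond to chance-level agreement, and negative values indicate agreement worse than chance.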
Algorithm and dataset links
GitHub: https://github.com/Xuxl2020/ScReNI
BioCode: https://ngdc.cncb.ac.cn/biocode/tool/7773
Unpaired scRNA-seq and scATAC-seq data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE181251
Paired scRNA-seq and scATAC-seq data:
https://www.10xgenomics.com/datasets/pbmc-from-a-healthy-donor-no-cell-sorting-10-k-1-standard-2-0-0
Page qzaf060
Method
ACE: A Versatile Contrastive Learning Framework for Single-cell Mosaic Integration
Xuhua Yan, Jinmiao Chen, Ruiqing Zheng, Min Li
Abstract
The integration of single-cell multi-omics datasets is critical for deciphering cellular heterogeneities. Mosaic integration, the most general integration task, poses a greater challenge regarding disparity in modality abundance across datasets. Here, we present Align and CompletE (ACE), a mosaic integration framework that assembles two types of strategies to handle this problem: modality alignment-based strategy (ACE-align) and regression-based strategy (ACE-spec). ACE-align utilizes a novel contrastive learning objective for explicit modality alignment to uncover the shared latent representations behind modalities. ACE-spec combines the modality alignment results and modality-specific representations to construct complete multi-omics representations for all datasets. Extensive experiments across various mosaic integration scenarios demonstrate the superiority of ACE’s two strategies over existing methods. Application of ACE-spec to bi-modal and tri-modal integration scenarios showcases that ACE-spec is able to enhance the representation of cellular heterogeneities for datasets with incomplete modalities. The source code of ACE can be accessed at https://github.com/CSUBioGroup/ACE-main.
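As background for the modality-alignment idea, here is a minimal sketch of the standard InfoNCE contrastive loss between two sets of embeddings of the same cells. The abstract describes ACE's objective as novel, so this is the textbook formulation and should not be read as ACE's loss; the names `modality_a`, `modality_b`, and the temperature of 0.1 are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] is the match for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Two cells embedded from two hypothetical modalities.
modality_a = [(1.0, 0.0), (0.0, 1.0)]
modality_b = [(0.9, 0.1), (0.1, 0.9)]
aligned_loss = info_nce(modality_a, modality_b)
swapped_loss = info_nce(modality_a, modality_b[::-1])
```

The loss is small when each cell's two embeddings are closer to each other than to other cells, and large when the pairing is broken, which is the signal an alignment objective optimizes.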
Page qzaf062
Method
LEGEND: Identifying Co-expressed Genes in Multimodal Transcriptomic Sequencing Data
Tao Deng, Mengqian Huang, Kaichen Xu, Yan Lu, Yucheng Xu, Siyu Chen, Nina Xie, Qiuyue Tao, Hao Wu, Xiaobo Sun
Abstract
Identifying co-expressed genes across tissue domains and cell types is essential for revealing co-functional genes involved in biological or pathological processes. While both single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data offer insights into gene co-expression patterns, current methods typically utilize either data type alone, potentially diluting the co-functionality signals within co-expressed gene groups. To bridge this gap, we introduce muLtimodal co-Expressed GENes finDer (LEGEND), a novel computational method that integrates scRNA-seq and SRT data for identifying groups of co-expressed genes at both cell type and tissue domain levels. LEGEND employs an innovative hierarchical clustering algorithm designed to maximize intra-cluster redundancy and inter-cluster complementarity, effectively capturing more nuanced patterns of gene co-expression and spatial coherence. Enrichment and co-function analyses further showcase the biological relevance of these gene clusters and their utilities in exploring context-specific novel gene functions. Notably, LEGEND can reveal shifts in gene–gene interactions under different conditions, providing insights into disease-associated gene crosstalk. Moreover, LEGEND can enhance the annotation accuracy of both spatial spots in SRT and single cells in scRNA-seq, and serve as a pioneering tool for identifying genes with designated spatial expression patterns. LEGEND is available at https://github.com/ToryDeng/LEGEND.
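Research question
Current methods for identifying co-expressed genes from spatially resolved transcriptomics (SRT) or single-cell RNA sequencing (scRNA-seq) data have the following problems:
1. Limited to a single modality: existing methods usually analyze only one data type (SRT or scRNA-seq) and ignore potential associations across modalities. As a result, the identified co-expression patterns may be confined to one modality and lack cross-modal consistency (for example, genes that show similar expression across tissue regions but no such association at the cell type level).
2. Insufficient use of spatial information: for SRT data, most existing methods only compute the similarity of gene expression levels across spatial spots, ignoring the positional relationships between spots and the overall spatial pattern of gene expression in the tissue.
3. Lack of strategies for downstream applications: existing methods usually stop at identifying co-expressed genes and offer no direct, easy-to-use way to apply the results systematically to key downstream tasks, such as using co-expressed gene clusters to improve the clustering accuracy of spatial spots or single cells, or searching for genes with a specific spatial pattern.
Approach
1. Multimodal information fusion: LEGEND uses an information-theoretic framework (such as mutual information and its derived metrics) to quantify and fuse the relevance, redundancy, and complementarity of genes in multimodal data (SRT and scRNA-seq).
2. Quantifying gene relationships with pseudo spatial domain labels: by analyzing the clustering of spatial spots, LEGEND identifies the central region of each spatial domain. Cluster labels of spots in the central region are more accurate, so LEGEND uses them as pseudo-labels to quantify relationships between genes more precisely.
3. Multiple downstream application strategies: to maximize the utility of LEGEND, this study designed several strategies for applying the co-expressed gene clusters identified by LEGEND to various downstream tasks.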
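The information-theoretic quantities mentioned above build on mutual information. As a minimal sketch, assuming gene expression has already been discretized (for example into low/high bins), the function below computes mutual information in bits between two discrete vectors. Measuring gene-versus-label MI as "relevance" and gene-versus-gene MI as "redundancy" follows common usage and is an assumption here, not LEGEND's exact definition; all variable names and toy values are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete vectors."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((count_x[a] / n) * (count_y[b] / n)))
        for (a, b), c in count_xy.items()
    )


# Pseudo-labels for four spots and binarized expression of three toy genes.
domain_labels = ["core", "core", "edge", "edge"]
gene_a = [1, 1, 0, 0]  # tracks the domains exactly
gene_b = [1, 1, 0, 0]  # identical to gene_a, so redundant with it
gene_c = [1, 0, 1, 0]  # independent of the domains

relevance_a = mutual_information(gene_a, domain_labels)
redundancy_ab = mutual_information(gene_a, gene_b)
relevance_c = mutual_information(gene_c, domain_labels)
```

In this toy case `gene_a` carries one full bit about the domain labels, `gene_b` adds nothing beyond `gene_a`, and `gene_c` carries no information about the domains.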
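Main results
1. Effective identification of cross-modal co-expression patterns: compared with baseline methods that analyze a single modality (SRT or scRNA-seq), gene clusters identified by LEGEND show stronger cross-modal consistency. At the single-cell level, genes within a cluster show tighter co-expression across cell types; at the spatial level, genes within a cluster show more consistent spatial expression patterns across tissue regions.
2. Clear biological relevance to the sample tissue: pathway enrichment analysis and other biological validation show that LEGEND groups genes with shared biological functions (such as participation in a specific signaling pathway) into the same cluster. More importantly, the overall expression patterns of these clusters closely match known anatomical structures and disease pathology regions in the tissue, revealing their biological significance.
3. Downstream tasks: LEGEND can be applied to three major downstream tasks:
(1) identifying disease-associated gene interactions;
(2) searching for genes with designated spatial patterns;
(3) selecting feature genes for single-cell clustering or spatial spot clustering.
Code link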
https://github.com/ToryDeng/LEGEND
Page qzaf056
Review Article
Computational Tools and Resources for Long-read Metagenomic Sequencing Using Nanopore and PacBio
Tianyuan Zhang (张天缘) , Mian Jiang (蒋冕) , Hanzhou Li (李汉洲) , Yunyun Gao (高云云) , Salsabeel Yousuf , Kaimin Yu (余凯敏) , Xinxin Yi (易欣欣) , Jun Wang (王俊) , Lulu Yang (杨路路) , Yong-Xin Liu (刘永鑫)
Abstract
In recent years, the field of shotgun metagenomics has witnessed remarkable advancements, primarily driven by the development and refinement of next-generation sequencing technologies, particularly long-read sequencing platforms such as Nanopore and PacBio. These platforms have significantly improved the ability to analyze microbial communities directly from environmental samples, providing valuable information on their composition, function, and dynamics without the need for pure cultivation. These technologies enhance metagenomic data assembly, annotation, and analysis by addressing longer reads, higher error rates, and complex data. In this review, we provide a comprehensive overview of the historical development of long-read metagenomics, highlighting significant landmarks and advancements. We also explore the diverse applications of long-read metagenomics, emphasizing its impact across various fields. Additionally, we summarize the essential computational tools and resources, including software, databases, and packages, developed to enhance the efficiency and accuracy of metagenomic analysis. Finally, we provide a practical guide for the installation and use of notable software available on GitHub (https://github.com/zhangtianyuan666/LongMetagenome). Overall, this review assists the metagenomics community in exploring microbial life in unprecedented depth by providing a roadmap for successful resource utilization and emphasizing possibilities for innovation.
Page qzaf075
Original Research
Deciphering Complex Interactions Between LTR Retrotransposons and Three Papaver Species Using LTR_Stream
Tun Xu (徐暾) , Stephen J Bush , Yizhuo Che (车一卓) , Huanhuan Zhao (赵焕焕) , Tingjie Wang (王庭杰) , Peng Jia (贾鹏) , Songbo Wang (王松渤) , Peisen Sun (孙培森) , Pengyu Zhang (张鹏宇) , Shenghan Gao (高胜寒) , Yu Xu (徐煜) , Chengyao Wang (王澄瑶) , Ningxin Dang (党宁馨) , Yong E Zhang (张勇) , Xiaofei Yang (杨晓飞) , Kai Ye (叶凯)
Abstract
Long terminal repeat retrotransposons (LTR-RTs), a major type of class I transposable elements, are the most abundant repeat element in plants. The study of the interactions between LTR-RTs and the host genome relies on high-resolution characterization of LTR-RTs. However, for non-model species, this remains a challenge. To address this, we developed LTR_Stream for sublineage clustering of LTR-RTs in specific or closely related species, providing higher precision than current database-based lineage-level clustering. Using LTR_Stream, we analyzed Retand LTR-RTs in three Papaver species. Our findings show that high-resolution clustering reveals complex interactions between LTR-RTs and the host genome. For instance, we found that autonomous Retand elements could spread among the ancestors of different subgenomes, like retroviral pandemics, enriching genetic diversity. Additionally, we identified that specific truncated fragments containing transcription factor motifs such as TCP and bZIP may contribute to the generation of novel topologically associating domain-like (TAD-like) boundaries. Notably, our pre-allopolyploidization and post-allopolyploidization comparisons show that these effects diminished after allopolyploidization, suggesting that allopolyploidization may be one of the mechanisms by which Papaver species cope with LTR-RTs. We demonstrated the potential application of LTR_Stream and provided a reference case for studying the interactions between LTR-RTs and the host genome in non-model plant species.
Page qzaf061
Original Research
Comprehensive Multi-omics Analysis of Regulatory Variants for Body Weight in Cattle
Qunhao Niu (牛群皓) , Jiayuan Wu (武嘉远) , Tianyi Wu (吴天弋) , Tianliu Zhang (张天留) , Tianzhen Wang (王添祯) , Xu Zheng (郑旭) , Zhida Zhao (赵志达) , Ling Xu (徐玲) , Zezhao Wang (王泽昭) , Bo Zhu (朱波) , Lupei Zhang (张路培) , Huijiang Gao (高会江) , George E Liu, Junya Li (李俊雅) , Lingyang Xu (徐凌洋)
Abstract
Body weight is a polygenic trait with intricate inheritance patterns. Functional genomics enriched with multi-layer annotations offers essential resources for exploring the genetic architecture of complex traits. In this study, we conducted an extensive characterization of regulatory variants associated with body weight-related traits in cattle using multi-omics analysis. First, we identified seven candidate genes by integrating selective sweep analysis and multiple genome-wide association study (GWAS) strategies using imputed whole-genome sequencing data from a population of 1577 individuals. Subsequently, we uncovered 3340 eGenes (genes whose expression levels are associated with genetic variants) across 227 muscle samples. Transcriptome-wide association studies (TWASs) further revealed a total of 532 distinct candidate genes associated with body weight-related traits. Colocalization analyses unveiled 44 genes shared between expression quantitative trait loci (eQTLs) and GWAS signals. Moreover, a comprehensive analysis by integrating GWAS, selective sweep, eQTL, TWAS, epigenomic profiling, and molecular validation highlighted a positively selected genomic region on Bos taurus autosome 6 (BTA6). This locus harbors pleiotropic genes (LAP3, MED28, and NCAPG) and a prioritized functional variant involved in the complex regulation of body weight. Additionally, convergent evolution analysis and phenome-wide association studies underscored the conservation of this locus across species. Our study provides a comprehensive understanding of the genetic regulation of body weight through multi-omics analysis in cattle. Our findings contribute to unraveling the genetic mechanisms governing weight-related traits and shed valuable light on the genetic improvement of farm animals.
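Research question
Cattle are important livestock in agricultural production. Although GWAS has identified many associated loci, linkage disequilibrium and insufficient functional annotation make it difficult to pinpoint causal genes and resolve molecular mechanisms. Body weight-related traits in beef cattle (such as live weight, daily gain, and carcass traits) are important economic traits with a complex genetic basis, and have long been a focus of research worldwide. At present, most studies focus on a single omics layer or a few traits and lack systematic, multi-tissue, multi-dimensional genetic analysis. They therefore cannot fully reveal the genetic basis of body weight traits in beef cattle, which directly limits the efficiency of genetic improvement.
Methods
This study integrated imputed whole-genome sequencing data from 1577 beef cattle, transcriptome data (RNA-seq) from 227 muscle samples, epigenomic data (WGBS, ATAC-seq, and ChIP-seq), and Hi-C data. Using a multi-omics integration strategy, it combined multi-strategy GWAS with selection signatures to screen candidate genes for body weight; used eQTL and TWAS analyses to resolve the association between gene expression and body weight traits; used colocalization analysis to establish regulatory relationships linking variant, gene expression, and phenotype; and validated key functional variants with molecular experiments (dual-luciferase reporter assays).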
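At its simplest, an eQTL test of the kind listed above asks whether expression of a gene changes with the number of copies of an allele. The sketch below fits that relationship by ordinary least squares on allele dosage (0, 1, or 2). It is a deliberately reduced illustration with made-up numbers: real eQTL mapping adds covariates, normalization, and multiple-testing control, none of which is shown, and the function name is hypothetical.

```python
import math


def dosage_regression(dosages, expression):
    """Least-squares slope and Pearson r of expression on allele dosage."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    s_gg = sum((g - mean_g) ** 2 for g in dosages)
    s_ee = sum((e - mean_e) ** 2 for e in expression)
    s_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    return s_ge / s_gg, s_ge / math.sqrt(s_gg * s_ee)


# Hypothetical data: six animals, alternate-allele counts and expression values.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r = dosage_regression(dosages, expression)
```

The slope estimates the change in expression per additional allele copy; a gene with at least one significantly associated variant is what the abstract calls an eGene.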
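Main results
1. Multi-strategy association analysis identified two candidate pleiotropic loci affecting body weight-related traits, LAP3-LCORL and PLAG1-CHCHD7; combined with selection signature analysis, seven candidate selected genes, including DCAF16 and FAM184B, were further identified.
2. Transcriptome-wide association analysis (TWAS) of muscle tissue, combined with colocalization, showed that expression of genes within the LAP3-LCORL locus on chromosome 6 is significantly associated with body weight-related traits.
3. Integrating multi-omics functional annotation, selection signatures, and molecular validation, a variant in the LAP3-LCORL locus (rs110242144) was identified as a candidate pleiotropic variant for body weight-related traits: its C allele has stronger promoter activity and may affect binding motifs of transcription factors such as NR2F6.
4. Evolutionary analysis and phenome-wide association analysis revealed that the LAP3-LCORL locus is associated with production traits in multiple livestock and poultry species, suggesting that LAP3-LCORL may have undergone convergent evolution across species, with potential parallel evolution and selection.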
Page qzaf067
Original Research
Genome-wide Genetic Mutations Accumulated in Pigs Genome-edited for Xenotransplantation and Their Filial Generation
Xueyun Huo (霍学云) , Xianhui Sun (孙先辉) , Xiangyang Xing (邢向阳) , Jing Lu (路静) , Jingjing Zhang (张晶晶) , Yanyan Jiang (蒋艳艳) , Xiao Zhu (朱筱) , Changlong Li (李长龙) , Jianyi Lv (吕建祎) , Meng Guo (郭萌) , Lixue Cao (曹立雪) , Xin Liu (刘欣) , Zhenwen Chen (陈振文) , Dengke Pan (潘登科) , Shunmin He (何顺民) , Chen Zhang (张晨) , Xiaoyan Du (杜小燕)
Abstract
Although xenotransplantation has been revolutionized by the development of genome-edited pigs, it is still unknown whether these pigs and their offspring remain genomically stable. Here, we showed that GGTA1-knockout (GTKO) pigs accumulated an average of 1205 genome-wide genetic mutations, and their filial 1 (F1) offspring contained an average of 18 de novo mutations compared with wild-type controls and their parents. The majority of mutations were in regions annotated as intergenic without altering protein functions, and none were located at predicted off-target sites. RNA sequencing analysis and phenotypic observations indicated that the accumulated mutations may have only a limited influence on GTKO pigs, and most of the mutations in the GTKO pigs could be attributed to the electrotransfection of plasmids into cells. This is the first report demonstrating that genetic mutations in genome-edited pigs are inherited stably by the next generation, providing a reference for the safe application and a standard approach to breed genome-edited pigs for xenotransplantation.
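Research question
Genome-edited pigs are an ideal source of xenotransplant donors for addressing the global shortage of donor organs. However, they must undergo CRISPR/Cas9 editing and somatic cell nuclear transfer (SCNT), which may introduce unintended genomic mutations and potentially affect donor safety and transplant success. A systematic assessment of genome stability in genome-edited pigs and their offspring has been lacking, and addressing this question will help advance clinical translation of the technology.
Methods
Using GGTA1-knockout (GTKO) Wuzhishan miniature pigs as a model, this study performed high-depth (110×) whole-genome sequencing (WGS) and compared the genomes of 5 GTKO F0 edited pigs, 3 F1 offspring, and the corresponding wild-type (WT) pigs. Multi-tool joint variant calling, RNA sequencing (RNA-seq), and health phenotype analysis were used to characterize the origin, distribution, and functional impact of the accumulated mutations. In vitro cell electrotransfection experiments and mutational signature analysis were further used to explore how the mutations arose.
Main results
The first multi-generation, genome-wide mutation map of GTKO genome-edited pigs was produced: each F0 pig carried an average of 1205 de novo mutations, but more than 98% were located in non-coding regions or were synonymous, and only a few (about 8 per individual) might affect protein function.
Mutations were shown to arise mainly from plasmid electrotransfection rather than from CRISPR cleavage itself: 62% of mutations arose during the cell editing stage, and their mutational signature (SBS18) closely matches reactive oxygen species (ROS) damage induced by electroporation.
The cross-generation genetic stability of the edited genome was confirmed: the number of de novo mutations in F1 piglets (14-19 per individual) is consistent with the spontaneous mutation rate in pigs, and no heritable structural variants or off-target effects were detected.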
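Mutational signature analysis of the kind referred to above starts by tallying single-nucleotide variants into strand-normalized substitution classes. The sketch below shows only that first step, collapsing each variant onto its pyrimidine reference base to give the six classes (C>A, C>G, C>T, T>A, T>C, T>G). Full SBS signatures such as SBS18 are defined over 96 trinucleotide contexts and require signature fitting, which is not shown; the function names and variants are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Report an SNV relative to the pyrimidine (C or T) reference base."""
    if ref in ("A", "G"):
        # Purine reference: describe the change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(snvs):
    """Count the six substitution classes over (ref, alt) pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in snvs)


# Hypothetical de novo SNVs as (reference base, alternate base).
variants = [("G", "T"), ("C", "A"), ("T", "C"), ("A", "G")]
spectrum = substitution_spectrum(variants)
```

A G>T change and a C>A change are the same event seen from opposite strands, so both are counted as C>A.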
Page qzaf071
Original Research
Lineage-associated Human Divergently-paired Genes Exhibit Structural and Regulatory Characteristics
Guangya Duan (段光亚) , Sisi Zhang (张思思) , Bixia Tang (唐碧霞) , Jingfa Xiao (肖景发) , Zhang Zhang (章张) , Peng Cui (崔鹏) , Jun Yu (于军) , Wenming Zhao (赵文明)
Abstract
Divergently-paired genes (DPGs) are minimal co-transcriptional units of clustered genes, representing over 10% of human genes. Our previous studies have shown that vertebrate DPGs are highly conserved compared to those from invertebrates. Three critical questions remain: (1) which DPGs are conserved across vertebrates, especially among mammals and primates? (2) to what extent and precision do these paired promoters share their sequences mechanistically and stringently? and (3) how are human DPGs distributed over selected primate lineages, and what are their possible biological functional consequences? There are 1399 human DPGs (approximately 12% of all human protein-coding genes), of which 1136, 1118, 925, and 830 human DPGs show conservation when compared to selected primates, mammals, avians, and fish, respectively. DPGs are not only functionally enriched toward direct protein–DNA interactions and cell cycle synchronization, but also exhibit lineage association, narrow in principle toward synchronization of certain core molecular mechanisms and cellular processes. Second, the inter-transcription start site (inter-TSS) distances affect both co-expression strength and disparity between the two genes of a DPG. Finally, among primates, human-associated DPGs exhibit diversification in their co-expression patterns and gene duplication events, and are obviously involved in neural development. Comparing high-quality human reference genomes from European (T2T-CHM13) and Chinese (T2T-YAO) populations, we identified 55 and 357 DPGs unique to the former and the latter, respectively. Our findings offer novel insights into the regulatory characteristics between neighboring genes and their structure–function selection among functionally conserved gene clusters.
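Research question:
Divergently-paired genes (DPGs) are adjacent gene pairs on a chromosome that are transcribed in opposite directions with transcription start sites (TSSs) less than 1000 bp apart. They account for more than 10% of human genes and constitute minimal co-transcriptional units. These genes are highly conserved in evolution, but the full picture of their conservation across vertebrate lineages (especially primates), the precise mechanism by which their promoters share sequence, lineage-associated patterns of conservation, and their biological functions remain to be resolved.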
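The definition above translates into a simple scan over a gene annotation. The sketch below is a simplified reading of it: it sorts genes by TSS on each chromosome and reports adjacent pairs in which the left gene is on the minus strand, the right gene is on the plus strand, and the TSSs are less than 1000 bp apart. Pairs whose TSSs cross or overlap are not handled, and the `Gene` record, function name, and coordinates are hypothetical rather than the authors' pipeline.

```python
from collections import namedtuple

Gene = namedtuple("Gene", "name chrom strand tss")


def divergent_pairs(genes, max_distance=1000):
    """Adjacent minus/plus gene pairs with TSSs closer than max_distance."""
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)
    pairs = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.tss)
        for left, right in zip(chrom_genes, chrom_genes[1:]):
            head_to_head = left.strand == "-" and right.strand == "+"
            if head_to_head and right.tss - left.tss < max_distance:
                pairs.append((left.name, right.name))
    return pairs


annotation = [
    Gene("geneA", "chr1", "-", 10_000),
    Gene("geneB", "chr1", "+", 10_400),  # divergent with geneA, 400 bp apart
    Gene("geneC", "chr1", "+", 50_000),
    Gene("geneD", "chr1", "-", 50_500),  # convergent with geneC, not a DPG
    Gene("geneE", "chr2", "-", 2_000),
    Gene("geneF", "chr2", "+", 5_000),   # divergent but 3000 bp apart
]
dpgs = divergent_pairs(annotation)
```

Only the geneA/geneB pair satisfies both the orientation and the distance criteria in this toy annotation.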
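Methods:
Based on high-quality reference genomes (GRCh38, the European T2T-CHM13, and the Chinese T2T-YAO) and multiple omics data types (genome, transcriptome, epigenome, and protein interaction networks), this study systematically identified human DPGs (1399 pairs) and assessed their evolutionary conservation in vertebrates (45 species covering fish, amphibians, reptiles, birds, mammals, and primates). Functional enrichment, co-expression analysis, epigenetic signal analysis (DNase I, RNAPII, H3K4me3, and H3K27ac), transcription factor binding site prediction, and population genome comparison were combined to investigate the structural features, regulatory mechanisms, and functional associations of DPGs and their roles in lineage divergence and population differences.
Main results:
1. Evolutionary conservation: 1399 human DPGs were identified, of which 1136 are conserved in primates, 1118 in mammals, 925 in birds, and 830 in fish. A total of 101 pairs are highly conserved across 46 vertebrate species (vcDPGs); they are enriched in protein–DNA/RNA/metal ion binding, DNA repair, cell cycle, and protein transport, and significant direct protein interactions were observed in 41 of the 101 pairs (versus 1 of 101 random pairs), suggesting that spatial proximity favors coordinated expression.
2. Tissue co-expression: DPGs show differential co-expression across tissues, strongest in brain-related tissues. Fourteen DPGs (such as DTX3L–PARP9 and TMEM176A–TMEM176B) are co-expressed in all tissues and are associated with structural integrity, proteostasis, and immune regulation; another group is co-expressed in testis and brain and is enriched in nucleotide synthesis.
3. Epigenetic regulation and spacing: the shared regions of highly co-expressed DPGs show stronger chromatin openness (DNase I, RNAPII binding, and H3K4me3/H3K27ac modifications). Based on H3K4me3 peak positions, DPGs were divided into overlapping (median −469 bp), independent (95 bp, accounting for 60%), and distant (976 bp) types, indicating that TSS spacing determines the spatial configuration of the regulatory region and affects the insertion/deletion rate of its sequence.
4. Primate specificity: 390 DPGs are conserved across human, chimpanzee, orangutan, and gorilla, enriched in non-coding RNA processing and ribosome biogenesis; a further 12 human-differential DPGs (such as the four copies of SRGAP2–FAM72) are closely associated with cerebral cortex development.
5. Population differences: comparing the European (T2T-CHM13) and Chinese (T2T-YAO) reference genomes identified 357 and 55 population-differential DPGs, respectively; differences in their co-expression patterns suggest they may contribute to phenotypes such as immunity and metabolism in different populations.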
Page qzaf058
Original Research
Implication of the Vaginal Microbiome in Female Infertility and Assisted Conception Outcomes
Xiuju Chen (陈秀菊) , Yanyu Sui (隋彦禹) , Jiayi Gu (顾佳怡) , Liang Wang (王亮) , Ningxia Sun (孙宁霞)
Abstract
The rise in infertility rates has prompted research into the impact of vaginal microbiota on female fertility and the success of assisted reproductive technology (ART). Our study aimed to compare the vaginal microbiome between fertile and infertile women and explore its influence on ART outcomes. Vaginal secretion samples were collected from 194 infertile women and 100 healthy controls at Shanghai Changzheng Hospital. The V3–V4 region of the 16S rRNA gene was amplified using polymerase chain reaction (PCR). A machine learning model was applied to predict infertility based on genus-level abundance, and the PICRUSt algorithm was employed to predict metabolic pathways related to infertility and ART outcomes. The results showed that infertile women exhibited a significantly different vaginal microbial composition compared to healthy controls, along with increased microbial diversity. Notably, the abundance of Burkholderia, Pseudomonas, and Prevotella was significantly elevated in the vaginal microbiota of the infertility group, while that of Bifidobacterium and Lactobacillus was reduced. Among infertile women, those with recurrent implantation failure (RIF) showed even higher vaginal microbial diversity, with specific genera such as Mobiluncus, Peptoniphilus, Prevotella, and Varibaculum being more abundant. Eleven metabolic pathways were identified to be associated with both RIF and infertility, with Prevotella showing stronger correlations with these pathways. This study elucidates differences in vaginal microbiome between healthy and infertile women, providing novel insights into how vaginal microbiota may impact infertility and ART outcomes. Our findings underscore the importance of specific microbial taxa in women with RIF, suggesting potential avenues for targeted interventions to improve embryo transplantation success rates.
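Microbial diversity within a sample is commonly summarized with an alpha-diversity index. The sketch below computes the Shannon index from genus-level read counts as one such measure; the abstract does not state which diversity metric the study used, so the choice of index, the function name, and the counts are all assumptions for illustration.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


# Hypothetical genus-level read counts for two vaginal samples.
dominated_sample = [95, 3, 2]      # one genus makes up most reads
mixed_sample = [30, 25, 25, 20]    # reads spread across several genera

h_dominated = shannon_index(dominated_sample)
h_mixed = shannon_index(mixed_sample)
```

A community dominated by a single genus scores low, while a more even mixture of genera scores higher, which is the sense in which a sample can show "increased microbial diversity".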
Page qzaf042
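The vaginal microbiome abstract above reports higher microbial diversity in infertile women without naming the diversity index used. As a minimal sketch, assuming a Shannon index computed from genus-level read counts (the index choice, the function name, and the toy counts below are all illustrative assumptions, not the authors' pipeline), such a comparison might look like this:

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical genus-level read counts for two samples (not data from the study).
lactobacillus_dominated = [950, 20, 15, 10, 5]   # one genus dominates
mixed_community = [300, 250, 200, 150, 100]      # reads spread across genera

print(shannon_diversity(lactobacillus_dominated))
print(shannon_diversity(mixed_community))
```

A community dominated by a single genus scores lower than one with reads spread evenly across genera, which is the sense in which reduced Lactobacillus dominance can coincide with higher diversity.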
Original Research
Deciphering Haplotype-level Chromosome Conformation Alteration in Down Syndrome by Haplotype-resolved Multi-omics Analysis
Chengchao Wu, Tianshu Zhou, Wenfu Ke, Wei Xiong, Zhihui Zhang, Siheng Zhang, Jinyue Wang, Lulu Deng, Keji Yan, Man Wang, Shenglong He, Qi Gong, Chao Ma, Xiaping Chen, Yan Li, He Long, Chong Guo, Gang Cao, Zhijun Zhang
For chromosome abnormalities (CAs), such as Down syndrome (DS), the influence of genomic variations on chromosome conformation and gene transcription remains elusive. Based on the complete genomic sequences from the parents of a DS trisomy patient, we systematically delineated an atlas of parental-specific, haplotype-resolved single nucleotide polymorphisms (SNPs), copy number variations (CNVs), three-dimensional (3D) genome architecture, and RNA expression profiles in the diencephalon of the DS patient. The integrated haplotype-resolved multi-omics analysis demonstrated that one-dimensional (1D) genomic variations including SNPs and CNVs in the DS patient are highly correlated with the alterations in the 3D genome organization and the subsequent changes in gene transcription. This correlation remains valid at the haplotype level. Moreover, we revealed the 3D genome alteration-associated dysregulation of DS-related genes, which facilitates understanding the pathogenesis of CAs. Together, our study contributes to deciphering the coding from 1D genomic variations to 3D genome architecture and the subsequent gene transcription outcomes in both health and disease.
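Research question:
The National Comprehensive Program for the Prevention and Treatment of Birth Defects, issued by the National Health Commission in 2018, states that Down syndrome, as the most common birth defect, severely impairs patients' quality of life and life expectancy. People with Down syndrome carry an extra copy of chromosome 21, so the condition is also called trisomy 21. Common symptoms include intellectual disability, craniofacial malformation, B-cell acute lymphoblastic leukemia, and early-onset Alzheimer disease. How the extra chromosome 21 gives rise to the severe phenotypes of this disease has not yet been fully resolved.
Methods:
This study systematically profiled a Down syndrome patient and healthy controls with multi-omics analyses, including high-throughput chromosome conformation capture (Hi-C), copy number variation (CNV), single nucleotide polymorphism (SNP), and gene expression (RNA-seq). In addition, based on whole-genome sequencing of the patient's parents, we took the novel step of constructing a haplotype-resolved multi-omics landscape of the patient. Using this landscape, we resolved paternal- and maternal-specific differences in chromatin conformation and copy number, and examined how these differences affect gene expression and ultimately lead to the disease phenotype.
Main results:
1. Chromatin conformation changes in the Down syndrome patient occur more often in regions with abnormal copy number and in SNP-enriched regions. These conformation changes directly alter the pattern of transcriptional regulation and thereby affect gene expression levels.
2. Chromatin conformation modules, such as topologically associated domains (TADs), chromatin loops, and A/B compartments, differ markedly between the paternal and maternal haplotypes.
3. Haplotype-level copy number abnormalities, SNP abundance, and chromatin conformation differences all affect haplotype-level gene expression. In the Down syndrome patient, the extra chromosome copy changes chromosome conformation; together, these factors amplify differences in gene expression and thereby lead to the disease.
Data availability:
All sequencing data have been deposited in GSA-human (https://ngdc.cncb.ac.cn/gsa-human/browse/HRA005229).
Analysis code:
All analysis code has been deposited in BioCode (https://ngdc.cncb.ac.cn/biocode/tool/BT007956).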
Page qzaf054
Original Research
Precision and Accuracy in Quantitative Measurement of Gene Expression from Single-cell/nucleus RNA Sequencing Data
Rujia Dai, Ming Zhang, Tianyao Chu, Richard Kopp, Chunling Zhang, Kefu Liu, Yue Wang, Xusheng Wang, Chao Chen, Chunyu Liu
Single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) have become essential tools for profiling gene expression across different cell types in biomedical research. While factors like RNA integrity, cell count, and sequencing depth are known to influence data quality, quantitative benchmarks and actionable guidelines are lacking. This gap contributes to variability in study designs and inconsistencies in downstream analyses. In this study, we systematically evaluated quantitative precision and accuracy in expression measures across 23 sc/snRNA-seq datasets comprising 3,682,576 cells from 339 samples. Precision was assessed using technical replicates based on pseudo-bulks created from subsampling. Accuracy was evaluated using sample-matched scRNA-seq and pooled-cell RNA sequencing data of mononuclear phagocytes from four species. Our results show that precision and accuracy are generally low at the single-cell level, with reproducibility being strongly influenced by cell count and RNA quality. We established data-driven thresholds for optimizing study design, recommending at least 500 cells per cell type per individual to achieve reliable quantification. Furthermore, we showed that signal-to-noise ratio is a key metric for identifying reproducible differentially expressed genes. To support future research, we developed Variability In single-Cell gene Expression (VICE), a tool that evaluates sc/snRNA-seq data quality and estimates the true positive rate of differential expression results based on sample size, observed noise levels, and expected effect size. These findings provide practical, evidence-based guidelines to enhance the reliability and reproducibility of sc/snRNA-seq studies.
Page qzaf077
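The sc/snRNA-seq abstract above describes assessing precision with technical replicates built from pseudo-bulks created by subsampling. The following is a rough sketch of that idea only, assuming a per-gene coefficient of variation across replicate pseudo-bulks as the precision measure; the function names, the counts-per-million normalization, and the toy data are hypothetical, and this is not the VICE implementation.

```python
import random
import statistics


def pseudobulk(cells, n_cells, rng):
    """Sum counts over a random subsample of cells, then scale to counts per million."""
    sampled = rng.sample(cells, n_cells)
    n_genes = len(cells[0])
    totals = [sum(cell[g] for cell in sampled) for g in range(n_genes)]
    depth = sum(totals)
    return [t / depth * 1e6 if depth else 0.0 for t in totals]


def gene_cv(cells, n_cells, n_reps=20, seed=0):
    """Per-gene coefficient of variation across replicate pseudo-bulks (lower = more precise)."""
    rng = random.Random(seed)
    reps = [pseudobulk(cells, n_cells, rng) for _ in range(n_reps)]
    cvs = []
    for g in range(len(cells[0])):
        values = [rep[g] for rep in reps]
        mean = statistics.mean(values)
        cvs.append(statistics.stdev(values) / mean if mean > 0 else float("nan"))
    return cvs


# Toy count matrix: 1000 cells x 5 genes with different expression ranges (illustrative only).
data_rng = random.Random(1)
upper_bounds = [2, 5, 10, 50, 200]
cells = [[data_rng.randint(0, hi) for hi in upper_bounds] for _ in range(1000)]

for n in (20, 100, 500):
    print(n, round(statistics.mean(gene_cv(cells, n)), 4))
```

In this toy setting the replicate-to-replicate variation shrinks as more cells go into each pseudo-bulk, which is the kind of relationship a cell-count threshold such as the one recommended in the abstract would be derived from.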