Volume: 22, Issue: 6

Research Highlight

Innovative Low-cost Probe Generation Empowers Targeted Long-read RNA Sequencing

Gang Fang

no abstract

Page qzae027


Research Highlight

A Novel Targeted Long-read Sequencing Approach Boosts Transcriptomic Profiling

Xiaolong Tian and Rong Fan

no abstract

Page qzae090


Research Highlight

Single-cell Sequencing Traces Mitochondrial Transfers

Mengying Wu, Weilin Pu, Zhenglong Gu

no abstract

Page qzae092


Research Highlight

Decoding Spatial Complexity of Diverse RNA Species in Archival Tissues

Junjie Zhu, Fangqing Zhao

no abstract

Page qzae089


Research Highlight

Harnessing Type II Cytokines to Reinvigorate Exhausted T Cells for Durable Cancer Immunotherapy

Wenle Zhang, Yanwen Wang, Bin Li

no abstract

Page qzae093


Original Research

A Novel IgG–IgM Autoantibody Panel Enhances Detection of Early-stage Lung Adenocarcinoma from Benign Nodules

Rongrong Luo, Xiying Li, Ruyun Gao, Mengwei Yang, Juan Cai, Liyuan Dai, Nin Lou, Guangyu Fan, Haohua Zhu, Shasha Wang, Zhishang Zhang, Le Tang, Jiarui Yao, Di Wu, Yuankai Shi

Autoantibodies hold promise for diagnosing lung cancer. However, their effectiveness in early-stage detection needs improvement. In this study, we investigated novel IgG and IgM autoantibodies for detecting early-stage lung adenocarcinoma (Early-LUAD) by employing a multi-step approach, including Human Proteome Microarray (HuProt™) discovery, focused microarray verification, and ELISA validation, on 1246 individuals consisting of 634 patients with Early-LUAD (stage 0–I), 280 patients with benign lung disease (BLD), and 332 normal healthy controls (NHCs). HuProt™ selected 417 IgG/IgM candidates, and focused microarray further verified 55 significantly elevated IgG/IgM autoantibodies targeting 32 tumor-associated antigens in Early-LUAD compared to BLD/NHC/BLD+NHC. A novel panel of 10 autoantibodies (ELAVL4-IgM, GDA-IgM, GIMAP4-IgM, GIMAP4-IgG, MGMT-IgM, UCHL1-IgM, DCTPP1-IgM, KCMF1-IgM, UCHL1-IgG, and WWP2-IgM) demonstrated a sensitivity of 70.5% and a specificity of 77.0% or 80.0% for distinguishing Early-LUAD from BLD or NHC, respectively, in ELISA validation. Positive predictive values for distinguishing Early-LUAD from BLD with nodules ≤ 8 mm, 9–20 mm, and > 20 mm significantly increased from 47.27%, 52.00%, and 62.90% [low-dose computed tomography (LDCT) alone] to 79.17%, 71.13%, and 87.88% (10-autoantibody panel combined with LDCT), respectively. The combined risk score (CRS), based on the 10-autoantibody panel, sex, and imaging maximum diameter, effectively stratified the risk for Early-LUAD. Individuals with 10 ≤ CRS ≤ 25 and CRS > 25 had a higher risk of Early-LUAD compared to the reference (CRS < 10), with adjusted odds ratios of 5.28 [95% confidence interval (CI): 3.18–8.76] and 9.05 (95% CI: 5.40–15.15), respectively. This novel panel of IgG and IgM autoantibodies offers a complementary approach to LDCT in distinguishing Early-LUAD from benign nodules.
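As a reading aid for the sensitivity, specificity, and positive predictive values quoted in this abstract, the sketch below shows how a positive predictive value follows from sensitivity, specificity, and disease prevalence via Bayes' rule. It is only an illustration: the function name and the 50% prevalence in the usage line are assumptions, not values or methods taken from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return PPV = TP / (TP + FP) for a test applied to a population.

    All three arguments are fractions in [0, 1]. The prevalence is an
    input the caller must supply; it is not reported in the abstract.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    total_positive = true_positive_rate + false_positive_rate
    if total_positive == 0.0:
        raise ValueError("no positive calls are possible with these inputs")
    return true_positive_rate / total_positive


if __name__ == "__main__":
    # Sensitivity and specificity (Early-LUAD vs. BLD) come from the abstract;
    # the 50% prevalence is a hypothetical value chosen for illustration.
    ppv = positive_predictive_value(0.705, 0.770, 0.5)
    print(f"PPV at an assumed 50% prevalence: {ppv:.3f}")
```

Because the predictive value depends on prevalence, the same panel can yield different values in different nodule-size groups, as the abstract's per-group figures suggest.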

Page qzae085


Original Research

Pangenome Reveals Gene Content Variations and Structural Variants Contributing to Pig Characteristics

Heng Du, Yue Zhuo, Shiyu Lu, Wanying Li, Lei Zhou, Feizhou Sun, Gang Liu, Jian-Feng Liu

Pigs are one of the most essential sources of high-quality proteins in human diets. Structural variants (SVs) are a major source of genetic variants associated with diverse traits and evolutionary events. However, the current linear reference genome of pigs restricts the accurate presentation of position information for SVs. In this study, we generated a pangenome of pigs and a genome variation map of 599 deeply sequenced genomes across Eurasia. Additionally, we established a section-wide gene repertoire, revealing that core genes are more evolutionarily conserved than variable genes. Furthermore, we identified 546,137 SVs, their enrichment regions, and relationships with genomic features and found significant divergence across Eurasian pigs. More importantly, the pangenome-detected SVs could complement heritability estimates and genome-wide association studies based only on single nucleotide polymorphisms. Among the SVs shaped by selection, we identified an insertion in the promoter region of the TBX19 gene, which may be related to the development, growth, and timidity traits of Asian pigs and may affect the gene expression. The constructed pig pangenome and the identified SVs in this study provide rich resources for future functional genomic research on pigs.
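Research question: As genomics advances rapidly, a single conventional reference genome can no longer capture the full genetic diversity within a species, and it is particularly limited for identifying and genotyping structural variants (SVs). The pangenome concept emerged in response and has been applied successfully in several mammals, including humans, becoming an important route to improving genome resolution. Pigs are important both as livestock and as a biomedical model, yet the construction and functional mining of a pig pangenome remain at an early stage.
Methods: Using a comparative genomics strategy, this study systematically analyzed the genomes of 19 representative pig breeds, constructed a pig pangenome, and identified genomic SVs. On this basis, an SV-based graph pangenome was built, and its potential advantages and practical value for the genetic evaluation of molecular traits were assessed, providing key genomic tools and theoretical support for dissecting complex traits and for precision breeding in pigs.
Main results:
1. Reference genomes of three representative Chinese indigenous pig breeds were assembled to high quality using PacBio long-read sequencing and Hi-C; their assembly completeness is on par with the current mainstream international reference genome, Sscrofa11.1.
2. A pig pangenome was constructed by integrating 16 publicly available pig genomes. SV hotspot regions in the pig genome were then systematically identified, revealing variation patterns specific to biogeographic background. An SV hotspot specific to Asian pig breeds was found on the X chromosome, and one specific to European pig breeds was identified on chromosome 9.
3. A pig graph pangenome built from the identified SVs was found to capture missing heritability effectively.
4. Using the graph pangenome, an insertion under positive selection was found in the promoter region of the TBX19 gene in Asian pig breeds; this variant can significantly regulate gene expression and may be related to developmental characteristics and behavior (such as timidity).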

Page qzae081


Original Research

Variation and Interaction of Distinct Subgenomes Contribute to Growth Diversity in Intergeneric Hybrid Fish

Li Ren, Mengxue Luo, Jialin Cui, Xin Gao, Hong Zhang, Ping Wu, Zehong Wei, Yakui Tai, Mengdan Li, Kaikun Luo, Shaojun Liu

Intergeneric hybridization greatly reshapes regulatory interactions among allelic and non-allelic genes. However, their effects on growth diversity remain poorly understood in animals. In this study, we conducted whole-genome sequencing and RNA sequencing analyses in diverse hybrid varieties resulting from the intergeneric hybridization of goldfish (Carassius auratus red var.) and common carp (Cyprinus carpio). These hybrid individuals were characterized by distinct mitochondrial genomes and copy number variations. Through a weighted gene correlation network analysis, we identified 3693 genes as candidate growth-regulating genes. Among them, the expression of 3672 genes in subgenome R (originating from goldfish) displayed negative correlations with body weight, whereas 20 genes in subgenome C (originating from common carp) exhibited positive correlations. Notably, we observed intriguing expression patterns of solute carrier family 2 member 12 (slc2a12) in subgenome C, showing opposite correlations with body weight that changed with water temperatures, suggesting differential interactions between feeding activity and weight gain in response to seasonal changes for hybrid animals. In 40.30% of alleles, we observed dominant trans-regulatory effects in the regulatory interactions between distinct alleles from subgenomes R and C. Integrating analyses of allele-specific expression and DNA methylation data revealed that DNA methylation on both subgenomes shaped the relative contribution of allelic expression to the growth rate. These findings provide novel insights into the interactions of distinct subgenomes that underlie heterosis in growth traits and contribute to a better understanding of multiple allelic traits in animals.
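In the 1980s, a research team led by Academician Liu Yun used intergeneric distant hybridization, with red crucian carp as the female parent and common carp as the male parent, to develop a bisexually fertile allotetraploid crucian carp × common carp line, which has now been bred to the F32 generation. The establishment of this allotetraploid line broke with the traditional understanding of reproductive isolation between species, and it was the first vertebrate tetraploid line obtained internationally through distant hybridization. The high-quality triploid crucian carp (Xiangyun crucian carp) and triploid common carp (Xiangyun common carp) developed from this tetraploid line have been widely used in production. Shaojun Liu not only took part in the early work as a principal researcher but also later led the team in the systematic maintenance and genetic improvement of the allotetraploid line.
Compared with the original parents, red crucian carp and common carp, Xiangyun crucian carp and Xiangyun common carp show a marked growth advantage and have been widely promoted in aquaculture across China; further breeding improvement may strongly advance the country's fisheries. Over long-term culture, we found that Xiangyun common carp (two subgenome sets from common carp and one from red crucian carp) grows faster and reaches a larger body size than Xiangyun crucian carp (two subgenome sets from red crucian carp and one from common carp). To dissect the genetic basis of this difference in growth phenotype, this study used red crucian carp, common carp, and the allotetraploid crucian carp × common carp as parents for reciprocal crosses and interploidy hybridization, producing six allodiploid and allotriploid fish populations, and carried out genomic, transcriptomic, and epigenomic analyses to explain the molecular genetic mechanisms behind the growth differences among these hybrid fish. The main findings are as follows:
1. A higher proportion of the common carp-derived subgenome, together with common carp-derived mitochondria, favors rapid growth in these hybrid fish.
2. Copy number changes of genes from the different parental species, and mitochondria of different species origin, can affect allele-specific expression in the hybrid fish and thereby influence growth phenotype.
3. A total of 3693 genes with a potential effect on body weight, including slc2a12, were identified.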

Page qzae055


Original Research

Global Invasion History and Genomic Signatures of Adaptation of the Highly Invasive Sycamore Lace Bug

Zhenyong Du, Xuan Wang, Yuange Duan, Shanlin Liu, Li Tian, Fan Song, Wanzhi Cai, Hu Li

Invasive species cause massive economic and ecological damages. Climate change has resulted in an unprecedented increase in the number and impact of invasive species; however, the mechanisms underlying these invasions are unclear. The sycamore lace bug, Corythucha ciliata, is a highly invasive species originating from North America and has expanded across the Northern Hemisphere since the 1960s. In this study, we assembled the C. ciliata genome using high-coverage Pacific Biosciences (PacBio), Illumina, and high-throughput chromosome conformation capture (Hi-C) sequencing. A total of 15,278 protein-coding genes were identified, and expansions of gene families with oxidoreductase and metabolic activities were observed. In-depth resequencing of 402 samples from native and nine invaded countries across three continents revealed 2.74 million single nucleotide polymorphisms. Two major invasion routes of C. ciliata were identified from North America to Europe and Japan, with a contact zone forming in East Asia. Genomic signatures of selection associated with invasion and long-term balancing selection in native ranges were identified. These genomic signatures overlapped with each other as well as with expanded genes, suggesting improvements in the oxidative stress and thermal tolerance of C. ciliata. These findings offer valuable insights into the genomic architecture and adaptive evolution underlying the invasive capabilities of species during rapid environmental changes.
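Research question: The sycamore lace bug (Corythucha ciliata), a member of the family Tingidae (Hemiptera), is a globally invasive pest that feeds exclusively on sycamore, an important street tree, and causes serious damage to urban landscaping. Important scientific questions about its invasion history and the genomic adaptation mechanisms behind its rapid invasion remain to be explored in depth.
Methods: This study reports a chromosome-level reference genome of the sycamore lace bug. Through comparative genomic analysis and large-scale population genomic analysis of more than four hundred samples from three continents, it examines the genomic evolutionary mechanisms by which this host-specialist species adapted to oxidative stress and high temperature and achieved global invasion.
Main results:
1. A chromosome-level reference genome of the sycamore lace bug was assembled, the first high-quality genome among the more than two thousand species of Tingidae (Hemiptera).
2. Gene families related to oxidoreductase activity and cellular metabolism were found to be significantly expanded in the genome, which may be linked to the bug's very strong antioxidant metabolic activity and temperature adaptability.
3. Large-scale genome resequencing of 406 samples across three continents resolved two independent global invasion routes.
4. Genomic signals of natural selection associated with invasion, and of long-term balancing selection in the native range, were identified. These pre- and post-invasion genomic signatures overlap with the genes expanded over the species' long-term evolution, revealing mechanisms behind its strong tolerance to oxidative stress and heat stress.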

Page qzae074


Original Research

Enzyme Repertoires and Genomic Insights into Lycium barbarum Pectin Polysaccharide Biosynthesis

Haiyan Yue, Yiheng Tang, Aixuan Li, Lili Zhang, Yiwei Niu, Yiming Zhang, Hao Wang, Jianjun Luo, Yi Zhao, Shunmin He, Chang Chen, Runsheng Chen

Lycium barbarum, a member of the Solanaceae family, is an important eudicot with applications in both food and medicine. L. barbarum pectin polysaccharides (LBPPs) are key bioactive compounds of L. barbarum, notable for being among the few polysaccharides with both biocompatibility and biomedical activity. Although studies have analyzed the functional properties of LBPPs, the mechanisms underlying their biosynthesis and transport by key enzymes remain poorly understood. In this study, we assembled a 2.18-Gb reference genome of L. barbarum, reconstructed the first complete biosynthesis pathway of LBPPs, and elucidated the sugar transport system. We also characterized the important genes responsible for backbone extension, sidechain synthesis, and modification of LBPPs. Furthermore, we characterized the long non-coding RNAs (lncRNAs) associated with polysaccharide metabolism. We identified a specific rhamnogalacturonan I (RG-I) rhamnosyltransferase, RRT3020, which enhances RG-I biosynthesis within LBPPs. These newly identified enzymes and pivotal genes endow L. barbarum with unique pectin biosynthesis capabilities, distinguishing it from other Solanaceae species. Our findings thus provide a foundation for evolutionary studies and molecular breeding to expand the diverse applications of L. barbarum.

Page qzae079


Original Research

Genome Assembly and Winged Fruit Gene Regulation of Chinese Wingnut: Insights from Genomic and Transcriptomic Analyses

Fangdong Geng, Xuedong Zhang, Jiayu Ma, Hengzhao Liu, Hang Ye, Fan Hao, Miaoqing Liu, Meng Dang, Huijuan Zhou, Mengdi Li, Peng Zhao

The genomic basis and biology of winged fruit are interesting issues in ecological and evolutionary biology. Chinese wingnut (Pterocarya stenoptera) is an important horticultural and economic tree species in China. The genomic resources of this hardwood tree could advance the genomic studies of Juglandaceae species and elucidate their evolutionary relationships. Here, we reported a high-quality reference genome of P. stenoptera (N50 = 35.15 Mb) and performed a comparative genomic analysis across Juglandaceae species. Paralogous relationships among the 16 chromosomes of P. stenoptera revealed eight main duplications representing the subgenomes. Molecular dating suggested that the most recent common ancestor of P. stenoptera and Cyclocarya paliurus diverged from Juglans species around 56.7 million years ago (MYA). The expanded and contracted gene families were associated with cutin, suberine, and wax biosynthesis, cytochrome P450, and anthocyanin biosynthesis. We identified large inversion blocks between P. stenoptera and its relatives, which were enriched with genes involved in lipid biosynthesis and metabolism, as well as starch and sucrose metabolism. Whole-genome resequencing of 28 individuals revealed clear phylogenetic clustering into three groups corresponding to Pterocarya macroptera, Pterocarya hupehensis, and P. stenoptera. Morphological and transcriptomic analyses showed that CAD, COMT, LOX, and MADS-box play important roles during the five developmental stages of wingnuts. This study highlights the evolutionary history of the P. stenoptera genome and supports P. stenoptera as an appropriate Juglandaceae model for studying winged fruits. Our findings provide a theoretical basis for understanding the evolution, development, and diversity of winged fruits in woody plants.
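Research question: The adaptation, reproduction, and evolution of flowering plants depend to a large extent on fruit traits, and fruits play an important role in reproductive efficiency, environmental adaptability, and species diversity. Winged fruits are a main reason for the rapid early dispersal and diversification of angiosperms and a key innovation for adaptation to wind dispersal; they occur in at least 93 angiosperm families. However, little is known about the morphological features of winged fruit evolution and development or the molecular basis of the underlying gene regulatory networks.
Methods: This study first constructed a reference genome of Chinese wingnut (Pterocarya stenoptera) and used collinearity, gene number, gene expression, and ancestral gene number to determine its whole-genome duplication events and subgenome structure. Comparative genomic analysis and phylogenetic analysis of the genus Pterocarya were used to reveal the important role of genomic structural variation in fruit evolution among Chinese wingnut (winged fruit), Cyclocarya paliurus (winged fruit), and Juglans mandshurica (drupe-like nut, a non-winged type). Key candidate genes related to fruit wing development were identified through scanning electron microscopy, paraffin sectioning, and transcriptome analysis.
Main results:
1. A high-quality chromosome-level reference genome of Chinese wingnut was obtained, and eight subgenome duplication events were identified.
2. Ancient hybridization events occurred among Chinese wingnut (winged fruit), Cyclocarya paliurus (winged fruit), and Juglans mandshurica (non-winged fruit).
3. An inversion on chromosome 12 among Chinese wingnut (winged fruit), Cyclocarya paliurus (winged fruit), and Juglans mandshurica (non-winged fruit) contains LOX genes closely related to lipid metabolism.
4. Key candidate genes related to fruit wing development and lignin synthesis were identified.
Background: Species of Juglandaceae have important ecological and economic value, and their fruits fall into two types, winged and non-winged. The genus Pterocarya belongs to Juglandaceae and has three typical fruit wing types, making it good material for studying the evolution and diversification of winged fruits. The recent assembly of many high-quality woody plant genomes offers new perspectives on the molecular basis of the evolution and development of key traits. However, the genomic resources of Pterocarya remain largely undeveloped, and research on the morphological features and molecular regulatory networks of winged fruit evolution and development is very limited. This study uses multi-omics data to investigate the genome evolution of Chinese wingnut and the molecular regulatory network of winged fruit development, providing a new theoretical basis for understanding the evolution of winged fruits in angiosperms.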

Page qzae087


Original Research

Deep Amplicon Sequencing Reveals Culture-dependent Clonal Selection of Mycobacterium tuberculosis in Clinical Samples

Jiuxin Qu, Wanfei Liu, Shuyan Chen, Chi Wu, Wenjie Lai, Rui Qin, Feidi Ye, Yuanchun Li, Liang Fu, Guofang Deng, Lei Liu, Qiang Lin, Peng Cui

Commonly used drug susceptibility testing (DST) relies on bacterial culture and suffers from shortcomings such as long turnaround time and clonal/subclonal selection biases. Here, we developed a targeted deep amplicon sequencing (DAS) method directly applied to clinical specimens. In this DAS panel, we examined 941 drug-resistant mutations (DRMs) associated with 20 anti-tuberculosis drugs with only 4 pg of initial DNA input, and reduced the clinical testing time from 20 days to 2 days. A prospective study was conducted using 115 clinical specimens, predominantly positive for the Xpert® Mycobacterium tuberculosis/rifampicin (Xpert MTB/RIF) assay, to evaluate DRM detection. DAS was performed on culture-free specimens, while culture-dependent isolates were used for phenotypic DST, DAS, and whole-genome sequencing (WGS). For in silico molecular DST, our results based on the DAS panel showed accuracy similar to that of three published reports based on WGS. For 82 isolates, application of DAS using the resistance-determining mutation method showed better accuracy (93.03% vs. 92.16%), sensitivity (96.10% vs. 95.02%), and specificity (91.33% vs. 90.62%) than WGS using the Mykrobe software. Compared to culture-dependent WGS, culture-free DAS provides a full picture of sequence variation at the population level, exhibiting in detail the gain-and-loss variants caused by bacterial culture. Our study performs a systematic verification of the advantages of DAS in clinical applications and comprehensively illustrates the discrepancies in Mycobacterium tuberculosis before and after culture.
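Research question: Commonly used drug susceptibility testing relies on bacterial culture and suffers from long turnaround time and clonal/subclonal selection; obtaining susceptibility results quickly is a key bottleneck for clinical diagnosis. In addition, because of the limits of existing methods, culture-driven selection in Mycobacterium tuberculosis before and after culture has not yet been systematically evaluated.
Methods: A comprehensive literature review was used to compile 941 drug-resistance mutation sites associated with 20 anti-tuberculosis drugs. Based on these sites, a multiplex PCR-based targeted deep amplicon sequencing method was designed that can be applied directly to clinical samples to perform deep amplicon sequencing and obtain molecular drug susceptibility results.
Main results:
1. From the existing literature, 941 drug-resistance mutation sites associated with 20 anti-tuberculosis drugs were compiled, a deep amplicon sequencing method (kit) covering these sites was developed, and the clinical susceptibility testing time was reduced from 20 days to 2 days.
2. A molecular drug susceptibility testing approach based on deep amplicon sequencing was developed and evaluated, for the regions amplified by the panel, in three published whole-genome sequencing datasets, showing that deep amplicon sequencing gives molecular susceptibility results with accuracy similar to whole-genome sequencing.
3. A comparison of deep amplicon sequencing and whole-genome sequencing on cultured isolates from 82 clinical samples showed that molecular susceptibility testing by deep amplicon sequencing had better accuracy (93.03% vs. 92.16%), sensitivity (96.10% vs. 95.02%), and specificity (91.33% vs. 90.62%) than whole-genome sequencing.
4. Deep amplicon sequencing of 82 clinical samples before and after culture was used to systematically compare the gain and loss of sequence variants caused by bacterial culture.
Links to datasets and the variant identification method:
NGDC database: https://ngdc.cncb.ac.cn/gsa/browse/CRA013987 https://ngdc.cncb.ac.cn/gsa/browse/CRA013993 https://ngdc.cncb.ac.cn/gsa/browse/CRA013994
GitHub: https://github.com/liuwf-feige/bam2vcf
BioCode: https://ngdc.cncb.ac.cn/biocode/tools/BT007389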
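The accuracy, sensitivity, and specificity figures reported for molecular DST are the standard confusion-matrix quantities, computed by comparing each molecular call against the phenotypic DST result. A minimal sketch of that calculation follows; the function name, the input layout, and the example calls are assumptions made for illustration and are not taken from the authors' bam2vcf code.

```python
def dst_metrics(calls):
    """Compute accuracy, sensitivity and specificity for resistance calls.

    `calls` is an iterable of (predicted_resistant, phenotypic_resistant)
    boolean pairs, one per isolate-drug combination. Phenotypic DST is
    treated as the reference standard.
    """
    tp = fp = tn = fn = 0
    for predicted, reference in calls:
        if predicted and reference:
            tp += 1          # resistant by both methods
        elif predicted and not reference:
            fp += 1          # molecular call resistant, phenotype susceptible
        elif not predicted and reference:
            fn += 1          # resistance missed by the molecular call
        else:
            tn += 1          # susceptible by both methods

    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one resistant and one susceptible reference result")

    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical toy data, not results from the study.
    example = [(True, True)] * 3 + [(False, True)] + [(False, False)] * 5 + [(True, False)]
    for name, value in dst_metrics(example).items():
        print(f"{name}: {value:.2%}")
```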

Page qzae046


Original Research

Evaluation of T Cell Receptor Construction Methods from scRNA-Seq Data

Ruonan Tian, Zhejian Yu, Ziwei Xue, Jiaxin Wu, Lize Wu, Shuo Cai, Bing Gao, Bing He, Yu Zhao, Jianhua Yao, Linrong Lu, Wanlu Liu

T cell receptors (TCRs) serve key roles in the adaptive immune system by enabling recognition and response to pathogens and irregular cells. Various methods have been developed for TCR construction from single-cell RNA sequencing (scRNA-seq) datasets, each with its unique characteristics. Yet, a comprehensive evaluation of their relative performance under different conditions remains elusive. In this study, we conducted a benchmark analysis utilizing experimental single-cell immune profiling datasets. Additionally, we introduced a novel simulator, YASIM-scTCR (Yet Another SIMulator for single-cell TCR), capable of generating scTCR-seq reads containing diverse TCR-derived sequences with different sequencing depths and read lengths. Our results consistently showed that TRUST4 and MiXCR outperformed others across multiple datasets, while DeRR demonstrated considerable accuracy. We also discovered that the sequencing depth inherently imposes a critical constraint on successful TCR construction from scRNA-seq data. In summary, we present a benchmark study to aid researchers in choosing the appropriate method for reconstructing TCRs from scRNA-seq data.
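Research question: T cell receptors (TCRs) play a key role in the adaptive immune system, recognizing and responding to pathogens and abnormal cells. In recent years, a variety of methods have been developed for constructing TCRs from single-cell RNA sequencing (scRNA-seq) datasets. However, each method has its own characteristics and functions, and a comprehensive evaluation of their performance under different conditions is still lacking. In this study, we carried out a benchmark analysis using single-cell immune profiling datasets and simulated datasets to help researchers choose an appropriate method for reconstructing TCRs from scRNA-seq data.
Methods: This study used several types of datasets to evaluate seven TCR reconstruction methods comprehensively, including scRNA-seq, scTCR-seq, pseudo-bulk RNA-seq, and bulk TCR-seq data. The team also developed the YASIM-scTCR simulation tool, which generates scTCR-seq data containing TCR and non-TCR sequences according to user-set parameters (such as sequencing depth and read length), allowing precise comparison of the accuracy and sensitivity of the different methods. The study further compared the seven methods in terms of sensitivity, accuracy, and computational efficiency, providing a systematic reference for the field.
Main results:
1. On real scRNA-seq data, TRUST4 and MiXCR showed the highest sensitivity, while DeRR and MiXCR performed especially well in accuracy.
2. We developed the YASIM-scTCR tool, which generates simulated scTCR-seq data containing TCR and non-TCR sequences and supports user-defined parameters such as sequencing depth and read length.
3. In simulated data, sequencing depth had a significant effect on TCR assembly performance.
4. For pseudo-bulk RNA-seq data, TRUST4, MiXCR, and CATT outperformed the other methods. In most cases, higher TCR abundance was closely associated with better performance.
5. In an overall assessment covering six aspects (accuracy, sensitivity, adaptability, ease of use, running time, and memory consumption), TRUST4 ranked highest, followed by MiXCR and DeRR.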

Page qzae086


Original Research

iMFP-LG: Identify Novel Multi-functional Peptides Using Protein Language Models and Graph-based Deep Learning

Jiawei Luo, Kejuan Zhao, Junjie Chen, Caihua Yang, Fuchuan Qu, Yumeng Liu, Xiaopeng Jin, Ke Yan, Yang Zhang, Bin Liu

Functional peptides are short amino acid fragments that have a wide range of beneficial functions for living organisms. The majority of previous studies have focused on mono-functional peptides, but an increasing number of multi-functional peptides have been discovered. Although there have been enormous experimental efforts to assay multi-functional peptides, only a small portion of millions of known peptides has been explored. The development of effective and accurate techniques for identifying multi-functional peptides can facilitate their discovery and mechanistic understanding. In this study, we presented iMFP-LG, a method for multi-functional peptide identification based on protein language models (pLMs) and graph attention networks (GATs). Our comparative analyses demonstrated that iMFP-LG outperformed the state-of-the-art methods in identifying both multi-functional bioactive peptides and multi-functional therapeutic peptides. The interpretability of iMFP-LG was also illustrated by visualizing attention patterns in pLMs and GATs. Given the outstanding performance of iMFP-LG on the identification of multi-functional peptides, we employed iMFP-LG to screen novel peptides with both anti-microbial and anti-cancer functions from millions of known peptides in the UniRef90 database. As a result, eight candidate peptides were identified, among which one candidate was validated to possess both anti-bacterial and anti-cancer properties through molecular structure alignment and biological experiments. We anticipate that iMFP-LG can assist in the discovery of multi-functional peptides and contribute to the advancement of peptide drug design.
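Research question: Multi-functional peptides are small peptides that have several biological functions at once. Although they hold important potential for therapeutic drug development, they make up a small share of known functional peptides, so training datasets are usually markedly imbalanced. Their identification also requires predicting several functional labels simultaneously, and these labels are typically correlated to some degree, a kind of complex relationship that traditional methods struggle to model. An efficient and accurate computational method is therefore needed to identify multi-functional peptides and to help discover new ones with potential applications.
Methods: To address these problems, the paper proposes a new method named iMFP-LG, which combines a protein language model (pLM) with a graph attention network (GAT) to build an efficient and robust framework for multi-functional peptide identification. A pretrained pLM is used to extract high-dimensional feature vectors from peptide sequences, and multi-functional peptide identification is, for the first time, formulated as a graph-based classification problem. Functional labels are represented as graph nodes and the relationships between functions as edges, with a GAT modeling the complex associations among nodes. Adversarial training is introduced: adding adversarial perturbations improves the model's robustness to input data and its generalization, and helps avoid overfitting. Based on the trained iMFP-LG, the paper sets up a multi-functional peptide discovery pipeline for screening candidate multi-functional peptides from large peptide databases such as UniRef90.
Main results: iMFP-LG outperformed existing state-of-the-art methods on both the multi-functional bioactive peptide (MFBP) and multi-functional therapeutic peptide (MFTP) datasets, with especially strong performance on small datasets and on multi-functional peptide classes. By visualizing attention patterns, the authors showed how the pLM and GAT extract key sequence features and capture functional relationships. From the UniRef90 database, the authors screened 8 candidate multi-functional peptides with both anti-microbial (AMP) and anti-cancer (ACP) functions, and biological experiments confirmed marked anti-cancer and anti-bacterial activity for one of them.
Software links: The iMFP-LG code is available on GitHub (https://github.com/chen-bioinfo/iMFP-LG). The data are stored in the BioCode database of the National Genomics Data Center (NGDC) (BioCode: BT007494) and can be accessed at https://ngdc.cncb.ac.cn/biocode/tools/BT007494.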

Page qzae084


Method

COCOA: A Framework for Fine-scale Mapping of Cell-type-specific Chromatin Compartments Using Epigenomic Information

Kai Li, Ping Zhang, Jinsheng Xu, Zi Wen, Junying Zhang, Zhike Zi, Li Li

Chromatin compartmentalization and epigenomic modifications play crucial roles in cell differentiation and disease development. However, precise mapping of chromatin compartment patterns requires Hi-C or Micro-C data at high sequencing depth. Exploring the systematic relationship between epigenomic modifications and compartment patterns remains challenging. To address these issues, we present COCOA, a deep neural network framework using convolution and attention mechanisms to infer fine-scale chromatin compartment patterns from six histone modification signals. COCOA extracts 1D track features through bidirectional feature reconstruction after resolution-specific binning of epigenomic signals. These track features are then cross-fused with contact features using an attention mechanism and transformed into chromatin compartment patterns through residual feature reduction. COCOA demonstrates accurate inference of chromatin compartmentalization at a fine-scale resolution and exhibits stable performance on test sets. Additionally, we explored the impact of histone modifications on chromatin compartmentalization prediction through in silico epigenomic perturbation experiments. Unlike obscure compartments observed in high-depth experimental data at 1-kb resolution, COCOA generates clear and detailed compartment patterns, highlighting its superior performance. Finally, we demonstrate that COCOA enables cell-type-specific prediction of unrevealed chromatin compartment patterns in various biological processes, making it an effective tool for gaining insights into chromatin compartmentalization from epigenomics in diverse biological scenarios. The COCOA Python code is publicly available at https://github.com/onlybugs/COCOA and https://ngdc.cncb.ac.cn/biocode/tools/BT007498.
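The abstract mentions resolution-specific binning of epigenomic signals as the first preprocessing step. The sketch below illustrates the general idea of averaging a per-base signal track into fixed-size bins. It is an assumption-laden illustration, not COCOA's actual implementation: the function name, the use of the mean as the summary statistic, and the handling of the final partial bin are all hypothetical choices.

```python
def bin_signal(signal, bin_size):
    """Average a per-position signal track into consecutive fixed-size bins.

    `signal` is a sequence of numbers (for example, histone modification
    coverage per base pair) and `bin_size` is the resolution in positions.
    The last bin may cover fewer positions and is averaged over its own length.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be a positive integer")

    binned = []
    for start in range(0, len(signal), bin_size):
        window = signal[start:start + bin_size]
        binned.append(sum(window) / len(window))
    return binned


if __name__ == "__main__":
    # Hypothetical toy track; a 1-kb resolution would use bin_size=1000
    # on a per-base signal.
    track = [0, 2, 4, 4, 6, 8, 1]
    print(bin_signal(track, 3))
```

Coarser or finer resolutions are obtained by changing `bin_size`, so the same track can be prepared at several resolutions.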

Page qzae091


Method

ProtPipe: A Multifunctional Data Analysis Pipeline for Proteomics and Peptidomics

Ziyi Li, Cory A Weller, Syed Shah, Nicholas L Johnson, Ying Hao, Paige B Jarreau, Jessica Roberts, Deyaan Guha, Colleen Bereda, Sydney Klaisner, Pedro Machado, Matteo Zanovello, Mercedes Prudencio, Björn Oskarsson, Nathan P Staff, Dennis W Dickson, Pietro Fratta, Leonard Petrucelli, Priyanka Narayan, Mark R Cookson, Michael E Ward, Andrew B Singleton, Mike A Nalls, Yue A Qi

Mass spectrometry (MS) is a technique widely employed for the identification and characterization of proteins, with applications in personalized medicine, systems biology, and biomedicine. The application of MS-based proteomics advances our understanding of protein function, cellular signaling, and complex biological systems. MS data analysis is a critical process that includes identifying and quantifying proteins and peptides and then exploring their biological functions in downstream analyses. To address the complexities associated with MS data analysis, we developed ProtPipe to streamline and automate the processing and analysis of high-throughput proteomics and peptidomics datasets with DIA-NN preinstalled. The pipeline facilitates data quality control, sample filtering, and normalization, ensuring robust and reliable downstream analyses. ProtPipe provides downstream analyses, including protein and peptide differential abundance identification, pathway enrichment analysis, protein–protein interaction analysis, and major histocompatibility complex (MHC)–peptide binding affinity analysis. ProtPipe generates annotated tables and visualizations by performing statistical post-processing and calculating fold changes between predefined pairwise conditions in an experimental design. It is an open-source, well-documented tool available at https://github.com/NIH-CARD/ProtPipe, with a user-friendly web interface.

Page qzae083


Method

SCREEN: A Graph-based Contrastive Learning Tool to Infer Catalytic Residues and Assess Enzyme Mutations

Tong Pan, Yue Bi, Xiaoyu Wang, Ying Zhang, Geoffrey I Webb, Robin B Gasser, Lukasz Kurgan, Jiangning Song

The accurate identification of catalytic residues contributes to our understanding of enzyme functions in biological processes and pathways. The increasing number of protein sequences necessitates computational tools for the automated prediction of catalytic residues in enzymes. Here, we introduce SCREEN, a graph neural network for the high-throughput prediction of catalytic residues via the integration of enzyme functional and structural information. SCREEN constructs residue representations based on spatial arrangements and incorporates enzyme function priors into such representations through contrastive learning. We demonstrate that SCREEN (1) consistently outperforms currently-available predictors; (2) provides accurate results when applied to inferred enzyme structures; and (3) generalizes well to enzymes dissimilar from those in the training set. We also show that the putative catalytic residues predicted by SCREEN mimic key structural and biophysical characteristics of native catalytic residues. Moreover, using experimental datasets, we show that SCREEN’s predictions can be used to distinguish residues with a high mutation tolerance from those likely to cause functional loss when mutated, indicating that this tool might be used to infer disease-associated mutations. SCREEN is publicly available at https://github.com/BioColLab/SCREEN and https://ngdc.cncb.ac.cn/biocode/tool/7580.
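1. Background
Enzymes play a key role in the biochemical, molecular, and physiological processes that sustain life, and their remarkable catalytic ability is determined mainly by specific amino acid residues in the active site, known as catalytic residues. These neighboring catalytic residues form key interactions with substrate molecules, catalyze chemical reactions, and ensure the efficiency and specificity of catalysis. Traditional experimental methods, such as site-directed mutagenesis and biochemical assays, provide useful information but are low-throughput, time-consuming, and labor-intensive, and cannot meet the need for large-scale analysis of catalytic residues. Enzyme structural information has proven valuable in areas such as enzyme function prediction and enzyme engineering. However, most existing catalytic residue predictors rely mainly on enzyme sequences or manually selected structural features and struggle to capture the spatial arrangement of residues, in particular the tendency of catalytic residues to cluster in space. In addition, the complex reactions catalyzed by enzymes are usually driven by a small number of key catalytic residues, yet existing methods often fail to integrate enzyme functional data fully, which limits deeper exploration of the relationships among catalytic residues, enzyme structure, and function.
2. Paper overview
In December 2024, Professor Jiangning Song's team at Monash University published online in the journal GPB a research article entitled "SCREEN: A Graph-based Contrastive Learning Tool to Infer Catalytic Residues and Assess Enzyme Mutations". The study proposes a deep learning framework named SCREEN that integrates heterogeneous multi-source data, including enzyme structure, function, and evolutionary features, to identify catalytic sites accurately.
3. Results and discussion
The team argues that using modern deep neural networks to recognize spatial patterns of residues in enzyme structures can markedly improve the performance of catalytic residue predictors and deepen the understanding of enzyme function. To fill this gap, the team proposed a novel and efficient tool, SCREEN (Structure-based Catalytic Residue recognition in Enzymes), which predicts catalytic residues quickly and accurately for enzymes from different taxonomic groups. SCREEN uses a graph neural network (GNN) to model the spatial arrangement of active sites in enzyme structures, combining structural information, sequence embeddings, and evolutionary features. Specifically, SCREEN integrates evolutionary information through BLAST (Basic Local Alignment Search Tool) and HMMER (a sequence analysis tool based on hidden Markov models), and further improves performance by incorporating enzyme function information through a contrastive learning framework, achieving accurate catalytic residue prediction. The tool provides an important basic resource for dissecting the molecular mechanisms of enzymatic biocatalysis and, by combining catalytic site prediction with enzyme evolution, structure, and function, offers insight into enzyme evolution and functional mechanisms.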

Page qzae094


Database

SoyOD: An Integrated Soybean Multi-omics Database for Mining Genes and Biological Research

Jie Li, Qingyang Ni, Guangqi He, Jiale Huang, Haoyu Chao, Sida Li, Ming Chen, Guoyu Hu, James Whelan, Huixia Shou

Soybean is a globally important crop for food, feed, oil, and nitrogen fixation. A variety of multi-omics studies have been carried out, generating datasets ranging from genotype to phenotype. In order to efficiently utilize these data for basic and applied research, a soybean multi-omics database with extensive data coverage and comprehensive data analysis tools was established. The Soybean Omics Database (SoyOD) integrates important new datasets with existing public datasets to form the most comprehensive collection of soybean multi-omics information. Compared to existing soybean databases, SoyOD incorporates an extensive collection of novel data derived from the deep-sequencing of 984 germplasms, 162 novel transcriptomic datasets from seeds at different developmental stages, 53 phenotypic datasets, and more than 2500 phenotypic images. In addition, SoyOD integrates existing data resources, including 59 assembled genomes, genetic variation data from 3904 soybean accessions, 225 sets of phenotypic data, and 1097 transcriptomic sequences covering 507 different tissues and treatment conditions. Moreover, SoyOD can be used to mine candidate genes for important agronomic traits, as shown in a case study on plant height. Additionally, powerful analytical and easy-to-use toolkits enable users to easily access the available multi-omics datasets, and to rapidly search genotypic and phenotypic data in a particular germplasm. The novelty, comprehensiveness, and user-friendly features of SoyOD make it a valuable resource for soybean molecular breeding and biological research. SoyOD is publicly accessible at https://bis.zju.edu.cn/soyod.
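Research question: To date, soybean genomics research has accumulated genetic variation data and other multi-dimensional omics data for thousands of soybean germplasm accessions, covering genome resequencing, pangenome, transcriptome, and phenotype data, and providing rich resources for soybean functional genomics and molecular breeding. To organize and use these multi-omics data more effectively, the team developed the soybean multi-omics database SoyOD. The platform is a comprehensive resource covering soybean genomics, genetics, and related data. Existing databases have not integrated complete datasets in a timely way, which limits the efficiency of data use in soybean research. SoyOD provides an intuitive, user-friendly interface for mining functional soybean genes and includes an interactive online toolkit, forming a convenient one-stop platform intended to support in-depth research in soybean biology.
Methods: This study comprehensively collected and integrated multiple soybean datasets, including multiple genomes, transcriptomes, resequencing data, and phenotypes. To mine the information in these data, several analytical approaches, such as genomics, population genetics, and transcriptomics, were used to parse and process the data in detail and obtain important data and indicators.
Main results:
1. Primary data acquisition: this study completed deep sequencing of 940 soybean germplasm accessions; transcriptome sequencing of 162 sets of seed developmental-stage samples from various germplasm accessions; and measurement of 53 types of phenotypic data, together with more than 2500 phenotypic images.
2. A comprehensive soybean multi-omics database, SoyOD, was built. It integrates genomic, transcriptomic, and population genetic information, giving researchers an efficient platform for querying genome information, gene expression data, and genetic variation.
3. Several interactive tools were introduced, including gene ID conversion, genome coordinate conversion, sequence extraction, heatmaps, and the JBrowse browser, enhancing the functionality and practicality of the database.
4. As a one-stop platform for multi-omics analysis with a user-friendly interface, SoyOD makes it easier for researchers to mine soybean gene function.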

Page qzae080