Review Article
Evolution of Plant Genome Size and Composition
Bing He, Wanfei Liu, Jianyang Li, Siwei Xiong, Jing Jia, Qiang Lin, Hailin Liu, Peng Cui
The rapid development of sequencing technology has led to an explosion of plant genome data, opening up more opportunities for comparative evolutionary analysis of plant genomes. In this review, we focus on changes in plant genome size and composition, examining the effects of polyploidy, whole-genome duplication, and alterations in transposable elements on plant genome architecture and evolution. In addition, to address gaps in the available information, we collected and analyzed data from 234 representative plant genomes as a supplement. We aim to provide a comprehensive, up-to-date summary of information on plant genome architecture and evolution in this review.
Page qzae078
Review Article
RNA 5-Methylcytosine Modification: Regulatory Molecules, Biological Functions, and Human Diseases
Yanfang Lu, Liu Yang, Qi Feng, Yong Liu, Xiaohui Sun, Dongwei Liu, Long Qiao, Zhangsuo Liu
RNA methylation modifications influence gene expression, and disruptions of these processes are often associated with various human diseases. The common RNA methylation modification 5-methylcytosine (m5C), which is dynamically regulated by writers, erasers, and readers, widely occurs in transfer RNAs (tRNAs), messenger RNAs (mRNAs), ribosomal RNAs (rRNAs), enhancer RNAs (eRNAs), and other non-coding RNAs (ncRNAs). RNA m5C modification regulates metabolism, stability, nuclear export, and translation of RNA molecules. An increasing number of studies have revealed the critical roles of the m5C RNA modification and its regulators in the development, diagnosis, prognosis, and treatment of various human diseases. In this review, we summarized the recent studies on RNA m5C modification and discussed the advances in its detection methodologies, distribution, and regulators. Furthermore, we addressed the significance of RNAs modified with m5C marks in essential biological processes as well as in the development of various human disorders, from neurological diseases to cancers. This review provides a new perspective on the diagnosis, treatment, and monitoring of human diseases by elucidating the complex regulatory network of the epigenetic m5C modification.
Page qzae063
Review Article
Multiome-wide Association Studies: Novel Approaches for Understanding Diseases
Mengting Shao, Kaiyang Chen, Shuting Zhang, Min Tian, Yan Shen, Chen Cao, Ning Gu
The rapid development of multiome (transcriptome, proteome, cistrome, imaging, and regulome)-wide association study methods has opened new avenues for biologists to understand the susceptibility genes underlying complex diseases. Thorough comparisons of these methods are essential for selecting the most appropriate tool for a given research objective. This review provides a detailed categorization and summary of the statistical models, use cases, and advantages of recent multiome-wide association studies. In addition, to illustrate gene–disease association studies based on transcriptome-wide association study (TWAS), we collected 478 disease entries across 22 categories from 235 manually reviewed publications. Our analysis reveals that mental disorders are the diseases most frequently studied by TWAS, indicating its potential to deepen our understanding of the genetic architecture of complex diseases. In summary, this review underscores the importance of multiome-wide association studies in elucidating complex diseases and highlights the significance of selecting the appropriate method for each study.
Page qzae077
Review Article
Bioinformatic Resources for Exploring Human–virus Protein–protein Interactions Based on Binding Modes
Huimin Chen, Jiaxin Liu, Gege Tang, Gefei Hao, Guangfu Yang
Historically, there have been many outbreaks of viral diseases that have continued to claim millions of lives. Research on human–virus protein–protein interactions (PPIs) is vital to understanding the principles of human–virus relationships, providing an essential foundation for developing virus control strategies to combat diseases. The rapidly accumulating data on human–virus PPIs offer unprecedented opportunities for bioinformatics research on these interactions. However, detailed analyses and summaries that would help researchers use these resources systematically and efficiently are lacking. Here, we comprehensively review the bioinformatic resources used in human–virus PPI research, and discuss and compare their functions, performance, and limitations. This review aims to provide researchers with a bioinformatic toolbox that facilitates the exploration of human–virus PPIs based on binding modes.
Page qzae075
Original Research
Centromere Landscapes Resolved from Hundreds of Human Genomes
Shenghan Gao, Yimeng Zhang, Stephen J Bush, Bo Wang, Xiaofei Yang, Kai Ye
High-fidelity (HiFi) sequencing has facilitated the assembly and analysis of the most repetitive region of the genome, the centromere. Nevertheless, our current understanding of human centromeres is based on a relatively small number of telomere-to-telomere assemblies, which have not yet captured their full diversity. In this study, we investigated the genomic diversity of human centromere higher-order repeats (HORs) via both HiFi reads and haplotype-resolved assemblies from hundreds of samples drawn from ongoing pangenome-sequencing projects and reprocessed them via a novel HOR annotation pipeline, HiCAT-human. We used this wealth of data to provide a global survey of the centromeric HOR landscape; in particular, we found that 23 HORs presented significant copy number variability between populations. We detected three centromere genotypes with unbalanced population frequencies on chromosomes 5, 8, and 17. An inter-assembly comparison of HOR loci further revealed that while HOR array structures are diverse, they nevertheless tend to form a number of specific landscapes, each exhibiting different levels of HOR subunit expansion and possibly reflecting a cyclical evolutionary transition from homogeneous to nested structures and back.
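Research question: As a key chromosomal region, the centromere plays a decisive role in the faithful transmission of genetic information and in genome stability. Because centromeric sequences are highly repetitive and their repeat units are highly similar, fine-scale resolution of centromere structure remains very challenging. Studies of human centromeres have so far been limited to a small number of samples, making it difficult to explore the diversity and evolutionary patterns of centromeric higher-order repeats (HORs) across populations.
Methods: Based on HiFi sequencing data from 102 samples and 109 published haplotype-resolved genome assemblies, this study developed HiCAT-human, a tool for mining centromeric HORs at the population level, and used it to resolve centromeric HORs in detail, revealing significant HOR diversity across populations. Cluster analysis of population-level HOR patterns showed that the HOR patterns of each chromosome group into clusters, indicating that chromosome centromeres carry many specific genotypes. A systematic analysis of the centromeric HOR patterns of all samples then led to a proposed model of human centromeric HOR evolution.
Main results:
1. A total of 23 centromeric HOR patterns on 14 chromosomes show significant imbalance between populations.
2. On chromosomes 5, 8, and 17, there are three centromere genotypes with markedly different HOR copy-number distributions.
3. The centromeric HOR array structure of every chromosome was resolved, and the chromosomes were divided into four types.
4. A model of centromeric HOR evolution driven by local nested expansion was proposed.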
Page qzae071
Original Research
RAG-seq: NSR-primed and Transposase Tagmentation-mediated Strand-specific Total RNA Sequencing in Single Cells
Ping Xu, Zhiheng Yuan, Xiaohua Lu, Peng Zhou, Ding Qiu, Zhenghao Qiao, Zhongcheng Zhou, Li Guan, Yongkang Jia, Xuan He, Ling Sun, Youzhong Wan, Ming Wang, Yang Yu
Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cellular diversity with unprecedented resolution. However, many current methods are limited in capturing full-length transcripts and discerning strand orientation. Here, we present RAG-seq, an innovative strand-specific total RNA sequencing technique that combines not-so-random (NSR) primers with Tn5 transposase-mediated tagmentation. RAG-seq overcomes previous limitations by delivering comprehensive transcript coverage and maintaining strand orientation, which are essential for accurate quantification of overlapping genes and detection of antisense transcripts. Through optimized reverse transcription with oligo-dT primers, rRNA depletion via Depletion of Abundant Sequences by Hybridization (DASH), and linear amplification, RAG-seq enhances sensitivity and reproducibility, especially for low-input samples and single cells. Application to mouse oocytes and early embryos highlights RAG-seq’s superior performance in identifying stage-specific antisense transcripts, shedding light on their regulatory roles during early development. This advancement represents a significant leap in transcriptome analysis within complex biological contexts.
Page qzae072
Original Research
Identify Non-mutational p53 Functional Deficiency in Human Cancers
Qianpeng Li, Yang Zhang, Sicheng Luo, Zhang Zhang, Ann L Oberg, David E Kozono, Hua Lu, Jann N Sarkaria, Lina Ma, Liguo Wang
An accurate assessment of p53’s functional status is critical for cancer genomic medicine. However, there is a significant challenge in identifying tumors with non-mutational p53 inactivation, which is not detectable through DNA sequencing. These undetected cases are often misclassified as p53-normal, leading to inaccurate prognosis and downstream association analyses. To address this issue, we built support vector machine (SVM) models to systematically reassess p53’s functional status in TP53 wild-type (TP53WT) tumors from multiple The Cancer Genome Atlas (TCGA) cohorts. Cross-validation demonstrated the good performance of the SVM models, with a mean area under the receiver operating characteristic curve (AUROC) of 0.9822, precision of 0.9747, and recall of 0.9784. Our study revealed that a significant proportion (87%–99%) of TP53WT tumors actually had compromised p53 function. Additional analyses uncovered that these genetically intact but functionally impaired tumors (termed predictively reduced function of p53, or TP53WT-pRF) exhibited genomic and pathophysiologic features akin to TP53-mutant tumors: heightened genomic instability and elevated levels of hypoxia. Clinically, patients with TP53WT-pRF tumors experienced significantly shortened overall survival or progression-free survival compared to those with predictively normal function of p53 (TP53WT-pN) tumors, and these patients also displayed increased sensitivity to platinum-based chemotherapy and radiation therapy.
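Research question:
The p53 protein is an important tumor suppressor, and its encoding gene, TP53, is the most frequently mutated gene in cancer. Commonly used DNA sequencing-based molecular diagnostics cannot detect non-mutational inactivation or deficiency of p53. These undetected cases are often misjudged as having normal p53 function, leading to incorrect interpretation of cancer test results and treatment plans. However, no effective means of assessing p53 protein activity is currently available, which has become a major bottleneck in cancer molecular diagnosis.
Methods:
Reported p53 target gene sets were collected, and cancer type-specific p53-regulated genes were further identified by analyzing ChIP-seq and RNA-seq data. The p53-regulated genes were divided into an upregulated group and a downregulated group according to their expression changes.
For each sample, GSVA, ssGSEA, and Z-score values were calculated separately for the upregulated and downregulated groups, together with the PC1 value from a PCA of all regulated genes, giving a composite expression score (CES) of seven values per sample.
Adjacent normal tissues were treated as samples with normal p53 function, and samples with TP53 truncating mutations (TMs) as samples with inactivated p53 function. The CESs of these two sample types were used to train cancer type-specific support vector machine (SVM) models.
In seven cancer types, the resulting SVM models were used to predict p53 activity in TP53 wild-type (WT) and missense mutation (MM) samples.
Chemotherapy sensitivity was assessed with the recombination proficiency score (RPS) and radiotherapy sensitivity with the radiotherapy sensitivity score (RSS), and GSVA values were calculated to characterize sensitivity to chemotherapy and radiotherapy.
Patient-derived xenograft (PDX) models were used to validate the radiosensitivity of glioblastoma tumor samples with non-mutational p53 functional deficiency.
TP53 mutations at the transcript level were analyzed from TCGA RNA-seq reads and compared with those at the DNA level to investigate the factors leading to non-mutational p53 functional deficiency.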
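As an illustration of one component of the composite expression score, the sketch below computes a Z-score-based gene-set score per sample for an upregulated and a downregulated gene group. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `zscore_set_scores`, the combined Z-score formula (sum of per-gene Z-scores divided by the square root of the set size), and the toy expression values are all assumptions made here. The GSVA, ssGSEA, PCA, and SVM steps are not shown.

```python
import math
import statistics

def zscore_set_scores(expr, gene_set):
    """Combined Z-score of a gene set, one value per sample.

    expr maps gene name -> list of expression values (one per sample).
    Assumed formula: sum of per-gene Z-scores / sqrt(number of genes).
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("none of the genes in the set are in the matrix")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    for gene in genes:
        values = expr[gene]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # a gene with no variation adds nothing to the score
        for i, value in enumerate(values):
            totals[i] += (value - mean) / sd
    return [t / math.sqrt(len(genes)) for t in totals]

# Toy expression values for two samples (hypothetical numbers).
expr = {
    "CDKN1A": [8.0, 2.0],
    "MDM2": [6.0, 4.0],
    "BIRC5": [1.0, 5.0],
}
up_scores = zscore_set_scores(expr, ["CDKN1A", "MDM2"])   # upregulated group
down_scores = zscore_set_scores(expr, ["BIRC5"])          # downregulated group
print(up_scores, down_scores)
```

In this toy case the first sample scores high on the upregulated group and low on the downregulated group, the pattern one would expect from a sample with active p53. Such per-group scores could then serve as two of the seven CES features passed to a classifier.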
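Main results:
1. In the seven cancer types studied, most TP53 wild-type (WT) tumor samples (87%–99%) were predicted to have inactivated or deficient p53 function.
2. WT tumor samples predicted to have inactivated or deficient p53 function (pRF) closely resembled TP53-mutant samples, including higher genomic instability and hypoxia levels.
3. pRF samples among WT tumors had lower survival rates but higher sensitivity to platinum-based chemotherapy and radiotherapy.
4. False negatives in whole-exome sequencing and amplification of the MDM2 and MDM4 genes can explain 22%–25% of pRF samples.
Code availability:
The p53 activity prediction model is available at https://github.com/liguowang/epage or on the BioCode website of the China National Center for Bioinformation, https://ngdc.cncb.ac.cn/biocode/tool/BT7490.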
Page qzae064
Original Research
Enhancing Variant Calling in Whole-exome Sequencing Data Using Population-matched Reference Genomes
Shuming Guo, Zhuo Huang, Yanming Zhang, Yukun He, Xiangju Chen, Wenjuan Wang, Lansheng Li, Yu Kang, Zhancheng Gao, Jun Yu, Zhenglin Du, Yanan Chu
Whole-exome sequencing (WES) data are frequently used for cancer diagnosis and genome-wide association studies (GWAS), based on high-coverage read mapping, informative variant calling, and high-quality reference genomes. The central position of the currently used genome assembly, GRCh38, is now challenged by two newly published telomere-to-telomere (T2T) genomes, T2T-CHM13 and T2T-YAO, and a comparative study that tests population specificity across the three reference genomes using real-case WES data is urgently needed. Here, we report such an analysis for 19 tumor samples collected from Chinese patients. A primary comparison of the exon regions among the three references reveals that the sequences in up to ∼ 1% of target regions in T2T-YAO diverge widely from GRCh38 and may lead to off-target sequence capture. However, T2T-YAO still outperforms GRCh38, yielding 7.41% more mapped reads. Owing to more reliable read mapping and a closer phylogenetic relationship with the samples than GRCh38, T2T-YAO reduces by half the variant calls of clinical significance, most of which are benign, while maintaining sensitivity in identifying pathogenic variants. T2T-YAO also outperforms T2T-CHM13 in reducing calls of Chinese-specific variants. Our findings highlight the critical need for employing population-specific reference genomes in genomic analysis to ensure accurate variant analysis and the significant benefits of tailoring these approaches to the unique genetic background of each ethnic group.
Page qzae070
Original Research
Virus Infection Induces Immune Gene Activation with CTCF-anchored Enhancers and Chromatin Interactions in Pig Genome
Jianhua Cao, Ruimin Ren, Xiaolong Li, Xiaoqian Zhang, Yan Sun, Xiaohuan Tian, Ru Liu, Xiangdong Liu, Yijun Ruan, Guoliang Li, Shuhong Zhao
Chromatin organization is important for gene transcription in the pig genome. However, its three-dimensional (3D) structure and dynamics are much less investigated than those in humans. Here, we applied the long-read chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) method to map the whole-genome chromatin interactions mediated by CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) in porcine macrophage cells before and after polyinosinic-polycytidylic acid [Poly(I:C)] induction. Our results reveal that Poly(I:C) induction impacts the 3D genome organization in the 3D4/21 cells at the fine-scale chromatin loop level rather than at the large-scale domain level. Furthermore, our findings underscore the pivotal role of CTCF-anchored chromatin interactions in reshaping chromatin architecture during immune responses. Knockout of the CTCF-binding locus further confirms that the CTCF-anchored enhancers are associated with the activation of immune genes via long-range interactions. Notably, the ChIA-PET data also support the spatial relationship between single nucleotide polymorphisms (SNPs) and related gene transcription from a 3D genome perspective. Our findings provide new clues and potential targets for exploring key elements related to diseases in pigs and are also likely to shed light on the chromatin organization and dynamics underlying mammalian infectious diseases.
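Research question
The three-dimensional structure of the pig genome is crucial for the regulation of gene expression, especially during viral infection and immune responses. Although pigs are highly valuable in biomedical research, and their genetic and physiological similarity to humans makes them an ideal model for studying disease mechanisms and developing new therapies, our knowledge of the 3D structure of the pig genome remains very limited.
Methods
Porcine alveolar macrophages were stimulated with polyinosinic-polycytidylic acid [Poly(I:C)] to mimic viral infection in vitro. CTCF- and RNAPII-mediated chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) experiments were performed on 3D4/21 cells before and after Poly(I:C) induction, and multi-omics data (RNA-seq, ChIP-seq, and Hi-C) were integrated to explore the dynamic changes in 3D chromatin structure during the immune response.
Main results
1. Poly(I:C) induction significantly upregulated the expression of immune-related genes in 3D4/21 cells, accompanied by changes in epigenetic modifications.
2. During the immune response, CTCF-anchored enhancers reshaped chromatin structure in the pig genome, leading to transcriptional activation of immune genes.
3. Single nucleotide polymorphisms (SNPs) at CTCF-binding sites on the topologically associating domain (TAD) boundary of the GBP gene family may play a key role in the immune response triggered by PRRS virus in pig populations.
Data link
https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA012056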
Page qzae062
Original Research
The Genome Architecture of the Copepod Eurytemora carolleeae — the Highly Invasive Atlantic Clade of the Eurytemora affinis Species Complex
Zhenyong Du, Gregory Gelembiuk, Wynne Moss, Andrew Tritt, Carol Eunmi Lee
Copepods are among the most abundant organisms on the planet and play critical functions in aquatic ecosystems. Among copepods, populations of the Eurytemora affinis species complex are numerically dominant in many coastal habitats and serve as food sources for major fisheries. Intriguingly, certain populations possess the unusual capacity to invade novel salinities on rapid time scales. Despite their ecological importance, high-quality genomic resources have been absent for calanoid copepods, limiting our ability to comprehensively dissect the genome architecture underlying the highly invasive and adaptive capacity of certain populations. Here, we present the first chromosome-level genome of a calanoid copepod, from the Atlantic clade (Eurytemora carolleeae) of the E. affinis species complex. This genome was assembled using high-coverage PacBio long-read and Hi-C sequences of an inbred line, generated through 30 generations of full-sib mating. This genome, consisting of 529.3 Mb (contig N50 = 4.2 Mb, scaffold N50 = 140.6 Mb), was anchored onto four chromosomes. Genome annotation predicted 20,262 protein-coding genes, of which ion transport-related gene families were substantially expanded based on comparative analyses of 12 additional arthropod genomes. Also, we found genome-wide signatures of historical gene body methylation of the ion transport-related genes and the significant clustering of these genes on each chromosome. This genome represents one of the most contiguous copepod genomes to date and is among the highest quality marine invertebrate genomes. As such, this genome provides an invaluable resource to help yield fundamental insights into the ability of this copepod to adapt to rapidly changing environments.
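The contig and scaffold N50 values quoted above are standard contiguity statistics: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation follows. The function name `n50` and the toy contig lengths are illustrative assumptions and are unrelated to the actual assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length

# Toy contig lengths (arbitrary units), not taken from the assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs cover 150 of 300
```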
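This study focuses on Eurytemora carolleeae and reports the first chromosome-level reference genome for the species. The result is a milestone for E. carolleeae genome research and is also the first chromosome-level reference genome for the copepod order Calanoida. Detailed comparative genomic analyses revealed the potential genome evolutionary mechanisms underlying the strong salinity adaptability of E. carolleeae, providing key molecular-level insight into how it adapts to extreme salinity environments.
Methods:
Comparative genomic analyses revealed the potential genome evolutionary mechanisms underlying the strong salinity adaptability of Eurytemora carolleeae.
Significance:
This work reports the first chromosome-level reference genome for Eurytemora carolleeae, which is also the first for the copepod order Calanoida. It is a milestone for E. carolleeae genome research and provides key molecular-level insight into how the species adapts to extreme salinity environments.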
Page qzae066
Method
MitoSort: Robust Demultiplexing of Pooled Single-cell Genomic Data Using Endogenous Mitochondrial Variants
Zhongjie Tang, Weixing Zhang, Peiyu Shi, Sijun Li, Xinhui Li, Yueming Li, Yicong Xu, Yaqing Shu, Zheng Hu, Jin Xu
Multiplexing across donors has emerged as a popular strategy to increase throughput, reduce costs, overcome technical batch effects, and improve doublet detection in single-cell genomic studies. To eliminate additional experimental steps, endogenous nuclear genome variants are used for demultiplexing pooled single-cell RNA sequencing (scRNA-seq) data by several computational tools. However, these tools have limitations when applied to single-cell sequencing methods that do not cover nuclear genomic regions well, such as single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq). Here, we demonstrate that mitochondrial germline variants are an alternative, robust, and computationally efficient endogenous barcode for sample demultiplexing. We propose MitoSort, a tool that uses mitochondrial germline variants to assign cells to their donor origins and identify cross-genotype doublets in single-cell genomic datasets. We evaluate its performance by using in silico pooled mitochondrial scATAC-seq (mtscATAC-seq) libraries and experimentally multiplexed data with cell hashtags. MitoSort achieves high accuracy and efficiency in genotype clustering and doublet detection for mtscATAC-seq data, addressing the limitations of current computational techniques tailored for scRNA-seq data. Moreover, MitoSort exhibits versatility, and can be applied to various single-cell sequencing approaches beyond mtscATAC-seq provided that the mitochondrial variants are reliably detected. Furthermore, we demonstrate the application of MitoSort in a case study where B cells from eight donors were pooled and assayed by single-cell multi-omics sequencing. Altogether, our results demonstrate the accuracy and efficiency of MitoSort, which enables reliable sample demultiplexing in various single-cell genomic applications. MitoSort is available at https://github.com/tangzhj/MitoSort.
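Research question:
• Pooled-sample sequencing allows researchers to use single-cell sequencing for scientific exploration and clinical applications at low cost and high efficiency. When nuclear genome coverage is insufficient, demultiplexing methods based on nuclear genome variants are limited. The mitochondrial genome is readily captured in single-cell sequencing and carries individual-specific polymorphic sites, offering a new approach to accurate demultiplexing and doublet detection in single-cell genomic sequencing data.
Methods:
• MitoSort uses mitochondrial germline variants for sample demultiplexing, addressing the limitations of existing methods.
• Using Bayesian inference on the matrix of mitochondrial germline variants across cells, it calculates the probability that each cell belongs to each donor, thereby efficiently and accurately identifying each cell's donor of origin and detecting cross-genotype doublets.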
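The Bayesian assignment idea can be sketched as follows. This is a simplified illustration under stated assumptions, not MitoSort's actual model: it assumes that each donor's alternate-allele frequency at each variant site is already known, that read counts follow a binomial distribution, that a doublet behaves like an equal mixture of two donors, and that all hypotheses share a uniform prior. The function names, the `error` parameter, and the toy counts are hypothetical.

```python
import math
from itertools import combinations

def log_binom_lik(alt, ref, p):
    """Log-likelihood of alt/ref read counts given alt-allele frequency p.
    The binomial coefficient is omitted: it is the same for every hypothesis."""
    return alt * math.log(p) + ref * math.log(1 - p)

def assign_cell(counts, donor_freqs, error=0.01):
    """Posterior probability of each singlet and doublet hypothesis.

    counts: list of (alt_reads, ref_reads), one pair per variant site.
    donor_freqs: {donor: [alt-allele frequency per site]}.
    error: assumed floor/ceiling that keeps frequencies away from 0 and 1.
    """
    hypotheses = dict(donor_freqs)
    # Assumption: a doublet looks like a 50/50 mixture of its two donors.
    for a, b in combinations(sorted(donor_freqs), 2):
        hypotheses[f"{a}+{b}"] = [
            (x + y) / 2 for x, y in zip(donor_freqs[a], donor_freqs[b])
        ]
    log_post = {}
    for name, freqs in hypotheses.items():
        total = 0.0
        for (alt, ref), freq in zip(counts, freqs):
            p = min(max(freq, error), 1 - error)
            total += log_binom_lik(alt, ref, p)
        log_post[name] = total  # uniform prior adds only a constant
    # Normalise with log-sum-exp for numerical stability.
    top = max(log_post.values())
    norm = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {name: math.exp(v - norm) for name, v in log_post.items()}

donors = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
singlet = assign_cell([(10, 0), (0, 12), (0, 8)], donors)
doublet = assign_cell([(5, 5), (6, 6), (0, 8)], donors)
print(max(singlet, key=singlet.get), max(doublet, key=doublet.get))
```

With these toy counts, the first cell carries only donor A's variant and is assigned to A, while the second shows both donors' variants at roughly half frequency and is assigned to the A+B doublet hypothesis.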
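Main result 1: MitoSort is highly accurate on mtscATAC-seq data and reliably identifies the donor origin of cells.
Main result 2: Compared with existing demultiplexing tools, MitoSort shows significant advantages in demultiplexing performance and computational efficiency.
Main result 3: A benchmark dataset was built by single-cell multi-omics sequencing of pooled peripheral blood mononuclear cells from multiple individuals, and was used to evaluate and compare the performance of MitoSort.
Method link:
https://github.com/tangzhj/MitoSort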
Page qzae073
Database
DRED: A Comprehensive Database of Genes Related to Repeat Expansion Diseases
Qingqing Shi, Min Dai, Yingke Ma, Jun Liu, Xiuying Liu, Xiu-Jie Wang
Expansion of tandem repeats in genes often causes severe diseases, such as fragile X syndrome, Huntington’s disease, and spinocerebellar ataxia. However, information on genes associated with repeat expansion diseases is scattered throughout the literature, and systematic prediction of potential genes that may cause diseases via repeat expansion is also lacking. Here, we develop DRED, a Database of genes related to Repeat Expansion Diseases, as a manually curated database that covers all 61 known genes related to repeat expansion diseases reported in PubMed and OMIM, along with detailed repeat information for each gene. DRED also includes 516 genes with the potential to cause diseases via repeat expansion, which were predicted based on their repeat composition, genetic variations, genomic features, and disease associations. Various types of information on repeat expansion diseases and their corresponding genes/repeats are presented in DRED, together with links to external resources, such as NCBI and ClinVar. DRED provides user-friendly interfaces with comprehensive functions, and can serve as a central data resource for basic research and repeat expansion disease-related medical diagnosis. DRED is freely accessible at http://omicslab.genetics.ac.cn/dred, and will be frequently updated to include newly reported genes related to repeat expansion diseases.
Page qzae068
Database
DeOri 10.0: An Updated Database of Experimentally Identified Eukaryotic Replication Origins
Yu-Hao Zeng, Zhen-Ning Yin, Hao Luo, Feng Gao
DNA replication is a complex and crucial biological process in eukaryotes. To facilitate the study of eukaryotic replication events, we present a database of eukaryotic DNA replication origins (DeOri), which collects the genome-wide data on eukaryotic DNA replication origins currently available. With the rapid development of high-throughput experimental technology in recent years, the number of datasets in the new release, DeOri 10.0, increased from 10 to 151, and the number of sequences increased from 16,145 to 9,742,396. Besides nucleotide sequences and browser extensible data (BED) files, corresponding annotation files, such as coding sequences (CDSs), mRNAs, and other biological elements within replication origins, are also provided. The experimental techniques used for each dataset, as well as related statistical data, are also presented on the web pages. Differences in experimental methods, cell lines, and sequencing technologies have resulted in distinct replication origins, making it challenging to differentiate between cell-specific and non-specific replication origins. Based on multiple replication origin datasets at the species level, we scored and screened replication origins in Homo sapiens, Gallus gallus, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. The screened regions with high scores were considered species-conserved origins, which are integrated and presented as reference replication origins (rORIs). Additionally, we analyzed the genome-level distribution of genomic elements associated with replication origins, such as CpG islands (CGIs), transcription start sites (TSSs), and G-quadruplexes (G4s). These analysis results can be browsed and downloaded as needed at http://tubic.tju.edu.cn/deori/.
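Research question:
DNA replication is a key step in heredity and development. Errors during replication and dysregulation of the cell cycle readily lead to cell damage and carcinogenesis. With advances in experimental techniques, data on eukaryotic replication origins have grown rapidly, and a data platform is urgently needed to integrate them effectively and so promote research on eukaryotic replication mechanisms.
Methods:
Based on extensive collection and mining of published studies and public databases, this work classifies and summarizes genome-wide eukaryotic replication origin data, additionally includes protein-binding sequences related to replication origins, and develops version 10.0 of DeOri (Database of eukaryotic replication Origins), a database of experimentally identified eukaryotic replication origins.
Main results:
1. DeOri 10.0 contains sequence data and coordinate information for replication origins (Origin), replication initiation zones (Zone), and replication-related protein-binding sequences (Site), together with annotation of the genomic elements within these regions.
2. Compared with previous versions, DeOri 10.0 has a redesigned framework, user-friendly browsing pages, and interactive visualization modules. It also reports the distribution of genomic elements associated with replication origins and dataset statistics, providing a reference for research on replication initiation and its regulation.
3. DeOri 10.0 scores and screens origins at the species level to obtain conserved replication origins, providing reference replication origin datasets for five species.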
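The scoring scheme used to derive reference origins is not described in this summary. Purely as an illustration of the general idea of scoring origins by their support across datasets, the sketch below scores each origin by the fraction of datasets that contain an overlapping origin and keeps those above a threshold. The overlap rule, the function names, the `min_score` threshold, and the toy intervals are all assumptions and should not be read as the DeOri procedure.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def score_origins(datasets):
    """Score each origin by the fraction of datasets supporting it.

    datasets: list of datasets, each a list of BED-like
    (chrom, start, end) tuples with half-open coordinates.
    """
    scores = {}
    for origins in datasets:
        for origin in origins:
            if origin in scores:
                continue
            support = sum(
                any(overlaps(origin, other) for other in dataset)
                for dataset in datasets
            )
            scores[origin] = support / len(datasets)
    return scores

def screen(scores, min_score=0.6):
    """Keep origins whose score reaches the (assumed) threshold."""
    return sorted(o for o, s in scores.items() if s >= min_score)

# Three toy datasets with hypothetical coordinates.
toy = [
    [("chr1", 100, 200), ("chr1", 900, 1000)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300), ("chr2", 100, 200)],
]
scores = score_origins(toy)
print(screen(scores))
```

In the toy data, the three mutually overlapping chr1 intervals are supported by every dataset and pass the screen, while the origins seen in only one dataset are dropped. A real pipeline would likely also merge overlapping intervals into a single reference region.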
Database link:
http://tubic.tju.edu.cn/deori/
Page qzae076
Application Note
GP-Plotter: Flexible Spectral Visualization for Proteomics Data with Emphasis on Glycoproteomics Analysis
Zheng Fang, Mingming Dong, Hongqiang Qin, Mingliang Ye
Identification evaluation and result dissemination are essential components of mass spectrometry-based proteomics analysis. The visualization of fragment ions in a mass spectrum provides strong evidence for peptide identification and modification localization. Here, we present an easy-to-use tool, named GP-Plotter, for ion annotation of tandem mass spectra and corresponding image output. Identification result files from common search tools in the community and user-customized files are supported as input to GP-Plotter. Multiple display modes and parameter customization are available in GP-Plotter to present annotated spectra of interest. Different image formats, especially vector graphic formats, are available for image generation, which is convenient for data publication. Notably, GP-Plotter is also well suited for the visualization and evaluation of glycopeptide spectrum assignments, with comprehensive annotation of glycan fragment ions. With a user-friendly graphical interface, GP-Plotter is expected to be a universal visualization tool for the community. GP-Plotter has been implemented in the latest version of Glyco-Decipher (v1.0.4), and the standalone GP-Plotter software is also freely available at https://github.com/DICP-1809.
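Research question
With the rapid development of enrichment methods and mass spectrometers, mass spectrometry-based proteomics has become the mainstream approach for high-throughput identification and localization of post-translational modifications (PTMs), including glycosylation. Over the past several years, a variety of software tools have been developed to identify and display fragmentation spectra of modified peptides. However, existing visualization tools cannot systematically annotate glycan fragment ions and therefore cannot be applied to glycopeptide spectra. More importantly, current identification software supports visualization of its own results only, which hinders comparison and evaluation across identification workflows. Proteomics, and glycoproteomics in particular, still lacks a universal visualization tool that works with different identification software.
Methods
We developed GP-Plotter, a visualization tool for ion annotation and image output of proteomics tandem mass spectra. GP-Plotter reads peptide results from common identification tools and annotates fragment ions, and it can also comprehensively annotate glycan ions in intact glycopeptide spectra. As a flexible tool, GP-Plotter supports multiple display modes, such as mirror plots, and can annotate mass spectrometry data based on a user-defined peptide list. Users can easily customize several key image-generation parameters through the graphical interface. GP-Plotter can also export annotated spectra in common image formats, including vector graphics, facilitating the analysis and sharing of omics data.
Main results
1. GP-Plotter reads spectra in the standard mzML format and supports display of conventional peptide identification results from common proteomics software, including MaxQuant, pFind, and MSFragger, as well as glycopeptide identification results from glycoproteomics software such as Glyco-Decipher, Byonic, and pGlyco.
2. By reading glycan structures encoded in WURCS 2.0 and applying a built-in stepwise enumeration algorithm, GP-Plotter achieves comprehensive, systematic annotation of fragment ions from both the peptide backbone and the glycan modification in glycopeptide spectra.
3. Through flexible input of identification results and flexible spectrum display modes, GP-Plotter facilitates confidence assessment of identifications and modification site localization for glycopeptides and other post-translationally modified peptides.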
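To make the idea of fragment ion annotation concrete, the sketch below computes singly charged b and y ion m/z values for an unmodified peptide from standard monoisotopic residue masses and labels observed peaks that fall within a mass tolerance. It is a minimal sketch and not GP-Plotter's code: the function names `by_ions` and `annotate`, the 0.02 m/z tolerance, and the toy peak list are assumptions. Modifications, higher charge states, and glycan fragment ions are not handled.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def by_ions(peptide):
    """Theoretical m/z of singly charged b and y ions of a bare peptide."""
    masses = [MONO[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON           # N-terminal fragment
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON  # C-terminal fragment
    return ions

def annotate(peaks, ions, tol=0.02):
    """Label observed (m/z, intensity) peaks that match a theoretical ion."""
    return [
        (mz, name)
        for mz, _intensity in peaks
        for name, theoretical in ions.items()
        if abs(mz - theoretical) <= tol
    ]

ions = by_ions("PEPTIDE")
toy_peaks = [(227.10, 100.0), (300.00, 5.0)]  # hypothetical spectrum
labels = annotate(toy_peaks, ions)
print(labels)
```

For the toy spectrum, the peak at m/z 227.10 matches the b2 ion of PEPTIDE and the peak at m/z 300.00 matches nothing, so only the first peak is labeled.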
Software link
GP-Plotter has been integrated into the latest version of Glyco-Decipher (v1.0.4), and the standalone GP-Plotter program is freely available at https://github.com/DICP-1809.
Page qzae069