CiteScore: 12.4 Impact Factor: 7.691

Genomics, Proteomics & Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and the Genetics Society of China. The goals of GPB are to disseminate new frontiers in the field of omics and bioinformatics, to publish high-quality discoveries at a fast pace, and to promote open access and online publication via Article-in-Press for efficient publishing.

Recent Articles (Volume 19, Issue 2)

1. Single-cell RNA Sequencing Deciphers Immune Landscape of Human Recurrent Miscarriage

Chunyu Huang, Yong Zeng, Wenwei Tu

Page 169-171


2. Profiling Chromatin Accessibility at Single-cell Resolution

Sarthak Sinha, Ansuman T. Satpathy, Weiqiang Zhou, Hongkai Ji, Jo A. Stratton, Arzina Jaffer, Nizar Bahlis, Sorana Morrissy, Jeff A. Biernaskie

How distinct transcriptional programs are enacted to generate cellular heterogeneity and plasticity, and enable complex fate decisions are important open questions. One key regulator is the cell’s epigenome state that drives distinct transcriptional programs by regulating chromatin accessibility. Genome-wide chromatin accessibility measurements can impart insights into regulatory sequences (in)accessible to DNA-binding proteins at a single-cell resolution. This review outlines molecular methods and bioinformatic tools for capturing cell-to-cell chromatin variation using single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) in a scalable fashion. It also covers joint profiling of chromatin with transcriptome/proteome measurements, computational strategies to integrate multi-omic measurements, and predictive bioinformatic tools to infer chromatin accessibility from single-cell transcriptomic datasets. Methodological refinements that increase power for cell discovery through robust chromatin coverage and integrate measurements from multiple modalities will further expand our understanding of gene regulation during homeostasis and disease.
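Differences in transcriptional activity from cell to cell shape cellular heterogeneity and plasticity and ultimately determine cell fate. Chromatin structure, such as the degree of chromatin accessibility, plays a key role in regulating gene expression. Genome-wide measurement of chromatin accessibility reveals the regions bound by regulatory proteins at single-cell resolution. This review outlines the experimental technique used to reveal variation in chromatin accessibility between cells, the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq), together with the associated bioinformatic analysis methods. It also covers strategies for integrating scATAC-seq data with transcriptomic, proteomic, and other multi-omic data, and introduces bioinformatic tools that predict chromatin accessibility from single-cell transcriptomic data. Finer-scale chromatin coverage combined with multi-omic integration will deepen our understanding of transcriptional regulation in homeostasis and disease.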

Page 172-190


3. Single-cell Analysis Technologies for Immuno-oncology Research: from Mechanistic Delineation to Biomarker Discovery

Zhiliang Bai, Graham Su, Rong Fan

The successes with immune checkpoint blockade (ICB) and chimeric antigen receptor (CAR)-T-cell therapy in treating multiple cancer types have established immunotherapy as a powerful curative option for patients with advanced cancers. Unfortunately, many patients do not derive benefit or long-term responses, highlighting a pressing need for a complete investigation of the underlying mechanisms and of immunotherapy-induced tumor regression or rejection. In recent years, a large number of single-cell technologies have leveraged advances in characterizing the immune system, profiling the tumor microenvironment, and identifying cellular heterogeneity, which establish the foundations for lifting the veil on the comprehensive crosstalk between cancer and the immune system during immunotherapies. In this review, we introduce the applications of the most widely used single-cell technologies in furthering our understanding of immunotherapies in terms of underlying mechanisms and their association with therapeutic outcomes. We also discuss how single-cell analyses help to deliver new insights into biomarker discovery to predict patient response rate, monitor acquired resistance, and support prophylactic strategy development for toxicity management. Finally, we provide an overview of applying cutting-edge single-cell spatial-omics to reveal the heterogeneity of tumor–immune interactions at a higher level, which can ultimately guide the rational design of next-generation immunotherapies.
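The success of immune checkpoint blockade and chimeric antigen receptor T-cell (CAR-T) therapy in treating multiple cancers has made immunotherapy a potentially powerful weapon against malignant tumors. Unfortunately, many patients still derive no benefit from these new therapies or respond only briefly. It is therefore urgent to dissect thoroughly the molecular mechanisms behind therapeutic efficacy and to clarify the causes of treatment-related tumor regression or drug resistance. In recent years, numerous single-cell analysis technologies have played an increasingly important role in characterizing the immune system, studying the tumor microenvironment, and interpreting cell-to-cell heterogeneity, achieving a series of breakthroughs and laying the foundation for further understanding the interactions between cancer cells and immune cells during immunotherapy. This review introduces the important results that the most widely used single-cell technologies have achieved in explaining the mechanisms of immunotherapies and their association with clinical outcomes, and discusses how these technologies, through biomarker discovery, help predict efficacy, monitor relapse, and develop preventive measures against clinically relevant side effects. Finally, it surveys the prospects for state-of-the-art spatial-omics technologies to resolve tumor–immune interactions at the tissue level with higher accuracy and to support the development of next-generation immunotherapies.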

Page 191-207


4. Single-cell Immune Landscape of Human Recurrent Miscarriage

Feiyang Wang, Wentong Jia, Mengjie Fan, Xuan Shao, Zhilang Li, Yongjie Liu, Yeling Ma, Yu-Xia Li, Rong Li, Qiang Tu, Yan-Ling Wang

Successful pregnancy in placental mammals substantially depends on the establishment of maternal immune tolerance to the semi-allogeneic fetus. Disorders in this process are tightly associated with adverse pregnancy outcomes including recurrent miscarriage (RM). However, an in-depth understanding of the systemic and decidual immune environment in RM remains largely lacking. In this study, we utilized single-cell RNA-sequencing (scRNA-seq) to comparatively analyze the cellular and molecular signatures of decidual and peripheral leukocytes in normal and unexplained RM pregnancies at the early stage of gestation. Integrative analysis identifies 22 distinct cell clusters in total, and a dramatic difference in leukocyte subsets and molecular properties in RM cases is revealed. Specifically, the cytotoxic properties of CD8+ effector T cells, natural killer (NK), and mucosal-associated invariant T (MAIT) cells in peripheral blood indicate an apparently enhanced pro-inflammatory status, and the population proportions and ligand–receptor interactions of the decidual leukocyte subsets demonstrate preferential immune activation in RM patients. The molecular features, spatial distribution, and the developmental trajectories of five decidual NK (dNK) subsets have been elaborately illustrated. In RM patients, a dNK subset that supports embryonic growth is diminished in proportion, while the ratio of another dNK subset with a cytotoxic and immune-active signature is significantly increased. Notably, a unique pro-inflammatory CD56+CD16+ dNK subset substantially accumulates in RM decidua. These findings reveal a comprehensive cellular and molecular atlas of decidual and peripheral leukocytes in human early pregnancy and provide an in-depth insight into the immune pathogenesis for early pregnancy loss.
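Research question: Is immune adaptive regulation abnormal in unexplained recurrent miscarriage (RM), and which immune cells are mainly responsible for disrupting the immune microenvironment?
• Methods: Using 10X Genomics single-cell sequencing, flow-cytometric sorting, immunofluorescence, and related techniques, we constructed a single-cell transcriptomic atlas of immune cells in the decidua and peripheral blood of patients with unexplained RM, and systematically compared the distribution and molecular properties of immune cell subsets between RM and normal pregnancies. Pseudotime analysis was used to infer the developmental trajectories of decidual NK (dNK) cell subsets. The changes in the proportions of dNK subsets in the decidua of patients with unexplained RM were then validated in an expanded set of clinical samples.
• Main result 1: The proportions of multiple T-cell and NK-cell subsets are altered in the peripheral blood of RM patients, collectively indicating a state of inflammatory immune activation.
• Main result 2: In the decidua of RM patients, the proportions of CD4+ and CD8+ T cells and dNK subsets, together with their cell–cell interaction patterns, show an imbalance between immune tolerance and immune activation.
• Main result 3: The developmental trajectory of dNK cells was inferred, showing that the dNK1 subset is at an early developmental stage and has the potential to develop into the other dNK subsets.
• Main result 4: The proportion of the dNK4 subset is significantly increased in the decidua of some RM patients, suggesting abnormal recruitment and education of NK cells and pointing to a memory mechanism for adverse pregnancy outcomes.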

Page 208-222


5. Transcriptomic Profiling of Human Pluripotent Stem Cell-derived Retinal Pigment Epithelium over Time

Grace E. Lidgerwood, Anne Senabouth, Casey J.A. Smith-Anttila, Vikkitharan Gnanasambandapillai, Dominik C. Kaczorowski, Daniela Amann-Zalcenstein, Erica L. Fletcher, Shalin H. Naik, Alex W. Hewitt, Joseph E. Powell, Alice Pébay

Human pluripotent stem cell (hPSC)-derived progenies are immature versions of cells, presenting a potential limitation to the accurate modelling of diseases associated with maturity or age. Hence, it is important to characterise how closely cells used in culture resemble their native counterparts. In order to select appropriate time points of retinal pigment epithelium (RPE) cultures that reflect native counterparts, we characterised the transcriptomic profiles of the hPSC-derived RPE cells from 1- and 12-month cultures. We differentiated the human embryonic stem cell line H9 into RPE cells and performed single-cell RNA-sequencing of a total of 16,576 cells to assess the molecular changes of the RPE cells across these two culture time points. Our results indicate the stability of the RPE transcriptomic signature, with no evidence of an epithelial–mesenchymal transition, and with maturing populations of the RPE observed with time in culture. Assessment of Gene Ontology pathways revealed that as the cultures age, RPE cells upregulate expression of genes involved in metal binding and antioxidant functions. This might reflect an increased ability to handle oxidative stress as cells mature. Comparison with native human RPE data confirms a maturing transcriptional profile of RPE cells in culture. These results suggest that long-term in vitro culture of RPE cells allows the modelling of specific phenotypes observed in native mature tissues. Our work highlights the transcriptional landscape of hPSC-derived RPE cells as they age in culture, which provides a reference for native and patient samples to be benchmarked against.
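The retinal pigment epithelium (RPE) is a monolayer of pigmented, polarized cells that plays a key role in maintaining the health and function of the photoreceptors and the underlying vasculature. Oxidative stress causes RPE cell loss and aging, thereby impairing retinal health. The molecular mechanisms of RPE aging remain unclear. Pluripotent stem cells have the potential to differentiate and can, in theory, be induced in vitro into any cell type. To obtain induced RPE cells that more closely resemble native RPE, we characterized the transcriptomes of pluripotent stem cell-derived RPE after 1 month and 12 months in culture. We performed single-cell RNA sequencing on RPE from these two time points and obtained data for 16,576 cells in total. Our results show that the RPE did not exhibit instability in the form of conversion toward stem cells during culture, and that the proportion of RPE cells remained stable. Gene function enrichment analysis showed that, as culture time increased, the genes upregulated in RPE were enriched for metal-binding and antioxidant functions. This may reflect an improvement in antioxidant capacity as the cells mature. Similar to native RPE data, induced RPE cells cultured for 12 months displayed a mature RPE transcriptional signature. Our work describes the transcriptomic landscape of pluripotent stem cell-derived RPE over time in culture, providing a reference for native or pathological samples.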

Page 223-242


6. Immunoprofiling of Drosophila Hemocytes by Single-cell Mass Cytometry

József Á. Balog, Viktor Honti, Éva Kurucz, Beáta Kari, László G. Puskás, István Andó, Gábor J. Szebeni

Single-cell mass cytometry (SCMC) combines features of traditional flow cytometry (i.e., fluorescence-activated cell sorting) with mass spectrometry, making it possible to measure several parameters at the single-cell level for a complex analysis of biological regulatory mechanisms. In this study, we optimized SCMC to analyze hemocytes of the Drosophila innate immune system. We used metal-conjugated antibodies (against cell surface antigens H2, H3, H18, L1, L4, and P1, and intracellular antigens 3A5 and L2) and anti-IgM (against cell surface antigen L6) to detect the levels of antigens, while anti-GFP was used to detect crystal cells in the immune-induced samples. We investigated the antigen expression profile of single cells and hemocyte populations in naive states, in immune-induced states, in tumorous mutants bearing a driver mutation in the Drosophila homologue of Janus kinase (hopTum) and carrying a deficiency of the tumor suppressor gene lethal(3)malignant blood neoplasm-1 [l(3)mbn1], as well as in stem cell maintenance-defective hdcΔ84 mutant larvae. Multidimensional analysis enabled the discrimination of the functionally different major hemocyte subsets for lamellocytes, plasmatocytes, and crystal cells, and delineated the unique immunophenotype of Drosophila mutants. We have identified subpopulations of L2+/P1+ and L2+/L4+/P1+ transitional phenotype cells in the tumorous strains l(3)mbn1 and hopTum, respectively, and a subpopulation of L4+/P1+ cells upon immune induction. Our results demonstrated for the first time that SCMC, combined with multidimensional bioinformatic analysis, represents a versatile and powerful tool to deeply analyze the regulation of cell-mediated immunity of Drosophila.
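Single-cell mass cytometry (SCMC) combines the capabilities of traditional flow cytometry (FACS) and mass spectrometry, and can measure multiple parameters at the single-cell level for complex analysis of biological regulatory mechanisms. In this study, we optimized SCMC to analyze hemocytes of the Drosophila innate immune system. We used metal-conjugated antibodies (against the cell-surface antigens H2, H3, H18, L1, L4, and P1 and the intracellular antigens 3A5 and L2) and an anti-immunoglobulin M antibody (against the cell-surface antigen L6) to detect antigen levels, while an anti-green fluorescent protein (anti-GFP) antibody was used to detect crystal cells in immune-induced samples. We studied the antigen expression profiles of single cells and hemocyte populations in the uninduced and immune-induced states. We also examined tumorous mutants carrying a driver mutation in the Drosophila Janus kinase (hopTum) and a deficiency of the tumor suppressor gene l(3)mbn1, as well as stem cell maintenance-defective hdcΔ84 mutant larvae. Multidimensional analysis distinguished the functionally different major hemocyte subsets (lamellocytes, plasmatocytes, and crystal cells) and delineated the unique immunophenotypes of the Drosophila mutants. In the tumorous strains we identified subpopulations of transitional-phenotype cells, L2+/P1+ in l(3)mbn1 and L2+/L4+/P1+ in hopTum, and we identified a subpopulation of L4+/P1+ cells after immune induction. Our results demonstrate for the first time that single-cell mass cytometry combined with multidimensional bioinformatic analysis is a versatile and powerful tool for in-depth analysis of the regulation of cell-mediated immunity in Drosophila.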

Page 243-252


7. Direct Comparative Analyses of 10X Genomics Chromium and Smart-seq2

Xiliang Wang, Yao He, Qiming Zhang, Xianwen Ren, Zemin Zhang

Single-cell RNA sequencing (scRNA-seq) is generally used for profiling the transcriptome of individual cells. The droplet-based 10X Genomics Chromium (10X) approach and the plate-based Smart-seq2 full-length method are two frequently used scRNA-seq platforms, yet there are only a few thorough and systematic comparisons of their advantages and limitations. Here, by directly comparing the scRNA-seq data generated by these two platforms from the same samples of CD45− cells, we systematically evaluated their features using a wide spectrum of analyses. Smart-seq2 detected more genes in a cell, especially low-abundance transcripts as well as alternatively spliced transcripts, but captured a higher proportion of mitochondrial genes. The composite of Smart-seq2 data also resembled bulk RNA-seq data more closely. For 10X-based data, we observed higher noise for mRNAs with low expression levels. Approximately 10%−30% of all transcripts detected by both platforms were from non-coding genes, with long non-coding RNAs (lncRNAs) accounting for a higher proportion in 10X. 10X-based data displayed a more severe dropout problem, especially for genes with lower expression levels. However, 10X data can detect rare cell types given the platform's ability to cover a large number of cells. In addition, each platform detected distinct groups of differentially expressed genes between cell clusters, indicating the different characteristics of these technologies. Our study promotes better understanding of these two platforms and offers the basis for an informed choice between these widely used technologies.
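Single-cell RNA sequencing (scRNA-seq) is commonly used to profile the transcriptomes of individual cells. The droplet-based 10X and the plate-based Smart-seq2 are two commonly used scRNA-seq library construction systems, but very few studies have systematically compared their advantages and limitations. In this study, we used both systems to sequence the same tumor-derived CD45− cells and systematically evaluated the sequencing data. We found that Smart-seq2 detects more genes per cell, especially genes of low abundance, and that Smart-seq2 data are particularly well suited to analyzing alternative splicing of transcripts, but capture a higher proportion of mitochondrial genes. Smart-seq2 data also more closely resemble conventional bulk RNA-seq data. In 10X data, we observed higher noise for mRNAs with low expression. In data from both systems, approximately 10%−30% of transcripts came from non-coding genes, with an even higher proportion in 10X data. 10X data showed a more severe dropout rate, especially for lowly expressed genes. However, because 10X can capture a large number of cells at once, it can detect rare cell types. In addition, the two systems detected different differentially expressed genes between the same cell clusters, reflecting their distinct characteristics. This work helps researchers better understand the two technologies and provides a basis for choosing a single-cell library construction system.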

Page 253-266


8. Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data

Qianhui Huang, Yu Liu, Yuheng Du, Lana X. Garmire

Annotating cell types is a critical step in single-cell RNA sequencing (scRNA-seq) data analysis. Some supervised or semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods are lacking. Moreover, it is not clear whether some classification methods originally designed for analyzing other bulk omics data are adaptable to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research, including Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two methods were repurposed from deconvoluting DNA methylation data, i.e., linear constrained projection (CP) and robust partial correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulation data. We assessed the accuracy through intra-dataset and inter-dataset predictions; the robustness over practical challenges such as gene filtering, high similarity among cell types, and increased cell type classes; as well as the detection of rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Additionally, Seurat, SingleR, CP, and RPC were more robust against downsampling. However, Seurat did have a major drawback at predicting rare cell populations, and it was suboptimal at differentiating cell types highly similar to each other, compared to SingleR and RPC. All the code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.
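Annotating cell types is a key step in single-cell RNA sequencing (scRNA-seq) data analysis. In recent years, a number of supervised and semi-supervised classification methods have emerged for automated cell type identification. However, comprehensive evaluations of these methods are still lacking. Moreover, it is unclear whether some classification methods originally designed for omics data from other tissues or cell populations are applicable to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research: Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two, linear constrained projection (CP) and robust partial correlations (RPC), were repurposed from techniques commonly used to deconvolute DNA methylation data. We carried out systematic comparisons on a variety of public scRNA-seq datasets and on simulated data. For each method, we assessed the accuracy of intra-dataset and inter-dataset predictions; the robustness against practical challenges such as gene filtering, high similarity among cell types, and an increased number of cell type classes; and the detection of rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. In addition, Seurat, SingleR, CP, and RPC were more robust against gene filtering. However, compared with SingleR and RPC, Seurat had a major drawback in predicting rare cell populations and was suboptimal at distinguishing highly similar cell types. All code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.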

Page 267-281


9. SSRE: Cell Type Detection Based on Sparse Subspace Representation and Similarity Enhancement

Zhenlan Liang, Min Li, Ruiqing Zheng, Yu Tian, Xuhua Yan, Jin Chen, Fang-Xiang Wu, Jianxin Wang

Accurate identification of cell types from single-cell RNA sequencing (scRNA-seq) data plays a critical role in a variety of scRNA-seq analysis studies. This task corresponds to solving an unsupervised clustering problem, in which the similarity measurement between cells affects the result significantly. Although many approaches for cell type identification have been proposed, the accuracy still needs to be improved. In this study, we proposed a novel single-cell clustering framework based on similarity learning, called SSRE. SSRE models the relationships between cells based on the subspace assumption, and generates a sparse representation of the cell-to-cell similarity. The sparse representation retains the most similar neighbors for each cell. Besides, three classical pairwise similarities are incorporated with a gene selection and enhancement strategy to further improve the effectiveness of SSRE. Tested on ten real scRNA-seq datasets and five simulated datasets, SSRE achieved superior performance in most cases compared to several state-of-the-art single-cell clustering methods. In addition, SSRE can be extended to visualization of scRNA-seq data and identification of differentially expressed genes. The MATLAB and Python implementations of SSRE are available at https://github.com/CSUBioGroup/SSRE.
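Cell type identification is an important step in single-cell analysis and the basis for downstream analyses such as differential expression analysis and trajectory inference. In this paper, we treat it as an unsupervised clustering problem and propose SSRE, a single-cell clustering algorithm based on the sparse subspace assumption. The method builds a global cell similarity from the property that the gene expression profiles of cells of the same type can represent one another. Because, ideally, the cell similarity matrix should be block-diagonal by cell type and sparse, SSRE constrains the sparsity of the cell self-representation matrix with the L1 norm and solves the optimization model with the alternating direction method of multipliers (ADMM). In addition, considering the feature-shrinking behavior of LASSO when solving for the self-representation coefficients, SSRE uses three classical similarities between cells (Pearson correlation, Spearman correlation, and cosine similarity) as complementary information to partially fill in and enhance the self-representation matrix, thereby obtaining a more reliable cell similarity. The cell similarity matrix is obtained by symmetrizing the representation matrix, and spectral clustering then yields the final cell groups. Given the high dimensionality, high noise, and high sparsity of single-cell transcriptomic data, SSRE also uses the four similarities described in the paper to design a gene selection strategy based on the Laplacian score. SSRE was tested on 10 real single-cell transcriptomic datasets and 5 simulated datasets, with 8 classical single-cell clustering methods chosen for comparison. The results show that SSRE achieves higher clustering accuracy in most cases. SSRE can also be readily extended to the visualization of single-cell transcriptomic data and the identification of differentially expressed genes. The SSRE code is available at https://github.com/CSUBioGroup/SSRE.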
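The sparse self-representation idea in the SSRE abstract can be illustrated with a small sketch. This is not the authors' implementation: it swaps in plain coordinate-descent LASSO for the solver, uses a made-up toy expression matrix, and omits the similarity enhancement, gene selection, and spectral clustering steps. The function names, the penalty `lam`, and the sweep count are all hypothetical choices for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the proximal step for an L1 penalty)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sparse_self_representation(cells, lam=0.1, sweeps=500):
    """Represent each cell as an L1-penalized combination of the other cells.

    Minimizes 0.5 * ||x_i - sum_j c_ij x_j||^2 + lam * sum_j |c_ij| for each
    cell i by coordinate descent (an assumption; not the solver in the paper).
    Assumes every expression profile is non-zero.
    """
    n = len(cells)
    coef = [[0.0] * n for _ in range(n)]
    for i in range(n):
        residual = list(cells[i])  # x_i minus the current reconstruction
        for _ in range(sweeps):
            for j in range(n):
                if j == i:
                    continue  # a cell may not represent itself
                xj = cells[j]
                norm_sq = dot(xj, xj)
                # Correlation of x_j with the residual that excludes x_j's own term.
                rho = dot(xj, residual) + coef[i][j] * norm_sq
                updated = soft_threshold(rho, lam) / norm_sq
                delta = updated - coef[i][j]
                if delta != 0.0:
                    residual = [r - delta * v for r, v in zip(residual, xj)]
                    coef[i][j] = updated
    return coef


def symmetrize(coef):
    """Turn the representation matrix into a symmetric similarity matrix."""
    n = len(coef)
    return [[abs(coef[i][j]) + abs(coef[j][i]) for j in range(n)] for i in range(n)]


# Toy data (hypothetical): cells 0-2 express genes 0-1, cells 3-5 express genes 2-3.
cells = [
    [1.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 0.0, 0.0],
    [3.0, 6.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 3.0, 3.0],
]
similarity = symmetrize(sparse_self_representation(cells))
```

On this toy input the similarity matrix is block-structured: entries linking the two groups are zero because cells from one group cannot help reconstruct cells from the other, which is the property a downstream clustering step would exploit.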

Page 282-291


10. redPATH: Reconstructing the Pseudo Development Time of Cell Lineages in Single-cell RNA-seq Data and Applications in Cancer

Kaikun Xie, Zehua Liu, Ning Chen, Ting Chen

The recent advancement of single-cell RNA sequencing (scRNA-seq) technologies facilitates the study of cell lineages in developmental processes and cancer. In this study, we developed a computational method, called redPATH, to reconstruct the pseudo developmental time of cell lineages using a consensus asymmetric Hamiltonian path algorithm. Besides, we developed a novel approach to visualize the trajectory development and implemented visualization methods to provide biological insights. We validated the performance of redPATH by segmenting different stages of cell development on multiple neural stem cell and cancer datasets, as well as other single-cell transcriptome data. In particular, we identified a stem cell-like subpopulation in malignant glioma cells. These cells express known proliferative markers, such as GFAP, ATP1A2, IGFBPL1, and ALDOC, and remain silenced for quiescent markers such as ID3. Furthermore, we identified MCL1 as a significant gene that regulates cell apoptosis and CSF1R for reprogramming macrophages to control tumor growth. In conclusion, redPATH is a comprehensive tool for analyzing scRNA-seq datasets along the pseudo developmental time. redPATH is available at https://github.com/tinglabs/redPATH.
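Recent advances in single-cell RNA sequencing (scRNA-seq) have facilitated the study of cell lineages in developmental processes and cancer. This paper presents a computational analysis method called redPATH, which first uses a consensus asymmetric Hamiltonian path algorithm to reconstruct the pseudo developmental order of cells. We then developed a novel approach to visualize developmental trajectories, along with other visualization methods, to support biological analysis and interpretation. We validated the performance of redPATH by segmenting different stages of cell development on multiple neural stem cell and cancer datasets, as well as other single-cell transcriptome data. Using redPATH, we identified a stem cell-like subpopulation in malignant glioma cells. These cells express known proliferative marker genes, such as GFAP, ATP1A2, IGFBPL1, and ALDOC, and do not express quiescent marker genes such as ID3. In addition, we identified MCL1 as an important gene regulating cell death, and CSF1R as an important gene for reprogramming macrophages to control tumor growth. Overall, redPATH is a comprehensive tool for pseudo developmental time analysis of single-cell transcriptome data. The software is available at https://github.com/tinglabs/redPATH.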

Page 292-305


11. DTFLOW: Inference and Visualization of Single-cell Pseudotime Trajectory Using Diffusion Propagation

Jiangyong Wei, Tianshou Zhou, Xinan Zhang, Tianhai Tian

One of the major challenges in single-cell data analysis is the determination of cellular developmental trajectories using single-cell data. Although substantial studies have been conducted in recent years, more effective methods are still strongly needed to infer the developmental processes accurately. This work devises a new method, named DTFLOW, for determining the pseudo-temporal trajectories with multiple branches. DTFLOW consists of two major steps: a new method called Bhattacharyya kernel feature decomposition (BKFD) to reduce the data dimensions, and a novel approach named Reverse Searching on k-nearest neighbor graph (RSKG) to identify the multi-branching processes of cellular differentiation. In BKFD, we first establish a stationary distribution for each cell to represent the transition of cellular developmental states based on the random walk with restart algorithm, and then propose a new distance metric for calculating pseudotime of single cells by introducing the Bhattacharyya kernel matrix. The effectiveness of DTFLOW is rigorously examined by using four single-cell datasets. We compare the efficiency of DTFLOW with the published state-of-the-art methods. Simulation results suggest that DTFLOW has superior accuracy and strong robustness properties for constructing pseudotime trajectories. The Python source code of DTFLOW can be freely accessed at https://github.com/statway/DTFLOW.
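A major challenge in single-cell data analysis is determining the developmental trajectories of cells. Although many studies have been carried out in recent years, more effective methods are still needed to infer developmental processes accurately. To address this problem, we designed a new algorithm, DTFLOW, for determining pseudotime trajectories with multiple branches in single-cell data. The algorithm consists of two main steps. The first is a new dimension reduction algorithm, Bhattacharyya kernel feature decomposition (BKFD), in which each cell is represented as a stationary distribution based on the random walk with restart algorithm, and that distribution is used to characterize transitions between cellular developmental states. The core of the algorithm is the use of the Bhattacharyya kernel matrix to define a new distance metric for calculating the pseudotime of single cells. The second step performs a reverse search on the k-nearest neighbor graph using the pseudotime obtained in the first step, yielding a new multi-branch identification algorithm, RSKG. We rigorously validated DTFLOW on four single-cell datasets and compared it with two recently published algorithms. The results show that DTFLOW has higher accuracy and stronger robustness than conventional pseudotime trajectory inference algorithms.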
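The two ingredients the DTFLOW abstract names for BKFD, a random walk with restart (RWR) stationary distribution per cell and a Bhattacharyya kernel between those distributions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the DTFLOW code: the chain-shaped toy graph, the restart probability of 0.5, the iteration count, and all function names are hypothetical, and the final distance shown is the standard Bhattacharyya distance, which may differ from the metric defined in the paper.

```python
import math


def row_normalize(adjacency):
    """Convert a non-negative adjacency matrix into a row-stochastic transition matrix."""
    return [[value / sum(row) for value in row] for row in adjacency]


def rwr_distribution(transition, seed, restart=0.5, iters=200):
    """Stationary distribution of a random walk that restarts at `seed`.

    Iterates s <- restart * e_seed + (1 - restart) * s P, which converges
    geometrically because (1 - restart) < 1.
    """
    n = len(transition)
    state = [1.0 if k == seed else 0.0 for k in range(n)]
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += (1.0 - restart) * state[i] * transition[i][j]
        nxt[seed] += restart
        state = nxt
    return state


def bhattacharyya_kernel(distributions):
    """K[i][j] = sum_k sqrt(p_i[k] * p_j[k]); equals 1 when the distributions coincide."""
    return [
        [sum(math.sqrt(a * b) for a, b in zip(p, q)) for q in distributions]
        for p in distributions
    ]


# Toy neighbor graph (hypothetical): five cells arranged along a chain 0-1-2-3-4.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
transition = row_normalize(adjacency)
distributions = [rwr_distribution(transition, i) for i in range(len(transition))]
kernel = bhattacharyya_kernel(distributions)

# Standard Bhattacharyya distance; min() guards against rounding just above 1.
distance = [[-math.log(min(1.0, value)) for value in row] for row in kernel]
```

In this toy chain, cell 0 ends up closer to its neighbor cell 1 than to the far end of the chain, which is the kind of ordering a pseudotime measure needs to recover.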

Page 306-318


12. c-CSN: Single-cell RNA Sequencing Data Analysis by Conditional Cell-specific Network

Lin Li, Hao Dai, Zhaoyuan Fang, Luonan Chen

The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared to bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which brings new computational difficulties. Based on statistical independence, the cell-specific network (CSN) method is able to quantify the overall associations between genes for each cell, yet it suffers from a problem of overestimation related to indirect effects. To overcome this problem, we propose the c-CSN method, which can construct the conditional cell-specific network (CCSN) for each cell. The c-CSN method measures the direct associations between genes by eliminating the indirect associations. c-CSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as the transformation from less “reliable” gene expression to more “reliable” gene–gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach. 1) One direct association network is generated for one cell. 2) Most existing scRNA-seq methods designed for gene expression matrices are also applicable to c-CSN-transformed degree matrices. 3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. c-CSN is publicly available at https://github.com/LinLi-0909/c-CSN.
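The rapid development of single-cell technologies has given new insight into the complex mechanisms of cellular heterogeneity. However, compared with conventional bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) has higher noise and lower coverage, which makes bioinformatic analysis more difficult. In earlier work, the cell-specific network (CSN), built on statistical independence, was able to quantify the overall associations between genes in each cell, but networks constructed by the CSN method contain indirect associations and therefore suffer from overestimation. To overcome this problem, we propose the c-CSN method, which can construct a conditional cell-specific network (CCSN) for each cell. The c-CSN method measures direct associations between genes by eliminating indirect associations. c-CSN can be used for cell clustering and dimension reduction on the basis of single-cell networks. Intuitively, each CCSN can be viewed as a transformation from less “reliable” gene expression to more “reliable” gene–gene associations within a cell. Based on CCSN, we further designed network flow entropy (NFE) to assess the differentiation potency of a single cell.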

Page 319-329


13. scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets

Qianqian Song, Jing Su, Lance D. Miller, Wei Zhang

In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provides critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and produce results that substantially limit biological expectations of co-expressed genes. Herein, we present single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biologic context. Importantly, scLM can simultaneously cluster multiple single-cell datasets, i.e., consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analysis. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects from multiple datasets. Results from both simulation data and experimental data demonstrate that scLM outperforms the existing methods with considerably improved accuracy. To illustrate the biological insights of scLM, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.
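In gene expression profiling studies, including single-cell RNA-seq (scRNA-seq) analyses, the identification and characterization of co-expressed genes provide key information about cell identity and function. At present, finding gene co-expression clusters in scRNA-seq data poses several challenges. We show that commonly used methods for single-cell data cannot accurately identify co-expressed genes and produce results that fall well short of the biological expectations for co-expressed genes. In this paper, we present the single-cell latent-variable model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting co-expressed gene clusters with important biological context. Importantly, scLM can cluster multiple single-cell datasets simultaneously, enabling users to draw on single-cell data from multiple sources for comparative analysis. scLM takes raw gene expression data as input and preserves biological variation without being affected by batch effects across datasets. Results on both simulated and experimental data show that scLM outperforms existing methods with considerably improved accuracy. To illustrate that scLM can reveal underlying biological mechanisms, we applied it to multiple existing scRNA-seq datasets. We found that scLM identifies gene modules with important functions and refines cell classification, which helps in discovering biological mechanisms and understanding complex biological systems such as cancer. A user-friendly R package implementing the scLM method is available at https://github.com/QSong-github/scLM.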

Page 330-341


Most Cited Articles

1. Exosome and exosomal microRNA: Trafficking, sorting, and function

Jian Zhang | Sha Li | Lu Li | Meng Li | Chongye Guo | Jun Yao | Shuangli Mi

2. PacBio Sequencing and Its Applications

Anthony Rhoads | Kin Fai Au

3. N6-methyl-adenosine (m6A) in RNA: An Old Modification with A Novel Epigenetic Function

Yamei Niu | Xu Zhao | Yong Sheng Wu | Ming Ming Li | Xiu Jie Wang | Yun Gui Yang
