Advanced Single-cell Omics Technologies and Informatics Tools for Genomics, Proteomics, and Bioinformatics Analysis
Luonan Chen, Rong Fan, Fuchou Tang
scDPN for High-throughput Single-cell CNV Detection to Uncover Clonal Evolution During HCC Recurrence
Liang Wu, Miaomiao Jiang, Yuzhou Wang, Biaofeng Zhou, Yunfan Sun, Kaiqian Zhou, Jiarui Xie, Yu Zhong, Zhikun Zhao, Michael Dean, Yong Hou, Shiping Liu
Single-cell genomics provides substantial resources for dissecting cellular heterogeneity and cancer evolution. Unfortunately, classical DNA amplification-based methods have low throughput and introduce coverage bias during sample preamplification. We developed a single-cell DNA library preparation method without preamplification in nanolitre scale (scDPN) to address these issues. The method achieved a throughput of up to 1800 cells per run for copy number variation (CNV) detection. Also, our approach demonstrated a lower level of amplification bias and noise than the multiple displacement amplification (MDA) method and showed high sensitivity and accuracy for cell line and tumor tissue evaluation. We used this approach to profile the tumor clones in paired primary and relapsed tumor samples of hepatocellular carcinoma (HCC). We identified three clonal subpopulations with a multitude of aneuploid alterations across the genome. Furthermore, we observed that a minor clone of the primary tumor containing additional alterations in chromosomes 1q, 10q, and 14q developed into the dominant clone in the recurrent tumor, indicating clonal selection during recurrence in HCC. Overall, this approach provides a comprehensive and scalable solution to understand genome heterogeneity and evolution.
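Single-cell whole-genome sequencing is an important tool for studying tumor heterogeneity and evolution. Conventional single-cell whole-genome sequencing relies heavily on DNA amplification techniques such as multiple displacement amplification (MDA), which suffer from low throughput and strong amplification bias. To address this, we exploited the ability of transposase to build libraries directly and developed scDPN, a microwell chip-based, high-throughput, low-cost single-cell DNA library preparation method with nanolitre-scale reactions. Its throughput reaches 1800 cells per run and, combined with low-depth whole-genome sequencing, it is well suited to detecting copy number variations (CNVs) at the single-cell level. Studies of cell lines and tumor tissues showed that scDPN achieves good uniformity and accuracy in single-cell CNV detection and, compared with the conventional MDA method, has lower amplification bias and background noise, allowing single-cell CNV events to be detected more accurately. We further used scDPN to study single-cell CNVs in primary and recurrent tumor samples from the same hepatocellular carcinoma patient. We found that the primary tumor was highly heterogeneous, containing two clonal subtypes with markedly different CNVs: the major clone Type 1 and the minor clone Type 2, accounting for 85% and 15% of cells, respectively. Interestingly, the recurrent tumor cells all derived from the minor Type 2 clone of the primary tumor, indicating that selection of tumor clonal subtypes occurred during recurrence. Further analysis of the two subtypes showed that, compared with Type 1, Type 2 carried new CNV events on chromosomes 1q, 10q, and 14q. Notably, the CNV at 10q, a heterozygous deletion that reduces the copy number of several tumor suppressor genes including PTEN and FAS, may be one reason why Type 2 was preferentially selected during recurrence; prognostic validation using public TCGA (The Cancer Genome Atlas) data indirectly supported this inference. In summary, our newly developed single-cell library preparation method scDPN provides a comprehensive and scalable solution for single-cell genomics research.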
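As a rough illustration of how low-depth sequencing can yield single-cell copy number calls, the sketch below converts per-bin read counts into integer copy numbers. It is a generic read-depth approach under stated assumptions, not the scDPN analysis pipeline: the function name `estimate_copy_number`, the median baseline, and the diploid default are all hypothetical choices made for this example.

```python
from statistics import median

def estimate_copy_number(bin_counts, ploidy=2):
    """Convert per-bin read counts for one cell into integer copy numbers.

    Assumption: most of the genome is at the baseline ploidy, so the median
    bin count is taken to represent `ploidy` copies.
    """
    baseline = median(bin_counts)
    if baseline == 0:
        raise ValueError("median bin count is zero; too few reads to call CNVs")
    # Scale each bin relative to the baseline and round to the nearest integer.
    return [round(ploidy * count / baseline) for count in bin_counts]

# Toy cell: one gained bin (150 reads) and one lost bin (50 reads).
toy_counts = [100, 100, 150, 50, 100, 100]
copy_numbers = estimate_copy_number(toy_counts)
```

Real data would also need GC-content and mappability correction plus segmentation across adjacent bins, none of which is shown here.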
Mapping Human Pluripotent Stem Cell-derived Erythroid Differentiation by Single-cell Transcriptome Analysis
Zijuan Xin, Wei Zhang, Shangjin Gong, Junwei Zhu, Yanming Li, Zhaojun Zhang, Xiangdong Fang
There is an imbalance between the supply and demand of functional red blood cells (RBCs) in clinical applications. This imbalance can be addressed by regenerating RBCs using several in vitro methods. Induced pluripotent stem cells (iPSCs) can handle the low supply of cord blood and the ethical issues in embryonic stem cell research, and provide a promising strategy to eliminate immune rejection. However, no complete single-cell level differentiation pathway exists for the iPSC-derived erythroid differentiation system. In this study, we used iPSC line BC1 to establish an RBC regeneration system. The 10X Genomics single-cell transcriptome platform was used to map the cell lineage and differentiation trajectory on day 14 of the regeneration system. We observed that iPSC differentiation was not synchronized during embryoid body (EB) culture. The cells on day 14 mainly consisted of mesodermal cells and various blood cells, similar to the yolk sac hematopoiesis. We identified six cell classifications and characterized the regulatory transcription factor (TF) networks and cell–cell contacts underlying the system. iPSCs undergo two transformations during the differentiation trajectory, accompanied by the dynamic expression of cell adhesion molecules and estrogen-responsive genes. We identified erythroid cells at different stages, such as burst-forming unit erythroid (BFU-E) and orthochromatic erythroblast (ortho-E) cells, and found that the regulation of TFs (e.g., TFDP1 and FOXO3) is erythroid-stage specific. Immune erythroid cells were identified in our system. This study provides systematic theoretical guidance for optimizing the iPSC-derived erythroid differentiation system, and this system is a useful model for simulating in vivo hematopoietic development and differentiation.
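We established an iPSC-based red blood cell regeneration system and used 10x scRNA-seq to map, for the first time, the cell lineages and differentiation trajectory on day 14 of this system. The results showed that iPSC differentiation was not synchronized during embryoid body culture; the cells on day 14 consisted mainly of mesodermal cells and various blood cells, resembling yolk sac hematopoiesis. iPSCs underwent two transitions along the differentiation trajectory, accompanied by dynamic expression of cell adhesion molecules and estrogen-responsive genes. Immune erythroid cells were identified for the first time in an in vitro erythroid differentiation system. This study provides theoretical guidance for optimizing the red blood cell regeneration system.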
Single-cell Long Non-coding RNA Landscape of T Cells in Human Cancer Immunity
Haitao Luo, Dechao Bu, Lijuan Shao, Yang Li, Liang Sun, Ce Wang, Jing Wang, Wei Yang, Xiaofei Yang, Jun Dong, Yi Zhao, Furong Li
The development of new biomarkers or therapeutic targets for cancer immunotherapies requires a deep understanding of T cells. To date, the complete landscape and systematic characterization of long noncoding RNAs (lncRNAs) in T cells in cancer immunity are lacking. Here, by systematically analyzing full-length single-cell RNA sequencing (scRNA-seq) data of more than 20,000 libraries of T cells across three cancer types, we provided the first comprehensive catalog and the functional repertoires of lncRNAs in human T cells. Specifically, we developed a custom pipeline for de novo transcriptome assembly and obtained a novel lncRNA catalog containing 9433 genes. This expanded the current human lncRNA catalog by 16% and nearly doubled the number of lncRNAs expressed in T cells. We found that a portion of expressed genes in single T cells were lncRNAs which had been overlooked by the majority of previous studies. Based on metacell maps constructed by the MetaCell algorithm, which partitions scRNA-seq datasets into disjoint and homogeneous groups of cells (metacells), 154 signature lncRNA genes were identified. They were associated with effector, exhausted, and regulatory T cell states. Moreover, 84 of them were functionally annotated based on the co-expression networks, indicating that lncRNAs might broadly participate in the regulation of T cell functions. Our findings provide a new point of view and resource for investigating the mechanisms of T cell regulation in cancer immunity as well as for novel cancer-immune biomarker development and cancer immunotherapies.
Single-cell Transcriptomes Reveal Characteristics of MicroRNAs in Gene Expression Noise Reduction
Tao Hu, Lei Wei, Shuailin Li, Tianrun Cheng, Xuegong Zhang, Xiaowo Wang
Isogenic cells growing in identical environments show cell-to-cell variations because of the stochasticity in gene expression. High levels of variation or noise can disrupt robust gene expression and result in tremendous consequences for cell behaviors. In this work, we showed evidence from single-cell RNA sequencing data analysis that microRNAs (miRNAs) can reduce gene expression noise at the mRNA level in mouse cells. We identified that the miRNA expression level, number of targets, target pool abundance, and miRNA–target interaction strength are the key features contributing to noise repression. miRNAs tend to work together in cooperative subnetworks to repress target noise synergistically in a cell type-specific manner. By building a physical model of post-transcriptional regulation and observing in synthetic gene circuits, we demonstrated that accelerated degradation with elevated transcriptional activation of the miRNA target provides resistance to extrinsic fluctuations. Together, through the integrated analysis of single-cell RNA and miRNA expression profiles, we demonstrated that miRNAs are important post-transcriptional regulators for reducing gene expression noise and conferring robustness to biological processes.
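Gene expression noise of the kind discussed above is commonly summarized per gene as the squared coefficient of variation across cells. The sketch below shows that calculation on toy counts. The choice of this metric and the name `expression_noise` are assumptions for illustration; the abstract does not state which noise measure the authors used.

```python
def expression_noise(counts):
    """Squared coefficient of variation (variance / mean^2) across cells."""
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one cell")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("noise is undefined for a gene with zero mean expression")
    variance = sum((c - mean) ** 2 for c in counts) / n
    return variance / (mean * mean)

# Two hypothetical genes with the same mean expression but different spread.
tightly_controlled = [9, 10, 11, 10]
loosely_controlled = [5, 15, 5, 15]
noise_tight = expression_noise(tightly_controlled)
noise_loose = expression_noise(loosely_controlled)
```

Because noise generally depends on mean expression, a comparison between miRNA targets and non-targets would need to match or correct for expression level first.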
Single-cell RNA Sequencing Reveals Sexually Dimorphic Transcriptome and Type 2 Diabetes Genes in Mouse Islet β Cells
Gang Liu, Yana Li, Tengjiao Zhang, Mushan Li, Sheng Li, Qing He, Shuxin Liu, Minglu Xu, Tinghui Xiao, Zhen Shao, Weiyang Shi, Weida Li
Type 2 diabetes (T2D) is characterized by the malfunction of pancreatic β cells. Susceptibility and pathogenesis of T2D can be affected by multiple factors, including sex differences. However, the mechanisms underlying sex differences in T2D susceptibility and pathogenesis remain unclear. Using single-cell RNA sequencing (scRNA-seq), we demonstrate the presence of sexually dimorphic transcriptomes in mouse β cells. Using a high-fat diet-induced T2D mouse model, we identified sex-dependent T2D altered genes, suggesting sex-based differences in the pathological mechanisms of T2D. Furthermore, based on islet transplantation experiments, we found that compared to mice with sex-matched islet transplants, sex-mismatched islet transplants in healthy mice showed down-regulation of genes involved in the longevity regulating pathway of β cells. Moreover, the diabetic mice with sex-mismatched islet transplants showed impaired glucose tolerance. These data suggest sexual dimorphism in T2D pathogenicity, indicating that sex should be considered when treating T2D. We hope that our findings could provide new insights for the development of precision medicine in T2D.
Single-cell RNA Sequencing Reveals Thoracolumbar Vertebra Heterogeneity and Rib-genesis in Pigs
Jianbo Li, Ligang Wang, Dawei Yu, Junfeng Hao, Longchao Zhang, Adeniyi C. Adeola, Bingyu Mao, Yun Gao, Shifang Wu, Chunling Zhu, Yongqing Zhang, Jilong Ren, Changgai Mu, David M. Irwin, Lixian Wang, Tang Hai, Haibing Xie, Yaping Zhang
Development of thoracolumbar vertebra (TLV) and rib primordium (RP) is a common evolutionary feature across vertebrates, although whole-organism analysis of the expression dynamics of TLV- and RP-related genes has been lacking. Here, we investigated the single-cell transcriptome landscape of thoracic vertebra (TV), lumbar vertebra (LV), and RP cells from a pig embryo at 27 days post-fertilization (dpf) and identified six cell types with distinct gene expression signatures. In-depth dissection of the gene expression dynamics and RNA velocity revealed a coupled process of osteogenesis and angiogenesis during TLV and RP development. Further analysis of cell type-specific and strand-specific expression uncovered the extremely high level of HOXA10 3′-UTR sequence specific to osteoblasts of LV cells, which may function as anti-HOXA10-antisense by counteracting the HOXA10-antisense effect to determine TLV transition. Thus, this work provides a valuable resource for understanding embryonic osteogenesis and angiogenesis underlying vertebrate TLV and RP development at the cell type-specific resolution, which serves as a comprehensive view on the transcriptional profile of animal embryo development.
A Single-cell Transcriptome Atlas of Cashmere Goat Hair Follicle Morphogenesis
Wei Ge, Weidong Zhang, Yuelang Zhang, Yujie Zheng, Fang Li, Shanhe Wang, Jinwang Liu, Shaojing Tan, Zihui Yan, Lu Wang, Wei Shen, Lei Qu, Xin Wang
Cashmere, also known as soft gold, is produced from the secondary hair follicles (SHFs) of cashmere goats. The number of SHFs determines the yield and quality of cashmere; therefore, it is of interest to investigate the transcriptional profiles present during cashmere goat hair follicle development. However, mechanisms underlying this development process remain largely unexplored, and studies regarding hair follicle development mostly use a murine research model. In this study, to provide a comprehensive understanding of cellular heterogeneity and cell fate decisions, single-cell RNA sequencing was performed on 19,705 single cells of the dorsal skin from cashmere goat fetuses at induction (embryonic day 60; E60), organogenesis (E90), and cytodifferentiation (E120) stages. For the first time, unsupervised clustering analysis identified 16 cell clusters, and their corresponding cell types were also characterized. Based on lineage inference, a detailed molecular landscape was revealed along the dermal and epidermal cell lineage developmental pathways. Notably, our current data also confirmed the heterogeneity of dermal papillae from different hair follicle types, which was further validated by immunofluorescence analysis. The current study identifies different biomarkers during cashmere goat hair follicle development and has implications for cashmere goat breeding in the future.
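In this study, single-cell RNA sequencing (scRNA-seq) was performed on 19,705 single cells from the dorsal skin of cashmere goat fetuses at the induction (embryonic day 60), organogenesis (embryonic day 90), and cytodifferentiation (embryonic day 120) stages of hair follicle morphogenesis. Clustering analysis based on t-distributed stochastic neighbor embedding (tSNE) identified the cell types present during cashmere goat hair follicle development, including dermal cells, epidermal cells, dermal papilla cells, endothelial cells, keratinocytes, and pericytes, and their gene expression profiles were described in detail for the first time. Differential analysis between cell types revealed a series of cell type-specific marker molecules. Pseudotime trajectory analysis reconstructed the differentiation trajectories of the epidermal and dermal cell lineages throughout hair follicle development, and described the pseudotime gene expression changes during specification of the dermal condensate and dermal papilla in the dermal lineage and of matrix cells, hair follicle stem cell precursors, inner root sheath, and hair shaft cells in the epidermal lineage.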
GranatumX: A Community-engaging, Modularized, and Flexible Webtool for Single-cell Data Analysis
David G. Garmire, Xun Zhu, Aravind Mantravadi, Qianhui Huang, Breck Yunits, Yu Liu, Thomas Wolfgruber, Olivier Poirion, Tianying Zhao, Cédric Arisdakessian, Stefan Stanojevic, Lana X. Garmire
We present GranatumX, a next-generation software environment for single-cell RNA sequencing (scRNA-seq) data analysis. GranatumX is inspired by the interactive webtool Granatum. GranatumX enables biologists to access the latest scRNA-seq bioinformatics methods in a web-based graphical environment. It also offers software developers the opportunity to rapidly promote their own tools with others in customizable pipelines. The architecture of GranatumX allows for easy inclusion of plugin modules, named Gboxes, which wrap around bioinformatics tools written in various programming languages and on various platforms. GranatumX can be run on the cloud or private servers and generate reproducible results. It is a community-engaging, flexible, and evolving software ecosystem for scRNA-seq analysis, connecting developers with bench scientists. GranatumX is freely accessible at http://garmiregroup.org/granatumx/app.
scGET: Predicting Cell Fate Transition During Early Embryonic Development by Single-cell Graph Entropy
Jiayuan Zhong, Chongyin Han, Xuhang Zhang, Pei Chen, Rui Liu
During early embryonic development, cell fate commitment represents a critical transition or “tipping point” of embryonic differentiation, at which there is a drastic and qualitative shift of the cell populations. In this study, we presented a computational approach, scGET, to explore the gene–gene associations based on single-cell RNA sequencing (scRNA-seq) data for critical transition prediction. Specifically, by transforming the gene expression data to the local network entropy, the single-cell graph entropy (SGE) value quantitatively characterizes the stability and criticality of gene regulatory networks among cell populations and thus can be employed to detect the critical signal of cell fate or lineage commitment at the single-cell level. Being applied to five scRNA-seq datasets of embryonic differentiation, scGET accurately predicts all the impending cell fate transitions. After identifying the “dark genes” that are non-differentially expressed genes but sensitive to the SGE value, the underlying signaling mechanisms were revealed, suggesting that the synergy of dark genes and their downstream targets may play a key role in various cell development processes. The application in all five datasets demonstrates the effectiveness of scGET in analyzing scRNA-seq data from a network perspective and its potential to track the dynamics of cell differentiation. The source code of scGET is accessible at https://github.com/zhongjiayuna/scGET_Project.
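To make the idea of a local network entropy more concrete, the sketch below computes a normalized Shannon entropy over the edge weights around one gene and averages it across genes. This is a generic illustration and not the SGE formula from scGET: the function names, the use of non-negative weights such as absolute correlations, and the simple averaging are all assumptions made for this example.

```python
import math

def local_entropy(weights):
    """Normalized Shannon entropy (0 to 1) of one gene's neighbor weights.

    `weights` are assumed non-negative, e.g. absolute correlations between
    a center gene and each of its network neighbors.
    """
    m = len(weights)
    total = sum(weights)
    if m < 2 or total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(m)  # divide by the maximum entropy for m neighbors

def graph_entropy(neighborhoods):
    """Mean local entropy over all genes; `neighborhoods` maps gene -> weights."""
    values = [local_entropy(w) for w in neighborhoods.values()]
    return sum(values) / len(values)

# Hypothetical two-gene network: one evenly connected gene, one dominated
# by a single neighbor.
toy_network = {"geneA": [1.0, 1.0, 1.0, 1.0], "geneB": [1.0, 0.0, 0.0, 0.0]}
toy_score = graph_entropy(toy_network)
```

Tracking such a score across time points or cell groups is one way a network-level signal could be monitored for abrupt changes; the thresholds and statistics used by scGET are described in the paper and its source code.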
scLink: Inferring Sparse Gene Co-expression Networks from Single-cell Expression Data
Wei Vivian Li, Yanzeng Li
A system-level understanding of the regulation and coordination mechanisms of gene expression is essential for studying the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell type-specific manner. Here we propose the scLink method, which uses statistical network modeling to understand the co-expression relationships among genes and construct sparse gene co-expression networks from single-cell gene expression data. We use both simulation and real data studies to demonstrate the advantages of scLink and its ability to improve single-cell gene network analysis. The scLink R package is available at https://github.com/Vivianstats/scLink.
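To study the complexity of biological processes in health and disease, a system-level investigation of the regulation and coordination mechanisms of gene expression is required. The rapid development of single-cell sequencing technologies has created favorable conditions for studying cell type-specific gene interactions. Here we propose a new method called scLink, which uses statistical network modeling on single-cell gene expression data to study co-expression relationships among genes and to construct sparse gene co-expression networks. Using multiple simulated and real single-cell datasets, we demonstrate the advantages of scLink in single-cell gene co-expression network analysis. The R package implementing scLink can be downloaded from its GitHub page: https://github.com/Vivianstats/scLink.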
Polar Gini Curve: A Technique to Discover Gene Expression Spatial Patterns from Single-cell RNA-seq Data
Thanh Minh Nguyen, Jacob John Jeevan, Nuo Xu, Jake Y. Chen
In this work, we describe the development of Polar Gini Curve, a method for characterizing cluster markers by analyzing single-cell RNA sequencing (scRNA-seq) data. Polar Gini Curve combines the gene expression and the 2D coordinates (“spatial”) information to detect patterns of uniformity in any clustered cells from scRNA-seq data. We demonstrate that Polar Gini Curve can help users characterize the shape and density distribution of cells in a particular cluster, which can be generated during routine scRNA-seq data analysis. To quantify the extent to which a gene is uniformly distributed in a cell cluster space, we combine two polar Gini curves (PGCs): one drawn upon the cell-points expressing the gene (the “foreground curve”) and the other drawn upon all cell-points in the cluster (the “background curve”). We show that genes with highly dissimilar foreground and background curves tend not to be uniformly distributed in the cell cluster, and thus have spatially divergent gene expression patterns within the cluster. Genes with similar foreground and background curves tend to be uniformly distributed in the cell cluster, and thus have uniform gene expression patterns within the cluster. Such quantitative attributes of PGCs can be applied to sensitively discover biomarkers across clusters from scRNA-seq data. We demonstrate the performance of the Polar Gini Curve framework in several simulation case studies. When this framework is used to analyze a real-world neonatal mouse heart cell dataset, the detected biomarkers may characterize novel subtypes of cardiac muscle cells. The source code and data for Polar Gini Curve can be found at http://discovery.informatics.uab.edu/PGC/ or https://figshare.com/projects/Polar_Gini_Curve/76749.
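The foreground/background comparison described above can be sketched in a few lines. The example below projects 2D cell coordinates onto a set of axes, computes a Gini coefficient of the projected positions at each angle, and compares two curves by root-mean-square deviation. The angle range, the shift-to-zero step, the RMSD comparison, and all function names are assumptions made for this illustration and may differ from the published implementation.

```python
import math

def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on sorted data.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def polar_gini_curve(points, n_angles=36):
    """Gini coefficient of the projected cell positions at each angle in [0, pi)."""
    curve = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        proj = [x * c + y * s for x, y in points]
        lowest = min(proj)
        curve.append(gini([p - lowest for p in proj]))  # shift so values are >= 0
    return curve

def curve_rmsd(curve_a, curve_b):
    """Root-mean-square deviation between two curves of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)) / len(curve_a))

# Toy cluster on a 10 x 10 grid; the hypothetical gene is expressed only in
# cells on the left edge of the cluster.
background = [(x, y) for x in range(10) for y in range(10)]
foreground = [(x, y) for x, y in background if x < 3]
score = curve_rmsd(polar_gini_curve(foreground), polar_gini_curve(background))
```

In this toy setup a gene expressed in every cell gives a score of exactly zero, while the left-edge gene gives a positive score, matching the intuition that dissimilar curves indicate non-uniform expression within the cluster.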
Integration of Droplet Microfluidic Tools for Single-cell Functional Metagenomics: An Engineering Head Start
David Conchouso, Amani Al-Ma'abadi, Hayedeh Behzad, Mohammed Alarawi, Masahito Hosokawa, Yohei Nishikawa, Haruko Takeyama, Katsuhiko Mineta, Takashi Gojobori
Droplet microfluidic techniques have shown promising outcomes for studying single cells at high throughput. However, their adoption in laboratories studying “-omics” sciences is still limited due to the complex and multidisciplinary nature of the field. To facilitate their use, here we provide engineering details and organized protocols for integrating three droplet-based microfluidic technologies into the metagenomic pipeline to enable functional screening of bioproducts at high throughput. First, a device encapsulating single cells in droplets at a rate of ∼250 Hz is described considering droplet size and cell growth. Then, we expand on previously reported fluorescence-activated droplet sorting systems to integrate the use of 4 independent fluorescence-exciting lasers (i.e., 405, 488, 561, and 637 nm) in a single platform to make it compatible with different fluorescence-emitting biosensors. For this sorter, both hardware and software are provided and optimized for effortlessly sorting droplets at 60 Hz. Next, a passive droplet merger is also integrated into our pipeline to enable adding new reagents to already-made droplets at a rate of 200 Hz. Finally, we provide an optimized recipe for manufacturing these chips using silicon dry-etching tools. Because of the overall integration and the technical details presented here, our approach allows biologists to quickly use microfluidic technologies and achieve both single-cell resolution and high-throughput capability (>50,000 cells/day) for mining and bioprospecting metagenomic data.