Ambient Temperature Is A Strong Selective Factor Influencing Human Development and Immunity
Lindan Ji, Dongdong Wu, Haibing Xie, Binbin Yao, Yanming Chen, David M. Irwin, Dan Huang, Jin Xu, Nelson L.S. Tang, Yaping Zhang
Solar energy, which is essential for the origin and evolution of all life forms on Earth, can be objectively recorded through attributes such as climatic ambient temperature (CAT), ultraviolet radiation (UVR), and sunlight duration (SD). These attributes vary geographically in specific ways and may give rise to different adaptive traits. However, the adaptation profile of each attribute, and the selective role of solar energy as a whole during human evolution, remain elusive. Here, we performed a genome-wide adaptation study with respect to CAT, UVR, and SD using data from the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) panel. We identified CAT as the most important driving force, with the highest number of adaptive loci (6 SNPs at the genome-wide significance level of 1 × 10⁻⁷; 401 at the suggestive level of 1 × 10⁻⁵). Five of the six genome-wide significant adaptive SNPs were successfully replicated in an independent Chinese population (N = 1395). The 316 CAT adaptation genes corresponding to the 401 suggestive SNPs were mostly involved in development and immunity. In addition, 265 (84%) of these genes were related to at least one genome-wide association study (GWAS)-mapped human trait, and they were significantly enriched in loci for anthropometric traits such as body mass index (χ² test; P < 0.005), as well as in loci for immunity, metabolic syndrome, and cancer (χ² test; P < 0.05). For these adaptive SNPs, balancing selection was evident in Eurasians, whereas clear positive and/or purifying selection was observed in Africans. Taken together, our study indicates that CAT is the most important attribute of solar energy driving genetic adaptation in development and immunity among global human populations. It also supports the non-neutral hypothesis for the origin of disease-predisposition alleles in common diseases.
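Solar energy, which is essential to the origin and evolution of life on Earth, can be measured through three main attributes: ambient temperature (CAT), ultraviolet radiation (UVR), and sunlight duration (SD). These three attributes occur in different combinations in different regions and may give rise to different biological adaptation traits. How human populations have adapted to solar energy as represented by these three attributes, and in particular the adaptation characteristics of each attribute, has not yet been fully elucidated. In this study, we performed a genome-wide adaptation analysis to screen the Human Genome Diversity Project (HGDP-CEPH) database of the French Centre d'Etude du Polymorphisme Humain for single-nucleotide polymorphisms (SNPs) that may have been selected by ambient temperature, ultraviolet radiation, or sunlight duration. Ambient temperature was associated with the largest number of significant loci (6 SNPs at the genome-wide 1 × 10⁻⁷ level and 401 SNPs at the 1 × 10⁻⁵ level), suggesting that it may be the dominant factor among the three attributes. Of these 6 genome-wide significant temperature signals, 5 were validated in an independent Chinese population (N = 1395). In addition, the 316 temperature-associated genes to which the 401 SNPs map are mainly related to development and immunity. Of these, 265 genes (84%) are related to at least one human trait identified by genome-wide association studies (GWAS), and they are significantly enriched in categories such as anthropometric measures (e.g., body mass index, BMI), immunity, metabolic syndrome, and cancer. Overall, these temperature adaptation signals show balancing selection in Eurasian populations and purifying selection in African populations. In summary, this study suggests that ambient temperature may have played the leading role in the selective pressure that solar energy has exerted on development and immune function in populations worldwide. The results also support the hypothesis of a non-neutral origin for the susceptibility genes of common complex diseases.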
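The enrichment claims above rest on χ² tests. As a minimal sketch, assuming a 2 × 2 contingency layout (CAT adaptation genes versus a background gene set, inside versus outside a GWAS trait category) and Pearson's statistic without continuity correction, the test can be computed with the standard library alone. The function name `chi2_2x2`, the table layout, and the toy counts are illustrative assumptions; the abstract does not state the background set or the exact test variant used.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Assumed (hypothetical) layout:
                          in trait category   not in category
        CAT genes                a                   b
        background genes         c                   d

    All row and column totals must be non-zero.
    Returns (chi-square statistic, upper-tail P value with 1 degree of freedom).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)), so no stats library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy counts, not data from the study.
print(chi2_2x2(20, 10, 10, 20))
```

With counts taken from one's own gene sets, the returned P value can be compared directly with thresholds such as P < 0.005 or P < 0.05.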
Landscape and Dynamics of the Transcriptional Regulatory Network During Natural Killer Cell Differentiation
Kun Li, Yang Wu, Young Li, Qiaoni Yu, Zhigang Tian, Haiming Wei, Kun Qu
Natural killer (NK) cells are essential for controlling cancer and infection. However, little is known about the dynamics of the transcriptional regulatory machinery during NK cell differentiation. In this study, we applied the assay for transposase-accessible chromatin using sequencing (ATAC-seq) to an in-house in vitro NK cell differentiation system. Analysis of the ATAC-seq data revealed two distinct transcription factor (TF) clusters that dynamically regulate NK cell differentiation. Moreover, two TFs from the second cluster, FOS-like 2 (FOSL2) and early growth response 2 (EGR2), were identified as novel essential TFs that control NK cell maturation and function; knocking down either of them significantly affected NK cell differentiation. Finally, we constructed a genome-wide transcriptional regulatory network that provides a better understanding of the regulatory dynamics during NK cell differentiation.
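Natural killer (NK) cells are innate lymphocytes that protect the host against infection and cancer cells. NK cell-based immunotherapy has also emerged as a new force in cancer treatment and will play an important role in future disease therapy; such therapy depends on large numbers of NK cells with optimal activity. A comprehensive understanding of NK cell differentiation is therefore particularly important for improving clinical efficacy. In this study, we used ATAC-seq in an in vitro NK cell differentiation induction system to detect changes in chromatin accessibility during NK cell differentiation. Analysis of the ATAC-seq data revealed two distinct transcription factor (TF) clusters that dynamically regulate NK cell differentiation. In addition, two TFs from the second cluster, FOSL2 and EGR2, were identified as novel essential TFs that regulate NK cell maturation and function. Knocking down either of these two TFs markedly affected NK cell differentiation. Finally, we constructed a genome-wide transcriptional regulatory network that provides a comprehensive view of the NK cell differentiation process.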
The Biological Significance of Multi-copy Regions and Their Impact on Variant Discovery
Jing Sun, Yanfang Zhang, Minhui Wang, Qian Guan, Xiujia Yang, Jin Xia Ou, Mingchen Yan, Chengrui Wang, Yan Zhang, Zhi-Hao Li, Chunhong Lan, Chen Mao, Hong-Wei Zhou, Bingtao Hao, Zhenhai Zhang
Identification of genetic variants via high-throughput sequencing (HTS) technologies has been essential for both fundamental and clinical studies. However, the extent to which genome sequence composition affects variant calling remains unclear. In this study, we identified 63,897 multi-copy sequences (MCSs), each at least 300 bp long and occurring at least twice in the human genome. The 151,749 genomic loci harboring these MCSs (multi-copy regions, or MCRs) account for 1.98% of the genome and are unevenly distributed across chromosomes. MCRs containing the same MCS tend to be located on the same chromosome. Gene Ontology (GO) analyses revealed that the 3800 genes whose UTRs or exons overlap with MCRs are enriched for Golgi-related terms in the GO cellular component category and for various enzymatic activities in the GO molecular function category. MCRs are also enriched for loci that are sensitive to neocarzinostatin-induced double-strand breaks. Moreover, genetic variants discovered by genome-wide association studies and recorded in dbSNP are significantly underrepresented in MCRs. Using simulated HTS datasets, we show that false variant discovery rates are significantly higher in MCRs than in other genomic regions. These results suggest that extra caution is needed when identifying genetic variants in MCRs via HTS technologies.
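The MCS definition above (a stretch of at least 300 bp that occurs at least twice) can be illustrated with a naive k-mer index. This sketch is a simplification and not the authors' pipeline: it records every forward-strand window of length k, keeps windows seen at least twice, and merges their occurrences into MCR-like intervals. It ignores reverse complements and near-identical copies and holds every window in memory, so it suits only small test sequences. The function names are hypothetical.

```python
from collections import defaultdict


def repeated_kmer_regions(seq: str, k: int = 300, min_copies: int = 2) -> list[tuple[int, int]]:
    """Return merged 0-based, half-open [start, end) intervals covered by any
    k-mer that occurs at least `min_copies` times in `seq` (forward strand only)."""
    # Index every k-length window by its sequence.
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    # Collect the intervals of windows that are present in multiple copies.
    intervals = sorted(
        (p, p + k)
        for starts in positions.values()
        if len(starts) >= min_copies
        for p in starts
    )

    # Merge overlapping or abutting intervals into MCR-like regions.
    merged: list[tuple[int, int]] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(regions: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of the sequence covered by non-overlapping regions."""
    return sum(end - start for start, end in regions) / genome_length


# Toy example with k = 8 instead of 300 so the repeat is visible by eye.
toy = "ACGTACGT" + "TTTT" + "ACGTACGT"
regions = repeated_kmer_regions(toy, k=8)
print(regions, covered_fraction(regions, len(toy)))
```

A genome-scale version would replace the dictionary of window strings with a suffix array or rolling hashes, but the definition being tested stays the same.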
Kinase–substrate Edge Biomarkers Provide A More Accurate Prognostic Prediction in ER-negative Breast Cancer
Yidi Sun, Chen Li, Shichao Pang, Qianlan Yao, Luonan Chen, Yixue Li, Rong Zeng
The estrogen receptor (ER)-negative breast cancer subtype is aggressive, and few treatment options are available. To identify specific prognostic factors for ER-negative breast cancer, this study included 705,729 and 1034 patients with invasive breast cancer from the Surveillance, Epidemiology, and End Results (SEER) and The Cancer Genome Atlas (TCGA) databases, respectively. To identify key differential kinase–substrate node and edge biomarkers between ER-negative and ER-positive breast cancer patients, we adopted a network-based method that uses correlation coefficients between molecular pairs in the kinase regulatory network. Integrated analysis of the clinical and molecular data revealed significant prognostic power for kinase–substrate node and edge features in both breast cancer subtypes. Two promising kinase–substrate edge features, CSNK1A1–NFATC3 and SRC–OCLN, were identified that provide more accurate prognostic prediction in ER-negative breast cancer patients.
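An edge biomarker scores a molecular pair by how its correlation changes between groups, rather than by the level of either molecule alone. The abstract does not give the exact statistic, so the sketch below assumes a common choice: the Pearson correlation of a kinase–substrate pair computed separately in ER-negative and ER-positive samples, compared with a Fisher z-test. The function names, the input layout (one list of values per molecule per group, at least four samples each), and the toy values are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def edge_shift(kinase_neg, substrate_neg, kinase_pos, substrate_pos):
    """Compare a kinase-substrate correlation between ER-negative and ER-positive samples.

    Returns (r_neg, r_pos, z), where z is the Fisher z statistic for the
    difference r_neg - r_pos. Each group needs at least 4 samples.
    """
    r_neg = pearson(kinase_neg, substrate_neg)
    r_pos = pearson(kinase_pos, substrate_pos)
    # Clip r away from +/-1 so that atanh (the Fisher transform) stays finite.
    eps = 1e-12
    z_neg = math.atanh(max(-1 + eps, min(1 - eps, r_neg)))
    z_pos = math.atanh(max(-1 + eps, min(1 - eps, r_pos)))
    se = math.sqrt(1 / (len(kinase_neg) - 3) + 1 / (len(kinase_pos) - 3))
    return r_neg, r_pos, (z_neg - z_pos) / se


# Hypothetical toy measurements for one kinase-substrate pair.
print(edge_shift([1, 2, 3, 4, 5], [1, 3, 2, 5, 4], [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

Ranking all kinase–substrate pairs by |z| would then give candidate edge features, which still need to be tested for prognostic value against survival data.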
Pooled Plasmid Sequencing Reveals the Relationship Between Mobile Genetic Elements and Antimicrobial Resistance Genes in Clinically Isolated Klebsiella pneumoniae
Yan Jiang, Yanfei Wang, Xiaoting Hua, Yue Qu, Anton Y. Peleg, Yunsong Yu
Plasmids are important microbial components that mediate horizontal gene transfer (HGT) and the dissemination of antimicrobial resistance. To systematically explore the relationship between mobile genetic elements (MGEs) and antimicrobial resistance genes (ARGs), we developed a novel strategy based on single-molecule real-time (SMRT) sequencing. This approach was applied to pooled conjugative plasmids from multidrug-resistant (MDR) Klebsiella pneumoniae isolated at a tertiary referral hospital over a 9-month period. The conjugative plasmid pool was obtained from transconjugants that acquired antimicrobial resistance after plasmid conjugation with 53 clinical isolates. The plasmid pool was then subjected to SMRT sequencing, yielding 82 assembled plasmid fragments. In total, 124 ARGs (conferring resistance to β-lactams, fluoroquinolones, and aminoglycosides, among others) and 317 MGEs [including transposons (Tns), insertion sequences (ISs), and integrons] were identified in these fragments. Most of these ARGs were linked to MGEs, which allowed us to construct a relationship network among MGEs and ARGs that describes the dissemination of resistance by mobile elements. Key elements involved in resistance transposition were identified, including IS26, Tn3, IS903B, ISEcp1, and ISKpn19. For IS26, the predominant IS in the network, a typical IS26-mediated multicopy composite transposition event was illustrated by tracing its flanking 8-bp target site duplications (TSDs). The landscape of the pooled plasmid sequences highlights the diversity and complexity of the relationship between MGEs and ARGs and underscores the clinical value of dominant HGT profiles.
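Tracing target site duplications rests on a simple observation: when an IS element inserts, the short target sequence at the insertion site ends up copied on both sides of it. A minimal sketch, assuming 0-based, half-open element coordinates on an assembled contig and exact 8-bp repeats (real TSDs can carry sequencing errors or later mutations), checks whether the bases immediately flanking an element are identical. `flanking_tsd` and the toy contig are hypothetical, not part of any named tool.

```python
from typing import Optional


def flanking_tsd(contig: str, is_start: int, is_end: int, tsd_len: int = 8) -> Optional[str]:
    """Return the putative target site duplication flanking an element.

    `is_start`/`is_end` are 0-based, half-open coordinates of the element on
    `contig`. The TSD is reported only if the `tsd_len` bases immediately
    upstream exactly match the `tsd_len` bases immediately downstream.
    """
    # Not enough flanking sequence on one side: no TSD can be called.
    if is_start < tsd_len or is_end + tsd_len > len(contig):
        return None
    left = contig[is_start - tsd_len:is_start]
    right = contig[is_end:is_end + tsd_len]
    return left if left == right else None


# Toy contig: 8-bp duplication "ACGTTGCA" on both sides of a made-up element.
element = "GGCTCTGTTGCAAAGT"
contig = "CC" + "ACGTTGCA" + element + "ACGTTGCA" + "CC"
start = 2 + 8
print(flanking_tsd(contig, start, start + len(element)))
```

The same check can be applied to the outer ends of a putative IS26-bounded unit, where a shared 8-bp repeat is consistent with the whole unit having moved together.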
Exploring Potential Signals of Selection for Disordered Residues in Prokaryotic and Eukaryotic Proteins
Arup Panda, Tamir Tuller
Intrinsically disordered proteins (IDPs) are a functionally important class of proteins found in all domains of life. However, how nature has shaped the disorder potential of prokaryotic and eukaryotic proteins is still not clearly understood. Randomly generated sequences are free of any selective constraints and are therefore commonly used as null models. Using different types of random protein models, we sought to understand how the disorder potential of natural eukaryotic and prokaryotic proteins differs from that of random sequences. Comparing proteome-wide disorder content between real and random sequences from 12 model organisms, we found that eukaryotic proteins are enriched in disordered regions relative to random sequences, whereas in prokaryotes such regions are depleted. By analyzing position-wise disorder profiles, we show that disorder is generally higher near the N- and C-terminal regions of eukaryotic proteins than in the random models, whereas no such trend, or only a weak one, was found in prokaryotic proteins. Moreover, we show that this preference is not caused by the amino acid or nucleotide composition at the respective sites. Instead, these regions were found to contain a higher fraction of protein–protein binding sites, suggesting their functional importance. We discuss several possible explanations for this pattern, such as improving the efficiency of protein–protein interactions, ribosome movement during translation, and post-translational modification. However, further studies are needed to clearly understand the biophysical mechanisms underlying this trend.
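Comparing real proteins with random sequences amounts to asking whether a feature exceeds what a null model produces. The sketch below assumes one simple null model, composition-preserving shuffling, and a deliberately crude disorder proxy: the fraction of residues in a small window drawn from a set commonly cited as disorder-promoting (A, R, G, Q, S, P, E, K). This proxy is a hypothetical stand-in for a real disorder predictor, which the abstract does not name. Because shuffling preserves overall composition, only positional enrichment (here, at the N-terminus) can differ from the null. Function names and parameters are illustrative.

```python
import random

# Residues commonly cited as disorder-promoting; used here only as a crude,
# hypothetical proxy for a real disorder predictor.
DISORDER_PROMOTING = set("ARGQSPEK")


def disorder_proxy(seq: str, window: int = 5) -> list[float]:
    """Per-residue fraction of disorder-promoting residues in a centred window
    (the window is truncated at the sequence ends)."""
    half = window // 2
    scores = []
    for i in range(len(seq)):
        w = seq[max(0, i - half):i + half + 1]
        scores.append(sum(c in DISORDER_PROMOTING for c in w) / len(w))
    return scores


def n_terminal_excess(seq: str, n_term: int = 10, n_shuffles: int = 200, seed: int = 0) -> float:
    """Mean N-terminal proxy score of `seq` minus its mean over
    composition-preserving shuffles (the null model)."""
    rng = random.Random(seed)

    def n_term_mean(s: str) -> float:
        head = disorder_proxy(s)[:n_term]
        return sum(head) / len(head)

    observed = n_term_mean(seq)
    residues = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same amino acid composition, random order
        null.append(n_term_mean("".join(residues)))
    return observed - sum(null) / len(null)


# Toy protein with a disorder-promoting N-terminus and a hydrophobic remainder.
print(n_terminal_excess("EKSPEKSPEK" + "WIFLV" * 4))
```

Positive values indicate N-terminal enrichment beyond what the shuffled sequences produce; taking the last `n_term` residues instead gives the C-terminal counterpart.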
PIMD: An Integrative Approach for Drug Repositioning Using Multiple Characterization Fusion
Song He, Yuqi Wen, Xiaoxi Yang, Zhen Liu, Xinyu Song, Xin Huang, Xiaochen Bo
The accumulation of various types of drug informatics data and computational approaches for drug repositioning can accelerate pharmaceutical research and development. However, integrating multi-dimensional drug data for precision repositioning remains a pressing challenge. Here, we propose a systematic framework named PIMD that predicts drug therapeutic properties by integrating multi-dimensional data for drug repositioning. In PIMD, drug similarity networks (DSNs) based on chemical, pharmacological, and clinical data are fused into an integrated DSN (iDSN) composed of many clusters. Beyond simple fusion, PIMD offers a systematic way to annotate these clusters. Unexpected drugs within clusters, and drug pairs with high iDSN similarity scores, are then identified to predict novel therapeutic uses. By evaluating the contribution of each type of property data, PIMD provides new insights into the universality, individuality, and complementarity of different drug properties. To test the performance of PIMD, we used chemical, pharmacological, and clinical properties to generate an iDSN. Analysis of the contribution of each drug property indicated that this iDSN was driven by all data types and performed better than the other DSNs. Among the top 20 recommended drug pairs, 7 drugs have been reported to be repurposed. The source code for PIMD is available at https://github.com/Sepstar/PIMD/.
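The fusion step can be pictured as combining several drug-by-drug similarity matrices into one and then ranking drug pairs by the fused score. PIMD's actual fusion and cluster-annotation procedures are not described in the abstract, so the sketch below swaps in the simplest possible scheme, a weighted element-wise average, purely to show the data flow. The function names, weights, and toy matrices are hypothetical; the PIMD source code linked above is the reference for the real method.

```python
from itertools import combinations
from typing import Optional

Matrix = list[list[float]]


def fuse_similarity(matrices: list[Matrix], weights: Optional[list[float]] = None) -> Matrix:
    """Weighted element-wise average of equally sized drug-by-drug similarity matrices."""
    if weights is None:
        weights = [1.0] * len(matrices)
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) / total for j in range(n)]
        for i in range(n)
    ]


def top_pairs(fused: Matrix, names: list[str], k: int) -> list[tuple[float, str, str]]:
    """Rank unordered drug pairs by fused similarity, highest first."""
    pairs = [(fused[i][j], names[i], names[j]) for i, j in combinations(range(len(names)), 2)]
    pairs.sort(key=lambda t: t[0], reverse=True)
    return pairs[:k]


# Hypothetical toy similarities for three drugs from two data types.
chemical = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
clinical = [[1.0, 0.7, 0.3], [0.7, 1.0, 0.0], [0.3, 0.0, 1.0]]
fused = fuse_similarity([chemical, clinical])
print(top_pairs(fused, ["drugA", "drugB", "drugC"], k=2))
```

Averaging lets a pair that is similar across several data types outrank one that is similar in only a single type, which is the intuition behind integrating multiple characterizations.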