COVID-ONE-hi: The One-stop Database for COVID-19-specific Humoral Immunity and Clinical Parameters
Zhaowei Xu, Yang Li, Qing Lei, Likun Huang, Dan-yun Lai, Shu-juan Guo, He-wei Jiang, Hongyan Hou, Yun-xiao Zheng, Xue-ning Wang, Jiaoxiang Wu, Ming-liang Ma, Bo Zhang, Hong Chen, Caizheng Yu, Jun-biao Xue, Hai-nan Zhang, Huan Qi, Siqi Yu, Mingxi Lin, Yandi Zhang, Xiaosong Lin, Zongjie Yao, Huiming Sheng, Ziyong Sun, Feng Wang, Xionglin Fan, Sheng-ce Tao
Coronavirus disease 2019 (COVID-19), which is caused by SARS-CoV-2, varies in symptoms and mortality rates among populations. Humoral immunity plays critical roles in SARS-CoV-2 infection and recovery from COVID-19. However, differences in immune responses and clinical features among COVID-19 patients remain largely unknown. Here, we report a database for COVID-19-specific IgG/IgM immune responses and clinical parameters (named COVID-ONE-hi). COVID-ONE-hi is built on the IgG/IgM responses of 2360 serum samples, collected from 783 COVID-19 patients, to 24 full-length/truncated proteins corresponding to 20 of the 28 known SARS-CoV-2 proteins and to 199 spike protein peptides. In addition, 96 clinical parameters for the 2360 serum samples and basic information for the 783 patients are integrated into the database. Furthermore, COVID-ONE-hi provides a dashboard for defining samples and a one-click analysis pipeline for a single group or paired groups. A set of samples of interest is easily defined by adjusting the scale bars of a variety of parameters. After clicking the “START” button, users obtain a comprehensive analysis report for further interpretation. COVID-ONE-hi is freely available at www.COVID-ONE.cn.
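Defining a sample set by adjusting range bars amounts to keeping the samples whose parameters all fall inside chosen intervals. The minimal sketch below assumes a hypothetical record layout (`Sample`, with free-form parameter names such as `age` and `igg_spike`); it is not COVID-ONE-hi's schema or code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Sample:
    sample_id: str
    # Hypothetical mapping from parameter name to value (clinical value or antibody signal).
    params: Dict[str, float] = field(default_factory=dict)


def select_samples(samples: List[Sample],
                   ranges: Dict[str, Tuple[float, float]]) -> List[Sample]:
    """Keep samples whose every constrained parameter lies in [low, high].

    A sample that lacks a constrained parameter is excluded.
    """
    return [s for s in samples
            if all(name in s.params and low <= s.params[name] <= high
                   for name, (low, high) in ranges.items())]


def group_mean(group: List[Sample], name: str) -> float:
    """Mean of one parameter over the samples that report it (NaN if none do)."""
    values = [s.params[name] for s in group if name in s.params]
    return sum(values) / len(values) if values else float("nan")
```

Two groups defined this way, for example `select_samples(data, {"age": (60, 90)})` versus `select_samples(data, {"age": (18, 59)})`, could then be compared parameter by parameter, which is the kind of paired-group comparison a one-click pipeline automates.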
Gut Microbiome Alterations in COVID-19
Tao Zuo, Xiaojian Wu, Weiping Wen, Ping Lan
Since the outset of the coronavirus disease 2019 (COVID-19) pandemic, the gut microbiome in COVID-19 has garnered substantial interest, given its significant roles in human health and pathophysiology. Accumulating evidence reveals that the gut microbiome is broadly altered in COVID-19, including the bacterial microbiome, mycobiome, and virome. Overall, the gut microbial ecological network is significantly weakened and becomes sparse in patients with COVID-19, together with a decrease in gut microbiome diversity. Beyond the presence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the gut microbiome of patients with COVID-19 is also characterized by enrichment of opportunistic bacteria, fungi, and eukaryotic viruses, which are also associated with disease severity and presentation. Meanwhile, a multitude of symbiotic bacteria and bacteriophages are decreased in abundance in patients with COVID-19. Such gut microbiome features persist in a significant subset of patients with COVID-19 even after disease resolution, coinciding with ‘long COVID’ (also known as post-acute sequelae of COVID-19). The broadly altered gut microbiome is largely a consequence of SARS-CoV-2 infection and its downstream detrimental effects on systemic host immunity and the gut milieu. The impaired host immunity and distorted gut microbial ecology, particularly the loss of low-abundance beneficial bacteria and blooms of opportunistic fungi including Candida, may hinder the reassembly of the gut microbiome after COVID-19. Future investigation is needed to fully understand the role of the gut microbiome in host immunity against SARS-CoV-2 infection, as well as the long-term effect of COVID-19 on the gut microbiome in relation to host health after the pandemic.
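The gut microbiota plays an extremely important role in human health and pathophysiological processes. Since the outbreak of the COVID-19 epidemic, changes in the gut microecology of COVID-19 patients and its role in resisting infection have attracted increasing attention. Mounting evidence shows that the gut microbiota of COVID-19 patients is markedly altered, including the bacterial microbiota, the fungal microbiota, and the viral microbiota. Overall, gut microbial diversity is reduced in COVID-19 patients, and the ecological network becomes structurally sparse. In addition to SARS-CoV-2, the guts of COVID-19 patients harbor large numbers of opportunistic pathogenic bacteria, fungi, and eukaryotic viruses, all of which are associated with the severity and clinical presentation of COVID-19. Moreover, commensal bacteria, probiotics, and bacteriophages are decreased in abundance in the guts of the vast majority of COVID-19 patients. In most COVID-19 patients, gut microecological dysbiosis persists after recovery, which is consistent with “Long COVID” (COVID-19 symptoms that persist for weeks or months after the infection has been cleared) and suggests that persistent gut dysbiosis poses a long-term health risk in the “post-COVID-19 era”. The detrimental effects of SARS-CoV-2 infection on host immunity and the gut environment largely account for the alterations in the gut microbiota. Impaired host immunity and gut microbial dysbiosis, particularly the loss of low-abundance beneficial bacteria and the overgrowth of opportunistic pathogenic fungi (including Candida), may hinder the recovery of the gut microbiota after COVID-19. Therefore, it is essential to investigate the role of the gut microbiota in host immunity against SARS-CoV-2 infection, as well as the long-term effects of COVID-19 on the gut microecology after the pandemic and their relationship with host health.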
Quantitative Proteomics Using Isobaric Labeling: A Practical Guide
Xiulan Chen, Yaping Sun, Tingting Zhang, Lian Shu, Peter Roepstorff, Fuquan Yang
In the past decade, relative proteomic quantification using isobaric labeling technology has developed into a key tool for comparing the expression of proteins in biological samples. Although its multiplexing capacity and flexibility make this a valuable technology for addressing various biological questions, its quantitative accuracy and precision still pose significant challenges to the reliability of its quantification results. Here, we give a detailed overview of the different kinds of isobaric mass tags and the advantages and disadvantages of the isobaric labeling method. We also discuss which precautions should be taken at each step of the isobaric labeling workflow, to obtain reliable quantification results in large-scale quantitative proteomics experiments. In the last section, we discuss the broad applications of the isobaric labeling technology in biological and clinical studies, with an emphasis on thermal proteome profiling and proteogenomics.
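Relative quantification with isobaric tags compares reporter-ion intensities across channels measured in the same spectrum. A minimal sketch, assuming a plain matrix of reporter intensities (rows are peptide-spectrum matches, columns are channels) and total-intensity normalization, which is one common choice among several and not a specific recommendation of this guide:

```python
import math
from statistics import median
from typing import List


def normalize_channels(psms: List[List[float]]) -> List[List[float]]:
    """Total-intensity normalization of reporter-ion channels.

    Each channel is rescaled so that all channels have the same summed
    intensity, correcting for unequal sample loading across channels.
    """
    n_channels = len(psms[0])
    totals = [sum(row[c] for row in psms) for c in range(n_channels)]
    target = sum(totals) / n_channels
    factors = [target / total for total in totals]
    return [[row[c] * factors[c] for c in range(n_channels)] for row in psms]


def protein_log2_ratios(protein_psms: List[List[float]], reference: int = 0) -> List[float]:
    """Median log2 ratio of each channel to a reference channel over one protein's PSMs."""
    n_channels = len(protein_psms[0])
    return [median(math.log2(row[c] / row[reference]) for row in protein_psms)
            for c in range(n_channels)]
```

Normalization would be applied to all PSMs of the experiment before summarizing each protein. The sketch does not correct ratio compression caused by co-isolated precursor ions, one of the accuracy problems that isobaric workflows must address.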
Integrative Multi-omics Landscape of Non-structural Protein 3 of Severe Acute Respiratory Syndrome Coronaviruses
Ruona Shi, Zhenhuan Feng, Xiaofei Zhang
The coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is currently a global pandemic. Extensive investigations have been performed to study the clinical and cellular effects of SARS-CoV-2 infection. Mass spectrometry-based proteomics studies have revealed the cellular changes due to the infection and identified a plethora of interactors for all SARS-CoV-2 components, except for the longest non-structural protein 3 (NSP3). Here, we expressed the full-length NSP3 proteins of SARS-CoV and SARS-CoV-2 to investigate their unique and shared functions using multi-omics methods. We conducted interactome, phosphoproteome, ubiquitylome, transcriptome, and proteome analyses of NSP3-expressing cells. We found that NSP3 plays essential roles in cellular functions such as RNA metabolism and immune response (e.g., NF-κB signal transduction). Interestingly, we showed that SARS-CoV-2 NSP3 has both endoplasmic reticulum and mitochondrial localizations. In addition, SARS-CoV-2 NSP3 is more closely related to mitochondrial ribosomal proteins, whereas SARS-CoV NSP3 is related to the cytosolic ribosomal proteins. In summary, our integrative multi-omics study of NSP3 improves the understanding of the functions of NSP3 and offers potential targets for the development of anti-SARS strategies.
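NSP3 is one of the two proteins with protease activity in SARS-CoVs and participates in cleaving the two long polypeptides pp1a and pp1ab. Previous studies have shown that NSP3, together with other NSPs, regulates the synthesis of viral genomic RNA. Does NSP3 have other functions? What effects does NSP3 have on host cells? Do the NSP3 proteins of the two coronaviruses, SARS-CoV and SARS-CoV-2, have the same functions?
In this study, we found by immunofluorescence that the two NSP3 proteins differ in their localization in host cells. Through integrative multi-omics analyses of the interactome, phosphoproteome, ubiquitylome, transcriptome, and proteome, we uncovered similarities and differences in how the NSP3 proteins of the two SARS-CoVs affect host cells.
The NSP3 proteins of SARS-CoV and SARS-CoV-2 show distinct subcellular localization preferences.
The NSP3 interactome reveals that SARS-CoV-2 NSP3 is closely associated with the endoplasmic reticulum and mitochondrial ribosomes, whereas SARS-CoV NSP3 is associated with cytosolic ribosomes.
Multi-omics analyses reveal the cellular processes regulated by NSP3.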
Genomic Epidemiology of SARS-CoV-2 in Pakistan
Shuhui Song, Cuiping Li, Lu Kang, Dongmei Tian, Nazish Badar, Wentai Ma, Shilei Zhao, Xuan Jiang, Chun Wang, Yongqiao Sun, Wenjie Li, Meng Lei, Shuangli Li, Qiuhui Qi, Aamer Ikram, Muhammad Salman, Massab Umair, Huma Shireen, Fatima Batool, Bing Zhang, Hua Chen, Yun-Gui Yang, Amir Ali Abbasi, Mingkun Li, Yongbiao Xue, Yiming Bao
COVID-19 has swept the globe, and Pakistan is no exception. To investigate the initial introductions and transmissions of SARS-CoV-2 in Pakistan, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected from March 16 to June 1, 2020. We identified a total of 347 mutated positions, 31 of which were over-represented in Pakistan. Meanwhile, we found over 1000 intra-host single-nucleotide variants (iSNVs). Several of them occurred concurrently, indicating possible interactions among them or coevolution. Some of the high-frequency iSNVs in Pakistan were not observed in the global population, suggesting strong purifying selection. The genomic epidemiology revealed five distinctive spreading clusters. The largest cluster consisted of 74 viruses that were derived from different geographic locations in Pakistan and formed a deep hierarchical structure, indicating an extensive and persistent nationwide transmission of the virus that was probably attributable to a signature mutation of this cluster (G8371T in ORF1ab). Furthermore, 28 putative international introductions were identified, several of which are consistent with the epidemiological investigations. In all, this study has inferred the possible pathways of introduction and transmission of SARS-CoV-2 in Pakistan, which could aid ongoing and future viral surveillance and COVID-19 control.
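An iSNV is a site where more than one allele is observed among the reads from a single host's viral population. A minimal sketch of threshold-based calling from per-position allele counts; the function name, toy positions, and the depth and frequency cutoffs are hypothetical and are not the parameters used in this study:

```python
from typing import Dict, List, Tuple


def call_isnvs(pileup: Dict[int, Dict[str, int]],
               min_depth: int = 100,
               min_freq: float = 0.05) -> List[Tuple[int, str, float]]:
    """Report (position, minor allele, minor allele frequency) for candidate iSNVs.

    pileup maps a genome position to read counts per base at that position.
    A site qualifies when its depth is at least min_depth and its second
    most common allele reaches min_freq.
    """
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate allele frequencies
        ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
        if len(ranked) < 2:
            continue  # only one allele observed
        allele, n = ranked[1]
        freq = n / depth
        if freq >= min_freq:
            calls.append((pos, allele, freq))
    return calls
```

Real pipelines also filter on base quality, strand bias, and a model of sequencing error before trusting low-frequency alleles.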
Integrative Analysis of Genome, 3D Genome, and Transcriptome Alterations of Clinical Lung Cancer Samples
Tingting Li, Ruifeng Li, Xuan Dong, Lin Shi, Miao Lin, Ting Peng, Pengze Wu, Yuting Liu, Xiaoting Li, Xuheng He, Xu Han, Bin Kang, Yinan Wang, Zhiheng Liu, Qing Chen, Yue Shen, Mingxiang Feng, Xiangdong Wang, Duojiao Wu, Jian Wang, Cheng Li
Genomic studies of cancer cell alterations, such as mutations, copy number variations (CNVs), and translocations, greatly promote our understanding of the genesis and development of cancers. However, the 3D genome architecture of cancers remains less studied due to the complexity of cancer genomes and technical difficulties. To explore the 3D genome structure in clinical lung cancer, we performed Hi-C experiments on paired normal and tumor cells harvested from patients with lung cancer, combined with RNA sequencing analysis. We demonstrated the feasibility of studying the 3D genome of clinical lung cancer samples with a small number of cells (1 × 10^4), compared the genome architecture between clinical samples and cell lines of lung cancer, and identified conserved and changed spatial chromatin structures between normal and cancer samples. We also showed that Hi-C data can be used to infer CNVs and point mutations in cancer. By integrating these different types of cancer alterations, we showed significant associations between CNVs, 3D genome, and gene expression. We propose that the 3D genome mediates the effects of cancer genomic alterations on gene expression through altering regulatory chromatin structures. Our study highlights the importance of analyzing 3D genomes of clinical cancer samples in addition to cancer cell lines and provides an integrative genomic analysis pipeline for future larger-scale studies in lung cancer and other cancers.
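Hi-C reads also carry coverage information: in a region with extra copies, each bin tends to collect more contacts. A minimal sketch of that idea, assuming binned tumor and normal contact matrices of equal size and hypothetical gain/loss cutoffs; this is not the study's pipeline, and published Hi-C CNV callers additionally correct for GC content, mappability, and restriction-site density:

```python
import math
from typing import List

Matrix = List[List[float]]


def bin_coverage(contacts: Matrix) -> List[float]:
    """Total contacts per genomic bin (row sums of a binned Hi-C contact matrix)."""
    return [sum(row) for row in contacts]


def log2_copy_ratio(tumor: Matrix, normal: Matrix, pseudocount: float = 1.0) -> List[float]:
    """Per-bin log2 ratio of library-size-normalized tumor vs. normal coverage."""
    t_cov, n_cov = bin_coverage(tumor), bin_coverage(normal)
    n_bins = len(t_cov)
    t_total = sum(t_cov) + pseudocount * n_bins
    n_total = sum(n_cov) + pseudocount * n_bins
    return [math.log2(((t + pseudocount) / t_total) / ((n + pseudocount) / n_total))
            for t, n in zip(t_cov, n_cov)]


def call_bins(ratios: List[float], gain: float = 0.4, loss: float = -0.4) -> List[str]:
    """Label each bin using hypothetical gain/loss cutoffs on the log2 ratio."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral" for r in ratios]
```

Because each sample is normalized to its own total, the ratios are relative: a strong gain in some bins lowers the ratios of the others, which is why real callers also segment and recenter the profile.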
Oleic Acid and Eicosapentaenoic Acid Reverse Palmitic Acid-induced Insulin Resistance in Human HepG2 Cells via the Reactive Oxygen Species/JUN Pathway
Yaping Sun, Jifeng Wang, Xiaojing Guo, Nali Zhu, Lili Niu, Xiang Ding, Zhensheng Xie, Xiulan Chen, Fuquan Yang
Oleic acid (OA), a monounsaturated fatty acid (MUFA), has previously been shown to reverse hepatic insulin resistance (IR) induced by the saturated fatty acid palmitic acid (PA). However, its underlying molecular mechanism is unclear. In addition, previous studies have shown that eicosapentaenoic acid (EPA), an ω-3 polyunsaturated fatty acid (PUFA), reverses PA-induced muscle IR, but whether EPA plays the same role in hepatic IR, and through what mechanism, remains to be clarified. Here, we confirmed that EPA reversed PA-induced IR in HepG2 cells and compared the proteomic changes in HepG2 cells after treatment with different free fatty acids (FFAs). A total of 234 proteins were differentially expressed after PA+OA treatment; their functions were mainly related to responses to stress and endogenous stimuli, lipid metabolic processes, and protein binding. For PA+EPA treatment, the PA-induced expression changes of 1326 proteins could be reversed by EPA; 415 of these were mitochondrial proteins, most of which were involved in oxidative phosphorylation (OXPHOS) and the tricarboxylic acid (TCA) cycle. Mechanistic studies revealed that the protein encoded by JUN and reactive oxygen species (ROS) play a role in OA- and EPA-reversed PA-induced IR, respectively. EPA and OA alleviated PA-induced abnormalities in adenosine triphosphate (ATP) production, ROS generation, and calcium (Ca²⁺) content. Importantly, H₂O₂-activated production of ROS increased the protein expression of JUN, further resulting in IR in HepG2 cells. Taken together, we demonstrate that ROS/JUN is a common response pathway employed by HepG2 cells toward FFA-regulated IR.
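Saying that EPA "reverses" a PA-induced change implies a two-step comparison for each protein: PA versus control, then PA+EPA versus PA. A minimal sketch of that logic, assuming mean abundances per condition and a hypothetical 1.5-fold cutoff; the study's actual criteria are not given here and likely include significance testing across replicates:

```python
import math
from typing import Dict, List, Tuple


def is_reversed(control: float, pa: float, pa_epa: float, fc_cutoff: float = 1.5) -> bool:
    """True if PA shifts a protein away from control and PA+EPA shifts it back.

    Both shifts must reach the fold-change cutoff and point in opposite directions.
    """
    log_cutoff = math.log2(fc_cutoff)
    pa_shift = math.log2(pa / control)       # PA vs. control
    rescue_shift = math.log2(pa_epa / pa)    # PA+EPA vs. PA
    if abs(pa_shift) < log_cutoff or abs(rescue_shift) < log_cutoff:
        return False
    return (pa_shift > 0) != (rescue_shift > 0)


def reversed_proteins(abundances: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Proteins whose PA-induced change is reversed; values are (control, PA, PA+EPA)."""
    return sorted(p for p, (c, pa, epa) in abundances.items() if is_reversed(c, pa, epa))
```

The same check applies to PA+OA by passing the PA+OA abundance as the third value.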
Mining Unknown Porcine Protein Isoforms by Tissue-based Map of Proteome Enhances Pig Genome Annotation
Pengju Zhao, Xianrui Zheng, Ying Yu, Zhuocheng Hou, Chenguang Diao, Haifei Wang, Huimin Kang, Chao Ning, Junhui Li, Wen Feng, Wen Wang, George E. Liu, Bugao Li, Jacqueline Smith, Yangzom Chamba, Jian-Feng Liu
A lack of a complete pig proteome has left a gap in our knowledge of the pig genome and has restricted the feasibility of using pigs as a biomedical model. In this study, we developed a tissue-based proteome map using 34 major normal pig tissues. A total of 5841 unknown protein isoforms were identified and systematically characterized, including 2225 novel protein isoforms, 669 protein isoforms from 460 genes with symbols beginning with LOC, and 2947 protein isoforms without clear NCBI annotation in the current pig reference genome. These newly identified protein isoforms were functionally annotated by profiling the pig transcriptome with high-throughput RNA sequencing of the same pig tissues, further improving the genome annotation of the corresponding protein-coding genes. Using well-annotated genes with parallel expression patterns and subcellular evidence as references, we predicted the tissue-related subcellular locations and potential functions of these unknown proteins. Finally, we mined 3081 orthologous genes for 52.7% of the unknown protein isoforms across multiple species, involving 68 KEGG pathways as well as 23 disease signaling pathways. These findings provide valuable insights and a rich resource for enhancing studies of pig genomics and biology, as well as the use of pigs as biomedical models for human medicine.
Integrating Genomic and Transcriptomic Data to Reveal Genetic Mechanisms Underlying Piao Chicken Rumpless Trait
Yun-Mei Wang, Saber Khederzadeh, Shi-Rong Li, Newton Otieno Otecko, David M. Irwin, Mukesh Thakur, Xiao-Die Ren, Ming-Shan Wang, Dong-Dong Wu, Ya-Ping Zhang
Piao chicken, a rare Chinese native poultry breed, lacks primary tail structures, such as pygostyle, caudal vertebra, uropygial gland, and tail feathers. So far, the molecular mechanisms underlying tail absence in this breed remain unclear. In this study, we comprehensively employed comparative transcriptomic and genomic analyses to unravel potential genetic underpinnings of rumplessness in Piao chicken. Our results reveal many biological factors involved in tail development and several genomic regions under strong positive selection in this breed. These regions contain candidate genes associated with rumplessness, including Irx4, Il18, Hspb2, and Cryab. Retrieval of quantitative trait loci (QTL) and gene functions implies that rumplessness might be consciously or unconsciously selected along with the high-yield traits in Piao chicken. We hypothesize that strong selection pressures on regulatory elements might lead to changes in gene activity in mesenchymal stem cells of the tail bud. The ectopic activity could eventually result in tail truncation by impeding differentiation and proliferation of the stem cells. Our study provides fundamental insights into early initiation and genetic basis of the rumpless phenotype in Piao chicken.
Elucidation of the MicroRNA Transcriptome in Western Corn Rootworm Reveals Its Dynamic and Evolutionary Complexity
Xiaozeng Yang, Elane Fishilevich, Marcelo A. German, Premchand Gandra, Robert E. McEwan, André Billion, Eileen Knorr, Andreas Vilcinskas, Kenneth E. Narva
Diabrotica virgifera virgifera (western corn rootworm, WCR) is one of the most destructive agricultural insect pests in North America. It is highly adaptive to environmental stimuli and crop protection technologies. However, little is known about the underlying genetic basis of WCR behavior and adaptation. More specifically, the involvement of small RNAs (sRNAs), especially microRNAs (miRNAs), a class of endogenous small non-coding RNAs that regulate various biological processes, has not been examined, and the datasets of putative sRNA sequences have not previously been generated for WCR. To achieve a comprehensive collection of sRNA transcriptomes in WCR, we constructed, sequenced, and analyzed sRNA libraries from different life stages of WCR and northern corn rootworm (NCR), and identified 101 conserved precursor miRNAs (pre-miRNAs) in WCR and other Arthropoda. We also identified 277 corn rootworm specific pre-miRNAs. Systematic analyses of sRNA populations in WCR revealed that its sRNA transcriptome, which includes PIWI-interacting RNAs (piRNAs) and miRNAs, undergoes a dynamic change throughout insect development. Phylogenetic analysis of miRNA datasets from model species reveals that a large pool of species-specific miRNAs exists in corn rootworm; these are potentially evolutionarily transient. Comparisons of WCR miRNA clusters to other insect species highlight conserved miRNA-regulated processes that are common to insects. Parallel Analysis of RNA Ends (PARE) also uncovered potential miRNA-guided cleavage sites in WCR. Overall, this study provides a new resource for studying the sRNA transcriptome and miRNA-mediated gene regulation in WCR and other Coleopteran insects.
Characterizing RNA Pseudouridylation by Convolutional Neural Networks
Xuan He, Sai Zhang, Yanqing Zhang, Zhixin Lei, Tao Jiang, Jianyang Zeng
Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) still remain largely unclear. The landscape of Ψs across the transcriptome has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PseudoUridyLation Site Estimator (PULSE), to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, consisting of two alternately stacked convolution and pooling layers followed by a fully connected neural network, can automatically learn the hidden patterns of pseudouridylation from the local sequence information. Extensive validation tests demonstrated that PULSE can outperform other state-of-the-art prediction methods and achieve high prediction accuracy, thus enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the prediction results derived from PULSE can provide novel insights into the functional roles of pseudouridylation, such as the regulation of RNA secondary structure, codon usage, translation, and RNA stability, and the connection to single nucleotide variants. The source code and final model for PULSE are available at https://github.com/mlcb-thu/PULSE.
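The layer order described for PULSE (two alternately stacked convolution and pooling layers feeding a fully connected network) can be sketched as a forward pass. This is a minimal, untrained illustration, assuming a hypothetical 21-nt input window, arbitrary layer sizes, random weights, and a single sigmoid unit standing in for the fully connected network; it is not the released PULSE model.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]
BASES = "ACGU"


def one_hot(seq: str) -> Matrix:
    """Encode an RNA sequence as a length x 4 one-hot matrix; other letters become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_relu(x: Matrix, kernels: List[Matrix], biases: List[float]) -> Matrix:
    """'Valid' 1D convolution followed by ReLU; each kernel is width x in_channels."""
    width, in_ch = len(kernels[0]), len(x[0])
    out = []
    for i in range(len(x) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            s = bias + sum(kernel[j][c] * x[i + j][c]
                           for j in range(width) for c in range(in_ch))
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling along the sequence axis."""
    return [[max(x[i + j][c] for j in range(size)) for c in range(len(x[0]))]
            for i in range(0, len(x) - size + 1, size)]


def init_params(seq_len: int, width: int = 5, ch1: int = 8, ch2: int = 8,
                seed: int = 0) -> Dict:
    """Random (untrained) weights sized for input sequences of length seq_len."""
    rng = random.Random(seed)

    def kernels(n_out: int, n_in: int) -> List[Matrix]:
        return [[[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(width)]
                for _ in range(n_out)]

    len1 = (seq_len - width + 1) // 2   # length after conv1 + pool
    len2 = (len1 - width + 1) // 2      # length after conv2 + pool
    return {"k1": kernels(ch1, 4), "b1": [0.0] * ch1,
            "k2": kernels(ch2, ch1), "b2": [0.0] * ch2,
            "w": [rng.gauss(0.0, 0.1) for _ in range(len2 * ch2)], "b": 0.0}


def predict_psi(seq: str, params: Dict) -> float:
    """Conv -> pool -> conv -> pool -> dense sigmoid; returns a score in (0, 1)."""
    h = max_pool(conv1d_relu(one_hot(seq), params["k1"], params["b1"]))
    h = max_pool(conv1d_relu(h, params["k2"], params["b2"]))
    flat = [v for row in h for v in row]
    assert len(flat) == len(params["w"]), "sequence length does not match init_params"
    z = params["b"] + sum(w * f for w, f in zip(params["w"], flat))
    return 1.0 / (1.0 + math.exp(-z))
```

In practice such a model would be built and trained with a deep learning framework; the pure-Python version only makes the data flow between layers explicit.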
kLDM: Inferring Multiple Metagenomic Association Networks Based on the Variation of Environmental Factors
Yuqing Yang, Xin Wang, Kaikun Xie, Congmin Zhu, Ning Chen, Ting Chen
Identification of significant biological relationships or patterns is central to many metagenomic studies. Methods that estimate association networks have been proposed for this purpose; however, they assume that associations are static, neglecting the fact that relationships in a microbial ecosystem may vary with changes in environmental factors (EFs), which can result in inaccurate estimations. Therefore, in this study, we propose a computational model, called the k-Lognormal-Dirichlet-Multinomial (kLDM) model, which estimates multiple association networks that correspond to specific environmental conditions, and simultaneously infers microbe–microbe and EF–microbe associations for each network. The effectiveness of the kLDM model was demonstrated on synthetic data, a colorectal cancer (CRC) dataset, the Tara Oceans dataset, and the American Gut Project dataset. The results revealed that the widely-used Spearman’s rank correlation coefficient method performed much worse than the other methods, indicating the importance of separating samples by environmental conditions. Cancer fecal samples were then compared with cancer-free samples, and the estimation achieved by kLDM exhibited fewer associations among microbes but stronger associations between specific bacteria, especially five CRC-associated operational taxonomic units, indicating gut microbe translocation in cancer patients. Some EF-dependent associations were then found within a marine eukaryotic community. Finally, the gut microbial heterogeneity of inflammatory bowel disease patients was detected. These results demonstrate that kLDM can elucidate the complex associations within microbial ecosystems. The kLDM program, R, and Python scripts, together with all experimental datasets, are accessible at https://github.com/tinglab/kLDM.git.
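kLDM itself is a k-Lognormal-Dirichlet-Multinomial model and is not reproduced here. The sketch below only illustrates, with hypothetical toy abundances, why pooling samples from different environmental conditions can hide associations that hold within each condition, which is the motivation for separating samples before estimating networks:

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy abundances of two taxa under two hypothetical environmental conditions.
taxon_a_cond1 = [1.0, 2.0, 3.0, 4.0, 5.0]
taxon_b_cond1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # positively associated in condition 1
taxon_a_cond2 = [1.0, 2.0, 3.0, 4.0, 5.0]
taxon_b_cond2 = [5.0, 4.0, 3.0, 2.0, 1.0]   # negatively associated in condition 2

within_1 = spearman(taxon_a_cond1, taxon_b_cond1)
within_2 = spearman(taxon_a_cond2, taxon_b_cond2)
pooled = spearman(taxon_a_cond1 + taxon_a_cond2, taxon_b_cond1 + taxon_b_cond2)
```

Within each condition the association is perfect (+1 and -1), while the pooled coefficient is 0, so a single static network built from all samples would miss both relationships.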
FunHoP: Enhanced Visualization and Analysis of Functionally Homologous Proteins in Complex Metabolic Networks
Kjersti Rise, May-Britt Tessem, Finn Drabløs, Morten B. Rye
Cytoscape is often used for visualization and analysis of metabolic pathways. For example, based on KEGG data, a reader for KEGG Markup Language (KGML) is used to load files into Cytoscape. However, although multiple genes can be responsible for the same reaction, the KGML reader KEGGScape only presents the first listed gene in a network node for a given reaction. This can lead to incorrect interpretations of the pathways. Our new method, FunHoP, shows all possible genes in each node, making the pathways more complete. FunHoP collapses all genes in a node into one measurement using read counts from RNA-seq. Assuming that activity for an enzymatic reaction mainly depends upon the gene with the highest number of reads, and weighting the reads by gene length and ratio, a new expression value is calculated for the node as a whole. Differential expression at node level is then applied to the networks. Using prostate cancer as a model, we integrate RNA-seq data from two patient cohorts with metabolism data from the literature. Here we show that FunHoP gives more consistent pathways that are easier to interpret biologically. Code and documentation for running FunHoP can be found at https://github.com/kjerstirise/FunHoP.
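The description above gives the ingredients of the node-level value (the highest-read gene dominates, and reads are weighted by gene length and ratio) but not the exact formula. The sketch below is one hypothetical reading of that description, not FunHoP's implementation; the function name and the weighting scheme are assumptions.

```python
from typing import Dict, Tuple


def node_expression(genes: Dict[str, Tuple[float, float]]) -> float:
    """Collapse the genes of one pathway node into a single expression value.

    genes maps gene name -> (read count, gene length in bp). HYPOTHETICAL scheme:
    the gene with the most reads anchors the node; every gene's reads are
    rescaled to the anchor gene's length and weighted by its read ratio
    relative to the anchor, so the anchor gene dominates the node value.
    """
    if not genes:
        return 0.0
    top_reads, top_length = max(genes.values(), key=lambda rl: rl[0])
    if top_reads == 0:
        return 0.0
    total = 0.0
    for reads, length in genes.values():
        length_adjusted = reads * top_length / length  # reads rescaled to anchor length
        ratio = reads / top_reads                      # 1.0 for the anchor gene
        total += ratio * length_adjusted
    return total
```

Node-level values computed per sample in this way could then be compared between conditions in place of per-gene values, which is the step the method describes as differential expression at node level.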
CharPlant: A De Novo Open Chromatin Region Prediction Tool for Plant Genomes
Yin Shen, Ling-Ling Chen, Junxiang Gao
Chromatin accessibility is a highly informative structural feature for understanding gene transcription regulation, because it indicates the degree to which nuclear macromolecules such as proteins and RNAs can access chromosomal DNA. Studies have shown that chromatin accessibility is highly dynamic during stress response, stimulus response, and developmental transition. Moreover, physical access to chromosomal DNA in eukaryotes is highly cell-specific. Therefore, current technologies such as DNase-seq, ATAC-seq, and FAIRE-seq reveal only a portion of the open chromatin regions (OCRs) present in a given species. Thus, the genome-wide distribution of OCRs remains unknown. In this study, we developed a bioinformatics tool called CharPlant for the de novo prediction of OCRs in plant genomes. To develop this tool, we constructed a three-layer convolutional neural network (CNN) and subsequently trained the CNN using DNase-seq and ATAC-seq datasets of four plant species. The model simultaneously learns the sequence motifs and regulatory logic, which are jointly used to determine DNA accessibility. All of these steps are integrated into CharPlant, which can be run using a simple command line. The results of data analysis using CharPlant in this study demonstrate its prediction power and computational efficiency. To our knowledge, CharPlant is the first de novo prediction tool that can identify potential OCRs in the whole genome. The source code of CharPlant and supporting files are freely available from https://github.com/Yin-Shen/CharPlant.
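Chromatin accessibility characterizes the degree to which nuclear macromolecules such as proteins and RNAs can access chromosomal DNA and is an information-rich structural feature for understanding gene transcription regulation. Studies have shown that chromatin accessibility is highly dynamic during stress responses, stimulus responses, and development, and that chromatin accessibility in eukaryotes is also highly cell-specific. Current assays such as DNase-seq, ATAC-seq, and FAIRE-seq can reveal only a portion of the open chromatin regions (OCRs) present in a given species; therefore, the genome-wide distribution of OCRs remains unknown. In this study, we developed a bioinformatics tool, CharPlant, for the de novo prediction of OCRs in plant genomes. In CharPlant, we constructed a three-layer convolutional neural network and then trained the model using ATAC-seq and DNase-seq datasets from four plant species (rice, Arabidopsis, Medicago, and tomato). The model simultaneously learns sequence motifs and regulatory logic to determine DNA accessibility, and all steps are integrated into CharPlant, which can be run using a simple command line. The results of data analysis with CharPlant confirm the prediction power and computational efficiency of the method. CharPlant is the first de novo prediction tool that can identify potential OCRs across the whole genome; its source code and supporting files are freely available at https://github.com/Yin-Shen/CharPlant.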
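Applying a trained model de novo across a whole genome typically means scoring fixed-size windows and merging the positive ones into candidate regions. A minimal sketch, assuming a hypothetical window size, step, and threshold; `score` stands in for the trained CNN, and the toy `gc_fraction` scorer is only a placeholder for testing, not a model of accessibility or CharPlant's own procedure:

```python
from typing import Callable, List, Tuple


def scan_genome(seq: str, score: Callable[[str], float], window: int = 200,
                step: int = 50, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Score fixed-size windows and merge overlapping positive windows.

    Returns candidate open regions as 0-based, half-open (start, end) intervals.
    """
    regions: List[Tuple[int, int]] = []
    for start in range(0, len(seq) - window + 1, step):
        if score(seq[start:start + window]) < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def gc_fraction(s: str) -> float:
    """Toy placeholder scorer: fraction of G/C bases in the window."""
    return sum(1 for b in s.upper() if b in "GC") / len(s) if s else 0.0
```

For example, `scan_genome("A" * 100 + "G" * 100 + "A" * 100, gc_fraction, window=50, step=25)` merges the five qualifying windows into a single region spanning positions 75 to 225.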