Article Online - Genomics, Proteomics & Bioinformatics

Volume: 14, Issue: 6

Research Highlight

Reading and Interpreting the Histone Acylation Code

Jelly H.M. Soffers, Xuanying Li, Susan M. Abmayr, Jerry L. Workman

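Histone modifications such as acylation play important roles in the epigenetic regulation of gene transcription and in chromatin remodeling. Recognition of acylation marks by reader proteins is a key step in this process, yet the mechanisms by which different acylations are read remain poorly understood. Many novel histone acylations, including crotonylation, have recently been discovered. These discoveries have greatly expanded our understanding of histone lysine modifications, and the question of how different readers recognize crotonylation has attracted considerable attention. Bromodomains, YEATS domains, and double PHD finger domains are the three major classes of lysine acetylation readers, but they are not limited to recognizing acetylation. The research groups of Haitao Li and David Allis have now extended the set of crotonylation readers from bromodomains and YEATS domains to double PHD finger domains. The three domains recognize acylation through different reading mechanisms and show different reading preferences. The bromodomain has a side-open acyl-lysine reading pocket that recognizes acetylation well but shows no preference for crotonylation. The YEATS domain recognizes acylation through an end-open aromatic sandwich pocket; overall, it binds a broad range of acylations and recognizes crotonylation more strongly than acetylation. The double PHD finger domain forms an end-closed, highly size-selective hydrophobic pocket. Even without a deep pocket, it can recognize larger acylations such as crotonylation by using a hydrophobic environment and coordinated hydrogen bonds. The discovery of novel acylations has increased the diversity of histone modifications, but we have only just begun to uncover how these different modifications are read. A better understanding of these reading mechanisms will help us further dissect the roles of histone modifications in chromatin remodeling and in the regulation of gene transcription.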

Page 329-332

Resource Review

Biological Databases for Hematology Research

Qian Zhang, Nan Ding, Lu Zhang, Xuetong Zhao, Yadong Yang, Hongzhu Qu, Xiangdong Fang

With advances in genome-wide sequencing technologies and bioinformatics approaches, a large number of datasets on normal and malignant erythropoiesis have been generated and made public to researchers around the world. Collecting and integrating these datasets greatly facilitates basic research as well as the clinical diagnosis and treatment of blood disorders. Here we provide a brief introduction to the most popular omics data resources for normal and malignant hematopoiesis, including some integrated web tools, to help users become better equipped to perform common analyses. We hope this review will promote awareness and facilitate the use of public database resources in hematology research.

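With continuing advances in sequencing technology, the output of sequencing data in the field of blood disorders has grown exponentially, giving rise to many databases related to blood disorders and red blood cells. These databases supply numerous research targets for hematology and provide ample data and a scientific basis for clinical diagnosis and treatment. This article systematically surveys databases on red blood cells and red-blood-cell-related blood disorders, outlining their main content, basic features, and elementary usage. We hope it helps hematology researchers, clinicians, and others interested in the field gain some understanding of leukemia, anemia, pure red cell aplasia, and normal red blood cell development, and that it raises awareness of the use of public databases.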

Page 333-337

Original Research

T Cell Repertoire Diversity Is Decreased in Type 1 Diabetes Patients

Yin Tong, Zhoufang Li, Hua Zhang, Ligang Xia, Meng Zhang, Ying Xu, Zhanhui Wang, Michael W. Deem, Xiaojuan Sun, Jiankui He

Type 1 diabetes mellitus (T1D) is an immune-mediated disease. The autoreactive T cells in T1D patients attack and destroy their own pancreatic cells. In order to systematically investigate the potential autoreactive T cell receptors (TCRs), we used a high-throughput immune repertoire sequencing technique to profile the spectrum of TCRs in individual T1D patients and controls. We sequenced the T cell repertoire of nine T1D patients, four type 2 diabetes (T2D) patients, and six nondiabetic controls. The diversity of the T cell repertoire in T1D patients was significantly decreased in comparison with T2D patients (P = 7.0E−08 for CD4+ T cells, P = 1.4E−04 for CD8+ T cells) and nondiabetic controls (P = 2.7E−09 for CD4+ T cells, P = 7.6E−06 for CD8+ T cells). Moreover, T1D patients had significantly more highly-expanded T cell clones than T2D patients (P = 5.2E−06 for CD4+ T cells, P = 1.9E−07 for CD8+ T cells) and nondiabetic controls (P = 1.7E−07 for CD4+ T cells, P = 3.3E−03 for CD8+ T cells). Furthermore, we identified a group of highly-expanded T cell receptor clones that are shared by more than two T1D patients. Although further validation in larger cohorts is needed, our data suggest that T cell receptor diversity measurements may become a valuable tool in investigating diabetes, such as using the diversity as an index to distinguish different types of diabetes.

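Type 1 diabetes (T1D) is a disease caused by a defect in the body's own immune system. In T1D patients, the patient's own T lymphocytes attack and destroy pancreatic beta cells, so that these cells can no longer secrete insulin normally. To systematically investigate the composition of T lymphocytes in T1D patients, we used high-throughput immune repertoire sequencing to analyze all T cell receptors (TCRs) in samples from T1D patients and non-T1D controls [including type 2 diabetes (T2D) patients and nondiabetic individuals]. By analyzing and comparing the TCR composition of samples from 9 T1D patients, 4 T2D patients, and 6 nondiabetic healthy individuals, we found that TCR diversity in T1D patients was markedly lower than in T2D patients and nondiabetic controls. In addition, T1D patients had more large clones arising from the expansion of a single T lymphocyte, which we term HECs, than T2D patients and nondiabetic controls; these large clones may well originate from autoreactive T lymphocytes in T1D. We also found some HECs that were shared among different T1D patients. Because the current sample size is limited, confirming that these shared HECs are truly T1D-specific will require larger datasets and follow-up experimental validation. Nevertheless, we anticipate that measuring TCR diversity by immune repertoire sequencing may become an important tool for assessing and detecting T1D and other autoimmune diseases; for example, diversity could be used as an index to distinguish different types of diabetes.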
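
As a rough illustration of how repertoire diversity might be quantified from clonotype counts, the sketch below computes Shannon entropy and flags highly expanded clones. The abstract does not state which diversity measure or expansion threshold the authors used; Shannon entropy, the 1% frequency cutoff, the function names, and the toy counts are all assumptions made for illustration.

```python
import math


def shannon_diversity(clone_counts):
    """Shannon entropy (in nats) of a list of per-clonotype read counts."""
    total = sum(clone_counts)
    if total == 0:
        raise ValueError("repertoire has no reads")
    entropy = 0.0
    for count in clone_counts:
        if count > 0:
            p = count / total  # frequency of this clonotype
            entropy -= p * math.log(p)
    return entropy


def expanded_clones(clone_counts, min_fraction=0.01):
    """Indices of clones at or above min_fraction of all reads.

    The 1% default is a hypothetical cutoff, not the study's definition.
    """
    total = sum(clone_counts)
    return [i for i, c in enumerate(clone_counts) if c / total >= min_fraction]


# Toy read counts per clonotype (made-up numbers, not data from the study).
even_repertoire = [10] * 100            # 100 clones of equal size
skewed_repertoire = [910] + [1] * 90    # one dominant clone plus 90 singletons

print(shannon_diversity(even_repertoire))    # higher entropy: reads spread evenly
print(shannon_diversity(skewed_repertoire))  # lower entropy: one clone dominates
print(expanded_clones(skewed_repertoire))    # positions of clones above the cutoff
```

Under this measure, a repertoire dominated by a few expanded clones scores lower than one with the same reads spread evenly, which is the qualitative pattern the abstract reports for T1D patients.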

Page 338-348

Original Research

Identification of Risk Pathways and Functional Modules for Coronary Artery Disease Based on Genome-wide SNP Data

Xiang Zhao, Yi-Zhao Luan, Xiaoyu Zuo, Ye-Da Chen, Jiheng Qin, Lv Jin, Yiqing Tan, Meihua Lin, Naizun Zhang, Yan Liang, Shao-Qi Rao

Coronary artery disease (CAD) is a complex human disease, involving multiple genes and their nonlinear interactions, which often act in a modular fashion. Genome-wide single nucleotide polymorphism (SNP) profiling provides an effective technique to unravel these underlying genetic interplays or their functional involvements for CAD. This study aimed to identify the susceptibility pathways and modules for CAD based on SNP omics. First, the Wellcome Trust Case Control Consortium (WTCCC) SNP datasets of CAD and control samples were used to assess the joint effect of multiple genetic variants at the pathway level, using a logistic kernel machine regression model. Then, an expanded genetic network was constructed by integrating statistical gene–gene interactions involved in these susceptibility pathways with their protein–protein interaction (PPI) knowledge. Finally, risk functional modules were identified by decomposition of the network. Of 276 KEGG pathways analyzed, 6 pathways were found to have a significant effect on CAD. Other than glycerolipid metabolism, glycosaminoglycan biosynthesis, and cardiac muscle contraction pathways, three pathways related to other diseases were also revealed, including Alzheimer’s disease, non-alcoholic fatty liver disease, and Huntington’s disease. A genetic epistatic network of 95 genes was further constructed using the abovementioned integrative approach. Of 10 functional modules derived from the network, 6 have been annotated to phospholipase C activity and cell adhesion molecule binding, which also have known functional involvement in Alzheimer’s disease. These findings indicate an overlap of the underlying molecular mechanisms between CAD and Alzheimer’s disease, thus providing new insights into the molecular basis for CAD and its molecular relationships with other diseases.

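Coronary artery disease (CAD) is a complex disease whose genetic mechanisms involve nonlinear interactions among genes. Genome-wide single nucleotide polymorphism data from genome-wide association studies (GWAS) offer an opportunity to comprehensively dissect the multigene interactions and pathogenic molecular pathways of complex diseases. This study aimed to develop and refine an analytical framework for identifying susceptibility pathways and risk functional modules for complex diseases, and mined the CAD data provided by the Wellcome Trust Case Control Consortium (WTCCC) in depth to identify susceptibility pathways and functional modules associated with CAD. First, a logistic kernel machine regression model was used to identify CAD susceptibility pathways. Next, gene–gene interactions among the susceptibility pathways were analyzed, and a gene interaction network was constructed under the guidance of prior knowledge of protein–protein interactions. Finally, risk functional modules for CAD were identified through network topology analysis. Six biological pathways, including the glycerolipid metabolism pathway (hsa00561) and the glycosaminoglycan biosynthesis pathway (hsa00532), were significantly associated with CAD. Further enrichment analysis showed that 6 functional modules were enriched in GO terms such as phospholipase C activity (GO:0004629) and cell adhesion molecule (CAM) binding (GO:0050839). These results show that the prior-knowledge-based approach proposed here for mining CAD functional modules is feasible, and suggest that the molecular pathogenesis of CAD involves multiple biological pathways and genetic variants that act in the form of functional modules.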

Page 349-356

Original Research

Plant Proteins Are Smaller Because They Are Encoded by Fewer Exons than Animal Proteins

Obed Ramírez-Sánchez, Paulino Pérez-Rodríguez, Luis Delaye, Axel Tiessen

Protein size is an important biochemical feature since longer proteins can harbor more domains and therefore can display more biological functionalities than shorter proteins. We found remarkable differences in protein length, exon structure, and domain count among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acid residues (aa), average protein sizes in plant genomes are smaller than those of animals and fungi. Proteins unique to plants are ∼81 aa shorter than plant proteins conserved among other eukaryotic lineages. The smaller average size of plant proteins could not be explained by endosymbiosis, subcellular compartmentation, or exon size, but rather by exon number. Metazoan proteins are encoded on average by ∼10 exons of small size [∼176 nucleotides (nt)]. Streptophyta have on average only ∼5.7 exons of medium size (∼230 nt). Multicellular species code for large proteins by increasing the exon number, while most unicellular organisms employ rather larger exons (>400 nt). Among subcellular compartments, membrane proteins are the largest (∼520 aa), whereas the smallest proteins correspond to the gene ontology group of ribosome (∼240 aa). On average, plant proteins are encoded by half the number of exons and also contain fewer domains than animal proteins. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We thus conclude that plants have proteins larger than those of bacteria but smaller than those of animals or fungi. Compared to the average of eukaryotic species, plants have ∼34% more but ∼20% smaller proteins. This suggests that photosynthetic organisms are unique and therefore deserve special attention with regard to the evolutionary forces acting on their genomes and proteomes.

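Protein size is an important biochemical property: large proteins contain more domains and can therefore carry out more biological functions than small proteins. We found marked differences among lineages in protein length, exon structure, and domain count. Eukaryotic proteins average 472 amino acid residues (aa), and plant proteins are smaller than animal and fungal proteins. Plant-specific proteins are about 81 aa smaller than plant proteins conserved in other eukaryotic lineages. This smaller size cannot be explained by endosymbiosis, subcellular compartmentation, or exon size, but results from exon number. Metazoan proteins are encoded by ~10 small exons [~176 nucleotides (nt)], whereas Streptophyta have on average only ~5.7 medium-sized exons (~230 nt). Multicellular species obtain larger proteins by using more exons, while most unicellular organisms use larger exons (>400 nt). Among subcellular components, membrane proteins are the largest (~520 aa) and ribosome-associated proteins the smallest (~240 aa). Plant genes contain only half as many exons as animal genes and, on average, also fewer domains. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We therefore conclude that plant proteins are larger than bacterial proteins but smaller than those of animals or fungi. Compared with the eukaryotic average, plants have more (~34%) but smaller (~20%) proteins. This suggests that photosynthetic organisms are distinctive and that the evolutionary forces acting on their genomes and proteomes deserve special attention, which lays a foundation for further elucidating how plant proteomes change over the course of evolution.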

Protein size is an important biochemical characteristic, since longer proteins can contain more domains and therefore display more biological functionalities than small proteins. We found notable differences in protein length, exon structure, and number of domains among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acids (aa), average sizes in plant genomes are smaller than those of animals and fungi. Plant-specific proteins are ~81 aa smaller than plant proteins conserved among other eukaryotic lineages. The smaller size of plant proteins could not be explained by endosymbiosis, cellular compartments, or exon size, but it could be explained by the number of exons. Metazoan proteins are encoded on average by ~10 small exons [~176 nucleotides (nt)]. Streptophytes have on average only ~5.7 exons of intermediate size (~230 nt). Multicellular species encode long proteins by increasing the number of exons, whereas most unicellular organisms use longer exons (>400 nt). Among subcellular compartments, membrane proteins are the longest (~520 aa), while the smallest proteins belong to the ribosome group (~240 aa). Plant genes are encoded on average by only half the number of exons and also contain fewer domains than animal proteins. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became longer than their cyanobacterial orthologs. We conclude that plants have proteins larger than those of bacteria but smaller than those of animals and fungi. Compared with the average of eukaryotic species, plants have ~34% more proteins, but these are ~20% smaller. This suggests that photosynthetic organisms are unique and deserve special attention with respect to the evolutionary forces acting on their genomes and proteomes.
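
The exon figures quoted in the abstract can be turned into a back-of-the-envelope check. The sketch below multiplies mean exon count by mean exon size and divides by three nucleotides per codon. It assumes, unrealistically, that every exonic nucleotide is coding (untranslated regions and stop codons are ignored), so the results are upper bounds and not the authors' calculation. The function name is hypothetical.

```python
def coding_length_aa(exon_count, mean_exon_nt):
    """Upper-bound protein length (aa) if all exonic sequence were coding."""
    total_nt = exon_count * mean_exon_nt
    return total_nt / 3  # three nucleotides per codon


# Mean exon counts and sizes as quoted in the abstract.
metazoa = coding_length_aa(10, 176)        # about 587 aa
streptophyta = coding_length_aa(5.7, 230)  # about 437 aa

print(f"Metazoa: ~{metazoa:.0f} aa, Streptophyta: ~{streptophyta:.0f} aa")
print(f"Streptophyta / Metazoa ratio: {streptophyta / metazoa:.2f}")
```

Even though Streptophyta exons are somewhat longer on average, the lower exon count dominates the product, which is consistent with the abstract's conclusion that exon number, not exon size, accounts for the smaller plant proteins.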

Page 357-370

Method

An Improved Methodology to Overcome Key Issues in Human Fecal Metagenomic DNA Extraction

Jitendra Kumar, Manoj Kumar, Shashank Gupta, Vasim Ahmed, Manu Bhambi, Rajesh Pandey, Nar Singh Chauhan

Microbes are ubiquitously distributed in nature, and recent culture-independent studies have highlighted the significance of gut microbiota in human health and disease. Fecal DNA is the primary source for the majority of human gut microbiome studies. However, further improvement is needed to obtain fecal metagenomic DNA in sufficient quantity and of good quality but with low host genomic DNA contamination. In the current study, we demonstrate a quick, robust, unbiased, and cost-effective method for the isolation of high molecular weight (>23 kb) metagenomic DNA (260/280 ratio >1.8) with a good yield (55.8 ± 3.8 ng/mg of feces). We also confirm that there is very low human genomic DNA contamination (eubacterial: human genomic DNA marker genes = 227.9:1) in human feces. The newly-developed method performs robustly on fresh as well as stored fecal samples, as demonstrated by 16S rRNA gene sequencing using 454 FLX+. Moreover, 16S rRNA gene analysis indicated that, compared to the other DNA extraction methods tested, the fecal metagenomic DNA isolated with the current methodology retains species richness and does not show microbial diversity biases, which is further confirmed by qPCR with a known quantity of spike-in genomes. Overall, our data highlight a protocol that balances quality, amount, user-friendliness, and cost effectiveness, making it suitable for culture-independent analysis of the human gut microbiome and providing a robust solution to key issues associated with fecal metagenomic DNA isolation in human gut microbiome studies.

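Microorganisms are ubiquitous in nature, and recent culture-independent studies have shown that the gut microbiota plays an important role in human health and disease. Fecal DNA is the primary source for most human gut microbiome studies; however, such studies need fecal metagenomic DNA of higher quantity and quality and with low host genomic DNA contamination. In this study, we developed a fast, effective, unbiased, and economical method for isolating high molecular weight (>23 kb) metagenomic DNA (55.8 ± 3.8 ng per mg of feces). We also confirmed that human genomic DNA contamination in human feces is low (eubacterial marker DNA : human genomic marker DNA = 227.9:1). The new method works well on fresh as well as frozen stored fecal samples, as confirmed by 16S rRNA gene sequencing on the 454 FLX+. Compared with the other DNA extraction methods tested, fecal metagenomic DNA extracted with this method retained species richness and showed no microbial bias, which we also verified by qPCR. Overall, we developed a method for isolating high molecular weight metagenomic DNA that balances quality, quantity, user-friendliness, and cost; it is suitable for culture-independent analysis of the human gut microbiota and offers a robust way to overcome the many obstacles in extracting fecal metagenomic DNA for human gut microbiome research. When applied to other species, however, the method still needs to be adapted to the characteristics of the species.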

Page 371-378
