Proteomics Technologies and Challenges
William C.S. Cho
Proteomics is the study of proteins and their interactions in a cell. With the completion of the Human Genome Project, the emphasis is shifting to the protein complement of the human organism. Because the proteome more accurately reflects the dynamic state of a cell, tissue, or organism, proteomics is expected to yield better disease markers for diagnosis and therapy monitoring. The advent of proteomics technologies for global detection and quantitation of proteins creates new opportunities and challenges for those seeking a greater understanding of diseases. High-throughput proteomics technologies combined with advanced bioinformatics are extensively used to identify molecular signatures of diseases based on protein pathways and signaling cascades. Mass spectrometry plays a vital role in proteomics and has become an indispensable tool for molecular and cellular biology. While the potential is great, many challenges and issues remain to be solved, such as mining low-abundance proteins and integrating proteomics with genomics and metabolomics data. Nevertheless, proteomics is the foundation for constructing and extracting useful knowledge for biomedical research. This review presents a snapshot of contemporary issues in proteomics technologies.
Integration of Known Transcription Factor Binding Site Information and Gene Expression Data to Advance from Co-Expression to Co-Regulation
Maarten Clements, Eugene P. van Someren, Theo A. Knijnenburg, Marcel J.T. Reinders
The common approach to finding co-regulated genes is to cluster genes based on gene expression. However, due to the limited information present in any dataset, genes in the same cluster might be co-expressed but not necessarily co-regulated. In this paper, we propose to integrate known transcription factor binding site information and gene expression data into a single clustering scheme. This scheme finds clusters of co-regulated genes that are not only expressed similarly under the measured conditions, but also share a regulatory structure that may explain their common regulation. We demonstrate the utility of this approach on a microarray dataset of yeast grown under different nutrient and oxygen limitations. Our integrated clustering method not only unravels many regulatory modules that are consistent with current biological knowledge, but also provides a more profound understanding of the underlying process. The added value of our approach, compared with clustering based solely on gene expression, is its ability to uncover clusters of genes that are involved in more specific biological processes and are evidently regulated by a set of transcription factors.
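The abstract does not say how the two data sources are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a weighted sum of an expression distance (1 minus the Pearson correlation, rescaled to [0, 1]) and a binding-site distance (Jaccard distance between sets of transcription factors). The function names, the `weight` parameter, and the toy genes are all hypothetical.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # a flat profile carries no correlation information
    return 1.0 - cov / (sd_x * sd_y)


def jaccard_distance(sites_a, sites_b):
    """Return 1 - |intersection| / |union| for two sets of transcription factors."""
    union = sites_a | sites_b
    if not union:
        return 1.0  # no known binding sites: no evidence of shared regulation
    return 1.0 - len(sites_a & sites_b) / len(union)


def combined_distance(gene_1, gene_2, expression, tf_sites, weight=0.5):
    """Blend expression and binding-site distances (weight is an assumed knob)."""
    # Pearson distance lies in [0, 2]; halve it so both terms lie in [0, 1].
    d_expr = pearson_distance(expression[gene_1], expression[gene_2]) / 2.0
    d_tf = jaccard_distance(tf_sites[gene_1], tf_sites[gene_2])
    return (1.0 - weight) * d_expr + weight * d_tf


# Toy data: all three genes are perfectly co-expressed,
# but only geneA and geneC share the same transcription factors.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.5, 2.5, 3.5, 4.5],
}
tf_sites = {
    "geneA": {"TF1", "TF2"},
    "geneB": {"TF3"},
    "geneC": {"TF1", "TF2"},
}

d_ab = combined_distance("geneA", "geneB", expression, tf_sites)
d_ac = combined_distance("geneA", "geneC", expression, tf_sites)
```

In this toy case, expression alone cannot separate the three genes, whereas the combined distance places geneA closer to geneC than to geneB. Such a distance could be passed to any standard clustering procedure.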
Adaptive Evolution of cry Genes in Bacillus thuringiensis: Implications for Their Specificity Determination
Jin-Yu Wu, Fang-Qing Zhao, Jie Bai, Gang Deng, Song Qin, Qi-Yu Bao
The cry gene family, expressed during the late exponential phase of growth in Bacillus thuringiensis, is a large, still-growing family of homologous genes, in which each gene encodes a protein with strong specific activity against only one or a few insect species. Extensive studies have mostly focused on the structure-function relationships of Cry proteins, and have revealed several residues or domains that are important for target recognition and receptor attachment. In this study, we employed a maximum likelihood method to detect evidence of adaptive evolution in Cry proteins, and identified 24 positively selected residues, all of which are located in Domain II or III. According to known data from mutagenesis studies, the majority of these residues contribute substantially, at the molecular level, to the determination of insect specificity. We postulate that the pressures driving the diversification of Cry proteins may reflect adaptation to the “arms race” between δ-endotoxins and the targeted insects, or an enlargement of their target spectra, thereby resulting in functional divergence. The sites identified as being under positive selection provide targets for further structural and functional analyses of Cry proteins.
Dynamic Proteome Changes of Shigella flexneri 2a During Transition from Exponential Growth to Stationary Phase
Li Zhu, Xian-Kai Liu, Ge Zhao, Yi-Dan Zhi, Xin Bu, Tian-Yi Ying, Er-Ling Feng, Jie Wang, Xue-Min Zhang, Pei-Tang Huang, Heng-Liang Wang
Shigella flexneri is an infectious pathogen that causes dysentery in humans and remains a serious threat to public health, particularly in developing countries. In this study, the global protein expression patterns of S. flexneri during the transition from exponential growth to stationary phase in vitro were analyzed using 2-D PAGE combined with MALDI-TOF MS. In a time-course experiment with five time points, the relative abundance of 49 protein spots varied significantly. Interestingly, a putative outer membrane protein, YciD (OmpW), was barely detectable in the exponential growth phase but became one of the most abundant proteins in the whole stationary-phase proteome. Some proteins regulated by the global regulator FNR were also significantly induced (such as AnsB, AspA, FrdAB, and KatG) or repressed (such as AceEF, OmpX, SodA, and SucAB) during the growth phase transition. These proteins may be key effectors of the bacterial cell cycle or play important roles in cellular maintenance and stress responses. Our expression profile data provide valuable information for the study of bacterial physiology and form the basis for future proteomic analyses of this pathogen.
FragAnchor: A Large-Scale Predictor of Glycosylphosphatidylinositol Anchors in Eukaryote Protein Sequences by Qualitative Scoring
Guylaine Poisson, Cedric Chauve, Xin Chen, Anne Bergeron
A glycosylphosphatidylinositol (GPI) anchor is a common but complex C-terminal post-translational modification of extracellular proteins in eukaryotes. Here we investigate the problem of correctly annotating GPI-anchored proteins for the growing number of sequences in public databases. We developed a computational system, called FragAnchor, based on the tandem use of a neural network (NN) and a hidden Markov model (HMM). First, the NN selects potential GPI-anchored proteins in a dataset; the HMM then parses these potential GPI signals and refines the prediction by qualitative scoring. FragAnchor correctly predicted 91% of all the GPI-anchored proteins annotated in the Swiss-Prot database. In a large-scale analysis of 29 eukaryote proteomes, FragAnchor predicted that the percentage of highly probable GPI-anchored proteins is between 0.21% and 2.01%. The distinctive feature of FragAnchor, compared with other systems, is that it targets only the C-terminus of a protein, making it less sensitive to the background noise found in databases and to possibly incomplete protein sequences. Moreover, FragAnchor can be used to predict GPI-anchored proteins in all eukaryotes. Finally, by using qualitative scoring, the predictions combine both sensitivity and information content. The predictor is publicly available at http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html.
Comparative Analysis of Regulatory Motif Discovery Tools for Transcription Factor Binding Sites
Wei Wei, Xiao-Dan Yu
In the post-genomic era, the identification of specific regulatory motifs or transcription factor binding sites (TFBSs) in non-coding DNA sequences, which is essential for elucidating transcriptional regulatory networks, has emerged as an obstacle that frustrates many researchers. Consequently, numerous motif discovery tools and associated databases have been applied to this problem. However, these existing methods, based on different computational algorithms, differ in their motif prediction efficiency in non-coding DNA sequences. Therefore, understanding the similarities and differences of the computational algorithms and enriching the motif discovery literature are important for users choosing the most appropriate tool among those available online. Moreover, credible criteria for assessing motif discovery tools, and guidance for researchers in choosing the best tool for their own projects, are still lacking. Integrating the related resources might therefore be a good approach to improving the accuracy of their application. Recent studies integrate regulatory motif discovery tools with experimental methods to offer a complementary approach for researchers, and also provide a much-needed model for current research on transcriptional regulatory networks. Here we present a comparative analysis of regulatory motif discovery tools for TFBSs.