Articles Online (Volume 7, Issue 1)

Review Article

In-depth cDNA Library Sequencing Provides Quantitative Gene Expression Profiling in Cancer Biomarker Discovery

Wanling Yang, Dingge Ying, Yu-Lung Lau

Quantitative gene expression analysis plays an important role in identifying genes differentially expressed in various pathological states and in studying gene expression regulation and co-regulation, shedding light on gene function. Although microarrays are widely used as a powerful tool for this purpose, they are quantitatively suboptimal and cannot detect unknown gene variants. Here we demonstrate effective detection of differential expression and co-regulation of certain genes by expressed sequence tag (EST) analysis of a selected subset of cDNA libraries. We discuss the issues of sequencing depth and library preparation, and propose that greater sequencing depth and improved preparation procedures may allow detection of many expression features of less abundant gene variants. With falling sequencing costs and the emergence of next-generation sequencing technologies, in-depth sequencing of cDNA pools or libraries may become a more powerful tool for gene expression profiling and cancer biomarker discovery. We also propose using sequence-specific subtraction to remove hundreds of the most abundant housekeeping genes, increasing effective sequencing depth without affecting the relative expression ratios of other genes, because transcripts from just the 300 most abundantly expressed genes constitute about 20% of the total transcriptome. In-depth sequencing also offers the unique advantage of detecting unknown forms of transcripts, such as alternative splicing variants, fusion genes, and regulatory RNAs, as well as mutations and polymorphisms that may play important roles in disease pathogenesis.

Page 1–12
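The subtraction argument in the first abstract above can be checked with simple arithmetic. The sketch below is a minimal in-silico model, assuming a fixed total read budget that is redistributed proportionally over the genes left after subtraction; the function names `subtract_abundant` and `depth_gain` and the toy counts are hypothetical and not from the study. Removing genes that take a fraction f of the reads raises the depth of every remaining gene by 1/(1 − f), which is 1.25 for the 20% figure quoted, while the ratios between remaining genes stay fixed.

```python
from typing import Dict, Set


def subtract_abundant(counts: Dict[str, float], abundant: Set[str]) -> Dict[str, float]:
    """Drop the abundant genes and rescale the rest to the original total read budget.

    Hypothetical model: the reads freed by subtraction are assumed to be
    redistributed proportionally over the remaining transcripts.
    """
    total = sum(counts.values())
    kept = {gene: c for gene, c in counts.items() if gene not in abundant}
    kept_total = sum(kept.values())
    if kept_total == 0:
        raise ValueError("nothing left after subtraction")
    scale = total / kept_total
    return {gene: c * scale for gene, c in kept.items()}


def depth_gain(removed_fraction: float) -> float:
    """Fold increase in depth for the remaining genes, 1 / (1 - f)."""
    if not 0 <= removed_fraction < 1:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1 / (1 - removed_fraction)


# Toy counts (hypothetical): two housekeeping genes take 20% of 1000 reads.
counts = {"hk1": 150, "hk2": 50, "geneA": 600, "geneB": 200}
after = subtract_abundant(counts, {"hk1", "hk2"})
print(after)                            # geneA -> 750.0, geneB -> 250.0
print(after["geneA"] / after["geneB"])  # ratio stays 3.0
print(depth_gain(0.20))                 # 1.25
```

Each remaining gene gains the same factor, which is why relative expression ratios are unaffected in this model.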

Review Article

DNA Copy Number Aberrations in Breast Cancer by Array Comparative Genomic Hybridization

Jian Li, Kai Wang, Shengting Li, Vera Timmermans-Wielenga, Fritz Rank, Carsten Wiuf, Xiuqing Zhang, Huanming Yang, Lars Bolund

Array comparative genomic hybridization (CGH) has been widely used to analyze DNA copy number variations in diseases such as cancer. In this study, we investigated 82 sporadic samples from 49 breast cancer patients using bacterial artificial chromosome (BAC) CGH arrays with 1-Mb resolution. We identified a number of highly recurrent genomic aberrations that may act as “drivers” of tumor progression. In addition, the genomic profiles of four “normal” breast tissue samples taken at least 2 cm from the primary tumor sites showed some aberrations that recurred frequently in the primary tumors, which may have important implications for clinical therapy. We also performed class comparison and class prediction for various clinicopathological parameters and compiled a list of characteristic genomic aberrations associated with different clinicopathological phenotypes. Our study provides clues for further investigation of the mechanisms underlying breast carcinogenesis.

Page 13–24
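Identifying "highly frequent" aberrations, as in the array CGH study above, amounts to counting, clone by clone, how many samples carry a gain or a loss. The sketch below is a simple threshold-based frequency count, not the segmentation or class prediction methods used in the study; the ±0.3 log2-ratio cut-offs, the function names and the toy profiles are all hypothetical.

```python
from typing import Dict, List

# Hypothetical log2-ratio cut-offs; real studies calibrate these per platform.
GAIN_CUTOFF = 0.3
LOSS_CUTOFF = -0.3


def call_clone(log2_ratio: float) -> int:
    """Return +1 (gain), -1 (loss) or 0 (no change) for one BAC clone."""
    if log2_ratio >= GAIN_CUTOFF:
        return 1
    if log2_ratio <= LOSS_CUTOFF:
        return -1
    return 0


def aberration_frequency(profiles: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Fraction of samples with a gain or a loss at each clone.

    `profiles` maps sample ID -> log2 ratios, with clones in the same order.
    """
    rows = list(profiles.values())
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("need at least one sample and equal-length profiles")
    n = len(rows)
    freqs = []
    for i in range(len(rows[0])):
        calls = [call_clone(r[i]) for r in rows]
        freqs.append({"gain": calls.count(1) / n, "loss": calls.count(-1) / n})
    return freqs


# Toy data (hypothetical): 4 samples x 3 clones.
profiles = {
    "s1": [0.5, -0.4, 0.0],
    "s2": [0.4, 0.1, 0.0],
    "s3": [0.0, -0.5, 0.35],
    "s4": [0.6, 0.0, -0.1],
}
for i, f in enumerate(aberration_frequency(profiles)):
    print(i, f)
```

Clones altered in a large fraction of samples would be the candidates for recurrent, possibly "driver", aberrations.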

Review Article

Comparative Analysis of Protein-Protein Interactions in Cancer-Associated Genes

Purnima Guda, Sridar V. Chittur, Chittibabu Guda

Protein-protein interactions (PPIs) have been widely studied to understand the biological processes or molecular functions associated with different disease systems such as cancer. While focused studies on individual cancers have generated valuable information, a global, comparative analysis of datasets from different cancer types has not yet been carried out. In this work, we performed a bioinformatic analysis of PPIs corresponding to differentially expressed genes from microarrays of various tumor tissues (bladder, colon, kidney and thyroid cancers) and compared their associated biological processes and molecular functions based on Gene Ontology (GO) terms. We identified a set of processes and functions common to all of these cancers, as well as those specific to only one or a subset of the cancer types. Similarly, we compared protein interaction networks in nucleic acid metabolism to identify clusters of proteins that are common to, or specific to, different cancer types. Our results provide a basis for further experimental investigation of protein interaction networks associated with cancer. The methodology developed in this work can also be applied to similar disease systems.

Page 25–36
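The "common versus specific" comparison in the PPI study above reduces to set operations over the annotation terms found for each cancer type. The sketch below shows that step only; the function name and the term labels `t1` to `t5` are hypothetical placeholders standing in for GO terms, not results from the study. Terms shared by some but not all cancer types fall into neither output set.

```python
from typing import Dict, Set, Tuple


def common_and_specific(terms: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split annotation terms into those shared by every cancer type and those unique to one."""
    if not terms:
        return set(), {}
    common = set.intersection(*terms.values())
    specific = {}
    for cancer, own in terms.items():
        # Union of the terms seen in every other cancer type.
        others = set().union(*(t for c, t in terms.items() if c != cancer))
        specific[cancer] = own - others
    return common, specific


# Hypothetical term labels standing in for GO terms.
terms = {
    "bladder": {"t1", "t2", "t3"},
    "colon": {"t1", "t2", "t4"},
    "kidney": {"t1", "t5"},
    "thyroid": {"t1", "t2"},
}
common, specific = common_and_specific(terms)
print(common)    # {'t1'}
print(specific)  # {'bladder': {'t3'}, 'colon': {'t4'}, 'kidney': {'t5'}, 'thyroid': set()}
```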

Review Article

Bioinformatic Comparison of Bacterial Secretomes

Catharine Song, Aseem Kumar, Mazen Saleh

The rapidly increasing number of completed bacterial genomes provides a good opportunity to compare their proteomes. This study was undertaken specifically to compare and contrast their secretomes, the fraction of the proteome with predicted N-terminal signal sequences of both type I and type II. A total of 176 theoretical bacterial proteomes were examined using the ExProt program. Compared with Gram-positive bacteria, Gram-negative bacteria were found, on average, to contain a larger number of potential Sec-dependent sequences. In the Gram-negative bacteria, but not in the others, secretome size correlated positively with proteome size, whereas secretome size did not correlate with pathogenicity. Within the Gram-negative bacteria, intracellular pathogens were found to have the smallest secretomes. However, the secretomes of certain bacteria did not fit the observed pattern. Specifically, the secretome of Borrelia burgdorferi has an unusually large number of putative lipoproteins, and the signal peptides of mycoplasmas show closer sequence similarity to those of Gram-negative bacteria. Our analysis also suggests that even for a theoretical minimal genome of 300 open reading frames, a fraction of this gene pool (up to a maximum of 20%) may code for proteins with Sec-dependent signal sequences.

Page 37–46
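The secretome study above reports a positive correlation between proteome size and secretome size. One standard way to quantify such a relationship is the Pearson correlation coefficient; the abstract does not state which statistic was used, so treat this as an illustration of the concept. The function name and the four genomes' numbers below are hypothetical toy values, not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values, not data from the study.
proteome_sizes = [1500, 2500, 4000, 5200]  # predicted proteins per genome
secretome_sizes = [160, 240, 430, 540]     # proteins with predicted signal peptides
print(round(pearson_r(proteome_sizes, secretome_sizes), 3))
```

A value near +1 indicates that larger proteomes tend to have larger secretomes; a value near 0 would match the reported lack of correlation with pathogenicity.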

Review Article

Role of Positive Selection Pressure on the Evolution of H5N1 Hemagglutinin

Venkata R.S.K. Duvvuri, Bhargavi Duvvuri, Wilfred R. Cuff, Gillian E. Wu, Jianhong Wu

The surface glycoprotein hemagglutinin (HA) helps influenza A virus evade the host immune system through antigenic variation and is a major driving force of viral evolution. In this study, we analyzed the selection pressure on the HA of H5N1 influenza A virus using bioinformatics algorithms. Most of the identified positive selection (PS) sites were located within or adjacent to epitope sites. Some of the identified PS sites are consistent with previous experimental studies, further supporting the biological significance of our findings. The highest frequency of PS sites was observed in recent strains isolated during 2005–2007. We also conducted phylogenetic analysis of HA sequences from various hosts. Viral drift was similar in avian and human hosts, with a progressive trend over the years. Our study reports new mutations in functional regions of HA that might provide markers for vaccine design or help predict isolates with pandemic potential.

Page 47–56
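Positive selection on a coding sequence such as HA is commonly inferred when the nonsynonymous substitution rate (dN) exceeds the synonymous rate (dS), that is, ω = dN/dS > 1. The sketch below is a simplified pairwise Nei–Gojobori-style count with a Jukes–Cantor correction. It is not the site-wise algorithms the study applied, and as a stated simplification it skips codons that differ at more than one position instead of averaging over mutational pathways. The function names and the two short sequences are hypothetical.

```python
import math
from itertools import product
from typing import Dict, Optional

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites: synonymous single-base changes / 3, summed over the 3 positions."""
    aa = CODON_TABLE[codon]
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    s += 1 / 3
    return s


def jukes_cantor(p: float) -> Optional[float]:
    """Jukes-Cantor corrected distance; None when saturated (p >= 0.75)."""
    if p >= 0.75:
        return None
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> Dict[str, Optional[float]]:
    """Simplified pairwise dN/dS for two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("need aligned coding sequences of equal length (multiple of 3)")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODON_TABLE[c1] == "*" or CODON_TABLE[c2] == "*":
            continue  # skip stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                Sd += 1
            else:
                Nd += 1
        # Simplification: codons differing at 2-3 positions are not counted.
    dS = jukes_cantor(Sd / S) if S else None
    dN = jukes_cantor(Nd / N) if N else None
    omega = dN / dS if dN is not None and dS else None
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "dS": dS, "dN": dN, "omega": omega}


result = dn_ds("CTGAAAGGTCCT", "CTAAAGGGTCGT")
print(result["Sd"], result["Nd"], result["omega"])
```

Here two synonymous and one nonsynonymous difference give ω well below 1, the pattern expected under purifying rather than positive selection.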

Brief Report

Phylogenetic Analysis of the Neuraminidase Gene Reveals that the H5N1 Strains Prevalent in Chickens During 2006 Bird Flu Outbreaks in Two Regions of Maharashtra, India Are Genetically Different

Mohd Danishuddin, Asad U. Khan

In February 2006, two outbreaks of highly pathogenic avian influenza A virus subtype H5N1 occurred in chickens within 12 days in two neighboring districts of Maharashtra, India (first in Nandurbar and then in Jalgaon). In the present study, we analyzed the neuraminidase (NA) gene of the two Indian H5N1 isolates to determine whether the two strains are genetically similar. Phylogenetic analysis of the NA gene showed that the H5N1 strains isolated from the two outbreaks did not originate from the same source. The first Indian isolate (Nandurbar/7972/06) clustered closest to an isolate from chicken in Vietnam in 2004, whereas the second Indian isolate (Jalgaon/8824/06) resembled strains isolated from swans in Italy and Iran in 2006. Moreover, amino acid sequence analysis showed different substitution hotspots in the two Indian isolates, and three substitutions were found at functional domain sites. Secondary structure changes resulting from these substitutions are also reported. This study shows that the H5N1 strains isolated from chickens during the 2006 bird flu outbreaks in two neighboring districts of Maharashtra, India, are genetically different.

Page 57–61
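Phylogenetic placement of an isolate ultimately rests on pairwise sequence divergence. As a much-simplified stand-in for the tree-based analysis described in the H5N1 neuraminidase report, the sketch below computes uncorrected p-distances on aligned sequences and reports the closest reference. This is nearest-neighbor assignment by distance, not phylogenetic inference; the function names, reference labels and sequences are hypothetical.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_reference(query: str, references: Dict[str, str]) -> str:
    """Name of the reference with the smallest p-distance to the query."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# Hypothetical toy alignment, not real NA sequences.
refs = {"ref_A": "ACGTACGTAA", "ref_B": "TCGAACGTTT"}
print(closest_reference("ACGTACGTAC", refs))  # ref_A
```

Two isolates whose closest references differ, as reported for the Nandurbar and Jalgaon isolates, would suggest separate sources, although a proper tree with support values is needed to draw that conclusion.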


A Method for Identification of Selenoprotein Genes in Archaeal Genomes

Mingfeng Li, Yanzhao Huang, Yi Xiao

The codon UGA has a dual function: it serves as a stop codon and also encodes selenocysteine. However, most popular gene annotation programs treat it only as a stop signal, resulting in misannotated or entirely missed selenoprotein genes. We developed a computational method, named Asec-Prediction, specifically for predicting archaeal selenoprotein genes. To evaluate its effectiveness, we first applied it to 14 archaeal genomes with previously known selenoprotein genes; Asec-Prediction identified all reported selenoprotein genes without redundant predictions. When applied to 12 archaeal genomes that had not previously been searched for selenoprotein genes, Asec-Prediction detected a novel selenoprotein gene in Methanosarcina acetivorans. Additional evidence supports the conclusion that the predicted gene is a genuine selenoprotein gene. These results show that Asec-Prediction is effective for predicting archaeal selenoprotein genes.

Page 62–70
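Because UGA (TGA in DNA) can be read either as a stop or as selenocysteine, one simple signal a predictor can use is whether reading through an in-frame TGA yields a substantially longer open reading frame. The sketch below implements only that readthrough check. It is not Asec-Prediction, whose algorithm the abstract does not describe, and practical selenoprotein predictors also require supporting evidence such as a SECIS element. The function name and the `min_extension` threshold are hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def sec_readthrough_candidates(seq: str, start: int, min_extension: int = 10) -> List[int]:
    """In-frame TGA positions (0-based) that could encode selenocysteine.

    Scanning from `start`, a TGA is kept as a candidate if reading through it
    extends the ORF by at least `min_extension` codons before the next stop;
    otherwise it is treated as the real terminator. TAA/TAG always terminate.
    """
    seq = seq.upper()
    hits = []
    i = start
    while i + 3 <= len(seq):
        codon = seq[i:i + 3]
        if codon in ("TAA", "TAG"):
            break
        if codon == "TGA":
            # Count codons between this TGA and the next in-frame stop.
            j, extension = i + 3, 0
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                extension += 1
                j += 3
            if extension < min_extension:
                break  # extension too short: treat this TGA as a genuine stop
            hits.append(i)
        i += 3
    return hits


# Toy ORF: ATG, 3 codons, TGA, 12 more codons, then TAA.
orf = "ATG" + "GCT" * 3 + "TGA" + "GCT" * 12 + "TAA"
print(sec_readthrough_candidates(orf, 0))  # [12]
```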

Application Note

GALT Protein Database, a Bioinformatics Resource for the Management and Analysis of Structural Features of a Galactosemia-related Protein and Its Mutants

Antonio d'Acierno, Angelo Facchiano, Anna Marabotti

We describe the GALT-Prot database and its associated web application, developed to collect information about the structural and functional effects of mutations in the human enzyme galactose-1-phosphate uridyltransferase (GALT), which is involved in the genetic disease type I galactosemia. Besides a list of missense mutations at the gene and protein sequence levels, GALT-Prot reports the results of analyzing mutant GALT structures. In addition to structural information about the wild-type enzyme, the database includes structures of over 100 single point mutants simulated by a computational procedure, and each mutant was analyzed with several bioinformatics programs to investigate the effects of the mutation. The web interface allows users to query the database, and links to other resources already on the web are provided to ensure close integration. Moreover, the architecture of the database and the web application is flexible and can easily be adapted to store data on other proteins with point mutations. GALT-Prot is freely available at

Page 71–76
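A resource of simulated single point mutants, like the one described in the GALT-Prot application note, starts from missense mutations written in one-letter notation and the corresponding mutant sequences. The sketch below parses that notation and builds the mutant sequence, checking that the wild-type residue matches. It is a hypothetical helper, not GALT-Prot's code or interface, and the toy sequence is not GALT.

```python
import re

# One-letter codes for the 20 standard amino acids: wild type, 1-based position, mutant.
MISSENSE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_missense(protein: str, mutation: str) -> str:
    """Return the mutant sequence for a one-letter missense code such as 'A4G'.

    Positions are 1-based; the wild-type residue must match the sequence.
    """
    m = MISSENSE.match(mutation)
    if m is None:
        raise ValueError(f"not a one-letter missense code: {mutation!r}")
    wild_type, pos, mutant = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} outside sequence of length {len(protein)}")
    if protein[pos - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at {pos}, found {protein[pos - 1]}")
    if wild_type == mutant:
        raise ValueError("not a missense change")
    return protein[:pos - 1] + mutant + protein[pos:]


toy = "MKTAYIAKQR"                 # hypothetical toy sequence, not GALT
print(apply_missense(toy, "A4G"))  # MKTGYIAKQR
```

Validating the wild-type residue up front catches numbering mismatches between gene-level and protein-level mutation lists before any structure is modeled.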