Articles Online (Volume 10, Issue 2)


Review of General Algorithmic Features for Genome Assemblers for Next Generation Sequencers

Bilal Wajid, Erchin Serpedin

In bioinformatics and computational biology, the most fundamental data upon which all analysis is built are the sequences of genes, proteins and RNA. The sequence of the entire genome is the solution to the genome assembly problem. The scope of this contribution is to provide an overview of the problem-solving approaches applied to genome assembly for next-generation sequencing (NGS) platforms. This article discusses the major genome assemblers proposed in the literature during the past decade by outlining their basic working principles. It is intended as a qualitative, not a quantitative, tutorial for all those working on genome assemblers for next-generation sequencers. We discuss the theoretical aspects of various genome assemblers, identifying their working schemes. We also briefly discuss the direction in which the field is heading, along with core issues of software simplicity.
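As a minimal illustration of one working principle shared by many NGS assemblers (a generic sketch, not code or a method taken from the review), reads can be broken into k-mers that form a de Bruijn graph, and contigs can be read off unbranched paths in that graph. The function names, the value of k and the toy reads are hypothetical; real assemblers must also handle sequencing errors, reverse complements and repeats, which this sketch ignores.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)  # a set collapses k-mers seen in several reads
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def start_nodes(graph):
    """Nodes that never appear as a successor (in-degree 0)."""
    targets = {s for succs in graph.values() for s in succs}
    return [node for node in graph if node not in targets]


def extend_contig(graph, start):
    """Follow single-successor edges; stop at a branch, a dead end or a revisited node."""
    contig, node, seen = start, start, {start}
    while True:
        succs = graph.get(node, set())  # .get does not insert into the defaultdict
        if len(succs) != 1:
            break
        nxt = next(iter(succs))
        if nxt in seen:
            break
        contig += nxt[-1]  # each step along an edge adds one new base
        seen.add(nxt)
        node = nxt
    return contig


def assemble(reads, k):
    """Hypothetical toy assembler: one contig per start node."""
    graph = build_de_bruijn(reads, k)
    return [extend_contig(graph, s) for s in start_nodes(graph)]


# Two overlapping toy reads drawn from the sequence ACGTTGCA
print(assemble(["ACGTTG", "GTTGCA"], 4))  # ['ACGTTGCA']
```

Stopping at branches is the main simplification here: a real assembler would try to resolve or report such junctions rather than simply ending the contig.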

Page 58–73

Original Research

The Association Between H3K4me3 and Antisense Transcription

Peng Cui, Wanfei Liu, Yuhui Zhao, Qiang Lin, Feng Ding, Chengqi Xin, Jianing Geng, Shuhui Song, Fanglin Sun, Songnian Hu, Jun Yu

Histone H3 lysine 4 trimethylation (H3K4me3) is well known to occur in the promoter regions of genes, where it is associated with transcription activation. However, when investigating the H3K4me3 profiles in the mouse cerebrum and testis, we discovered that H3K4me3 is also significantly enriched at the 3′ end of actively transcribed (sense) genes, which we term 3′-H3K4me3. 3′-H3K4me3 is associated with ∼15% of protein-coding genes in both tissues. In addition, we examined transcriptional initiation signals, including RNA polymerase II (RNAPII) binding sites and 5′-CAGE tags, which mark transcriptional start sites. Interestingly, we found that 3′-H3K4me3 is associated with the initiation of antisense transcription. Furthermore, 3′-H3K4me3 modification levels correlate positively with the antisense expression levels of the associated sense genes, implying that 3′-H3K4me3 is involved in the activation of antisense transcription. Taken together, our findings suggest that H3K4me3 may be involved in regulating antisense transcription that initiates from the 3′ end of sense genes. In addition, a positive correlation was observed between the expression of antisense transcripts and that of the associated sense genes carrying the 3′-H3K4me3 modification. More importantly, we observed 3′-H3K4me3 enrichment among genes in human, fruit fly and Arabidopsis, and found that the sequences of 3′-H3K4me3-marked regions are highly conserved and essentially indistinguishable from known promoters in vertebrates. Therefore, we speculate that these 3′-H3K4me3-marked regions may serve as potential promoters for antisense transcription and that 3′-H3K4me3 appears to be a universal epigenetic feature in eukaryotes. Our results provide novel insight into the epigenetic roles of H3K4me3 and the regulatory mechanism of antisense transcription.

Page 74–81

Original Research

Comparative Analyses of H3K4 and H3K27 Trimethylations Between the Mouse Cerebrum and Testis

Peng Cui, Wanfei Liu, Yuhui Zhao, Qiang Lin, Daoyong Zhang, Feng Ding, Chengqi Xin, Zhang Zhang, Shuhui Song, Fanglin Sun, Jun Yu, Songnian Hu

The global features of H3K4 and H3K27 trimethylation (H3K4me3 and H3K27me3) have been well studied in recent years, but most of these studies were performed in mammalian cell lines. In this work, we generated genome-wide maps of H3K4me3 and H3K27me3 in mouse cerebrum and testis using ChIP-seq, along with their high-coverage transcriptomes using ribominus RNA-seq, with SOLiD technology. We examined the global patterns of H3K4me3 and H3K27me3 in both tissues and found that these modifications are closely associated with tissue-specific expression, function and development. Moreover, we found that H3K4me3 and H3K27me3 rarely occur in silent genes, which contradicts the findings of previous studies. Finally, we observed that bivalent domains, carrying both H3K4me3 and H3K27me3, exist ubiquitously in both tissues and show a consistent preference for regulating developmentally related genes. However, bivalent domains tend towards a “winner-takes-all” approach to regulating the expression of associated genes. We also verified these results in mouse ES cells; as expected, the results in ES cells are consistent with those in cerebrum and testis. In conclusion, we present two important findings. One is that H3K4me3 and H3K27me3 rarely occur in silent genes. The other is that bivalent domains may adopt a “winner-takes-all” principle to regulate gene expression.

Page 82–93

Original Research

Transcriptome Comparison of Susceptible and Resistant Wheat in Response to Powdery Mildew Infection

Mingming Xin, Xiangfeng Wang, Huiru Peng, Yingyin Yao, Chaojie Xie, Yao Han, Zhongfu Ni, Qixin Sun

Powdery mildew (Pm), caused by infection with Blumeria graminis f. sp. tritici (Bgt), is a worldwide crop disease resulting in significant loss of wheat yield. To profile the genes and pathways responding to Bgt infection, we used Affymetrix wheat microarrays to compare the leaf transcriptomes before and after Bgt inoculation in two wheat genotypes: a Pm-susceptible cultivar, Jingdong 8 (S), and its near-isogenic line (R) carrying the single Pm resistance gene Pm30. Our analysis showed that gene expression in the S and R genotypes was almost identical before Bgt inoculation, with only 60 genes differentially expressed at a P = 0.01 cutoff. However, 12 h after Bgt inoculation, 3014 and 2800 genes in the S and R genotypes, respectively, responded to infection. A wide range of pathways was involved, including cell wall fortification, flavonoid biosynthesis and metabolic processes. Furthermore, we show for the first time that sense-antisense gene pairs might participate in the wheat-powdery mildew interaction. In addition, qRT-PCR analysis of several candidate genes yielded expression patterns consistent with the microarray data. In summary, this study reveals leaf transcriptome changes before and after powdery mildew infection in wheat near-isogenic lines, suggesting that powdery mildew resistance is a highly complex systematic response involving the regulation of a large number of genes.

Page 94–106

Original Research

Tissue-specific Temporal Exome Capture Revealed Muscle-specific Genes and SNPs in Indian Buffalo (Bubalus bubalis)

Subhash J. Jakhesara, Viral B. Ahira, Ketan B. Padiya, Prakash G. Koringa, Dharamshibhai N. Rank, Chaitanya G. Joshi

Whole-genome sequencing of buffalo has yet to be completed, and in the near future it may not be possible to identify the exome (the coding region of the genome) through bioinformatics in order to design probes to capture it. In the present study, we employed in-solution hybridization to sequence tissue-specific temporal exomes (TST exomes) in buffalo. We used cDNA prepared from buffalo muscle tissue as a probe to capture TST exomes from the buffalo genome. This resulted in a prominent reduction of repeat sequences (up to 40%) and an enrichment of coding sequences (up to 60%). Enriched targets were sequenced on a 454 pyrosequencing platform, generating 101,244 reads containing 24,127,779 high-quality bases. The data revealed 40,100 variations, of which 403 were indels and 39,218 were SNPs, including 195 nonsynonymous candidate SNPs in protein-coding regions. The study indicated that 80% of the total genes identified from the capture data were expressed in muscle tissue. The present study is the first of its kind to sequence TST exomes captured using cDNA molecules and to identify SNPs in coding regions without any prior sequence information on the targeted molecules.

Page 107–113

Original Research

Searching for Non-coding RNAs in Genomic Sequences Using ncRNAscout

Michael Bao, Miguel Cervantes Cervantes, Ling Zhong, Jason T.L. Wang

Recently, non-coding RNA (ncRNA) genes have been found to serve many important functions in the cell, such as regulation of gene expression at the transcriptional level. There are potentially more ncRNA molecules yet to be found, and their possible functions remain to be revealed. The discovery of ncRNAs is a difficult task because they lack sequence indicators, such as the start and stop codons displayed by protein-coding RNAs. Current methods use either sequence motifs or structural parameters to detect novel ncRNAs within genomes. Here, we present an ab initio ncRNA finder, named ncRNAscout, that uses both sequence motifs and structural parameters. Specifically, our method has three components: (i) a measure of the frequency of a sequence, (ii) a measure of the structural stability of a sequence, expressed as a t-score, and (iii) a measure of the frequency of certain patterns within a sequence that may indicate the presence of ncRNA. Experimental results show that, given a genome and a set of known ncRNAs, our method is able to accurately identify and locate a significant number of ncRNA sequences in the genome. The ncRNAscout tool is available for download at

Page 114–121