Articles Online (Volume 13, Issue 3)

Review Article

Long Non-coding RNAs and Their Biological Roles in Plants

Xue Liu, Lili Hao, Dayong Li, Lihuang Zhu, Songnian Hu

With the development of genomics and bioinformatics, especially the extensive application of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, lncRNAs have come to be regarded as important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in a few species, our current knowledge of their biological functions is still limited. Here, we summarize recent studies on their identification, characteristics, classification, bioinformatics, resources, and the current exploration of their biological functions in plants.
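With the development of genomics and bioinformatics, and especially with the application of high-throughput sequencing technology, a growing number of transcriptional units with little or no protein-coding capacity have been discovered; such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among these transcriptional units, long non-protein-coding RNAs (lnpcRNAs or lncRNAs) are those whose transcripts are longer than 200 nt. In recent years, research on long non-protein-coding RNAs has attracted increasing attention, and they are thought to play regulatory roles in many important biological functions. In plants, although a large number of lncRNAs have been predicted and identified, very little is known about their biological functions. This article summarizes recent research in plants on the identification of lncRNAs, as well as their biological characteristics, classification, bioinformatics, and functions.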

Page 137-147

Review Article

Metagenomic Surveys of Gut Microbiota

Rahul Shubhra Mandal, Sudipto Saha, Santasabuj Das

The gut microbiota of higher vertebrates is host-specific. The number and diversity of the organisms residing within the gut ecosystem are defined by physiological and environmental factors, such as host genotype, habitat, and diet. Recently, culture-independent sequencing techniques have added a new dimension to the study of gut microbiota, and the challenge of analyzing the large volume of sequencing data is increasingly being addressed by the development of novel computational tools and methods. Interestingly, gut microbiota maintains a constant relative abundance at operational taxonomic unit (OTU) levels, and altered bacterial abundance has been associated with complex diseases such as symptomatic atherosclerosis, type 2 diabetes, obesity, and colorectal cancer. Therefore, the study of the gut microbial population has emerged as an important field of research aimed ultimately at achieving better health. In addition, there is a spontaneous, non-linear, and dynamic interaction among different bacterial species residing in the gut. Thus, predicting the influence of a perturbed microbe–microbe interaction network on health can aid in developing novel therapeutics. Here, we summarize the population abundance of gut microbiota and its variation in different clinical states, the computational tools available to analyze pyrosequencing data, and gut microbe–microbe interaction networks.

Page 148-158

Original Research

Regulatory MicroRNA Networks: Complex Patterns of Target Pathways for Disease-related and Housekeeping MicroRNAs

Sachli Zafari, Christina Backes, Petra Leidinger, Eckart Meese, Andreas Keller

Blood-based microRNA (miRNA) signatures have been reported as biomarkers for various pathologies, including cancer, neurological disorders, cardiovascular diseases, and infections. The regulatory mechanism behind the respective miRNA patterns is only partially understood. Moreover, “preserved” miRNAs, i.e., miRNAs that are not dysregulated in any disease, and their biological impact have been explored to a very limited extent. We set out to systematically determine their role in regulatory networks by defining groups of highly dysregulated miRNAs that contribute to a disease signature as opposed to preserved housekeeping miRNAs. We further determined preferential targets and pathways of both dysregulated and preserved miRNAs by computing multi-layer networks, which were compared between housekeeping and dysregulated miRNAs. Of 848 miRNAs examined across 1049 blood samples, 8 potential housekeepers showed very limited expression variations, while 20 miRNAs showed highly dysregulated expression throughout the investigated blood samples. Our approach provides important insights into miRNAs and their role in regulatory networks. The methodology can be applied to systematically investigate the differences in target genes and pathways of arbitrary miRNA sets.

Page 159-168

Original Research

Competing Risks Data Analysis with High-dimensional Covariates: An Application in Bladder Cancer

Leili Tapak, Massoud Saidijam, Majid Sadeghifar, Jalal Poorolajal, Hossein Mahjub

Analysis of microarray data is associated with the methodological problems of high dimension and small sample size. Various methods have been used for variable selection in high-dimension and small sample size cases with a single survival endpoint. However, little effort has been directed toward addressing competing risks, where there is more than one failure risk. This study compared three typical variable selection techniques, namely Lasso, elastic net, and likelihood-based boosting, for high-dimensional time-to-event data with competing risks. The performance of these methods was evaluated via a simulation study and by analyzing a real dataset related to bladder cancer patients, using time-dependent receiver operating characteristic (ROC) curves and bootstrap .632+ prediction error curves. The elastic net penalization method was shown to outperform Lasso and boosting. Based on the elastic net, 33 genes out of 1381 genes related to bladder cancer were selected. Fitting the Fine and Gray model showed that eight genes were highly significant (P < 0.001). Among them, expression of RTN4, SON, IGF1R, SNRPE, PTGR1, PLEK, and ETFDH was associated with a decrease in survival time, whereas SMARCAD1 expression was associated with an increase in survival time. This study indicates that the elastic net has a higher capacity than the Lasso and boosting for the prediction of survival time in bladder cancer patients. Moreover, genes selected by all methods improved the predictive power of the model based on only clinical variables, indicating the value of the information contained in the microarray features.
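To make the difference between the Lasso and the elastic net concrete, the following is a minimal sketch of the elastic net penalty and the resulting coefficient shrinkage, using the common parametrization with an overall strength `lam` and a mixing weight `alpha` (`alpha = 1` gives the Lasso, `alpha = 0` gives ridge). The function names are illustrative, and the coordinate update shown is the textbook one for penalized least squares with standardized predictors; it is not the competing-risks likelihood used in the study.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty: lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def coordinate_update(z, lam, alpha):
    """One coordinate-descent update for a standardized predictor.

    z is the unpenalized least-squares estimate for this coefficient
    (an assumption of this sketch, valid for penalized linear regression).
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))


if __name__ == "__main__":
    # The L1 part sets small coefficients exactly to zero (variable selection);
    # the L2 part only shrinks them.
    for z in (0.3, 2.0, -2.0):
        print(z, coordinate_update(z, lam=1.0, alpha=0.5))
```

The L1 term is what removes genes from the model entirely, while the L2 term stabilizes the estimates when many predictors are correlated, which is typical of expression data.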

Page 169-176


Correlating Bladder Cancer Risk Genes with Their Targeting MicroRNAs Using MMiRNA-Tar

Yang Liu, Steve Baker, Hui Jiang, Gary Stuart, Yongsheng Bai

The Cancer Genome Atlas (TCGA) is a valuable data resource focused on an increasing number of well-characterized cancer genomes. In part, TCGA provides detailed information about cancer-dependent gene expression changes, including changes in the expression of transcription-regulating microRNAs. We developed a web interface tool, MMiRNA-Tar, that can calculate and plot the correlation of expression for mRNA−microRNA pairs across samples or over a time course for a list of pairs under different prediction confidence cutoff criteria. Prediction confidence was established by requiring that the proposed mRNA−microRNA pair appear in at least one of three target prediction databases: TargetProfiler, TargetScan, or miRanda. We tested MMiRNA-Tar by analyzing 53 tumor and 11 normal samples from bladder urothelial carcinoma (BLCA) datasets obtained from TCGA and identified 204 microRNAs. These microRNAs were correlated with the mRNAs of five previously reported bladder cancer risk genes, and the selected pairs exhibited correlations in opposite directions between the tumor and normal samples based on the customized cutoff criterion of prediction. Furthermore, we identified an additional 496 genes (830 pairs) potentially targeted by 79 significant microRNAs out of the 204 using three cutoff criteria, i.e., false discovery rate (FDR) < 0.1, opposite correlation coefficients between the tumor and normal samples, and prediction by at least one of the three target prediction databases. Therefore, MMiRNA-Tar provides researchers with a convenient tool to visualize the correlation between microRNAs and mRNAs and to predict their targeting relationship. We believe that correlating expression profiles for microRNAs and mRNAs offers a complementary approach for elucidating their interactions.
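The Cancer Genome Atlas (TCGA) is a valuable data resource focused on storing a growing number of cancer genomes. In part, TCGA provides detailed information on expression changes of cancer genes, including changes in the expression of transcription-regulating microRNAs. We developed a web interface tool, MMiRNA-Tar, that can calculate and plot the expression correlation of microRNA-mRNA pairs over a time course or across multiple samples, according to different prediction confidence criteria. To establish prediction confidence for an mRNA-microRNA pair, the tool requires that the pair appear in at least one of three target prediction databases: TargetProfiler, TargetScan, or miRanda. We tested MMiRNA-Tar by analyzing TCGA bladder urothelial carcinoma (BLCA) data from 53 tumor and 11 normal samples and identified 204 microRNAs. Under the customized cutoff criterion of prediction, these microRNAs all showed correlations in opposite directions between tumor and normal samples with five previously reported bladder cancer risk genes. In addition, we selected 496 genes (830 pairs) significantly targeted by 79 microRNAs using three criteria: false discovery rate (FDR) < 0.1, opposite correlation coefficients between the tumor and normal samples, and prediction by at least one of the three target prediction databases. MMiRNA-Tar therefore provides researchers with a convenient tool to display the relationship between microRNA and mRNA expression and to predict their targeting relationship. We believe that correlating the expression profiles of mRNAs and microRNAs is a complementary approach that can help elucidate their interactions.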
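The selection logic described in this abstract (correlation per mRNA−microRNA pair, opposite sign between tumor and normal samples, support from at least one prediction database) can be sketched as follows. This is an illustration under stated assumptions, not the MMiRNA-Tar implementation: the function names, the use of Pearson correlation, the representation of each database as a set of pairs, and the expression values are all hypothetical, and the FDR step is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def opposite_direction(tumor_mirna, tumor_mrna, normal_mirna, normal_mrna):
    """Return (r_tumor, r_normal, flag); flag is True when the signs differ."""
    r_tumor = pearson(tumor_mirna, tumor_mrna)
    r_normal = pearson(normal_mirna, normal_mrna)
    return r_tumor, r_normal, r_tumor * r_normal < 0


def supported(pair, databases, min_hits=1):
    """True if the pair is listed in at least min_hits prediction databases.

    Each database is modeled as a set of (microRNA, gene) tuples (assumption).
    """
    return sum(pair in db for db in databases) >= min_hits


if __name__ == "__main__":
    # Made-up expression values for a single hypothetical pair.
    pair = ("miR-X", "GENE1")
    databases = [{("miR-X", "GENE1")}, set(), set()]
    r_t, r_n, flag = opposite_direction(
        [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0],
        [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0],
    )
    print(r_t, r_n, flag and supported(pair, databases))
```

In a full analysis, the p-value of each correlation would also be adjusted for multiple testing before applying the FDR < 0.1 cutoff.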

Page 177-182


Bicoid Signal Extraction with a Selection of Parametric and Nonparametric Signal Processing Techniques

Zara Ghodsi, Emmanuel Sirimal Silva, Hossein Hassani

The maternal segmentation coordinate gene bicoid plays a significant role during Drosophila embryogenesis. The gradient of Bicoid, the protein encoded by this gene, determines most aspects of head and thorax development. This paper explores the applicability of a variety of signal processing techniques for extracting the bicoid expression signal, and whether these methods can outperform the current model. We evaluate six powerful and widely used models, representing both parametric and nonparametric signal processing techniques, to determine the most efficient method for signal extraction in bicoid. The results are evaluated using both real and simulated data. Our findings show that the Singular Spectrum Analysis technique proposed in this paper outperforms the synthesis diffusion degradation model for filtering the noisy protein profile of bicoid, while the exponential smoothing technique was found to be the next best alternative, followed by the autoregressive integrated moving average.
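As an illustration of one of the techniques named above, here is a minimal sketch of simple exponential smoothing applied to a noisy profile. The smoothing weight `alpha`, the initialization with the first observation, and the sample values are assumptions of this sketch; the paper's exponential smoothing model and its parameter estimation may differ.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
    A smaller alpha gives a smoother output that reacts more slowly.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    # Made-up noisy, decaying intensity profile along an axis.
    noisy_profile = [10.2, 8.1, 7.4, 5.0, 4.6, 3.1, 2.9, 1.8]
    print(exponential_smoothing(noisy_profile, alpha=0.4))
```

Because each smoothed value depends only on past observations, the output lags behind rapid changes in the profile, which is one reason decomposition-based methods such as Singular Spectrum Analysis can perform better on steep gradients.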

Page 183-191

Application Note

BioCluster: Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data

Ahmed Abdullah, S.M. Sabbir Alam, Munawar Sultana, M. Anwar Hossain

Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice involves manually comparing each biochemical property of the unknown sample with known reference samples and inferring its identity from the pattern of maximum similarity with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automating the sorting and the similarity calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in the results of 1–47 biochemical tests within the Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at
Presumptive identification of the different bacterial species of the Enterobacteriaceae family is routinely carried out on the basis of biochemical characteristics. Traditionally, each biochemical characteristic of an unknown sample is compared by hand with the characteristics of known samples, and its identity is inferred from the maximum similarity with the known samples. This manual method is laborious, time-consuming, error-prone, and subjective. Automating the sorting step of the decision process and the similarity calculation would therefore be more effective. Here we present BioCluster, a MATLAB-based graphical user interface (GUI) software. It was built for the automated clustering and identification of Enterobacteriaceae bacteria based on biochemical data. In this software we used two types of algorithms: conventional hierarchical clustering (HC) and improved hierarchical clustering (IHC), a modified algorithm created specifically for the identification and clustering of bacteria of the Enterobacteriaceae family. IHC takes into account the variation in 1 to 47 biochemical characteristics within the Enterobacteriaceae family. The software also offers various options, in a user-friendly way, to refine the clustering. Using computer-generated synthetic data and real data, we have shown that BioCluster is highly accurate in identifying and clustering bacteria of the Enterobacteriaceae family based on biochemical data. The software can be downloaded free of charge from this website:
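The manual step that the tool automates, comparing an unknown isolate's test results against reference profiles and picking the most similar one, can be sketched as below. This is a simplified illustration and not the HC or IHC algorithm of BioCluster: the encoding of results as "+", "-", and "v" (variable within the species), the rule of skipping variable tests, and the placeholder species names and profiles are all assumptions.

```python
def similarity(unknown, reference):
    """Fraction of tests on which the unknown matches the reference profile.

    Tests marked "v" in the reference are skipped, since either result
    is compatible with that species (assumption of this sketch).
    """
    if len(unknown) != len(reference):
        raise ValueError("profiles must cover the same tests")
    compared = 0
    matches = 0
    for observed, expected in zip(unknown, reference):
        if expected == "v":
            continue
        compared += 1
        if observed == expected:
            matches += 1
    return matches / compared if compared else 0.0


def identify(unknown, references):
    """Return the name of the reference profile most similar to the unknown."""
    return max(references, key=lambda name: similarity(unknown, references[name]))


if __name__ == "__main__":
    # Placeholder profiles over four hypothetical biochemical tests.
    references = {"species_A": "++-v", "species_B": "--+-"}
    print(identify("++-+", references))
```

A clustering tool would go one step further and compute such similarities between all pairs of isolates, then merge the closest groups repeatedly to build a dendrogram.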

Page 192-199


Corrigendum to ‘Nanopore-based Fourth-generation DNA Sequencing Technology’ [GPB 144 (2015) – GPB 13/1 (4–16)]

Yanxiao Feng, Yuechuan Zhang, Cuifeng Ying, Deqiang Wang, Chunlei Du

Page 200-201


Corrigendum to ‘Databases and Web Tools for Cancer Genomics Study’ [GPB 137 (2015) – GPB 13/1 (46–50)]

Yadong Yang, Xunong Dong, Bingbing Xie, Nan Ding, Juan Chen, Yongjun Li, Qian Zhang, Hongzhu Qu, Xiangdong Fang

Page 202-203