Articles Online (Volume 10, Issue 6)

Review Article

The Curation of Genetic Variants: Difficulties and Possible Solutions

Kapil Raj Pandey, Narendra Maden, Barsha Poudel, Sailendra Pradhananga, Amit Kumar Sharma

The curation of genetic variants from biomedical articles is required for various clinical and research purposes. Variant databases that consolidate information about variants are becoming increasingly popular. These databases have immense utility, serving as user-friendly repositories of variant information. While manual curation is the gold standard for curating variants, it is time-consuming at large scale, which makes automation necessary. Curating variants described in the biomedical literature is often not straightforward, mainly because of nomenclature and expression issues. Although current papers on variants tend to follow standard nomenclature, so that variants can be retrieved easily, a massive number of variants in the literature appear under non-standard names, and the online search engines that are predominantly used may not be able to find them. Effective curation requires knowledge of the overall curation process, the nature and types of difficulties involved, and ways to tackle those difficulties. Only through effective curation can variants be interpreted correctly. This paper presents the process and difficulties of curating genetic variants, with possible solutions and suggestions drawn from our work experience in the field and supported by the literature. The paper also highlights aspects of the interpretation of genetic variants and the importance of writing papers on variants in a standard and retrievable manner.

Page 317–325

Original Research

An RNA-seq-based Gene Expression Profiling of Radiation-induced Tumorigenic Mammary Epithelial Cells

Lina Ma, Linghu Nie, Jing Liu, Bing Zhang, Shuhui Song, Min Sun, Jin Yang, Yadong Yang, Xiangdong Fang, Songnian Hu, Yongliang Zhao, Jun Yu

Immortality and tumorigenicity are two distinct characteristics of cancers. Immortalization has been suggested to precede tumorigenesis. To understand the molecular mechanisms of tumorigenicity and cancer progression in mammary epithelium, we established a tumorigenic cell model by means of heavy-ion radiation of an immortal cell model, which was created by overexpressing the human telomerase reverse transcriptase (hTERT) in normal human mammary epithelial cells. We examined the expression profile of this tumorigenic cell line (T_hMEC) using the hTERT-overexpressing immortal cell line (I_hMEC) as a control. In-depth RNA-seq data was generated by using the next-generation sequencing (NGS) platform (Life Technologies SOLiD3). We found that house-keeping (HK) and tissue-specific (TS) genes were differentially regulated during the tumorigenic process. HK genes tended to be activated while TS genes tended to be repressed. In addition, the HK genes and TS genes tended to contribute differentially to the variation of gene expression at different RPKM (gene expression in reads per exon kilobase per million mapped sequence reads) levels. Based on transcriptome analysis of the two cell lines, we defined 7053 differentially-expressed genes (DEGs) between immortality and tumorigenicity. Differential expression of 20 manually-selected genes was further validated using qRT-PCR. Our observations may help to further our understanding of cellular mechanism(s) in the transition from immortalization to tumorigenesis.

Page 326–335
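The RPKM measure defined in the abstract above (reads per exon kilobase per million mapped sequence reads) can be illustrated with a minimal sketch. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per exon kilobase per million mapped reads.

    read_count: reads mapped to the exons of one gene
    exon_length_bp: summed exon length of that gene, in base pairs
    total_mapped_reads: all mapped reads in the sample
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and (total / 1e6) is the same as multiplying by 1e9.
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on 2,000 bp of exons in a 10-million-read sample.
example_value = rpkm(500, 2000, 10_000_000)
```

Normalizing by both exon length and sequencing depth is what makes values comparable across genes and across the two cell lines.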

Original Research

A Novel Gaussian Extrapolation Approach for 2D Gel Electrophoresis Saturated Protein Spots

Massimo Natale, Alfonso Caiazzo, Enrico M. Bucci, Elisa Ficarra

Analysis of images obtained from two-dimensional gel electrophoresis (2D-GE) is a topic of utmost importance in bioinformatics research, since commercial and academic software available currently has proven to be neither completely effective nor fully automatic, often requiring manual revision and refinement of computer generated matches. In this work, we present an effective technique for the detection and the reconstruction of over-saturated protein spots. Firstly, the algorithm reveals overexposed areas, where spots may be truncated, and plateau regions caused by smeared and overlapping spots. Next, it reconstructs the correct distribution of pixel values in these overexposed areas and plateau regions, using a two-dimensional least-squares fitting based on a generalized Gaussian distribution. Pixel correction in saturated and smeared spots allows more accurate quantification, providing more reliable image analysis results. The method is validated for processing highly exposed 2D-GE images, comparing reconstructed spots with the corresponding non-saturated image, demonstrating that the algorithm enables correct spot quantification.

Page 336–344
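The idea of reconstructing a saturated spot from its unsaturated flanks can be sketched in one dimension. This is a simplified stand-in and not the authors' method: it assumes a plain Gaussian profile rather than the two-dimensional generalized Gaussian described in the abstract, and it uses a log-linear least-squares fit. All function names and the synthetic data are hypothetical.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_clipped_gaussian(xs, ys, saturation):
    """Estimate (amplitude, mean, sigma) of a Gaussian from a clipped profile.

    Only pixels strictly below the saturation level are used. The log of a
    Gaussian is a parabola, ln y = c0 + c1*x + c2*x^2, so an ordinary
    least-squares fit of that parabola recovers the peak parameters.
    """
    pts = [(x, math.log(y)) for x, y in zip(xs, ys) if 0 < y < saturation]
    if len(pts) < 3:
        raise ValueError("need at least three unsaturated, positive pixels")
    s = [sum(x ** k for x, _ in pts) for k in range(5)]
    t = [sum((x ** k) * ly for x, ly in pts) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    c0, c1, c2 = solve3(normal, t)
    if c2 >= 0:
        raise ValueError("profile is not peak-shaped")
    mean = -c1 / (2 * c2)
    sigma = math.sqrt(-1 / (2 * c2))
    amplitude = math.exp(c0 - c1 * c1 / (4 * c2))
    return amplitude, mean, sigma


# Synthetic, noise-free spot profile clipped at an 8-bit ceiling of 255.
xs = list(range(21))
true_profile = [400.0 * math.exp(-((x - 10) ** 2) / (2 * 3.0 ** 2)) for x in xs]
clipped = [min(v, 255.0) for v in true_profile]
amplitude, mean, sigma = fit_clipped_gaussian(xs, clipped, 255.0)
```

On real gel images the flanks are noisy, and a log-linear fit gives disproportionate weight to low-intensity pixels. A weighted or nonlinear fit would likely be preferable there.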

Original Research

Comparative Genomics of the Lipid-body-membrane Proteins Oleosin, Caleosin and Steroleosin in Magnoliophyte, Lycophyte and Bryophyte

Pavan Umate

Lipid bodies store oils in the form of triacylglycerols. Oleosin, caleosin and steroleosin are unique proteins localized on the surface of lipid bodies in seed plants. This study has identified genes encoding the lipid body proteins oleosin, caleosin and steroleosin in the genomes of five plants: Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Selaginella moellendorffii and Physcomitrella patens. The protein sequence alignment indicated that each oleosin protein contains a highly-conserved proline knot motif and that the proline knob motif is well conserved in steroleosin proteins, while caleosin proteins possess the Dx[D/N]xDG-containing calcium-binding motifs. The identification of the motifs (proline knot and knob) and of conserved amino acids at the active site was further supported by the sequence logos. The phylogenetic analysis revealed the presence of magnoliophyte- and bryophyte-specific subgroups. We analyzed public microarray data for the expression of oleosin, caleosin and steroleosin in Arabidopsis and rice during the vegetative and reproductive stages, or under abiotic stresses. Our results indicated that genes encoding oleosin, caleosin and steroleosin proteins were expressed predominantly in plant seeds. This work may facilitate better understanding of the members of the lipid-body-membrane proteins in diverse organisms and their gene expression in the model plants Arabidopsis and rice.

Page 345–353
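The Dx[D/N]xDG calcium-binding pattern mentioned in the abstract above can be searched for with a regular expression. The sketch below assumes that "x" means any single residue. The function name and the test peptide are hypothetical and do not come from the study.

```python
import re

# D, any residue, D or N, any residue, D, G. The lookahead lets
# overlapping occurrences be reported as well.
CALCIUM_MOTIF = re.compile(r"(?=(D.[DN].DG))")


def find_calcium_motifs(protein_sequence):
    """Return (1-based start position, matched residues) for each motif hit."""
    sequence = protein_sequence.upper()
    return [(m.start(1) + 1, m.group(1)) for m in CALCIUM_MOTIF.finditer(sequence)]


# Hypothetical peptide, not a real caleosin sequence.
hits = find_calcium_motifs("AAADKDGDGAAA")
```

A regex hit only flags a candidate site. Conservation across an alignment, as the authors examine with sequence logos, is still needed to support it.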


TrFAST: A Tool to Predict Signaling Pathway-specific Transcription Factor Binding Sites

Umair Seemab, Qurrat ul Ain, Muhammad Sulaman Nawaz, Zafar Saeed, Sajid Rashid

Recent advances in the development of high-throughput tools have revolutionized our understanding of the molecular mechanisms underlying normal and dysfunctional biological processes. Here we present a novel computational tool, transcription factor search and analysis tool (TrFAST), which was developed for the in silico analysis of transcription factor binding sites (TFBSs) of signaling pathway-specific TFs. TrFAST facilitates searching as well as comparative analysis of regulatory motifs through an exact pattern matching algorithm, followed by the graphical representation of matched binding sites in multiple sequences up to 50 kb in length. The exact pattern matching strategy reduces the number of comparisons TrFAST has to perform. In contrast to pre-existing tools that find TFBSs in a single sequence, TrFAST seeks out the desired pattern in multiple sequences simultaneously. It computes the GC content within the given multiple sequence data set and assembles the combinational details of consensus sequence(s) located in these regions, thereby generating a visual display based on the abundance of unique patterns. Simultaneous comparative analysis of the regulatory regions of multiple orthologous sequences enhances the features of TrFAST and provides significant insight into the conservation of non-coding cis-regulatory elements. TrFAST is freely available at

Page 354–359
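Two operations described in the TrFAST abstract, exact pattern matching across several sequences and GC content calculation, can be sketched as follows. This is an illustration only: the abstract does not specify TrFAST's matching algorithm, so Python's built-in substring search stands in for it. The function names, sequence names and sequences are hypothetical.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def find_exact(pattern, sequence):
    """Return 0-based start positions of every exact occurrence, overlaps included."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        start = sequence.find(pattern, start + 1)
    return positions


def scan_sequences(pattern, sequences):
    """Search one pattern in many named sequences and report hits and GC content."""
    pattern = pattern.upper()
    return {
        name: {"hits": find_exact(pattern, seq.upper()), "gc": gc_content(seq)}
        for name, seq in sequences.items()
    }


# Hypothetical promoter fragments and a hypothetical query pattern.
report = scan_sequences("TGACGTCA", {"seqA": "ttTGACGTCAaa", "seqB": "GGGG"})
```

Reporting hits per sequence name makes it straightforward to compare the same site across orthologous regulatory regions.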

Application Note

AMDD: Antimicrobial Drug Database

Mohd Danishuddin, Lalima Kaushal, Mohd Hassan Baig, Asad U. Khan

Drug resistance is one of the major concerns for antimicrobial chemotherapy against any particular target. Knowledge of the primary structure of antimicrobial agents and their activities is essential for rational drug design. Thus, we developed a comprehensive database, the Antimicrobial Drug Database (AMDD), of known synthetic antibacterial and antifungal compounds that were extracted from the available literature and other chemical databases, e.g., PubChem, PubChem BioAssay and ZINC. The current version of AMDD contains ∼2900 antibacterial and ∼1200 antifungal compounds. The molecules are annotated with properties such as description, target, format, bioassay, molecular weight, hydrogen bond donor, hydrogen bond acceptor and rotatable bond. The availability of these antimicrobial agents on a common platform not only provides useful information but also facilitates the virtual screening process, thus saving time and overcoming difficulties in selecting specific types of inhibitors for specific targets. AMDD may provide a more effective and efficient way of accessing antimicrobial compounds based on their properties, along with links to their structures and bioassays. All the compounds are freely available at the advanced web-based search interface

Page 360–363

Application Note

DNA Barcode ITS Effectively Distinguishes the Medicinal Plant Boerhavia diffusa from Its Adulterants

Dhivya Selvaraj, Dhivya Shanmughanandhan, Rajeev Kumar Sarma, Jijo C. Joseph, Ramachandran V. Srinivasan, Sathishkumar Ramalingam

Boerhavia diffusa (B. diffusa), also known as Punarnava, is an indigenous plant in India and an important component in traditional Indian medicine. The accurate identification and collection of this medicinal herb is vital to enhance the drug’s efficacy and biosafety. In this study, a DNA barcoding technique has been applied to identify and distinguish B. diffusa from its closely-related species. The phylogenetic analysis was carried out for the four species of Boerhavia using barcode candidates including nuclear ribosomal DNA regions ITS, ITS1, ITS2 and the chloroplast plastid gene psbA-trnH. Sequence alignment revealed 26% polymorphic sites in ITS, 30% in ITS1, 16% in ITS2 and 6% in psbA-trnH. Additionally, a phylogenetic tree was constructed for 15 species using ITS sequences, which clearly distinguished B. diffusa from the other species. ITS1 shows a higher transition/transversion ratio, percentage of variation and pairwise distance, which differentiate B. diffusa from other species of Boerhavia. Our study revealed that ITS and ITS1 could be used as potential candidate regions for identifying B. diffusa and for authenticating its herbal products.

Page 364–367
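Two of the alignment statistics named in the abstract above, the percentage of polymorphic sites and the transition/transversion counts, can be computed from aligned sequences as in the minimal sketch below. It assumes sequences that are already aligned and of equal length, and it ignores gaps and ambiguous bases. The function names and the toy sequences are hypothetical and are not the study's data.

```python
PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def percent_polymorphic(alignment):
    """Percentage of alignment columns showing more than one unambiguous base."""
    seqs = [s.upper() for s in alignment]
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    variable = 0
    for column in zip(*seqs):
        observed = {base for base in column if base in BASES}
        if len(observed) > 1:
            variable += 1
    return 100.0 * variable / length


def ts_tv_counts(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES or a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Toy aligned sequences for illustration only.
first, second = "ACGTACGT", "GCGTTCGA"
ts, tv = ts_tv_counts(first, second)
polymorphic = percent_polymorphic([first, second])
```

The function returns raw counts rather than a ratio so that the caller can decide how to handle pairs with no transversions.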