Article Online

Articles Online (Volume 6, Issue 2)


Fuzzy Logic for Elimination of Redundant Information of Microarray Data

EdmundoBonilla Huerta, Béatrice Duval, Jin-Kao Hao

Gene subset selection is essential for classification and analysis of microarray data. However, gene selection is known to be a very difficult task since gene expression data not only have high dimensionalities, but also contain redundant information and noises. To cope with these difficulties, this paper introduces a fuzzy logic based pre-processing approach composed of two main steps. First, we use fuzzy inference rules to transform the gene expression levels of a given dataset into fuzzy values. Then we apply a similarity relation to these fuzzy values to define fuzzy equivalence groups, each group containing strongly similar genes. Dimension reduction is achieved by considering for each group of similar genes a single representative based on mutual information. To assess the usefulness of this approach, extensive experimentations were carried out on three well-known public datasets with a combined classification model using three statistic filters and three classifiers.

Page 61–73


Gene Expression Data Classification Using Consensus Independent Component Analysis

Chun-Hou Zheng, De-Shuang Huang, Xiang-Zhen Kong, Xing-Ming Zhao

We propose a new method for tumor classification from gene expression data, which mainly contains three steps. Firstly, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Secondly, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection technique. Finally, support vector machine is used to classify the modeling data. To show the validity of the proposed method, we applied it to classify three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.

Page 74–82


ADSRPCL-SVM Approach to Informative Gene Analysis

Wei Xiong, Zhibin Cai, Jinwen Ma

Microarray data based tumor diagnosis is a very interesting topic in bioinformatics. One of the key problems is the discovery and analysis of informative genes of a tumor. Although there are many elaborate approaches to this problem, it is still difficult to select a reasonable set of informative genes for tumor diagnosis only with microarray data. In this paper, we classify the genes expressed through microarray data into a number of clusters via the distance sensitive rival penalized competitive learning (DSRPCL) algorithm and then detect the informative gene cluster or set with the help of support vector machine (SVM). Moreover, the critical or powerful informative genes can be found through further classifications and detections on the obtained informative gene clusters. It is well demonstrated by experiments on the colon, leukemia, and breast cancer datasets that our proposed DSRPCL-SVM approach leads to a reasonable selection of informative genes for tumor diagnosis.

Page 83–90


Identification of Tumor Evolution Patterns by Means of Inductive Logic Programming

Vitoantonio Bevilacqua, Patrizia Chiarappa, Giuseppe Mastronardi, Filippo Menolascina, Angelo Paradiso, Stefania Tommasi

In considering key events of genomic disorders in the development and progression of cancer, the correlation between genomic instability and carcinogenesis is currently under investigation. In this work, we propose an inductive logic programming approach to the problem of modeling evolution patterns for breast cancer. Using this approach, it is possible to extract fingerprints of stages of the disease that can be used in order to develop and deliver the most adequate therapies to patients. Furthermore, such a model can help physicians and biologists in the elucidation of molecular dynamics underlying the aberrations-waterfall model behind carcinogenesis. By showing results obtained on a real-world dataset, we try to give some hints about further approach to the knowledge-driven validations of such hypotheses.

Page 91–97


Hidden Markov Models Incorporating Fuzzy Measures and Integrals for Protein Sequence Identification and Alignment

Niranjan P. Bidargaddi, Madhu Chetty, Joarder Kamruzzaman

Profile hidden Markov models (HMMs) based on classical HMMs have been widely applied for protein sequence identification. The formulation of the forward and backward variables in profile HMMs is made under statistical independence assumption of the probability theory. We propose a fuzzy profile HMM to overcome the limitations of that assumption and to achieve an improved alignment for protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, thus further extends the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. The strong correlations and the sequence preference involved in the protein structures make this fuzzy architecture based model as a suitable candidate for building profiles of a given family, since the fuzzy set can handle uncertainties better than classical methods.

Page 98–110


Applying Intelligent Computing Techniques to Modeling Biological Networks from Expression Data

Wei-Po Lee , Kung-Cheng Yang

Constructing biological networks is one of the most important issues in systems biology. However, constructing a network from data manually takes a considerable large amount of time, therefore an automated procedure is advocated. To automate the procedure of network construction, in this work we use two intelligent computing techniques, genetic programming and neural computation, to infer two kinds of network models that use continuous variables. To verify the presented approaches, experiments have been conducted and the preliminary results show that both approaches can be used to infer networks successfully.

Page 111–120


Identification of MicroRNA Precursors with Support Vector Machine and String Kernel

Jian-Hua Xu, Fei Lib, Qiu-Feng Sun

MicroRNAs (miRNAs) are one family of short (21–23 nt) regulatory non-coding RNAs processed from long (70–110 nt) miRNA precursors (pre-miRNAs). Identifying true and false precursors plays an important role in computational identification of miRNAs. Some numerical features have been extracted from precursor sequences and their secondary structures to suit some classification methods; however, they may lose some usefully discriminative information hidden in sequences and structures. In this study, pre-miRNA sequences and their secondary structures are directly used to construct an exponential kernel based on weighted Levenshtein distance between two sequences. This string kernel is then combined with support vector machine (SVM) for detecting true and false pre-miRNAs. Based on 331 training samples of true and false human pre-miRNAs, 2 key parameters in SVM are selected by 5-fold cross validation and grid search, and 5 realizations with different 5-fold partitions are executed. Among 16 independent test sets from 3 human, 8 animal, 2 plant, 1 virus, and 2 artificially false human pre-miRNAs, our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNAs. In particular, premiRNAs with multiple loops that were usually excluded in the previous work are correctly identified in this study with an accuracy of 92.66%.

Page 121–128