Comparative Study of Data Mining and Statistical Learning Techniques for Prediction of Cancer Survivability
Abstract
Computer scientists and statisticians are making considerable efforts to design and implement algorithms
and techniques for efficient storage, management, processing, and analysis of biological databases.
Data mining and statistical learning techniques are commonly used to discover consistent and useful
patterns in biological datasets. These techniques are used in the fields of computational biology and
bioinformatics, which seek to solve biological problems by combining aspects
of biology, computer science, mathematics, and other disciplines (Adams, Matheson & Pruim, 2008). The
main focus of this study was to expand understanding of how biologists, medical practitioners, and
scientists can benefit from data mining and statistical learning techniques in predicting breast cancer
survivability and prognosis, using the R statistical computing environment and the Weka machine learning
toolkit (both freely available, open-source software applications). Six data mining and statistical learning
techniques were applied to breast cancer datasets for survival analysis. The results were mixed as to which
algorithm performs best; the performance of each algorithm appeared to depend on the size,
dimensionality, and cleanliness of the dataset.
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.