Comparative Study of Data Mining and Statistical Learning Techniques for Prediction of Cancer Survivability

Charles Edeki; Shardul Pandya

Authors

Charles Edeki, Mercy College, Mathematics and Computer Science Department
Shardul Pandya, Capella University, Minneapolis, Minnesota, USA

Abstract

Considerable effort is being made by computer scientists and statisticians to design and implement algorithms
and techniques for the efficient storage, management, processing, and analysis of biological databases. Data
mining and statistical learning techniques are commonly used to discover consistent and useful patterns in
biological datasets, particularly in the fields of computational biology and bioinformatics, which seek to
solve biological problems by combining aspects of biology, computer science, mathematics, and other
disciplines (Adams, Matheson & Pruim, 2008). The main focus of this study was to expand understanding of how
biologists, medical practitioners, and scientists could benefit from data mining and statistical learning
techniques for predicting breast cancer survivability and prognosis, using the R statistical computing
environment and the Weka machine learning tool (both freely available, open-source applications). Six data
mining and statistical learning techniques were applied to breast cancer datasets for survival analysis. The
results were mixed as to which algorithm produced the best model; the performance of each algorithm appeared
to depend on the size, the dimensionality of the data representation, and the cleanliness of the dataset.
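To make the kind of comparison described in the abstract concrete, here is a minimal sketch of how candidate classifiers can be scored on a common footing with k-fold cross-validation. It is written in Python rather than in R or Weka, which the study actually used, and it does not reproduce the study's six techniques, which the abstract does not name: a majority-class baseline and a k-nearest-neighbours classifier are swapped in purely as illustrations. All function names (make_toy_data, majority_class, knn, cross_val_accuracy) are hypothetical, and the data are synthetic two-cluster points, not breast cancer records.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A labelled example: (feature vector, class label), e.g. 1 = "survived", 0 = "did not".
Example = Tuple[Sequence[float], int]
# A classifier here is any function that predicts a label for x given a training set.
Classifier = Callable[[List[Example], Sequence[float]], int]


def make_toy_data(seed: int = 42, n_per_class: int = 40) -> List[Example]:
    """Hypothetical synthetic data: two 2-D Gaussian clusters, one per class."""
    rng = random.Random(seed)
    data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(n_per_class)]
    data += [([rng.gauss(3, 1), rng.gauss(3, 1)], 1) for _ in range(n_per_class)]
    return data


def majority_class(train: List[Example], x: Sequence[float]) -> int:
    """Baseline: ignore x and predict the most frequent label in the training fold."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def knn(k: int) -> Classifier:
    """Build a k-nearest-neighbours classifier (Euclidean distance, majority vote)."""
    def predict(train: List[Example], x: Sequence[float]) -> int:
        nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def cross_val_accuracy(data: List[Example], clf: Classifier,
                       folds: int = 5, seed: int = 0) -> float:
    """Mean accuracy of clf over `folds` disjoint test folds (same seed -> same folds)."""
    if len(data) < folds:
        raise ValueError("need at least one example per fold")
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test_idx = set(idx[f::folds])  # every folds-th shuffled index forms one test fold
        train = [data[i] for i in idx if i not in test_idx]
        test = [data[i] for i in test_idx]
        correct = sum(clf(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = make_toy_data()
    for name, clf in [("majority baseline", majority_class), ("3-NN", knn(3))]:
        print(f"{name}: mean 5-fold accuracy = {cross_val_accuracy(data, clf):.3f}")
```

Because every classifier is scored on the same folds (the shuffle uses a fixed seed), differences in mean accuracy reflect the models rather than the split. The same harness could be used to probe the dependence the abstract mentions, for instance by appending irrelevant noise features to each example and watching how k-NN accuracy changes as dimensionality grows; no such result is claimed here.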
