Breast Cancer Data Visualizer & Cancer Predictor

WIE2003 Introduction to Data Science

Group 4
Chai Nam Chi, Cheong Hui Ting, Lee Hui Xin, Hong Jia Herng

Introduction

Breast cancer has affected countless women around the world. Detecting the disease at an early stage helps increase the effectiveness of treatment. However, detecting breast cancer is time-consuming, and small malignant areas can be missed.

The app aims to assist health professionals with:

  • predicting breast cancer
  • classifying tumours as benign or malignant

Machine learning (ML) models are trained to predict breast cancer from ten tumour features: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension.

The source code for Breast Cancer Data Visualizer & Cancer Predictor is available on GitHub at: [https://github.com/ChaiNamChi/GraphIt]

App Description

Components of the App

1- Documentation

2- Attribute Description & Data summary

3- EDA

  • Users can choose the graph type and the variable(s) for each axis to generate a visualization.
  • Available graph types are scatter plots, histograms, bar graphs and box plots.
  • A correlation plot is shown below the customized graph.
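The correlation plot in the EDA page summarizes pairwise Pearson correlations between the numeric features. A minimal base-R sketch of the underlying computation follows; it uses simulated data in place of the Wisconsin dataset, and the helper name `top_correlations` is hypothetical, not taken from the app's source.

```r
# Hypothetical helper (not from the app): list the k most strongly correlated feature pairs.
top_correlations <- function(df, k = 5) {
  cm <- cor(df)                          # Pearson correlation matrix
  cm[lower.tri(cm, diag = TRUE)] <- NA   # keep each pair once (upper triangle only)
  idx <- which(!is.na(cm), arr.ind = TRUE)
  pair_df <- data.frame(
    feature_1 = rownames(cm)[idx[, "row"]],
    feature_2 = colnames(cm)[idx[, "col"]],
    r = cm[idx]
  )
  pair_df <- pair_df[order(-abs(pair_df$r)), ]   # strongest correlations first
  pair_df[seq_len(min(k, nrow(pair_df))), ]
}

# Simulated stand-in for three of the ten features (not real patient data)
set.seed(1)
radius <- rnorm(100, mean = 14, sd = 3.5)
perimeter <- 6.5 * radius + rnorm(100, sd = 2)   # tied to radius by construction
texture <- rnorm(100, mean = 19, sd = 4)         # generated independently of radius
features <- data.frame(radius, perimeter, texture)

top_correlations(features, k = 3)
```

The same matrix returned by `cor()` is what a correlation heatmap visualizes; ranking the pairs is simply a text view of it.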

4- Prediction

  • Users enter values for the input variables using sliders.
  • The prediction result is shown in the right column of the page.

Prediction

Compressing the 10-dimensional input features to 5-D using PCA

x_input_norm <- predict(standardizer, x_input[1, ])  # standardize using the scaling fitted on the training data
x_input_pca <- predict(pca, x_input_norm)            # project onto the principal components of the fitted PCA model
x_input_pca <- x_input_pca[, 1:5, drop = FALSE]      # keep the first 5 principal components as a 1 x 5 matrix
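The snippet above relies on a `standardizer` and a `pca` model fitted beforehand on the training data, which the write-up does not show. The sketch below is one plausible base-R version under that assumption: standardization statistics come from `colMeans()`/`sd()` on the training set, PCA from `prcomp()`, and the training data are simulated. The helper `project_input` and the variable names are hypothetical.

```r
set.seed(42)
# Simulated training matrix: 200 samples x 10 features (stand-in for the real data)
feature_names <- c("radius", "texture", "perimeter", "area", "smoothness",
                   "compactness", "concavity", "concave_points", "symmetry",
                   "fractal_dimension")
x_train_raw <- matrix(rnorm(200 * 10), ncol = 10,
                      dimnames = list(NULL, feature_names))

# Standardization parameters learned from the training data only
train_center <- colMeans(x_train_raw)
train_scale <- apply(x_train_raw, 2, sd)

# PCA on the already standardized training data (so no further centering/scaling)
pca <- prcomp(scale(x_train_raw, center = train_center, scale = train_scale),
              center = FALSE, scale. = FALSE)

# Hypothetical helper: standardize one new observation, then keep the first k PCs
project_input <- function(x_new, k = 5) {
  x_std <- scale(matrix(x_new, nrow = 1, dimnames = list(NULL, names(x_new))),
                 center = train_center, scale = train_scale)
  predict(pca, x_std)[, 1:k, drop = FALSE]   # 1 x k matrix of PC scores
}

x_input <- setNames(rnorm(10), feature_names)
project_input(x_input)
```

A single slider input cannot be standardized on its own (one row has no standard deviation), which is why the training means and standard deviations are stored and reused at prediction time.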

Training the Support Vector Machine (SVM) model on the 5 principal-component features

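library(e1071)  # svm(); y_train must be a factor (0/1) so that svm() does classification, not regression
svm_classifier <- svm(x_train, y_train, gamma = 0.07, cost = 2)  # radial (RBF) kernel by default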
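For reference, and assuming `y_train` is a factor so that `svm()` from the e1071 package runs its default C-classification with a radial basis function (RBF) kernel, the two tuned parameters enter the model as follows: `gamma` (γ) sets the kernel width and `cost` (C) penalizes margin violations. Here y_i ∈ {−1, +1} is the internal class coding and φ is the feature map implied by the kernel.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma = 0.07

\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\!\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \qquad C = 2
```

Informally, a smaller γ gives a smoother decision boundary, while a larger C fits the training data more tightly at the risk of overfitting.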

Making the prediction

y_pred <- predict(svm_classifier, x_input_pca)  # 0 for benign, 1 for malignant
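The 0/1 output follows from how the label is encoded. In the Kaggle version of the dataset, the `diagnosis` column holds `"M"` (malignant) or `"B"` (benign). The sketch below assumes the project recoded it as a factor with levels `0`/`1` before training, and shows how a predicted level could be mapped back to a label for the results panel. Both helper names are hypothetical.

```r
# Assumed encoding: B (benign) -> 0, M (malignant) -> 1, stored as a factor so that
# classifiers such as e1071::svm() treat the task as classification.
encode_diagnosis <- function(diagnosis) {
  factor(ifelse(diagnosis == "M", 1, 0), levels = c(0, 1))
}

# Map predicted levels ("0" or "1") back to human-readable labels.
label_prediction <- function(y_pred) {
  unname(c("0" = "Benign", "1" = "Malignant")[as.character(y_pred)])
}

y <- encode_diagnosis(c("M", "B", "B", "M"))
y
label_prediction(y)
```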

Experience Gained

Using the publicly available online dataset, the breast cancer prediction accuracy exceeds 95%.

It was an awesome experience exploring breast cancer data and building visualizations with an R Shiny app.

R Markdown is a powerful framework that can produce documents in different formats and include interactive graphs, code and much more. We used only some of its features in this project, and there is more to explore!

You can access the Breast Cancer Data Visualizer & Cancer Predictor app at: [https://chainamchi.shinyapps.io/graphit/]

Key Takeaways

  • Questions addressed
    . Given measurements describing a breast tumour, what is the diagnosis?
    . Which factors contribute most to a diagnosis of malignant breast cancer?
  • Dataset used: Breast Cancer Wisconsin (Diagnostic) Dataset (https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data)
  • Exploring the dataset
    . Used various plots to understand the data and the correlations between features
  • Modelling
    . SVM achieved the highest test accuracy (0.971) among the models compared, the others being logistic regression, Naive Bayes and random forest.
  • Prediction
    . SVM predicts 0 for a benign tumour and 1 for a malignant tumour.
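The test accuracy quoted in the Modelling takeaway is the share of test cases whose predicted class matches the true diagnosis. A minimal base-R sketch of that calculation, together with the confusion matrix and sensitivity (recall for the malignant class), is shown below. The helper name is hypothetical and the toy label vectors are made up for illustration; they are not the project's results.

```r
# Hypothetical helper: accuracy, confusion matrix and sensitivity for 0/1 labels
evaluate_predictions <- function(truth, pred) {
  truth <- factor(truth, levels = c(0, 1))
  pred <- factor(pred, levels = c(0, 1))
  cm <- table(Predicted = pred, Actual = truth)
  list(
    confusion = cm,
    accuracy = sum(diag(cm)) / sum(cm),            # correct predictions / all predictions
    sensitivity = cm["1", "1"] / sum(cm[, "1"])    # malignant cases correctly flagged
  )
}

truth <- c(0, 0, 1, 1, 1, 0, 1, 0, 0, 1)
pred  <- c(0, 0, 1, 1, 0, 0, 1, 0, 1, 1)
res <- evaluate_predictions(truth, pred)
res$accuracy     # 0.8
res$sensitivity  # 0.8
```

For a tool meant to support diagnosis, sensitivity is worth reporting alongside accuracy, since a missed malignant case usually costs more than a false alarm.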