Breast Cancer Data Visualizer & Cancer Predictor

Group 4
22 Jun 2022

Introduction

Breast cancer has affected countless women around the world. Detecting the disease at an early stage helps increase the effectiveness of treatment. However, the detection process is time-consuming, and small malignant areas can be missed.

The goal of this app is to assist health professionals with:

  • predicting breast cancer
  • classifying tumours as benign or malignant

The ML models are trained to predict breast cancer from ten features: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension.

The source code for Breast Cancer Data Visualizer & Cancer Predictor is available on GitHub at: [https://github.com/ChaiNamChi/GraphIt]

App Description

Components of the App

  • Documentation
  • Attribute Description & Data summary

  • EDA
    • The user chooses the type of graph and the variable(s) for the axes to generate a visualization.
    • Available graph types are scatter plot, histogram, bar graph and box plot.
    • A correlation plot is shown below the customized graph.

  • Prediction
    • The user enters values for multiple variables using sliders.
    • The prediction result is shown in the right-hand column of the page.
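
The correlation plot mentioned under EDA can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `features` and its values are synthetic stand-ins (not the real dataset), and the app's own plotting code may use a different package.

```r
# Synthetic stand-in for three of the tumour features (not real data)
set.seed(7)
features <- data.frame(radius = rnorm(50, mean = 14, sd = 3))
features$perimeter <- 2 * pi * features$radius + rnorm(50, mean = 0, sd = 1)
features$texture <- rnorm(50, mean = 19, sd = 4)

# Pairwise Pearson correlations between the features
corr <- cor(features)
print(round(corr, 2))

# Draw the matrix as a heatmap when running interactively
if (interactive()) {
  heatmap(corr, symm = TRUE)
}
```

Strongly related features, such as radius and perimeter in this toy example, show up as correlations close to 1.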

Prediction

Compressing the 10-dimensional input features to 5-D using PCA

x_input_norm <- predict(standardizer, x_input[1,])  # standardize the input with the pre-fitted standardizer
x_input_pca <- predict(pca, x_input_norm)  # project onto the pre-fitted PCA model
x_input_pca <- t(data.frame(x_input_pca[,1:5]))  # keep the first 5 principal components
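
The `standardizer` and `pca` objects above are fitted elsewhere in the app. As a minimal sketch of how such a pipeline can be built, the example below uses base R's `prcomp`, which folds standardization into the PCA fit. The names `x_train_raw`, `pca_sketch` and `x_new` are hypothetical, the data is random, and the app itself may fit these steps differently.

```r
# Synthetic stand-in for the 10 training features (not the real dataset)
set.seed(42)
x_train_raw <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
colnames(x_train_raw) <- paste0("feature_", 1:10)

# Fit PCA on the training data; center and scale. standardize each feature first
pca_sketch <- prcomp(x_train_raw, center = TRUE, scale. = TRUE)

# Share of total variance retained by the first k components
explained <- cumsum(pca_sketch$sdev^2) / sum(pca_sketch$sdev^2)
print(round(explained[5], 3))

# Project one new observation using the training centring, scaling and rotation,
# then keep the first 5 principal components as a 1 x 5 matrix
x_new <- x_train_raw[1, , drop = FALSE]
x_new_pca <- predict(pca_sketch, newdata = x_new)[, 1:5, drop = FALSE]
print(dim(x_new_pca))
```

Using `drop = FALSE` keeps the result a one-row matrix, which avoids the transpose step when passing it to a classifier.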

Training the Support Vector Machine model using the final features

svm_classifier <- svm(x_train, y_train, gamma = 0.07, cost = 2)
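
For context, a self-contained version of this training step might look like the sketch below. It assumes that `svm()` comes from the `e1071` package (the slide does not name the package) and uses random data in place of the PCA-reduced training set. The names `x_train_demo`, `y_train_demo` and `svm_demo` are hypothetical.

```r
library(e1071)  # assumed source of svm(); requires the e1071 package

# Synthetic 5-feature training set standing in for the PCA output
set.seed(1)
x_train_demo <- matrix(rnorm(100 * 5), ncol = 5)
colnames(x_train_demo) <- paste0("PC", 1:5)

# The labels must be a factor for svm() to perform classification
y_train_demo <- factor(
  ifelse(x_train_demo[, 1] + x_train_demo[, 2] > 0, 1, 0),
  levels = c(0, 1)
)

# Same hyperparameters as on the slide; the kernel defaults to radial
svm_demo <- svm(x_train_demo, y_train_demo, gamma = 0.07, cost = 2)

# Accuracy on the training data (an optimistic estimate)
train_pred <- predict(svm_demo, x_train_demo)
train_acc <- mean(train_pred == y_train_demo)
print(train_acc)
```

If the labels were passed as plain numbers rather than a factor, `e1071::svm` would fit a regression model instead of a classifier.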

Making a prediction

y_pred <- predict(best_svm_classifier, x_input_pca) # 0 for benign, 1 for malignant
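
To display the result to the user, the 0/1 output can be mapped to a readable label. The helper `label_diagnosis` below is hypothetical and is shown only as one possible approach, not the app's actual code.

```r
# Hypothetical helper: map the classifier's 0/1 output to a readable label
label_diagnosis <- function(y_pred) {
  codes <- as.character(y_pred)  # handles factor and numeric predictions alike
  labels <- c("0" = "benign", "1" = "malignant")
  unname(labels[codes])          # unknown codes become NA
}

demo_pred <- factor(c(0, 1, 1), levels = c(0, 1))
print(label_diagnosis(demo_pred))
```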

Key Takeaways

  • Questions addressed
    • Given measurements of a breast tumour, what is the diagnosis?
    • Which factors contribute most to a diagnosis of malignant breast cancer?
  • Dataset used: Breast Cancer Wisconsin (Diagnostic) Dataset (https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data)
  • Data exploration
    • Used various plots to understand the data and the correlations between features.
  • Modelling
    • SVM was the best of the classification models compared (the others were logistic regression, Naive Bayes and random forest), with a test accuracy of 0.971.
  • Prediction
    • The SVM outputs 0 for a benign tumour and 1 for a malignant tumour.
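
Test accuracy, as quoted under Modelling, is the fraction of held-out cases that the model classifies correctly. The sketch below shows the calculation in base R with made-up labels. The vectors `y_true` and `y_hat` are purely illustrative and are unrelated to the reported 0.971.

```r
# Made-up true labels and predictions (0 = benign, 1 = malignant)
y_true <- factor(c(0, 0, 1, 1, 0, 1, 0, 0), levels = c(0, 1))
y_hat  <- factor(c(0, 0, 1, 0, 0, 1, 0, 1), levels = c(0, 1))

# Confusion matrix: rows are actual classes, columns are predicted classes
conf_mat <- table(actual = y_true, predicted = y_hat)
print(conf_mat)

# Accuracy is the share of cases on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

The off-diagonal cells separate missed malignant cases from false alarms, which accuracy alone does not distinguish.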