How to Use SVM on Breast Cancer Data

I was going back and forth about whether I wanted to learn how to use neural networks or support vector machines (SVMs), because they both seemed really cool, and I liked the idea that neural networks work similarly to the neural pathways in the brain. However, as I was researching neural networks, everything kept bringing me back to SVMs; even the book I was reading talked about SVMs on the first page of its neural network chapter. That felt like a sign to me. I also figured that the data set I wanted to look into (breast cancer data from the UCI Machine Learning Repository) might be better suited to SVMs because of how well they handle classification problems. Furthermore, SVMs are often considered among the best "out of the box" classifiers. An SVM is at heart a binary classifier that separates two classes, although it can be extended to problems with more than two classes. The goal of this project is to learn SVMs by using a breast cancer data set to predict whether a tumor is benign or malignant. I chose this data set because I was interested in working with cancer data, and I found one that was created at a hospital in Wisconsin (where I will be doing my internship this summer!). The data set originally comes from the UCI Machine Learning Repository and is also available on Kaggle.

My research question:

How can SVM be used to accurately differentiate between benign and malignant tumors?


library(dplyr)
library(janitor)
library(tidyverse)

These are libraries that I tend to use, so I am loading them here as a base.

Wisconsin_Breast_cancer <- read_csv("C:/Users/kathr/Desktop/Statistical Learning/Statistical Learning/data.csv")
New names:
• `` -> `...33`
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 568 Columns: 33
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): diagnosis
dbl (31): id, radius_mean, texture_mean, perimeter_mean, area_mean, smoothne...
lgl  (1): ...33

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Before we begin to classify our data using SVM, let's gain a better understanding of what our data set looks like by examining some graphs in a Shiny app and performing some hierarchical clustering.

library(shiny)

# Three drop-down menus let the user choose which columns to plot
ui <- fluidPage(
  selectInput(inputId = "XVariable",
              label = "Choose a variable for the x-axis!",
              choices = colnames(Wisconsin_Breast_cancer)),
  selectInput(inputId = "YVariable",
              label = "Choose a variable for the y-axis",
              choices = colnames(Wisconsin_Breast_cancer)),
  selectInput(inputId = "ColorVariable",
              label = "Choose a variable for the color!",
              choices = colnames(Wisconsin_Breast_cancer)),
  plotOutput(outputId = "BC_Plot")
)

server <- function(input, output, session) {
  # Redraw the scatter plot whenever an input changes;
  # .data[[...]] looks up a column from its name stored as a string
  output$BC_Plot <- renderPlot({
    ggplot(Wisconsin_Breast_cancer) +
      geom_point(aes(x = .data[[input$XVariable]],
                     y = .data[[input$YVariable]],
                     color = .data[[input$ColorVariable]]))
  })
}

shinyApp(ui, server)


This Shiny app lets us see what our data looks like, how the variables relate to one another, and how they are associated with diagnosis type. It appears that malignant and benign tumors already have a few characteristics that distinguish them. For example, the bigger the tumor is (in terms of radius, perimeter, and area), the more likely it is to be malignant. Similarly, the more compact and concave a tumor is, the more likely it is to be malignant. Factors such as smoothness and symmetry, on the other hand, do not seem to have a large effect on whether a tumor is malignant or benign.

However, something interesting I noticed is a strange column labeled "...33" that doesn't appear to contain any information. I am going to use the mice package to check whether there are any data points in this column (and whether any values are missing in the other columns). Based on how many data points are missing, I may either remove the column or use mice to impute the missing values.

library(mice)

colnames(Wisconsin_Breast_cancer)
 [1] "id"                      "diagnosis"              
 [3] "radius_mean"             "texture_mean"           
 [5] "perimeter_mean"          "area_mean"              
 [7] "smoothness_mean"         "compactness_mean"       
 [9] "concavity_mean"          "concave points_mean"    
[11] "symmetry_mean"           "fractal_dimension_mean" 
[13] "radius_se"               "texture_se"             
[15] "perimeter_se"            "area_se"                
[17] "smoothness_se"           "compactness_se"         
[19] "concavity_se"            "concave points_se"      
[21] "symmetry_se"             "fractal_dimension_se"   
[23] "radius_worst"            "texture_worst"          
[25] "perimeter_worst"         "area_worst"             
[27] "smoothness_worst"        "compactness_worst"      
[29] "concavity_worst"         "concave points_worst"   
[31] "symmetry_worst"          "fractal_dimension_worst"
[33] "...33"                  
md.pattern(Wisconsin_Breast_cancer) 

    id diagnosis radius_mean texture_mean perimeter_mean area_mean
568  1         1           1            1              1         1
     0         0           0            0              0         0
    smoothness_mean compactness_mean concavity_mean concave points_mean
568               1                1              1                   1
                  0                0              0                   0
    symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
568             1                      1         1          1            1
                0                      0         0          0            0
    area_se smoothness_se compactness_se concavity_se concave points_se
568       1             1              1            1                 1
          0             0              0            0                 0
    symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst
568           1                    1            1             1               1
              0                    0            0             0               0
    area_worst smoothness_worst compactness_worst concavity_worst
568          1                1                 1               1
             0                0                 0               0
    concave points_worst symmetry_worst fractal_dimension_worst ...33    
568                    1              1                       1     0   1
                       0              0                       0   568 568
Wisconsin1_Breast_cancer <- Wisconsin_Breast_cancer |> 
  select(-`...33`)

md.pattern(Wisconsin1_Breast_cancer) 
 /\     /\
{  `---'  }
{  O   O  }
==>  V <==  No need for mice. This data set is completely observed.
 \  \|/  /
  `-----'

    id diagnosis radius_mean texture_mean perimeter_mean area_mean
568  1         1           1            1              1         1
     0         0           0            0              0         0
    smoothness_mean compactness_mean concavity_mean concave points_mean
568               1                1              1                   1
                  0                0              0                   0
    symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
568             1                      1         1          1            1
                0                      0         0          0            0
    area_se smoothness_se compactness_se concavity_se concave points_se
568       1             1              1            1                 1
          0             0              0            0                 0
    symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst
568           1                    1            1             1               1
              0                    0            0             0               0
    area_worst smoothness_worst compactness_worst concavity_worst
568          1                1                 1               1
             0                0                 0               0
    concave points_worst symmetry_worst fractal_dimension_worst  
568                    1              1                       1 0
                       0              0                       0 0

It appears that one column, named "...33", consisted entirely of NA values. The read_csv() output above shows that this column had an empty header (`` -> `...33`), which usually means that every line of the CSV file ends with an extra comma. Kaggle does not give any information about what this column might represent, so I removed it before moving forward. There are no other columns with NA values.
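As a lighter-weight alternative to md.pattern(), a few lines of base R can count the missing values in each column and drop any column that is entirely NA. This is a minimal sketch on a small made-up data frame (toy_df, with a column named "empty" standing in for "...33"); on the real data you would pass Wisconsin_Breast_cancer instead.

```r
# Made-up data frame; "empty" mimics the all-NA "...33" column
toy_df <- data.frame(
  id = c(1, 2, 3),
  radius_mean = c(18.0, 20.6, NA),
  empty = c(NA, NA, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy_df))

# Columns where every single value is missing
all_na_cols <- names(toy_df)[na_counts == nrow(toy_df)]

# Drop only those columns; partially missing columns are kept
cleaned_df <- toy_df[, setdiff(names(toy_df), all_na_cols), drop = FALSE]
```

Note that radius_mean survives even though it has one missing value; only columns with nothing in them are removed, which matches the decision made for "...33".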

Now that the data set is a bit cleaner, I am interested in looking at its patterns using hierarchical clustering to gain a stronger understanding of what the data looks like.

library(dendextend)

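Breast_cancer_dend <- Wisconsin1_Breast_cancer |>
  select(-id, -diagnosis) |>  # drop the ID number and the label; keep the 30 measurements
  scale() |>                   # put every measurement on the same scale
  dist() |>                    # Euclidean distances between tumors
  hclust() |>                  # hierarchical clustering (complete linkage by default)
  as.dendrogram()

# Color each tumor by its diagnosis: red = malignant, blue = benign
my_colors <- ifelse(Wisconsin1_Breast_cancer$diagnosis == "M",
                    "red",
                    "blue")

# Reorder the colors to match the left-to-right order of the leaves
Breast_cancer_dend |>
  color_branches(col = my_colors[order.dendrogram(Breast_cancer_dend)]) |>
  color_labels(col = my_colors[order.dendrogram(Breast_cancer_dend)]) |>
  plot()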

This looks very promising, especially considering that the Shiny app already showed some clear differences between the two classes. Given this information, I think it is time to move on to SVM.
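To put a number on how well the clusters line up with the diagnoses, one option is to cut the tree into two groups and cross-tabulate the groups against the known labels. The minimal base R sketch below does this on simulated data (two made-up, well-separated groups of points) so that it runs on its own; on the real data you would cut the hclust() result for the scaled measurements and tabulate it against diagnosis. Because dendextend masks stats::cutree with its own version, the sketch calls stats::cutree explicitly.

```r
set.seed(42)

# Simulated stand-in for the real measurements: 20 "benign" points around 0
# and 20 "malignant" points around 10, in two dimensions (made-up data)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
true_labels <- rep(c("B", "M"), each = 20)

# Same pipeline as above: scale, compute distances, cluster
hc <- hclust(dist(scale(x)))

# Cut the tree into two clusters and compare them with the known labels
clusters <- stats::cutree(hc, k = 2)
agreement <- table(cluster = clusters, diagnosis = true_labels)
print(agreement)
```

If every row of the table has only one non-zero cell, each cluster contains a single diagnosis; on real data you would expect some mixing, and the off-diagonal counts show how much.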

First, we need to split the data set into a training set and a testing set. We also need to make sure our response variable (diagnosis) is a factor, so we will check its type and convert it to a factor if needed.

table(Wisconsin1_Breast_cancer$diagnosis)

  B   M 
356 212 
str(Wisconsin1_Breast_cancer)
tibble [568 × 32] (S3: tbl_df/tbl/data.frame)
 $ id                     : num [1:568] 842302 842517 84300903 84348301 84358402 ...
 $ diagnosis              : chr [1:568] "M" "M" "M" "M" ...
 $ radius_mean            : num [1:568] 18 20.6 19.7 11.4 20.3 ...
 $ texture_mean           : num [1:568] 10.4 17.8 21.2 20.4 14.3 ...
 $ perimeter_mean         : num [1:568] 122.8 132.9 130 77.6 135.1 ...
 $ area_mean              : num [1:568] 1001 1326 1203 386 1297 ...
 $ smoothness_mean        : num [1:568] 0.1184 0.0847 0.1096 0.1425 0.1003 ...
 $ compactness_mean       : num [1:568] 0.2776 0.0786 0.1599 0.2839 0.1328 ...
 $ concavity_mean         : num [1:568] 0.3001 0.0869 0.1974 0.2414 0.198 ...
 $ concave points_mean    : num [1:568] 0.1471 0.0702 0.1279 0.1052 0.1043 ...
 $ symmetry_mean          : num [1:568] 0.242 0.181 0.207 0.26 0.181 ...
 $ fractal_dimension_mean : num [1:568] 0.0787 0.0567 0.06 0.0974 0.0588 ...
 $ radius_se              : num [1:568] 1.095 0.543 0.746 0.496 0.757 ...
 $ texture_se             : num [1:568] 0.905 0.734 0.787 1.156 0.781 ...
 $ perimeter_se           : num [1:568] 8.59 3.4 4.58 3.44 5.44 ...
 $ area_se                : num [1:568] 153.4 74.1 94 27.2 94.4 ...
 $ smoothness_se          : num [1:568] 0.0064 0.00522 0.00615 0.00911 0.01149 ...
 $ compactness_se         : num [1:568] 0.049 0.0131 0.0401 0.0746 0.0246 ...
 $ concavity_se           : num [1:568] 0.0537 0.0186 0.0383 0.0566 0.0569 ...
 $ concave points_se      : num [1:568] 0.0159 0.0134 0.0206 0.0187 0.0188 ...
 $ symmetry_se            : num [1:568] 0.03 0.0139 0.0225 0.0596 0.0176 ...
 $ fractal_dimension_se   : num [1:568] 0.00619 0.00353 0.00457 0.00921 0.00511 ...
 $ radius_worst           : num [1:568] 25.4 25 23.6 14.9 22.5 ...
 $ texture_worst          : num [1:568] 17.3 23.4 25.5 26.5 16.7 ...
 $ perimeter_worst        : num [1:568] 184.6 158.8 152.5 98.9 152.2 ...
 $ area_worst             : num [1:568] 2019 1956 1709 568 1575 ...
 $ smoothness_worst       : num [1:568] 0.162 0.124 0.144 0.21 0.137 ...
 $ compactness_worst      : num [1:568] 0.666 0.187 0.424 0.866 0.205 ...
 $ concavity_worst        : num [1:568] 0.712 0.242 0.45 0.687 0.4 ...
 $ concave points_worst   : num [1:568] 0.265 0.186 0.243 0.258 0.163 ...
 $ symmetry_worst         : num [1:568] 0.46 0.275 0.361 0.664 0.236 ...
 $ fractal_dimension_worst: num [1:568] 0.1189 0.089 0.0876 0.173 0.0768 ...
Wisconsin1_Breast_cancer$diagnosis <- as.factor(Wisconsin1_Breast_cancer$diagnosis)



set.seed(1)
Wisconsin_Breast_cancer_rows <- sample(1:nrow(Wisconsin1_Breast_cancer),
                                     size = nrow(Wisconsin1_Breast_cancer)/2)


Breast_cancer_training_data <- Wisconsin1_Breast_cancer[Wisconsin_Breast_cancer_rows, ]

Breast_cancer_testing_data <- Wisconsin1_Breast_cancer[-Wisconsin_Breast_cancer_rows, ]
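The split above is a simple random 50/50 split, so the benign/malignant mix can differ between the two halves. The class counts reported for the training set later on (190 benign, 94 malignant) leave 166 benign and 118 malignant tumors for testing. If you would rather have both halves keep roughly the overall 356:212 ratio, one option is a stratified split. Below is a minimal base R sketch of that idea; stratified_split is a hypothetical helper (not from any package), and the toy label vector only reuses the class counts from the table above.

```r
# Hypothetical helper: sample the same fraction of rows from each class
stratified_split <- function(labels, train_fraction = 0.5) {
  # Row indices grouped by class label
  rows_by_class <- split(seq_along(labels), labels)
  # Within each class, draw round(train_fraction * n_class) rows for training
  train_rows <- unlist(lapply(rows_by_class, function(rows) {
    rows[sample.int(length(rows), size = round(train_fraction * length(rows)))]
  }), use.names = FALSE)
  sort(train_rows)
}

set.seed(1)
# Toy labels with the same class counts as the full data set (356 B, 212 M)
toy_diagnosis <- factor(rep(c("B", "M"), times = c(356, 212)))

train_rows <- stratified_split(toy_diagnosis)
table(toy_diagnosis[train_rows])   # training set class counts
table(toy_diagnosis[-train_rows])  # testing set class counts
```

On the real data, the row indices would come from stratified_split(Wisconsin1_Breast_cancer$diagnosis), and both halves would then contain 178 benign and 106 malignant tumors.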

I think the most important concept to discuss is the hyperplane. The goal of the hyperplane is to create a divider between the classes in your data set (here, benign and malignant). If a data point lies on one side of this divider, it is classified as class A, and if it falls on the other side, it is classified as class B. In two dimensions, a hyperplane is simply a line, while in three dimensions it is a flat plane. With more than three dimensions it is harder to visualize, but its role as a divider between classes is the same. The best hyperplane is the one with the largest margin (i.e., distance) to the closest data points on either side; this is called the maximal margin hyperplane. Those closest data points are called support vectors: they determine the margin, which in turn determines the best hyperplane for classifying the data.

If no hyperplane can perfectly separate the classes, we can still almost separate them by using a soft margin (this is known as the support vector classifier). In this case, we allow some data points to fall on the wrong side of the margin (and even on the wrong side of the hyperplane) so that the SVM is not overly sensitive to individual observations and generalizes better to new data.
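To make the idea concrete, here is a minimal base R sketch with a made-up two-dimensional hyperplane (beta0 = -1, beta = (1, 1), i.e. the line x1 + x2 = 1) and four made-up points. It only evaluates a fixed hyperplane: the sign of f(x) = beta0 + beta . x gives the side, and |f(x)| / ||beta|| gives the distance. Fitting a maximal margin classifier would mean searching for the beta that makes the smallest of these distances as large as possible.

```r
# Made-up hyperplane in two dimensions: -1 + 1 * x1 + 1 * x2 = 0
beta0 <- -1
beta  <- c(1, 1)

# Four made-up points, one per row (x1, x2)
pts <- rbind(c(0, 0), c(0.2, 0.3), c(1, 1), c(2, 0.5))

# f(x) = beta0 + beta . x; its sign says which side of the hyperplane x is on
f_values <- as.vector(beta0 + pts %*% beta)
side <- ifelse(f_values > 0, "A", "B")

# Perpendicular distance from each point to the hyperplane
distances <- abs(f_values) / sqrt(sum(beta^2))

# The margin is the smallest of these distances; for the maximal margin
# hyperplane, the point(s) attaining it are the support vectors
margin  <- min(distances)
closest <- which(distances == margin)
```

Here the first two points fall on side B and the last two on side A, and the point (0.2, 0.3) is the one closest to the line.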

One of the many benefits of SVMs is that they can still classify data very accurately when the classes are not linearly separable. This is where the kernel function comes in: a kernel implicitly maps the data into a higher-dimensional space in which a flat hyperplane can separate the classes, and back in the original space that hyperplane corresponds to a curved (non-linear) decision boundary. The most commonly used kernels are the linear, polynomial, radial, and sigmoid kernels. Our first step is to find the kernel type that produces the most accurate boundary for separating benign and malignant tumors in our data.

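Roughly speaking, a linear kernel produces a straight (flat) decision boundary, a polynomial kernel produces a curved boundary, a radial kernel can produce closed, roughly circular regions around groups of points, and a sigmoid kernel is built from the S-shaped tanh function (the same function used as an activation in neural networks).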
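For reference, the e1071 documentation for svm() defines the four kernels as follows, for two observations u and v: linear u'v, polynomial (gamma u'v + coef0)^degree, radial exp(-gamma |u - v|^2), and sigmoid tanh(gamma u'v + coef0). The base R sketch below writes those formulas out as functions (the function names are my own, not part of e1071) to show that each kernel is just a similarity score between two observations. The defaults (gamma = 1 / number of predictors, coef0 = 0, degree = 3) mirror e1071's documented defaults; note that e1071 also scales each predictor by default (scale = TRUE) before applying the kernel, which this sketch does not do.

```r
# Kernel functions written out by hand (names are illustrative, not from e1071).
# u and v are numeric vectors of the same length (two observations).
linear_kernel <- function(u, v) sum(u * v)

polynomial_kernel <- function(u, v, gamma = 1 / length(u), coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}

radial_kernel <- function(u, v, gamma = 1 / length(u)) {
  exp(-gamma * sum((u - v)^2))
}

sigmoid_kernel <- function(u, v, gamma = 1 / length(u), coef0 = 0) {
  tanh(gamma * sum(u * v) + coef0)
}

u <- c(1, 2)
v <- c(3, 1)
c(linear     = linear_kernel(u, v),      # 1*3 + 2*1 = 5
  polynomial = polynomial_kernel(u, v),  # (0.5 * 5)^3 = 15.625
  radial     = radial_kernel(u, v),      # exp(-0.5 * (4 + 1)) = exp(-2.5)
  sigmoid    = sigmoid_kernel(u, v))     # tanh(0.5 * 5) = tanh(2.5)
```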

Another important piece of SVM is the support vector classifier itself, which classifies a new data point according to which side of the hyperplane it falls on. Once a kernel type has been selected, the classifier can be tuned further using cost (for every kernel) and gamma (for the non-linear kernels).
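Concretely, a fitted SVM classifies a new point x by the sign of f(x) = sum over the support vectors of alpha_i * y_i * K(x_i, x) + b, so only the support vectors matter for prediction. The base R sketch below evaluates that formula with a radial kernel; the support vectors, alpha weights, labels, and intercept are made-up numbers for illustration, not values from the breast cancer model, and treating +1 as malignant is an arbitrary choice. (In an e1071 fit, I believe the analogous pieces live in the fitted object's SV, coefs (alpha_i * y_i), and rho (the negative intercept) components, but check the svm() documentation before relying on that.)

```r
# Radial kernel between two observations
radial_kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

# Made-up support vectors (one per row), their labels (+1 / -1),
# their alpha weights, and an intercept b; none of these come from a real fit
support_vectors <- rbind(c(0, 0), c(1, 1), c(3, 3))
y_sv  <- c(-1, -1, 1)
alpha <- c(0.5, 0.5, 1)
b     <- 0

# f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
decision_value <- function(x, gamma = 0.5) {
  k <- apply(support_vectors, 1, radial_kernel, v = x, gamma = gamma)
  sum(alpha * y_sv * k) + b
}

# Classify by the sign of f(x); here +1 is arbitrarily treated as malignant
classify <- function(x) if (decision_value(x) > 0) "M" else "B"

classify(c(0.2, 0.1))  # near the two negative support vectors
classify(c(2.9, 3.2))  # near the positive support vector
```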

The first SVM we will look at is a linear SVM!

library(e1071)
linear_cancer_SVM <- svm(diagnosis~ . -id, data = Breast_cancer_training_data, kernel = 'linear')
linear_cancer_SVM

Call:
svm(formula = diagnosis ~ . - id, data = Breast_cancer_training_data, 
    kernel = "linear")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  1 

Number of Support Vectors:  26
table(Breast_cancer_training_data$diagnosis)

  B   M 
190  94 

Perfect! Now let's check the accuracy on the training data.

table(Prediction = predict(linear_cancer_SVM, Breast_cancer_training_data), Truth = Breast_cancer_training_data$diagnosis)
          Truth
Prediction   B   M
         B 189   2
         M   1  92

So this is great! Out of 284 samples, it mislabeled only 3 of them (about 98.9% accuracy on the training data). I think this will most likely be the model we fine-tune, but I want to look at the three other kernel types to make sure it is the best model before we move forward.

Now let's create a polynomial SVM and see what its accuracy is!

polynomial_cancer_SVM <- svm(diagnosis ~ .-id, data = Breast_cancer_training_data, kernel = 'polynomial')
polynomial_cancer_SVM

Call:
svm(formula = diagnosis ~ . - id, data = Breast_cancer_training_data, 
    kernel = "polynomial")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  polynomial 
       cost:  1 
     degree:  3 
     coef.0:  0 

Number of Support Vectors:  82
table(Prediction = predict(polynomial_cancer_SVM, Breast_cancer_training_data), Truth = Breast_cancer_training_data$diagnosis)
          Truth
Prediction   B   M
         B 190  20
         M   0  74

This model is much worse at correctly identifying malignant tumors: it misclassifies 20 malignant tumors as benign. We will most likely not be using this model.

Let’s look at a radial kernel now!

radial_cancer_SVM <- svm(diagnosis ~ .-id, data = Breast_cancer_training_data, kernel = 'radial')
radial_cancer_SVM

Call:
svm(formula = diagnosis ~ . - id, data = Breast_cancer_training_data, 
    kernel = "radial")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 

Number of Support Vectors:  76
table(Prediction = predict(radial_cancer_SVM, Breast_cancer_training_data), Truth = Breast_cancer_training_data$diagnosis)
          Truth
Prediction   B   M
         B 190   3
         M   0  91

While this model technically has the same accuracy as the linear SVM (3 errors out of 284), it misclassifies malignant tumors as benign more often than the linear model does (3 times versus 2), which I think is an issue. Misdiagnosing a malignant tumor as benign is more costly than misdiagnosing a benign tumor as malignant. Therefore, I still believe that the linear model is the best for our goals. However, let's look at the sigmoid SVM before fully coming to this decision.
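One way to make this trade-off explicit is to give each type of mistake a cost and add up the cost of each model's errors. The base R sketch below uses the linear and radial training confusion matrices shown above; the costs themselves (a missed malignant tumor counted as 5 times worse than a false alarm) are an arbitrary assumption chosen only for illustration.

```r
# Training confusion matrices from above (rows = Prediction, cols = Truth)
linear_cm <- matrix(c(189, 1, 2, 92), nrow = 2,
                    dimnames = list(Prediction = c("B", "M"), Truth = c("B", "M")))
radial_cm <- matrix(c(190, 0, 3, 91), nrow = 2,
                    dimnames = list(Prediction = c("B", "M"), Truth = c("B", "M")))

# Assumed misclassification costs: a false negative (malignant predicted as
# benign) is treated as 5 times as costly as a false positive
cost_fn <- 5
cost_fp <- 1

weighted_cost <- function(cm) {
  false_negatives <- cm["B", "M"]  # predicted benign, truly malignant
  false_positives <- cm["M", "B"]  # predicted malignant, truly benign
  cost_fn * false_negatives + cost_fp * false_positives
}

sapply(list(linear = linear_cm, radial = radial_cm), weighted_cost)
# linear: 5 * 2 + 1 = 11, radial: 5 * 3 + 0 = 15
```

With equal costs (cost_fn = cost_fp = 1) both models score 3, which matches the observation that their plain accuracies are identical; any cost_fn greater than 1 favors the linear model.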

sigmoid_cancer_SVM <- svm(diagnosis ~ .-id, data = Breast_cancer_training_data, kernel = 'sigmoid')
sigmoid_cancer_SVM

Call:
svm(formula = diagnosis ~ . - id, data = Breast_cancer_training_data, 
    kernel = "sigmoid")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  sigmoid 
       cost:  1 
     coef.0:  0 

Number of Support Vectors:  48
table(Prediction = predict(sigmoid_cancer_SVM, Breast_cancer_training_data), Truth = Breast_cancer_training_data$diagnosis)
          Truth
Prediction   B   M
         B 187  10
         M   3  84

This model is also noticeably less accurate than the linear model (13 errors, 10 of them malignant tumors predicted as benign), so we will move forward with the linear SVM. Because we are using a linear kernel, the only tuning parameter we will look at is cost. With the sigmoid, radial, and polynomial kernels you would tune gamma as well as cost, but the linear kernel does not use gamma, so we only need to tune cost.

Although we are not tuning gamma, I want to take a moment to explain what it does. For the radial kernel, gamma controls how far the influence of a single training point reaches, which in turn controls how flexible the shape of the decision boundary can be. Because a linear kernel always gives a flat boundary, there is nothing for gamma to adjust, so it does not apply. For non-linear kernels, however, gamma is a powerful tuning parameter. A low gamma tends to produce a smooth boundary that separates the data in general terms but may misclassify a few points. A high gamma, on the other hand, produces a boundary that wraps tightly around individual training points and minimizes misclassifications on the training data, but it can overfit the training data set. Gamma is therefore critical for accurate radial, sigmoid, and polynomial kernels.
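To see why gamma has this effect, the base R sketch below evaluates the radial kernel exp(-gamma * d^2) for a few distances d between two points. With a small gamma, even fairly distant points still count as similar, so many points shape the boundary and it stays smooth; with a large gamma, similarity drops to almost zero very quickly, so each training point only influences its immediate neighborhood. The distances and gamma values are arbitrary; 1/30 is included because, if I read e1071's default (1 divided by the number of predictors) correctly, it is the default gamma for this data set's 30 measurement columns.

```r
# Radial kernel as a function of the distance d between two observations
rbf_similarity <- function(d, gamma) exp(-gamma * d^2)

distances <- c(0.5, 1, 2, 4)
gammas <- c(low = 0.01, default_for_30_predictors = 1 / 30, high = 1)

# Rows = gamma values, columns = distances
similarity <- outer(gammas, distances, function(g, d) rbf_similarity(d, g))
dimnames(similarity) <- list(gamma = names(gammas), distance = as.character(distances))
round(similarity, 3)
```

Reading across a row shows how quickly a point's influence fades with distance for that gamma.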

Cost controls how much the model cares about misclassifying data points. When cost is high, misclassification is heavily penalized, which can lead to overfitting the training data set; the resulting hyperplanes tend to have small margins and can be more complex. When cost is low, the model is allowed to make a few misclassifications and tends to be simpler, with a wider margin. However, if the cost is too low, the model can underfit the data set and be less accurate overall.
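The role of cost is easiest to see in the soft-margin objective that libsvm (the library e1071 wraps) minimizes for C-classification: one half of the squared length of w, plus cost times the sum of the slack values, where a point's slack measures how far it sits on the wrong side of its margin. The base R sketch below evaluates that objective in its equivalent hinge-loss form for two made-up linear boundaries on made-up 1-D data: a wide-margin boundary that misclassifies one point, and a narrow-margin boundary that keeps every point beyond its margin.

```r
# Made-up 1-D training data: labels are -1 (benign-like) or +1 (malignant-like);
# the point at x = -0.5 is a +1 sitting close to the -1 group
x <- c(-3, -2, -0.5, 2, 3)
y <- c(-1, -1, 1, 1, 1)

# Soft-margin objective for a linear boundary f(x) = w * x + b:
#   0.5 * w^2 + cost * sum(slack),  slack_i = max(0, 1 - y_i * f(x_i))
soft_margin_objective <- function(w, b, cost) {
  slack <- pmax(0, 1 - y * (w * x + b))
  0.5 * w^2 + cost * sum(slack)
}

# Two made-up candidate boundaries
wide   <- c(w = 0.5, b = 0)  # wide margin, but x = -0.5 ends up misclassified
strict <- c(w = 2,   b = 2)  # narrow margin, every point beyond its margin

for (cost in c(0.1, 10)) {
  scores <- c(wide   = soft_margin_objective(wide[["w"]], wide[["b"]], cost),
              strict = soft_margin_objective(strict[["w"]], strict[["b"]], cost))
  cat("cost =", cost, "-> preferred boundary:", names(which.min(scores)), "\n")
}
```

This prints "wide" for cost = 0.1 (objective 0.25 versus 2) and "strict" for cost = 10 (12.625 versus 2): raising the cost makes the model willing to give up margin width to avoid misclassifying training points.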

Fortunately for us, there is a tuning function, tune.svm(), that fits the model over a range of cost values and uses cross-validation to find the best one.

tuned_linear_svm <- tune.svm(diagnosis ~ .-id, data = Breast_cancer_training_data, kernel = 'linear',  cost = 2^seq(-6, 4, 2))
tuned_linear_svm

Parameter tuning of 'svm':

- sampling method: 10-fold cross validation 

- best parameters:
 cost
 0.25

- best performance: 0.02438424 
tuned_linear_svm$best.parameters
  cost
3 0.25

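So it appears the best cost value for this model is 0.25. The "best performance" of 0.0244 is the average misclassification error across the 10 cross-validation folds, which corresponds to roughly 97.6% cross-validated accuracy.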
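The cost values tried above come from 2^seq(-6, 4, 2). The short base R sketch below lists them and also explains the row label 3 in the best.parameters output: tune.svm() appears to lay out its candidate values as rows of a grid, the way expand.grid() would, and 0.25 is the third candidate.

```r
# The cost values passed to tune.svm() above:
# 2^-6, 2^-4, 2^-2, 2^0, 2^2, 2^4 = 0.015625, 0.0625, 0.25, 1, 4, 16
cost_grid <- 2^seq(-6, 4, 2)

# One row per candidate value, as in the best.parameters output
candidates <- expand.grid(cost = cost_grid)
candidates[3, "cost"]  # 0.25
```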

cancer_linear_cost_svm <- svm(diagnosis ~ .-id, data = Breast_cancer_training_data, kernel = 'linear', cost = 0.25  )
table(Prediction = predict(cancer_linear_cost_svm, Breast_cancer_training_data), Truth = Breast_cancer_training_data$diagnosis)
          Truth
Prediction   B   M
         B 189   2
         M   1  92

Okay, so this is what I expected for the linear SVM. I am going to experiment for a minute and see if we can get better accuracy using a radial kernel. Radial, sigmoid, and polynomial kernels can improve substantially once they are tuned, so I am hoping that will be the case once the radial SVM is tuned.

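tuned_radial_svm <- tune.svm(diagnosis ~ . - id,
                             data = Breast_cancer_training_data,
                             kernel = 'radial',
                             # 1/2^nrow(...) = 1/2^284 is effectively 0, so this
                             # grid is roughly 0, 0.01, 0.02, ..., 1 (101 values)
                             gamma = seq(1/2^nrow(Breast_cancer_training_data), 1, by = 0.01),
                             cost = 2^seq(-6, 4, 2))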

tuned_radial_svm$best.parameters
    gamma cost
406  0.01    4
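The row label 406 above can be decoded the same way. Assuming tune.svm() builds its grid like expand.grid() with gamma varying fastest (which is consistent with the output shown), the 101 gamma values times the 6 cost values give 606 candidate models, and row 406 is the second gamma value (0.01) paired with the fifth cost value (4). The sketch hard-codes 284 for nrow(Breast_cancer_training_data), taken from the 190 + 94 training counts above.

```r
# Rebuild the tuning grid from above (284 = number of training rows)
gamma_grid <- seq(1/2^284, 1, by = 0.01)
cost_grid  <- 2^seq(-6, 4, 2)
grid <- expand.grid(gamma = gamma_grid, cost = cost_grid)

length(gamma_grid)   # 101 gamma values
nrow(grid)           # 101 * 6 = 606 candidate models
grid[406, ]          # gamma = 0.01, cost = 4
gamma_grid[1]        # about 3e-86, i.e. effectively zero
```

Since a gamma of essentially zero makes every pair of points look identical to the radial kernel, starting the grid at 0.01 would lose nothing useful.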
radial_cost_gamma_svm <- svm(diagnosis ~ .-id, 
                               data=Breast_cancer_training_data,
                               kernel = 'radial',
                             gamma =0.01,
                             cost = 4)
table(Prediction = predict(radial_cost_gamma_svm, Breast_cancer_training_data), Truth = Breast_cancer_training_data$diagnosis)
          Truth
Prediction   B   M
         B 189   3
         M   1  91

Okay, so this is relatively good, but I think we can make it even better by weighting the classes so that misclassifying a malignant tumor as benign is penalized more than misclassifying a benign tumor as malignant. I would love to get our predictions to a place where all malignant tumors are properly classified. Currently, the training data has about twice as many benign tumors as malignant tumors (190 versus 94), which can bias the model toward predicting the majority class, benign. By adding class weights that put more emphasis on the malignant tumors, we may be able to improve this model.
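As I understand it, e1071's class.weights argument is passed down to libsvm, which multiplies the cost parameter for each class by that class's weight, so with cost = 4 and weights2 a misclassified malignant training point is penalized as if the cost were 12. A common heuristic for choosing weights, instead of picking 2 or 3 by hand, is to make them inversely proportional to the class frequencies. The base R sketch below computes both from the training counts above; the inverse-frequency formula n / (k * n_class) is a general convention, not something this analysis used.

```r
# Training class counts from the table above
class_counts <- c(B = 190, M = 94)
base_cost <- 4
weights2 <- c(B = 1, M = 3)

# libsvm scales the cost for each class by that class's weight
effective_cost <- base_cost * weights2
effective_cost   # B = 4, M = 12

# Inverse-frequency weights: n / (number of classes * n_class)
n <- sum(class_counts)
inverse_weights <- n / (length(class_counts) * class_counts)
round(inverse_weights, 3)                                   # B about 0.747, M about 1.511
round(inverse_weights[["M"]] / inverse_weights[["B"]], 2)   # ratio about 2.02
```

With these weights the malignant class would get roughly twice the weight of the benign class, which is close to weights1 (M = 2).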

weights1 <- c(B = 1, M = 2) 
weights2 <- c(B = 1, M = 3) 
weighted_radial_cost_gamma_svm <- svm(diagnosis ~ . - id, 
                      data = Breast_cancer_training_data,
                      kernel = "radial",
                       gamma =0.01,
                      cost = 4,
                      class.weights = weights2)
table(Prediction = predict(weighted_radial_cost_gamma_svm, Breast_cancer_training_data), Truth = Breast_cancer_training_data$diagnosis)
          Truth
Prediction   B   M
         B 188   2
         M   2  92

Okay, okay, okay, this is a bit better: the weighted model misclassifies only 2 malignant tumors as benign instead of 3, although it now misclassifies 2 benign tumors as malignant instead of 1. I am now going to run this model on the testing data as a final check to see whether it is simply overfitting or whether it is truly accurate.

table(Prediction = predict(weighted_radial_cost_gamma_svm, Breast_cancer_testing_data), Truth = Breast_cancer_testing_data$diagnosis)
          Truth
Prediction   B   M
         B 163   9
         M   3 109

Okay, so I may have overfitted. I am going to go back to weights1 and see if the accuracy on the testing data is better.

          Truth
Prediction   B   M
         B 164  10
         M   2 108

Okay, so this is slightly worse (10 malignant tumors misclassified as benign instead of 9), but not drastically different from using weights2. I am going to try the linear SVM to see if it does slightly better. If it is more accurate than the weighted radial SVM, then I will see whether adding weights to the linear SVM makes it even more accurate.

table(Prediction = predict(cancer_linear_cost_svm, Breast_cancer_testing_data), Truth = Breast_cancer_testing_data$diagnosis)
          Truth
Prediction   B   M
         B 163   8
         M   3 110

The linear SVM does appear to be more accurate than the weighted radial model on the testing data (11 errors versus 12, with 8 malignant tumors misclassified as benign versus 9), so I am going to add weights to the linear SVM to try to make it better.

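I am using a cost of 0.0625 instead of 0.25 here. Although the tuned SVM recommends 0.25 in the rendered output above, an earlier run recommended 0.0625; tune.svm() assigns the cross-validation folds at random, so its recommendation can change between runs. With weights, 0.0625 also gave better accuracy than 0.25 on the testing data, although choosing the cost by looking at testing accuracy makes the final accuracy estimate somewhat optimistic.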
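A minimal base R sketch of why the recommendation moves around: 10-fold cross-validation randomly assigns each training row to a fold, so a different random-number state gives different folds (and potentially a different best cost). Calling set.seed() immediately before tune.svm() should make the folds, and therefore the recommended cost, reproducible. The helper assign_folds below is hypothetical and only mimics the idea of random fold assignment; it is not e1071's internal code.

```r
# Hypothetical helper: randomly assign n rows to k folds of (nearly) equal size
assign_folds <- function(n, k = 10) sample(rep(seq_len(k), length.out = n))

n_train <- 284  # number of training rows in this analysis

set.seed(123)
folds_a <- assign_folds(n_train)
set.seed(123)
folds_b <- assign_folds(n_train)
set.seed(456)
folds_c <- assign_folds(n_train)

identical(folds_a, folds_b)  # TRUE: same seed, same folds
identical(folds_a, folds_c)  # different seed, (almost certainly) different folds
```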

weights1 <- c(B = 1, M = 2) 
weights2 <- c(B = 1, M = 3) 
cancer_linear_cost_weight_svm <- svm(diagnosis ~ .-id, 
                              data = Breast_cancer_training_data, 
                              kernel = 'linear', 
                              cost = 0.0625,
                              class.weights = weights2)
table(Prediction = predict(cancer_linear_cost_weight_svm, Breast_cancer_testing_data), Truth = Breast_cancer_testing_data$diagnosis)
          Truth
Prediction   B   M
         B 161   3
         M   5 115
With weights1 in place of weights2, the testing confusion matrix is:

          Truth
Prediction   B   M
         B 162   6
         M   4 112

With weights1, the number of malignant tumors misdiagnosed as benign does decrease (from 8 with the unweighted linear SVM to 6). I am going to try weights2 to see if putting even more weight on the malignant tumors decreases the number of mislabeled malignant tumors further.

          Truth
Prediction   B   M
         B 161   3
         M   5 115

Yes! This is exactly what I was hoping for, and I am happy with this accuracy. We now have a fairly accurate SVM model that can differentiate between benign and malignant tumors! Of its 8 errors on the testing data, 5 (about 63%) are misclassifications of a benign tumor as malignant, which in this situation is less harmful than misclassifying a malignant tumor as benign. Overall, the model makes an error only 2.8% of the time (8 out of 284 testing samples), which means it is accurate 97.2% of the time!
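The percentages above can be checked directly from the final testing confusion matrix (weights2, cost = 0.0625). The base R sketch below recomputes them from the counts shown above and adds sensitivity (the share of malignant tumors caught) and specificity (the share of benign tumors correctly cleared), two metrics commonly reported for diagnostic models.

```r
# Final testing confusion matrix from above (rows = Prediction, cols = Truth)
final_cm <- matrix(c(161, 5, 3, 115), nrow = 2,
                   dimnames = list(Prediction = c("B", "M"), Truth = c("B", "M")))

total           <- sum(final_cm)                 # 284 testing samples
errors          <- total - sum(diag(final_cm))   # 8 misclassifications
false_positives <- final_cm["M", "B"]            # benign predicted as malignant: 5
false_negatives <- final_cm["B", "M"]            # malignant predicted as benign: 3

accuracy    <- sum(diag(final_cm)) / total               # 276 / 284, about 0.972
error_rate  <- errors / total                             # about 0.028
share_fp    <- false_positives / errors                   # 5 / 8 = 0.625
sensitivity <- final_cm["M", "M"] / sum(final_cm[, "M"])  # 115 / 118, about 0.975
specificity <- final_cm["B", "B"] / sum(final_cm[, "B"])  # 161 / 166, about 0.970

round(c(accuracy = accuracy, error_rate = error_rate, share_fp = share_fp,
        sensitivity = sensitivity, specificity = specificity), 3)
```

Sensitivity is arguably the number to watch for this problem, since it measures how many malignant tumors the model catches: 3 of the 118 malignant tumors in the testing set are still missed.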