Data609

EX 1. Use the svm() algorithm of the e1071 package to carry out the support vector machine for the PlantGrowth data set. Then, discuss the number of support vectors/samples.

# Load the required library and data
library(e1071)
data(PlantGrowth)


# Fit a support vector machine model
svm_model <- svm(formula = group ~ ., data = PlantGrowth)

# Print the model summary
summary(svm_model)

## 
## Call:
## svm(formula = group ~ ., data = PlantGrowth)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  29
## 
##  ( 10 9 10 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  ctrl trt1 trt2

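The output shows that the SVM model is a C-classification model with a radial kernel and the cost parameter set to 1. The model selected 29 observations as support vectors (10, 9, and 10 from ctrl, trt1, and trt2, respectively); these are the observations that define the decision boundaries. Because PlantGrowth contains only 30 observations, nearly every observation is a support vector, which suggests that the three groups overlap considerably on the single predictor, weight. There are three classes (ctrl, trt1, trt2) in the response variable “group”.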
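The support vector counts discussed above can also be read directly from the fitted object rather than from the printed summary. The following is a minimal sketch, assuming the `nSV`, `tot.nSV`, `labels`, `levels`, and `index` components that e1071 stores on an `svm` fit; the variable names `sv_per_class`, `sv_fraction`, and `non_sv_rows` are illustrative and not part of the original analysis.

```r
library(e1071)
data(PlantGrowth)

# Refit the same default model used in the exercise
svm_model <- svm(group ~ ., data = PlantGrowth)

# Support vectors per class; labels maps libsvm's class order to factor levels
sv_per_class <- setNames(svm_model$nSV, svm_model$levels[svm_model$labels])
print(sv_per_class)

# Share of the training observations retained as support vectors
sv_fraction <- svm_model$tot.nSV / nrow(PlantGrowth)
cat("Fraction of observations used as support vectors:",
    round(sv_fraction, 3), "\n")

# Rows of the training data that are NOT support vectors
non_sv_rows <- setdiff(seq_len(nrow(PlantGrowth)), svm_model$index)
print(PlantGrowth[non_sv_rows, ])
```

A fraction close to 1 is one rough indication that the classes are hard to separate with the available predictor.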

EX 2. Do a similar SVM analysis as that in the previous question using the iris data set. Discuss the number of support vectors/samples.

# Load the required library and data
library(e1071)
data(iris)

# Fit an SVM model
svm_model <- svm(formula = Species ~ ., data = iris)

# Print the summary of the SVM model
summary(svm_model)

## 
## Call:
## svm(formula = Species ~ ., data = iris)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  51
## 
##  ( 8 22 21 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  setosa versicolor virginica

# Get the number of support vectors
n_sv <- sum(svm_model$nSV)
cat("The number of support vectors is", n_sv, "\n")

## The number of support vectors is 51

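The output shows that the SVM model is again a C-classification model with a radial kernel and the cost parameter set to 1. The model selected 51 of the 150 observations as support vectors: 8 from setosa, 22 from versicolor, and 21 from virginica. The small count for setosa is consistent with that species being well separated from the other two, while versicolor and virginica overlap more and need more support vectors to define their boundary.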
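The count of support vectors is tied to the model settings, including the cost parameter reported in the summary. As an exploratory sketch only, the loop below refits the same model with a few cost values and records the total number of support vectors for each. The grid `costs` and the name `n_sv_by_cost` are assumptions chosen for illustration and were not part of the original exercise.

```r
library(e1071)
data(iris)

# Hypothetical grid of cost values; cost = 1 is the default used above
costs <- c(0.1, 1, 10, 100)

# Refit the radial-kernel SVM for each cost and record the total SV count
n_sv_by_cost <- sapply(costs, function(cost_value) {
  fit <- svm(Species ~ ., data = iris, cost = cost_value)
  fit$tot.nSV
})
names(n_sv_by_cost) <- costs
print(n_sv_by_cost)
```

Comparing these counts shows how strongly the number of support vectors depends on the chosen cost for this data set.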

EX 3. Use the iris data set to select 80% of the samples for training svm(), then use the remaining 20% for validation. Discuss your results.

# Load the required library and data
library(e1071)
data(iris)

# Set a seed for reproducibility
set.seed(123)

# Randomly split the dataset into 80% training and 20% testing subsets
train_index <- sample(nrow(iris), 0.8 * nrow(iris))
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

# Train an SVM model on the training subset
svm_model <- svm(formula = Species ~ ., data = train_data)

# Predict the class labels of the test subset using the trained SVM model
predicted_labels <- predict(svm_model, test_data)

# Calculate the accuracy of the model on the test subset
accuracy <- mean(predicted_labels == test_data$Species)
cat("The accuracy of the SVM model on the test subset is", round(accuracy * 100, 2), "%\n")

## The accuracy of the SVM model on the test subset is 96.67 %

In the above code, we first set a seed for reproducibility and then randomly split the iris dataset into 80% training and 20% testing subsets using the sample() function. We then trained an SVM model on the training subset using the svm() function and predicted the class labels of the test subset using the predict() function. Finally, we calculated the accuracy of the model on the test subset.

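From the output, we can see that the SVM model achieved an accuracy of 96.67% on the test subset; that is, 29 of the 30 held-out observations were classified correctly. This is a high accuracy, indicating that the model performs well on the test data, although the estimate comes from a single random split with a small test set and would vary with a different seed.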
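A single accuracy figure does not show which species were confused. The sketch below repeats the same split and fit, then cross-tabulates predicted against actual labels with base R's `table()`. The names `confusion` and `recall` are illustrative additions, not part of the original code.

```r
library(e1071)
data(iris)

# Same seed and 80/20 split as in the exercise
set.seed(123)
train_index <- sample(nrow(iris), 0.8 * nrow(iris))
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

svm_model <- svm(Species ~ ., data = train_data)
predicted_labels <- predict(svm_model, test_data)

# Confusion matrix: rows are predictions, columns are true species
confusion <- table(Predicted = predicted_labels, Actual = test_data$Species)
print(confusion)

# Overall accuracy is the share of counts on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)

# Per-class recall: correct predictions divided by true class size
recall <- diag(confusion) / colSums(confusion)
print(round(recall, 3))
```

Off-diagonal cells identify the specific pairs of species the model mixes up, which the overall accuracy alone hides.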

Data609_HW7

Zhenni Xie

2023-05-13
