SVM-Assignment---Fall-2023.knit

title: "SVM Assignment"
author: "Paul Brown"
date: "2023-01-05"
output: html_document

## Part 1:
## 1. What is data mining?  Explain.
##    • Data mining is the process of automatically discovering useful information in large data repositories.

## 2. What are support vectors?  Explain.

##    • Support vectors are the observations that lie at the edge of an area in space which forms the boundary between one class of observations (e.g., the squares) and another class of observations (e.g., the circles). An SVM searches for these observations.

## 3. State what support vectors are used for.

##    • Support vectors are used to identify a hyperplane (a straight line in two dimensions) that separates the classes.

## 4. What do we mean when we say that the data is linearly separable?

##    • The data is linearly separable when the classes can be completely separated by a single straight line (a hyperplane in higher dimensions). In that case we can find the hyperplane that maximizes the margin.

## 5. List five model builders.

##   •  Building a model
##      o   Understand the business problem (and define success)
##      o   Understand and identify data
##      o   Collect and prepare data
##      o   Determine the model's features and train it
##      o   Evaluate the model's performance and establish benchmarks.

##   •  The stages of model building are: split the data into three sets ('training data', 'validation data' and 'testing data'); train the classifier on the training set; tune the parameters using the validation set; and then test the performance of the classifier on the unseen test set.


## 6. Support vector machines (SVM) are unsupervised learning algorithms.  TRUE or FALSE?
##    o False. SVMs are supervised learning algorithms: they are trained on labeled observations.

## 7. SVMs are sensitive to the choice of ______________  ________________.
##    • tuning options, which makes them harder to use and makes it time-consuming to identify the best model.

## 8. The SVM model only deals with ____________ _______________.
##    • support vectors: the model works with these support vectors rather than the whole training dataset.

## 9. What is a binary classification model?
##    o A model that assigns each observation to one of two classes. When the two classes can be separated by a linear model, the data is linearly separable.

## 10. SVMs often perform a non-linear mapping of the original data into a very high dimensional space where the classes can be separated linearly by a hyperplane.  What method do we use to find solutions for this problem at a lower dimension?

##    • Kernel method

## 11. SVMs solve the problem of linearly separable binary classification by doing what?

##     •    by finding the maximum margin hyperplane
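
## The maximum margin idea can be illustrated in one dimension, where the "hyperplane" is a single threshold. The sketch below is a plain-Python illustration and not the R code from the course listings; the function name and the sample values are hypothetical. It assumes every value of the negative class lies below every value of the positive class.

```python
def max_margin_threshold(neg_values, pos_values):
    """1-D maximum-margin separator.

    Assumes all neg values lie below all pos values (linearly separable).
    Returns the threshold, the margin, and the two support vectors.
    """
    left = max(neg_values)    # neg observation closest to the boundary
    right = min(pos_values)   # pos observation closest to the boundary
    if left >= right:
        raise ValueError("classes are not linearly separable in 1-D")
    threshold = (left + right) / 2   # midpoint maximizes the distance to both
    margin = (right - left) / 2      # distance from threshold to each support vector
    return threshold, margin, (left, right)


# Made-up data for illustration only.
threshold, margin, support = max_margin_threshold([1.0, 2.0, 3.5], [6.5, 8.0, 9.0])
print(threshold, margin, support)
```

## Only the two observations nearest the boundary determine the result; moving any other point (without crossing them) leaves the threshold unchanged, which is why the model depends on the support vectors alone.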

## 12. What will the SVMs do when the cases are not linearly separable?

##   •  They map the data into a higher dimension where linear separation is feasible.

## 13. The relation,
##     $K(x, z) = \phi(x) \cdot \phi(z)$
##     is associated with what method that is used to find solutions?  Why is it important?

##    • It is associated with the kernel method. It is important because the kernel function $K$ returns the dot product of the mapped points $\phi(x)$ and $\phi(z)$ directly from the original inputs $x$ and $z$, so we get the same results without ever computing the high-dimensional mapping explicitly.
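
## The relation in question 13 can be checked numerically for one simple kernel. The sketch below is plain Python, not code from the course listings. It assumes a degree-2 polynomial kernel on 2-D inputs, for which an explicit feature map is known; the kernel choice, the function names, and the sample points are all illustrative assumptions.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(v):
    """Explicit feature map for the kernel (x . z)^2 on 2-D inputs."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed entirely in the original 2-D space."""
    return dot(x, z) ** 2


# Made-up points for illustration only.
x = (1.0, 2.0)
z = (3.0, 0.5)

lhs = poly_kernel(x, z)      # kernel evaluated on the original inputs
rhs = dot(phi(x), phi(z))    # dot product after mapping to 3-D
print(lhs, rhs)              # the two agree up to floating-point rounding
```

## The kernel side needs one 2-D dot product and a square, while the explicit side has to build both 3-D vectors first; for kernels whose feature space is very large, only the kernel side stays practical.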

## Part 2:  This part of the assignment is based on the six algorithms sent to you via email

## 1. In Listing 12.16:  Example of Naïve Bayes algorithm for classification in caret, 

## a. Explain the trainControl( ) function

##    o trainControl() generates parameters that further control how models are created, for example the resampling scheme, which is set with its method argument.

## b. Explain the argument metric = "Accuracy"

##    o metric = "Accuracy" tells train() to summarize and select models by accuracy, the proportion of predictions that are correct. Accuracy is a metric that generally describes how the model performs across all classes. It is useful when all classes are of equal importance. It is a common evaluation metric in classification problems.

## 2. In Listing 12.17:  Example of SVM algorithm for classification,

##    a. Explain the argument kernel = "rbfdot"

##    o kernel = "rbfdot" selects the radial basis function (Gaussian) kernel. RBF kernel support vector machines store only the support vectors found during training, not the entire dataset. The kernel generating functions are used to initialize a kernel function which calculates the dot (inner) product between two feature vectors.

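##  b. Explain what is happening in the argument PimaIndiansDiabetes[, 1:8], and explain the indexing

##   o  The empty position before the comma selects all the rows of the data set, and 1:8 selects columns 1 through 8, that is, the eight input columns without the diabetes class column.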

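## c. Explain and draw conclusions about the output
##    predictions  neg  pos
##        neg      462   98
##        pos       38  170

##    o Assuming the rows are the predictions and the columns are the actual classes, accuracy for predictions was 82.3% ((462 + 170) / 768), so the error rate was 17.7%.
##         True negatives: 462
##         True positives: 170
##         False positives: 38
##         False negatives: 98
##      The model misses more actual positives (98) than it raises false alarms (38), so it recognizes negative cases better than positive ones.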
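
## The figures for the table above can be recomputed with a few lines of arithmetic. This is a plain-Python sketch rather than the R listing, and it assumes the rows of the table are the predictions and the columns are the actual classes; the variable names are illustrative.

```python
# Counts copied from the table above.
# Assumed layout: rows = predicted class, columns = actual class.
tn = 462  # predicted neg, actually neg
fn = 98   # predicted neg, actually pos
fp = 38   # predicted pos, actually neg
tp = 170  # predicted pos, actually pos

total = tn + fn + fp + tp
accuracy = (tp + tn) / total       # share of all predictions that are correct
error_rate = 1 - accuracy
sensitivity = tp / (tp + fn)       # share of actual positives that were found
specificity = tn / (tn + fp)       # share of actual negatives that were found

print(f"total={total} accuracy={accuracy:.3f} error={error_rate:.3f}")
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

## Under that layout the model finds about 63% of the actual positives but about 92% of the actual negatives, a gap that the overall accuracy of 82.3% does not show on its own.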


## 3. In Listing 12.18:  Explain the SVM algorithm for regression.  Explain and draw conclusions about the output as well.
##     •    The formula medv~. sets medv as the dependent variable and uses all the other columns as predictors. Because medv is numeric, ksvm fits a regression model rather than a classifier. kernel="rbfdot" selects the radial basis kernel, whose hyperparameter sigma is estimated from the data when no value is supplied. (For classification problems, the ksvm function can return class probabilities by setting the type parameter to "probabilities".)

## Hyperparameters are used to achieve better performance on particular datasets.  Sigma is the inverse kernel width of the radial basis kernel function and is equal to 0.11033188164774. The training error was 0.089378, which is good, but we may still be able to improve it. The mean squared error was 7.560205 for this problem, which appears to be a fairly good value; it is the average squared difference between the predicted and the actual values of medv.
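
## Mean squared error is easier to judge after taking its square root, which puts it back in the units of medv. The sketch below is plain Python, not the R listing; the helper name and the three-row toy data are made up for illustration, and only the value 7.560205 comes from the output discussed above.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy data (hypothetical): errors of 1, 1 and 2 give an MSE of (1 + 1 + 4) / 3.
actual = [24.0, 21.0, 34.0]
predicted = [25.0, 20.0, 32.0]
toy_mse = mean_squared_error(actual, predicted)

# MSE reported in the output discussed above.
reported_mse = 7.560205
rmse = math.sqrt(reported_mse)   # typical prediction error in units of medv

print(toy_mse, round(rmse, 2))
```

## The square root of 7.560205 is roughly 2.75, so a typical prediction is off by about 2.75 units of medv.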

## 4. In Listing 12.19:  Example of SVM algorithm for classification in caret,

##      a. Explain the arguments method = "cv" and number = 5
##        • 5-fold CV means dividing the training dataset randomly into 5 parts and then using each of the 5 parts in turn as the testing dataset for a model trained on the other 4. We take the average of the 5 error terms thus obtained.
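
## The splitting step of 5-fold cross-validation can be sketched as follows. This is a plain-Python illustration and not what caret runs internally; the function name, the seed, and the row count of 768 (the total of the table in question 2c) are assumptions made for the example.

```python
import random


def k_fold_indices(n_rows, k=5, seed=7):
    """Return k (train_idx, test_idx) pairs covering rows 0..n_rows-1.

    Each row lands in exactly one test fold; the other k-1 folds form
    the training set for that round.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k nearly equal parts
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


splits = k_fold_indices(768, k=5)
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

## A model would be fitted on each train_idx and scored on the matching test_idx, and the five scores would then be averaged into the single figure reported for each candidate setting.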

##     b. Explain and draw conclusions about the output
##        • Of the cost values 0.25, 0.50 and 1.00, C = 0.50 gives the highest cross-validated accuracy, that is, the lowest misclassification rate on the held-out folds.
##        • C = 0.50 also has the highest kappa, although the differences between the three values are small (less than one percentage point of accuracy).
##          C     Accuracy   Kappa
##          0.25  0.7604278  0.4360310
##          0.50  0.7656056  0.4552142
##          1.00  0.7590952  0.4409422

## 5. In Listing 12.20:  Example of SVM algorithm for regression in caret, 
##      a. Explain the arguments method = "svmRadial" and metric = "RMSE"
##         •    method = "svmRadial" tells train() to fit a support vector machine with a radial basis function kernel. metric = "RMSE" tells it to compare the candidate models by root mean squared error.
##      b. Explain and draw conclusions about the output
##         •    Tuning parameter 'sigma' was held constant at a value of 0.1057462. RMSE was used to select the optimal model using the smallest value. The final values used for the model were sigma = 0.1057462 and C = 1, which had the lowest RMSE (3.8).

## 6. In Listing 12.4.4:  Classification and Regression Trees,

##     a. Explain the following argument: diabetes~.
##        • diabetes is the dependent variable, the class indicating whether or not a patient has diabetes. The ~. means that all the other columns are used as predictors.

##     b. Explain the argument type = "class"

##         •    type = "class" requests predicted class labels (classification) rather than class probabilities.