Machine Learning - HW9
Author: Jay Liao (ID: RE6094028)
Load the required packages
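The original setup chunk is not shown; the packages below are inferred from the functions used later in this report (an assumption about the author's setup):
# Assumed packages: ISLR supplies the Auto data, e1071 supplies svm(),
# caret supplies train()/dummyVars()/postResample(), the tidyverse supplies
# tibbles and the pipe, and ggthemes supplies theme_tufte().
# A parallel backend (e.g. doParallel) would additionally be needed for
# trainControl(allowParallel = TRUE) to have any effect.
library(ISLR)
library(e1071)
library(caret)
library(tidyverse)
library(ggthemes)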
Exercise 9.5
We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
Exercise 9.5 - (a)
Generate a data set with \(n = 500\) and \(p = 2\), such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:
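The generating code itself is not shown above; the textbook's suggested recipe is reproduced here, with a seed added for reproducibility (the seed value is an assumption):
set.seed(4028)                  # assumed seed; the original value is not shown
x1 <- runif(500) - 0.5          # X1 ~ Uniform(-0.5, 0.5)
x2 <- runif(500) - 0.5          # X2 ~ Uniform(-0.5, 0.5)
y  <- 1 * (x1^2 - x2^2 > 0)     # class 1 when X1^2 - X2^2 > 0, class 0 otherwise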
Exercise 9.5 - (b)
Plot the observations, colored according to their class labels. Your plot should display \(X_1\) on the x-axis, and \(X_2\) on the y-axis.
par(pty = 's')
plot(x1[y == 0], x2[y == 0], col = 'darkorange2', pch = 19,
xlab = 'X1', ylab = 'X2')
points(x1[y == 1], x2[y == 1], col = 'cornflowerblue', pch = 19)
The boundary between the two classes appears to be \(X_1^2 - X_2^2 = 0\), i.e. \(X_1 = \pm X_2\): two straight lines rather than a smooth quadratic curve.
Exercise 9.5 - (c)
Fit a logistic regression model to the data, using \(X_1\) and \(X_2\) as predictors.
dta <- data.frame(x1 = x1, x2 = x2, y = factor(y))
LR_fit <- glm(y ~ ., data = dta, family = binomial('logit'))
summary(LR_fit)
Call:
glm(formula = y ~ ., family = binomial("logit"), data = dta)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.234 -1.158 -1.113 1.184 1.237
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.01471 0.08954 -0.164 0.869
x1 0.06287 0.31440 0.200 0.841
x2 0.25593 0.31243 0.819 0.413
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 693.12 on 499 degrees of freedom
Residual deviance: 692.43 on 497 degrees of freedom
AIC: 698.43
Number of Fisher Scoring iterations: 3
Exercise 9.5 - (d)
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
LR_prob <- predict(LR_fit, newdata = dta, type = 'response')
LR_pred <- ifelse(LR_prob > 0.5, 1, 0)
par(pty = 's')
plot(dta$x1, dta$x2, col = LR_pred + 8, pch = 19)
The decision boundary is linear, and the classification is rather poor.
Exercise 9.5 - (e)
Now fit a logistic regression model to the data using non-linear functions of \(X_1\) and \(X_2\) as predictors (e.g. \(X_1^2\), \(X_1 \times X_2\), \(\log(X_2)\), and so forth).
NLR_fit <- glm(y ~ poly(x1, 2) + poly(x2, 2),
data = dta, family = binomial('logit'))
summary(NLR_fit)
Call:
glm(formula = y ~ poly(x1, 2) + poly(x2, 2), family = binomial("logit"),
data = dta)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.402e-03 -2.000e-08 -2.000e-08 2.000e-08 2.153e-03
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -71.05 1334.45 -0.053 0.958
poly(x1, 2)1 1498.46 28337.07 0.053 0.958
poly(x1, 2)2 64135.00 1141702.27 0.056 0.955
poly(x2, 2)1 -2740.83 50880.27 -0.054 0.957
poly(x2, 2)2 -60159.05 1071217.31 -0.056 0.955
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6.9312e+02 on 499 degrees of freedom
Residual deviance: 1.3207e-05 on 495 degrees of freedom
AIC: 10
Number of Fisher Scoring iterations: 25
None of the coefficients appears statistically significant, but this is an artifact of (quasi-)complete separation rather than evidence of no effect: the residual deviance is essentially zero, the fitted probabilities are pushed to 0 and 1, and the coefficient estimates and standard errors blow up, so the Wald z-tests are uninformative.
Exercise 9.5 - (f)
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
NLR_prob <- predict(NLR_fit, newdata = dta, type = 'response')
NLR_pred <- ifelse(NLR_prob > 0.5, 1, 0)
par(pty = 's')
plot(dta$x1, dta$x2, col = NLR_pred + 28, pch = 19)
The classification is clearly better than with the linear model.
Exercise 9.5 - (g)
Fit a support vector classifier to the data with \(X_1\) and \(X_2\) as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
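The code for this part is not shown; a minimal sketch with e1071::svm, where the cost value is an assumption:
# Support vector classifier (linear kernel); cost = 1 is an assumed value
SVC_fit  <- svm(y ~ x1 + x2, data = dta, kernel = 'linear', cost = 1)
SVC_pred <- predict(SVC_fit, newdata = dta)
par(pty = 's')
plot(dta$x1, dta$x2, col = c('darkorange2', 'cornflowerblue')[SVC_pred],
     pch = 19, xlab = 'X1', ylab = 'X2')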
Exercise 9.5 - (h)
Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
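Again the code is not shown; a minimal sketch using a radial kernel, where gamma and cost are assumed values:
# SVM with a radial (non-linear) kernel; gamma = 1 and cost = 1 are assumed values
SVM_NL_fit  <- svm(y ~ x1 + x2, data = dta, kernel = 'radial', gamma = 1, cost = 1)
SVM_NL_pred <- predict(SVM_NL_fit, newdata = dta)
par(pty = 's')
plot(dta$x1, dta$x2, col = c('darkorange2', 'cornflowerblue')[SVM_NL_pred],
     pch = 19, xlab = 'X1', ylab = 'X2')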
Exercise 9.5 - (i)
Comment on your results.
[ANS] Among the SVM fits, the model with a linear kernel also performs poorly: many misclassified observations are clearly visible even on the training data, and if a separate test set had been held out, its performance there would presumably be even worse. The SVM with a non-linear kernel performs much better.
Exercise 9.7
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
data(Auto)
Auto <- as_tibble(Auto)
theme_set(theme_tufte(base_size = 14))
set.seed(4028)
summary(Auto)
mpg cylinders displacement horsepower weight
Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140
acceleration year origin name
Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
(Other) :365
Exercise 9.7 - (a)
Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
Auto <- Auto %>%
mutate(mpg_dummy = as.integer(mpg > median(mpg))) %>%
mutate(mpg_dummy = factor(mpg_dummy), cylinders = factor(cylinders)) %>%
dplyr::select(-mpg, -name)
dummy_trans <- dummyVars(mpg_dummy ~ ., data = Auto)
Auto_dummy <- predict(dummy_trans, Auto)
head(Auto_dummy)
cylinders.3 cylinders.4 cylinders.5 cylinders.6 cylinders.8 displacement
1 0 0 0 0 1 307
2 0 0 0 0 1 350
3 0 0 0 0 1 318
4 0 0 0 0 1 304
5 0 0 0 0 1 302
6 0 0 0 0 1 429
horsepower weight acceleration year origin
1 130 3504 12.0 70 1
2 165 3693 11.5 70 1
3 150 3436 11.0 70 1
4 150 3433 12.0 70 1
5 140 3449 10.5 70 1
6 198 4341 10.0 70 1
Exercise 9.7 - (b)
Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.
Conduct 10-fold cross validation to find the best model
SVM_linear <- train(x = Auto_dummy, y = Auto$mpg_dummy,
method = 'svmLinear2',
trControl = trainControl(method = 'cv', number = 10, allowParallel = TRUE),
preProcess = c('center', 'scale'),
tuneGrid = expand.grid(cost = seq(1, 20, by = 1)))
SVM_linear$finalModel
Call:
svm.default(x = as.matrix(x), y = y, kernel = "linear", cost = param$cost,
probability = classProbs)
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 2
Number of Support Vectors: 75
Obtain the performance of the final model
# Refit the CV-selected support vector classifier (cost = 2) on the full data
SVM_linear_finalModel <- svm(Auto$mpg_dummy ~ ., data = as.data.frame(Auto_dummy),
                             kernel = 'linear', cost = 2)
SVM_linear_pred <- predict(SVM_linear_finalModel, Auto_dummy)
table(Auto$mpg_dummy, SVM_linear_pred)
SVM_linear_pred
0 1
0 178 18
1 9 187
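The overall training accuracy printed below was presumably computed with something along these lines (the exact call is not shown in the original):
# proportion of training observations classified correctly
mean(SVM_linear_pred == Auto$mpg_dummy)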
[1] 0.9311224
Exercise 9.7 - (c)
Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
Radial basis kernels
SVM_radial <- train(x = Auto_dummy, y = Auto$mpg_dummy,
method = 'svmRadial',
trControl = trainControl(method = 'cv', number = 10, allowParallel = TRUE),
preProcess = c('center', 'scale'),
tuneGrid = expand.grid(C = seq(0.001, 3, length.out = 10),
sigma = seq(0.2, 2, length.out = 5)))
SVM_radial$finalModel
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 0.334222222222222
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 1.1
Number of Support Vectors : 226
Objective Function Value : -41.4342
Training error : 0.038265
Polynomial basis kernels
SVM_poly <- train(x = Auto_dummy, y = Auto$mpg_dummy,
method = 'svmPoly',
trControl = trainControl(method = 'cv', number = 10, allowParallel = TRUE),
preProcess = c('center', 'scale'),
tuneGrid = expand.grid(degree = seq(1, 8, by = 1),
C = seq(1, 5, by = 1),
scale = TRUE))
SVM_poly$finalModel
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 2
Polynomial kernel function.
Hyperparameters : degree = 1 scale = TRUE offset = 1
Number of Support Vectors : 75
Objective Function Value : -134.4977
Training error : 0.071429
Exercise 9.7 - (d)
Make some plots to back up your assertions in (b) and (c).
Hint: In the lab, we used the plot() function for svm objects only in cases with \(p = 2\). When \(p > 2\), you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing
plot(svmfit, dat)
where svmfit contains your fitted model and dat is a data frame containing your data, you can type
plot(svmfit, dat, x1 ~ x4)
in order to plot just the first and fourth variables. However, you must replace \(x_1\) and \(x_4\) with the correct variable names. To find out more, type ?plot.svm.
# Training-set accuracy and kappa for each of the three tuned models
mtx_results <- sapply(list(SVM_linear, SVM_radial, SVM_poly), function(MODEL) {
postResample(predict(MODEL), Auto$mpg_dummy)
}) %>% t() %>% as.data.frame()
rownames(mtx_results) <- c('SVM_linear', 'SVM_radial', 'SVM_poly')
mtx_results
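One simple way to back up (b) and (c) visually (a sketch) is caret's plot method for train objects, which displays cross-validated accuracy across the tuning grid for each kernel:
plot(SVM_linear)   # CV accuracy as a function of cost
plot(SVM_radial)   # CV accuracy as a function of cost and sigma
plot(SVM_poly)     # CV accuracy as a function of cost, degree and scale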