1 Problem 1 – Wine Quality Classification

Answer (c):
If the red and white wines separate along a curved, non-linear boundary, an SVM with a non-linear kernel should perform best.
If the two classes form compact, well-separated groups, kNN should also work well.
A decision tree splits on one feature at a time, so it may need many splits to approximate a boundary that is not axis-aligned.

##           Model  Accuracy
## 1           kNN 0.9383667
## 2 Decision Tree 0.9799692
## 3           SVM 0.9953775

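Discussion:
SVM achieved the highest accuracy (0.995), which is consistent with a maximum-margin boundary separating the two wine types almost perfectly.
The decision tree came second (0.980) and kNN last (0.938), so the tree handled these numeric features better than expected in (c).
kNN is sensitive to feature scale, so unscaled predictors are one possible cause of its lower accuracy; the output above does not show whether the features were scaled.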
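
The kNN decision rule is simple enough to sketch directly. The following is a minimal Python illustration (the results in this report come from R), assuming plain Euclidean distance and a majority vote. The function name `knn_predict`, the feature values, and the labels are hypothetical and are not taken from the wine data.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature data (values invented for illustration only).
train_X = [(0.2, 1.0), (0.3, 1.2), (0.25, 0.9), (0.7, 3.0), (0.8, 3.2), (0.75, 2.9)]
train_y = ["white", "white", "white", "red", "red", "red"]

print(knn_predict(train_X, train_y, (0.28, 1.1), k=3))  # white
print(knn_predict(train_X, train_y, (0.72, 3.1), k=3))  # red
```

Because the vote depends only on raw distances, a feature measured on a larger numeric scale dominates the result unless the features are standardized first.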

2 Problem 2 – Sacramento Housing (kNN)

## k-Nearest Neighbors 
## 
## 932 samples
## 110 predictors
##   3 classes: 'Condo', 'Multi_Family', 'Residential' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 746, 746, 744, 747, 745 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa       
##    3  0.9141834   0.030220148
##    5  0.9227745  -0.009935119
##    7  0.9281279  -0.001782255
##    9  0.9292032   0.000000000
##   11  0.9292032   0.000000000
##   13  0.9292032   0.000000000
##   15  0.9292032   0.000000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 15.
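
The Kappa column above is easier to read next to a small worked case. The sketch below, in Python rather than the R used for this report, computes accuracy and Cohen's kappa for a classifier that always predicts the majority class. The class counts (93, 4, 3) are invented for illustration and are not the Sacramento class counts; `accuracy` and `cohen_kappa` are hypothetical helper names.

```python
from collections import Counter

def accuracy(truth, pred):
    """Fraction of positions where the prediction matches the true label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def cohen_kappa(truth, pred):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = len(truth)
    p_observed = accuracy(truth, pred)
    truth_counts = Counter(truth)
    pred_counts = Counter(pred)
    # Chance agreement, from the marginal class frequencies of truth and prediction.
    p_expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when only one class occurs in both vectors
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical imbalanced labels (counts invented for illustration).
truth = ["Residential"] * 93 + ["Condo"] * 4 + ["Multi_Family"] * 3
pred = ["Residential"] * 100  # a model that always predicts the majority class

print(accuracy(truth, pred))     # 0.93
print(cohen_kappa(truth, pred))  # 0.0
```

High accuracy with a kappa of zero is the signature of such a model: the observed agreement equals the agreement expected by chance.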

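Answer:
Tested odd values of k from 3 to 15 with 5-fold cross-validation; accuracy levels off at 0.9292 for k = 9 through 15, and the final model uses k = 15.
Kappa is 0 for those values of k, which is consistent with the model predicting the same class for every sample, so the accuracy mostly reflects class imbalance rather than learned structure.
With 110 predictors, Euclidean distances tend to become less informative, so Manhattan or cosine distance is often worth trying.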
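
The three distance measures mentioned above differ in what they treat as "close". The following is a minimal Python sketch (the report itself uses R) with hypothetical helper names and invented vectors; it is meant only to show the definitions side by side.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Invented vectors: b points in the same direction as a but is twice as long.
a = (1.0, 2.0, 2.0)
b = (2.0, 4.0, 4.0)

print(euclidean(a, b))        # 3.0
print(manhattan(a, b))        # 5.0
print(cosine_distance(a, b))  # 0.0
```

Cosine distance is zero here because the two vectors differ only in magnitude, while Euclidean and Manhattan distance both register the difference.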

3 Problem 3 – Wine Quality Clustering

##       HAC
## KMeans    1    2
##      1 4829    0
##      2 1667    1

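Discussion:
The two methods disagree sharply: HAC places 6,496 of the 6,497 wines in one cluster and isolates a single observation, while k-means splits them 4,829 to 1,668.
K-means partitions the data around centroids and tends to produce compact, roughly spherical clusters.
HAC builds a hierarchical tree that can reveal nested structure, but cutting it into two clusters can simply split off one outlier, as it does here.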
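
A cross-tabulation like the one above is just a count of label pairs. The following minimal Python sketch (the report's table was produced in R) builds one from two label vectors; `cross_tab` is a hypothetical helper and the six labels are invented, chosen only to mimic the pattern of one method isolating a single point.

```python
from collections import Counter

def cross_tab(labels_a, labels_b):
    """Count how often each (label in A, label in B) pair occurs."""
    return Counter(zip(labels_a, labels_b))

# Hypothetical cluster assignments for six observations.
kmeans_labels = [1, 1, 1, 2, 2, 2]
hac_labels = [1, 1, 1, 1, 1, 2]

table = cross_tab(kmeans_labels, hac_labels)

# Print one row per k-means cluster, one column per HAC cluster.
for row in sorted(set(kmeans_labels)):
    print(row, [table[(row, col)] for col in sorted(set(hac_labels))])
# 1 [3, 0]
# 2 [2, 1]
```

A column that holds almost every observation, next to a column with a single count, means that method did not find two substantial groups.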

4 Problem 4 – Starwars Clustering

##    KMeans
## HAC  1  2  3  4
##   1  8  0  0 13
##   2  0  6  0  0
##   3  0  0  1  0
##   4  0  1  0  0

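Answer:
In the dendrogram, anomalies show up as long isolated branches that merge late (e.g., Jabba the Hutt).
The table agrees: HAC clusters 3 and 4 each contain a single character.
HAC is useful for visual anomaly detection; k-means is faster on large data, but it needs k chosen in advance and tends to favor compact clusters of similar spread.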
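
The "long isolated branch" can be seen numerically in the merge heights of an agglomerative clustering. The following is a naive single-linkage sketch in Python on one-dimensional values. It is an illustration under assumptions: the linkage used in this report is not shown, the function name is hypothetical, and the masses are invented rather than taken from the starwars data.

```python
def single_linkage_merges(points):
    """Naive single-linkage agglomerative clustering on 1-D values.

    Returns the distance (dendrogram height) of each merge, in merge order.
    """
    clusters = [[p] for p in points]  # start with every point in its own cluster
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest to each other.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
        heights.append(d)
    return heights

# Hypothetical body masses in kg; the last value stands in for an extreme outlier.
masses = [70.0, 75.0, 77.0, 80.0, 84.0, 1300.0]
heights = single_linkage_merges(masses)
print(heights)  # [2.0, 3.0, 4.0, 5.0, 1216.0]
```

The final merge height is far larger than all the others, which is what appears in a dendrogram as a single long branch joining at the top.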

Sikora_HM4

Oliver Sikora

2025-11-07
