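Lazy learning is a type of instance-based learning in which the training data is simply stored and all computation is deferred until a prediction is requested, rather than fitting an explicit model in advance.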
A common lazy learning algorithm is the k-Nearest Neighbors (k-NN) classifier.
Given a training dataset:
\[ D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) \} \]
where each \(x_i\) is a feature vector and \(y_i\) is its class label.
For a new data point \(x^*\), the predicted class \(\hat{y}^*\) is determined as:
\[ \hat{y}^* = \text{mode} \{ y_i \mid x_i \in N_k(x^*) \} \]
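where \(N_k(x^*)\) is the set of the \(k\) training points closest to \(x^*\) under the chosen distance metric.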
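The majority-vote rule above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the function name `knn_predict`, the toy data, and the use of Euclidean distance are choices made here, not part of the original text, and ties are broken by whichever label comes first in `table()` order.

```r
# Minimal k-NN majority-vote sketch (illustrative; not the class::knn implementation)
knn_predict <- function(train_x, train_y, x_new, k = 3) {
  train_x <- as.matrix(train_x)
  # Subtract x_new from every training row, column by column
  diffs <- sweep(train_x, 2, x_new)
  # Euclidean distance from x_new to each training point
  dists <- sqrt(rowSums(diffs^2))
  # Indices of the k nearest neighbours, i.e. N_k(x*)
  nn <- order(dists)[seq_len(k)]
  # Mode of the neighbours' labels (first label wins on ties)
  votes <- table(train_y[nn])
  names(votes)[which.max(votes)]
}

# Hypothetical toy data: two well-separated clusters
train_x <- rbind(c(1, 1), c(1, 2), c(2, 1), c(8, 8), c(8, 9), c(9, 8))
train_y <- c("a", "a", "a", "b", "b", "b")

knn_predict(train_x, train_y, x_new = c(1.5, 1.5), k = 3)  # "a"
knn_predict(train_x, train_y, x_new = c(8.5, 8.5), k = 3)  # "b"
```

Note that no training step appears anywhere: all the work happens inside the prediction call, which is what makes the method lazy.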
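The choice of distance metric significantly affects the model’s performance. In the formulas below, \(x\) and \(y\) are two feature vectors with components \(x_i\) and \(y_i\), and \(n\) denotes the number of features.
Euclidean distance:
\[ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \]
Manhattan distance:
\[ d(x, y) = \sum_{i=1}^{n} |x_i - y_i| \]
Minkowski distance:
\[ d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} \]
where \(p\) determines the norm: \(p = 1\) gives the Manhattan distance and \(p = 2\) gives the Euclidean distance.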
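As a quick check of how \(p\) selects the norm, the sketch below evaluates the general formula for \(p = 1\) and \(p = 2\) and compares it with base R's `stats::dist`. The helper name `minkowski` and the example vectors are assumptions introduced for illustration.

```r
# Minkowski distance between two numeric vectors (hypothetical helper)
minkowski <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)

x <- c(1, 2, 3)
y <- c(4, 6, 3)

minkowski(x, y, p = 1)  # Manhattan: 3 + 4 + 0 = 7
minkowski(x, y, p = 2)  # Euclidean: sqrt(9 + 16 + 0) = 5

# Cross-check against stats::dist, which supports the same metrics
m <- rbind(x, y)
dist(m, method = "manhattan")
dist(m, method = "euclidean")
dist(m, method = "minkowski", p = 3)
```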
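Cosine similarity:
\[ \cos(\theta) = \frac{x \cdot y}{\|x\| \|y\|} \]
This measures the angle \(\theta\) between two vectors rather than their magnitude. Because it is a similarity (larger values mean closer vectors), the corresponding distance is \(d(x, y) = 1 - \cos(\theta)\).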
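The ratio above is the cosine of the angle between the two vectors, so rescaling either vector leaves it unchanged, and one minus that ratio can serve as a distance. A minimal base R sketch follows; the function names `cosine_similarity` and `cosine_distance` and the test vectors are assumptions made for this example.

```r
# Cosine of the angle between x and y (hypothetical helper)
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# One common way to turn the similarity into a distance
cosine_distance <- function(x, y) 1 - cosine_similarity(x, y)

a <- c(1, 2, 3)

cosine_similarity(a, 10 * a)         # same direction: 1 (up to rounding)
cosine_similarity(c(1, 0), c(0, 1))  # orthogonal vectors: 0
cosine_distance(a, 10 * a)           # 0 (up to rounding), despite different magnitudes
```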
The Iris dataset contains 150 observations of flowers with four features: sepal length, sepal width, petal length, and petal width, all measured in centimetres.
Each flower belongs to one of three species: Setosa, Versicolor, and Virginica.
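The structure described above can be confirmed directly in R, since `iris` ships with the base `datasets` package. The snippet is only a quick inspection and does not modify the data.

```r
data("iris")

dim(iris)            # 150 rows, 5 columns (4 features + Species)
names(iris)          # feature names followed by "Species"
table(iris$Species)  # 50 observations per species
```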
library(ggplot2)
library(caret)
library(class)
library(dplyr)
library(gridExtra)
# Load dataset
data("iris")
set.seed(123)
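# Shuffle the rows so the species are no longer grouped together
iris <- iris[sample(nrow(iris)), ]
# Min-max normalize each feature to [0, 1] so no single feature dominates the distance
# (note: min and max are computed on the full dataset, before the split)
normalize <- function(x) { (x - min(x)) / (max(x) - min(x)) }
iris_norm <- as.data.frame(lapply(iris[1:4], normalize))
iris_norm$Species <- iris$Species
# Train-test split (80-20): 120 rows for training, 30 for testing
train_index <- sample(seq_len(nrow(iris)), 0.8 * nrow(iris))
train_data <- iris_norm[train_index, ]
test_data <- iris_norm[-train_index, ]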