Machine Learning: the study of learning from data (data-driven) in order to make predictions from observations.
Three types of classification algorithms:
Supervised classification
Semi-supervised classification
Unsupervised classification
Topic 1.
Objects:
A1 (2,10) A2 (2,5) A3 (8,4) A4 (5,8) A5 (7,5) A6 (6,4) A7 (1,2) A8 (4,9)
Step 1. Determine the number of groups or clusters
3
Step 2. Randomly select the centroids
C1 = A1 C2 = A7 C3 = A4
Step 3. Assign each object to the nearest centroid
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
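As a quick check of the formula, the distance can be computed in R. This is a minimal sketch; the helper name `distancia` is an assumption made for illustration and is not part of the original notes.

```r
# Euclidean distance between two points p = (x1, y1) and q = (x2, y2).
# `distancia` is a hypothetical helper name chosen for this sketch.
distancia <- function(p, q) sqrt(sum((q - p)^2))

A1 <- c(2, 10)
A4 <- c(5, 8)
A7 <- c(1, 2)

round(distancia(A1, A4), 2)  # 3.61
round(distancia(A1, A7), 2)  # 8.06
```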
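Iteration 1
Object: distance(object, centroid)
A1: d(A1,A1) = 0 (A1 is itself a centroid), d(A1,A4) = 3.61, d(A1,A7) = 8.06, so A1 ∈ Cluster 1
A2: d(A2,A1) = 5, d(A2,A4) = 4.24, d(A2,A7) = 3.16, nearest centroid: A7
A3: d(A3,A1) = 8.49, d(A3,A4) = 5, d(A3,A7) = 7.28, nearest centroid: A4
A4: d(A4,A1) = 3.61, d(A4,A4) = 0 (A4 is itself a centroid), d(A4,A7) = 7.21, nearest centroid: A4
A5: d(A5,A1) = 7.07, d(A5,A4) = 3.61, d(A5,A7) = 6.71, nearest centroid: A4
A6: d(A6,A1) = 7.21, d(A6,A4) = 4.12, d(A6,A7) = 5.39, nearest centroid: A4
A7: d(A7,A1) = 8.06, d(A7,A4) = 7.21, d(A7,A7) = 0 (A7 is itself a centroid), nearest centroid: A7
A8: d(A8,A1) = 2.24, d(A8,A4) = 1.41, d(A8,A7) = 7.62, nearest centroid: A4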
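The assignment step of the first iteration can be reproduced for all eight objects at once. The sketch below is only illustrative; the names `puntos`, `centroides`, `distancias` and `mas_cercano` are assumptions introduced here.

```r
# Objects A1..A8, one row per object (hypothetical variable names).
puntos <- matrix(c(2, 10,  2, 5,  8, 4,  5, 8,  7, 5,  6, 4,  1, 2,  4, 9),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("A", 1:8), c("x", "y")))

# Initial centroids: the objects A1, A4 and A7.
centroides <- puntos[c("A1", "A4", "A7"), ]

# Euclidean distance from every object (rows) to every centroid (columns).
distancias <- sapply(rownames(centroides), function(cn) {
  sqrt(rowSums(sweep(puntos, 2, centroides[cn, ])^2))
})
round(distancias, 2)

# Name of the nearest centroid for each object.
mas_cercano <- colnames(distancias)[apply(distancias, 1, which.min)]
names(mas_cercano) <- rownames(puntos)
mas_cercano
```

Only A1 stays with centroid A1; A2 and A7 go to centroid A7, and the remaining five objects go to centroid A4.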
Summary of iteration 1
Cluster 1 {A1}, Cluster 2 {A2,A7}, Cluster 3 {A3,A4,A5,A6,A8}
Step 4. Update each centroid to the mean position of the objects that belong to its group or cluster.
C1 = (2,10) C2 = (1.5,3.5) C3 = (6,6)
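Step 4 takes the mean of the coordinates of each cluster's members. A minimal sketch of that update, assuming the cluster numbering of the iteration 1 summary (`puntos`, `cluster` and `centroides` are hypothetical names):

```r
puntos <- data.frame(x = c(2, 2, 8, 5, 7, 6, 1, 4),
                     y = c(10, 5, 4, 8, 5, 4, 2, 9),
                     row.names = paste0("A", 1:8))

# Cluster label of A1..A8 after iteration 1:
# 1 = {A1}, 2 = {A2, A7}, 3 = {A3, A4, A5, A6, A8}
cluster <- c(1, 2, 3, 3, 3, 3, 2, 3)

# New centroid of each cluster = column means of its members.
centroides <- aggregate(puntos, by = list(cluster = cluster), FUN = mean)
centroides
```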
Step 5. Repeat steps 3 and 4 until the centroids stop moving, or move less than a threshold distance at each step.
Iteration 2: Cluster 1 {A1,A8}, Cluster 2 {A2,A7}, Cluster 3 {A3,A4,A5,A6}; C1 = (3,9.5) C2 = (1.5,3.5) C3 = (6.5,5.25)
Iteration 3: Cluster 1 {A1,A4,A8}, Cluster 2 {A2,A7}, Cluster 3 {A3,A5,A6}; C1 = (3.67,9) C2 = (1.5,3.5) C3 = (7,4.33)
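Steps 3 to 5 can be put together in a short loop. This is a sketch under stated assumptions, not a replacement for `kmeans()`: the starting centroids A1, A7 and A4 for clusters 1, 2 and 3 follow the numbering of the iteration summaries, the threshold `umbral` is an arbitrary value chosen here, and empty clusters are not handled.

```r
puntos <- cbind(x = c(2, 2, 8, 5, 7, 6, 1, 4),
                y = c(10, 5, 4, 8, 5, 4, 2, 9))
rownames(puntos) <- paste0("A", 1:8)

# Assumption: clusters 1, 2, 3 start at A1, A7, A4.
centroides <- puntos[c("A1", "A7", "A4"), ]
umbral <- 1e-6  # hypothetical movement threshold

repeat {
  # Step 3: squared distance of each object to each centroid, then the nearest one.
  d2 <- sapply(1:nrow(centroides), function(k) {
    rowSums(sweep(puntos, 2, centroides[k, ])^2)
  })
  cluster <- apply(d2, 1, which.min)

  # Step 4: new centroid = mean of the objects assigned to each cluster.
  nuevos <- t(sapply(1:nrow(centroides), function(k) {
    colMeans(puntos[cluster == k, , drop = FALSE])
  }))

  # Step 5: stop when no centroid moves more than the threshold.
  movimiento <- max(sqrt(rowSums((nuevos - centroides)^2)))
  centroides <- nuevos
  if (movimiento < umbral) break
}

cluster               # A1, A4, A8 in cluster 1; A2, A7 in 2; A3, A5, A6 in 3
round(centroides, 2)  # (3.67, 9), (1.5, 3.5), (7, 4.33)
```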
# 1. Create the data set
x <- c(2, 2, 8, 5, 7, 6, 1, 4)
y <- c(10, 5, 4, 8, 5, 4, 2, 9)
agrup <- data.frame(x, y)
# 2. Determine the number of groups
grupos <- 3
# 3. Run the clustering
segmentos <- kmeans(agrup, grupos)
segmentos
## K-means clustering with 3 clusters of sizes 2, 1, 5
##
## Cluster means:
## x y
## 1 4.5 8.5
## 2 2.0 10.0
## 3 4.8 4.0
##
## Clustering vector:
## [1] 2 3 3 1 3 3 3 1
##
## Within cluster sum of squares by cluster:
## [1] 1.0 0.0 44.8
## (between_SS / total_SS = 54.5 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# 4. Review the group assignment
asignación <- cbind(agrup, cluster = segmentos$cluster)
asignación
## x y cluster
## 1 2 10 2
## 2 2 5 3
## 3 8 4 3
## 4 5 8 1
## 5 7 5 3
## 6 6 4 3
## 7 1 2 3
## 8 4 9 1
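When only the number of groups is given, `kmeans()` picks its starting centroids at random, so the grouping printed above depends on the run. As a hedged sketch, passing a matrix of starting centroids through `centers` makes the run repeatable. Starting from rows 1, 7 and 4 (objects A1, A7 and A4) and using `algorithm = "Lloyd"` are assumptions made here so that the result can be compared with the manual iterations; `inicio` and `segmentos_fijos` are hypothetical names.

```r
x <- c(2, 2, 8, 5, 7, 6, 1, 4)
y <- c(10, 5, 4, 8, 5, 4, 2, 9)
agrup <- data.frame(x, y)

# Assumption: start from objects A1, A7 and A4 (rows 1, 7 and 4).
inicio <- agrup[c(1, 7, 4), ]
segmentos_fijos <- kmeans(agrup, centers = inicio, algorithm = "Lloyd")

segmentos_fijos$cluster                 # 1 2 3 1 3 3 2 1
round(segmentos_fijos$centers, 2)
round(segmentos_fijos$tot.withinss, 2)  # 14.33
```

For this starting point between_SS / total_SS is about 85.8 %, compared with the 54.5 % of the random run printed above.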
# 5. Plot the clusters
library(ggplot2)
library(factoextra)
fviz_cluster(segmentos, data = agrup,
             palette = c("red", "blue", "darkgreen"),
             ellipse.type = "euclid",
             star.plot = TRUE,
             repel = TRUE,
             ggtheme = theme())
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
# 6. Optimize the number of groups
library(cluster)
library(data.table)
set.seed(123)
optimización <- clusGap(agrup, FUN = kmeans, nstart = 1, K.max = 7)
plot(optimización, xlab = "Number of clusters K")
# Load the tidyverse and summarize the sales data frame
library(tidyverse)
summary(ventas)
## BillNo Itemname Quantity Date
## Length:522064 Length:522064 Min. :-9600.00 Length:522064
## Class :character Class :character 1st Qu.: 1.00 Class :character
## Mode :character Mode :character Median : 3.00 Mode :character
## Mean : 10.09
## 3rd Qu.: 10.00
## Max. :80995.00
##
## Hour Price CustomerID Country
## Length:522064 Min. :-11062.060 Min. :12346 Length:522064
## Class :character 1st Qu.: 1.250 1st Qu.:13950 Class :character
## Mode :character Median : 2.080 Median :15265 Mode :character
## Mean : 3.827 Mean :15317
## 3rd Qu.: 4.130 3rd Qu.:16837
## Max. : 13541.330 Max. :18287
## NA's :134041
## Total
## Min. :-11062.06
## 1st Qu.: 3.75
## Median : 9.78
## Mean : 19.69
## 3rd Qu.: 17.40
## Max. :168469.60
##
#count(ventas,BillNo, sort=TRUE)
#count(ventas,Itemname, sort=TRUE)
#count(ventas,Date, sort=TRUE)
#count(ventas,Hour, sort=TRUE)
#count(ventas,Country, sort=TRUE)
# Total number of missing values (NA)
## [1] 134041
# Missing values per column
## BillNo Itemname Quantity Date Hour Price CustomerID
## 0 0 0 0 0 0 134041
## Country Total
## 0 0
# Keep invoice, customer and amount; drop rows with NA and non-positive totals
ventasact <- ventas %>%
  select("BillNo", "CustomerID", "Total") %>%
  na.omit() %>%
  filter(Total > 0)
# Total per invoice, then average ticket per customer
ticket <- aggregate(Total ~ CustomerID + BillNo, data = ventasact, FUN = sum)
ticket_promedio <- aggregate(Total ~ CustomerID, data = ticket, FUN = mean)
# Number of distinct invoices (visits) per customer
visitas <- ventasact %>% group_by(CustomerID) %>% summarise(Visitas = n_distinct(BillNo))
objetos <- merge(ticket_promedio, visitas, by = "CustomerID")
rownames(objetos) <- objetos$CustomerID
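To make the aggregation steps concrete, here is a small sketch on made-up data. The data frame `ventas_demo` and all its values are hypothetical; only the column names follow the sales data, and base R is used so the block runs without extra packages.

```r
# Hypothetical sales lines: customer 100 has two invoices, customer 200 has one.
ventas_demo <- data.frame(
  BillNo     = c("B1", "B1", "B2", "B3"),
  CustomerID = c(100, 100, 100, 200),
  Total      = c(10, 20, 50, 8)
)

# Total per invoice.
ticket_demo <- aggregate(Total ~ CustomerID + BillNo, data = ventas_demo, FUN = sum)

# Average ticket per customer.
promedio_demo <- aggregate(Total ~ CustomerID, data = ticket_demo, FUN = mean)

# Visits = number of distinct invoices per customer.
n_facturas <- tapply(ventas_demo$BillNo, ventas_demo$CustomerID,
                     function(b) length(unique(b)))
visitas_demo <- data.frame(CustomerID = as.numeric(names(n_facturas)),
                           Visitas = as.vector(n_facturas))

resultado <- merge(promedio_demo, visitas_demo, by = "CustomerID")
resultado
```

Customer 100 ends with an average ticket of 40 (invoices of 30 and 50) over 2 visits, and customer 200 with an average ticket of 8 over 1 visit.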
# Outliers are the values that fall outside the following limits:
# Lower limit = Q1 - 1.5 * IQR
# Upper limit = Q3 + 1.5 * IQR
# Q1: first quartile, Q3: third quartile, IQR: interquartile range (Q3 - Q1)
# Visits column (Q1 = 1, Q3 = 5)
IQR_V <- IQR(objetos$Visitas)
LI_V <- 1 - 1.5 * IQR_V
LS_V <- 5 + 1.5 * IQR_V
objetos <- objetos[objetos$Visitas <= 11, ]
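The limits can also be derived from the data instead of typing the quartiles by hand. A minimal sketch on a made-up vector of visit counts; `limites_iqr`, `visitas_ej` and the values are hypothetical and not taken from the sales data.

```r
# Lower and upper outlier limits from the 1.5 * IQR rule.
limites_iqr <- function(v) {
  q <- quantile(v, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  c(inferior = q[1] - 1.5 * iqr, superior = q[2] + 1.5 * iqr)
}

# Hypothetical visit counts with one extreme value.
visitas_ej <- c(1, 1, 2, 3, 3, 4, 5, 5, 6, 40)
lim <- limites_iqr(visitas_ej)
lim

# Keep only the values inside the limits; the value 40 is dropped.
filtrado <- visitas_ej[visitas_ej >= lim["inferior"] & visitas_ej <= lim["superior"]]
filtrado
```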
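# Average ticket column
# Drop CustomerID (kept as row names) and rename Total to TicketPromedio
objetos <- objetos[, c("Total", "Visitas")]
colnames(objetos) <- c("TicketPromedio", "Visitas")
IQR_TP <- IQR(objetos$TicketPromedio)
LI_TP <- 178.30 - 1.5 * IQR_TP
LS_TP <- 426.63 + 1.5 * IQR_TP
objetos <- objetos[objetos$TicketPromedio <= 791.69, ]
# 2. Determine the number of groups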
gruposact <- 4
# 3. Run the clustering
segmentosact <- kmeans(objetos, gruposact)
# 4. Review the group assignment
asignaciónact <- cbind(objetos, cluster = segmentosact$cluster)
# 5. Plot the clusters
# install.packages("factoextra")
library(ggplot2)
library(factoextra)
fviz_cluster(segmentosact, data = objetos,
             palette = c("red", "blue", "darkgreen", "yellow"),
             ellipse.type = "euclid",
             star.plot = TRUE,
             repel = TRUE,
             ggtheme = theme())
# 6. Optimize the number of groups
library(cluster)
library(data.table)
set.seed(123)
optimización <- clusGap(objetos, FUN = kmeans, nstart = 1, K.max = 99)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 188200)
## Warning: did not converge in 10 iterations
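The "did not converge in 10 iterations" warning comes from `kmeans()` stopping at its default `iter.max = 10`. As a hedged sketch on simulated data, extra arguments such as `iter.max` and `nstart` can be forwarded to `kmeans()` through `clusGap()`, and `maxSE()` from the same package suggests a value of K from the gap table. The data frame `datos_demo`, the values `K.max = 6` and `B = 20`, and the other argument values are assumptions for illustration, not settings from the original analysis.

```r
library(cluster)

set.seed(123)
# Simulated data: three well-separated groups of 30 points each.
datos_demo <- data.frame(
  x = c(rnorm(30, 0), rnorm(30, 10), rnorm(30, 20)),
  y = c(rnorm(30, 0), rnorm(30, 10), rnorm(30, 0))
)

# iter.max and nstart are passed on to kmeans() through "...".
gap_demo <- clusGap(datos_demo, FUN = kmeans, iter.max = 50, nstart = 10,
                    K.max = 6, B = 20, verbose = FALSE)

# Gap table: one row per K with the columns logW, E.logW, gap and SE.sim.
gap_demo$Tab

# Suggested K: first K whose gap is within one SE of the first local maximum.
k_sugerido <- maxSE(gap_demo$Tab[, "gap"], gap_demo$Tab[, "SE.sim"],
                    method = "firstSEmax")
k_sugerido
```

A much smaller `K.max` than 99 keeps the search quick and avoids fitting many near-empty clusters.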