Paquetes necesarios
library(pacman)
p_load("ggplot2","DT","xfun","prettydoc","cluster", "ISLR")Base de datos
df<- Wage
datosseguro<- Wage
head(Wage)## year age maritl race education region
## 231655 2006 18 1. Never Married 1. White 1. < HS Grad 2. Middle Atlantic
## 86582 2004 24 1. Never Married 1. White 4. College Grad 2. Middle Atlantic
## 161300 2003 45 2. Married 1. White 3. Some College 2. Middle Atlantic
## 155159 2003 43 2. Married 3. Asian 4. College Grad 2. Middle Atlantic
## 11443 2005 50 4. Divorced 1. White 2. HS Grad 2. Middle Atlantic
## 376662 2008 54 2. Married 1. White 4. College Grad 2. Middle Atlantic
## jobclass health health_ins logwage wage
## 231655 1. Industrial 1. <=Good 2. No 4.318063 75.04315
## 86582 2. Information 2. >=Very Good 2. No 4.255273 70.47602
## 161300 1. Industrial 1. <=Good 1. Yes 4.875061 130.98218
## 155159 2. Information 2. >=Very Good 1. Yes 5.041393 154.68529
## 11443 2. Information 1. <=Good 1. Yes 4.318063 75.04315
## 376662 2. Information 2. >=Very Good 1. Yes 4.845098 127.11574
Grafico
Se observa que hay una mezcla de datos que implica que estos tiene una alta variación, sin embargo, se puede apreciar que hay una mayor aglomeración de los puntos rosas (positivo a contar con seguro medico) para el lado de la derecha (sueldo alto)
ggplot(datosseguro, aes(wage,age)) + geom_point(aes (col=health_ins), size=4 )Modelo K-means
set.seed(101)
seguroCluster <- kmeans(data.frame(x=datosseguro$wage,y=datosseguro$age), center=2, nstart=20 )
seguroCluster## K-means clustering with 2 clusters of sizes 2261, 739
##
## Cluster means:
## x y
## 1 94.05367 41.44449
## 2 165.70429 45.38295
##
## Clustering vector:
## [1] 1 1 2 2 1 1 2 1 1 1 1 1 1 2 2 1 1 2 1 1 2 1 1 2 1 2 1 1 1 1 2 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 2 2 2 1 1 2 1 2 1 1 1 2 2 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1
## [75] 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 2 2 1 1 1 2 2 1 1 2 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2
## [149] 2 1 1 2 1 1 2 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 2 1 2 1 2 1 1 1 1 1 2
## [186] 1 1 1 1 1 2 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 2 2 2 1 2 1 2 1 1 1 2 1 1 1 1 1
## [223] 1 2 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2
## [260] 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1
## [297] 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 1
## [334] 1 2 1 1 1 1 1 1 1 1 1 2 1 1 2 2 2 2 2 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 2 2
## [371] 1 1 1 2 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [408] 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 2 2 2 1 1 1 2 1 1 2 1 1 1 1 1 2 1 1 1 2 1
## [445] 1 1 1 2 2 2 1 2 1 1 1 2 1 1 2 2 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1
## [482] 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 2
## [519] 1 2 1 2 1 1 1 2 2 2 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 2 2 1
## [556] 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1
## [593] 1 1 1 1 1 1 2 1 1 1 2 2 2 1 1 1 2 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 2 1 1 2 2
## [630] 2 1 2 2 1 1 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1
## [667] 2 1 2 1 2 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 2 1 2 1 1 2 1 2 1 1 1 2 2 1 1 1 2
## [704] 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 2 2 2 2 2 1 1 1 1 2
## [741] 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1
## [778] 1 1 1 2 1 2 1 2 1 1 2 2 2 1 2 2 1 1 1 2 2 1 2 1 2 1 2 1 1 2 1 1 2 1 1 1 1
## [815] 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1
## [852] 1 1 1 1 2 1 2 1 1 1 2 2 2 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
## [889] 1 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 1
## [926] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 2 2 1 2 1 2 2 1 1 1 1 2
## [963] 1 1 1 1 1 2 2 1 1 2 1 1 1 1 2 2 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
## [1000] 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 1 1 1 1
## [1037] 2 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1074] 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1
## [1111] 1 2 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1
## [1148] 1 2 1 1 1 2 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 2 1
## [1185] 2 2 1 2 1 2 1 1 2 1 1 2 1 2 2 1 2 1 1 1 2 1 1 1 1 1 2 1 2 1 2 1 1 1 1 2 2
## [1222] 1 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 1 2 2 1 1 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 1
## [1259] 1 2 1 1 1 1 1 2 1 1 2 1 1 2 1 2 1 2 1 1 1 1 1 2 2 2 1 1 1 2 2 1 1 1 1 2 1
## [1296] 1 2 1 1 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2
## [1333] 1 1 1 1 2 1 1 1 1 1 2 2 1 2 1 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1
## [1370] 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1
## [1407] 1 1 1 1 2 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1
## [1444] 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1
## [1481] 2 2 1 2 2 2 1 2 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 2 2 2 1 1 1 1 2 2 1 2 1
## [1518] 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 2 2
## [1555] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 2 2 1 1 2 1 2 1 1 1 1
## [1592] 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1
## [1629] 1 1 1 1 2 1 1 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 2 1 1 1
## [1666] 1 1 2 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 1 1 1 1 2 1 2 1 1 1 2 1 1 1 2
## [1703] 1 2 2 1 1 1 1 1 1 2 2 1 2 1 2 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1
## [1740] 1 1 2 1 2 1 1 1 2 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1
## [1777] 2 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1
## [1814] 1 2 1 1 1 1 1 1 1 2 1 2 1 1 2 2 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1
## [1851] 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 1 1 1 1 1 2 1 2 2 2 1 1 1 1 1 2 1 2 1 2 1 1
## [1888] 1 1 1 1 2 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2
## [1925] 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1
## [1962] 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 2 1 1 2 1 1 1 2 1 1 1 1 2 1 2 1 2 1 2 1 2
## [1999] 1 2 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1
## [2036] 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1 2 2 1 1
## [2073] 1 1 1 2 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2
## [2110] 1 2 1 1 1 2 2 2 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1
## [2147] 2 1 2 1 1 2 1 1 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 2 1 2 2
## [2184] 1 2 1 1 2 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
## [2221] 1 2 1 2 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1
## [2258] 2 1 1 1 2 1 1 2 1 1 1 1 2 1 2 2 1 1 1 1 1 2 1 2 2 1 2 1 1 2 1 2 1 2 1 1 1
## [2295] 1 1 1 1 1 2 1 1 1 2 1 1 1 2 1 2 2 2 1 1 2 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1
## [2332] 2 2 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1 1 2 2 1 2 1 1 2 1 1 1 2 1 1 1 1 1 1
## [2369] 2 2 1 2 1 1 1 2 2 1 1 1 1 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1
## [2406] 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1
## [2443] 1 1 2 2 1 1 1 2 2 2 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 1 1 1 1 2 2 1 1 2 1 2 1
## [2480] 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1
## [2517] 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1
## [2554] 1 2 1 2 2 1 1 1 2 2 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1
## [2591] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 2 1 1 1 2 1 1
## [2628] 1 1 2 2 1 2 1 1 1 1 1 1 1 2 1 2 1 2 1 2 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1 2 2
## [2665] 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 1 1 2 2 1 2 1 2 1 2 1 1 1 1
## [2702] 1 2 1 2 2 1 1 1 2 1 1 2 1 1 2 1 1 1 2 1 1 2 2 1 2 2 2 2 1 1 1 1 2 1 1 1 1
## [2739] 1 1 1 2 1 1 1 1 1 1 2 1 1 1 2 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2
## [2776] 1 2 1 1 2 1 1 2 1 2 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 1 2 2 2 1 1 2 1 1 1
## [2813] 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 2 2 1 2 2 2 1 2 1 1 1 2 2 2 1 1 1 2 1 1
## [2850] 2 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1
## [2887] 1 1 2 2 1 1 1 2 1 1 1 1 1 2 1 2 1 2 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 2 1 1
## [2924] 1 2 2 1 1 2 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1
## [2961] 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 2 2 1 1 1 2 2 1
## [2998] 1 1 1
##
## Within cluster sum of squares by cluster:
## [1] 1342984 1410685
## (between_SS / total_SS = 51.0 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
Comparando los clusters con el seguro medico
table(seguroCluster$cluster, datosseguro$health_ins)##
## 1. Yes 2. No
## 1 1437 824
## 2 646 93
Agrupando datos de cluster
clusplot(data.frame(x=datosseguro$wage,y=datosseguro$age), seguroCluster$cluster, color=T, shade=T, lines=0)Se puede observar que el programa marca la formación de dos clusters, los cuales se interponen el uno con el otro.
Conclusión
Con una variable de sueldo, los clusters no están bien definidos y no se puede llegar a una conclusión clara, si bien, se suele observar que sueldos mas altos suelen tener seguro médico, la relación no es lo suficientemente fuerte para poder basarnos en la variable de sueldo para predecir un resultado.
Descarga este código
xfun::embed_file("Caso de Estudio Seguro Medico.Rmd")