Paquetes necesarios

library(pacman)
p_load("ggplot2","DT","xfun","prettydoc","cluster", "ISLR")

Base de datos

df<- Wage
datosseguro<- Wage
head(Wage)
##        year age           maritl     race       education             region
## 231655 2006  18 1. Never Married 1. White    1. < HS Grad 2. Middle Atlantic
## 86582  2004  24 1. Never Married 1. White 4. College Grad 2. Middle Atlantic
## 161300 2003  45       2. Married 1. White 3. Some College 2. Middle Atlantic
## 155159 2003  43       2. Married 3. Asian 4. College Grad 2. Middle Atlantic
## 11443  2005  50      4. Divorced 1. White      2. HS Grad 2. Middle Atlantic
## 376662 2008  54       2. Married 1. White 4. College Grad 2. Middle Atlantic
##              jobclass         health health_ins  logwage      wage
## 231655  1. Industrial      1. <=Good      2. No 4.318063  75.04315
## 86582  2. Information 2. >=Very Good      2. No 4.255273  70.47602
## 161300  1. Industrial      1. <=Good     1. Yes 4.875061 130.98218
## 155159 2. Information 2. >=Very Good     1. Yes 5.041393 154.68529
## 11443  2. Information      1. <=Good     1. Yes 4.318063  75.04315
## 376662 2. Information 2. >=Very Good     1. Yes 4.845098 127.11574

Grafico

Se observa que hay una mezcla de datos que implica que estos tiene una alta variación, sin embargo, se puede apreciar que hay una mayor aglomeración de los puntos rosas (positivo a contar con seguro medico) para el lado de la derecha (sueldo alto)

ggplot(datosseguro, aes(wage,age)) + geom_point(aes (col=health_ins), size=4   )

Modelo K-means

set.seed(101)
seguroCluster <- kmeans(data.frame(x=datosseguro$wage,y=datosseguro$age), center=2, nstart=20 )
seguroCluster
## K-means clustering with 2 clusters of sizes 2261, 739
## 
## Cluster means:
##           x        y
## 1  94.05367 41.44449
## 2 165.70429 45.38295
## 
## Clustering vector:
##    [1] 1 1 2 2 1 1 2 1 1 1 1 1 1 2 2 1 1 2 1 1 2 1 1 2 1 2 1 1 1 1 2 1 1 1 1 1 1
##   [38] 1 1 1 1 1 1 1 1 2 2 2 1 1 2 1 2 1 1 1 2 2 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1
##   [75] 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1
##  [112] 1 1 1 1 1 1 2 2 1 1 1 2 2 1 1 2 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2
##  [149] 2 1 1 2 1 1 2 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 2 1 2 1 2 1 1 1 1 1 2
##  [186] 1 1 1 1 1 2 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 2 2 2 1 2 1 2 1 1 1 2 1 1 1 1 1
##  [223] 1 2 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2
##  [260] 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1
##  [297] 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 1
##  [334] 1 2 1 1 1 1 1 1 1 1 1 2 1 1 2 2 2 2 2 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 2 2
##  [371] 1 1 1 2 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [408] 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 2 2 2 1 1 1 2 1 1 2 1 1 1 1 1 2 1 1 1 2 1
##  [445] 1 1 1 2 2 2 1 2 1 1 1 2 1 1 2 2 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1
##  [482] 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 2
##  [519] 1 2 1 2 1 1 1 2 2 2 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 2 2 1
##  [556] 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1
##  [593] 1 1 1 1 1 1 2 1 1 1 2 2 2 1 1 1 2 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 2 1 1 2 2
##  [630] 2 1 2 2 1 1 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1
##  [667] 2 1 2 1 2 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 2 1 2 1 1 2 1 2 1 1 1 2 2 1 1 1 2
##  [704] 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 2 2 2 2 2 1 1 1 1 2
##  [741] 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1
##  [778] 1 1 1 2 1 2 1 2 1 1 2 2 2 1 2 2 1 1 1 2 2 1 2 1 2 1 2 1 1 2 1 1 2 1 1 1 1
##  [815] 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1
##  [852] 1 1 1 1 2 1 2 1 1 1 2 2 2 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
##  [889] 1 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 1
##  [926] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 2 2 1 2 1 2 2 1 1 1 1 2
##  [963] 1 1 1 1 1 2 2 1 1 2 1 1 1 1 2 2 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
## [1000] 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 1 1 1 1
## [1037] 2 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1074] 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1
## [1111] 1 2 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1
## [1148] 1 2 1 1 1 2 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 2 1
## [1185] 2 2 1 2 1 2 1 1 2 1 1 2 1 2 2 1 2 1 1 1 2 1 1 1 1 1 2 1 2 1 2 1 1 1 1 2 2
## [1222] 1 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 1 2 2 1 1 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 1
## [1259] 1 2 1 1 1 1 1 2 1 1 2 1 1 2 1 2 1 2 1 1 1 1 1 2 2 2 1 1 1 2 2 1 1 1 1 2 1
## [1296] 1 2 1 1 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2
## [1333] 1 1 1 1 2 1 1 1 1 1 2 2 1 2 1 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1
## [1370] 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1
## [1407] 1 1 1 1 2 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1
## [1444] 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1
## [1481] 2 2 1 2 2 2 1 2 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 2 2 2 1 1 1 1 2 2 1 2 1
## [1518] 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 2 2
## [1555] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 2 2 1 1 2 1 2 1 1 1 1
## [1592] 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1
## [1629] 1 1 1 1 2 1 1 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 2 1 1 1
## [1666] 1 1 2 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 1 1 1 1 2 1 2 1 1 1 2 1 1 1 2
## [1703] 1 2 2 1 1 1 1 1 1 2 2 1 2 1 2 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1
## [1740] 1 1 2 1 2 1 1 1 2 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1
## [1777] 2 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1
## [1814] 1 2 1 1 1 1 1 1 1 2 1 2 1 1 2 2 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1
## [1851] 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 1 1 1 1 1 2 1 2 2 2 1 1 1 1 1 2 1 2 1 2 1 1
## [1888] 1 1 1 1 2 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2
## [1925] 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1
## [1962] 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 2 1 1 2 1 1 1 2 1 1 1 1 2 1 2 1 2 1 2 1 2
## [1999] 1 2 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1
## [2036] 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1 2 2 1 1
## [2073] 1 1 1 2 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2
## [2110] 1 2 1 1 1 2 2 2 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1
## [2147] 2 1 2 1 1 2 1 1 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 2 1 2 2
## [2184] 1 2 1 1 2 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
## [2221] 1 2 1 2 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1
## [2258] 2 1 1 1 2 1 1 2 1 1 1 1 2 1 2 2 1 1 1 1 1 2 1 2 2 1 2 1 1 2 1 2 1 2 1 1 1
## [2295] 1 1 1 1 1 2 1 1 1 2 1 1 1 2 1 2 2 2 1 1 2 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1
## [2332] 2 2 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1 1 2 2 1 2 1 1 2 1 1 1 2 1 1 1 1 1 1
## [2369] 2 2 1 2 1 1 1 2 2 1 1 1 1 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1
## [2406] 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1
## [2443] 1 1 2 2 1 1 1 2 2 2 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 1 1 1 1 2 2 1 1 2 1 2 1
## [2480] 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1
## [2517] 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1
## [2554] 1 2 1 2 2 1 1 1 2 2 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1
## [2591] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 2 1 1 1 2 1 1
## [2628] 1 1 2 2 1 2 1 1 1 1 1 1 1 2 1 2 1 2 1 2 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1 2 2
## [2665] 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 1 1 2 2 1 2 1 2 1 2 1 1 1 1
## [2702] 1 2 1 2 2 1 1 1 2 1 1 2 1 1 2 1 1 1 2 1 1 2 2 1 2 2 2 2 1 1 1 1 2 1 1 1 1
## [2739] 1 1 1 2 1 1 1 1 1 1 2 1 1 1 2 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2
## [2776] 1 2 1 1 2 1 1 2 1 2 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 1 2 2 2 1 1 2 1 1 1
## [2813] 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 2 2 1 2 2 2 1 2 1 1 1 2 2 2 1 1 1 2 1 1
## [2850] 2 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1
## [2887] 1 1 2 2 1 1 1 2 1 1 1 1 1 2 1 2 1 2 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 2 1 1
## [2924] 1 2 2 1 1 2 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1
## [2961] 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 2 2 1 1 1 2 2 1
## [2998] 1 1 1
## 
## Within cluster sum of squares by cluster:
## [1] 1342984 1410685
##  (between_SS / total_SS =  51.0 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

Comparando los clusters con el seguro medico

table(seguroCluster$cluster, datosseguro$health_ins)
##    
##     1. Yes 2. No
##   1   1437   824
##   2    646    93

Agrupando datos de cluster

clusplot(data.frame(x=datosseguro$wage,y=datosseguro$age), seguroCluster$cluster, color=T, shade=T, lines=0)

Se puede observar que el programa marca la formación de dos clusters, los cuales se interponen el uno con el otro.

Conclusión

Con una variable de sueldo, los clusters no están bien definidos y no se puede llegar a una conclusión clara, si bien, se suele observar que sueldos mas altos suelen tener seguro médico, la relación no es lo suficientemente fuerte para poder basarnos en la variable de sueldo para predecir un resultado.

Descarga este código

xfun::embed_file("Caso de Estudio Seguro Medico.Rmd")

Download Caso de Estudio Seguro Medico.Rmd