- Functions with parallelization arguments
- Libraries for parallelizing on a PC (shared memory)
- Libraries for clusters (distributed memory)
July 9, 2018
A simple example: bootstrapping a correlation
library(boot)
x = c(109,88,96,96,109,116,114,96,85,100,113,117,107,104,101,81)
y = c(116,77,95,79,113,122,109,94,91,88,115,119,100,115,95,90)
datos = cbind(x, y)
cor2 <- function(data, indices) {
r <- cor(data[indices,1],data[indices,2])
return(r)}
results <- boot(data=datos,cor2, R=100)
Profiling the run without parallelization
library(profvis)
## Warning: package 'profvis' was built under R version 3.4.4
profvis({
results <- boot(data=datos,cor2, R=1000000)
})
Profiling the run with parallelization
profvis({
results <- boot(data=datos,cor2, R=1000000, parallel="snow", ncpus = 6)
})
This kind of function uses, in a transparent way, the libraries and functions we will see in a moment. In the profiling, pay attention to the functions clusterApply and parLapply.
Pro: easy to implement.
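A hedged sketch of how the backend for boot() could be chosen portably, reusing datos and cor2 from the chunk above (the ncpus value is only illustrative):
library(boot)
# "multicore" forks the R process (Unix/macOS only); "snow" starts a socket
# cluster and also works on Windows.
backend <- if (.Platform$OS.type == "unix") "multicore" else "snow"
results <- boot(data = datos, statistic = cor2, R = 100000,
                parallel = backend, ncpus = 2)  # adjust ncpus to your machine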
http://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/parallel.html
Sequential version. Example estimating "parameters in linear mixed-effects models with restricted maximum likelihood (REML)".
library(lme4)
## Warning: package 'lme4' was built under R version 3.4.4
## Loading required package: Matrix
f <- function(i) {
lmer(Petal.Width ~ . - Species + (1 | Species), data = iris)
}
system.time(save1 <- lapply(1:100, f))
##    user  system elapsed
##    2.09    0.00    2.09
The parallel version only works on Unix; on Windows sequential processing is performed instead (lapply is executed rather than mclapply), so no speed is gained there. It replaces lapply with mclapply.
#system.time(save2 <- mclapply(1:100, f))
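A portable sketch, assuming the function f defined above: mclapply() relies on forking, which is only available on Unix-alikes, so on Windows it can only run sequentially.
library(parallel)
if (.Platform$OS.type == "unix") {
  save2 <- mclapply(1:100, f, mc.cores = detectCores())
} else {
  save2 <- lapply(1:100, f)  # sequential fallback on Windows
}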
library(parallel)
detectCores(logical = FALSE) # physical cores
## [1] 2
detectCores()
## [1] 4
cl <- makeCluster(4)
# code to run in parallel goes here
stopCluster(cl)
# makeCluster(4, type = "PSOCK")  # alternatively, type = "FORK" (Unix only)
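A brief hedged sketch of the two cluster types mentioned in the comment above (the worker counts are illustrative):
library(parallel)
# "PSOCK" launches fresh R sessions that communicate over sockets and works on
# every platform; "FORK" clones the current R process, so the workers share the
# master's memory at fork time, but it is only available on Unix-alikes.
cl <- makeCluster(2, type = "PSOCK")
stopCluster(cl)
if (.Platform$OS.type == "unix") {
  cl <- makeCluster(2, type = "FORK")  # workers inherit loaded packages and objects
  stopCluster(cl)
}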
Load packages and variables from the current R session on all the workers. clusterEvalQ evaluates an expression on each worker.
cl <- makeCluster(4, type="PSOCK")
clusterEvalQ(cl, 2 + 2)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 4
##
## [[4]]
## [1] 4
x <- 1
clusterEvalQ(cl, x)
## Error in checkForRemoteErrors(lapply(cl, recvResult)): 4 nodes produced errors; first error: object 'x' not found
clusterEvalQ(cl, y <- 1)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 1
##
## [[4]]
## [1] 1
clusterEvalQ(cl, y)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 1
##
## [[4]]
## [1] 1
The y assigned on the workers does not exist on the master process; on the master, y is still the vector defined at the beginning.
y
## [1] 116 77 95 79 113 122 109 94 91 88 115 119 100 115 95 90
clusterExport exports a variable from the master to the workers.
x
## [1] 1
clusterExport(cl, "x")
clusterEvalQ(cl, x)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 1
##
## [[4]]
## [1] 1
Load libraries on the workers.
clusterEvalQ(cl, {
library(ggplot2)
library(boot)
})
## [[1]]
## [1] "boot"      "ggplot2"   "stats"     "graphics"  "grDevices" "utils"
## [7] "datasets"  "methods"   "base"
##
## [[2]]
## [1] "boot"      "ggplot2"   "stats"     "graphics"  "grDevices" "utils"
## [7] "datasets"  "methods"   "base"
##
## [[3]]
## [1] "boot"      "ggplot2"   "stats"     "graphics"  "grDevices" "utils"
## [7] "datasets"  "methods"   "base"
##
## [[4]]
## [1] "boot"      "ggplot2"   "stats"     "graphics"  "grDevices" "utils"
## [7] "datasets"  "methods"   "base"
Parallel versions of apply(), with an additional argument to operate on the defined group of workers.
apply() applies a function over the rows or columns of a matrix or data frame. In this example the mean of each of the first four columns of airquality is computed; the additional argument na.rm = TRUE is passed on to mean(). lapply(), shown below, applies a function to each element of a list or vector and returns a list.
apply( airquality[, 1:4], 2, mean, na.rm = TRUE) # MARGIN: '1' rows, '2' columns
##      Ozone    Solar.R       Wind       Temp
##  42.129310 185.931507   9.957516  77.882353
parApply(cl, airquality[, 1:4], 2, mean, na.rm = TRUE)
##      Ozone    Solar.R       Wind       Temp
##  42.129310 185.931507   9.957516  77.882353
lapply(airquality[, 1:4], mean, na.rm = TRUE)
## $Ozone
## [1] 42.12931
##
## $Solar.R
## [1] 185.9315
##
## $Wind
## [1] 9.957516
##
## $Temp
## [1] 77.88235
parLapply(cl, airquality[, 1:4], mean, na.rm = TRUE)
## $Ozone
## [1] 42.12931
##
## $Solar.R
## [1] 185.9315
##
## $Wind
## [1] 9.957516
##
## $Temp
## [1] 77.88235
sapply is a simplified version of lapply: it applies lapply and inspects the output; when it sees that the output admits a simpler representation than a list, it simplifies it.
sapply(airquality[, 1:4], mean, na.rm = TRUE)
##      Ozone    Solar.R       Wind       Temp
##  42.129310 185.931507   9.957516  77.882353
parSapply(cl, airquality[, 1:4], mean, na.rm = TRUE)
##      Ozone    Solar.R       Wind       Temp
##  42.129310 185.931507   9.957516  77.882353
parLapplyLB(cl, airquality[, 1:4], mean, na.rm = TRUE)
## $Ozone
## [1] 42.12931
##
## $Solar.R
## [1] 185.9315
##
## $Wind
## [1] 9.957516
##
## $Temp
## [1] 77.88235
cl <- makeCluster(4, type = "SOCK")
myfunc <- function(x=2){x+1}
myfunc_argument <- 5
clusterCall(cl, myfunc, myfunc_argument)
## [[1]]
## [1] 6
##
## [[2]]
## [1] 6
##
## [[3]]
## [1] 6
##
## [[4]]
## [1] 6
clusterCall(cl, function(x=2){x+1}, 5)
## [[1]]
## [1] 6
##
## [[2]]
## [1] 6
##
## [[3]]
## [1] 6
##
## [[4]]
## [1] 6
clusterApply(cl, 1:2, sum, 3)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 5
Similar to clusterApply, but with load balancing: tasks are handed out to the workers as they finish their previous ones.
clusterApplyLB(cl, 1:3, sum, 3)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 5
##
## [[3]]
## [1] 6
Example based on: http://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/parallel.html
library(parallel)
f <- function(i) {
lmer(Petal.Width ~ . - Species + (1 | Species), data = iris)
}
s1<- system.time({
library(lme4)
save1 <- lapply(1:100, f)
})
library(parallel)
f <- function(i) {
lmer(Petal.Width ~ . - Species + (1 | Species), data = iris)
}
s2<- system.time({
library(lme4)
save2 <- mclapply(1:100, f)
})
library(parallel)
f <- function(i) {
lmer(Petal.Width ~ . - Species + (1 | Species), data = iris)
}
s3<-system.time({
cl <- makeCluster(detectCores())
clusterEvalQ(cl, library(lme4))
save3 <- parLapply(cl, 1:100, f)
stopCluster(cl)
})
library(parallel)
f <- function(i) {
lmer(Petal.Width ~ . - Species + (1 | Species), data = iris)
}
s4<-system.time({
cl <- makeCluster(detectCores())
clusterEvalQ(cl, library(lme4))
save4 <- parLapplyLB(cl, 1:100, f)
stopCluster(cl)
})
sysTime = do.call("rbind",list(s1,s2,s3,s4))
sysTime = cbind(sysTime,data.frame(fun=c("lapply","mclapply","parLapply","parLapplyLB")))
require(ggplot2)
## Loading required package: ggplot2
ggplot(data=sysTime, aes(x=fun,y=elapsed,fill=fun)) +
geom_bar(stat="identity") + ggtitle("Elapsed time of each function")
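A possible follow-up, sketched assuming the sysTime data frame built above: the same timings expressed as a speed-up relative to the sequential lapply() run.
# Speed-up of each variant relative to lapply() (ratio of elapsed times).
sysTime$speedup <- sysTime$elapsed[sysTime$fun == "lapply"] / sysTime$elapsed
sysTime[, c("fun", "elapsed", "speedup")]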
Clear everything in memory.
rm(list = ls())
http://jaehyeon-kim.github.io/2015/03/Parallel-Processing-on-Single-Machine-Part-I.html
library(snow)
## Warning: package 'snow' was built under R version 3.4.4
##
## Attaching package: 'snow'
## The following objects are masked from 'package:parallel':
##
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, clusterSplit, makeCluster,
##     parApply, parCapply, parLapply, parRapply, parSapply,
##     splitIndices, stopCluster
set.seed(1237) # random seed, for reproducibility
sleep = sample(1:10,10) # sequence of sleep times at which processing will pause
sleep
## [1] 4 9 1 8 2 10 5 6 3 7
cl = makeCluster(4, type="SOCK")
clusterSplit(cl, sleep) # how the sleep times are split across nodes
## [[1]]
## [1] 4 9 1
##
## [[2]]
## [1] 8 2
##
## [[3]]
## [1] 10 5
##
## [[4]]
## [1] 6 3 7
st = snow.time(clusterApply(cl, sleep, Sys.sleep))
stLB = snow.time(clusterApplyLB(cl, sleep, Sys.sleep))
stPL = snow.time(parLapply(cl, sleep, Sys.sleep))
stopCluster(cl)
plot(st, title="clusterApply")
plot(stLB, title="clusterApplyLB")
plot(stPL, title="parLapply")
## snow.time() is from the snow package
Conclusion: clusterApplyLB() and parLapply() take less time than clusterApply(). The efficiency of the former comes from load balancing (a task is dispatched only when a worker becomes free), while that of the latter comes from a smaller number of scheduling operations thanks to how the tasks are chunked.
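Roughly speaking, parLapply() sends one chunk of tasks to each worker and runs lapply() inside the worker, whereas clusterApply() dispatches the elements one at a time. A minimal sketch of that chunking idea (illustrative only, not the real implementation of parLapply()):
library(parallel)
cl <- makeCluster(4)
x  <- 1:10
chunks <- clusterSplit(cl, x)                    # one chunk of tasks per worker
res    <- clusterApply(cl, chunks, lapply, sqrt) # each worker runs lapply() on its chunk
res    <- do.call(c, res)                        # flatten back into a single list
stopCluster(cl)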
library(snow)
set.seed(1237)
cl = makeCluster(4, type="SOCK")
newairquality <- airquality[sample(1:nrow(airquality), 10000000, replace = TRUE), 1:4]
st <- snow.time(clusterApply(cl, newairquality, mean, na.rm = TRUE))
stLB <- snow.time(clusterApplyLB(cl, newairquality, mean, na.rm = TRUE))
stPL <- snow.time(parLapply(cl, newairquality, mean, na.rm = TRUE))
plot(st, title="clusterApply")
plot(stLB, title="clusterApplyLB")
plot(stPL, title="parLapply")
require(parallel)
set.seed(1237)
sleep = sample(1:10,10)
cl = makeCluster(detectCores())
st = system.time(clusterApply(cl, sleep, Sys.sleep))
stLB = system.time(clusterApplyLB(cl, sleep, Sys.sleep))
stPL = system.time(parLapply(cl, sleep, Sys.sleep))
stPLB = system.time(parLapplyLB(cl, sleep, Sys.sleep))
stopCluster(cl)
sysTime = do.call("rbind",list(st,stLB,stPL,stPLB))
sysTime = cbind(sysTime,data.frame(fun=c("clusterApply","clusterApplyLB","parLapply","parLapplyLB")))
require(ggplot2)
ggplot(data=sysTime, aes(x=fun,y=elapsed,fill=fun)) +
geom_bar(stat="identity") + ggtitle("Elapsed time of each function")
library(parallel)
set.seed(1237)
cl = makeCluster(4, type="SOCK")
newairquality <- airquality[sample(1:nrow(airquality), 10000000, replace = TRUE), 1:4]
st <- system.time(clusterApply(cl, newairquality, mean, na.rm = TRUE))
stLB <- system.time(clusterApplyLB(cl, newairquality, mean, na.rm = TRUE))
stPL <- system.time(parLapply(cl, newairquality, mean, na.rm = TRUE))
stPLB <- system.time(parLapplyLB(cl, newairquality, mean, na.rm = TRUE))
require(ggplot2)
ggplot(data=sysTime, aes(x=fun,y=elapsed,fill=fun)) +
geom_bar(stat="identity") + ggtitle("Elapsed time of each function")
A sampling with replacement of airquality is performed. Here is a simple example of the operation, with 10 samples.
airquality[sample(1:nrow(airquality), 10, replace=TRUE),]
##       Ozone Solar.R Wind Temp Month Day
## 12       16     256  9.7   69     5  12
## 63       49     248  9.2   85     7   2
## 15       18      65 13.2   58     5  15
## 7        23     299  8.6   65     5   7
## 1        41     190  7.4   67     5   1
## 102      NA     222  8.6   92     8  10
## 74       27     175 14.9   81     7  13
## 12.1     16     256  9.7   69     5  12
## 89       82     213  7.4   88     7  28
## 72       NA     139  8.6   82     7  11
sample2 <- function(data) {
sample.data <- airquality[sample(1:nrow(airquality), 100, replace=TRUE),]
return(sample.data)}
library(parallel)
cl <- makeCluster(detectCores())
clusterSetRNGStream(cl, 123)
clusterExport(cl, c("airquality"))
airquality.extent <- parLapply(cl, airquality, sample2) # parLapply iterates over the 6 columns of airquality
class(airquality.extent) # a list
## [1] "list"
length(airquality.extent)
## [1] 6
str(airquality.extent)
## List of 6 ## $ Ozone :'data.frame': 100 obs. of 6 variables: ## ..$ Ozone : int [1:100] NA NA 168 78 16 80 28 23 35 71 ... ## ..$ Solar.R: int [1:100] 266 31 238 NA 201 294 NA 13 NA 291 ... ## ..$ Wind : num [1:100] 14.9 14.9 3.4 6.9 8 8.6 14.9 12 7.4 13.8 ... ## ..$ Temp : int [1:100] 58 77 81 86 82 86 66 67 85 90 ... ## ..$ Month : int [1:100] 5 6 8 8 9 7 5 5 8 6 ... ## ..$ Day : int [1:100] 26 29 25 4 20 24 6 28 5 9 ... ## $ Solar.R:'data.frame': 100 obs. of 6 variables: ## ..$ Ozone : int [1:100] NA 23 16 122 66 23 23 23 39 16 ... ## ..$ Solar.R: int [1:100] 322 14 256 255 NA 13 220 220 323 201 ... ## ..$ Wind : num [1:100] 11.5 9.2 9.7 4 4.6 12 10.3 10.3 11.5 8 ... ## ..$ Temp : int [1:100] 79 71 69 89 87 67 78 78 87 82 ... ## ..$ Month : int [1:100] 6 9 5 8 8 5 9 9 6 9 ... ## ..$ Day : int [1:100] 15 22 12 7 6 28 8 8 10 20 ... ## $ Wind :'data.frame': 100 obs. of 6 variables: ## ..$ Ozone : int [1:100] NA 30 78 23 115 122 16 23 13 65 ... ## ..$ Solar.R: int [1:100] 59 193 197 13 223 255 201 14 112 157 ... ## ..$ Wind : num [1:100] 1.7 6.9 5.1 12 5.7 4 8 9.2 11.5 9.7 ... ## ..$ Temp : int [1:100] 76 70 92 67 79 89 82 71 71 80 ... ## ..$ Month : int [1:100] 6 9 9 5 5 8 9 9 9 8 ... ## ..$ Day : int [1:100] 22 26 2 28 30 7 20 22 15 14 ... ## $ Temp :'data.frame': 100 obs. of 6 variables: ## ..$ Ozone : int [1:100] 37 18 NA 59 NA NA 108 24 23 73 ... ## ..$ Solar.R: int [1:100] 284 131 137 51 291 264 223 238 299 183 ... ## ..$ Wind : num [1:100] 20.7 8 11.5 6.3 14.9 14.3 8 10.3 8.6 2.8 ... ## ..$ Temp : int [1:100] 72 76 86 79 91 79 85 68 65 93 ... ## ..$ Month : int [1:100] 6 9 8 8 7 6 7 9 5 9 ... ## ..$ Day : int [1:100] 17 29 11 17 14 6 25 19 7 3 ... ## $ Month :'data.frame': 100 obs. of 6 variables: ## ..$ Ozone : int [1:100] 4 41 23 NA NA 35 NA NA 39 32 ... ## ..$ Solar.R: int [1:100] 25 190 14 31 153 274 255 139 83 92 ... ## ..$ Wind : num [1:100] 9.7 7.4 9.2 14.9 5.7 10.3 12.6 8.6 6.9 15.5 ... ## ..$ Temp : int [1:100] 61 67 71 77 88 82 75 82 81 84 ... ## ..$ Month : int [1:100] 5 5 9 6 8 7 8 7 8 9 ... ## ..$ Day : int [1:100] 23 1 22 29 27 17 23 11 1 6 ... ## $ Day :'data.frame': 100 obs. of 6 variables: ## ..$ Ozone : int [1:100] NA 18 23 96 28 NA 14 7 64 122 ... ## ..$ Solar.R: int [1:100] 194 313 220 167 273 59 20 48 175 255 ... ## ..$ Wind : num [1:100] 8.6 11.5 10.3 6.9 11.5 1.7 16.6 14.3 4.6 4 ... ## ..$ Temp : int [1:100] 69 62 78 91 82 76 63 80 83 89 ... ## ..$ Month : int [1:100] 5 5 9 9 8 6 9 7 7 8 ... ## ..$ Day : int [1:100] 10 4 8 1 13 22 25 15 5 7 ...
airquality.extent.ul <- do.call(rbind.data.frame, airquality.extent)
str(airquality.extent.ul)
## 'data.frame':    600 obs. of  6 variables:
##  $ Ozone  : int  NA NA 168 78 16 80 28 23 35 71 ...
##  $ Solar.R: int  266 31 238 NA 201 294 NA 13 NA 291 ...
##  $ Wind   : num  14.9 14.9 3.4 6.9 8 8.6 14.9 12 7.4 13.8 ...
##  $ Temp   : int  58 77 81 86 82 86 66 67 85 90 ...
##  $ Month  : int  5 6 8 8 9 7 5 5 8 6 ...
##  $ Day    : int  26 29 25 4 20 24 6 28 5 9 ...
airquality.extent2 <- do.call(rbind.data.frame, clusterApply(cl, airquality, sample2))
stopCluster(cl)
class(airquality.extent2)
## [1] "data.frame"
length(airquality.extent2)
## [1] 6
str(airquality.extent2)
## 'data.frame':    600 obs. of  6 variables:
##  $ Ozone  : int  NA 11 32 NA NA 96 77 13 NA 66 ...
##  $ Solar.R: int  250 320 92 101 138 167 276 27 101 NA ...
##  $ Wind   : num  6.3 16.6 15.5 10.9 8 6.9 5.1 10.3 10.9 4.6 ...
##  $ Temp   : int  76 73 84 84 83 91 88 76 84 87 ...
##  $ Month  : int  6 5 9 7 6 9 7 9 7 8 ...
##  $ Day    : int  24 22 6 4 30 1 7 18 4 6 ...
We define our own data set called dt, to which the airquality data frame is assigned. If dt is not exported to every worker, parLapply will raise an error. In this first example clusterExport is omitted.
rm(list = ls())
sample2 <- function(data) {
sample.data <- dt[sample(1:nrow(dt), 100, replace=TRUE),]
return(sample.data)}
dt<- airquality
library(parallel)
cl <- makeCluster(detectCores())
airquality.extent2 <- do.call(rbind.data.frame, parLapply(cl, dt, sample2))
## Error in checkForRemoteErrors(val): 4 nodes produced errors; first error: argument of length 0
stopCluster(cl)
In this second example clusterExport is included, and it works correctly.
rm(list = ls())
sample2 <- function(data) {
sample.data <- dt[sample(1:nrow(dt), 100, replace=TRUE),]
return(sample.data)}
dt<- airquality
library(parallel)
cl <- makeCluster(detectCores())
clusterExport(cl, "dt")
airquality.extent2 <- do.call(rbind.data.frame, parLapply(cl, dt, sample2))
stopCluster(cl)
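An alternative sketch to clusterExport(): the data can instead be passed as an extra argument, which parLapply() serializes and ships to the workers together with the function (sample3 and the 1:4 task index are illustrative names, not part of the original example):
library(parallel)
# The data frame travels as the 'data' argument, so no clusterExport() is needed.
sample3 <- function(i, data) {
  data[sample(1:nrow(data), 100, replace = TRUE), ]
}
dt <- airquality
cl <- makeCluster(detectCores())
boot.list <- parLapply(cl, 1:4, sample3, data = dt)  # four independent resamples
stopCluster(cl)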
Without parallelization
library(boot)
x = c(109,88,96,96,109,116,114,96,85,100,113,117,107,104,101,81)
y = c(116,77,95,79,113,122,109,94,91,88,115,119,100,115,95,90)
datos = cbind(x, y)
cor2 <- function(data, indices) {
r <- cor(data[indices,1],data[indices,2])
return(r)}
results <- boot(data=datos,cor2, R=100)
Parallelized
library(parallel)
cl <- makeCluster(detectCores())
clusterSetRNGStream(cl, 123) # make the random number streams reproducible
library(boot)
clusterEvalQ(cl, library(boot))
## [[1]]
## [1] "boot"      "snow"      "methods"   "stats"     "graphics"  "grDevices"
## [7] "utils"     "datasets"  "base"
##
## [[2]]
## [1] "boot"      "snow"      "methods"   "stats"     "graphics"  "grDevices"
## [7] "utils"     "datasets"  "base"
##
## [[3]]
## [1] "boot"      "snow"      "methods"   "stats"     "graphics"  "grDevices"
## [7] "utils"     "datasets"  "base"
##
## [[4]]
## [1] "boot"      "snow"      "methods"   "stats"     "graphics"  "grDevices"
## [7] "utils"     "datasets"  "base"
clusterExport(cl, c("datos", "cor2"))
funcion <- function(...) boot(datos,cor2, R=100)
boot1 <- parLapply(cl, datos, funcion) # a list: one bootstrap per element of datos (funcion ignores its argument)
class(boot1)
## [1] "list"
boot2 <- do.call(c, parLapply(cl, datos, funcion)) # datos has 32 elements and each run uses R = 100, hence 3200 replicates
class(boot2)
## [1] "boot"
length(boot2$t)
## [1] 3200
boot.ci(boot2, type = c("norm", "basic", "perc"), conf = 0.9)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 3200 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot2, conf = 0.9, type = c("norm", "basic",
## "perc"))
##
## Intervals :
## Level Normal Basic Percentile
## 90% ( 0.7290, 0.9164 ) ( 0.7370, 0.9209 ) ( 0.7363, 0.9201 )
## Calculations and Intervals on Original Scale
stopCluster(cl)
library(boot)
cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
cd4.boot <- boot(cd4, corr, R = 999, sim = "parametric",ran.gen = cd4.rg, mle = cd4.mle)
boot.ci(cd4.boot, type = c("norm", "basic", "perc"), conf = 0.9, h = atanh, hinv = tanh)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = cd4.boot, conf = 0.9, type = c("norm", "basic",
## "perc"), h = atanh, hinv = tanh)
##
## Intervals :
## Level Normal Basic Percentile
## 90% ( 0.4620, 0.8603 ) ( 0.4660, 0.8580 ) ( 0.4952, 0.8677 )
## Calculations on Transformed Scale; Intervals on Original Scale
library(parallel)
cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
run1 <- function(...) boot(cd4, corr, R = 500, sim = "parametric",ran.gen = cd4.rg, mle = cd4.mle)
mc <- 2 # set as appropriate for your hardware
## To make this reproducible:
set.seed(123, "L'Ecuyer")
cd4.boot <- do.call(c, mclapply(seq_len(mc), run1) )
boot.ci(cd4.boot, type = c("norm", "basic", "perc"), conf = 0.9, h = atanh, hinv = tanh)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = cd4.boot, conf = 0.9, type = c("norm", "basic",
## "perc"), h = atanh, hinv = tanh)
##
## Intervals :
## Level Normal Basic Percentile
## 90% ( 0.4664, 0.8625 ) ( 0.4584, 0.8647 ) ( 0.4753, 0.8700 )
## Calculations on Transformed Scale; Intervals on Original Scale
run1 <- function(...) {
library(boot)
cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
boot(cd4, corr, R = 500, sim = "parametric",
ran.gen = cd4.rg, mle = cd4.mle)
}
cl <- makeCluster(mc)
## make this reproducible
clusterSetRNGStream(cl, 123)
library(boot) # needed for c() method on master
cd4.boot <- do.call(c, parLapply(cl, seq_len(mc), run1) )
boot.ci(cd4.boot, type = c("norm", "basic", "perc"), conf = 0.9, h = atanh, hinv = tanh)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = cd4.boot, conf = 0.9, type = c("norm", "basic",
## "perc"), h = atanh, hinv = tanh)
##
## Intervals :
## Level Normal Basic Percentile
## 90% ( 0.4705, 0.8589 ) ( 0.4620, 0.8597 ) ( 0.4900, 0.8689 )
## Calculations on Transformed Scale; Intervals on Original Scale
stopCluster(cl)
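For comparison, a hedged sketch of the same parametric bootstrap using boot()'s built-in parallel support from the first section, which hides the cluster management (the ncpus value is only illustrative):
library(boot)
cd4.rg  <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
# boot() splits the R replicates over the workers itself.
cd4.boot <- boot(cd4, corr, R = 999, sim = "parametric",
                 ran.gen = cd4.rg, mle = cd4.mle,
                 parallel = "snow", ncpus = 2)  # adjust ncpus to your hardware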
Without parallelization
library(boot)
x = c(109,88,96,96,109,116,114,96,85,100,113,117,107,104,101,81)
y = c(116,77,95,79,113,122,109,94,91,88,115,119,100,115,95,90)
datos = cbind(x, y)
cor2 <- function(data, indices) {
r <- cor(data[indices,1],data[indices,2])
return(r)}
results <- boot(datos,cor2, R=100)
Parallelized. The clusterEvalQ and clusterExport calls are necessary; in this example, however, they are not executed, in order to observe the resulting error.
library(parallel)
cl <- makeCluster(detectCores())
clusterSetRNGStream(cl, 123)
# clusterEvalQ(cl, library(boot)) # necessary, but not executed here, to observe the error
# clusterExport(cl, c("datos", "cor2"))
funcion <- function(...) boot(datos,cor2, R=100)
boot1 <- parLapply(cl, datos, funcion) # a list
## Error in checkForRemoteErrors(val): 4 nodes produced errors; first error: could not find function "boot"
Parallelized
clusterEvalQ(cl, library(boot))
## [[1]]
## [1] "boot"      "snow"      "methods"   "stats"     "graphics"  "grDevices"
## [7] "utils"     "datasets"  "base"
##
## [[2]]
## [1] "boot"      "snow"      "methods"   "stats"     "graphics"  "grDevices"
## [7] "utils"     "datasets"  "base"
##
## [[3]]
## [1] "boot"      "snow"      "methods"   "stats"     "graphics"  "grDevices"
## [7] "utils"     "datasets"  "base"
##
## [[4]]
## [1] "boot"      "snow"      "methods"   "stats"     "graphics"  "grDevices"
## [7] "utils"     "datasets"  "base"
clusterExport(cl, c("datos", "cor2"))
boot1 <- parLapply(cl, datos, funcion) # a list
class(boot1)
## [1] "list"
boot1[[1]]
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = datos, statistic = cor2, R = 100)
##
##
## Bootstrap Statistics :
##      original     bias    std. error
## t1* 0.8285718 0.00727428  0.05320471
Parallelized, alternative version
boot2 <- do.call(c, parLapply(cl, datos, funcion))
class(boot2)
## [1] "boot"
stopCluster(cl)
The iris data set contains petal width and length for three plant species. k-means tutorial: https://datascienceplus.com/k-means-clustering-in-r/
library(ggplot2)
ggplot(iris, aes(Petal.Length, Petal.Width)) + geom_point(aes(shape = iris$Species), size = 5)
Final example. Clustering iris with the k-means algorithm, using two of its attributes (petal width and length). Three groups are generated.
set.seed(4242)
clusters <- kmeans(iris[, c("Petal.Length", "Petal.Width")], 3)
clusters
## K-means clustering with 3 clusters of sizes 50, 48, 52
##
## Cluster means:
##   Petal.Length Petal.Width
## 1     1.462000    0.246000
## 2     5.595833    2.037500
## 3     4.269231    1.342308
##
## Clustering vector:
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
##  [71] 3 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2
## [106] 2 3 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 3 2
## [141] 2 2 2 2 2 2 2 2 2 2
##
## Within cluster sum of squares by cluster:
## [1]  2.02200 16.29167 13.05769
##  (between_SS / total_SS =  94.3 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"
## [5] "tot.withinss" "betweenss"    "size"         "iter"
## [9] "ifault"
table(iris$Species, clusters$cluster) # contingency table of cluster assignments by species
##
##               1  2  3
##   setosa     50  0  0
##   versicolor  0  2 48
##   virginica   0 46  4
ggplot(iris, aes(Petal.Length, Petal.Width)) + geom_point(aes(color = as.factor(clusters$cluster), shape = iris$Species), size = 5)
Final example (continued). k-means depends on the number of clusters to be found and on the number of iterations (random starts) used to define each cluster. This is examined in more depth using parallel processing.
library(parallel)
iris.cluster <- iris[,-5]
cl <- makeCluster(detectCores())
clusterExport(cl, 'iris.cluster')
worker <- function(centers, nstart) {
kmeans(iris.cluster, centers=centers, nstart=nstart)
}
myiter <- 3
nstarts <- rep(25, myiter)
nclus <- 2:5
g <- expand.grid(nstarts=nstarts, nclus=nclus)
g
##    nstarts nclus
## 1       25     2
## 2       25     2
## 3       25     2
## 4       25     3
## 5       25     3
## 6       25     3
## 7       25     4
## 8       25     4
## 9       25     4
## 10      25     5
## 11      25     5
## 12      25     5
results <- clusterMap(cl, worker, centers=g$nclus, nstart=g$nstarts)
stopCluster(cl)
results
## [[1]] ## K-means clustering with 2 clusters of sizes 97, 53 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 6.301031 2.886598 4.958763 1.695876 ## 2 5.005660 3.369811 1.560377 0.290566 ## ## Clustering vector: ## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ## [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 ## [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 ## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## [141] 1 1 1 1 1 1 1 1 1 1 ## ## Within cluster sum of squares by cluster: ## [1] 123.79588 28.55208 ## (between_SS / total_SS = 77.6 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[2]] ## K-means clustering with 2 clusters of sizes 97, 53 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 6.301031 2.886598 4.958763 1.695876 ## 2 5.005660 3.369811 1.560377 0.290566 ## ## Clustering vector: ## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ## [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 ## [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 ## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## [141] 1 1 1 1 1 1 1 1 1 1 ## ## Within cluster sum of squares by cluster: ## [1] 123.79588 28.55208 ## (between_SS / total_SS = 77.6 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[3]] ## K-means clustering with 2 clusters of sizes 53, 97 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 5.005660 3.369811 1.560377 0.290566 ## 2 6.301031 2.886598 4.958763 1.695876 ## ## Clustering vector: ## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 ## [71] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 1 2 2 2 2 2 2 ## [106] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ## [141] 2 2 2 2 2 2 2 2 2 2 ## ## Within cluster sum of squares by cluster: ## [1] 28.55208 123.79588 ## (between_SS / total_SS = 77.6 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[4]] ## K-means clustering with 3 clusters of sizes 62, 38, 50 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 5.901613 2.748387 4.393548 1.433871 ## 2 6.850000 3.073684 5.742105 2.071053 ## 3 5.006000 3.428000 1.462000 0.246000 ## ## Clustering vector: ## [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ## [36] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## [71] 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 ## [106] 2 1 2 2 2 2 2 2 1 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 ## [141] 2 2 1 2 2 2 1 2 2 1 ## ## Within cluster sum of squares by cluster: ## [1] 39.82097 23.87947 15.15100 ## (between_SS / total_SS = 88.4 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[5]] ## K-means clustering with 3 clusters of sizes 50, 62, 38 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 
5.006000 3.428000 1.462000 0.246000 ## 2 5.901613 2.748387 4.393548 1.433871 ## 3 6.850000 3.073684 5.742105 2.071053 ## ## Clustering vector: ## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ## [71] 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 3 3 3 ## [106] 3 2 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 2 3 3 2 2 3 3 3 3 3 2 3 3 3 3 2 3 ## [141] 3 3 2 3 3 3 2 3 3 2 ## ## Within cluster sum of squares by cluster: ## [1] 15.15100 39.82097 23.87947 ## (between_SS / total_SS = 88.4 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[6]] ## K-means clustering with 3 clusters of sizes 50, 38, 62 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 5.006000 3.428000 1.462000 0.246000 ## 2 6.850000 3.073684 5.742105 2.071053 ## 3 5.901613 2.748387 4.393548 1.433871 ## ## Clustering vector: ## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ## [71] 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 ## [106] 2 3 2 2 2 2 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 ## [141] 2 2 3 2 2 2 3 2 2 3 ## ## Within cluster sum of squares by cluster: ## [1] 15.15100 23.87947 39.82097 ## (between_SS / total_SS = 88.4 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[7]] ## K-means clustering with 4 clusters of sizes 32, 50, 28, 40 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 6.912500 3.100000 5.846875 2.131250 ## 2 5.006000 3.428000 1.462000 0.246000 ## 3 5.532143 2.635714 3.960714 1.228571 ## 4 6.252500 2.855000 4.815000 1.625000 ## ## Clustering vector: ## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ## [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 4 4 3 4 3 4 3 4 3 3 3 3 4 3 4 3 3 4 3 ## [71] 4 3 4 4 4 4 4 4 4 3 3 3 3 4 3 4 4 4 3 3 3 4 3 3 3 3 3 4 3 3 1 4 1 1 1 ## [106] 1 3 1 1 1 4 4 1 4 4 1 1 1 1 4 1 4 1 4 1 1 4 4 1 1 1 1 1 4 4 1 1 1 4 1 ## [141] 1 1 4 1 1 1 4 4 1 4 ## ## Within cluster sum of squares by cluster: ## [1] 18.703437 15.151000 9.749286 13.624750 ## (between_SS / total_SS = 91.6 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[8]] ## K-means clustering with 4 clusters of sizes 40, 50, 32, 28 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 6.252500 2.855000 4.815000 1.625000 ## 2 5.006000 3.428000 1.462000 0.246000 ## 3 6.912500 3.100000 5.846875 2.131250 ## 4 5.532143 2.635714 3.960714 1.228571 ## ## Clustering vector: ## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ## [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 4 1 4 1 4 1 4 4 4 4 1 4 1 4 4 1 4 ## [71] 1 4 1 1 1 1 1 1 1 4 4 4 4 1 4 1 1 1 4 4 4 1 4 4 4 4 4 1 4 4 3 1 3 3 3 ## [106] 3 4 3 3 3 1 1 3 1 1 3 3 3 3 1 3 1 3 1 3 3 1 1 3 3 3 3 3 1 1 3 3 3 1 3 ## [141] 3 3 1 3 3 3 1 1 3 1 ## ## Within cluster sum of squares by cluster: ## [1] 13.624750 15.151000 18.703437 9.749286 ## (between_SS / total_SS = 91.6 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[9]] ## 
K-means clustering with 4 clusters of sizes 32, 28, 50, 40 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 6.912500 3.100000 5.846875 2.131250 ## 2 5.532143 2.635714 3.960714 1.228571 ## 3 5.006000 3.428000 1.462000 0.246000 ## 4 6.252500 2.855000 4.815000 1.625000 ## ## Clustering vector: ## [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ## [36] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 2 4 2 4 2 4 2 2 2 2 4 2 4 2 2 4 2 ## [71] 4 2 4 4 4 4 4 4 4 2 2 2 2 4 2 4 4 4 2 2 2 4 2 2 2 2 2 4 2 2 1 4 1 1 1 ## [106] 1 2 1 1 1 4 4 1 4 4 1 1 1 1 4 1 4 1 4 1 1 4 4 1 1 1 1 1 4 4 1 1 1 4 1 ## [141] 1 1 4 1 1 1 4 4 1 4 ## ## Within cluster sum of squares by cluster: ## [1] 18.703437 9.749286 15.151000 13.624750 ## (between_SS / total_SS = 91.6 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[10]] ## K-means clustering with 5 clusters of sizes 25, 50, 12, 24, 39 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 5.508000 2.600000 3.908000 1.204000 ## 2 5.006000 3.428000 1.462000 0.246000 ## 3 7.475000 3.125000 6.300000 2.050000 ## 4 6.529167 3.058333 5.508333 2.162500 ## 5 6.207692 2.853846 4.746154 1.564103 ## ## Clustering vector: ## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ## [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 5 5 1 5 5 5 1 5 1 1 5 1 5 1 5 5 1 5 1 ## [71] 5 1 5 5 5 5 5 5 5 1 1 1 1 5 1 5 5 5 1 1 1 5 1 1 1 1 1 5 1 1 4 5 3 4 4 ## [106] 3 1 3 4 3 4 4 4 5 4 4 4 3 3 5 4 5 3 5 4 3 5 5 4 3 3 3 4 5 5 3 4 4 5 4 ## [141] 4 4 5 4 4 4 5 4 4 5 ## ## Within cluster sum of squares by cluster: ## [1] 8.36640 15.15100 4.65500 5.46250 12.81128 ## (between_SS / total_SS = 93.2 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[11]] ## K-means clustering with 5 clusters of sizes 39, 50, 24, 25, 12 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 6.207692 2.853846 4.746154 1.564103 ## 2 5.006000 3.428000 1.462000 0.246000 ## 3 6.529167 3.058333 5.508333 2.162500 ## 4 5.508000 2.600000 3.908000 1.204000 ## 5 7.475000 3.125000 6.300000 2.050000 ## ## Clustering vector: ## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ## [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 4 1 1 1 4 1 4 4 1 4 1 4 1 1 4 1 4 ## [71] 1 4 1 1 1 1 1 1 1 4 4 4 4 1 4 1 1 1 4 4 4 1 4 4 4 4 4 1 4 4 3 1 5 3 3 ## [106] 5 4 5 3 5 3 3 3 1 3 3 3 5 5 1 3 1 5 1 3 5 1 1 3 5 5 5 3 1 1 5 3 3 1 3 ## [141] 3 3 1 3 3 3 1 3 3 1 ## ## Within cluster sum of squares by cluster: ## [1] 12.81128 15.15100 5.46250 8.36640 4.65500 ## (between_SS / total_SS = 93.2 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault" ## ## [[12]] ## K-means clustering with 5 clusters of sizes 37, 12, 50, 24, 27 ## ## Cluster means: ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 6.229730 2.851351 4.767568 1.572973 ## 2 7.475000 3.125000 6.300000 2.050000 ## 3 5.006000 3.428000 1.462000 0.246000 ## 4 6.529167 3.058333 5.508333 2.162500 ## 5 5.529630 2.622222 3.940741 1.218519 ## ## Clustering vector: ## [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ## [36] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 5 1 5 1 5 1 5 5 5 5 1 5 1 1 5 1 5 ## [71] 1 5 1 1 1 1 1 1 1 5 5 5 5 1 5 1 1 1 5 5 5 1 5 5 5 5 5 1 5 5 4 1 2 4 4 ## [106] 
2 5 2 4 2 4 4 4 1 4 4 4 2 2 1 4 1 2 1 4 2 1 1 4 2 2 2 4 1 1 2 4 4 1 4 ## [141] 4 4 1 4 4 4 1 4 4 1 ## ## Within cluster sum of squares by cluster: ## [1] 11.963784 4.655000 15.151000 5.462500 9.228889 ## (between_SS / total_SS = 93.2 %) ## ## Available components: ## ## [1] "cluster" "centers" "totss" "withinss" ## [5] "tot.withinss" "betweenss" "size" "iter" ## [9] "ifault"
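As a possible next step, sketched assuming the results and g objects created above, the total within-cluster sum of squares of every run can be collected to compare the candidate numbers of clusters:
# Line up each run's total within-cluster sum of squares with the number of
# clusters (g$nclus) and random starts (g$nstarts) it was fitted with.
tot.wss <- sapply(results, function(km) km$tot.withinss)
data.frame(nclus = g$nclus, nstarts = g$nstarts, tot.withinss = tot.wss)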