Analyze and predict data by building a model based on the CART (Classification and Regression Trees) decision-tree algorithm, using the rpart library that implements it.
Two regression-tree analyses are presented.
The first exercise uses data on automobile gasoline consumption, from the table.b3 data set in the MPV package.
The second exercise uses the student grade-average data previously used for multiple regression.
The regression-tree model can be compared with other models, for example multiple regression.
The rpart library is used.
Nowadays the terms artificial intelligence and machine learning are often used interchangeably, but a distinction can be drawn: while the first artificial-intelligence algorithms, such as those used by chess machines, implemented decision making through programmable rules derived from theory or first principles, in machine learning the decisions are learned by algorithms built from data (Irizarry 2021).
Decision trees are graphical representations of the possible solutions to a decision under certain conditions; they are among the most widely used supervised learning algorithms in machine learning and can perform classification or regression tasks (Bagnato 2020). Decision trees are friendly to interpret compared with other regression methods, thanks to their simple rules and useful graphical representation.
However, if the tree is heavily branched this hinders rather than helps; for that case there are the decision rules that back the tree plot.
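A minimal sketch of both ideas, using the built-in mtcars data rather than this chapter's data (an illustrative assumption): an overgrown rpart tree can be pruned back through its complexity parameter, and rpart.plot provides rpart.rules() to print the rules behind the plot.

library(rpart)
library(rpart.plot)
# Grow a deliberately bushy tree by lowering the complexity parameter cp
fit <- rpart(mpg ~ ., data = mtcars, control = rpart.control(cp = 0.001))
# Pick the cp with the lowest cross-validated error and prune back to it
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
fit_pruned <- prune(fit, cp = best_cp)
rpart.rules(fit_pruned) # the decision rules that back the tree plot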
Regression and classification trees were proposed by Leo Breiman in (Breiman et al. 1984); they are decision trees whose goal is to predict the response variable Y as a function of covariates (Hernández 2020).
Trees can be classified into two types:
Regression trees, in which the response or dependent variable \(y\) is quantitative.
Classification trees, in which the response or dependent variable \(y\) is qualitative.
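A minimal sketch of the two types (an illustrative assumption, using the built-in iris data): the same rpart() call fits either kind, with method = "anova" for a quantitative response and method = "class" for a qualitative one.

library(rpart)
arbol_reg <- rpart(Sepal.Length ~ ., data = iris, method = "anova") # regression tree
arbol_cla <- rpart(Species ~ ., data = iris, method = "class")      # classification tree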
A dependent variable \(Y\) and several independent variables \(x_1, x_2, x_3, \dots, x_n\) are needed. The algorithm generates a set of successive rules that help make a decision, and with them values can be predicted.
Each package (library) to be loaded must have been previously installed with install.packages("package name").
The rpart package is one of the packages that can be used to build regression trees.
# Libraries for trees
library(rpart) # To build trees
library(MPV) # Requires a prior install.packages("MPV")
library(tree) # To build trees
library(party) # To partition data
library(rpart.plot) # To visualize the tree
# Miscellaneous libraries
library(readr) # Read data
library(dplyr) # Data operations: select, filter, mutate, arrange, summarize, group_by, %>%
library(knitr) # Friendly tables
library(ggplot2) # Plots
library(cowplot) # Several plots in a single row
library(fdth) # For frequency distribution tables
Automobile data are loaded. In this example the goal is to find a regression model that explains the response variable \(y\) as a function of the covariates \(x_1, x_2, x_3, \dots, x_{11}\).
The data set contains mainly continuous quantitative variables; the variable of interest is gasoline consumption, that is, miles per gallon for each car.
The aim is to predict the gasoline consumption of a new automobile with similar attributes.
The data come from the table.b3 data set in the MPV package and are loaded into memory automatically.
Image 1. Cars, gasoline miles per gallon. Source: (Hernández 2020)
The customary variable name datos is used to hold the data:
datos <- table.b3
It is a data set with mainly continuous quantitative variables.
summary(datos)
## y x1 x2 x3
## Min. :11.20 Min. : 85.3 Min. : 70.0 Min. : 81.0
## 1st Qu.:16.48 1st Qu.:211.5 1st Qu.:102.8 1st Qu.:171.2
## Median :19.30 Median :318.0 Median :141.5 Median :243.0
## Mean :20.22 Mean :285.0 Mean :136.9 Mean :217.9
## 3rd Qu.:21.66 3rd Qu.:353.2 3rd Qu.:166.2 3rd Qu.:258.8
## Max. :36.50 Max. :500.0 Max. :223.0 Max. :366.0
## NA's :2
## x4 x5 x6 x7
## Min. :7.600 Min. :2.450 Min. :1.000 Min. :3.000
## 1st Qu.:8.000 1st Qu.:2.710 1st Qu.:2.000 1st Qu.:3.000
## Median :8.250 Median :3.000 Median :2.000 Median :3.000
## Mean :8.281 Mean :3.055 Mean :2.594 Mean :3.344
## 3rd Qu.:8.500 3rd Qu.:3.228 3rd Qu.:4.000 3rd Qu.:3.250
## Max. :9.000 Max. :4.300 Max. :4.000 Max. :5.000
##
## x8 x9 x10 x11
## Min. :155.7 Min. :61.80 Min. :1905 Min. :0.0000
## 1st Qu.:175.2 1st Qu.:65.40 1st Qu.:2940 1st Qu.:0.0000
## Median :195.7 Median :72.00 Median :3755 Median :1.0000
## Mean :192.0 Mean :71.28 Mean :3587 Mean :0.7188
## 3rd Qu.:202.6 3rd Qu.:76.30 3rd Qu.:4215 3rd Qu.:1.0000
## Max. :231.0 Max. :79.80 Max. :5430 Max. :1.0000
##
str(datos)
## 'data.frame': 32 obs. of 12 variables:
## $ y : num 18.9 17 20 18.2 20.1 ...
## $ x1 : num 350 350 250 351 225 440 231 262 89.7 96.9 ...
## $ x2 : num 165 170 105 143 95 215 110 110 70 75 ...
## $ x3 : num 260 275 185 255 170 330 175 200 81 83 ...
## $ x4 : num 8 8.5 8.25 8 8.4 8.2 8 8.5 8.2 9 ...
## $ x5 : num 2.56 2.56 2.73 3 2.76 2.88 2.56 2.56 3.9 4.3 ...
## $ x6 : num 4 4 1 2 1 4 2 2 2 2 ...
## $ x7 : num 3 3 3 3 3 3 3 3 4 5 ...
## $ x8 : num 200 200 197 200 194 ...
## $ x9 : num 69.9 72.9 72.2 74 71.8 69 65.4 65.4 64 65 ...
## $ x10: num 3910 3860 3510 3890 3365 ...
## $ x11: num 1 1 1 1 0 1 1 1 0 0 ...
# kable(head(datos), caption = "Cars, first six")
# kable(tail(datos), caption = "Cars, last six")
kable(datos, caption = "The car data", row.names = TRUE)
y | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 18.90 | 350.0 | 165 | 260 | 8.00 | 2.56 | 4 | 3 | 200.3 | 69.9 | 3910 | 1 |
2 | 17.00 | 350.0 | 170 | 275 | 8.50 | 2.56 | 4 | 3 | 199.6 | 72.9 | 3860 | 1 |
3 | 20.00 | 250.0 | 105 | 185 | 8.25 | 2.73 | 1 | 3 | 196.7 | 72.2 | 3510 | 1 |
4 | 18.25 | 351.0 | 143 | 255 | 8.00 | 3.00 | 2 | 3 | 199.9 | 74.0 | 3890 | 1 |
5 | 20.07 | 225.0 | 95 | 170 | 8.40 | 2.76 | 1 | 3 | 194.1 | 71.8 | 3365 | 0 |
6 | 11.20 | 440.0 | 215 | 330 | 8.20 | 2.88 | 4 | 3 | 184.5 | 69.0 | 4215 | 1 |
7 | 22.12 | 231.0 | 110 | 175 | 8.00 | 2.56 | 2 | 3 | 179.3 | 65.4 | 3020 | 1 |
8 | 21.47 | 262.0 | 110 | 200 | 8.50 | 2.56 | 2 | 3 | 179.3 | 65.4 | 3180 | 1 |
9 | 34.70 | 89.7 | 70 | 81 | 8.20 | 3.90 | 2 | 4 | 155.7 | 64.0 | 1905 | 0 |
10 | 30.40 | 96.9 | 75 | 83 | 9.00 | 4.30 | 2 | 5 | 165.2 | 65.0 | 2320 | 0 |
11 | 16.50 | 350.0 | 155 | 250 | 8.50 | 3.08 | 4 | 3 | 195.4 | 74.4 | 3885 | 1 |
12 | 36.50 | 85.3 | 80 | 83 | 8.50 | 3.89 | 2 | 4 | 160.6 | 62.2 | 2009 | 0 |
13 | 21.50 | 171.0 | 109 | 146 | 8.20 | 3.22 | 2 | 4 | 170.4 | 66.9 | 2655 | 0 |
14 | 19.70 | 258.0 | 110 | 195 | 8.00 | 3.08 | 1 | 3 | 171.5 | 77.0 | 3375 | 1 |
15 | 20.30 | 140.0 | 83 | 109 | 8.40 | 3.40 | 2 | 4 | 168.8 | 69.4 | 2700 | 0 |
16 | 17.80 | 302.0 | 129 | 220 | 8.00 | 3.00 | 2 | 3 | 199.9 | 74.0 | 3890 | 1 |
17 | 14.39 | 500.0 | 190 | 360 | 8.50 | 2.73 | 4 | 3 | 224.1 | 79.8 | 5290 | 1 |
18 | 14.89 | 440.0 | 215 | 330 | 8.20 | 2.71 | 4 | 3 | 231.0 | 79.7 | 5185 | 1 |
19 | 17.80 | 350.0 | 155 | 250 | 8.50 | 3.08 | 4 | 3 | 196.7 | 72.2 | 3910 | 1 |
20 | 16.41 | 318.0 | 145 | 255 | 8.50 | 2.45 | 2 | 3 | 197.6 | 71.0 | 3660 | 1 |
21 | 23.54 | 231.0 | 110 | 175 | 8.00 | 2.56 | 2 | 3 | 179.3 | 65.4 | 3050 | 1 |
22 | 21.47 | 360.0 | 180 | 290 | 8.40 | 2.45 | 2 | 3 | 214.2 | 76.3 | 4250 | 1 |
23 | 16.59 | 400.0 | 185 | NA | 7.60 | 3.08 | 4 | 3 | 196.0 | 73.0 | 3850 | 1 |
24 | 31.90 | 96.9 | 75 | 83 | 9.00 | 4.30 | 2 | 5 | 165.2 | 61.8 | 2275 | 0 |
25 | 29.40 | 140.0 | 86 | NA | 8.00 | 2.92 | 2 | 4 | 176.4 | 65.4 | 2150 | 0 |
26 | 13.27 | 460.0 | 223 | 366 | 8.00 | 3.00 | 4 | 3 | 228.0 | 79.8 | 5430 | 1 |
27 | 23.90 | 133.6 | 96 | 120 | 8.40 | 3.91 | 2 | 5 | 171.5 | 63.4 | 2535 | 0 |
28 | 19.73 | 318.0 | 140 | 255 | 8.50 | 2.71 | 2 | 3 | 215.3 | 76.3 | 4370 | 1 |
29 | 13.90 | 351.0 | 148 | 243 | 8.00 | 3.25 | 2 | 3 | 215.5 | 78.5 | 4540 | 1 |
30 | 13.27 | 351.0 | 148 | 243 | 8.00 | 3.26 | 2 | 3 | 216.1 | 78.5 | 4715 | 1 |
31 | 13.77 | 360.0 | 195 | 295 | 8.25 | 3.15 | 4 | 3 | 209.3 | 77.4 | 4215 | 1 |
32 | 16.50 | 360.0 | 165 | 255 | 8.50 | 2.73 | 4 | 3 | 185.2 | 69.0 | 3660 | 1 |
The dependent or response variable \(Y\) is fuel consumption in miles per gallon.
The remaining variables are identified as independent variables.
Data preparation involves cleaning: variables are sometimes transformed, decoded, or mapped; null or NA values are found and a decision must be made whether to keep and modify them or drop them; data may also be filtered, standardized, or scaled, among other actions. In short, the data are made ready for treatment; the goal is data quality.
Preparation is usually costly in programming terms, and for some people tedious and boring; nevertheless it matters so much that if the data are adequate, there is confidence that the models built from them will be adequate.
Note that there are NA values in records 23 and 25 of column x3.
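As a quick check (not part of the original text), the missing values can be located directly:

colSums(is.na(datos)) # number of NAs per column; only x3 has any
which(is.na(datos$x3)) # rows 23 and 25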
The mean of x3 is computed, excluding the NA values:
media <- mean(datos$x3, na.rm = TRUE)
media
## [1] 217.9
The decision is made to replace the NA values with this mean.
datos <- mutate(datos, x3 = ifelse(is.na(x3), media, x3))
kable(datos, caption = "The car data", row.names = TRUE)
y | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 18.90 | 350.0 | 165 | 260.0 | 8.00 | 2.56 | 4 | 3 | 200.3 | 69.9 | 3910 | 1 |
2 | 17.00 | 350.0 | 170 | 275.0 | 8.50 | 2.56 | 4 | 3 | 199.6 | 72.9 | 3860 | 1 |
3 | 20.00 | 250.0 | 105 | 185.0 | 8.25 | 2.73 | 1 | 3 | 196.7 | 72.2 | 3510 | 1 |
4 | 18.25 | 351.0 | 143 | 255.0 | 8.00 | 3.00 | 2 | 3 | 199.9 | 74.0 | 3890 | 1 |
5 | 20.07 | 225.0 | 95 | 170.0 | 8.40 | 2.76 | 1 | 3 | 194.1 | 71.8 | 3365 | 0 |
6 | 11.20 | 440.0 | 215 | 330.0 | 8.20 | 2.88 | 4 | 3 | 184.5 | 69.0 | 4215 | 1 |
7 | 22.12 | 231.0 | 110 | 175.0 | 8.00 | 2.56 | 2 | 3 | 179.3 | 65.4 | 3020 | 1 |
8 | 21.47 | 262.0 | 110 | 200.0 | 8.50 | 2.56 | 2 | 3 | 179.3 | 65.4 | 3180 | 1 |
9 | 34.70 | 89.7 | 70 | 81.0 | 8.20 | 3.90 | 2 | 4 | 155.7 | 64.0 | 1905 | 0 |
10 | 30.40 | 96.9 | 75 | 83.0 | 9.00 | 4.30 | 2 | 5 | 165.2 | 65.0 | 2320 | 0 |
11 | 16.50 | 350.0 | 155 | 250.0 | 8.50 | 3.08 | 4 | 3 | 195.4 | 74.4 | 3885 | 1 |
12 | 36.50 | 85.3 | 80 | 83.0 | 8.50 | 3.89 | 2 | 4 | 160.6 | 62.2 | 2009 | 0 |
13 | 21.50 | 171.0 | 109 | 146.0 | 8.20 | 3.22 | 2 | 4 | 170.4 | 66.9 | 2655 | 0 |
14 | 19.70 | 258.0 | 110 | 195.0 | 8.00 | 3.08 | 1 | 3 | 171.5 | 77.0 | 3375 | 1 |
15 | 20.30 | 140.0 | 83 | 109.0 | 8.40 | 3.40 | 2 | 4 | 168.8 | 69.4 | 2700 | 0 |
16 | 17.80 | 302.0 | 129 | 220.0 | 8.00 | 3.00 | 2 | 3 | 199.9 | 74.0 | 3890 | 1 |
17 | 14.39 | 500.0 | 190 | 360.0 | 8.50 | 2.73 | 4 | 3 | 224.1 | 79.8 | 5290 | 1 |
18 | 14.89 | 440.0 | 215 | 330.0 | 8.20 | 2.71 | 4 | 3 | 231.0 | 79.7 | 5185 | 1 |
19 | 17.80 | 350.0 | 155 | 250.0 | 8.50 | 3.08 | 4 | 3 | 196.7 | 72.2 | 3910 | 1 |
20 | 16.41 | 318.0 | 145 | 255.0 | 8.50 | 2.45 | 2 | 3 | 197.6 | 71.0 | 3660 | 1 |
21 | 23.54 | 231.0 | 110 | 175.0 | 8.00 | 2.56 | 2 | 3 | 179.3 | 65.4 | 3050 | 1 |
22 | 21.47 | 360.0 | 180 | 290.0 | 8.40 | 2.45 | 2 | 3 | 214.2 | 76.3 | 4250 | 1 |
23 | 16.59 | 400.0 | 185 | 217.9 | 7.60 | 3.08 | 4 | 3 | 196.0 | 73.0 | 3850 | 1 |
24 | 31.90 | 96.9 | 75 | 83.0 | 9.00 | 4.30 | 2 | 5 | 165.2 | 61.8 | 2275 | 0 |
25 | 29.40 | 140.0 | 86 | 217.9 | 8.00 | 2.92 | 2 | 4 | 176.4 | 65.4 | 2150 | 0 |
26 | 13.27 | 460.0 | 223 | 366.0 | 8.00 | 3.00 | 4 | 3 | 228.0 | 79.8 | 5430 | 1 |
27 | 23.90 | 133.6 | 96 | 120.0 | 8.40 | 3.91 | 2 | 5 | 171.5 | 63.4 | 2535 | 0 |
28 | 19.73 | 318.0 | 140 | 255.0 | 8.50 | 2.71 | 2 | 3 | 215.3 | 76.3 | 4370 | 1 |
29 | 13.90 | 351.0 | 148 | 243.0 | 8.00 | 3.25 | 2 | 3 | 215.5 | 78.5 | 4540 | 1 |
30 | 13.27 | 351.0 | 148 | 243.0 | 8.00 | 3.26 | 2 | 3 | 216.1 | 78.5 | 4715 | 1 |
31 | 13.77 | 360.0 | 195 | 295.0 | 8.25 | 3.15 | 4 | 3 | 209.3 | 77.4 | 4215 | 1 |
32 | 16.50 | 360.0 | 165 | 255.0 | 8.50 | 2.73 | 4 | 3 | 185.2 | 69.0 | 3660 | 1 |
# Fit a regression tree for y using all available covariates
modelo <- rpart(formula = y ~ ., data = datos)
modelo
## n= 32
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 32 1237.54400 20.22312
## 2) x10>=2677.5 25 245.58540 17.55360
## 4) x2>=144 15 93.59736 15.72400 *
## 5) x2< 144 10 26.45916 20.29800 *
## 3) x10< 2677.5 7 177.51710 29.75714 *
summary(modelo)
## Call:
## rpart(formula = y ~ ., data = datos)
## n= 32
##
## CP nsplit rel error xerror xstd
## 1 0.6581112 0 1.0000000 1.0818024 0.3022067
## 2 0.1014338 1 0.3418888 0.5481741 0.1175875
## 3 0.0100000 2 0.2404550 0.5243398 0.1222841
##
## Variable importance
## x10 x1 x7 x3 x8 x5 x2 x6
## 20 18 16 15 14 13 3 2
##
## Node number 1: 32 observations, complexity param=0.6581112
## mean=20.22312, MSE=38.67325
## left son=2 (25 obs) right son=3 (7 obs)
## Primary splits:
## x10 < 2677.5 to the right, improve=0.6581112, (0 missing)
## x9 < 66.15 to the right, improve=0.6461613, (0 missing)
## x1 < 155.5 to the right, improve=0.6346573, (0 missing)
## x7 < 3.5 to the left, improve=0.6012236, (0 missing)
## x2 < 100.5 to the right, improve=0.5757638, (0 missing)
## Surrogate splits:
## x1 < 198 to the right, agree=0.969, adj=0.857, (0 split)
## x7 < 3.5 to the left, agree=0.969, adj=0.857, (0 split)
## x3 < 158 to the right, agree=0.938, adj=0.714, (0 split)
## x5 < 3.645 to the left, agree=0.938, adj=0.714, (0 split)
## x8 < 177.85 to the right, agree=0.938, adj=0.714, (0 split)
##
## Node number 2: 25 observations, complexity param=0.1014338
## mean=17.5536, MSE=9.823415
## left son=4 (15 obs) right son=5 (10 obs)
## Primary splits:
## x2 < 144 to the right, improve=0.5111414, (0 missing)
## x1 < 282 to the right, improve=0.4780414, (0 missing)
## x10 < 3585 to the right, improve=0.4780414, (0 missing)
## x3 < 208.95 to the right, improve=0.4780414, (0 missing)
## x6 < 3 to the right, improve=0.3281191, (0 missing)
## Surrogate splits:
## x1 < 310 to the right, agree=0.92, adj=0.8, (0 split)
## x3 < 208.95 to the right, agree=0.88, adj=0.7, (0 split)
## x10 < 3585 to the right, agree=0.88, adj=0.7, (0 split)
## x6 < 3 to the right, agree=0.84, adj=0.6, (0 split)
## x8 < 181.9 to the right, agree=0.80, adj=0.5, (0 split)
##
## Node number 3: 7 observations
## mean=29.75714, MSE=25.35959
##
## Node number 4: 15 observations
## mean=15.724, MSE=6.239824
##
## Node number 5: 10 observations
## mean=20.298, MSE=2.645916
prp(modelo)
# friendlier version
prp(modelo, main = "Regression tree",
nn = TRUE, # display the node numbers
fallen.leaves = TRUE, # put the leaves on the bottom of the page
shadow.col = "gray", # shadows under the leaves
branch.lty = 3, # draw branches using dotted lines
branch = .5, # change angle of branch lines
faclen = 0, # faclen = 0 to print full factor names
trace = 1, # print the auto calculated cex, xlim, ylim
split.cex = 1.2, # make the split text larger than the node text
split.prefix = "is ", # put "is " before split text
split.suffix = "?", # put "?" after split text
split.box.col = "lightblue", # lightgray split boxes (default is white)
split.border.col = "darkgray", # darkgray border on split boxes
split.round = 0.5) # round the split box corners a tad
## cex 1 xlim c(-0.65, 1.65) ylim c(-0.15, 1.15)
Using the information in the tree above it is possible to predict the value of \(Y\). For example:
If a new observation has \(x_{10} = 2000\) and \(x_2 = 150\), the prediction is \(\hat{y} \approx 30\).
If another observation has \(x_{10} = 3000\) and \(x_2 = 150\), the prediction is \(\hat{y} \approx 16\).
Since only the variables \(x_2\) and \(x_{10}\) appear in the tree above, it is recommended to rebuild the tree using only those variables (Hernández 2020a).
A second model, modelo2, is built with only the most important variables.
# Regression tree using only the two most important variables
modelo2 <- rpart(formula = y ~ x2 + x10, data = datos)
modelo2
## n= 32
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 32 1237.54400 20.22312
## 2) x10>=2677.5 25 245.58540 17.55360
## 4) x2>=144 15 93.59736 15.72400 *
## 5) x2< 144 10 26.45916 20.29800 *
## 3) x10< 2677.5 7 177.51710 29.75714 *
summary(modelo2)
## Call:
## rpart(formula = y ~ x2 + x10, data = datos)
## n= 32
##
## CP nsplit rel error xerror xstd
## 1 0.6581112 0 1.0000000 1.0239092 0.2903079
## 2 0.1014338 1 0.3418888 0.4615233 0.1093546
## 3 0.0100000 2 0.2404550 0.3855231 0.1102611
##
## Variable importance
## x10 x2
## 60 40
##
## Node number 1: 32 observations, complexity param=0.6581112
## mean=20.22312, MSE=38.67325
## left son=2 (25 obs) right son=3 (7 obs)
## Primary splits:
## x10 < 2677.5 to the right, improve=0.6581112, (0 missing)
## x2 < 100.5 to the right, improve=0.5757638, (0 missing)
## Surrogate splits:
## x2 < 81.5 to the right, agree=0.906, adj=0.571, (0 split)
##
## Node number 2: 25 observations, complexity param=0.1014338
## mean=17.5536, MSE=9.823415
## left son=4 (15 obs) right son=5 (10 obs)
## Primary splits:
## x2 < 144 to the right, improve=0.5111414, (0 missing)
## x10 < 3585 to the right, improve=0.4780414, (0 missing)
## Surrogate splits:
## x10 < 3585 to the right, agree=0.88, adj=0.7, (0 split)
##
## Node number 3: 7 observations
## mean=29.75714, MSE=25.35959
##
## Node number 4: 15 observations
## mean=15.724, MSE=6.239824
##
## Node number 5: 10 observations
## mean=20.298, MSE=2.645916
prp(modelo2, nn = TRUE, fallen.leaves = TRUE,
split.box.col = "lightblue",
split.border.col = "darkgray",
split.round = 5, round = 5)
Again it is possible to predict the value of \(Y\). For example:
If a new observation 1 has \(x_{10} = 2000\) and \(x_2 = 150\), the prediction is \(\hat{y} \approx 30\).
If another observation 2 has \(x_{10} = 3000\) and \(x_2 = 150\), the prediction is \(\hat{y} \approx 16\).
What happens when new data arrive? Can the fitted model be used to predict for them?
Predictions are made with values for \(x_{10}\) and \(x_2\), these being the most important variables.
The variables \(x_{10}\) and \(x_2\) are initialized:
x10 <- c(2000, 3000)
x2 <- c(150, 150)
nuevos_datos <- data.frame(x2, x10)
kable(nuevos_datos, caption = "New data")
x2 | x10 |
---|---|
150 | 2000 |
150 | 3000 |
The predictions:
predict(object=modelo2, newdata=nuevos_datos)
## 1 2
## 29.75714 15.72400
A correlation is now computed between the model's predictions for all the data and the observed values.
The variable prediccion holds the predictions \(\hat{y}\) for all the data, while \(Y\) holds the original observed values.
The correlation coefficient is understood here as a descriptive measure of the strength of the linear relationship between two variables (Anderson, Sweeney, and Williams 2008).
prediccion <- predict(object=modelo2, newdata=datos)
kable(data.frame(Y = datos$y, prediccion), caption = "Relating Y and prediccion")
Y | prediccion |
---|---|
18.90 | 15.72400 |
17.00 | 15.72400 |
20.00 | 20.29800 |
18.25 | 20.29800 |
20.07 | 20.29800 |
11.20 | 15.72400 |
22.12 | 20.29800 |
21.47 | 20.29800 |
34.70 | 29.75714 |
30.40 | 29.75714 |
16.50 | 15.72400 |
36.50 | 29.75714 |
21.50 | 29.75714 |
19.70 | 20.29800 |
20.30 | 20.29800 |
17.80 | 20.29800 |
14.39 | 15.72400 |
14.89 | 15.72400 |
17.80 | 15.72400 |
16.41 | 15.72400 |
23.54 | 20.29800 |
21.47 | 15.72400 |
16.59 | 15.72400 |
31.90 | 29.75714 |
29.40 | 29.75714 |
13.27 | 15.72400 |
23.90 | 29.75714 |
19.73 | 20.29800 |
13.90 | 15.72400 |
13.27 | 15.72400 |
13.77 | 15.72400 |
16.50 | 15.72400 |
correla <- cor(prediccion, datos$y)
correla
## [1] 0.8715188
Having built and visualized the first model, \(x_2\) and \(x_{10}\) are observed to be the most important variables in the model.
A second model is then built, with which predictions are made on new data.
The evaluation is done with a correlation between the predictions for all the data and the actual values; the correlation found is 0.8715188, which indicates a strong positive relationship. According to (Hernández Sampieri, Fernández Collado, and Baptista Lucio 2014), a value above \(0.75\) is considered a considerable positive correlation.
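As a complementary view (not part of the original analysis), the predictions can be plotted against the observed values with the already loaded ggplot2; points close to the dashed diagonal indicate good predictions.

ggplot(data.frame(Y = datos$y, prediccion), aes(x = Y, y = prediccion)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(title = "Observed vs. predicted", x = "Observed y", y = "Predicted y")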
Image 2. People. Source: google.com
The goal is to learn which independent variables are the most important, and then to predict with the model and evaluate it through the correlation between the predictions and the actual values of \(Y\).
set.seed(2021)
n <- 364 # total number of simulated observations (164 + 100 + 100)
dinero <- c(round(rnorm(n = 164, mean = 25, sd = 1), 2),
            round(rnorm(n = 100, mean = 28, sd = 1), 2),
            round(rnorm(n = 100, mean = 34, sd = 1), 2))
# dinero
promedio <- c(round(rnorm(n = 91, mean = 90, sd = 4), 0),
              round(rnorm(n = 91, mean = 85, sd = 4), 0),
              round(rnorm(n = 91, mean = 80, sd = 4), 0),
              round(rnorm(n = 91, mean = 75, sd = 4), 0))
# promedio
emocional <- c(round(rnorm(n = 91, mean = 80, sd = 10), 0),
               round(rnorm(n = 91, mean = 70, sd = 10), 0),
               round(rnorm(n = 91, mean = 60, sd = 10), 0),
               round(rnorm(n = 91, mean = 50, sd = 10), 0))
social <- c(round(rnorm(n = 100, mean = 70, sd = 15), 0),
            round(rnorm(n = 100, mean = 60, sd = 15), 0),
            round(rnorm(n = 164, mean = 50, sd = 15), 0))
bienestar <- c(round(rnorm(n = 100, mean = 80, sd = 10), 0),
               round(rnorm(n = 100, mean = 60, sd = 10), 0),
               round(rnorm(n = 164, mean = 40, sd = 10), 0))
datos <- data.frame(dinero, emocional, social, bienestar, promedio)
kable(datos, caption = "Datos de estudaintes")
dinero | emocional | social | bienestar | promedio |
---|---|---|---|---|
24.88 | 92 | 70 | 91 | 87 |
25.55 | 68 | 62 | 87 | 89 |
25.35 | 76 | 55 | 100 | 85 |
25.36 | 85 | 74 | 72 | 85 |
25.90 | 75 | 60 | 93 | 91 |
23.08 | 81 | 106 | 85 | 86 |
25.26 | 75 | 60 | 74 | 92 |
25.92 | 81 | 69 | 93 | 88 |
25.01 | 75 | 61 | 89 | 90 |
26.73 | 73 | 54 | 76 | 88 |
23.92 | 81 | 84 | 83 | 94 |
24.73 | 82 | 98 | 76 | 93 |
25.18 | 80 | 94 | 75 | 87 |
26.51 | 83 | 75 | 81 | 95 |
26.60 | 78 | 85 | 91 | 89 |
23.16 | 89 | 88 | 79 | 88 |
26.62 | 84 | 73 | 72 | 93 |
25.13 | 70 | 74 | 94 | 91 |
26.48 | 105 | 66 | 71 | 93 |
26.51 | 88 | 79 | 74 | 86 |
24.06 | 78 | 67 | 74 | 82 |
24.81 | 88 | 58 | 85 | 90 |
23.90 | 97 | 48 | 93 | 88 |
26.21 | 82 | 47 | 87 | 95 |
23.38 | 88 | 52 | 84 | 90 |
25.11 | 91 | 80 | 67 | 92 |
23.54 | 72 | 69 | 76 | 88 |
24.65 | 84 | 72 | 84 | 84 |
24.91 | 91 | 38 | 74 | 90 |
26.10 | 78 | 58 | 86 | 94 |
23.04 | 80 | 64 | 86 | 86 |
23.55 | 73 | 76 | 83 | 92 |
26.02 | 80 | 78 | 67 | 91 |
23.58 | 76 | 52 | 70 | 96 |
24.40 | 85 | 34 | 67 | 100 |
23.42 | 75 | 71 | 89 | 89 |
23.71 | 90 | 108 | 99 | 87 |
23.55 | 81 | 46 | 103 | 91 |
24.91 | 69 | 87 | 81 | 92 |
25.50 | 80 | 78 | 87 | 88 |
25.12 | 71 | 45 | 79 | 94 |
26.76 | 87 | 44 | 103 | 84 |
24.65 | 91 | 88 | 78 | 91 |
27.12 | 88 | 68 | 80 | 93 |
24.97 | 86 | 84 | 86 | 92 |
24.21 | 69 | 84 | 74 | 90 |
26.48 | 90 | 57 | 87 | 86 |
24.27 | 73 | 62 | 81 | 87 |
25.31 | 75 | 76 | 83 | 85 |
25.69 | 103 | 55 | 76 | 93 |
24.50 | 77 | 92 | 97 | 88 |
22.74 | 83 | 73 | 81 | 97 |
25.04 | 90 | 86 | 81 | 97 |
24.63 | 95 | 99 | 95 | 89 |
24.04 | 77 | 48 | 87 | 92 |
25.10 | 74 | 69 | 97 | 86 |
25.43 | 81 | 86 | 106 | 95 |
24.83 | 84 | 94 | 80 | 92 |
23.45 | 101 | 48 | 75 | 91 |
23.49 | 72 | 68 | 92 | 84 |
25.02 | 83 | 55 | 82 | 93 |
24.81 | 74 | 52 | 78 | 95 |
25.39 | 91 | 89 | 78 | 94 |
24.24 | 79 | 71 | 86 | 95 |
25.23 | 84 | 60 | 89 | 98 |
24.02 | 74 | 55 | 72 | 97 |
25.57 | 97 | 70 | 85 | 98 |
26.62 | 69 | 61 | 74 | 94 |
24.75 | 91 | 89 | 77 | 86 |
23.94 | 104 | 72 | 74 | 97 |
24.65 | 57 | 84 | 95 | 88 |
24.96 | 81 | 77 | 77 | 96 |
23.60 | 86 | 75 | 68 | 88 |
26.49 | 69 | 80 | 78 | 89 |
23.96 | 86 | 80 | 74 | 90 |
24.76 | 92 | 59 | 87 | 98 |
24.00 | 87 | 73 | 89 | 91 |
23.61 | 73 | 41 | 77 | 94 |
25.98 | 88 | 86 | 91 | 86 |
25.36 | 66 | 91 | 77 | 90 |
24.66 | 74 | 94 | 81 | 91 |
24.36 | 94 | 83 | 92 | 92 |
22.83 | 67 | 73 | 89 | 92 |
25.63 | 90 | 64 | 76 | 93 |
24.86 | 73 | 56 | 85 | 91 |
23.76 | 68 | 45 | 65 | 98 |
25.53 | 88 | 73 | 68 | 95 |
23.41 | 77 | 78 | 101 | 86 |
24.01 | 62 | 36 | 85 | 88 |
25.48 | 79 | 58 | 83 | 90 |
25.81 | 84 | 93 | 84 | 90 |
24.71 | 73 | 51 | 81 | 81 |
24.95 | 82 | 71 | 112 | 81 |
25.74 | 62 | 62 | 75 | 86 |
25.01 | 75 | 90 | 68 | 89 |
24.88 | 69 | 89 | 83 | 94 |
24.35 | 79 | 67 | 78 | 91 |
24.13 | 68 | 73 | 65 | 87 |
24.49 | 63 | 72 | 71 | 82 |
22.92 | 63 | 47 | 89 | 86 |
24.74 | 48 | 73 | 76 | 81 |
25.45 | 63 | 55 | 62 | 85 |
24.86 | 62 | 60 | 59 | 91 |
24.51 | 68 | 90 | 60 | 88 |
23.80 | 57 | 60 | 48 | 88 |
25.05 | 75 | 58 | 50 | 91 |
24.87 | 76 | 58 | 59 | 86 |
22.30 | 70 | 73 | 61 | 85 |
24.43 | 87 | 47 | 61 | 92 |
25.59 | 81 | 74 | 62 | 84 |
25.49 | 68 | 79 | 57 | 84 |
24.87 | 63 | 64 | 52 | 77 |
23.74 | 53 | 61 | 55 | 87 |
25.20 | 78 | 60 | 50 | 82 |
23.08 | 66 | 64 | 58 | 84 |
26.67 | 41 | 80 | 71 | 93 |
25.47 | 77 | 59 | 53 | 86 |
26.41 | 73 | 66 | 65 | 82 |
25.08 | 91 | 81 | 94 | 80 |
23.20 | 75 | 70 | 57 | 83 |
25.75 | 72 | 38 | 65 | 97 |
24.69 | 85 | 63 | 72 | 86 |
23.27 | 55 | 78 | 63 | 82 |
22.86 | 63 | 56 | 65 | 90 |
27.37 | 67 | 54 | 70 | 87 |
25.48 | 61 | 48 | 58 | 81 |
26.09 | 68 | 57 | 61 | 83 |
25.30 | 87 | 50 | 55 | 81 |
26.02 | 58 | 76 | 54 | 89 |
27.45 | 70 | 76 | 57 | 85 |
24.75 | 62 | 92 | 80 | 78 |
25.54 | 59 | 76 | 62 | 88 |
25.20 | 80 | 66 | 60 | 88 |
22.93 | 77 | 66 | 60 | 92 |
25.51 | 63 | 70 | 53 | 91 |
24.59 | 74 | 41 | 64 | 85 |
25.36 | 88 | 61 | 80 | 86 |
24.67 | 88 | 67 | 69 | 79 |
25.08 | 59 | 73 | 68 | 85 |
24.74 | 74 | 43 | 73 | 80 |
24.12 | 50 | 66 | 62 | 83 |
25.74 | 63 | 52 | 57 | 87 |
22.32 | 64 | 52 | 56 | 82 |
24.05 | 68 | 46 | 56 | 85 |
25.45 | 74 | 50 | 74 | 86 |
23.71 | 78 | 27 | 65 | 92 |
24.84 | 79 | 77 | 82 | 86 |
25.35 | 74 | 58 | 53 | 82 |
24.94 | 58 | 69 | 51 | 80 |
26.48 | 79 | 63 | 70 | 86 |
24.35 | 85 | 68 | 70 | 81 |
24.74 | 65 | 59 | 67 | 77 |
23.75 | 60 | 43 | 67 | 81 |
25.77 | 63 | 28 | 54 | 82 |
24.09 | 84 | 40 | 50 | 82 |
24.31 | 73 | 58 | 55 | 98 |
24.38 | 98 | 89 | 63 | 81 |
25.76 | 59 | 62 | 79 | 80 |
23.91 | 70 | 73 | 61 | 87 |
24.60 | 87 | 62 | 58 | 82 |
25.83 | 67 | 64 | 60 | 86 |
25.36 | 76 | 70 | 63 | 80 |
25.16 | 59 | 78 | 62 | 90 |
25.96 | 61 | 96 | 37 | 90 |
27.66 | 63 | 83 | 75 | 95 |
27.27 | 83 | 42 | 47 | 91 |
26.30 | 75 | 68 | 63 | 84 |
29.95 | 62 | 62 | 69 | 89 |
30.67 | 71 | 28 | 66 | 85 |
30.06 | 85 | 61 | 66 | 87 |
28.82 | 81 | 61 | 68 | 83 |
27.92 | 68 | 59 | 70 | 89 |
27.51 | 60 | 65 | 53 | 76 |
28.85 | 79 | 54 | 71 | 80 |
27.04 | 77 | 60 | 65 | 81 |
28.93 | 80 | 65 | 47 | 90 |
28.38 | 79 | 67 | 39 | 84 |
29.49 | 82 | 64 | 75 | 82 |
27.53 | 70 | 81 | 58 | 83 |
28.26 | 67 | 71 | 42 | 79 |
27.01 | 59 | 67 | 65 | 89 |
26.94 | 57 | 36 | 57 | 85 |
28.27 | 82 | 55 | 70 | 76 |
28.95 | 65 | 62 | 67 | 74 |
28.73 | 38 | 62 | 56 | 83 |
27.75 | 49 | 45 | 67 | 73 |
29.49 | 60 | 64 | 64 | 76 |
28.23 | 56 | 54 | 70 | 84 |
28.28 | 76 | 61 | 73 | 77 |
28.15 | 62 | 90 | 56 | 84 |
26.80 | 63 | 45 | 75 | 81 |
28.09 | 51 | 30 | 52 | 77 |
29.22 | 63 | 62 | 69 | 80 |
27.44 | 62 | 60 | 80 | 77 |
28.34 | 54 | 27 | 64 | 78 |
26.46 | 60 | 44 | 75 | 80 |
27.76 | 73 | 52 | 59 | 78 |
28.51 | 60 | 68 | 52 | 81 |
27.76 | 62 | 73 | 61 | 84 |
28.58 | 46 | 51 | 67 | 84 |
28.27 | 44 | 30 | 31 | 79 |
26.66 | 62 | 41 | 45 | 83 |
27.15 | 66 | 47 | 37 | 74 |
27.59 | 48 | 55 | 40 | 78 |
27.33 | 57 | 63 | 37 | 80 |
27.90 | 60 | 39 | 35 | 73 |
27.08 | 56 | 42 | 34 | 74 |
28.44 | 66 | 18 | 33 | 83 |
27.99 | 60 | 45 | 69 | 84 |
27.35 | 55 | 55 | 43 | 82 |
29.34 | 82 | 54 | 50 | 74 |
28.33 | 52 | 40 | 33 | 81 |
28.00 | 45 | 58 | 44 | 84 |
29.10 | 64 | 90 | 51 | 80 |
29.16 | 71 | 41 | 53 | 78 |
28.10 | 49 | 28 | 43 | 87 |
27.61 | 78 | 53 | 37 | 77 |
27.59 | 66 | 63 | 47 | 81 |
27.09 | 57 | 36 | 43 | 83 |
27.46 | 79 | 65 | 50 | 80 |
28.12 | 43 | 42 | 38 | 68 |
29.73 | 56 | 45 | 50 | 80 |
27.96 | 57 | 53 | 36 | 84 |
28.54 | 61 | 69 | 66 | 77 |
29.92 | 52 | 36 | 52 | 76 |
27.76 | 51 | 55 | 42 | 81 |
29.57 | 56 | 47 | 27 | 77 |
28.48 | 69 | 56 | 31 | 76 |
28.04 | 50 | 74 | 33 | 80 |
28.44 | 50 | 58 | 32 | 84 |
26.12 | 43 | 51 | 51 | 78 |
26.28 | 82 | 29 | 39 | 78 |
29.88 | 69 | 75 | 49 | 78 |
27.97 | 71 | 36 | 41 | 82 |
29.00 | 55 | 83 | 32 | 84 |
28.30 | 55 | 32 | 56 | 77 |
27.99 | 71 | 55 | 43 | 81 |
27.76 | 45 | 11 | 46 | 79 |
26.23 | 71 | 50 | 42 | 75 |
27.84 | 59 | 62 | 33 | 80 |
29.92 | 28 | 55 | 52 | 81 |
29.11 | 65 | 54 | 39 | 86 |
26.01 | 53 | 67 | 30 | 85 |
28.81 | 43 | 68 | 31 | 74 |
29.10 | 77 | 53 | 29 | 84 |
26.56 | 53 | 68 | 47 | 74 |
29.22 | 50 | 46 | 46 | 74 |
27.22 | 45 | 53 | 34 | 78 |
25.93 | 62 | 60 | 50 | 83 |
28.33 | 44 | 40 | 54 | 74 |
28.26 | 47 | 44 | 62 | 75 |
27.57 | 65 | 61 | 63 | 83 |
26.13 | 63 | 49 | 48 | 74 |
27.20 | 59 | 30 | 30 | 78 |
28.33 | 73 | 28 | 46 | 87 |
28.01 | 61 | 52 | 42 | 82 |
28.85 | 73 | 59 | 43 | 76 |
26.47 | 50 | 57 | 35 | 79 |
27.97 | 62 | 53 | 45 | 82 |
29.43 | 53 | 36 | 45 | 76 |
27.07 | 64 | 22 | 40 | 78 |
29.01 | 55 | 64 | 37 | 78 |
27.91 | 59 | 46 | 38 | 78 |
28.94 | 64 | 43 | 46 | 80 |
33.48 | 59 | 15 | 31 | 83 |
33.07 | 62 | 47 | 41 | 77 |
35.23 | 75 | 57 | 49 | 79 |
33.76 | 52 | 36 | 48 | 79 |
34.27 | 71 | 55 | 28 | 80 |
35.43 | 72 | 54 | 44 | 86 |
32.86 | 58 | 34 | 48 | 85 |
34.90 | 66 | 18 | 45 | 86 |
34.08 | 68 | 48 | 22 | 80 |
34.35 | 51 | 49 | 27 | 73 |
32.74 | 27 | 50 | 44 | 76 |
34.84 | 57 | 58 | 48 | 82 |
32.74 | 14 | 44 | 31 | 78 |
34.15 | 48 | 37 | 40 | 74 |
33.52 | 49 | 28 | 57 | 76 |
36.21 | 48 | 30 | 30 | 77 |
34.08 | 64 | 49 | 29 | 77 |
34.72 | 65 | 19 | 36 | 75 |
35.05 | 74 | 63 | 23 | 80 |
34.28 | 46 | 40 | 40 | 72 |
33.58 | 60 | 60 | 55 | 76 |
36.52 | 37 | 44 | 39 | 75 |
34.00 | 51 | 45 | 48 | 76 |
32.44 | 48 | 54 | 37 | 74 |
32.01 | 20 | 44 | 45 | 74 |
32.51 | 49 | 42 | 37 | 72 |
34.26 | 32 | 58 | 59 | 77 |
33.19 | 41 | 35 | 23 | 73 |
34.83 | 52 | 32 | 28 | 82 |
32.85 | 36 | 24 | 37 | 72 |
33.86 | 44 | 58 | 49 | 72 |
35.18 | 58 | 63 | 43 | 78 |
34.19 | 53 | 55 | 36 | 77 |
33.31 | 57 | 42 | 24 | 77 |
33.54 | 58 | 43 | 43 | 73 |
33.52 | 42 | 50 | 47 | 69 |
35.21 | 57 | 62 | 35 | 75 |
34.99 | 57 | 42 | 26 | 72 |
35.54 | 49 | 48 | 55 | 78 |
34.29 | 48 | 32 | 43 | 78 |
34.51 | 36 | 53 | 56 | 76 |
33.46 | 46 | 47 | 25 | 75 |
33.33 | 34 | 64 | 37 | 68 |
34.68 | 38 | 57 | 38 | 74 |
34.19 | 54 | 33 | 36 | 77 |
32.61 | 52 | 57 | 48 | 83 |
32.94 | 53 | 68 | 39 | 82 |
34.37 | 65 | 35 | 30 | 74 |
33.00 | 38 | 77 | 21 | 70 |
33.96 | 39 | 44 | 39 | 75 |
33.99 | 58 | 70 | 39 | 76 |
32.96 | 53 | 65 | 52 | 71 |
33.67 | 44 | 57 | 57 | 82 |
33.39 | 75 | 30 | 38 | 76 |
35.52 | 51 | 47 | 64 | 75 |
34.30 | 63 | 75 | 33 | 77 |
34.40 | 50 | 53 | 47 | 70 |
34.75 | 41 | 58 | 33 | 80 |
33.91 | 29 | 21 | 60 | 80 |
32.91 | 43 | 74 | 38 | 82 |
33.91 | 57 | 23 | 20 | 79 |
34.47 | 40 | 51 | 43 | 73 |
34.00 | 51 | 16 | 62 | 78 |
34.62 | 53 | 38 | 48 | 73 |
33.35 | 49 | 57 | 36 | 77 |
34.45 | 41 | 72 | 35 | 82 |
33.82 | 60 | 56 | 46 | 78 |
33.19 | 47 | 45 | 38 | 74 |
32.19 | 45 | 72 | 43 | 74 |
33.22 | 71 | 64 | 45 | 71 |
32.01 | 61 | 71 | 46 | 71 |
35.00 | 36 | 62 | 41 | 77 |
32.96 | 46 | 20 | 30 | 77 |
34.20 | 57 | 41 | 48 | 79 |
34.28 | 44 | 43 | 35 | 80 |
34.16 | 44 | 41 | 35 | 78 |
32.41 | 35 | 55 | 33 | 74 |
34.55 | 66 | 49 | 14 | 66 |
32.63 | 51 | 47 | 38 | 72 |
32.88 | 42 | 30 | 51 | 81 |
34.16 | 41 | 59 | 45 | 78 |
35.51 | 71 | 52 | 49 | 75 |
33.65 | 45 | 24 | 36 | 89 |
32.03 | 41 | 40 | 55 | 78 |
33.15 | 42 | 67 | 20 | 72 |
32.23 | 53 | 46 | 57 | 75 |
35.09 | 47 | 53 | 42 | 73 |
32.60 | 46 | 25 | 38 | 79 |
34.94 | 41 | 76 | 52 | 81 |
33.99 | 56 | 40 | 45 | 71 |
34.26 | 37 | 25 | 26 | 70 |
35.15 | 55 | 31 | 33 | 77 |
35.17 | 57 | 69 | 28 | 76 |
32.81 | 47 | 39 | 37 | 77 |
34.22 | 52 | 88 | 40 | 75 |
35.13 | 42 | 77 | 43 | 75 |
34.51 | 57 | 28 | 35 | 79 |
32.50 | 51 | 55 | 51 | 74 |
32.88 | 67 | 63 | 33 | 69 |
33.24 | 44 | 50 | 40 | 73 |
summary(datos)
## dinero emocional social bienestar
## Min. :22.30 Min. : 14.00 Min. : 11.00 Min. : 14.00
## 1st Qu.:25.01 1st Qu.: 53.00 1st Qu.: 46.00 1st Qu.: 42.00
## Median :27.21 Median : 65.00 Median : 58.00 Median : 57.00
## Mean :28.22 Mean : 65.05 Mean : 58.03 Mean : 58.29
## 3rd Qu.:32.53 3rd Qu.: 77.00 3rd Qu.: 70.00 3rd Qu.: 74.00
## Max. :36.52 Max. :105.00 Max. :108.00 Max. :112.00
## promedio
## Min. : 66.00
## 1st Qu.: 77.00
## Median : 82.00
## Mean : 82.82
## 3rd Qu.: 88.00
## Max. :100.00
str(datos)
## 'data.frame': 364 obs. of 5 variables:
## $ dinero : num 24.9 25.6 25.4 25.4 25.9 ...
## $ emocional: num 92 68 76 85 75 81 75 81 75 73 ...
## $ social : num 70 62 55 74 60 106 60 69 61 54 ...
## $ bienestar: num 91 87 100 72 93 85 74 93 89 76 ...
## $ promedio : num 87 89 85 85 91 86 92 88 90 88 ...
The data were simulated from numeric values; the dependent variable is promedio and the rest are independent variables.
The data contain no null or NA values, since every observation is known to hold numeric values.
### Build the model
# Fit a regression tree for promedio using all covariates
modelo <- rpart(formula = promedio ~ ., data = datos)
modelo
## n= 364
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 364 17798.6700 82.81593
## 2) dinero>=27.025 189 4567.2380 78.26984
## 4) dinero>=31.34 100 1696.1600 76.28000 *
## 5) dinero< 31.34 89 2030.2470 80.50562
## 10) emocional< 61.5 44 675.6364 78.90909 *
## 11) emocional>=61.5 45 1132.8000 82.06667 *
## 3) dinero< 27.025 175 5106.8340 87.72571
## 6) bienestar< 64.5 53 1191.1700 84.30189
## 12) dinero>=26.105 9 144.8889 78.88889 *
## 13) dinero< 26.105 44 728.6364 85.40909 *
## 7) bienestar>=64.5 122 3024.4590 89.21311
## 14) emocional< 65.5 16 330.9375 84.06250 *
## 15) emocional>=65.5 106 2204.9910 89.99057 *
summary(modelo)
## Call:
## rpart(formula = promedio ~ ., data = datos)
## n= 364
##
## CP nsplit rel error xerror xstd
## 1 0.45647210 0 1.0000000 1.0116053 0.06033141
## 2 0.05007147 1 0.5435279 0.5672344 0.04198514
## 3 0.04724123 2 0.4934564 0.5661506 0.04328250
## 4 0.02744761 3 0.4462152 0.5115556 0.03924422
## 5 0.01784654 4 0.4187676 0.5021759 0.03876340
## 6 0.01246222 5 0.4009211 0.4745387 0.03764316
## 7 0.01000000 6 0.3884588 0.4865960 0.03753191
##
## Variable importance
## dinero bienestar emocional social
## 38 26 23 13
##
## Node number 1: 364 observations, complexity param=0.4564721
## mean=82.81593, MSE=48.89744
## left son=2 (189 obs) right son=3 (175 obs)
## Primary splits:
## dinero < 27.025 to the right, improve=0.4564721, (0 missing)
## bienestar < 59.5 to the left, improve=0.3942993, (0 missing)
## emocional < 67.5 to the left, improve=0.3414094, (0 missing)
## social < 72.5 to the left, improve=0.1210377, (0 missing)
## Surrogate splits:
## bienestar < 52.5 to the left, agree=0.830, adj=0.646, (0 split)
## emocional < 67.5 to the left, agree=0.780, adj=0.543, (0 split)
## social < 65.5 to the left, agree=0.695, adj=0.366, (0 split)
##
## Node number 2: 189 observations, complexity param=0.04724123
## mean=78.26984, MSE=24.16528
## left son=4 (100 obs) right son=5 (89 obs)
## Primary splits:
## dinero < 31.34 to the right, improve=0.1841005, (0 missing)
## emocional < 61.5 to the left, improve=0.1138096, (0 missing)
## bienestar < 67.5 to the left, improve=0.1087005, (0 missing)
## social < 53.5 to the left, improve=0.0428368, (0 missing)
## Surrogate splits:
## emocional < 58.5 to the left, agree=0.714, adj=0.393, (0 split)
## bienestar < 49.5 to the left, agree=0.677, adj=0.315, (0 split)
## social < 50.5 to the left, agree=0.603, adj=0.157, (0 split)
##
## Node number 3: 175 observations, complexity param=0.05007147
## mean=87.72571, MSE=29.18191
## left son=6 (53 obs) right son=7 (122 obs)
## Primary splits:
## bienestar < 64.5 to the left, improve=0.17451230, (0 missing)
## emocional < 65.5 to the left, improve=0.15522440, (0 missing)
## dinero < 26.11 to the right, improve=0.04074404, (0 missing)
## social < 70.5 to the left, improve=0.03379168, (0 missing)
## Surrogate splits:
## emocional < 64.5 to the left, agree=0.754, adj=0.189, (0 split)
## dinero < 22.53 to the left, agree=0.709, adj=0.038, (0 split)
## social < 31.5 to the left, agree=0.703, adj=0.019, (0 split)
##
## Node number 4: 100 observations
## mean=76.28, MSE=16.9616
##
## Node number 5: 89 observations, complexity param=0.01246222
## mean=80.50562, MSE=22.81177
## left son=10 (44 obs) right son=11 (45 obs)
## Primary splits:
## emocional < 61.5 to the left, improve=0.10925310, (0 missing)
## bienestar < 67.5 to the left, improve=0.09991913, (0 missing)
## social < 49 to the left, improve=0.07831078, (0 missing)
## dinero < 28.11 to the right, improve=0.02178023, (0 missing)
## Surrogate splits:
## social < 51.5 to the left, agree=0.663, adj=0.318, (0 split)
## bienestar < 38.5 to the left, agree=0.640, adj=0.273, (0 split)
## dinero < 28.815 to the left, agree=0.584, adj=0.159, (0 split)
##
## Node number 6: 53 observations, complexity param=0.01784654
## mean=84.30189, MSE=22.4749
## left son=12 (9 obs) right son=13 (44 obs)
## Primary splits:
## dinero < 26.105 to the right, improve=0.26666610, (0 missing)
## bienestar < 52.5 to the left, improve=0.18662320, (0 missing)
## social < 57.5 to the left, improve=0.11546600, (0 missing)
## emocional < 56 to the left, improve=0.06756936, (0 missing)
## Surrogate splits:
## bienestar < 47.5 to the left, agree=0.887, adj=0.333, (0 split)
## emocional < 51.5 to the left, agree=0.849, adj=0.111, (0 split)
## social < 38 to the left, agree=0.849, adj=0.111, (0 split)
##
## Node number 7: 122 observations, complexity param=0.02744761
## mean=89.21311, MSE=24.79065
## left son=14 (16 obs) right son=15 (106 obs)
## Primary splits:
## emocional < 65.5 to the left, improve=0.16152670, (0 missing)
## bienestar < 90 to the right, improve=0.02670941, (0 missing)
## dinero < 24.755 to the left, improve=0.01641250, (0 missing)
## social < 61.5 to the right, improve=0.01611250, (0 missing)
## Surrogate splits:
## dinero < 26.78 to the right, agree=0.885, adj=0.125, (0 split)
##
## Node number 10: 44 observations
## mean=78.90909, MSE=15.35537
##
## Node number 11: 45 observations
## mean=82.06667, MSE=25.17333
##
## Node number 12: 9 observations
## mean=78.88889, MSE=16.09877
##
## Node number 13: 44 observations
## mean=85.40909, MSE=16.55992
##
## Node number 14: 16 observations
## mean=84.0625, MSE=20.68359
##
## Node number 15: 106 observations
## mean=89.99057, MSE=20.8018
prp(modelo, nn = TRUE, fallen.leaves = TRUE,
split.box.col = "lightblue",
split.border.col = "darkgray",
split.round = 5, round = 5)
It is observed that the independent variables dinero, bienestar, and emocional are the most important for determining a student's academic average.
The model is rebuilt using only those variables (dinero, bienestar, and emocional), and the regression tree is drawn again.
# Regression tree using only the three most important variables
modelo2 <- rpart(formula = promedio ~ dinero + bienestar + emocional, data = datos)
modelo2
## n= 364
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 364 17798.6700 82.81593
## 2) dinero>=27.025 189 4567.2380 78.26984
## 4) dinero>=31.34 100 1696.1600 76.28000 *
## 5) dinero< 31.34 89 2030.2470 80.50562
## 10) emocional< 61.5 44 675.6364 78.90909 *
## 11) emocional>=61.5 45 1132.8000 82.06667 *
## 3) dinero< 27.025 175 5106.8340 87.72571
## 6) bienestar< 64.5 53 1191.1700 84.30189
## 12) dinero>=26.105 9 144.8889 78.88889 *
## 13) dinero< 26.105 44 728.6364 85.40909 *
## 7) bienestar>=64.5 122 3024.4590 89.21311
## 14) emocional< 65.5 16 330.9375 84.06250 *
## 15) emocional>=65.5 106 2204.9910 89.99057 *
summary(modelo2)
## Call:
## rpart(formula = promedio ~ dinero + bienestar + emocional, data = datos)
## n= 364
##
## CP nsplit rel error xerror xstd
## 1 0.45647210 0 1.0000000 1.0041087 0.05987315
## 2 0.05007147 1 0.5435279 0.5641251 0.04128542
## 3 0.04724123 2 0.4934564 0.5395946 0.03931128
## 4 0.02744761 3 0.4462152 0.4887238 0.03640914
## 5 0.01784654 4 0.4187676 0.4606118 0.03473853
## 6 0.01246222 5 0.4009211 0.4548666 0.03404159
## 7 0.01000000 6 0.3884588 0.4532164 0.03290534
##
## Variable importance
## dinero bienestar emocional
## 44 30 26
##
## Node number 1: 364 observations, complexity param=0.4564721
## mean=82.81593, MSE=48.89744
## left son=2 (189 obs) right son=3 (175 obs)
## Primary splits:
## dinero < 27.025 to the right, improve=0.4564721, (0 missing)
## bienestar < 59.5 to the left, improve=0.3942993, (0 missing)
## emocional < 67.5 to the left, improve=0.3414094, (0 missing)
## Surrogate splits:
## bienestar < 52.5 to the left, agree=0.83, adj=0.646, (0 split)
## emocional < 67.5 to the left, agree=0.78, adj=0.543, (0 split)
##
## Node number 2: 189 observations, complexity param=0.04724123
## mean=78.26984, MSE=24.16528
## left son=4 (100 obs) right son=5 (89 obs)
## Primary splits:
## dinero < 31.34 to the right, improve=0.1841005, (0 missing)
## emocional < 61.5 to the left, improve=0.1138096, (0 missing)
## bienestar < 67.5 to the left, improve=0.1087005, (0 missing)
## Surrogate splits:
## emocional < 58.5 to the left, agree=0.714, adj=0.393, (0 split)
## bienestar < 49.5 to the left, agree=0.677, adj=0.315, (0 split)
##
## Node number 3: 175 observations, complexity param=0.05007147
## mean=87.72571, MSE=29.18191
## left son=6 (53 obs) right son=7 (122 obs)
## Primary splits:
## bienestar < 64.5 to the left, improve=0.17451230, (0 missing)
## emocional < 65.5 to the left, improve=0.15522440, (0 missing)
## dinero < 26.11 to the right, improve=0.04074404, (0 missing)
## Surrogate splits:
## emocional < 64.5 to the left, agree=0.754, adj=0.189, (0 split)
## dinero < 22.53 to the left, agree=0.709, adj=0.038, (0 split)
##
## Node number 4: 100 observations
## mean=76.28, MSE=16.9616
##
## Node number 5: 89 observations, complexity param=0.01246222
## mean=80.50562, MSE=22.81177
## left son=10 (44 obs) right son=11 (45 obs)
## Primary splits:
## emocional < 61.5 to the left, improve=0.10925310, (0 missing)
## bienestar < 67.5 to the left, improve=0.09991913, (0 missing)
## dinero < 28.11 to the right, improve=0.02178023, (0 missing)
## Surrogate splits:
## bienestar < 38.5 to the left, agree=0.640, adj=0.273, (0 split)
## dinero < 28.815 to the left, agree=0.584, adj=0.159, (0 split)
##
## Node number 6: 53 observations, complexity param=0.01784654
## mean=84.30189, MSE=22.4749
## left son=12 (9 obs) right son=13 (44 obs)
## Primary splits:
## dinero < 26.105 to the right, improve=0.26666610, (0 missing)
## bienestar < 52.5 to the left, improve=0.18662320, (0 missing)
## emocional < 56 to the left, improve=0.06756936, (0 missing)
## Surrogate splits:
## bienestar < 47.5 to the left, agree=0.887, adj=0.333, (0 split)
## emocional < 51.5 to the left, agree=0.849, adj=0.111, (0 split)
##
## Node number 7: 122 observations, complexity param=0.02744761
## mean=89.21311, MSE=24.79065
## left son=14 (16 obs) right son=15 (106 obs)
## Primary splits:
## emocional < 65.5 to the left, improve=0.16152670, (0 missing)
## bienestar < 90 to the right, improve=0.02670941, (0 missing)
## dinero < 24.755 to the left, improve=0.01641250, (0 missing)
## Surrogate splits:
## dinero < 26.78 to the right, agree=0.885, adj=0.125, (0 split)
##
## Node number 10: 44 observations
## mean=78.90909, MSE=15.35537
##
## Node number 11: 45 observations
## mean=82.06667, MSE=25.17333
##
## Node number 12: 9 observations
## mean=78.88889, MSE=16.09877
##
## Node number 13: 44 observations
## mean=85.40909, MSE=16.55992
##
## Node number 14: 16 observations
## mean=84.0625, MSE=20.68359
##
## Node number 15: 106 observations
## mean=89.99057, MSE=20.8018
prp(modelo2, nn = TRUE, fallen.leaves = TRUE,
split.box.col = "lightblue",
split.border.col = "darkgray",
split.round = 5, round = 5)
What happens when new data arrive? Can the fitted model be used to predict for them?
Observation 1: a person with dinero \(20\), bienestar \(80\), and emocional \(70\); the predicted average is approximately \(90\).
Observation 2: a person with dinero \(18\), bienestar \(90\), and emocional \(65\); the predicted average is approximately \(84\).
Observation 3: a person with dinero \(30\), bienestar \(70\), and emocional \(60\); the predicted average is approximately \(79\).
Predictions are made with values for these variables, the most important ones.
The variables dinero, bienestar, and emocional are initialized:
dinero <- c(20, 18, 30)
bienestar <- c(80, 90, 70)
emocional <- c(70, 65, 60)
nuevos_datos <- data.frame(dinero, bienestar, emocional)
kable(nuevos_datos, caption = "New data")
dinero | bienestar | emocional |
---|---|---|
20 | 80 | 70 |
18 | 90 | 65 |
30 | 70 | 60 |
The predictions:
predict(object=modelo2, newdata=nuevos_datos)
## 1 2 3
## 89.99057 84.06250 78.90909
A correlation is again computed between the model's predictions for all the data and the observed values.
The variable prediccion holds the predictions \(\hat{y}\) for all the data, while \(Y\) holds the original observed values.
As before, the correlation coefficient is understood as a descriptive measure of the strength of the linear relationship between two variables (Anderson, Sweeney, and Williams 2008).
prediccion <- predict(object=modelo2, newdata=datos)
kable(data.frame(Y = datos$promedio, prediccion), caption = "Relating Y and prediccion")
Y | prediccion |
---|---|
87 | 89.99057 |
89 | 89.99057 |
85 | 89.99057 |
85 | 89.99057 |
91 | 89.99057 |
86 | 89.99057 |
92 | 89.99057 |
88 | 89.99057 |
90 | 89.99057 |
88 | 89.99057 |
94 | 89.99057 |
93 | 89.99057 |
87 | 89.99057 |
95 | 89.99057 |
89 | 89.99057 |
88 | 89.99057 |
93 | 89.99057 |
91 | 89.99057 |
93 | 89.99057 |
86 | 89.99057 |
82 | 89.99057 |
90 | 89.99057 |
88 | 89.99057 |
95 | 89.99057 |
90 | 89.99057 |
92 | 89.99057 |
88 | 89.99057 |
84 | 89.99057 |
90 | 89.99057 |
94 | 89.99057 |
86 | 89.99057 |
92 | 89.99057 |
91 | 89.99057 |
96 | 89.99057 |
100 | 89.99057 |
89 | 89.99057 |
87 | 89.99057 |
91 | 89.99057 |
92 | 89.99057 |
88 | 89.99057 |
94 | 89.99057 |
84 | 89.99057 |
91 | 89.99057 |
93 | 82.06667 |
92 | 89.99057 |
90 | 89.99057 |
86 | 89.99057 |
87 | 89.99057 |
85 | 89.99057 |
93 | 89.99057 |
88 | 89.99057 |
97 | 89.99057 |
97 | 89.99057 |
89 | 89.99057 |
92 | 89.99057 |
86 | 89.99057 |
95 | 89.99057 |
92 | 89.99057 |
91 | 89.99057 |
84 | 89.99057 |
93 | 89.99057 |
95 | 89.99057 |
94 | 89.99057 |
95 | 89.99057 |
98 | 89.99057 |
97 | 89.99057 |
98 | 89.99057 |
94 | 89.99057 |
86 | 89.99057 |
97 | 89.99057 |
88 | 84.06250 |
96 | 89.99057 |
88 | 89.99057 |
89 | 89.99057 |
90 | 89.99057 |
98 | 89.99057 |
91 | 89.99057 |
94 | 89.99057 |
86 | 89.99057 |
90 | 89.99057 |
91 | 89.99057 |
92 | 89.99057 |
92 | 89.99057 |
93 | 89.99057 |
91 | 89.99057 |
98 | 89.99057 |
95 | 89.99057 |
86 | 89.99057 |
88 | 84.06250 |
90 | 89.99057 |
90 | 89.99057 |
81 | 89.99057 |
81 | 89.99057 |
86 | 84.06250 |
89 | 89.99057 |
94 | 89.99057 |
91 | 89.99057 |
87 | 89.99057 |
82 | 84.06250 |
86 | 84.06250 |
81 | 84.06250 |
85 | 85.40909 |
91 | 85.40909 |
88 | 85.40909 |
88 | 85.40909 |
91 | 85.40909 |
86 | 85.40909 |
85 | 85.40909 |
92 | 85.40909 |
84 | 85.40909 |
84 | 85.40909 |
77 | 85.40909 |
87 | 85.40909 |
82 | 85.40909 |
84 | 85.40909 |
93 | 84.06250 |
86 | 85.40909 |
82 | 89.99057 |
80 | 89.99057 |
83 | 85.40909 |
97 | 89.99057 |
86 | 89.99057 |
82 | 85.40909 |
90 | 84.06250 |
87 | 82.06667 |
81 | 85.40909 |
83 | 85.40909 |
81 | 85.40909 |
89 | 85.40909 |
85 | 82.06667 |
78 | 84.06250 |
88 | 85.40909 |
88 | 85.40909 |
92 | 85.40909 |
91 | 85.40909 |
85 | 85.40909 |
86 | 89.99057 |
79 | 89.99057 |
85 | 84.06250 |
80 | 89.99057 |
83 | 85.40909 |
87 | 85.40909 |
82 | 85.40909 |
85 | 85.40909 |
86 | 89.99057 |
92 | 89.99057 |
86 | 89.99057 |
82 | 85.40909 |
80 | 85.40909 |
86 | 89.99057 |
81 | 89.99057 |
77 | 84.06250 |
81 | 84.06250 |
82 | 85.40909 |
82 | 85.40909 |
98 | 85.40909 |
81 | 85.40909 |
80 | 84.06250 |
87 | 85.40909 |
82 | 85.40909 |
86 | 85.40909 |
80 | 85.40909 |
90 | 85.40909 |
90 | 85.40909 |
95 | 82.06667 |
91 | 82.06667 |
84 | 78.88889 |
89 | 82.06667 |
85 | 82.06667 |
87 | 82.06667 |
83 | 82.06667 |
89 | 82.06667 |
76 | 78.90909 |
80 | 82.06667 |
81 | 82.06667 |
90 | 82.06667 |
84 | 82.06667 |
82 | 82.06667 |
83 | 82.06667 |
79 | 82.06667 |
89 | 84.06250 |
85 | 78.88889 |
76 | 82.06667 |
74 | 82.06667 |
83 | 78.90909 |
73 | 78.90909 |
76 | 78.90909 |
84 | 78.90909 |
77 | 82.06667 |
84 | 82.06667 |
81 | 84.06250 |
77 | 78.90909 |
80 | 82.06667 |
77 | 82.06667 |
78 | 78.90909 |
80 | 84.06250 |
78 | 82.06667 |
81 | 78.90909 |
84 | 82.06667 |
84 | 78.90909 |
79 | 78.90909 |
83 | 78.88889 |
74 | 82.06667 |
78 | 78.90909 |
80 | 78.90909 |
73 | 78.90909 |
74 | 78.90909 |
83 | 82.06667 |
84 | 78.90909 |
82 | 78.90909 |
74 | 82.06667 |
81 | 78.90909 |
84 | 78.90909 |
80 | 82.06667 |
78 | 82.06667 |
87 | 78.90909 |
77 | 82.06667 |
81 | 82.06667 |
83 | 78.90909 |
80 | 82.06667 |
68 | 78.90909 |
80 | 78.90909 |
84 | 78.90909 |
77 | 78.90909 |
76 | 78.90909 |
81 | 78.90909 |
77 | 78.90909 |
76 | 82.06667 |
80 | 78.90909 |
84 | 78.90909 |
78 | 78.88889 |
78 | 78.88889 |
78 | 82.06667 |
82 | 82.06667 |
84 | 78.90909 |
77 | 78.90909 |
81 | 82.06667 |
79 | 78.90909 |
75 | 78.88889 |
80 | 78.90909 |
81 | 78.90909 |
86 | 82.06667 |
85 | 85.40909 |
74 | 78.90909 |
84 | 82.06667 |
74 | 78.88889 |
74 | 78.90909 |
78 | 78.90909 |
83 | 85.40909 |
74 | 78.90909 |
75 | 78.90909 |
83 | 82.06667 |
74 | 78.88889 |
78 | 78.90909 |
87 | 82.06667 |
82 | 78.90909 |
76 | 82.06667 |
79 | 78.88889 |
82 | 82.06667 |
76 | 78.90909 |
78 | 82.06667 |
78 | 78.90909 |
78 | 78.90909 |
80 | 82.06667 |
83 | 76.28000 |
77 | 76.28000 |
79 | 76.28000 |
79 | 76.28000 |
80 | 76.28000 |
86 | 76.28000 |
85 | 76.28000 |
86 | 76.28000 |
80 | 76.28000 |
73 | 76.28000 |
76 | 76.28000 |
82 | 76.28000 |
78 | 76.28000 |
74 | 76.28000 |
76 | 76.28000 |
77 | 76.28000 |
77 | 76.28000 |
75 | 76.28000 |
80 | 76.28000 |
72 | 76.28000 |
76 | 76.28000 |
75 | 76.28000 |
76 | 76.28000 |
74 | 76.28000 |
74 | 76.28000 |
72 | 76.28000 |
77 | 76.28000 |
73 | 76.28000 |
82 | 76.28000 |
72 | 76.28000 |
72 | 76.28000 |
78 | 76.28000 |
77 | 76.28000 |
77 | 76.28000 |
73 | 76.28000 |
69 | 76.28000 |
75 | 76.28000 |
72 | 76.28000 |
78 | 76.28000 |
78 | 76.28000 |
76 | 76.28000 |
75 | 76.28000 |
68 | 76.28000 |
74 | 76.28000 |
77 | 76.28000 |
83 | 76.28000 |
82 | 76.28000 |
74 | 76.28000 |
70 | 76.28000 |
75 | 76.28000 |
76 | 76.28000 |
71 | 76.28000 |
82 | 76.28000 |
76 | 76.28000 |
75 | 76.28000 |
77 | 76.28000 |
70 | 76.28000 |
80 | 76.28000 |
80 | 76.28000 |
82 | 76.28000 |
79 | 76.28000 |
73 | 76.28000 |
78 | 76.28000 |
73 | 76.28000 |
77 | 76.28000 |
82 | 76.28000 |
78 | 76.28000 |
74 | 76.28000 |
74 | 76.28000 |
71 | 76.28000 |
71 | 76.28000 |
77 | 76.28000 |
77 | 76.28000 |
79 | 76.28000 |
80 | 76.28000 |
78 | 76.28000 |
74 | 76.28000 |
66 | 76.28000 |
72 | 76.28000 |
81 | 76.28000 |
78 | 76.28000 |
75 | 76.28000 |
89 | 76.28000 |
78 | 76.28000 |
72 | 76.28000 |
75 | 76.28000 |
73 | 76.28000 |
79 | 76.28000 |
81 | 76.28000 |
71 | 76.28000 |
70 | 76.28000 |
77 | 76.28000 |
76 | 76.28000 |
77 | 76.28000 |
75 | 76.28000 |
75 | 76.28000 |
79 | 76.28000 |
74 | 76.28000 |
69 | 76.28000 |
73 | 76.28000 |
correla <- cor(prediccion, datos$promedio)
correla
## [1] 0.782011
Having built and visualized the first model, dinero, bienestar, and emocional are observed to be the most important variables in the model.
The evaluation is done with a correlation between the predictions for all the data and the actual values; the correlation found is 0.782011, a moderate-to-strong positive relationship. According to (Hernández Sampieri, Fernández Collado, and Baptista Lucio 2014), a value above \(0.75\) is considered a considerable positive correlation.
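Since the introduction mentions comparing the regression tree with multiple regression, a minimal sketch of that comparison on the same data (an assumption, not part of the original analysis) could be:

# Hypothetical benchmark: multiple linear regression on the same predictors
modelo_lm <- lm(promedio ~ dinero + bienestar + emocional, data = datos)
pred_lm <- predict(modelo_lm, newdata = datos)
cor(pred_lm, datos$promedio) # correlation achieved by linear regression
cor(prediccion, datos$promedio) # correlation achieved by the tree: 0.782011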